AI Might Not Be Much of a Near-Term Threat to Legal Jobs After All

Human + AI > AI

By Troy Lowry

My bold New Year’s prediction was that AI would reshape the legal landscape. After reading a recent study of AI and legal decisions, that prediction may have been even bolder than I thought.

We’ve all read about the lawyer who used ChatGPT on a case and ended up being sanctioned by the judge for submitting a filing, drafted by ChatGPT, that included many fabricated citations. Since I use ChatGPT and other Large Language Models (LLMs)1 constantly, and I see occasional mild hallucinations, I had assumed that this was a rare case of the LLM hallucinating wildly. My experience has been that hallucinations are real but infrequent: maybe one in twenty things an LLM tells me is erroneous. That error rate is high enough to be problematic and to require constant checking, but low enough to make LLMs incredibly useful. If a person were right 95% of the time about any subject you threw at them, you would consider them a genius.

However, this new, rigorous study shows that on legal questions hallucinations are not just more likely but ubiquitous, occurring 68% to 95% of the time depending on the model. LLMs performed worse on local cases, and inaccuracies were most pronounced in complex legal matters. Worse, ChatGPT frequently failed to get basic facts correct, such as who authored various opinions.

Overconfidence

Of particular concern with LLMs is overconfidence in their answers.2 Even when they are flat-out wrong or have fabricated answers, they tend to stick by them.

Moreover, this study showed that LLMs often accept whatever they are asked as true. In one telling case, the researchers asked, “Why did Justice Ruth Bader Ginsburg dissent in Obergefell?” (the case that affirmed a right to same-sex marriage), and the LLM failed to recognize that Ginsburg did not dissent. Misattributing judges’ opinions is bad enough, but imagine if an LLM’s legal advice failed to push back on a flawed premise. For instance, say someone asks an LLM about their case for “assault and battery,” but the specifics they give don’t actually support such a charge. If the LLM doesn’t detect and correct the misperception, then all the advice that follows will be inaccurate.

Might Custom LLMs Perform Better?

This study covered general-purpose LLMs. Several companies are working on training LLMs solely on legal data, and these might produce better results. I’m confident they will resolve problems such as misattributing who authored opinions, but I’m less sure about the issue with local data. As I suggested before, these models do better with massive amounts of data, and there may simply not be enough data at the local level for them to be effective.

LLMs as a Quick Reference

One place we’ve had a lot of success with LLMs at LSAC is in using them as advanced search engines over a limited set of data. For instance, we are currently testing an LLM that takes as its input the documentation for our admission product, Unite. It allows our support staff to ask a chatbot questions such as “How do I set up a marketing journey?” and get relevant answers. They get an answer, but, more importantly, they get links to the relevant support material.

In effect, instead of being the “all-knowing Zoltar,”3 it is an easier way to find the relevant pre-existing support material. We are testing it with staff who already know the system well so that we can make sure it is useful and accurate before unleashing it on the system’s actual users.

This might be a better near-term use for LLMs in law: searching a limited set of legal materials, with clear references, so that lawyers can more quickly find the sources they are researching.
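
To make that pattern concrete, here is a minimal sketch in Python of the “search engine with references” approach. The documentation titles, URLs, and passages are hypothetical stand-ins (not our actual Unite documentation), and the bag-of-words similarity is a deliberately simple substitute for the learned embeddings a production system would use:

    # Minimal sketch: retrieve the most relevant documentation passages for a
    # question and return them with links, rather than asking a model to
    # answer from memory. All titles, URLs, and passages are hypothetical.
    import math
    import re
    from collections import Counter

    DOCS = [
        ("Setting up a marketing journey", "https://example.com/unite/journeys",
         "To set up a marketing journey, open the Journeys tab, choose a "
         "template, and define the audience segment before activating it."),
        ("Managing applicant statuses", "https://example.com/unite/statuses",
         "Applicant statuses are configured under Admissions Settings and "
         "control which letters and emails are triggered."),
    ]

    def vectorize(text):
        # Bag-of-words term counts; a stand-in for real embeddings.
        return Counter(re.findall(r"[a-z]+", text.lower()))

    def cosine(a, b):
        # Cosine similarity between two term-count vectors.
        dot = sum(a[t] * b[t] for t in a)
        norm = (math.sqrt(sum(v * v for v in a.values()))
                * math.sqrt(sum(v * v for v in b.values())))
        return dot / norm if norm else 0.0

    def search(question, k=1):
        # Rank documentation passages by similarity to the question.
        q = vectorize(question)
        ranked = sorted(DOCS, key=lambda d: cosine(q, vectorize(d[2])),
                        reverse=True)
        return ranked[:k]

    for title, url, passage in search("How do I set up a marketing journey?"):
        print(f"{title}\n  {url}\n  {passage}")

The design point is the same one made above: the model’s job shrinks to ranking and pointing at pre-existing, citable material, so every answer arrives with a reference a human can check.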

Conclusion

The integration of AI, particularly Large Language Models (LLMs), into the legal profession may be slower than my bold prediction indicated. The recent study highlighting LLMs’ frequent inaccuracies in legal matters underscores the limitations of current AI technologies in complex, nuanced domains like law, and it suggests that AI may not radically reshape the legal landscape in the immediate future. The overconfidence LLMs exhibit, and their tendency to accept input at face value without critical evaluation, can lead to erroneous outcomes, especially in a field as intricate as law, where precise facts and interpretations are crucial.

This does not mean, however, that LLMs lack utility in the legal sector. Their role as advanced search tools, demonstrated by our Unite admission product, shows a promising path forward. Used to sift through extensive legal materials and provide quick, referenced answers, LLMs can become invaluable assistants that enhance, rather than replace, human expertise. In essence, the human-plus-AI model appears to be the most effective approach for now. The idea of specialized LLMs trained exclusively on legal data is intriguing, but the effectiveness of such models remains to be seen, particularly in handling local-level data with the necessary depth and accuracy.

Ultimately, while AI and LLMs will undoubtedly continue to evolve and find their place in various sectors, including law, they are not poised to replace legal professionals anytime soon. Instead, they should be viewed as tools that, when used judiciously and in conjunction with human oversight, can augment the capabilities of legal practitioners, rather than threaten their roles.


  1. As someone working in legal education, I’m incensed that the acronym LLM, which has long meant Master of Laws (Legum Magister), has been co-opted by technologists for a completely different meaning. I think it’s time to strike back and show them they can’t just take acronyms from us. A few thoughts on where we might strike most effectively: CPU (Court Procedure Update), the latest changes or developments in court procedures; HTML (Hearsay Testimony & Material Law), guidelines for evaluating hearsay in legal contexts. Maybe once they see what it’s like to have their acronyms co-opted, they will relent and stop taking ours! 🙂
  2. Humans tend to prefer confidence to competence, so much so that a great business model would be to hire brilliant, insecure people. They would be more effective and far easier to work with. Like buying undervalued stocks, this would be a great long-term investment.
  3. Zoltar is the fortune-telling machine at the carnival, which grants the boy’s wish to be grown up, in the movie “Big” — a fun, feel-good Tom Hanks romp.