Decrypt · 6/3/2026 · Education · 3 min read

AI Outperforms Law Professors in Legal Reasoning Tasks, Study Finds

artificial intelligence large language models legal reasoning

Quick Look

A Stanford-led study found law professors preferred AI-generated answers over human-written ones in legal reasoning tasks.
Google's Gemini and NotebookLM outperformed human instructors, with AI answers also deemed less harmful.

AI-generated summary

Why It Matters

A study led by Stanford University examined the performance of large language models (LLMs) on legal reasoning tasks. Law professors were asked to compare answers generated by AI with those written by human instructors.

Law professors preferred answers generated by artificial intelligence over answers written by fellow professors, according to a recent study led by Stanford University that examined how large language models perform on legal reasoning tasks.

In the study, 16 professors from 14 U.S. law schools, including Stanford, Yale, New York University, the University of Chicago, Georgetown, UCLA, and the University of Virginia, created 40 contract law questions covering legal doctrine, case law, hypotheticals, and policy issues. The researchers saw law as an ideal way to test the capabilities of modern AI.

“Large language models (LLMs) are increasingly promoted as educational tutors, yet most evaluations focus on domains with a single ground truth,” the researchers wrote. “Many disciplines, however, hinge on judgment: reasoning, weighing ambiguity, and reaching defensible conclusions. Law provides a sharp test.”

In 2,918 blinded comparisons, professors selected the answer they would rather give a student. Google’s Gemini 2.5 Pro won 75.92% of its matchups against human instructors, while the tech giant’s NotebookLM won 74.75% of the time, giving AI-generated answers the nod over human-written ones in roughly three-quarters of matchups.

To determine whether the results reflected a broader professional consensus, the researchers analyzed how often professors agreed when evaluating the same answer pairs.

“Observed agreement exceeded the level expected if judgments were entirely idiosyncratic, indicating that the LLMs’ success reflects alignment with common disciplinary criteria,” they wrote.
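The comparison the researchers describe can be illustrated with a minimal sketch. It assumes a simple pooled pairwise-agreement measure and a chance baseline built from overall vote shares; the article does not say which statistic the study used. The function names and the toy votes are hypothetical and are not data from the study.

```python
from collections import Counter
from itertools import combinations


def observed_agreement(votes_by_pair):
    """Share of rater pairs that picked the same answer, pooled over all answer pairs."""
    agree = 0
    total = 0
    for votes in votes_by_pair.values():
        for a, b in combinations(votes, 2):
            agree += int(a == b)
            total += 1
    return agree / total


def chance_agreement(votes_by_pair):
    """Agreement expected if every vote were drawn independently from the overall vote shares."""
    counts = Counter(v for votes in votes_by_pair.values() for v in votes)
    n = sum(counts.values())
    return sum((c / n) ** 2 for c in counts.values())


# Hypothetical toy data: three raters per answer pair, each voting "llm" or "human".
toy_votes = {
    "pair_1": ["llm", "llm", "llm"],
    "pair_2": ["llm", "llm", "human"],
    "pair_3": ["human", "human", "human"],
    "pair_4": ["llm", "llm", "llm"],
}

obs = observed_agreement(toy_votes)
exp = chance_agreement(toy_votes)
print(f"observed agreement: {obs:.3f}, chance agreement: {exp:.3f}")
```

If observed agreement is well above the chance baseline, raters are likely applying shared criteria rather than purely personal taste, which is the kind of gap the researchers report.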

The study found that AI models also outperformed human instructors across multiple categories, including recall questions (relating to case, code, or doctrine), hypotheticals, and policy discussions.

“To probe whether any LLM advantage might be driven by surface-level writing style rather than substantive content, we additionally engineered a set of lexico-syntactic features—answer length, structural organization, reasoning nuance, legal anchors, confidence tone, clarity, and pedagogical support—and tested how much of the preference pattern they could explain,” the study said.

AI-generated answers were also flagged as harmful less often than those written by professors, with Gemini recording a 3.41% harmfulness rate and NotebookLM 3.64%, compared with 12.06% for human instructors. In a separate analysis of additional models, Anthropic’s Claude Opus 4.7 ranked first, followed by OpenAI’s ChatGPT 5.4 and Gemini 2.5 Pro, while every AI model evaluated outperformed human instructors on average.

The researchers cautioned that the study did not measure whether the answers matched each professor's individual teaching preferences, leaving open the possibility that AI-generated responses were viewed as generally acceptable rather than tailored to any one instructor's approach.

“While LLM responses are generally preferred over those of human instructors, our evaluation setting does not allow us to directly measure the extent to which instructor preferences are satisfied,” the study said. “It is at least theoretically possible that LLMs, although generally delivering stronger responses, still generate answers that are merely viewed as ‘good enough.’”

The study comes as courts, law firms, and law schools increasingly grapple with how artificial intelligence should be used in the legal profession.

In March, the Los Angeles Superior Court began testing AI tools to help judges manage growing caseloads, while law schools are adding AI training programs.

“The potential benefits of these new technologies as a force multiplier in the practice of law just can’t be ignored,” Mississippi College School of Law Dean John P. Anderson previously told Decrypt. “Whether our students plan to be litigators or transactional attorneys, their future employers will expect familiarity with these AI tools. We want the firms hiring our students to be confident that every MC Law grad is competent in AI technologies.”

What to Watch

AI outlook — possibilities, not facts

Law schools will continue to integrate AI training programs.
Very likely · Within months
AI tools will be further adopted in legal practice to manage caseloads.
Very likely · Within months