Probably aims to eliminate LLM hallucinations with new data science tool
Developing
TechCrunch · 16.06.2026 · Technology · 2 min read · United States

At a glance

  • Probably, a startup that raised $9 million in seed funding, is developing a data science tool to combat hallucinations and factual errors in LLMs.
  • Their "data science mech suit" system uses a deterministic validator to check LLM outputs, allowing for smaller, more efficient models and reduced costs.

AI-generated summary

Why it matters

Large Language Models (LLMs) often produce hallucinations or factual errors, a problem the industry is still working to solve. Probably aims to create a more rigorous system to prevent these errors from reaching users.

As LLMs have grown more powerful, hallucinations have proven stubbornly difficult to avoid. Errors pop up in even the smartest models, and while there are ways to catch those errors, the industry is still figuring out the best way to do it.

Probably, which just raised $9 million in seed funding from Andreessen Horowitz, is trying to build a more rigorous way to catch those errors.

As founder Peter Elias puts it, the company’s goal is to prevent hallucinations and simple factual errors from ever reaching the user, and to achieve the kind of 99.99% accuracy that’s common in deterministic systems but much more difficult to reach with AI. As it turns out, bringing LLMs to that level of accuracy requires rethinking many of the basic assumptions of AI engineering.

Probably’s first product is a data science tool, built to produce quick answers from complex datasets. Each result comes with a citation and an audit trail for how it was developed, an increasingly common practice among AI tools.

But keeping errors from creeping into those summaries required an elaborate harness system that Elias describes as a “data science mech suit.” The LLM’s first-pass answers are checked against a deterministic validator system, which bounces back any results that don’t match the dataset. Crucially, the LLM has been trained against the validator, and the whole system is optimized for fast and accurate answers, the company said.
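
The article does not publish Probably's implementation, so the following is only a minimal sketch of the general pattern described here: a model proposes an answer, a deterministic validator recomputes it from the dataset, and mismatches are bounced back for another attempt. The dataset, the `Claim` type, the `fake_model` stand-in for an LLM call, and the retry limit are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

# Hypothetical toy dataset; the article does not describe Probably's data model.
DATASET = [
    {"region": "north", "revenue": 120},
    {"region": "south", "revenue": 80},
    {"region": "north", "revenue": 50},
]


@dataclass
class Claim:
    region: str  # which rows the claim is about
    total: int   # the total revenue the model asserts for that region


@dataclass
class Result:
    claim: Optional[Claim]  # None if no attempt passed validation
    accepted: bool
    audit_trail: list = field(default_factory=list)


def validate(claim: Claim, rows: list) -> bool:
    """Deterministically recompute the total and compare it with the claim."""
    expected = sum(r["revenue"] for r in rows if r["region"] == claim.region)
    return claim.total == expected


def answer_with_validation(
    propose: Callable[[Optional[str]], Claim],
    rows: list,
    max_attempts: int = 3,
) -> Result:
    """Ask the model, check the answer, and bounce back anything that fails."""
    trail = []
    feedback = None
    for attempt in range(1, max_attempts + 1):
        claim = propose(feedback)
        ok = validate(claim, rows)
        # Record every attempt so the final answer carries an audit trail.
        trail.append(
            {"attempt": attempt, "claimed": claim.total, "matches_dataset": ok}
        )
        if ok:
            return Result(claim, True, trail)
        feedback = f"claimed total {claim.total} does not match the dataset"
    # Nothing passed validation, so nothing reaches the user.
    return Result(None, False, trail)


def fake_model(feedback: Optional[str]) -> Claim:
    # Stand-in for an LLM call: wrong on the first pass, corrected after feedback.
    return Claim("north", 150 if feedback is None else 170)


result = answer_with_validation(fake_model, DATASET)
```

In this toy run the first claim is rejected and the second is accepted, so `result.audit_trail` holds two entries. The design point is that only validated claims are ever returned; a model that never produces a matching answer yields no answer at all rather than a wrong one.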

“What we learned building this was that the better your harness engineering is, the weaker the model can be,” Elias says. “If you can refine the context enough, the model does not have to work very hard to do the right thing. Basically, it’s an exercise in reducing ambiguity.”

That allows Probably’s data science tool to run on significantly smaller AI models. Elias says the current version is running on a model that’s “four classes weaker than the frontier models.” That means it can run on local hardware (a desktop computer instead of a data center), cutting much of the token cost associated with AI use.

It’s a welcome idea at a time when token costs are rising and many customers are reassessing their AI budgets. And Elias’ idea doesn’t end with data science: the same engine can be extended to cover use cases like accounting or medical services, or, as Elias puts it, “any precision-sensitive use case.”

“I think it’s really interesting that the big AI labs have not even attempted to do this,” Elias says. “They’re incentivized not to, because they make money the more times you have to correct the model.”

What to watch

AI outlook: possibilities, not facts

  • Probably's engine may be extended to accounting or medical services.

    Likely · Medium term

Open questions

  • Will Probably's system scale to larger, more complex LLMs?
  • How will competitors respond to this approach?
  • What are the specific limitations of the 'data science mech suit'?

This article was originally published by TechCrunch.