Databricks’ OfficeQA uncovers disconnect: AI agents ace abstract tests but stall at 45% on enterprise docs
via venturebeat.com
There is no shortage of AI benchmarks on the market today, including popular options such as Humanity’s Last Exam (HLE), ARC-AGI-2 and GDPval. AI agents excel at solving the abstract math problems and passing the PhD-level exams on which most benchmarks are based, but Databricks has a question for the enterprise: Can they actually handle […]