Databricks’ OfficeQA uncovers disconnect: AI agents ace abstract tests but stall at 45% on enterprise docs

via venturebeat.com

There is no shortage of AI benchmarks today: popular options include Humanity’s Last Exam (HLE), ARC-AGI-2 and GDPval, among numerous others. AI agents excel at the abstract math problems and PhD-level exams that most benchmarks are built on, but Databricks has a question for the enterprise: Can they actually handle […]
