OpenAI is training models to ‘confess’ when they lie – what it means for future AI

Short excerpt below. Click through to read at the original source.

A new study made a version of GPT-5 Thinking admit its own misbehavior. But it’s not a quick fix for bigger safety issues.

Read at Source