Google’s Gemma 4 AI models get 3x speed boost by predicting future tokens

via arstechnica.com

Google launched its Gemma 4 open models this spring, promising a new level of power and performance for local AI. Google’s take on edge AI may already be getting faster with the release of Multi-Token Prediction (MTP) drafters for Gemma. Google says these experimental models leverage a form of speculative decoding to take a […]
