Microsoft's most capable new Phi 4 AI model rivals the performance of far larger systems

Microsoft on Wednesday launched several new "open" AI models, the most capable of which is competitive with OpenAI's o3-mini on at least one benchmark.

The new permissively licensed models – Phi 4 mini reasoning, Phi 4 reasoning, and Phi 4 reasoning plus – are all "reasoning" models, meaning they're able to spend more time fact-checking solutions to complex problems. They expand Microsoft's Phi "small model" family, which the company launched a year ago to give AI developers a foundation for building apps at the edge.

Phi 4 mini reasoning was trained on roughly 1 million synthetic math problems generated by R1, the reasoning model from Chinese AI startup DeepSeek. At around 3.8 billion parameters in size, Phi 4 mini reasoning is designed for educational applications, Microsoft says, such as "embedded tutoring" on lightweight devices.

Parameters roughly correspond to the problem-solving skills of the model, and models with more parameters usually perform better than models with fewer parameters.

Phi 4 reasoning, a 14-billion-parameter model, was trained using "high-quality" web data as well as "curated demonstrations" from OpenAI's aforementioned o3-mini. According to Microsoft, it's best suited for math, science, and coding applications.

As for Phi 4 reasoning plus, it's Microsoft's previously released Phi-4 model adapted into a reasoning model to achieve better accuracy on particular tasks. Microsoft claims Phi 4 reasoning plus approaches the performance levels of R1, a model with significantly more parameters (671 billion). The company's internal benchmarking also has Phi 4 reasoning plus matching o3-mini on OmniMath, a math skills test.

Phi 4 mini reasoning, Phi 4 reasoning, and Phi 4 reasoning plus are available on the AI development platform Hugging Face, accompanied by detailed technical reports.
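For readers who want to try one of the models locally, here is a minimal sketch using the Hugging Face transformers library. The repo ID "microsoft/Phi-4-mini-reasoning" is an assumption based on Microsoft's naming; check the model cards on Hugging Face for the exact identifiers, chat template, and hardware requirements.

```python
# A minimal sketch of running Phi 4 mini reasoning locally via transformers.
# The repo ID below is an assumption; confirm it on the Hugging Face model card.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/Phi-4-mini-reasoning"  # assumed repo name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Reasoning models are typically prompted through a chat template.
messages = [{"role": "user", "content": "If 3x + 5 = 20, what is x?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Reasoning models spend extra tokens "thinking" through the problem,
# so allow a generous generation budget.
outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```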


"Using distillation, enhanced learning and high-quality data, these (new) models balance scale and performance," Microsoft wrote in a blog post. "They are small enough to be used in low-latency environments, but maintain strong inference capabilities to match larger models. This fusion allows devices with even limited resources to effectively perform complex inference tasks."