China's Xiaomi MiMo Is Now 15X Faster Than ChatGPT and Claude
Xiaomi's MiMo-V2.5-Pro-UltraSpeed achieved over 1,000 tokens per second on a 1-trillion-parameter model using standard 8-GPU hardware, surpassing the output speed of models like ChatGPT and Claude. The speedup comes from FP4 quantization and DFlash speculative decoding. A limited API trial is available from June 9 to 23.
- Xiaomi's MiMo-V2.5-Pro-UltraSpeed achieved over 1,000 tokens per second on a 1-trillion-parameter model using standard 8-GPU hardware.
- The speed was achieved through FP4 quantization and DFlash speculative decoding techniques.
- A limited API trial for the model runs from June 9 to 23, priced at 3× standard rates.
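
The summary does not describe how DFlash works, so the following is only a minimal sketch of generic greedy speculative decoding, not Xiaomi's implementation. A small draft model proposes several tokens, and the large target model checks them, keeping the matching prefix and correcting the first mismatch. The names `speculative_decode`, `target`, `draft`, and `k` are hypothetical, and both models are stand-in functions that return the next token for a prefix.

```python
from typing import Callable, List

Token = int
# Assumption: a "model" is any function mapping a token prefix to its greedy next token.
NextToken = Callable[[List[Token]], Token]


def speculative_decode(
    target: NextToken,
    draft: NextToken,
    prompt: List[Token],
    max_new: int,
    k: int = 4,
) -> List[Token]:
    """Greedy speculative decoding sketch.

    The result is identical to decoding with `target` alone; the draft
    model only affects how many tokens are settled per verification round.
    """
    out = list(prompt)
    limit = len(prompt) + max_new
    while len(out) < limit:
        # 1. The cheap draft model proposes up to k tokens.
        proposal: List[Token] = []
        for _ in range(min(k, limit - len(out))):
            proposal.append(draft(out + proposal))

        # 2. The target model verifies the proposal position by position.
        #    A real system scores all k positions in one batched forward
        #    pass; this sketch calls the target sequentially for clarity.
        for tok in proposal:
            expected = target(out)
            out.append(expected)  # always keep the target's own token
            if expected != tok:
                break  # first mismatch: discard the rest of the proposal
    return out
```

When the draft agrees with the target most of the time, each verification round settles several tokens at once, which is where the throughput gain in this family of techniques comes from.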