
A latency-optimized 24B-parameter model with multimodal understanding and a 128K-token context window, released under Apache 2.0. It outperforms comparable models such as Gemma 3 and GPT-4o mini while delivering inference at 150 tokens per second, and it can run privately on a single RTX 4090 or a MacBook with 32 GB of RAM.
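For readers who want to try the private-deployment claim, below is a minimal sketch of loading a 24B checkpoint on a single 24 GB GPU using Hugging Face Transformers with 4-bit bitsandbytes quantization. The `MODEL_ID` is a placeholder assumption, since this listing does not name the checkpoint's repository; actual memory use also depends on context length and quantization settings.

```python
# Minimal sketch: 4-bit loading of a 24B checkpoint on one 24 GB GPU.
# MODEL_ID is hypothetical -- substitute the model's real repository ID.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

MODEL_ID = "org/model-24b-instruct"  # placeholder, not from this listing

quant = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    quantization_config=quant,  # ~0.5 bytes/param: 24B weights fit in ~14 GB
    device_map="auto",          # place layers on the available GPU(s)
)

# Build a chat prompt and generate a short completion.
prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Summarize the Apache 2.0 license in one sentence."}],
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

out = model.generate(prompt, max_new_tokens=64)
print(tokenizer.decode(out[0][prompt.shape[-1]:], skip_special_tokens=True))
```

On Apple silicon, a llama.cpp-style quantized build is the more common route; the sketch above only illustrates the single-GPU case.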
Released: March 17, 2025
Parameters: 24B
Context: 128K tokens
Pricing: Open Source
Last updated: March 15, 2026
Benchmark scores may vary based on evaluation methodology and conditions.