Granite 4 Models Available on Continue

Great news for developers using Continue: we're partnering to make the latest Granite 4 models available right where you work, with a focus on efficiency, long-context reasoning, and flexible deployment from laptop to data center. This post covers what's launching, why it matters for engineering teams, and how to try it in Continue.
What's Launching
We're highlighting a family of Granite 4 models designed for practical performance and efficiency:
Granite 4.0 Small (MoE)
An enterprise workhorse for everyday tasks, capable of running multiple concurrent long-context sessions on entry-level enterprise GPUs. The MoE architecture activates only 9B of its 32B parameters per token, delivering strong performance at lower compute cost.
Granite 4.0 Tiny (MoE)
Perfect for high-volume tasks where speed and efficiency are the priorities. Runs on consumer hardware like an NVIDIA RTX 3060, whereas comparable models would require enterprise GPUs. The MoE approach keeps only ~1B parameters active at inference time.
Granite 4.0 H Micro (Dense)
Uses a hybrid Mamba-2 + Transformer design for efficient processing in resource-constrained environments. Ideal for edge deployments where consistent latency matters.
Granite 4.0 Micro (Dense)
A traditional transformer architecture for teams whose infrastructure doesn't yet have optimized Mamba-2 support.
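To make that concrete, here's a minimal config.yaml sketch for pointing Continue at a locally served Granite 4.0 model. The Ollama tag below is a placeholder and the field names assume Continue's current config.yaml schema, so check the model's page for the exact tag your runtime exposes.

```yaml
# Minimal sketch: a locally served Granite 4.0 model as a Continue model block.
# The Ollama tag (granite4:micro-h) is a placeholder; substitute whatever tag
# your local runtime exposes for the Granite variant you pulled.
name: granite-local
version: 0.0.1
schema: v1
models:
  - name: Granite 4.0 H Micro (local)
    provider: ollama
    model: granite4:micro-h
    roles:
      - chat
      - edit
```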
Why This Matters
Efficiency by Design
Granite 4.0 combines a Mixture of Experts (MoE) architecture with Mamba-2 and transformer components, yielding significant memory reductions compared to traditional dense models:
- Faster inference: Selective parameter activation reduces memory usage and accelerates token generation
- Better parallel processing: Efficient batching for multiple concurrent users—ideal for chatbots and agentic workflows
- Runs on accessible hardware: Deploy on consumer-grade GPUs instead of expensive enterprise hardware
Long Context Windows
Tested up to 128K tokens, with no hard architectural limit on context length. The practical ceiling is set by your hardware rather than by the model, so longer contexts become possible as hardware allows.
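If you serve a Granite model behind an OpenAI-compatible endpoint (vLLM, for example), the window you expose to Continue is just a configuration knob. A hedged sketch, assuming the defaultCompletionOptions.contextLength field and a local endpoint; the largest value you can actually sustain depends on your GPU memory:

```yaml
# Sketch: raising the context window for a Granite model in config.yaml.
# Model name and endpoint are placeholders for whatever your server exposes.
models:
  - name: Granite 4.0 Small (long context)
    provider: openai
    model: granite-4.0-h-small
    apiBase: http://localhost:8000/v1  # e.g. a local vLLM server (assumption)
    defaultCompletionOptions:
      contextLength: 131072  # 128K tokens; scale down if memory is tight
```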
Open Source
Released under Apache 2.0 license for free commercial and non-commercial use with complete customization freedom.
Use Cases That Shine With Granite Models
Document Analysis
Process large technical documents and codebases efficiently. Continue's chat interface lets you explore your code through conversation—ideal for summarization and pattern analysis across multiple files.
RAG Workflows
The long context window works well for retrieval-augmented generation, pulling relevant information from knowledge bases or document repositories while maintaining accuracy.
Agentic Workflows
Run multiple AI agents concurrently for complex, multi-step tasks. Continue supports agent workflows through Hub Agents (pre-configured) or Local Agents (fully customizable via config.yaml).
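As a rough illustration, a Local Agent is just a config.yaml that names a model and the rules it should follow. The model tag and rules below are illustrative assumptions, not a shipped agent:

```yaml
# Sketch of a Local Agent: one Granite 4.0 model plus plain-text rules.
name: granite-refactor-agent
version: 0.0.1
schema: v1
models:
  - name: Granite 4.0 Small
    provider: ollama
    model: granite4:small-h  # placeholder tag
    roles:
      - chat
      - edit
rules:
  - Prefer small, reviewable diffs over sweeping rewrites.
  - Run the existing test suite before declaring a change complete.
```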
Edge Deployments
Granite 4.0 Tiny and Micro work on resource-constrained devices for on-device chatbots, local document analysis, or smart assistants without cloud dependency.
Try It in Continue
All Granite 4.0 models are available today on hub.continue.dev and ready to plug into your agents.
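The quickest path is to reference a published model block from the hub in your own agent config. A sketch, assuming the hub's uses: syntax for composing blocks; the slug below is a placeholder, so copy the exact one from the model's page on hub.continue.dev:

```yaml
# Sketch: composing a hub-published Granite 4.0 model block into an agent.
name: my-granite-agent
version: 0.0.1
schema: v1
models:
  - uses: ibm-granite/granite-4.0-small  # placeholder slug; copy from the hub page
```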
Documentation and Resources
- Continue Documentation
- Understanding Agents
- MCP Integration Cookbooks
- Granite Workshop Tutorials
- Granite Community on GitHub
- Official Granite Documentation
Leveraging Granite in Agent Workflows
Granite 4's efficiency makes it particularly well-suited for Continue's agent workflows:
- Long-context reasoning: Process large documents or substantial portions of a codebase without hitting context limits
- Multi-step automation: Chain together code analysis, refactoring, and testing tasks
- Flexible deployment: Run locally for privacy-sensitive work or use cloud deployment for team collaboration
- MCP integration: Extend Continue with Model Context Protocol tools for custom workflows
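Picking up the MCP point above, the wiring might look like the following in the same config.yaml. The server shown (the reference SQLite MCP server launched via uvx) is only an example; any MCP-compliant server can be listed the same way:

```yaml
# Sketch: exposing an MCP tool server to a Granite-backed agent in Continue.
mcpServers:
  - name: SQLite explorer
    command: uvx
    args:
      - mcp-server-sqlite
      - --db-path
      - ./data/app.db  # placeholder path
```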
Automating Documentation
This documentation-writing agent is a perfect example of using Granite's long context to analyze code changes, detect documentation gaps, and generate clear explanations. It's a natural fit for CI/CD integration and for keeping docs current across large codebases.
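As a sketch of how that idea could be expressed as a reusable prompt in config.yaml (assuming the prompts block; the wording here is illustrative, not the agent linked above):

```yaml
# Sketch: a reusable documentation-review prompt for a Granite-backed agent.
prompts:
  - name: update-docs
    description: Review a diff and propose documentation updates
    prompt: |
      Review the attached diff. Identify public APIs, behaviors, or configuration
      that changed, check whether the existing docs still describe them accurately,
      and draft the documentation updates needed to close any gaps.
```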
What Developers Are Building
Granite powers production applications at Lockheed Martin (10,000+ developers), major telco companies (90%+ cost reduction), and the US Open (220% increase in automated match reports).
The community has built tools for personalized learning platforms, financial assistants, and product design optimization.
What's Next
- IBM's official Granite 4.0 announcement (October 2, 2025)
- Ongoing improvements to context scaling and hardware profiles
If you're using Continue in VS Code or the CLI, this is a great moment to kick the tires on Granite 4. Feedback is welcome—tell us what works, and which adapters you want next.
