Granite 4 Models Available on Continue

Great news for developers using Continue: we're partnering to make the latest Granite 4 models available right where you work, with a focus on efficiency, long-context reasoning, and flexible deployment from laptop to data center. This post covers what's launching, why it matters for engineering teams, and how to try it in Continue.
What's Launching
We're highlighting a family of Granite 4 models designed for practical performance and efficiency:
Granite 4.0 Small (MoE)
An enterprise workhorse for everyday tasks, capable of running multiple concurrent long-context sessions on entry-level enterprise GPUs. The MoE architecture activates only 9B of its 32B parameters per token, delivering strong performance at lower compute cost.
Granite 4.0 Tiny (MoE)
Perfect for high-volume tasks where speed and efficiency are the priorities. Runs on consumer hardware like an NVIDIA RTX 3060, whereas comparable models would require enterprise GPUs. The MoE approach keeps only ~1B parameters active at inference time.
Granite 4.0 H Micro (Dense)
Uses a hybrid Mamba-2 + Transformer design for efficient processing in resource-constrained environments. Ideal for edge deployments where consistent latency matters.
Granite 4.0 Micro (Dense)
A traditional transformer architecture for teams whose infrastructure doesn't yet have optimized Mamba-2 support.
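To make that concrete, here's a minimal config.yaml sketch for pointing Continue at a locally served Granite 4.0 model. The Ollama tag below is a placeholder and the field names assume Continue's current config.yaml schema, so check the model's page for the exact tag your runtime exposes.

```yaml
# Minimal sketch: a locally served Granite 4.0 model as a Continue model block.
# The Ollama tag (granite4:micro-h) is a placeholder; substitute whatever tag
# your local runtime exposes for the Granite variant you pulled.
name: granite-local
version: 0.0.1
schema: v1
models:
  - name: Granite 4.0 H Micro (local)
    provider: ollama
    model: granite4:micro-h
    roles:
      - chat
      - edit
```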
Why This Matters
Efficiency by Design
Granite 4.0 combines a Mixture of Experts (MoE) architecture with Mamba-2 and transformer components, yielding significant memory reductions compared to traditional dense models:
- Faster inference: Selective parameter activation reduces memory usage and accelerates token generation
- Better parallel processing: Efficient batching for multiple concurrent users—ideal for chatbots and agentic workflows
- Runs on accessible hardware: Deploy on consumer-grade GPUs instead of expensive enterprise hardware
Long Context Windows
Tested up to 128K tokens, with no hard architectural limit on context length. The practical ceiling is set by your hardware rather than by the model, so longer contexts become possible as hardware allows.
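If you serve a Granite model behind an OpenAI-compatible endpoint (vLLM, for example), the window you expose to Continue is just a configuration knob. A hedged sketch, assuming the defaultCompletionOptions.contextLength field and a local endpoint; the largest value you can actually sustain depends on your GPU memory:

```yaml
# Sketch: raising the context window for a Granite model in config.yaml.
# Model name and endpoint are placeholders for whatever your server exposes.
models:
  - name: Granite 4.0 Small (long context)
    provider: openai
    model: granite-4.0-h-small
    apiBase: http://localhost:8000/v1  # e.g. a local vLLM server (assumption)
    defaultCompletionOptions:
      contextLength: 131072  # 128K tokens; scale down if memory is tight
```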
Open Source
Released under Apache 2.0 license for free commercial and non-commercial use with complete customization freedom.
Use Cases That Shine With Granite Models
Document Analysis
Process large technical documents and codebases efficiently. Continue's chat interface lets you explore your code through conversation—ideal for summarization and pattern analysis across multiple files.
RAG Workflows
The long context window works well for retrieval-augmented generation, pulling relevant information from knowledge bases or document repositories while maintaining accuracy.
Agentic Workflows
Run multiple AI agents concurrently for complex, multi-step tasks. Continue supports agent workflows through Hub Agents (pre-configured) or Local Agents (fully customizable via config.yaml).
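As a rough illustration, a Local Agent is just a config.yaml that names a model and the rules it should follow. The model tag and rules below are illustrative assumptions, not a shipped agent:

```yaml
# Sketch of a Local Agent: one Granite 4.0 model plus plain-text rules.
name: granite-refactor-agent
version: 0.0.1
schema: v1
models:
  - name: Granite 4.0 Small
    provider: ollama
    model: granite4:small-h  # placeholder tag
    roles:
      - chat
      - edit
rules:
  - Prefer small, reviewable diffs over sweeping rewrites.
  - Run the existing test suite before declaring a change complete.
```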
Edge Deployments
Granite 4.0 Tiny and Micro work on resource-constrained devices for on-device chatbots, local document analysis, or smart assistants without cloud dependency.
Try It in Continue
All Granite 4.0 models are available today on hub.continue.dev and ready to plug into your agents.
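The quickest path is to reference a published model block from the hub in your own agent config. A sketch, assuming the hub's uses: syntax for composing blocks; the slug below is a placeholder, so copy the exact one from the model's page on hub.continue.dev:

```yaml
# Sketch: composing a hub-published Granite 4.0 model block into an agent.
name: my-granite-agent
version: 0.0.1
schema: v1
models:
  - uses: ibm-granite/granite-4.0-small  # placeholder slug; copy from the hub page
```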
Documentation and Resources
- Continue Documentation
- Understanding Agents
- MCP Integration Cookbooks
- Granite Workshop Tutorials
- Granite Community on GitHub
- Official Granite Documentation
Leveraging Granite in Agent Workflows
Granite 4's efficiency makes it particularly well-suited for Continue's agent workflows:
- Long-context reasoning: Process large documents or substantial portions of a codebase without hitting context limits
- Multi-step automation: Chain together code analysis, refactoring, and testing tasks
- Flexible deployment: Run locally for privacy-sensitive work or use cloud deployment for team collaboration
- MCP integration: Extend Continue with Model Context Protocol tools for custom workflows
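Picking up the MCP point above, the wiring might look like the following in the same config.yaml. The server shown (the reference SQLite MCP server launched via uvx) is only an example; any MCP-compliant server can be listed the same way:

```yaml
# Sketch: exposing an MCP tool server to a Granite-backed agent in Continue.
mcpServers:
  - name: SQLite explorer
    command: uvx
    args:
      - mcp-server-sqlite
      - --db-path
      - ./data/app.db  # placeholder path
```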
Automating Documentation
This documentation-writing agent is a perfect example of using Granite's long context to analyze code changes, detect documentation gaps, and generate clear explanations. It's a natural fit for CI/CD integration and for keeping docs current across large codebases.
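As a sketch of how that idea could be expressed as a reusable prompt in config.yaml (assuming the prompts block; the wording here is illustrative, not the agent linked above):

```yaml
# Sketch: a reusable documentation-review prompt for a Granite-backed agent.
prompts:
  - name: update-docs
    description: Review a diff and propose documentation updates
    prompt: |
      Review the attached diff. Identify public APIs, behaviors, or configuration
      that changed, check whether the existing docs still describe them accurately,
      and draft the documentation updates needed to close any gaps.
```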
What Developers Are Building
Granite powers production applications at Lockheed Martin (10,000+ developers), major telco companies (90%+ cost reduction), and the US Open (220% increase in automated match reports).
The community has built tools for personalized learning platforms, financial assistants, and product design optimization.
What's Next
- IBM's official Granite 4.0 announcement (October 2, 2025)
- Ongoing improvements to context scaling and hardware profiles
If you're using Continue in VS Code or the CLI, this is a great moment to kick the tires on Granite 4. Feedback is welcome—tell us what works, and which adapters you want next.
