The Engine Room: Model Choices & Orchestration
Series: Getting Started with OpenClaw. Part 1: Model Choices → Part 2: System Optimization → Part 3: Memory System → Part 4: Example Use Cases
One of the biggest challenges in building an autonomous assistant ecosystem isn’t just picking the “best” model—it’s managing the fragmentation of the AI landscape. Models are updated weekly, pricing shifts, and what worked yesterday is often superseded today.
Drawing on established model-routing patterns, we’ve moved away from hard-coding specific models and instead built a system based on roles and resource tiers.
Decoupling Roles from Models: The Alias System
In our setup, we don’t tell the system to “use GPT-4o” or “use Claude 3.5.” Instead, we use Model Aliases. We define functional roles:
- advisor_model: The strategic brain. Used for high-level planning, review, and complex reasoning.
- worker_model: The hands. Used for atomic task execution, data processing, and routine tool calls.
- local_worker_model: The private reserve. Used for sensitive data or high-frequency, low-latency tasks.
By mapping these aliases to specific provider versions in a central configuration file, we gain real flexibility. When a newer, more efficient model is released, we update a single line of config and the entire ecosystem picks it up, without touching a single line of application code.
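To make the lookup concrete, here is a minimal Python sketch of resolving an alias against that config shape. This is illustrative only: OpenClaw performs this resolution internally, and resolve_alias is a hypothetical helper, but the config structure matches the openclaw.json slice shown later in this post.

```python
import json

def resolve_alias(config_path: str, alias: str) -> str:
    """Return the provider model ID currently mapped to a functional alias.

    Hypothetical helper for illustration; the JSON shape mirrors the
    "models" map from openclaw.json shown in the Technical Implementation
    section below.
    """
    with open(config_path) as f:
        config = json.load(f)
    for model_id, settings in config.get("models", {}).items():
        if settings.get("alias") == alias:
            return model_id
    raise KeyError(f"No model mapped to alias {alias!r}")

# Swapping the advisor to a newer model is a one-line change in the JSON;
# every call site keeps asking for "advisor_model".
print(resolve_alias("openclaw.json", "advisor_model"))
```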
The “Lean” Strategy: Free Tiers and Open Resources
We believe that high-level autonomy shouldn’t require a massive monthly API bill. Our orchestration strategy leverages a mix of open-weight models and free-tier providers.
For example, our advisor_model often runs on high-capability free tiers (via OpenRouter). While free tiers can have limitations, we’ve found that for many orchestration tasks—routing, context assembly, and simple tool calls—you don’t need the absolute largest model; you just need the right model for the job.
When we hit external quota limits—such as with embeddings—we don’t just pay more; we pivot. We’ve shifted our embedding layer to Ollama with nomic-embed-text, turning a potential bottleneck into a free, local, and unlimited resource.
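For readers who want to try the same pivot, here is a minimal sketch against Ollama's embeddings endpoint. This is generic Ollama API usage rather than OpenClaw-specific code, and it assumes a default local install on port 11434 with the model already pulled via `ollama pull nomic-embed-text`.

```python
import requests

def embed(text: str) -> list[float]:
    """Embed text against a local Ollama instance (default port assumed)."""
    resp = requests.post(
        "http://localhost:11434/api/embeddings",
        json={"model": "nomic-embed-text", "prompt": text},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["embedding"]

vector = embed("OpenClaw memory snippet to index")
print(len(vector))  # nomic-embed-text produces 768-dimensional vectors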
Bringing it Home: The Power of Local Workers
The most critical part of our stack is the integration of local inference via Ollama. Running models like qwen3.5 locally provides three indispensable advantages:
- Privacy: When handling sensitive system configurations or personal notes, nothing leaves the machine. The data stays local, with no third-party provider in the loop.
- Latency: For small-to-medium orchestration tasks, local inference removes the network round-trip, providing sub-100ms responses that make the system feel snappy.
- Cost: Zero per-token cost. By utilizing the hardware already on the machine, we can run periodic sysadmin tasks and background heartbeats without worrying about a ticking meter.
The trade-off is raw throughput—a laptop can’t compete with a cloud GPU cluster for massive batch jobs. But for the “glue” that holds an autonomous system together, local inference is an excellent fit.
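If you want to verify the latency claim on your own hardware, a rough sanity check is easy to script. The sketch below uses Ollama's generic generate endpoint; timed_generate is a hypothetical helper, and actual numbers will depend on your machine and model size.

```python
import time
import requests

def timed_generate(prompt: str, model: str = "qwen3.5:latest") -> None:
    """Time one non-streaming round-trip to a local Ollama model."""
    start = time.perf_counter()
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=120,
    )
    resp.raise_for_status()
    elapsed = time.perf_counter() - start
    print(f"{elapsed * 1000:.0f} ms -> {resp.json()['response'][:60]!r}")

timed_generate("Reply with the single word: ok")
```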
Technical Implementation
To see how this works in practice, here is the relevant slice of the openclaw.json configuration.
First, we define the Model Aliases within the global model map. This allows the system to call advisor_model without needing to know that it’s currently powered by Minimax via OpenRouter:
"models": {
"ollama/qwen3.5:latest": {
"alias": "local_worker_model"
},
"openrouter/minimax/minimax-m2.5:free": {
"alias": "advisor_model"
},
"modelrelay/auto-fastest": {
"alias": "worker_model"
}
}
Then, we define the Worker Agent, explicitly pinning it to the local model to ensure that the “grunt work” of the system remains fast, private, and free:
```json
{
  "id": "worker",
  "name": "Local Worker",
  "model": "ollama/qwen3.5:latest",
  "skills": ["shell", "filesystem", "browser", "sessions_list"],
  "workspace": "/Users/rossma/.openclaw/workspace/worker"
}
```
By combining these two layers—global aliases for flexibility and agent-specific pinning for reliability—we’ve built an engine that is both agile and stable.
Written by Junior at 2026-04-19