Choosing the Right LLM for Memex: A Practical Guide

One of the first questions new Memex users ask is: which model should I use? The honest answer is that it depends on what you care about most — cost, quality, speed, or privacy. This guide breaks down the practical differences based on how each provider actually performs inside Memex.

Memex supports twelve providers. You do not need to understand all of them. Most users will be happy with one of the top four: Gemini, OpenAI, Claude, or Ollama. The rest are valuable for specific situations — Chinese language support, AWS infrastructure, or multi-provider routing.

Quick comparison

| Provider | Quality | Speed | Cost | Best for |
|---|---|---|---|---|
| Google Gemini | Good | Fast | Free tier available | Getting started, free usage, photo-heavy records |
| OpenAI (GPT) | Good to excellent | Fast | $5-20/month typical | All-round quality, reliable structured output |
| Anthropic Claude | Excellent | Medium | $10-30/month typical | Best card quality, nuanced insights, long context |
| Ollama (local) | Varies by model | Depends on hardware | Free | Full offline, zero cloud dependency, privacy maximalists |
| Kimi / Qwen / Doubao | Good | Fast | Competitive pricing | Chinese language records, users in China |

Google Gemini — the easiest starting point

If you are setting up Memex for the first time and do not have strong preferences, start with Gemini. Two reasons: the free tier is generous enough for casual daily use, and the OAuth option lets you sign in with your Google account instead of managing an API key.

Gemini's multimodal support is also the strongest of the major providers. It handles photos well, which matters for Memex: photo records include EXIF data, OCR text, and image labels that all get sent to the model for card generation. Audio understanding is supported too, useful if your workflow leans heavily on voice notes.

The tradeoff is that Gemini's structured output quality is a step below Claude for complex card types. For everyday records — tasks, events, quick notes — the difference is negligible. For nuanced insight generation across many records, Claude tends to produce richer results.

OpenAI — the reliable all-rounder

OpenAI is the provider most people already have an account with. GPT models produce consistent, reliable structured output. Card generation is accurate. Insights are solid. Speed is good.

The main consideration is cost. OpenAI does not have a permanent free tier like Gemini. New accounts get some free credits, but ongoing use requires a paid plan. For typical Memex usage — a few records per day — expect to spend $5-20 per month depending on the model you choose and how much you record.

OpenAI also supports OAuth sign-in, which simplifies setup. If you already use ChatGPT, you can connect the same account.

Anthropic Claude — best quality, higher cost

Claude consistently produces the highest quality output in Memex. Card generation is more nuanced — it picks up on subtleties in your records that other models miss. Insight generation is richer, with better narrative summaries and more meaningful pattern detection. The long context window also helps when the Insight Agent needs to analyze many records at once.

The cost is higher than Gemini or OpenAI for equivalent usage. Claude does not have a free tier. Typical Memex usage runs $10-30 per month depending on volume.

A practical approach: use Claude for the Insight Agent (where quality matters most) and a cheaper model for the Card Agent and PKM Agent (where speed matters more). Memex lets you configure each agent independently, so this mixed setup is straightforward.

Ollama — fully offline, zero cost

Ollama lets you run open-source models locally. If you have a computer on your home network running Ollama, you can point Memex at it and the entire AI pipeline works without any cloud dependency. Your prompts never leave your local network.

The quality depends entirely on which model you run. Larger models (70B+ parameters) produce results comparable to cloud providers. Smaller models (7B-13B) work but with noticeably lower card generation accuracy and shallower insights.

Ollama is the right choice if privacy is your absolute top priority and you are willing to manage the infrastructure. It is not the right choice if you want zero setup — running a local model requires some technical comfort.
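A quick way to build that comfort is to verify the Ollama host before pointing Memex at it. The sketch below uses Ollama's standard REST API on its default port 11434; the LAN address is a placeholder for wherever your server actually runs. Keep in mind that Ollama binds to localhost by default, so you may need to launch it with OLLAMA_HOST=0.0.0.0 for other devices on your network to reach it.

```python
# Sanity check for a LAN Ollama host, standard library only.
# Replace the address with your own; the model name comes from the host.
import json
import urllib.request

OLLAMA_HOST = "http://192.168.1.50:11434"  # placeholder LAN address

# List the models installed on the host (GET /api/tags).
with urllib.request.urlopen(f"{OLLAMA_HOST}/api/tags") as resp:
    models = [m["name"] for m in json.load(resp)["models"]]
    print("Installed models:", models)

# Run one non-streaming chat turn to confirm generation works (POST /api/chat).
payload = {
    "model": models[0],  # or name a model you pulled explicitly
    "messages": [{"role": "user", "content": "Reply with the word 'ready'."}],
    "stream": False,
}
req = urllib.request.Request(
    f"{OLLAMA_HOST}/api/chat",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.load(resp)["message"]["content"])
```

If both calls succeed, that base URL is the one to enter in Memex's Ollama settings.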

Chinese language providers

If you record primarily in Chinese, the dedicated Chinese providers often outperform Western models for Chinese text understanding:

  • Kimi (Moonshot) — strong Chinese language quality, competitive pricing. Models like kimi-k2.5 handle Chinese records well.
  • Aliyun Qwen — Alibaba's models. Good quality, especially qwen3.5-plus and qwen-max. OpenAI-compatible API.
  • Volcengine Doubao — ByteDance's models. Fast inference, good for high-volume use.
  • Zhipu GLM — GLM-4.7 and GLM-4-Plus. Solid Chinese language understanding.
  • MiniMax — MiniMax-M2.5 and M1. Uses Anthropic-compatible API format.

All of these are configured the same way in Memex — select the provider, enter your API key, and optionally adjust the base URL. The app handles the API format differences internally.
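To make "OpenAI-compatible" concrete: it means a standard OpenAI client works unchanged once you point it at a different base URL, which is exactly the adjustment Memex exposes. Here is a minimal sketch with the openai Python SDK, using Qwen as the example; treat the endpoint and model id as illustrative values to confirm against Aliyun's current documentation.

```python
# "OpenAI-compatible" in practice: the same client, a different base URL.
# Endpoint and model id are illustrative; check your provider's docs.
from openai import OpenAI

client = OpenAI(
    api_key="sk-...",  # your Aliyun DashScope API key
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
)

resp = client.chat.completions.create(
    model="qwen-max",
    messages=[{"role": "user", "content": "用一句话总结这条记录。"}],  # "Summarize this record in one sentence."
)
print(resp.choices[0].message.content)
```

Providers that speak the Anthropic format instead, like MiniMax, differ only in which client shape Memex uses internally; from your side the setup is identical.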

Other options

  • AWS Bedrock — if your organization uses AWS and you want to route through Bedrock's Claude access. Useful for enterprise users with existing AWS billing.
  • OpenRouter — a meta-provider that gives you access to multiple models through one API key. Useful if you want to experiment with different models without managing separate accounts.
  • Xiaomi MiMo — MiMo-7B-RL. A smaller model, best suited to users who want to experiment with Xiaomi's ecosystem.

Per-agent configuration

This is one of Memex's most useful features, and one most people do not realize exists: each built-in agent can use a different model. The setting lives under avatar → Model Configuration, where you choose a model for each agent independently.

A cost-optimized setup might look like:

  • Card Agent — Gemini Flash (fast, cheap, good enough for card type classification)
  • PKM Agent — Gemini Pro or GPT (needs decent reasoning for knowledge filing)
  • Insight Agent — Claude (highest quality for pattern discovery and narrative generation)
  • Comment Agent — Gemini Flash (speed matters more than depth for comments)

This way you get Claude-quality insights without paying Claude prices for every agent call. The total cost drops significantly compared to running Claude across the board.
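Conceptually, the routing looks like the sketch below. To be clear, this is not Memex's actual configuration format (the real setting lives in the UI, as described above); it is a hypothetical illustration of the shape: one provider/model pair per agent, resolved per call.

```python
# Hypothetical illustration of per-agent model routing; all names here are
# placeholders. Memex configures this through its UI, not a dict like this.
AGENT_MODELS = {
    "card":    {"provider": "google",    "model": "gemini-flash"},  # fast, cheap
    "pkm":     {"provider": "google",    "model": "gemini-pro"},    # decent reasoning
    "insight": {"provider": "anthropic", "model": "claude"},        # highest quality
    "comment": {"provider": "google",    "model": "gemini-flash"},  # speed over depth
}

def model_for(agent: str) -> tuple[str, str]:
    """Return the (provider, model) pair configured for an agent."""
    entry = AGENT_MODELS[agent]
    return entry["provider"], entry["model"]

print(model_for("insight"))  # ('anthropic', 'claude')
```

The design point is that routing happens per call, so a single record flowing through all four agents can touch more than one provider without any agent knowing about the others.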

My recommendation

Start with Gemini. It is free, fast, and good enough to understand how Memex works. Once you have used the app for a week and understand which agents matter most to you, consider upgrading the Insight Agent to Claude for richer pattern discovery. If cost is not a concern, Claude across the board produces the best overall experience.

If you have not set up Memex yet, read the getting started guide first. For more on what the agents actually do, see our post on the engineering behind Memex. The source code is on GitHub.


FAQ

Which LLM provider is best for Memex?

There is no single best provider. Gemini offers the best free tier and strong multimodal support. Claude produces the highest quality card generation and insights. OpenAI is a solid all-rounder. Ollama is best for fully offline use. The right choice depends on your priorities: cost, quality, privacy, or speed.

Can I use different models for different agents?

Yes. Each agent in Memex can be configured with a different model independently. A common setup is using a cheaper, faster model for card generation and a stronger model for insight discovery.

Does Memex work with free LLM tiers?

Yes. Google Gemini has a generous free tier that works well for casual use. OpenAI also offers free credits for new accounts. Gemini OAuth lets you sign in with your Google account without managing an API key at all.

Can I run Memex completely offline?

Yes, with Ollama. If you run a local model through Ollama on a machine your phone can reach, the entire AI pipeline works without internet. On-device inference via Gemma 4 is also supported on Android for fully phone-local operation.