Best LLM API Models for RAG

RAG workloads often need enough context to hold retrieved passages, plus economical input pricing. This page ranks models by context window size and shows normalized prices and a sample workload cost for each.

Models listed: 50

Cost example tokens: 1M input + 500K output

Normalized prices: USD per 1M tokens

Quick shortlist

Start with Grok 4.1 Fast.

This guide is sorted by context window, so the first rows give the most headroom for RAG, long documents, and large codebase context.

Lead model: 🔥Grok 4.1 Fast

Provider: xAI

Sample cost: $0.45

Context: 2M

The ranking is a discovery aid, not a final recommendation. Always compare the model against your workload and verify provider pricing before production use.

How to read this ranking

Models are sorted by context window size. Use this page when your workflow needs long documents, large retrieval payloads, or multi-file context.
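As a rough way to check whether a retrieval payload fits a given window, the sketch below adds up passage, question, and reserved output tokens and keeps the models whose context is large enough. The passage count, tokens per passage, and the assumption that output tokens count against the same window are illustrative and not taken from this page; the context sizes are the rounded values from the Context column, and all function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    context_tokens: int  # rounded figure from the Context column


def required_tokens(num_passages, tokens_per_passage,
                    question_tokens, reserved_output_tokens):
    """Estimate the window needed for one RAG request (assumed budget model)."""
    return (num_passages * tokens_per_passage
            + question_tokens
            + reserved_output_tokens)


def models_that_fit(models, needed):
    """Keep models whose window covers `needed`, largest window first."""
    fitting = [m for m in models if m.context_tokens >= needed]
    return sorted(fitting, key=lambda m: m.context_tokens, reverse=True)


MODELS = [
    Model("Claude Sonnet 4.6", 1_000_000),
    Model("Grok 4.1 Fast", 2_000_000),
    Model("Palmyra X5", 1_040_000),
]

# Hypothetical payload: 2,000 passages of about 512 tokens each.
needed = required_tokens(num_passages=2_000, tokens_per_passage=512,
                         question_tokens=200, reserved_output_tokens=4_000)
fitting_names = [m.name for m in models_that_fit(MODELS, needed)]
print(needed, fitting_names)
```

Real token counts depend on each provider's tokenizer, so treat the estimate as a first filter and measure with the provider's own tooling before committing to a model.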

Model Ranking


Model	Provider	Prompt ($/1M tokens)	Output ($/1M tokens)	Example Cost	Your Cost	Context	Rank	Release
🔥Grok 4.1 Fast	xAI	$0.2	$0.5	$0.45	$0.45	2M	#18	2025-11-19
Grok 4.20 Multi-Agent	xAI	$2	$6	$5	$5	2M	Unranked	2026-03-31
Grok 4.20	xAI	$1.25	$2.5	$2.5	$2.5	2M	Unranked	2026-03-31
Grok 4 Fast	xAI	$0.2	$0.5	$0.45	$0.45	2M	Unranked	2025-09-19
🔥GPT-5.5	OpenAI	$5	$30	$20	$20	1.05M	#19	2026-04-24
OpenAI GPT Latest	OpenAI	$5	$30	$20	$20	1.05M	Unranked	2026-04-27
GPT-5.5 Pro	OpenAI	$30	$180	$120	$120	1.05M	Unranked	2026-04-24
GPT-5.4 Pro	OpenAI	$30	$180	$120	$120	1.05M	Unranked	2026-03-05
GPT-5.4	OpenAI	$2.5	$15	$10	$10	1.05M	Unranked	2026-03-05
🔥Owl Alpha	OpenRouter	$0	$0	$0	$0	1.05M	#17	2026-04-28
🔥DeepSeek V4 Flash	DeepSeek	$0.126	$0.252	$0.25	$0.25	1.05M	#4	2026-04-24
🔥Gemini 3 Flash Preview	Google	$0.5	$3	$2	$2	1.05M	#6	2025-12-17
🔥DeepSeek V4 Pro	DeepSeek	$0.435	$0.87	$0.87	$0.87	1.05M	#8	2026-04-24
🔥Gemini 2.5 Flash Lite	Google	$0.1	$0.4	$0.3	$0.3	1.05M	#11	2025-07-22
🔥Gemini 2.5 Flash	Google	$0.3	$2.5	$1.55	$1.55	1.05M	#13	2025-06-17
Gemini 3.1 Flash Lite	Google	$0.25	$1.5	$1	$1	1.05M	Unranked	2026-05-07
Google Gemini Pro Latest	Google	$2	$12	$8	$8	1.05M	Unranked	2026-04-27
Google Gemini Flash Latest	Google	$0.5	$3	$2	$2	1.05M	Unranked	2026-04-27
MiMo-V2.5-Pro	Xiaomi	$1	$3	$2.5	$2.5	1.05M	Unranked	2026-04-22
MiMo-V2.5	Xiaomi	$0.4	$2	$1.4	$1.4	1.05M	Unranked	2026-04-22
Lyria 3 Pro Preview	Google	$0	$0	$0	$0	1.05M	Unranked	2026-03-30
Lyria 3 Clip Preview	Google	$0	$0	$0	$0	1.05M	Unranked	2026-03-30
MiMo-V2-Pro	Xiaomi	$1	$3	$2.5	$2.5	1.05M	Unranked	2026-03-18
Gemini 3.1 Flash Lite Preview	Google	$0.25	$1.5	$1	$1	1.05M	Unranked	2026-03-03
Gemini 3.1 Pro Preview Custom Tools	Google	$2	$12	$8	$8	1.05M	Unranked	2026-02-25
Gemini 3.1 Pro Preview	Google	$2	$12	$8	$8	1.05M	Unranked	2026-02-19
Gemini 2.5 Flash Lite Preview 09-2025	Google	$0.1	$0.4	$0.3	$0.3	1.05M	Unranked	2025-09-25
Gemini 2.5 Pro	Google	$1.25	$10	$6.25	$6.25	1.05M	Unranked	2025-06-17
Gemini 2.5 Pro Preview 06-05	Google	$1.25	$10	$6.25	$6.25	1.05M	Unranked	2025-06-05
Gemini 2.5 Pro Preview 05-06	Google	$1.25	$10	$6.25	$6.25	1.05M	Unranked	2025-05-07
Llama 4 Maverick	Meta	$0.15	$0.6	$0.45	$0.45	1.05M	Unranked	2025-04-05
Gemini 2.0 Flash Lite	Google	$0.075	$0.3	$0.22	$0.22	1.05M	Unranked	2025-02-25
Gemini 2.0 Flash	Google	$0.1	$0.4	$0.3	$0.3	1.05M	Unranked	2025-02-05
GPT-4.1	OpenAI	$2	$8	$6	$6	1.05M	Unranked	2025-04-14
GPT-4.1 Mini	OpenAI	$0.4	$1.6	$1.2	$1.2	1.05M	Unranked	2025-04-14
GPT-4.1 Nano	OpenAI	$0.1	$0.4	$0.3	$0.3	1.05M	Unranked	2025-04-14
Palmyra X5	Writer	$0.6	$6	$3.6	$3.6	1.04M	Unranked	2026-01-21
MiniMax-01	MiniMax	$0.2	$1.1	$0.75	$0.75	1M	Unranked	2025-01-15
🔥Claude Opus 4.7	Anthropic	$5	$25	$17.5	$17.5	1M	#2	2026-04-16
🔥Claude Sonnet 4.6	Anthropic	$3	$15	$10.5	$10.5	1M	#3	2026-02-17
Claude Opus 4.7 (Fast)	Anthropic	$30	$150	$105	$105	1M	Unranked	2026-05-12
Grok 4.3	xAI	$1.25	$2.5	$2.5	$2.5	1M	Unranked	2026-04-30
Anthropic Claude Sonnet Latest	Anthropic	$3	$15	$10.5	$10.5	1M	Unranked	2026-04-27
Qwen3.5 Plus 2026-04-20	Qwen	$0.4	$2.4	$1.6	$1.6	1M	Unranked	2026-04-27
Qwen3.6 Flash	Qwen	$0.25	$1.5	$1	$1	1M	Unranked	2026-04-27
Claude Opus Latest	Anthropic	$5	$25	$17.5	$17.5	1M	Unranked	2026-04-21
Claude Opus 4.6 (Fast)	Anthropic	$30	$150	$105	$105	1M	Unranked	2026-04-07
Qwen3.6 Plus	Qwen	$0.325	$1.95	$1.3	$1.3	1M	Unranked	2026-04-02
Qwen3.5-Flash	Qwen	$0.065	$0.26	$0.2	$0.2	1M	Unranked	2026-02-25
Qwen3.5 Plus 2026-02-15	Qwen	$0.26	$1.56	$1.04	$1.04	1M	Unranked	2026-02-16

Pricing FAQ

How is the sample workload cost calculated?

The sample workload uses 1 million input tokens plus 500 thousand output tokens, then applies each model's normalized USD price per 1 million tokens.
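The calculation above can be sketched as a small function. The function and parameter names are hypothetical; the prices are copied from the ranking table. The table appears to round to two decimals (for example, DeepSeek V4 Flash works out to $0.252 and is listed as $0.25).

```python
def sample_cost(prompt_price, output_price,
                input_tokens=1_000_000, output_tokens=500_000):
    """Return the workload cost in USD.

    Prices are in USD per 1 million tokens; the defaults match the
    1M input + 500K output sample workload described above.
    """
    return (input_tokens * prompt_price
            + output_tokens * output_price) / 1_000_000


# Prices (USD per 1M tokens) copied from the ranking table.
grok_41_fast = sample_cost(0.2, 0.5)
claude_opus_47 = sample_cost(5, 25)
deepseek_v4_flash = sample_cost(0.126, 0.252)

print(f"Grok 4.1 Fast: ${grok_41_fast:.2f}")
print(f"Claude Opus 4.7: ${claude_opus_47:.2f}")
print(f"DeepSeek V4 Flash: ${deepseek_v4_flash:.3f}")
```

Passing your own `input_tokens` and `output_tokens` gives an estimate for a workload other than the sample one.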

Why do input and output token prices matter separately?

Many applications are output-token heavy, while retrieval and classification workloads may be input-token heavy. Comparing both prices helps avoid picking a model that is cheap for the wrong workload shape.
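To illustrate how workload shape can change which model is cheaper, the sketch below prices two models from the ranking table under two hypothetical token mixes. The mixes (1M input with 50K output, and 100K input with 1M output) and the helper names are assumptions chosen for illustration; only the per-token prices come from this page.

```python
def workload_cost(prompt_price, output_price, input_tokens, output_tokens):
    """Cost in USD, with prices given in USD per 1 million tokens."""
    return (input_tokens * prompt_price
            + output_tokens * output_price) / 1_000_000


# (prompt price, output price) in USD per 1M tokens, from the ranking table.
PRICES = {
    "Gemini 2.5 Flash": (0.3, 2.5),
    "MiMo-V2.5": (0.4, 2.0),
}

# Hypothetical workload shapes: (input tokens, output tokens).
SHAPES = {
    "input-heavy": (1_000_000, 50_000),
    "output-heavy": (100_000, 1_000_000),
}


def cheapest(shape):
    """Return the name of the cheaper model for the given workload shape."""
    input_tokens, output_tokens = SHAPES[shape]
    return min(
        PRICES,
        key=lambda name: workload_cost(*PRICES[name], input_tokens, output_tokens),
    )


for shape in SHAPES:
    print(shape, "->", cheapest(shape))
```

With these numbers the model with the lower prompt price wins the input-heavy mix, and the model with the lower output price wins the output-heavy mix, which is why a single headline price is not enough to choose.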

Should I verify prices before production use?

Yes. AI Model Matrix normalizes public pricing metadata for comparison, but provider availability, limits, and prices can change. Always verify the final contract or provider dashboard before production use.

Related Guides

Cheapest LLM APIs

Sort models by estimated workload cost and normalized token prices.

Largest Context Windows

Find models for long documents, retrieval, and codebase context.

Coding Models

Compare code-oriented models by cost, context, and popularity rank.

Free Models

Browse zero-price models for prototypes and evaluation.

RAG Models

Start from large context windows and practical input-cost constraints.

Chatbot Costs

Find budget-sensitive models for output-heavy assistant traffic.