LLM Development Timeline

Comprehensive developer-focused chronicle of LLM evolution from foundational research to production APIs

54 Major Milestones • 2015-2025 • Detailed Technical Specifications

92.8%
Peak MMLU
2M tokens
Max Context
96.3% HumanEval
Best Coding
$0.075/1M tokens
Lowest Cost
15T
Max Training Tokens
1501
Peak Elo Rating
Research
Pre-Transformer Era

Attention Mechanism

Jun 2015

Bahdanau et al. introduce attention mechanism for neural machine translation

Developer Info
Architecture:Seq2seq with attention
Availability:Research paper only
Google Research
Attention Is All You Need

Transformer Architecture

Jun 2017

Vaswani et al. introduce the transformer architecture

Developer Info
Architecture:Self-attention mechanism, encoder-decoder
Availability:Research paper, TensorFlow implementation
Parallelizable training, better long-range dependencies
fast.ai
Transfer Learning for NLP

ULMFiT

Dec 2017

Universal Language Model Fine-tuning introduces transfer learning to NLP

Developer Info
Architecture:AWD-LSTM
Availability:Open source, fastai library
Discriminative fine-tuning, gradual unfreezing
Allen AI
Contextual Embeddings

ELMo

Feb 2018

Embeddings from Language Models - first major contextual word embeddings

Developer Info
Architecture:Bidirectional LSTM
Availability:Open source, TensorFlow Hub
Context-dependent word representations
OpenAI
Generative Pre-training

GPT-1

Jun 2018

First GPT model - proves generative pre-training works

Developer Info
Parameters
117M
Context
512
API Access
No
Architecture:Transformer decoder
Availability:Research paper, weights not initially released
Fine-tuning:Manual implementation required
Task-agnostic pre-training + task-specific fine-tuning
Google
Bidirectional Training

BERT Base

Oct 2018

Bidirectional Encoder Representations from Transformers

Developer Info
Parameters
110M
Context
512
API Access
No official API
Architecture:Transformer encoder (bidirectional)
Availability:Open source, TensorFlow
Fine-tuning:Full fine-tuning supported
Masked language modeling, bidirectional context
OpenAI
Scale Begins

GPT-2 Small

Feb 2019

Smallest GPT-2 variant released

Developer Info
Parameters
117M
Context
1,024
API Access
No
Architecture:Transformer decoder
Availability:Open source (initial release)
Fine-tuning:Community implementations
Zero-shot task transfer
OpenAI
Responsible Release

GPT-2 1.5B

Aug 2019

Full GPT-2 released after staged rollout due to safety concerns

Developer Info
Parameters
1.5B
Context
1,024
API Access
No
Architecture:Transformer decoder
Availability:Open source
Fine-tuning:HuggingFace Transformers support
Coherent long-form generation
Google
Text-to-Text Framework

T5

Oct 2019

Transfer Text-to-Text Transformer - unified framework

Developer Info
Parameters
11B (XXL)
Context
512
API Access
Via Google Cloud AI Platform
Architecture:Encoder-decoder transformer
Availability:Open source, multiple sizes
Fine-tuning:Supported across all sizes
All tasks as text-to-text
OpenAI
API-First Era Begins

GPT-3 davinci

May 2020

Largest language model, API-only access, few-shot learning breakthrough

Developer Info
Parameters
175B
Context
2,048
MMLU
43.9%
Cost/1M
$60
API Access
Waitlist, then public beta
Architecture:Transformer decoder
Availability:API only (closed source)
Fine-tuning:Available via API
Training Cutoff:2019-10
Few-shot learning, in-context learning
OpenAI
Cost Tiers

GPT-3 curie

Jul 2020

More affordable GPT-3 variant for simpler tasks

Developer Info
Parameters
6.7B
Context
2,048
Cost/1M
$6
API Access
Public beta
Architecture:Transformer decoder
Availability:API only
Fine-tuning:Supported
Balance of speed and capability
OpenAI
Code Generation Era

Codex (code-davinci-002)

Jul 2021

GPT-3 fine-tuned on code, powers GitHub Copilot

Developer Info
Parameters
12B
Context
8,192
HumanEval
47%
0
API Access
Limited beta
Architecture:Transformer decoder
Availability:API (free during beta)
Fine-tuning:Not available
Training Cutoff:2021-06
Code completion, code explanation
OpenAI
Instruction Following

GPT-3 davinci-instruct-beta

Aug 2021

First instruction-tuned GPT-3 model using RLHF

Developer Info
Parameters
175B
Context
2,048
API Access
Beta testers
Architecture:Transformer decoder + RLHF
Availability:API only
Fine-tuning:Not available
Better instruction following, reduced harmful outputs
OpenAI
RLHF Production

GPT-3.5 text-davinci-002

Jan 2022

Production instruction-tuned model with extended context

Developer Info
Parameters
175B
Context
4,096
Cost/1M
$20
API Access
Public
Architecture:Transformer decoder + RLHF + PPO
Availability:API
Fine-tuning:Supported
Training Cutoff:2021-06
Better code understanding, extended context
OpenAI
Enhanced Instructions

GPT-3.5 text-davinci-003

Mar 2022

Improved instruction following and reduced harmful content

Developer Info
Parameters
175B
Context
4,096
Cost/1M
$20
API Access
Public
Architecture:Transformer decoder + enhanced RLHF
Availability:API
Fine-tuning:Supported
Training Cutoff:2021-06
Improved instruction adherence
OpenAI
Conversational AI Mainstream

ChatGPT (gpt-3.5-turbo)

Nov 2022

Chat-optimized model, 100M users in 2 months

Developer Info
Parameters
175B
Context
4,096
MMLU
70%
HumanEval
76.8%
Cost/1M
$1.5
API Access
Public API (March 2023)
Architecture:Transformer decoder + RLHF for dialogue
Availability:Web interface + API
Fine-tuning:Added November 2023
Training Cutoff:2021-09
Multi-turn conversations, system messages
Meta
Open Research Models

LLaMA 65B

Feb 2023

High-performance open research model

Developer Info
Parameters
65B
Context
2,048
MMLU
63.4%
API Access
No
Architecture:Transformer decoder
Availability:Research license (weights leaked)
Fine-tuning:Community supported
Training Cutoff:2022-12
Competitive with larger models
OpenAI
Multimodal Reasoning

GPT-4

Mar 2023

First major multimodal LLM with vision capabilities

Developer Info
Parameters
1.7T (estimated)
Context
8,192
MMLU
86.4%
HumanEval
67%
Cost/1M
$30
API Access
Waitlist, then public
Architecture:Transformer + Mixture of Experts (rumored)
Availability:API + ChatGPT Plus
Fine-tuning:Not available initially
Training Cutoff:2023-12
Advanced reasoning, vision understanding
OpenAI
Extended Context Premium

GPT-4-32k

Mar 2023

Extended context variant of GPT-4

Developer Info
Parameters
1.7T (estimated)
Context
32,768
MMLU
86.4%
Cost/1M
$60
API Access
Waitlist
Architecture:Same as GPT-4
Availability:API (limited access)
~50 pages of text context
OpenAI
Affordable Extended Context

GPT-3.5-turbo-16k

Jun 2023

Extended context for GPT-3.5 at accessible pricing

Developer Info
Parameters
175B
Context
16,384
MMLU
70%
Cost/1M
$3
API Access
Public
Architecture:Transformer decoder
Availability:API
Fine-tuning:Supported
Training Cutoff:2021-09
Affordable long context
Anthropic
100K Context Revolution

Claude 2

Jul 2023

First production model with 100K context window

Developer Info
Parameters
Undisclosed
Context
100,000
MMLU
78.5%
HumanEval
71.2%
Cost/1M
$11.02
API Access
Public
Architecture:Transformer decoder + Constitutional AI
Availability:Web + API
Fine-tuning:Not available
Training Cutoff:2023-01
Massive context, harmlessness focus
Meta
Open Source Commercial

Llama 2 70B

Jul 2023

First major open-source model with commercial license

Developer Info
Parameters
70B
Context
4,096
MMLU
68.9%
HumanEval
29.9%
Cost/1M
$0.9
API Access
Via third parties
Architecture:Transformer decoder
Availability:Open source, permissive license
Fine-tuning:Full weights available
Training Cutoff:2023-07
Free commercial use, self-hostable
OpenAI
Legacy Completions

GPT-3.5-turbo-instruct

Sep 2023

GPT-3.5 in completions format for backwards compatibility

Developer Info
Parameters
175B
Context
4,096
Cost/1M
$1.5
Architecture:Transformer decoder
Availability:API
Fine-tuning:Supported
Training Cutoff:2021-09
Non-chat format for specific use cases
OpenAI
128K Context Breakthrough

GPT-4 Turbo (1106)

Nov 2023

Massive context increase and cost reduction

Developer Info
Parameters
1.7T (estimated)
Context
128,000
MMLU
86.4%
HumanEval
87.2%
Cost/1M
$10
API Access
Public
Architecture:Optimized transformer
Availability:API + ChatGPT Plus
Fine-tuning:Not available
Training Cutoff:2023-04
300 page context, 3x cheaper than GPT-4
OpenAI
Cost Optimization

GPT-3.5-turbo-0125

Jan 2024

Updated GPT-3.5 with better accuracy and 50% cost reduction

Developer Info
Parameters
175B
Context
16,385
MMLU
70%
HumanEval
76.8%
Cost/1M
$0.5
API Access
Public
Architecture:Optimized transformer
Availability:API
Fine-tuning:Supported
Training Cutoff:2021-09
Best value for simple tasks
Google
Google Enters Race

Gemini Pro 1.0

Feb 2024

Google's first production LLM API

Developer Info
Parameters
Undisclosed
Context
32,000
MMLU
79.1%
Cost/1M
$0.5
API Access
Public
Architecture:Multimodal transformer
Availability:API + Google AI Studio
Fine-tuning:Not initially available
Training Cutoff:2023-11
Native multimodal understanding
Anthropic
Fast AI

Claude 3 Haiku

Feb 2024

Fastest Claude model with instant responses

Developer Info
Parameters
Undisclosed
Context
200,000
MMLU
75.2%
HumanEval
75.9%
Cost/1M
$0.25
API Access
Public
Architecture:Optimized transformer
Availability:API
Training Cutoff:2023-08
Near-instant responses, low latency
Anthropic
Balanced Performance

Claude 3 Sonnet

Feb 2024

Mid-tier Claude with excellent balance

Developer Info
Parameters
Undisclosed
Context
200,000
MMLU
79%
HumanEval
73%
Cost/1M
$3
API Access
Public
Architecture:Transformer decoder
Availability:API + Claude.ai
Training Cutoff:2023-08
Speed/intelligence balance
Anthropic
GPT-4 Challenger

Claude 3 Opus

Feb 2024

Matches or exceeds GPT-4 on many benchmarks

Developer Info
Parameters
Undisclosed
Context
200,000
MMLU
86.8%
HumanEval
84.9%
Cost/1M
$15
API Access
Public
Architecture:Large transformer decoder
Availability:API + Claude.ai Pro
Training Cutoff:2023-08
Top-tier reasoning, vision
OpenAI
Refined Intelligence

GPT-4 Turbo (2024-04-09)

Apr 2024

Updated GPT-4 Turbo with improved performance

Developer Info
Parameters
1.7T (estimated)
Context
128,000
MMLU
86.4%
HumanEval
87.2%
Cost/1M
$10
API Access
Public
Architecture:Optimized transformer
Availability:API + ChatGPT Plus
Fine-tuning:Experimental
Training Cutoff:2023-12
OpenAI
Omni-Modal Era

GPT-4o

May 2024

Native audio, vision, and text in single model

Developer Info
Parameters
Undisclosed
Context
128,000
MMLU
87.2%
HumanEval
90.2%
Cost/1M
$2.5
API Access
Public
Architecture:Unified multimodal transformer
Availability:API + ChatGPT (free & Plus)
Fine-tuning:Supported (text only)
Training Cutoff:2023-10
Real-time multimodal interaction
Google
2M Token Context

Gemini 1.5 Pro

May 2024

Unprecedented context window breakthrough

Developer Info
Parameters
Undisclosed
Context
2,000,000
MMLU
85.9%
HumanEval
84.1%
Cost/1M
$1.25
API Access
Public
Architecture:Sparse mixture-of-experts transformer
Availability:API + Google AI Studio
Fine-tuning:Not available
Training Cutoff:2024-01
Massive context for long documents
Google
Fast Multimodal

Gemini 1.5 Flash

May 2024

Fast, affordable model with 1M context

Developer Info
Parameters
Undisclosed
Context
1,000,000
MMLU
78.9%
HumanEval
74.1%
Cost/1M
$0.075
API Access
Public
Architecture:Efficient transformer
Availability:API + Google AI Studio
Training Cutoff:2024-01
Best value for high-frequency tasks
Anthropic
Coding Excellence

Claude 3.5 Sonnet

Jun 2024

Best coding model, beats GPT-4o on many benchmarks

Developer Info
Parameters
Undisclosed
Context
200,000
MMLU
88.7%
HumanEval
92%
Cost/1M
$3
API Access
Public
Architecture:Advanced transformer
Availability:API + Claude.ai
Fine-tuning:Not available
Training Cutoff:2024-04
Superior coding, agentic tasks
OpenAI
Affordable Intelligence

GPT-4o mini

Jul 2024

Most cost-efficient small model

Developer Info
Parameters
Undisclosed
Context
128,000
MMLU
82%
HumanEval
87.2%
Cost/1M
$0.15
API Access
Public
Architecture:Compact transformer
Availability:API + ChatGPT
Fine-tuning:Supported
Training Cutoff:2023-10
Best price/performance for high volume
Meta
Open Source Flagship

Llama 3.1 405B

Jul 2024

Largest open source model, rivals closed-source leaders

Developer Info
Parameters
405B
Context
128,000
MMLU
88.6%
HumanEval
89%
Cost/1M
$2.7
API Access
Via providers (Together, Groq, etc)
Architecture:Dense transformer decoder
Availability:Open source, permissive license
Fine-tuning:Full weights available
Training Cutoff:2023-12
Competitive with GPT-4, fully open
OpenAI
Reasoning Models

o1-preview

Sep 2024

Chain-of-thought reasoning for complex problems

Developer Info
Parameters
Undisclosed
Context
128,000
MMLU
92.3%
Cost/1M
$15
API Access
Limited release
Architecture:Transformer + extended thinking
Availability:API + ChatGPT Plus/Pro
Fine-tuning:Not available
Training Cutoff:2023-10
Visible chain-of-thought reasoning
OpenAI
Affordable Reasoning

o1-mini

Sep 2024

Faster, cheaper reasoning model for STEM

Developer Info
Parameters
Undisclosed
Context
128,000
MMLU
85.2%
HumanEval
94.6%
Cost/1M
$3
API Access
Limited release
Architecture:Compact reasoning model
Availability:API + ChatGPT Plus/Pro
Training Cutoff:2023-10
Fast reasoning at lower cost
Anthropic
Computer Use

Claude 3.5 Sonnet (Updated)

Oct 2024

Adds computer use capabilities - can control desktop

Developer Info
Parameters
Undisclosed
Context
200,000
MMLU
88.7%
HumanEval
93.7%
Cost/1M
$3
API Access
Public
Architecture:Advanced transformer
Availability:API + Claude.ai
Training Cutoff:2024-04
Computer control, better coding
Google
Multimodal Output

Gemini 2.0 Flash

Dec 2024

First model with native image and audio generation

Developer Info
Parameters
Undisclosed
Context
1,048,576
MMLU
71.9%
HumanEval
74.4%
Cost/1M
$0.1
API Access
Public
Architecture:Multimodal transformer
Availability:API + Google AI Studio
Training Cutoff:2024-08
Multimodal input AND output
DeepSeek
Value Revolution

DeepSeek R1

Jan 2025

Exceptional coding and math at breakthrough low pricing

Developer Info
Parameters
671B (MoE)
Context
64,000
MMLU
79.8%
HumanEval
96.3%
Cost/1M
$0.55
API Access
Public API available
Architecture:Mixture-of-Experts (MoE)
Availability:Open source + API
Fine-tuning:Weights available
Training Cutoff:2024-06
GPT-4 level performance at 1/30th cost
Google
Deep Think

Gemini 2.5 Pro

Mar 2025

Introduces Deep Think mode for complex reasoning

Developer Info
Parameters
Undisclosed
Context
1,000,000
MMLU
89.8%
HumanEval
91.5%
Cost/1M
$10
API Access
Public
Architecture:Advanced transformer
Availability:API + Google AI Studio
Training Cutoff:2025-01
Deep Think for complex problems
OpenAI
Fifth Generation

GPT-5

Aug 2025

Major architectural improvements and reasoning advances

Developer Info
Parameters
Undisclosed
Context
200,000
MMLU
90.4%
HumanEval
93.5%
Cost/1M
$15
API Access
Staged rollout
Architecture:Next-gen transformer
Availability:API + ChatGPT Plus/Pro
Fine-tuning:Enterprise only (initially)
Training Cutoff:2025-03
Significant reasoning improvements
Alibaba
Chinese AI Leader

Qwen 3 72B

Sep 2025

Third generation with multilingual excellence

Developer Info
Parameters
72B
Context
131,072
MMLU
87.8%
HumanEval
89.2%
Cost/1M
$0.9
API Access
Via Alibaba Cloud
Architecture:Dense transformer
Availability:Open source + API
Fine-tuning:Weights available
Training Cutoff:2025-07
Best Chinese + multilingual performance
xAI
Maximum Context

Grok 4 Fast

Oct 2025

World record 2M token context with fast inference

Developer Info
Parameters
Undisclosed
Context
2,000,000
MMLU
84.2%
HumanEval
86.4%
Cost/1M
$2
API Access
Limited beta
Architecture:Optimized transformer
Availability:API + X Premium
Training Cutoff:2025-09
Largest context window available
DeepSeek
Enhanced MoE

DeepSeek V3.2

Nov 2025

Improved mixture-of-experts for better efficiency

Developer Info
Parameters
671B (MoE)
Context
128,000
MMLU
85.7%
HumanEval
95.8%
Cost/1M
$0.6
API Access
Public
Architecture:Enhanced MoE
Availability:Open source + API
Fine-tuning:Supported
Training Cutoff:2025-08
Best coding performance per dollar
OpenAI
Adaptive Reasoning

GPT-5.1

Nov 2025

Dynamically adjusts thinking time based on complexity

Developer Info
Parameters
Undisclosed
Context
200,000
MMLU
91.2%
HumanEval
94.3%
Cost/1M
$18
API Access
Public
Architecture:Advanced transformer
Availability:API + ChatGPT Pro
Training Cutoff:2025-05
Dynamic computational allocation
Google
1500 Elo Breakthrough

Gemini 3 Pro

Nov 2025

First model to break 1500 LMArena Elo barrier

Developer Info
Parameters
Undisclosed
Context
1,000,000
MMLU
91.5%
HumanEval
93.8%
Cost/1M
$15
API Access
Public
Architecture:Next-gen transformer
Availability:API + Gemini Advanced
Training Cutoff:2025-08
Highest Elo rating achieved
Anthropic
SWE-bench Champion

Claude Sonnet 4.5

Nov 2025

Tops real-world GitHub issue resolution at 77.2%

Developer Info
Parameters
Undisclosed
Context
1,000,000
MMLU
88.3%
HumanEval
93.7%
Cost/1M
$3
API Access
Public
Architecture:Advanced transformer
Availability:API + Claude.ai
Fine-tuning:Enterprise (limited)
Training Cutoff:2025-07
Best real-world coding performance
Anthropic
Agentic AI

Claude Opus 4.5

Nov 2025

Optimized for long-running agents and complex workflows

Developer Info
Parameters
Undisclosed
Context
200,000
MMLU
88.7%
HumanEval
92%
Cost/1M
$15
API Access
Public
Architecture:Large transformer
Availability:API + Claude.ai Pro
Training Cutoff:2025-08
Best for autonomous agents
Anthropic
Fast + Affordable

Claude 4.5 Haiku

Nov 2025

Fastest Claude 4.5 with excellent value

Developer Info
Parameters
Undisclosed
Context
200,000
MMLU
78.5%
HumanEval
82.3%
Cost/1M
$0.8
API Access
Public
Architecture:Efficient transformer
Availability:API + Claude.ai
Training Cutoff:2025-07
Best for high-volume, low-latency
xAI
AI Race November

Grok 4.1

Nov 2025

Released during intense 6-day November competition

Developer Info
Parameters
Undisclosed
Context
131,072
MMLU
88.9%
HumanEval
90.7%
Cost/1M
$5
API Access
Limited
Architecture:Transformer
Availability:API + X Premium
Training Cutoff:2025-11
Real-time knowledge, X integration
OpenAI
Code Red Response

GPT-5.2

Dec 2025

Released after internal "code red" to reclaim leadership

Developer Info
Parameters
Undisclosed
Context
200,000
MMLU
92.8%
HumanEval
95.1%
Cost/1M
$20
API Access
Public
Architecture:Advanced transformer
Availability:API + ChatGPT Plus/Pro
Training Cutoff:2025-06
Highest reasoning benchmarks
Google
Frontier Speed

Gemini 3 Flash

Dec 2025

Frontier intelligence built for speed at exceptional value

Developer Info
Parameters
Undisclosed
Context
1,000,000
MMLU
85.4%
HumanEval
87.2%
Cost/1M
$0.15
API Access
Public
Architecture:Optimized transformer
Availability:API + Gemini
Training Cutoff:2025-10
Fast frontier model at low cost

Explore Current Models

Compare the latest models, explore detailed benchmarks, and find the perfect API for your application.