AI Timeline
180 entries · Sep 30, 2012 – Mar 2, 2026
Details on OpenAI’s contract with the Department of War, outlining safety red lines, legal protections, and how AI systems will be deployed in classified environments.
Microsoft and OpenAI continue to work closely across research, engineering, and product development, building on years of deep collaboration and shared success.
OpenAI and Amazon announce a strategic partnership bringing OpenAI’s Frontier platform to AWS, expanding AI infrastructure, custom models, and enterprise AI agents.
Today we’re announcing $110B in new investment at a $730B pre-money valuation. This includes $30B from SoftBank, $30B from NVIDIA, and $50B from Amazon.
Stateful Runtime for Agents in Amazon Bedrock brings persistent orchestration, memory, and secure execution to multi-step AI workflows powered by OpenAI.
OpenAI shares updates on its mental health safety work, including parental controls, trusted contacts, improved distress detection, and recent litigation developments.
OpenAI and Pacific Northwest National Laboratory introduce DraftNEPABench, a new benchmark evaluating how AI coding agents can accelerate federal permitting—showing potential to reduce NEPA drafting time by up to 15% and modernize infrastructure reviews.
OpenAI and Figma launch a new Codex integration that connects code and design, enabling teams to move between implementation and the Figma canvas to iterate and ship faster.
Our latest threat report examines how malicious actors combine AI models with websites and social platforms—and what it means for detection and defense.
OpenAI appoints Arvind KC as Chief People Officer to help scale the company, strengthen its culture, and lead how work evolves in the age of AI.
SWE-bench Verified is increasingly contaminated and mismeasures frontier coding progress. Our analysis shows flawed tests and training leakage. We recommend SWE-bench Pro.
OpenAI announces Frontier Alliance Partners to help enterprises move from AI pilots to production with secure, scalable agent deployments.
We share our AI model’s proof attempts for the First Proof math challenge, testing research-grade reasoning on expert-level problems.
OpenAI for India expands AI access across the country—building local infrastructure, powering enterprises, and advancing workforce skills.
OpenAI and Paradigm introduce EVMbench, a benchmark evaluating AI agents’ ability to detect, patch, and exploit high-severity smart contract vulnerabilities.
A new preprint shows GPT-5.2 proposing a novel formula for a gluon amplitude, later formally proved and verified by OpenAI and academic collaborators.
Introducing Lockdown Mode and Elevated Risk labels in ChatGPT to help organizations defend against prompt injection and AI-driven data exfiltration.
How OpenAI built a real-time access system combining rate limits, usage tracking, and credits to power continuous access to Sora and Codex.
Anthropic releases Claude Sonnet 4 and Claude Opus 4, both capable of switching between instant responses and extended thinking. Opus 4 is Anthropic's most capable model and sets new benchmarks on complex coding and agentic tasks.
OpenAI releases o3 and o4-mini, the latest in their reasoning model line. Both can use tools (web search, Python, image generation) natively during their thinking process. o4-mini achieves o3-level math at a fraction of the cost.
Meta releases Llama 4 Scout and Maverick, natively multimodal mixture-of-experts models. Scout (17B active params) is the most efficient and supports a context window of up to 10 million tokens; Maverick rivals GPT-4o and Gemini 2.0 Flash.
Google releases Gemini 2.5 Pro with strong reasoning capabilities via an internal thinking process. Tops the LMArena leaderboard and achieves best-in-class results on math, science, and coding evaluations.
OpenAI releases GPT-4.5, their largest model trained with unsupervised learning at unprecedented scale. Focuses on improved emotional intelligence, reduced hallucination, and better world knowledge. Available to ChatGPT Pro users first.
Anthropic releases Claude 3.7 Sonnet with an optional extended thinking mode, letting the model reason through hard problems before responding. Sets new records on coding benchmarks (SWE-bench) and agentic tasks.
Elon Musk's xAI releases Grok 3, trained on a 100,000 H100 GPU cluster. Claims top performance on math and science reasoning benchmarks. Includes a "Think" mode for extended chain-of-thought reasoning.
DeepSeek releases R1, an open-source reasoning model trained with reinforcement learning. Achieves performance comparable to OpenAI o1 on math and coding benchmarks at a fraction of the cost.
Chinese AI lab DeepSeek releases V3, a 671B MoE model that matches or beats GPT-4o and Claude 3.5 Sonnet on many benchmarks. Trained for approximately $6M — a fraction of comparable Western models — sparking intense discussion about efficiency.
Google releases Gemini 2.0 Flash, a fast multimodal model with native image generation, tool use, and improved reasoning. Designed for agentic workflows and real-time applications.
After nearly a year as a closed demo, OpenAI makes Sora generally available to ChatGPT Plus and Pro subscribers. Includes a creative mode, storyboard editor, and remix tools. Turbo variant enables faster generation.
Meta releases Llama 3.3 70B, a text-only model delivering performance comparable to Llama 3.1 405B on key benchmarks while being dramatically cheaper to run. Immediately becomes the go-to open-source model for cost-sensitive deployments.
Anthropic releases Claude 3.5 Haiku, the fastest model in the Claude 3.5 family. Delivers Claude 3 Opus-level performance at a fraction of the latency and cost. Designed for high-throughput applications.
OpenAI raises $6.6 billion in a funding round led by Thrive Capital, with participation from Microsoft, Nvidia, and SoftBank. Values the company at $157 billion, one of the largest private fundraises ever.
Meta releases Llama 3.2 with two multimodal vision models (11B, 90B) and two lightweight models (1B, 3B) optimized for edge and mobile devices. First Llama models to process images alongside text.
OpenAI releases o1-preview and o1-mini, a new model family trained to reason through problems step-by-step using reinforcement learning. Significantly outperforms GPT-4o on math, science, and coding at the cost of slower responses.
Mistral AI releases Mistral Large 2 (123B parameters) with 128k context, strong multilingual support, and improved coding and math performance. Available via API and for self-hosting.
Meta releases Llama 3.1 including a 405B parameter flagship model that competes with GPT-4o and Claude 3.5 Sonnet. First openly available model to be considered genuinely frontier-level.
OpenAI releases GPT-4o mini, a small efficient model that outperforms GPT-3.5 Turbo while being cheaper. Becomes the new default for cost-sensitive applications.
Anthropic introduces Artifacts in Claude.ai, allowing Claude to create and display runnable code, documents, and interactive web apps alongside conversations.
OpenAI launches GPT-4o ('omni'), their first natively multimodal model that processes text, audio, and images in a unified architecture. Introduces real-time conversational voice mode.
Meta releases Llama 3 in 8B and 70B parameter sizes, claiming best-in-class performance for open models. Available under a custom license permitting commercial use for most applications.
Anthropic releases three new models: Claude 3 Haiku (fast/cheap), Claude 3 Sonnet (balanced), and Claude 3 Opus (most capable). Opus claims to outperform GPT-4 on multiple benchmarks.
Mistral AI releases Mixtral 8x7B, a sparse mixture-of-experts model that routes each token through 2 of its 8 expert networks. Matches or beats Llama 2 70B and GPT-3.5 on most benchmarks at lower inference cost.
Google DeepMind announces Gemini, its most capable model family. Comes in three sizes: Ultra (most capable), Pro (balanced), and Nano (on-device). Claims to outperform GPT-4 on several benchmarks. Gemini Pro ships in Bard immediately.
At the first OpenAI DevDay, Sam Altman announces GPT-4 Turbo with a 128k context window, knowledge cutoff of April 2023, and lower prices. Also announces the GPT Store, Assistants API, and custom GPTs.
Mistral AI releases Mistral 7B under Apache 2.0, a 7B model using grouped-query attention and sliding window attention. Despite its small size, it outperforms Llama 2 13B across most benchmarks and becomes a widely adopted open model.
Meta releases Code Llama, a family of coding-specialized models fine-tuned from Llama 2 in 7B, 13B, and 34B sizes. Supports fill-in-the-middle completions and long context (100k tokens). Free for research and commercial use.
Meta releases Llama 2 (7B, 13B, 70B) with a commercial-use license in partnership with Microsoft. Includes instruction-tuned chat variants. Becomes the dominant open-source LLM baseline for fine-tuning and research.
Anthropic releases Claude 2 with a 100,000 token context window (able to process full books), stronger coding and math performance, and improved instruction following. Available in the US and UK via Claude.ai.
Google opens Bard, its conversational AI product, to the public in the US and UK following a rocky demo in February. Initially powered by LaMDA and later upgraded to PaLM 2, it eventually becomes Gemini.
OpenAI releases GPT-4, a multimodal model that accepts text and image input. Scores in the top percentiles on the bar exam, SAT, and GRE. Powers ChatGPT Plus and marks a step-change in reasoning capability.
Anthropic launches Claude, its first publicly available AI assistant trained with Constitutional AI (CAI). Positioned as a safer, more steerable alternative to ChatGPT. Released in limited beta via API and Slack integration.
Meta AI releases LLaMA (Large Language Model Meta AI) as a research artifact in sizes from 7B to 65B. Though access-gated, the weights leak within days, sparking an open-source LLM movement and countless fine-tuned derivatives.
OpenAI launches ChatGPT as a free research preview. Built on GPT-3.5 with RLHF fine-tuning, it becomes the fastest-growing consumer application in history, reaching 100 million users in two months and triggering an industry-wide AI race.
OpenAI open-sources Whisper, a speech recognition model trained on 680,000 hours of audio. Achieves near-human transcription accuracy across dozens of languages and is released as freely usable open-source software.
Stability AI releases Stable Diffusion weights publicly, making high-quality image generation available for anyone to run locally. The open release triggers an explosion of derivative models, fine-tunes, and applications.
Midjourney opens its Discord-based image generation tool to the public. Its distinctive artistic aesthetic and accessible interface make it one of the first AI tools to reach millions of non-technical users.
OpenAI demonstrates DALL-E 2, producing photorealistic images and artistic compositions with dramatically improved quality over the original. Combines CLIP with a diffusion model. Released in limited beta before broader access.
DeepMind releases the AlphaFold Protein Structure Database with predicted structures for nearly all human proteins. Solves a 50-year grand challenge in biology. Over 350,000 structures made freely available to researchers at launch, later expanded to over 200 million.
GitHub and OpenAI launch Copilot, an AI coding assistant trained on public GitHub repositories. Powered by Codex (a GPT-3 descendant), it autocompletes functions and writes boilerplate from comments. First mass-market AI coding tool.
OpenAI demonstrates DALL-E, a 12B parameter version of GPT-3 trained to generate images from text descriptions. Shows surprisingly creative and compositional outputs, marking the beginning of mainstream text-to-image AI.
OpenAI publishes the GPT-3 paper, describing a 175B parameter model that performs tasks from just a few examples in the prompt — no fine-tuning required. Sets off widespread excitement about large language models.
After a controversial phased release citing misuse concerns, OpenAI releases the full 1.5B parameter GPT-2. The model generates coherent long-form text and sparks public debate about AI safety and responsible disclosure.
OpenAI releases the first GPT (Generative Pre-trained Transformer), a 117M parameter model pre-trained on BooksCorpus. Demonstrates that unsupervised pre-training followed by fine-tuning achieves strong NLP results.