Multimodal Agents
AI agents that process text, images, audio, and video · 48 agents
by Meta AI
Meta's latest open-source LLM family. Maverick (400B MoE) rivals GPT-5. Scout (17B) runs on consumer hardware. Multimodal with vision. Fully open weights.
by DeepSeek
DeepSeek V3.2 — 685B MoE open-source frontier model. Matches GPT-5 and Claude 4.5 on most benchmarks at near-zero inference cost. Freely downloadable.
by Alibaba Cloud / Tongyi
Alibaba's flagship open-source LLM. 235B MoE (22B active). Multilingual, strong on coding and math. Qwen3-Coder variant matches Claude Code on HumanEval.
by Google DeepMind
Google's open-weight model family for on-device and research use. 2B to 27B parameters. Runs on laptops, phones, and edge devices. Strong safety tuning.
by Meta AI
Meta AI powered by Llama 4. Built into WhatsApp, Instagram, Facebook, and Messenger for 3B+ users. Web search, image generation, and real-time answers.
by Hugging Face
Hugging Face's open-source chat UI for any model. Access Llama 4, DeepSeek, Mistral, Gemma, and 100+ open-weight models. Free, no API key required.
by Black Forest Labs
FLUX 1.1 Pro Ultra by Black Forest Labs — current state of the art in open-source image generation. Photorealistic, fast, commercially licensable. 100M+ imag...
by Google DeepMind
Google DeepMind's multimodal AI assistant. Gemini 2.5 Pro with native thinking, 1M token context, and tight integration across Google Workspace, Android, and Search.
by Anthropic
Anthropic's AI assistant powered by Claude Opus 4.6 and Sonnet 4.6. Extended thinking, 200K context, and 300K output via Batches API. Strong in coding, analysis, and nuanced reasoning.
by OpenAI
OpenAI's flagship AI assistant powered by GPT-5 and GPT-5.2 Thinking. Unified system with intelligent routing between fast responses and deep reasoning. The most widely used AI chatbot globally.
by Leonardo AI
AI creative suite with 150M+ users. Fine-tuned models for gaming assets, product images, and social media. Real-time canvas, video gen, and 3D asset pipeline.
by Google DeepMind
Google DeepMind's latest video generation model. Veo 3.1 creates 4K video with native audio — ambient sounds, dialogue, music — all from a single prompt.
by Ideogram
Best-in-class AI image generator for text rendering. Ideogram v3 produces accurate, beautiful typography in images — a longstanding AI limitation now solved.
by Adobe
Adobe's commercially safe generative AI. Trained on licensed content to limit copyright risk. Integrated into Photoshop, Illustrator, Premiere Pro, and Express.
by Kuaishou Technology
Kuaishou's Kling 3.0 — top-ranked AI video generator on LogRocket. Cinematic quality, superior character consistency, and affordable pricing vs Runway.
by Luma AI
Luma AI's video generation model. Photorealistic, physically accurate 5-second clips from text or images. Used by Hollywood VFX studios.
by HeyGen
AI video platform for creating talking-avatar videos. Used by 500K+ businesses for training, marketing, and product videos. 175+ AI avatars, 40+ languages.
by Synthesia
AI video generation platform with human avatars. Create training, marketing, and onboarding videos in 140+ languages without cameras or studios.
by OpenAI
OpenAI's second-generation video model. Cinema-quality 1080p video up to 60 seconds from text, image, or video. Physics simulation, precise camera control.
by Midjourney Inc
The leading AI image generator for artistic and commercial work. V7 introduces consistent characters, style references, and improved photorealism. 25M+ users.
by Runway AI
Hollywood-grade AI video generation. Gen-4 Turbo produces 4K video clips with reference-consistent characters. Used by major studios and content creators.
by Quora
Multi-model AI chat by Quora. One subscription accesses Claude, GPT-5, Gemini, Llama 4, and 100+ models. Create and monetize custom bots.
by Mistral AI
Mistral AI's chat powered by Mistral Large 3. Ultra-fast, multilingual, canvas mode, web search, and document analysis. Europe's leading LLM company.
by xAI
xAI's AI powered by Grok 4 — four AI agents running in parallel. Real-time X/Twitter data, Aurora image gen, video understanding, and deep reasoning.
by inception
Mercury 2 is an extremely fast reasoning LLM, and the first reasoning diffusion LLM (dLLM). Instead of generating tokens sequentially, Mercury 2 produces and...
by openai
GPT-5.4 is OpenAI’s latest frontier model, unifying the Codex and GPT lines into a single system. It features a 1M+ token context window (922K input, 128K ou...
by google
30-second clips are priced at $0.04 per clip. Lyria 3 is Google's family of music generation models, available through the Gemini API. With Lyria 3,...
by google
Gemma 4 31B Instruct is Google DeepMind's 30.7B dense multimodal model supporting text and image input with text output. Features a 256K token context window...
by google
Gemma 4 26B A4B IT is an instruction-tuned Mixture-of-Experts (MoE) model from Google DeepMind. Despite 25.2B total parameters, only 3.8B activate per token ...
by ~anthropic
This model always redirects to the latest model in the Claude Opus family.
by openai
GPT-5.3 Chat is an update to ChatGPT's most-used model that makes everyday conversations smoother, more useful, and more directly helpful. It delivers more a...
by openai
GPT-5.4 Pro is OpenAI's most advanced model, building on GPT-5.4's unified architecture with enhanced reasoning capabilities for complex, high-stakes tasks. ...
by google
Full-length songs are priced at $0.08 per song. Lyria 3 is Google's family of music generation models, available through the Gemini API. With Lyria 3, you ca...
by mistralai
Mistral Small 4 is the next major release in the Mistral Small family, unifying the capabilities of several flagship Mistral models into a single system. It ...
by GitHub Actions
Formula WorkPaper runtime for Node.js services and agent tools with JSON persistence and formula readback.
by isdk
AI Agent Script is a framework for defining AI Agents, their properties, and behaviors for interactive conversations. This document provides an overview of t...
by wfmedia
Cognitive browser automation that thinks like your users—and helps AI agents navigate too. Simulate real user cognition with abandonment detection, constitut...
by djtony707
TITAN — Autonomous AI agent framework with self-improvement, multi-agent orchestration, 36 LLM providers, 16 channel adapters, GPU VRAM management, mesh netw...
by GitHub Actions
MCP server + Excalidraw whiteboard UI for AI-assisted diagramming (Claude Code / Codex).
by GitHub Actions
LangChain.js adapters for Model Context Protocol (MCP)
by mcpcat
Analytics tool for MCP (Model Context Protocol) servers - tracks tool usage patterns and provides insights
by ruvnet
Production-ready AI agent orchestration platform with 66 specialized agents, 213 MCP tools, ReasoningBank learning memory, and autonomous multi-agent swarms....
by GitHub Actions
GitLab MCP server for projects, merge requests, issues, pipelines, wiki, releases, and more
by x-ai
Grok 4.3 is a reasoning model from xAI. It accepts text and image inputs with text output, and is suited for agentic workflows, instruction-following tasks, ...
by alvbln
Alvin Bot — open-source, self-hosted autonomous AI agent on Telegram, Slack, Discord, WhatsApp, Signal, terminal & web. Built on the Claude Agent SDK with a ...
by jkheadley
Persistent autonomy infrastructure for AI agents