Multimodal Agents
AI agents that process text, images, audio, and video· 35 agents
by Alibaba Cloud / Tongyi
Alibaba's flagship open-source LLM. 235B MoE (22B active). Multilingual, strong on coding and math. Qwen3-Coder variant matches Claude Code on HumanEval.
by Meta AI
Meta's open-source multimodal models. Llama 4 Scout (17B active, 16 experts, 10M context) and Maverick (17B active, 128 experts, 1M context). Released April ...
by Hugging Face
Hugging Face's open-source chat UI for any model. Access Llama 4, DeepSeek, Mistral, Gemma, and 100+ open-weight models. Free, no API key required.
by Google DeepMind
Google DeepMind's multimodal AI assistant. Gemini 2.5 Pro with native thinking, 1M token context, and tight integration across Google Workspace, Android, and Search.
by upstash
Context7 Platform -- Up-to-date code documentation for LLMs and AI code editors
by CopilotKit
The Frontend Stack for Agents & Generative UI. React + Angular. Makers of the AG-UI Protocol
by iOfficeAI
Free, local, open-source 24/7 Cowork app and OpenClaw for Gemini CLI, Claude Code, Codex, OpenCode, Qwen Code, Goose CLI, Auggie, and more | 🌟 Star if you l...
by UfoMiao
Zero-Config Code Flow for Claude code & Codex
by nexu-io
The simplest desktop client for OpenClaw 🦞 — bridge your Agent to WeChat, Feishu, Slack & Discord in one click. Works with Claude Code, Codex & any LLM. BYO...
by AlexAnys
🇨🇳 OpenClaw中文用例与案例大全 | 46个真实场景 | 国内特色 + 海外案例的国内适配 | 自动化办公·内容创作·运维·AI助理·知识管理 | 新手友好 | Chinese guide for OpenClaw AI agent use cases
by uditgoenka
Claude Autoresearch Skill — Autonomous goal-directed iteration for Claude Code. Inspired by Karpathy's autoresearch. Modify → Verify → Keep/Discard → Repeat ...
by covibes
Your autonomous engineering team in a CLI. Point Zeroshot at an issue, walk away, and return to production-grade code. Supports Claude Code, OpenAI Codex, Op...
by qingchencloud
🦞 OpenClaw 可视化管理面板 — 内置 AI 助手(工具调用 + 图片识别 + 多模态),一键安装 | Visual management panel with built-in AI assistant (tool calling + vision + multimodal + i18n(11))
by n8n-io
Fair-code workflow automation platform with native AI capabilities. Combine visual building with custom code, self-host or cloud, 400+ integrations.
by microsoft
A programming framework for agentic AI
by lintsinghua
《御舆:解码 Agent Harness》42万字拆解 AI Agent 的Harness骨架与神经 —— Claude Code 架构深度剖析,15 章从对话循环到构建你自己的 Agent Harness。在线阅读网站:
by OpenAI
OpenAI's second-generation video model. Cinema-quality 1080p video up to 60 seconds from text, image, or video. Physics simulation, precise camera control.
by Obviously AI
No-code predictive AI. Connect any data source, train an ML model in 2 minutes, and get predictions on churn, sales, fraud, and pricing without any code.
by Julius AI
AI data analyst. Upload CSV, Excel, SQL databases — get visualizations, insights, and statistical analysis in plain English. No coding required.
by anthropic
Sonnet 4.6 is Anthropic's most capable Sonnet-class model yet, with frontier performance across coding, agents, and professional work. It excels at iterative...
by openai
GPT-5.3-Codex is OpenAI’s most advanced agentic coding model, combining the frontier software engineering performance of GPT-5.2-Codex with the broader reaso...
by kwaipilot
KAT-Coder-Pro V2 is the latest high-performance model in KwaiKAT’s KAT-Coder series, designed for complex enterprise-grade software engineering and SaaS inte...
by qwen
The Qwen3.5 Series 35B-A3B is a native vision-language model designed with a hybrid architecture that integrates linear attention mechanisms and a sparse mix...
by qwen
Qwen3.5-9B is a multimodal foundation model from the Qwen3.5 family, designed to deliver strong reasoning, coding, and visual understanding in an efficient 9...
by wanshuiyin
ARIS ⚔️ (Auto-Research-In-Sleep) — Lightweight Markdown-only skills for autonomous ML research: cross-model review loops, idea discovery, and experiment auto...
Qwen2.5 Coder 7B Instruct — a text-generation model by undefined on HuggingFace. 2,522,695 downloads, 678 likes.
by qwen
The Qwen3.5 native vision-language Flash models are built on a hybrid architecture that integrates a linear attention mechanism with a sparse mixture-of-expe...
by AntonOsika
CLI platform to experiment with codegen. Precursor to: https://lovable.dev
by qwen
The Qwen3.5 122B-A10B native vision-language model is built on a hybrid architecture that integrates a linear attention mechanism with a sparse mixture-of-ex...
by can1357
⌥ AI Coding agent for the terminal — hash-anchored edits, optimized tool harness, LSP, Python, browser, subagents, and more
by qwen
The Qwen3.5 27B native vision-language Dense model incorporates a linear attention mechanism, delivering fast response times while balancing inference speed ...
by adongwanai
https://adongwanai.github.io/AgentGuide | AI Agent开发指南 | LangGraph实战 | 高级RAG | 转行大模型 | 大模型面试 | 算法工程师 | 面试题库 | 强化学习|数据合成
by openai
GPT-5.4 is OpenAI’s latest frontier model, unifying the Codex and GPT lines into a single system. It features a 1M+ token context window (922K input, 128K ou...
by laihenyi
Convert NotebookLM PDFs to PPTX with separated background images and editable text layers using Gemini AI
by qwen
The Qwen3.5 native vision-language series Plus models are built on a hybrid architecture that integrates linear attention mechanisms with sparse mixture-of-e...
Have a Multimodal Agents agent?
Submit it to appear alongside 35 others in this category.
Submit in Multimodal Agents →