FunASR
Industrial-grade speech recognition toolkit: 170x realtime, 50+ languages, speaker diarization, emotion detection, streaming, and OpenAI-compatible API.
Industrial speech recognition. 170x faster than Whisper. 50+ languages. Speaker diarization · Emotion detection · Streaming · One API call
Quick Start · Colab · Benchmark · Model selection · Migration guide · Use cases · Deployment matrix · Models · Agent Integration · Docs · Contribute
Quick Start
No local setup? Open the Colab quickstart to transcribe a public sample or upload your own audio in a browser.
pip install torch torchaudio
pip install funasr
from funasr import AutoModel
model = AutoModel(model="iic/SenseVoiceSmall", vad_model="fsmn-vad", spk_model="cam++", device="cuda")
result = model.generate(input="meeting.wav")
Output — structured text with speaker labels, timestamps, and punctuation:
[00:00.4 → 00:03.8] Speaker 0: Let's discuss the Q3 plan.
[00:04.2 → 00:07.1] Speaker 1: Sounds good. I have three points.
[00:07.5 → 00:12.3] Speaker 0: Go ahead. We have 30 minutes.
That's it. One model, one call — VAD segmentation, speech recognition, punctuation, speaker diarization all happen automatically.
Deploy as API server:
funasr-server --device cuda→ OpenAI-compatible endpoint at localhost:8000Use with AI agents: MCP Server for Claude/Cursor · OpenAI API for LangChain/Dify/AutoGen
Why FunASR?
| FunASR | Whisper | Cloud APIs | |
|---|---|---|---|
| Speed | 170x realtime | 13x realtime | ~1x realtime |
| Speaker ID | ✅ Built-in | ❌ Needs pyannote | ✅ Extra cost |
| Emotion | ✅ Happy/Sad/Angry | ❌ | ❌ |
| Languages | 50+ | 57 | Varies |
| Streaming | ✅ WebSocket | ❌ | ✅ |
| vLLM Acceleration | ✅ 2-3x faster | ❌ | N/A |
| Self-hosted | ✅ MIT license | ✅ MIT license | ❌ Cloud only |
| Cost | Free | Free | $0.006/min+ |
| CPU viable | ✅ 17x realtime | ❌ Too slow | N/A |
Trying FunASR for the first time? Use the Colab quickstart before setting up a local environment. Choosing a first model? Start with the model selection guide. Planning a switch from Whisper or a cloud ASR provider? Use the migration guide and benchmark example to test representative audio, map features, and roll out safely.
Benchmark
184 long-form audio files (192 min). Full report →
같은 카테고리 다른 리소스
Next.js
React 기반 풀스택 프레임워크. App Router + RSC가 사실상 표준.
shadcn/ui
복사-붙여넣기 React 컴포넌트 모음. npm 의존성이 아닌 코드 소유권 모델.
Supabase
PostgreSQL 기반 BaaS. Auth · Realtime · Storage · Edge Functions 통합.
Anthropic MCP
Claude가 외부 도구/데이터에 접근하도록 해주는 프로토콜 표준. 생태계의 근간.