
FunASR

Industrial-grade speech recognition toolkit: 170x realtime, 50+ languages, speaker diarization, emotion detection, streaming, and OpenAI-compatible API.


Industrial speech recognition. 170x realtime. 50+ languages. Speaker diarization · Emotion detection · Streaming · One API call

Quick Start · Colab · Benchmark · Model selection · Migration guide · Use cases · Deployment matrix · Models · Agent Integration · Docs · Contribute


Quick Start

No local setup? Open the Colab quickstart to transcribe a public sample or upload your own audio in a browser.

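pip install torch torchaudio
pip install funasr

from funasr import AutoModel

# Load the ASR model together with the VAD (fsmn-vad) and speaker (cam++) models.
model = AutoModel(model="iic/SenseVoiceSmall", vad_model="fsmn-vad", spk_model="cam++", device="cuda")
# Transcribe a local file in a single call.
result = model.generate(input="meeting.wav")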

Output — structured text with speaker labels, timestamps, and punctuation:

[00:00.4 → 00:03.8] Speaker 0: Let's discuss the Q3 plan.
[00:04.2 → 00:07.1] Speaker 1: Sounds good. I have three points.
[00:07.5 → 00:12.3] Speaker 0: Go ahead. We have 30 minutes.

That's it. One model, one call — VAD segmentation, speech recognition, punctuation, speaker diarization all happen automatically.
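The page shows the rendered transcript but not how to produce it from `result`. The sketch below formats a list of segments into the same layout. The segment shape is an assumption, not something this page confirms: each segment is taken to be a dict with `start` and `end` in milliseconds, a speaker index `spk`, and `text`. Inspect the actual return value of `model.generate` before relying on these field names.

```python
def format_timestamp(ms):
    """Render milliseconds as MM:SS.t, truncating to tenths of a second."""
    tenths = ms // 100
    minutes, rem = divmod(tenths, 600)
    return f"{minutes:02d}:{rem // 10:02d}.{rem % 10}"


def format_segments(segments):
    """Format segments as '[start → end] Speaker N: text', one per line.

    Assumed (hypothetical) segment keys: start, end (ms), spk, text.
    """
    lines = []
    for seg in segments:
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        lines.append(f"[{start} → {end}] Speaker {seg['spk']}: {seg['text']}")
    return "\n".join(lines)


# Sample data mirroring the transcript shown above.
segments = [
    {"start": 400, "end": 3800, "spk": 0, "text": "Let's discuss the Q3 plan."},
    {"start": 4200, "end": 7100, "spk": 1, "text": "Sounds good. I have three points."},
    {"start": 7500, "end": 12300, "spk": 0, "text": "Go ahead. We have 30 minutes."},
]
print(format_segments(segments))
```

Timestamps are truncated with integer arithmetic rather than rounded, so a value such as 59,960 ms prints as `00:59.9` instead of spilling over into `00:60.0`.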

Deploy as API server: funasr-server --device cuda → OpenAI-compatible endpoint at localhost:8000
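As a rough sketch of what calling that endpoint might look like, the snippet below builds a multipart upload using only the Python standard library. Several details are assumptions rather than documented behavior of `funasr-server`: the route `/v1/audio/transcriptions` and the `model` and `file` form fields follow the OpenAI audio transcription convention, and the JSON response handling is a guess. The helper names are hypothetical.

```python
import json
import urllib.request
import uuid


def build_transcription_request(audio_bytes, filename,
                                base_url="http://localhost:8000",
                                model="iic/SenseVoiceSmall"):
    """Build a multipart/form-data POST in the OpenAI transcription style.

    The route and field names are assumptions; check the server docs.
    """
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="model"\r\n\r\n'
        f"{model}\r\n"
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/v1/audio/transcriptions",
        data=head + audio_bytes + tail,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


def transcribe(path):
    """Send a local audio file to the server and return the parsed JSON reply."""
    with open(path, "rb") as f:
        request = build_transcription_request(f.read(), path)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())
```

With the server running, `transcribe("meeting.wav")` would send the file. Any OpenAI-compatible client library pointed at the same base URL should be able to replace this hand-built request.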

Use with AI agents: MCP Server for Claude/Cursor · OpenAI API for LangChain/Dify/AutoGen

Why FunASR?

| | FunASR | Whisper | Cloud APIs |
|---|---|---|---|
| Speed | 170x realtime | 13x realtime | ~1x realtime |
| Speaker ID | ✅ Built-in | ❌ Needs pyannote | ✅ Extra cost |
| Emotion | ✅ Happy/Sad/Angry | | |
| Languages | 50+ | 57 | Varies |
| Streaming | ✅ WebSocket | | |
| vLLM Acceleration | ✅ 2-3x faster | | N/A |
| Self-hosted | ✅ MIT license | ✅ MIT license | ❌ Cloud only |
| Cost | Free | Free | $0.006/min+ |
| CPU viable | ✅ 17x realtime | ❌ Too slow | N/A |

Trying FunASR for the first time? Use the Colab quickstart before setting up a local environment. Choosing a first model? Start with the model selection guide. Planning a switch from Whisper or a cloud ASR provider? Use the migration guide and benchmark example to test representative audio, map features, and roll out safely.


Benchmark

184 long-form audio files (192 min). Full report →
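To make the "x realtime" figures concrete, the sketch below converts a realtime factor into wall-clock processing time for 192 minutes of audio. It is back-of-the-envelope arithmetic, not a result from the full report: it assumes each factor from the comparison table applies uniformly to this audio set, which the page does not state.

```python
def processing_seconds(audio_minutes, realtime_factor):
    """Wall-clock seconds needed to process audio at a given realtime factor."""
    return audio_minutes * 60 / realtime_factor


BENCHMARK_MINUTES = 192  # total duration of the 184-file benchmark set

# Realtime factors quoted in the comparison table.
factors = {"FunASR (GPU)": 170, "FunASR (CPU)": 17, "Whisper": 13}

for name, factor in factors.items():
    seconds = processing_seconds(BENCHMARK_MINUTES, factor)
    print(f"{name}: {seconds:.0f} s")
# FunASR (GPU): 68 s
# FunASR (CPU): 678 s
# Whisper: 886 s
```

In other words, a factor of 170 means one second of compute covers 170 seconds of audio, so about three hours of recordings would finish in roughly a minute under these assumptions.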

