asr_002.md


20200727

caffe

https://github.com/pannous/tensorflow-speech-recognition
https://github.com/pannous/caffe-speech-recognition

Common Voice, Mozilla

https://voice.mozilla.org/zh-CN/datasets

caffe

http://www.caffecn.cn

Sound_Sensor

https://www.waveshare.net/wiki/Sound_Sensor

DuerOS

https://github.com/MyDuerOS/DuerOS-Python-Client
https://developer.baidu.com/forum/topic/show/244631?pageNo=1

nrf, i2s, tensorflow lite

https://github.com/Bloom-Agritech/ruderalis-firmware/blob/master/lib/DHT22Gen3_RK/src/DHT22Gen3_RK.cpp
https://github.com/oivoii/nrf-tensorflow

Theano

https://github.com/Theano/Theano

Convolutional Recurrent Neural Networks for Small-Footprint Keyword Spotting

https://arxiv.org/abs/1703.05390

ML-KWS

https://github.com/UT2UH/ML-KWS-for-ESP32
https://github.com/ARM-software/ML-KWS-for-MCU/tree/master/Deployment
https://github.com/zhouilu/ARM_TensorFlow/tree/master/Pretrained_models/Basic_LSTM
https://blog.csdn.net/xj853663557/article/details/83784452

dtw, esp32

https://github.com/kasiim/ESP-EYE-speaker-verification

ESP32-A1S-AudioKit, for Arduino

https://github.com/RealCorebb/ESP32-A1s-Audio-Kit

wm8978, wm8960

ac101, ac108

ADAFRUIT "MUSIC MAKER" MP3 SHIELD FOR ARDUINO W/3W STEREO AMP

https://www.adafruit.com/product/1788
VS1053 CODEC + MICROSD BREAKOUT - MP3/WAV/MIDI/OGG PLAY + RECORD
https://www.adafruit.com/product/1381
https://learn.adafruit.com/adafruit-music-maker-shield-vs1053-mp3-wav-wave-ogg-vorbis-player
https://github.com/adafruit/Adafruit_VS1053_Library
https://github.com/adafruit/Adafruit_VS1053_Library/blob/master/examples/record_ogg/record_ogg.ino

Sipeed R6+1 microphone array

https://cn.dl.sipeed.com/MAIX/HDK/Sipeed-R6%2b1_MicArray/Specifications
MSM261S4030H0

Maix-Bit V2.0(with MEMS microphone)

https://cn.dl.sipeed.com/MAIX/HDK/Sipeed-Maix-Bit/Maix-Bit%20V2.0(with%20MEMS%20microphone)
MSM261S4030H0

ATOM ECHO / M5StickC

SPM1423 (MEMS PDM Microphone)
https://docs.m5stack.com/#/zh_CN/atom/atomecho
https://github.com/m5stack/M5-ProductExampleCodes/blob/master/Core/Atom/AtomEcho/Arduino/Factory_Test/Factory_Test.ino
https://docs.m5stack.com/#/en/core/m5stickc
https://docs.m5stack.com/#/zh_CN/core/m5stickc
https://github.com/m5stack/M5StickC/blob/master/examples/Basics/FactoryTest/FactoryTest.ino

Microphone parameters explained: electret (ECM) vs. MEMS microphones

https://blog.csdn.net/weixin_39671078/article/details/82414208

Basic principles of speech recognition (English), Rabiner

Fundamentals of Speech Recognition
search baidupan, 语音识别基本原理

ds-cnn

https://github.com/shaharpit809/Speech-Denoising-using-DNN-CNN-and-RNN
http://www.eepw.com.cn/article/201801/375072.htm
https://github.com/dingminglu/DeepBrain-ESP32-Audio-SDK
https://github.com/peter-360/esp32_chuwugui
https://github.com/Dod-o/Statistical-Learning-Method_Code
ML-KWS todo download

sr

https://github.com/kacperlukawski/speech-recognition.git

CAE

https://blog.csdn.net/zmdsjtu/article/details/52816692
https://blog.csdn.net/qq_33835307/article/details/81006954
https://github.com/li900309/PingYeAudio/blob/master/app/src/main/java/com/cn/cae/MainActivity.java
http://wiki.t-firefly.com/zh_CN/ROC-RK3328-PC/module_mic.html
CAE: Circular Array Enhancement
https://www.xfyun.cn/doc/solutions/hardwareUniversal/CAE-Android-SDK.html#功能简介

Hello Edge: Keyword Spotting on Microcontrollers

https://arxiv.org/abs/1711.07128v3
https://arxiv.org/pdf/1711.07128.pdf

STM32F4 FFT tests using the FPU + DSP library, part 2

https://www.cnblogs.com/NickQ/p/8541156.html
https://www.cnblogs.com/NickQ/p/8540487.html
arm_cfft_radix4_instance_f32
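The `arm_cfft_radix4_f32` call covered in the posts above computes an ordinary complex FFT; a NumPy sketch (my own illustration, not from the linked posts) shows the usual bring-up test, i.e. feeding a pure tone and checking that the magnitude peak lands in the expected bin:

```python
import numpy as np

fs = 1024          # sample rate in Hz (illustrative value)
n = 256            # FFT length; radix-4 lengths are 16, 64, 256, 1024
tone = 96.0        # test-tone frequency, chosen to land exactly on a bin

t = np.arange(n) / fs
x = np.sin(2 * np.pi * tone * t)

spectrum = np.fft.fft(x)            # what arm_cfft_radix4_f32 computes in-place
mags = np.abs(spectrum[: n // 2])   # arm_cmplx_mag_f32 equivalent, positive bins only

peak_bin = int(np.argmax(mags))     # expect bin = tone * n / fs = 24
peak_hz = peak_bin * fs / n
```

On the STM32 the same check (peak bin of a known tone) is a quick way to confirm the FPU/DSP build flags are right.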

MATLAB serial-port data acquisition

https://www.cnblogs.com/mengfanrong/p/5168805.html
MATLAB notes 2: build a simple serial-port reader and save the data to CSV
https://blog.csdn.net/weixin_38494129/article/details/86469261

Offline Speech Recognition on Raspberry Pi 4 with Respeaker

https://www.hackster.io/dmitrywat/offline-speech-recognition-on-raspberry-pi-4-with-respeaker-c537e7
https://www.seeedstudio.com/blog/2020/01/23/offline-speech-recognition-on-raspberry-pi-4-with-respeaker/
https://github.com/mozilla/DeepSpeech-examples
https://github.com/mozilla/DeepSpeech/releases

Speech recognition with DeepSpeech2

https://www.helplib.cn/fansisi/deepspeech-pytorch

jasperproject

https://github.com/jasperproject/jasper-client
https://github.com/jasperproject/jasper-client/blob/master/client/stt.py

Hand-written implementations of every algorithm in Li Hang's Statistical Learning Methods

https://github.com/Dod-o/Statistical-Learning-Method_Code

kfrlib

https://www.kfrlib.com
https://github.com/kfrlib/kfr
https://github.com/BA17-loma-1/Audio_Signal_Processing_Toolbox

deepspeech

https://github.com/mozilla/DeepSpeech/releases/download/v0.6.0/deepspeech-0.6.0-models.tar.gz

rhasspy, wake-word engine, intent engine, speech-synthesis engine

https://github.com/synesthesiam/rhasspy
https://github.com/Picovoice/porcupine
https://github.com/MycroftAI/mycroft-precise
[IoT] A roundup of lightweight speech recognition frameworks
https://blog.csdn.net/wangbotao1990/article/details/98743437

esp32 online asr sdk

https://github.com/espressif/esp-va-sdk

A survey of the smart-voice development boards on the market

http://www.elecfans.com/d/646154.html
Rokid smart speaker (若琪)
https://developer.rokid.com/#/doc

Wake-word engines

In open source, besides snowboy (closed-source), the best-known wake-word engine, there are actually several others; they can all be seen in an offline voice-assistant open-source project called rhasspy. The first is porcupine, from the Canadian company Picovoice (closed-source); another is mycroft-precise (open source), from a US company. As for snowboy (closed-source), I have mentioned before that it was acquired by Baidu; pocketsphinx (open source) is too old to be worth discussing. If device portability is not a requirement, even heavyweight projects such as DeepSpeech, Julius, and Kaldi (all open source) could potentially be built into embedded products, for example on a Raspberry Pi, so there are quite a few options beyond these.

Running simple DeepSpeech command-line recognition of WAV files on a Raspberry Pi 3B

I mentioned an article on this earlier:
https://www.seeedstudio.com/blog/2020/01/23/offline-speech-recognition-on-raspberry-pi-4-with-respeaker/
Another article uses version 0.6.0:
https://dev.webonomic.nl/trying-out-deepspeech-on-a-raspberry-pi-4
(1) Hardware: Raspberry Pi 3 and 4 are officially supported, so most ARM Linux boards should work, though performance varies (see below).
(2) Software: use a recent Raspbian image. I used the 2020-05-27 Buster non-full image, with its bundled Python 3.7 and pip3, to install deepspeech==0.6.0. Older Python 3 releases and Python 2 probably cannot install 0.6.0. pip3 also lists a deepspeech-tflite package, which can be ignored. Install with sudo, otherwise the deepspeech command is not added to PATH.
(3) Command-line arguments: prefer the .tflite (TensorFlow Lite) model file for the --model argument: output_graph.tflite. There are two other model files, one with a .pb suffix (short for protobuf) and one with .pbmm (the mmap-able protobuf variant); I did not test them, but in theory the tflite version uses less memory and is less likely to crash. The other two arguments, trie (prefix tree) and lm (language model), are said to be optional, and the lm file in particular is very large; I did not test omitting them, so it is safest to pass both (see the deepspeech help for usage).
(4) Dependencies: DeepSpeech is said to be built on TensorFlow, yet installing the Python package pulls in no tensorflow dependency; I suspect something subtle is going on there.
(5) Performance: on a Raspberry Pi 3B one transcription takes about 10 seconds (reportedly about 2 seconds on a Pi 4). Results below (transcript, decode time, WAV duration):

  • experience proof of, 12.486s, 1.975s
  • why should one halt on the way, 14.917s, 2.735s
  • your paris efficient i said, 11.295s, 2.590s

So this engine has two drawbacks: the official models only support English, and it is heavyweight, requiring fairly powerful hardware to reach real-time dictation.
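The invocation described in (3) can be sketched as a guarded shell command. The model and audio file names are placeholders; the flag names match the DeepSpeech 0.6.x command-line help:

```shell
# Sketch only: assumes `pip3 install deepspeech==0.6.0` put the CLI on PATH
# and that the model files sit in the current directory.
if command -v deepspeech >/dev/null 2>&1; then
    deepspeech --model output_graph.tflite \
               --lm lm.binary --trie trie \
               --audio test.wav
else
    echo "deepspeech not installed"
fi
```

The guard makes the script degrade gracefully on machines where the pip install has not been done yet.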

How many model formats does TensorFlow actually have?

https://blog.csdn.net/zjc910997316/article/details/82853791
lm: language model
lm and trie are optional

Speech recognition with FreeSWITCH + UniMRCP + Google ASR

https://blog.csdn.net/chuiyg/article/details/90767769

pocketsphinx

```shell
git clone https://github.com/cmusphinx/sphinxbase.git
git clone https://github.com/cmusphinx/pocketsphinx.git
cd sphinxbase
./autogen.sh
make
cd ../pocketsphinx
./autogen.sh
make
src/programs/pocketsphinx_continuous -inmic yes -hmm model/en-us/en-us \
    -lm model/en-us/en-us.lm.bin -dict model/en-us/cmudict-en-us.dict
```

Grammar-constrained decoding with a JSGF file, simple.jsgf:

```
#JSGF V1.0;
grammar all;
public <all> = turn ( on | off ) the lights;
```

```shell
src/programs/pocketsphinx_continuous -inmic yes -hmm model/en-us/en-us \
    -dict model/en-us/cmudict-en-us.dict -jsgf simple.jsgf
```

DeepPavlov

https://github.com/deepmipt/DeepPavlov
http://www.diegorobot.com
https://github.com/andelf/PyAIML

website, html5

https://github.com/2fps/recorder
https://github.com/2fps/demo
https://github.com/giscafer/street-address-search/tree/40344aab0f0ed0d4b9d4deb72adeaa7a9fbd43d8

HTK

http://htk.eng.cam.ac.uk/download.shtml
https://labrosa.ee.columbia.edu/doc/HTKBook21/node1.html
https://www.zhihu.com/question/65516424
https://www.cnblogs.com/ansersion/p/4155951.html

Common Voice

https://commonvoice.mozilla.org/zh-CN/datasets

Python + TensorFlow Machine Learning in Action, Chapter 9

lstm

https://blog.csdn.net/yj13811596648/article/details/89499432

hmm

https://github.com/WiseDoge/plume/blob/master/plume/hmm.py
search github: forward_prob model numpy
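The "forward_prob" search above points at the HMM forward algorithm; a minimal NumPy version (toy parameters made up here for illustration, not taken from the linked repo) computes the probability of an observation sequence by summing over all state paths:

```python
import numpy as np

def forward_prob(pi, A, B, obs):
    """Forward algorithm: P(obs | HMM).

    pi:  (N,) initial state distribution
    A:   (N, N) transitions, A[i, j] = P(next state j | state i)
    B:   (N, M) emissions, B[i, k] = P(symbol k | state i)
    obs: sequence of observed symbol indices
    """
    alpha = pi * B[:, obs[0]]             # alpha_1(i) = pi_i * b_i(o_1)
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]     # alpha_t(j) = sum_i alpha_{t-1}(i) a_ij b_j(o_t)
    return float(alpha.sum())

# Toy 2-state, 2-symbol model (made-up numbers).
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3],
              [0.4, 0.6]])
B = np.array([[0.5, 0.5],
              [0.1, 0.9]])
p = forward_prob(pi, A, B, [0, 1, 0])
```

The recursion is O(N^2 T), versus O(N^T) for brute-force enumeration of paths.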

vosk-api

https://alphacephei.com/vosk/models
http://t.rock-chips.com/forum.php?mod=viewthread&tid=1478

AI development series (6): voice command recognition, RK3399ProD

http://t.rock-chips.com/forum.php?mod=viewthread&tid=456

Horned Sungem (角蜂鸟)

https://www.senscape.com.cn/hornedsungem/

How to Make a Simple Tensorflow Speech Recognizer

https://v.youku.com/v_show/id_XMjYzNDUzODc1Ng==.html
https://github.com/pannous/tensorflow-speech-recognition

kaldi

https://www.jianshu.com/p/4e74861b47e9

baidu ai studio, dataset

https://aistudio.baidu.com/aistudio/projectoverview/public/1?tags=23
Speech-synthesis dataset downloads:
CMU ARCTIC (en), Kai-Fu Lee's lab: http://festvox.org/cmu_arctic/
LJSpeech (en): 2.6 GB https://keithito.com/LJ-Speech-Dataset/
THCHS-30: Tsinghua University's 30-hour Chinese dataset, 6.4 GB http://www.openslr.org/18/

deepvoice3, tts

https://github.com/r9y9/deepvoice3_pytorch

tensorflow datasets

https://tensorflow.google.cn/datasets/catalog/ljspeech
(TODO) baidupan search LJSpeech-1.1.tar.bz2
see https://tensorflow.google.cn/datasets/catalog/ljspeech, open github page to get download url
https://github.com/tensorflow/datasets/tree/master/tensorflow_datasets/audio

Mastering Machine Learning with scikit-learn

https://github.com/PacktPublishing/Mastering-Machine-Learning-with-scikit-learn-Second-Edition

Speech Recognition: Free Software and Complete Privacy

https://unix.stackexchange.com/q/256138/16704

First experience with online Chinese recognition based on Kaldi

https://note.abeffect.com/articles/2020/02/10/1581269426654.html
https://note.abeffect.com/articles/2020/02/10/1581269678158.html

Alibaba-MIT-Speech

https://github.com/alibaba/Alibaba-MIT-Speech

HyperAI (超神经)

https://hyper.ai/datasets/6792

Android NNAPI

https://github.com/JDAI-CV/DNNLibrary
https://zhuanlan.zhihu.com/p/30926958
RK3399 Android examples
http://wiki.friendlyarm.com/wiki/index.php/NanoPi_M4V2/zh
https://github.com/rockchip-linux/tensorflow/tree/master/tensorflow/contrib/lite/java/demo
https://tensorflow.google.cn/lite/guide/android

[Experience] [Rockchip RK1808 compute stick trial] Setting up a Linux (Ubuntu 18.04) environment to try the RK1808

http://bbs.elecfans.com/jishu_1873423_1_1.html

rknn_toolkit, for PC

http://wiki.t-firefly.com/zh_CN/Core-1808-JD4/npu_rknn_toolkit.html

Playing with TensorFlow Lite on an ARM board

https://blog.csdn.net/computerme/article/details/80345065
https://blog.csdn.net/mhsszm/article/details/80610042

Recipes for using the Sipeed Maix series

https://hrkz.tokyo/sipeed-maix-ideas/
https://github.com/andriyadi/Maix-SpeechRecognizer
https://github.com/Technica-Corporation/Speech_Recognition-Maixduino
https://en.bbs.sipeed.com/t/topic/870
CNN+CTC

How to build a super-mini voice assistant yourself

https://zhuanlan.zhihu.com/p/72896282

The Ultimate Guide To Speech Recognition With Python

https://realpython.com/python-speech-recognition/
The ultimate guide to speech recognition with Python (Chinese translation)
https://cloud.tencent.com/developer/article/1109408?fromSource=waitui

  • apiai
  • assemblyai
  • google-cloud-speech
  • pocketsphinx
  • SpeechRecognition
  • watson-developer-cloud
  • wit

Basic principles of the CTC algorithm in speech recognition

https://www.cnblogs.com/qcloud1001/p/9041218.html
CTC (connectionist temporal classification) explained in plain language
https://blog.csdn.net/luodongri/article/details/77005948
CTC Algorithm Explained Part 1: Training the Network
http://xiaodu.io/ctc-explained
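The posts above all build on CTC's many-to-one mapping B, which collapses repeated labels and removes blanks; that mapping is small enough to write out directly (the choice of symbol 0 as the blank is my assumption here):

```python
def ctc_collapse(path, blank=0):
    """Apply CTC's B mapping: merge adjacent repeats, then drop blanks.

    A blank between two identical labels keeps them distinct, which is
    how CTC can emit doubled letters such as the 'll' in 'hello'.
    """
    out = []
    prev = None
    for label in path:
        if label != prev and label != blank:
            out.append(label)
        prev = label
    return out

# e.g. the path [1, 1, 0, 2, 0, 2, 2] collapses to [1, 2, 2]
```

The CTC loss is then the total probability of every frame-level path that collapses to the target label sequence.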

PaddleOCR

https://github.com/PaddlePaddle/PaddleOCR

adafruit, tensorflow lite

https://learn.adafruit.com/tensorflow-lite-for-circuit-playground-bluefruit-quickstart?view=all
https://adafruit.github.io/arduino-board-index/package_adafruit_index.json
https://learn.adafruit.com/tensorflow-lite-for-circuit-playground-bluefruit-quickstart?view=all#micro-speech-demo
https://github.com/adafruit/Adafruit_TFLite
search baidupan, tflite_tensorflow_lite_adafruit
Arduino_TensorFlowLite

esp-sr, WakeNet

https://github.com/espressif/esp-sr/blob/master/wake_word_engine/README_cn.md
https://github.com/espressif/esp-sr/tree/master/wake_word_engine
https://arxiv.org/abs/1703.05390
CRNN+CTC
https://github.com/espressif/esp-sr/tree/master/speech_command_recognition
I previously guessed that the ESP32 algorithm was LSTM+CTC, but according to the current official description it should be CRNN+CTC. That is still a guess, and it is possible the newest version uses a more advanced algorithm (see:
https://github.com/espressif/esp-sr/tree/master/speech_command_recognition ). CRNN+CTC is more commonly described online as an OCR text-recognition technique. Another point worth noting is that the CRNN paper cited by the official docs,
https://arxiv.org/abs/1703.05390
(see:
https://github.com/espressif/esp-sr/blob/master/wake_word_engine/README_cn.md
), is the same one behind the ML-KWS work I mentioned earlier. So one can conclude that the old (closed-source) ESP32 WakeNet and ARM's open-source ML-KWS share the same CRNN lineage; MultiNet is the CTC-augmented version (CRNN+CTC), while the new WakeNet (closed-source) is based on a dilated CNN. All of the ESP32 algorithms use MFCC features.

Comparison of Chinese offline speech-recognition chips

https://zhuanlan.zhihu.com/p/166078186

| Generation | Recognition type | Algorithm class | Algorithms | Vendor type | Representative vendors | Main processor |
| --- | --- | --- | --- | --- | --- | --- |
| 1.0 | Speaker-dependent | Template matching | VQ, DTW | Traditional | Sunplus (凌阳) | MCU or general-purpose DSP |
| 2.0 | Speaker-independent | Statistical | GMM+HMM | Traditional | Nuvoton (新塘/赛维), 山景, 九芯, ICRoute, 唯创 | MCU or general-purpose DSP |
| 3.0 | Speaker-independent | Discriminative classifiers, deep neural networks | DNN, RNN, CNN+HMM | Internet / pure-chip | iFlytek, AISpeech (思必驰), Unisound (云知声), Silan (Alibaba, Baidu, 互问, 华镇); 探境, 知存, Chipintelli (启英), 清微, 人麦, 国芯 | |

Sparkfun Edge, TinyML

https://github.com/sparkfun/Tensorflow_AIOT2019
Deep learning on embedded devices: SparkFun Edge with TensorFlow (1) Hello World
https://www.cnblogs.com/guangnianxd/p/12542184.html
Arduino BSP
https://github.com/sparkfun/Arduino_Boards/blob/master/IDE_Board_Manager/package_sparkfun_index.json
https://github.com/sparkfun/SparkFun_Edge
https://learn.sparkfun.com/tutorials/using-sparkfun-edge-board-with-ambiq-apollo3-sdk
Arduino IDE, magic wand
https://learn.sparkfun.com/tutorials/programming-the-sparkfun-edge-with-arduino

speex

Low-bitrate audio coding reference design
http://blog.sina.com.cn/s/blog_4680937f0102ycic.html
https://github.com/xiph/opus

Intelligent-speech-robot

https://github.com/1158114251/-Intelligent-speech-robot
https://mc.dfrobot.com.cn/thread-25649-1-1.html

SmartSpeaker, stm32f407

A cloud-ASR-based smart control device, similar to the Tmall Genie or Xiaomi's XiaoAI, built around an STM32F407, WM8978, and ESP8266.
https://github.com/lovelyterry/SmartSpeaker

uSpeech, µSpeech, arduino, stm32

https://github.com/arjo129/uSpeech
https://arjo129.wordpress.com/experiments/µspeech/
https://hsel.co.uk/2016/01/06/stm32f0-uspeech-port/
https://github.com/pyrohaz/STM32F0-uSpeechPort

Companion code for 《深度实践OCR》 (Deep Practice in OCR: deep-learning-based text recognition)

CRNN+CTC text recognition
https://github.com/ocrbook/ocrinaction

TinyML-ESP32

https://github.com/HollowMan6/TinyML-ESP32
https://github.com/tanakamasayuki/Arduino_TensorFlowLite_ESP32

Sunplus 61 MCU (凌阳单片机)

https://github.com/super-1943/MCU/tree/master/sunplus
https://github.com/weimingtom/MCU/tree/master/sunplus

Some common speech feature extraction algorithms

https://www.cnblogs.com/LXP-Never/p/11725378.html
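The common front end among the features the post above surveys is frame, window, power spectrum, mel filterbank, log (plus a DCT for full MFCC). A compressed NumPy sketch of the filterbank stage follows; the frame length, FFT size, and filter count are arbitrary illustrative choices:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def log_mel_energies(frame, fs=16000, n_fft=512, n_mels=26):
    """Log mel filterbank energies for one frame of audio."""
    power = np.abs(np.fft.rfft(frame * np.hamming(len(frame)), n_fft)) ** 2
    # Triangular filters whose centers are equally spaced on the mel scale.
    mel_pts = np.linspace(hz_to_mel(0), hz_to_mel(fs / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / fs).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        left, center, right = bins[i], bins[i + 1], bins[i + 2]
        for k in range(left, center):
            fbank[i, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fbank[i, k] = (right - k) / max(right - center, 1)
    return np.log(fbank @ power + 1e-10)

frame = np.random.default_rng(0).standard_normal(400)  # one 25 ms frame at 16 kHz
feats = log_mel_energies(frame)                        # 26 log energies
```

Applying a DCT to `feats` and keeping the first 12-13 coefficients would give the MFCCs that most of the linked KWS projects feed to their networks.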

(NOT GOOD, only for code reading on pudn web page) pudn, biguo100

A speech recognition program on the STC12C5620AD MCU using a DFT algorithm
http://www.pudn.com/Download/item/id/1988530.html
http://www.biguo100.com/news/33409.html
Simple speech recognition on an MSP430 MCU (workbench environment) using the LPCC algorithm
http://www.biguo100.com/news/9782.html
http://www.pudn.com/Download/item/id/830873.html

search phoneme speech

[Ai-Thinker ESP32 voice board series 1] Offline speech recognition on the ESP32-A1S audio board to control an LED

https://blog.csdn.net/Boantong_/article/details/104457259
https://docs.ai-thinker.com/esp32
https://docs.ai-thinker.com/esp32-audio-kit
https://github.com/donny681/esp-adf/tree/master/ai-examples
https://github.com/Ai-Thinker-Open/Ai-Thinker-Open_ESP32-A1S_ASR_SDK/tree/master/examples/Smart_home_scene_AI

VAD_campare

https://github.com/mengsaisi/VAD_campare
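VAD comparisons like the repo above usually include a short-time-energy baseline; a minimal sketch of that baseline follows (the frame length and threshold are arbitrary assumptions, and real detectors add hangover smoothing):

```python
import numpy as np

def energy_vad(signal, frame_len=160, threshold=0.01):
    """Mark each frame as speech (True) when its mean energy exceeds a threshold."""
    n_frames = len(signal) // frame_len
    frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)
    energy = np.mean(frames ** 2, axis=1)
    return energy > threshold

# Synthetic test signal: 0.5 s silence, 0.5 s loud tone, 0.5 s silence at 16 kHz.
fs = 16000
silence = np.zeros(fs // 2)
tone = 0.5 * np.sin(2 * np.pi * 440 * np.arange(fs // 2) / fs)
signal = np.concatenate([silence, tone, silence])
decisions = energy_vad(signal)   # True only for the middle third of the frames
```

Energy alone fails in noise, which is why the compared detectors add zero-crossing rate, spectral features, or a trained model.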

Tencent Cloud voice, Tencent Dingdang ASR platform

https://dingdang.qq.com/doc/page/285
https://www.it610.com/article/1288354813658079232.htm
https://dingdang.qq.com/doc.html?dir=/doc/tvs/cloud/api.html

Xunfei (iflytek) WebAPI v2

For integrating with iFlytek's AIUI, if the target is not Android but an MCU or ARM Linux device, it is best to use the WebAPI V2 interface, which sidesteps the dll/so compatibility problem (the official libraries only target x86, unless you are on Android). iFlytek's WebAPI is a little odd, though: unless you publish the app as a release, you will not see the fallback chatbot answers (e.g. from the Turing bot), because iFlytek does not allow production-environment settings in the test environment (in other words, the fallback settings are not applied by default), unless you append the _box suffix to the scene parameter. (Alternatively, pass review and publish, which avoids the hassle.) Another caution: do not enable the IP whitelist, or correct chat replies will not be returned.
https://github.com/IflytekAIUI/DemoCode/blob/master/webapi_v2/java/WebaiuiDemo.java
https://console.xfyun.cn/app/myapp
https://console.xfyun.cn/services/iat

Tencent Cloud speech recognition

https://cloud.tencent.com/document/product/1093
https://cloud.tencent.com/document/product/1093/35646
https://cloud.tencent.com/document/product/1093/37308
https://cloud.tencent.com/document/product/1093/35735
https://cloud.tencent.com/document/sdk/Java
https://github.com/TencentCloud/tencentcloud-sdk-java

VGGVox models for speaker identification and verification

https://github.com/a-nagrani/VGGVox
https://blog.csdn.net/weixin_41738734/article/details/86109333

Speaker recognition

The era of end-to-end speech recognition arrives: NetEase Hangzhou Research Institute's path in intelligent speech

https://cloud.tencent.com/developer/news/491629

Applications, algorithms, chips: a "three-in-one" look at speech recognition

http://news.eeworld.com.cn/xfdz/article_2017101874336_2.html

ArduinoTensorFlowLiteTutorials

https://github.com/arduino/ArduinoTensorFlowLiteTutorials

SpeechCmdRecognition

https://github.com/douglas125/SpeechCmdRecognition

Arduino Portenta H7

https://github.com/hpssjellis/my-examples-for-the-arduino-portentaH7

TTGO_T_Watch_Baidu_Rec, T-Watch

A Baidu speech-recognition terminal built from the TTGO T-Watch.
The T-Watch main board has 8 MB of PSRAM; of the various expansion boards, one integrates an INMP441 I2S microphone chip and can capture speech.
It works as a sound listener: it monitors ambient sound, transcribes it to text, and can be configured to forward the text to other devices (such as a Raspberry Pi) to trigger actions. Each round records up to 10 seconds of audio; recording plus recognition takes 1-10 seconds on average.
https://github.com/lixy123/TTGO_T_Watch_Baidu_Rec
https://github.com/Xinyuan-LilyGO/TTGO_TWatch_Library

《Tensorflow入门与实战》 (TensorFlow: Introduction and Practice), Chapter 6 "Recurrent Neural Networks", Section 6.4 "Speech recognition with LSTM+CTC"

https://github.com/thewintersun/tensorflowbook/blob/master/Chapter6/asr_lstm_ctc/asr_lstm_ctc.py
search baidupan, 源代码_TensorFlow入门与实战.zip
https://www.ituring.com.cn/book/2398
Speech recognition (LSTM+CTC)
https://www.cnblogs.com/followees/p/10422809.html
FundamentalsOfAI_book_code
https://github.com/koryako/FundamentalsOfAI_book_code
Define an LSTM cell for the forward pass
https://github.com/search?q=定义一个向前计算的LSTM单元,40个隐藏单元&type=code
https://github.com/luvensaitory/project
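The "define a forward LSTM cell with 40 hidden units" snippet referenced above is TensorFlow code; the same cell can be written out in NumPy to show exactly what one forward step computes (the gate ordering and the random weights here are illustrative assumptions):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c, W, U, b):
    """One LSTM step. W: input weights, U: recurrent weights, b: bias.
    Gates are stacked in i, f, g, o order along the last axis."""
    z = x @ W + h @ U + b
    i, f, g, o = np.split(z, 4)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)   # input, forget, output gates
    g = np.tanh(g)                                  # candidate cell update
    c_new = f * c + i * g                           # new cell state
    h_new = o * np.tanh(c_new)                      # new hidden state
    return h_new, c_new

n_in, n_hidden = 13, 40        # e.g. 13 MFCC features per frame, 40 hidden units
rng = np.random.default_rng(0)
W = rng.standard_normal((n_in, 4 * n_hidden)) * 0.1
U = rng.standard_normal((n_hidden, 4 * n_hidden)) * 0.1
b = np.zeros(4 * n_hidden)

h = np.zeros(n_hidden)
c = np.zeros(n_hidden)
for frame in rng.standard_normal((5, n_in)):       # 5 frames of features
    h, c = lstm_step(frame, h, c, W, U, b)
```

In the LSTM+CTC recipe, the per-frame hidden states feed a linear projection to label logits, and CTC handles the alignment between frames and characters.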

Notes on pitfalls in MXNet's speech_recognition example

https://blog.csdn.net/zhqh100/article/details/103887097
https://github.com/baidu-research/warp-ctc/blob/master/README.zh_cn.md
https://github.com/apache/incubator-mxnet/tree/v1.7.x/example/speech_recognition
https://github.com/samsungsds-rnd/deepspeech.mxnet/tree/master/Libri_sample
https://github.com/baidu-research/ba-dls-deepspeech

LAS-asr

https://github.com/ChaosCY/LAS-asr
https://github.com/thomasschmied/Speech_Recognition_with_Tensorflow

mlpack LSTM

https://github.com/mlpack/examples/blob/master/lstm_stock_prediction/lstm_stock_prediction.cpp

Deep Audio-Visual Speech Recognition

https://www.cnblogs.com/tangbinchn/p/12809360.html
A brief introduction to a computer-vision direction: lip-reading technology
https://zhuanlan.zhihu.com/p/48670591

LSTM Speech Recognition实战

https://blog.csdn.net/antkillerfarm/article/details/84232764

KWS, Keyword Spotting in Noise Using MFCC and LSTM Networks, matlab

https://www.mathworks.com/help/audio/examples/keyword-spotting-in-noise-using-mfcc-and-lstm-networks.html

esp32_kws

https://github.com/42io/esp32_kws

KWS-for-XMC

https://github.com/Infineon/KWS-for-XMC

One method for training a wake-word model with Kaldi

https://blog.csdn.net/cj1989111/article/details/88017908
https://github.com/xiangxyq/kaldi/tree/master/egs/wakeup_words
https://github.com/xiangxyq/3gpp_vad

Speech recognition series 4: a source-code walkthrough of CTC model training

https://blog.csdn.net/u012361418/article/details/90313249

A DTW-based isolated-word speech recognition system

https://blog.csdn.net/king_audio_video/article/details/90113627
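Isolated-word DTW systems like the one above compare an utterance's feature sequence against one template per word and pick the smallest distance; the core is a short dynamic program (the 1-D toy sequences below stand in for MFCC frame vectors):

```python
import numpy as np

def dtw_distance(a, b):
    """Classic DTW: minimal accumulated cost of aligning sequence a to b."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])   # use a vector norm for MFCC frames
            D[i, j] = cost + min(D[i - 1, j],       # insertion
                                 D[i, j - 1],       # deletion
                                 D[i - 1, j - 1])   # match
    return D[n, m]

# A time-stretched copy should match its template far better than a different word.
template = [0, 1, 2, 3, 2, 1, 0]
stretched = [0, 0, 1, 1, 2, 2, 3, 3, 2, 2, 1, 1, 0, 0]
other = [3, 3, 3, 0, 0, 0, 3]
```

The warping is what makes the method robust to speaking-rate differences, which is why it worked for speaker-dependent recognition long before HMMs.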

tensorflow-android-speech-kws

https://github.com/shichaog/tensorflow-android-speech-kws

Deep Learning with Applications Using Python

Chinese edition: Python deep learning in practice; chatbots plus face, object, and speech recognition based on TensorFlow and Keras
https://github.com/Apress/Deep-Learning-Apps-Using-Python/blob/master/Chapter11_Speech%20to%20text%20and%20vice%20versa
https://github.com/NavinManaswi/Book-Deep-Learning-Applications-with-Applications-Using-Python

How can voice recognition run on-device with TensorFlow?

https://developer.aliyun.com/article/592687
https://github.com/weedwind/MFCC
https://github.com/weedwind/CTC-speech-recognition
search baidupan, weedwind_MFCC-master.zip

kaldi, cvte

https://gitee.com/yangmiao123/SpeechRecognition
search baidupan, cvte.zip

Edison - Keyword Spotting on Microcontroller

https://hütter.ch/posts/edison-kws-on-mcu/
(???) STM32L4, microphone
https://github.com/noah95/edison
ref
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/speech_commands/models.py
(??? seem IMP) KWS mcu, mfcc
https://github.com/majianjia/nnom/tree/master/examples/keyword_spotting

Speech signal processing lab course (MATLAB source code)

https://github.com/veenveenveen/SpeechSignalProcessingCourse

Building a small-vocabulary Mandarin speech recognition platform with Kaldi (DNN)

https://veenveenveen.github.io/article/technology/ASR/ASR_Kaldi_DNN_Chinese.html#kaldi-简介
https://github.com/veenveenveen/ASR_Kaldi_DNN_Chinese
Mandarin speech recognition with the open-source HTK framework (GMM-HMM)
https://veenveenveen.github.io/article/technology/ASR/ASR_HTK_Chinese.html
https://github.com/veenveenveen/chinese_voice

STM32 and Baidu Cloud Tiangong's latest IoT dev board: operating guide for the B-L475E-IOT01A Discovery kit

https://blog.csdn.net/annic9/article/details/80434389
https://cloud.baidu.com/doc/IOT/s/7jwvy87a2
STM32L475, esp8266, esp32
https://github.com/baidu/baidu-iot-samples/tree/master/STM32/I-CUBE-BAIDU

Python implementation of BIC-based speaker dialogue segmentation (part 1)

https://blog.csdn.net/wblgers1234/article/details/75896605
https://github.com/zimuyanzi/BIC

Building Kaldi

search baidupan, kaldi_20200917_pre.tar.gz, work_kaldi
It took me two days to finally build Kaldi in an x86 Debian VM (my build environment was the Raspberry Pi Desktop x86 image from February 2020, 32-bit Debian). The code needs patching in a few places, for example:
jcsilva/docker-kaldi-android#11
In short, three steps:
(1) In tools/, run make and make openblas to install the third-party libraries.
(2) In src/, run configure and make to build the binaries.
(3) In the yesno example, run run.sh to check that the binaries work.
At the end you should see a report with a WER of 0 (zero errors); see the decoding-and-testing section of:
https://www.jianshu.com/p/09deba57f339

How to eat Pytorch in 20 days ?

https://github.com/lyhue1991/eat_pytorch_in_20_days

kaldi-ctc

https://github.com/lingochamp/kaldi-ctc
https://zhuanlan.zhihu.com/p/23177950

Community share: learning TinyML from scratch (1)

https://blog.csdn.net/wfing/article/details/106995562

ardu-badge, Arduino_TensorFlowLite

https://www.ardu-badge.com/Arduino_TensorFlowLite/zip
https://community.platformio.org/t/arduino-nano-33-ble-tensorflow-lite-undefined-references/14387/2
same as adafruit, Arduino_TensorFlowLite

ESP32 now supports running TensorFlow Lite Micro

https://zhuanlan.zhihu.com/p/228593457

Hands-on machine learning on Arduino, part 1

https://blog.csdn.net/weixin_44507034/article/details/105602112
Hands-on machine learning on Arduino, part 2
https://blog.csdn.net/weixin_44507034/article/details/105613754
https://medium.com/tensorflow/how-to-get-started-with-machine-learning-on-arduino-7daf95b4157
https://cloud.tencent.com/developer/article/1534288

Voiceprint-Recognition

An embedded voiceprint recognition project on AliOS Things
https://github.com/SunYanCN/Voiceprint-Recognition
alibaba/AliOS-Things#976

aiBook

https://github.com/xiyanxiyan10/aiBook
Speech synthesis
https://github.com/ibab/tensorflow-wavenet
https://github.com/tomlepaine/fast-wavenet
Speech recognition
https://github.com/buriburisuri/speech-to-text-wavenet
https://github.com/pannous/tensorflow-speech-recognition

《人工智能》 (Artificial Intelligence), Tsinghua University edition, AIDemo

http://aibook.cslt.org/slides/index.html
http://aibook.cslt.org/aidemo/demo.html
Chapter 3: Listening to your voice, see
https://github.com/jcsilva/deep-clustering
search baidupan, aibook_speech.zip

search speechSeparation

https://github.com/pchao6/LSTM_PIT_Speech_Separation
https://github.com/ododoyo/DANet