https://github.com/pannous/tensorflow-speech-recognition
https://github.com/pannous/caffe-speech-recognition
https://voice.mozilla.org/zh-CN/datasets
https://www.waveshare.net/wiki/Sound_Sensor
https://github.com/MyDuerOS/DuerOS-Python-Client
https://developer.baidu.com/forum/topic/show/244631?pageNo=1
https://github.com/Bloom-Agritech/ruderalis-firmware/blob/master/lib/DHT22Gen3_RK/src/DHT22Gen3_RK.cpp
https://github.com/oivoii/nrf-tensorflow
https://github.com/Theano/Theano
https://arxiv.org/abs/1703.05390
https://github.com/UT2UH/ML-KWS-for-ESP32
https://github.com/ARM-software/ML-KWS-for-MCU/tree/master/Deployment
https://github.com/zhouilu/ARM_TensorFlow/tree/master/Pretrained_models/Basic_LSTM
https://blog.csdn.net/xj853663557/article/details/83784452
https://github.com/kasiim/ESP-EYE-speaker-verification
https://github.com/RealCorebb/ESP32-A1s-Audio-Kit
https://www.adafruit.com/product/1788
VS1053 CODEC + MICROSD BREAKOUT - MP3/WAV/MIDI/OGG PLAY + RECORD
https://www.adafruit.com/product/1381
https://learn.adafruit.com/adafruit-music-maker-shield-vs1053-mp3-wav-wave-ogg-vorbis-player
https://github.com/adafruit/Adafruit_VS1053_Library
https://github.com/adafruit/Adafruit_VS1053_Library/blob/master/examples/record_ogg/record_ogg.ino
https://cn.dl.sipeed.com/MAIX/HDK/Sipeed-R6%2b1_MicArray/Specifications
MSM261S4030H0
https://cn.dl.sipeed.com/MAIX/HDK/Sipeed-Maix-Bit/Maix-Bit%20V2.0(with%20MEMS%20microphone)
MSM261S4030H0
SPM1423 (MEMS PDM Microphone)
https://docs.m5stack.com/#/zh_CN/atom/atomecho
https://github.com/m5stack/M5-ProductExampleCodes/blob/master/Core/Atom/AtomEcho/Arduino/Factory_Test/Factory_Test.ino
https://docs.m5stack.com/#/en/core/m5stickc
https://docs.m5stack.com/#/zh_CN/core/m5stickc
https://github.com/m5stack/M5StickC/blob/master/examples/Basics/FactoryTest/FactoryTest.ino
https://blog.csdn.net/weixin_39671078/article/details/82414208
Fundamentals of Speech Recognition
search baidupan, 语音识别基本原理
https://github.com/shaharpit809/Speech-Denoising-using-DNN-CNN-and-RNN
http://www.eepw.com.cn/article/201801/375072.htm
https://github.com/dingminglu/DeepBrain-ESP32-Audio-SDK
https://github.com/peter-360/esp32_chuwugui
https://github.com/Dod-o/Statistical-Learning-Method_Code
ML-KWS todo download
https://github.com/kacperlukawski/speech-recognition.git
https://blog.csdn.net/zmdsjtu/article/details/52816692
https://blog.csdn.net/qq_33835307/article/details/81006954
https://github.com/li900309/PingYeAudio/blob/master/app/src/main/java/com/cn/cae/MainActivity.java
http://wiki.t-firefly.com/zh_CN/ROC-RK3328-PC/module_mic.html
CAE:Circular Array Enhancement
https://www.xfyun.cn/doc/solutions/hardwareUniversal/CAE-Android-SDK.html#功能简介
https://arxiv.org/abs/1711.07128v3
https://arxiv.org/pdf/1711.07128.pdf
https://www.cnblogs.com/NickQ/p/8541156.html
https://www.cnblogs.com/NickQ/p/8540487.html
arm_cfft_radix4_instance_f32
https://www.cnblogs.com/mengfanrong/p/5168805.html
MATLAB study notes 2: set up a simple serial port and save the data to CSV
https://blog.csdn.net/weixin_38494129/article/details/86469261
https://www.hackster.io/dmitrywat/offline-speech-recognition-on-raspberry-pi-4-with-respeaker-c537e7
https://www.seeedstudio.com/blog/2020/01/23/offline-speech-recognition-on-raspberry-pi-4-with-respeaker/
https://github.com/mozilla/DeepSpeech-examples
https://github.com/mozilla/DeepSpeech/releases
https://www.helplib.cn/fansisi/deepspeech-pytorch
https://github.com/jasperproject/jasper-client
https://github.com/jasperproject/jasper-client/blob/master/client/stt.py
https://github.com/Dod-o/Statistical-Learning-Method_Code
https://www.kfrlib.com
https://github.com/kfrlib/kfr
https://github.com/BA17-loma-1/Audio_Signal_Processing_Toolbox
https://github.com/mozilla/DeepSpeech/releases/download/v0.6.0/deepspeech-0.6.0-models.tar.gz
https://github.com/synesthesiam/rhasspy
https://github.com/Picovoice/porcupine
https://github.com/MycroftAI/mycroft-precise
[IoT] Roundup of lightweight speech recognition frameworks
https://blog.csdn.net/wangbotao1990/article/details/98743437
https://github.com/espressif/esp-va-sdk
http://www.elecfans.com/d/646154.html
Rokid (若琪) smart speaker
https://developer.rokid.com/#/doc
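Besides snowboy (not open source), the best-known wake word engine in the open-source community, there are several others.
They can all be found in rhasspy, an open-source offline voice assistant project:
the first is porcupine, from the Canadian company Picovoice (not open source); another is mycroft-precise (open source), from a company in the US.
As for snowboy (not open source), I mentioned before that it was acquired by Baidu. I will skip pocketsphinx (open source) because it is too old.
If the device does not need to be portable, even heavyweight projects such as deepspeech (open source), Julius (open source), and kaldi (open source)
could be built into embedded products,
for example on a Raspberry Pi, so the list is not limited to these few and is probably quite long.
An article I mentioned earlier describes experience with this: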
https://www.seeedstudio.com/blog/2020/01/23/offline-speech-recognition-on-raspberry-pi-4-with-respeaker/
There is another one, which uses version 0.6.0:
https://dev.webonomic.nl/trying-out-deepspeech-on-a-raspberry-pi-4
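(1) Hardware: Raspberry Pi 3 and Raspberry Pi 4 are officially supported, so most ARM Linux boards should work too. Performance differs, though, as discussed below.
(2) Software: use the latest Raspbian image. I used 2020-05-27 Buster (not the full image) and installed deepspeech==0.6.0 with the bundled Python 3.7 and pip3.
Older Python 3 versions and Python 2 probably cannot install 0.6.0. pip3 also finds a package called deepspeech-tflite; that can be ignored. sudo is also needed,
otherwise the deepspeech command is not added to PATH.
(3) deepspeech command-line arguments: for the --model argument, I recommend the model file with the tflite suffix (TensorFlow Lite): output_graph.tflite.
There are two other model files, one with the pb suffix (short for protobuf) and one with the pbmm suffix (the mmap version of the protobuf file); I have not tested them.
In theory the tflite version uses less memory and is less likely to crash. The other two arguments are trie (prefix tree) and lm (language model). Reportedly both are optional, and the lm file in particular is very large,
but I have not tested running without them, so it is best to pass them (see the deepspeech help for usage).
(4) Package dependencies: deepspeech is said to be based on TensorFlow, but installing the Python package does not pull in tensorflow as a dependency. I have not worked out why.
(5) Performance: on a Raspberry Pi 3B one transcription takes about 10 seconds (reportedly a Raspberry Pi 4 brings that down to about 2 seconds). The results are below
(transcription, transcription time, WAV duration):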
- experience proof of, 12.486s, 1.975s
- why should one halt on the way, 14.917s, 2.735s
- your paris efficient i said, 11.295s, 2.590s
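So this speech recognition engine has two serious drawbacks: first, the official model data only supports English; second, it is fairly heavyweight and needs fairly powerful hardware to transcribe in real time.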
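The timings listed above can be turned into a real-time factor (processing time divided by audio duration) to make the "not real time" conclusion concrete. This is a minimal sketch; the three tuples are copied from the list above, and the function names are made up for illustration.

```python
# (transcript, transcription time in seconds, WAV duration in seconds),
# copied from the Raspberry Pi 3B results listed in these notes.
RESULTS = [
    ("experience proof of", 12.486, 1.975),
    ("why should one halt on the way", 14.917, 2.735),
    ("your paris efficient i said", 11.295, 2.590),
]


def real_time_factor(processing_s: float, audio_s: float) -> float:
    """Processing time divided by audio duration; below 1.0 is faster than real time."""
    if audio_s <= 0:
        raise ValueError("audio duration must be positive")
    return processing_s / audio_s


def summarize(results):
    """Return (transcript, real-time factor) pairs for a list of measurements."""
    return [(text, real_time_factor(t, d)) for text, t, d in results]


if __name__ == "__main__":
    for text, rtf in summarize(RESULTS):
        print(f"{rtf:5.2f}x real time  {text}")
```

On these three samples the factor stays well above 1, which matches the observation that a Raspberry Pi 3B cannot keep up with live speech.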
https://blog.csdn.net/zjc910997316/article/details/82853791
lm: language model
lm and trie are optional
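A small sketch of how the deepspeech 0.6.0 command line described above could be assembled, with lm and trie treated as an optional pair. The flags --model, --lm, --trie, and --audio are the ones from the 0.6.0 client; the file names lm.binary, trie, and test.wav are assumptions for illustration, and the helper name is hypothetical.

```python
import shlex
from typing import List, Optional


def deepspeech_command(audio: str,
                       model: str = "output_graph.tflite",
                       lm: Optional[str] = None,
                       trie: Optional[str] = None) -> List[str]:
    """Build the argument list for the deepspeech CLI (0.6.0-style flags)."""
    cmd = ["deepspeech", "--model", model]
    # lm and trie belong together: pass both or neither.
    if (lm is None) != (trie is None):
        raise ValueError("pass both lm and trie, or neither")
    if lm is not None:
        cmd += ["--lm", lm, "--trie", trie]
    cmd += ["--audio", audio]
    return cmd


if __name__ == "__main__":
    # File names below are placeholders, not verified paths.
    full = deepspeech_command("test.wav", lm="lm.binary", trie="trie")
    print(" ".join(shlex.quote(arg) for arg in full))
```

The list form can be passed directly to subprocess.run, which avoids shell quoting problems with paths that contain spaces.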
https://blog.csdn.net/chuiyg/article/details/90767769
git clone https://github.com/cmusphinx/sphinxbase.git
git clone https://github.com/cmusphinx/pocketsphinx.git
cd sphinxbase
./autogen.sh
make
cd ..
cd pocketsphinx
./autogen.sh
make
src/programs/pocketsphinx_continuous -inmic yes -hmm model/en-us/en-us \
    -lm model/en-us/en-us.lm.bin -dict model/en-us/cmudict-en-us.dict
- simple.jsgf
#JSGF V1.0;
grammar all;
public <all> = turn ( on | off ) the lights;
src/programs/pocketsphinx_continuous -inmic yes -hmm model/en-us/en-us \
    -dict model/en-us/cmudict-en-us.dict -jsgf simple.jsgf
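To see what the grammar rule above actually accepts, the alternatives can be expanded by hand. The sketch below does not parse JSGF; it hard-codes the rule `turn ( on | off ) the lights` as a list of slots (an assumption made for illustration) and enumerates every sentence.

```python
from itertools import product

# The rule `turn ( on | off ) the lights`, written as one list of
# alternatives per position. This is a hand translation, not a JSGF parser.
SLOTS = [["turn"], ["on", "off"], ["the"], ["lights"]]


def expand(slots):
    """Enumerate every sentence obtained by picking one word per slot."""
    return [" ".join(words) for words in product(*slots)]


if __name__ == "__main__":
    for sentence in expand(SLOTS):
        print(sentence)
```

With a grammar this small the recognizer only has to choose between two sentences, which is why grammar mode tends to be more robust than a full language model for command-and-control use.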
https://github.com/deepmipt/DeepPavlov
http://www.diegorobot.com
https://github.com/andelf/PyAIML
https://github.com/2fps/recorder
https://github.com/2fps/demo
https://github.com/giscafer/street-address-search/tree/40344aab0f0ed0d4b9d4deb72adeaa7a9fbd43d8
http://htk.eng.cam.ac.uk/download.shtml
https://labrosa.ee.columbia.edu/doc/HTKBook21/node1.html
https://www.zhihu.com/question/65516424
https://www.cnblogs.com/ansersion/p/4155951.html
https://commonvoice.mozilla.org/zh-CN/datasets
- tensorflow_speech_recognition_demo, 9.2 Understanding spoken digits
https://github.com/llSourcell/tensorflow_speech_recognition_demo
Speech recognition of English spoken digits
https://blog.csdn.net/weixin_44345862/article/details/86887448
https://github.com/pannous/tensorflow-speech-recognition/blob/master/number_classifier_tflearn.py
- spoken_numbers_pcm dataset
https://github.com/pannous/tensorflow-speech-recognition
see /spoken_numbers_pcm.tar
- ChineseTrain, 9.3 Understanding Chinese
https://github.com/illool/TensorFlow/tree/master/ChineseTrain
http://www.openslr.org/18/
https://github.com/18515350435/TensorFlowTest/blob/master/TensorFlow/LSTM构建语音分类模型/12声音分类.py
Many other examples
https://github.com/weimingtom/TensorFlowTest/tree/master/TensorFlow
https://github.com/illool/TensorFlow
https://github.com/XqFeng-Josie/Tensorflow
- Tacotron, 9.4 Speech synthesis
https://github.com/Kyubyong/tacotron
https://blog.csdn.net/yj13811596648/article/details/89499432
https://github.com/WiseDoge/plume/blob/master/plume/hmm.py
search github: forward_prob model numpy
https://alphacephei.com/vosk/models
http://t.rock-chips.com/forum.php?mod=viewthread&tid=1478
http://t.rock-chips.com/forum.php?mod=viewthread&tid=456
https://www.senscape.com.cn/hornedsungem/
https://v.youku.com/v_show/id_XMjYzNDUzODc1Ng==.html
https://github.com/pannous/tensorflow-speech-recognition
https://www.jianshu.com/p/4e74861b47e9
https://aistudio.baidu.com/aistudio/projectoverview/public/1?tags=23
Speech synthesis dataset downloads:
CMU ARCTIC (en), Kai-Fu Lee's lab: http://festvox.org/cmu_arctic/
LJSpeech (en): 2.6G https://keithito.com/LJ-Speech-Dataset/
THCHS-30 (zh): Tsinghua University 30-hour dataset, 6.4G http://www.openslr.org/18/
https://github.com/r9y9/deepvoice3_pytorch
https://tensorflow.google.cn/datasets/catalog/ljspeech
(TODO) baidupan search LJSpeech-1.1.tar.bz2
see https://tensorflow.google.cn/datasets/catalog/ljspeech, open github page to get download url
https://github.com/tensorflow/datasets/tree/master/tensorflow_datasets/audio
https://github.com/PacktPublishing/Mastering-Machine-Learning-with-scikit-learn-Second-Edition
https://unix.stackexchange.com/q/256138/16704
- http://cmusphinx.sourceforge.net/
- http://www.kiecza.net/daniel/linux/
- http://www.speech.cs.cmu.edu/comp.speech/Section6/Recognition/ears.html
- http://julius.osdn.jp/
- http://kaldi.sourceforge.net/
- https://github.com/alumae/kaldi-gstreamer-server
- https://web.archive.org/web/19990508201353/http://biz.yahoo.com/bw/990426/ny_ibm_1.html
- http://nico.nikkostrom.com/
- http://freespeech.sourceforge.net/
- http://www-i6.informatik.rwth-aachen.de/rwth-asr/
- http://shout-toolkit.sourceforge.net/
- http://voxhub.io/silvius
- http://simon-listens.org/index.php?id=122
- http://xvoice.sourceforge.net/
- https://appdb.winehq.org/objectManager.php?sClass=application&iId=2077
- https://sourceforge.net/projects/natlink/
- https://pypi.python.org/pypi/dragonfly
- https://github.com/TristenHayfield/damselfly
- https://github.com/DragonComputer/Dragonfire
https://note.abeffect.com/articles/2020/02/10/1581269426654.html
https://note.abeffect.com/articles/2020/02/10/1581269678158.html
https://github.com/alibaba/Alibaba-MIT-Speech
https://hyper.ai/datasets/6792
https://github.com/JDAI-CV/DNNLibrary
https://zhuanlan.zhihu.com/p/30926958
RK3399 Android examples
http://wiki.friendlyarm.com/wiki/index.php/NanoPi_M4V2/zh
https://github.com/rockchip-linux/tensorflow/tree/master/tensorflow/contrib/lite/java/demo
https://tensorflow.google.cn/lite/guide/android
http://bbs.elecfans.com/jishu_1873423_1_1.html
http://wiki.t-firefly.com/zh_CN/Core-1808-JD4/npu_rknn_toolkit.html
https://blog.csdn.net/computerme/article/details/80345065
https://blog.csdn.net/mhsszm/article/details/80610042
https://hrkz.tokyo/sipeed-maix-ideas/
https://github.com/andriyadi/Maix-SpeechRecognizer
https://github.com/Technica-Corporation/Speech_Recognition-Maixduino
https://en.bbs.sipeed.com/t/topic/870
CNN+CTC
https://zhuanlan.zhihu.com/p/72896282
https://realpython.com/python-speech-recognition/
The ultimate guide to speech recognition with Python
https://cloud.tencent.com/developer/article/1109408?fromSource=waitui
- apiai
- assemblyai
- google-cloud-speech
- pocketsphinx
- SpeechRecognition
- watson-developer-cloud
- wit
https://www.cnblogs.com/qcloud1001/p/9041218.html
CTC (connectionist temporal classification) algorithm explained in plain language
https://blog.csdn.net/luodongri/article/details/77005948
CTC Algorithm Explained Part 1: Training the Network
http://xiaodu.io/ctc-explained
https://github.com/PaddlePaddle/PaddleOCR
https://learn.adafruit.com/tensorflow-lite-for-circuit-playground-bluefruit-quickstart?view=all
https://adafruit.github.io/arduino-board-index/package_adafruit_index.json
https://learn.adafruit.com/tensorflow-lite-for-circuit-playground-bluefruit-quickstart?view=all#micro-speech-demo
https://github.com/adafruit/Adafruit_TFLite
search baidupan, tflite_tensorflow_lite_adafruit
Arduino_TensorFlowLite
https://github.com/espressif/esp-sr/blob/master/wake_word_engine/README_cn.md
https://github.com/espressif/esp-sr/tree/master/wake_word_engine
https://arxiv.org/abs/1703.05390
CRNN+CTC
https://github.com/espressif/esp-sr/tree/master/speech_command_recognition
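I used to guess that the algorithm on the ESP32 was LSTM+CTC, but according to the current official description it should be CRNN+CTC.
This is still a guess, and the latest version may use a more advanced algorithm (see:
https://github.com/espressif/esp-sr/tree/master/speech_command_recognition
). CRNN+CTC is usually described online as an OCR text recognition technique. Another point worth noting:
the original CRNN paper cited by the official docs,
https://arxiv.org/abs/1703.05390
(see:
https://github.com/espressif/esp-sr/blob/master/wake_word_engine/README_cn.md
), is the same approach as the ML-KWS I mentioned before. So the conclusion is that the old version of ESP32 WakeNet (closed source) and ARM's ML-KWS (open source) share the same origin (CRNN),
MultiNet adds CTC on top (CRNN+CTC), and the new version of WakeNet (closed source) is based on Dilated CNN.
All of the ESP32 algorithms use MFCC.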
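For reference, the CTC part of a CRNN+CTC pipeline ends with a decoding step that collapses repeated labels and drops blanks. The sketch below shows generic greedy (best-path) CTC decoding over per-frame scores. It is not Espressif's implementation, which is closed source; the blank index 0 and the toy scores are assumptions for illustration.

```python
from typing import List, Sequence

BLANK = 0  # assumed index of the CTC blank label


def ctc_greedy_decode(frame_scores: Sequence[Sequence[float]],
                      blank: int = BLANK) -> List[int]:
    """Best-path CTC decoding: argmax per frame, merge repeats, drop blanks."""
    # Pick the highest-scoring label for every frame.
    best_path = [max(range(len(frame)), key=frame.__getitem__)
                 for frame in frame_scores]
    decoded = []
    previous = None
    for label in best_path:
        # A label is emitted only when it differs from the previous frame
        # and is not the blank; a blank between two equal labels keeps both.
        if label != previous and label != blank:
            decoded.append(label)
        previous = label
    return decoded


if __name__ == "__main__":
    toy = [
        [0.1, 0.8, 0.1],    # label 1
        [0.2, 0.7, 0.1],    # label 1 (repeat, merged)
        [0.9, 0.05, 0.05],  # blank
        [0.1, 0.8, 0.1],    # label 1 again (kept, separated by blank)
        [0.1, 0.1, 0.8],    # label 2
        [0.1, 0.2, 0.7],    # label 2 (repeat, merged)
        [0.9, 0.05, 0.05],  # blank
    ]
    print(ctc_greedy_decode(toy))
```

Greedy decoding is the cheapest option and a plausible fit for a microcontroller; beam search with a language model is more accurate but costs more memory and compute.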
https://zhuanlan.zhihu.com/p/166078186
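Technology stage / Recognition type / Algorithm type / Algorithm name / Company type / Representative vendors / Main processor
1.0 / Speaker-dependent / Template matching / VQ, DTW / Traditional / Sunplus (凌阳) / MCU or general-purpose DSP
2.0 / Speaker-independent / Statistical / GMM+HMM / Traditional / Nuvoton (新塘, 赛维), 山景, 九芯, ICRoute, 唯创 / MCU or general-purpose DSP
3.0 / Speaker-independent / Discriminative classifiers, deep neural networks / DNN, RNN, CNN+HMM / Internet companies | Pure chip vendors / iFlytek (讯飞), AISpeech (思必驰), Unisound (云知声), Silan Micro (士兰微) (Alibaba, Baidu, 互问, 华镇) | 探境, 知存, 启英, 清微, 人麦, 国芯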
https://github.com/sparkfun/Tensorflow_AIOT2019
Deep learning on embedded devices: SparkFun Edge with TensorFlow (Part 1), Hello World
https://www.cnblogs.com/guangnianxd/p/12542184.html
Arduino BSP
https://github.com/sparkfun/Arduino_Boards/blob/master/IDE_Board_Manager/package_sparkfun_index.json
https://github.com/sparkfun/SparkFun_Edge
https://learn.sparkfun.com/tutorials/using-sparkfun-edge-board-with-ambiq-apollo3-sdk
Arduino IDE, magic wand
https://learn.sparkfun.com/tutorials/programming-the-sparkfun-edge-with-arduino
Low-bitrate audio coding reference design
http://blog.sina.com.cn/s/blog_4680937f0102ycic.html
https://github.com/xiph/opus
https://github.com/1158114251/-Intelligent-speech-robot
https://mc.dfrobot.com.cn/thread-25649-1-1.html
A smart control device based on cloud speech recognition, similar to Tmall Genie and Xiao AI (小爱同学). Chips used: STM32F407, WM8978, ESP8266.
https://github.com/lovelyterry/SmartSpeaker
https://github.com/arjo129/uSpeech
https://arjo129.wordpress.com/experiments/µspeech/
https://hsel.co.uk/2016/01/06/stm32f0-uspeech-port/
https://github.com/pyrohaz/STM32F0-uSpeechPort
CRNN+CTC text recognition (OCR)
https://github.com/ocrbook/ocrinaction
https://github.com/HollowMan6/TinyML-ESP32
https://github.com/tanakamasayuki/Arduino_TensorFlowLite_ESP32
https://github.com/super-1943/MCU/tree/master/sunplus
https://github.com/weimingtom/MCU/tree/master/sunplus
https://www.cnblogs.com/LXP-Never/p/11725378.html
Speech recognition program for the STC12C5620AD microcontroller, using the DFT algorithm
http://www.pudn.com/Download/item/id/1988530.html
http://www.biguo100.com/news/33409.html
Simple speech recognition on the MSP430 microcontroller, using the Workbench environment and the LPCC algorithm
http://www.biguo100.com/news/9782.html
http://www.pudn.com/Download/item/id/830873.html
https://blog.csdn.net/Boantong_/article/details/104457259
https://docs.ai-thinker.com/esp32
https://docs.ai-thinker.com/esp32-audio-kit
https://github.com/donny681/esp-adf/tree/master/ai-examples
https://github.com/Ai-Thinker-Open/Ai-Thinker-Open_ESP32-A1S_ASR_SDK/tree/master/examples/Smart_home_scene_AI
https://github.com/mengsaisi/VAD_campare
https://dingdang.qq.com/doc/page/285
https://www.it610.com/article/1288354813658079232.htm
https://dingdang.qq.com/doc.html?dir=/doc/tvs/cloud/api.html
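When integrating with iFlytek's AIUI outside Android, for example on a microcontroller or ARM Linux, it is best to use the WebAPI V2 interface.
That avoids the dll and so compatibility problems (officially only x86 is supported, unless you are on Android). The iFlytek WebAPI has one quirk, though: unless you publish the application as a release version,
you cannot see the answers from the configured fallback chatbot (for example Turing Robot), because iFlytek does not allow production settings to be used in the test environment
(in other words, the fallback setting is not applied by default), unless you append the _box suffix to the scene parameter
(you can also pass review and publish the app, which avoids the extra step). Also, do not enable the whitelist,
otherwise the correct chat answer is not returned either.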
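The _box suffix rule for the scene parameter can be captured in a one-line helper. This is only a sketch of the string handling; the scene name "main" and the helper name are hypothetical, and nothing else about the AIUI request format is implied here.

```python
def scene_for_testing(scene: str, suffix: str = "_box") -> str:
    """Return the scene name to use against an unpublished (test) AIUI app.

    Appends the suffix only when it is not already present, so calling the
    helper twice does not produce "main_box_box".
    """
    if not scene:
        raise ValueError("scene must not be empty")
    return scene if scene.endswith(suffix) else scene + suffix


if __name__ == "__main__":
    # "main" is a placeholder scene name, not taken from these notes.
    print(scene_for_testing("main"))
```

Once the app is published, the plain scene name should be used instead, so it makes sense to keep the suffix behind a single switch in the client configuration.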
https://github.com/IflytekAIUI/DemoCode/blob/master/webapi_v2/java/WebaiuiDemo.java
https://console.xfyun.cn/app/myapp
https://console.xfyun.cn/services/iat
https://cloud.tencent.com/document/product/1093
https://cloud.tencent.com/document/product/1093/35646
https://cloud.tencent.com/document/product/1093/37308
https://cloud.tencent.com/document/product/1093/35735
https://cloud.tencent.com/document/sdk/Java
https://github.com/TencentCloud/tencentcloud-sdk-java
https://github.com/a-nagrani/VGGVox
https://blog.csdn.net/weixin_41738734/article/details/86109333
https://cloud.tencent.com/developer/news/491629
http://news.eeworld.com.cn/xfdz/article_2017101874336_2.html
https://github.com/arduino/ArduinoTensorFlowLiteTutorials
https://github.com/douglas125/SpeechCmdRecognition
https://github.com/hpssjellis/my-examples-for-the-arduino-portentaH7
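A Baidu speech recognition terminal built on the TTGO_T_Watch
The TTGO_T_Watch main board has 8M of PSRAM built in. There are several expansion boards; one of them integrates an INMP441 I2S microphone chip and can handle voice input.
Sound listener: it listens to ambient sound and transcribes it to text. The recognized text can be configured to be forwarded to other devices,
such as a Raspberry Pi, which distributes it to other devices for coordinated actions. Each round records and recognizes at most 10 seconds of audio. Recognizing one recording takes anywhere from 1 to 10 seconds.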
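A quick arithmetic check shows why a 10-second recording fits comfortably in the 8M PSRAM mentioned above. The notes do not state the audio format, so 16 kHz, 16-bit, mono is an assumption here (a common format for cloud speech recognition), as is reading "8M" as 8 MiB.

```python
PSRAM_BYTES = 8 * 1024 * 1024  # "8M PSRAM" read as 8 MiB (assumption)


def pcm_buffer_bytes(seconds: int,
                     sample_rate_hz: int = 16000,
                     bits_per_sample: int = 16,
                     channels: int = 1) -> int:
    """Size in bytes of an uncompressed PCM recording of the given length."""
    return seconds * sample_rate_hz * channels * bits_per_sample // 8


if __name__ == "__main__":
    needed = pcm_buffer_bytes(10)
    print(needed, "bytes for 10 s,",
          f"{100 * needed / PSRAM_BYTES:.1f}% of PSRAM")
```

Under these assumptions the whole clip can be buffered in PSRAM before upload, which would be impossible in the ESP32's internal RAM alone.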
https://github.com/lixy123/TTGO_T_Watch_Baidu_Rec
https://github.com/Xinyuan-LilyGO/TTGO_TWatch_Library
https://github.com/thewintersun/tensorflowbook/blob/master/Chapter6/asr_lstm_ctc/asr_lstm_ctc.py
search baidupan, 源代码_TensorFlow入门与实战.zip
https://www.ituring.com.cn/book/2398
Speech recognition (LSTM+CTC)
https://www.cnblogs.com/followees/p/10422809.html
FundamentalsOfAI_book_code
https://github.com/koryako/FundamentalsOfAI_book_code
定义一个向前计算的LSTM单元 (code comment used as the search term: "define a forward LSTM cell")
https://github.com/search?q=定义一个向前计算的LSTM单元,40个隐藏单元&type=code
https://github.com/luvensaitory/project
- old:
Speech recognition (LSTM+CTC)
https://www.cnblogs.com/followees/p/10422809.html
search github, tf.nn.ctc_loss reduce_mean mfcc LSTMCell
https://github.com/igormq/ctc_tensorflow_example
https://github.com/pannous/tensorflow-speech-recognition/blob/master/lstm_ctc_to_chars.py
CTC tensorflow example: code walkthrough
https://blog.csdn.net/he_wen_jie/article/details/80586345
https://blog.csdn.net/zhqh100/article/details/103887097
https://github.com/baidu-research/warp-ctc/blob/master/README.zh_cn.md
https://github.com/apache/incubator-mxnet/tree/v1.7.x/example/speech_recognition
https://github.com/samsungsds-rnd/deepspeech.mxnet/tree/master/Libri_sample
https://github.com/baidu-research/ba-dls-deepspeech
https://github.com/ChaosCY/LAS-asr
https://github.com/thomasschmied/Speech_Recognition_with_Tensorflow
https://github.com/mlpack/examples/blob/master/lstm_stock_prediction/lstm_stock_prediction.cpp
https://www.cnblogs.com/tangbinchn/p/12809360.html
Introduction to computer vision topics | Lip-reading recognition
https://zhuanlan.zhihu.com/p/48670591
https://blog.csdn.net/antkillerfarm/article/details/84232764
https://github.com/42io/esp32_kws
https://github.com/Infineon/KWS-for-XMC
https://blog.csdn.net/cj1989111/article/details/88017908
https://github.com/xiangxyq/kaldi/tree/master/egs/wakeup_words
https://github.com/xiangxyq/3gpp_vad
https://blog.csdn.net/u012361418/article/details/90313249
https://blog.csdn.net/king_audio_video/article/details/90113627
https://github.com/shichaog/tensorflow-android-speech-kws
Deep learning in practice with Python: chatbots and face, object, and speech recognition based on TensorFlow and Keras
https://github.com/Apress/Deep-Learning-Apps-Using-Python/blob/master/Chapter11_Speech%20to%20text%20and%20vice%20versa
https://github.com/NavinManaswi/Book-Deep-Learning-Applications-with-Applications-Using-Python
https://developer.aliyun.com/article/592687
https://github.com/weedwind/MFCC
https://github.com/weedwind/CTC-speech-recognition
search baidupan, weedwind_MFCC-master.zip
https://gitee.com/yangmiao123/SpeechRecognition
search baidupan, cvte.zip
https://hütter.ch/posts/edison-kws-on-mcu/
(???) STM32L4, microphone
https://github.com/noah95/edison
ref
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/speech_commands/models.py
(??? seems important) KWS on MCU, MFCC
https://github.com/majianjia/nnom/tree/master/examples/keyword_spotting
https://github.com/veenveenveen/SpeechSignalProcessingCourse
https://veenveenveen.github.io/article/technology/ASR/ASR_Kaldi_DNN_Chinese.html#kaldi-简介
https://github.com/veenveenveen/ASR_Kaldi_DNN_Chinese
Chinese speech recognition based on the open-source HTK framework (GMM-HMM)
https://veenveenveen.github.io/article/technology/ASR/ASR_HTK_Chinese.html
https://github.com/veenveenveen/chinese_voice
https://blog.csdn.net/annic9/article/details/80434389
https://cloud.baidu.com/doc/IOT/s/7jwvy87a2
STM32L475, esp8266, esp32
https://github.com/baidu/baidu-iot-samples/tree/master/STM32/I-CUBE-BAIDU
https://blog.csdn.net/wblgers1234/article/details/75896605
https://github.com/zimuyanzi/BIC
search baidupan, kaldi_20200917_pre.tar.gz, work_kaldi
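After two days I finally finished compiling kaldi in an x86 Debian virtual machine (my build environment was the Raspberry Pi Desktop for x86 image from February 2020, 32-bit Debian).
The code needs some changes because a few places fail to build, for example: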
jcsilva/docker-kaldi-android#11
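In short, there are three steps:
(1) Run make and make openblas under tools to install the third-party libraries.
(2) Run configure and make under src to build the executables.
(3) Run run.sh under yesno to check that the executables work.
At the end you see a report with a WER of 0, that is, zero errors. For details see the "Decoding and testing" section of this article: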
https://www.jianshu.com/p/09deba57f339
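The WER reported at the end of the yesno run is the word-level edit distance between the reference and the recognized text, divided by the number of reference words. The sketch below is a plain Python illustration of that definition, not Kaldi's own scoring code; the yes/no sample strings are made up.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Levenshtein distance over words, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = cur[j - 1] + 1  # extra word in hypothesis
            cur.append(min(substitution, deletion, insertion))
        prev = cur
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Made-up examples in the style of the yesno task.
    print(word_error_rate("yes no yes no", "yes no yes no"))
    print(word_error_rate("yes no yes no", "yes yes yes no"))
```

A WER of 0 therefore means every test utterance was decoded word for word, which is expected for a two-word vocabulary like yesno.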
https://github.com/lyhue1991/eat_pytorch_in_20_days
https://github.com/lingochamp/kaldi-ctc
https://zhuanlan.zhihu.com/p/23177950
https://blog.csdn.net/wfing/article/details/106995562
https://www.ardu-badge.com/Arduino_TensorFlowLite/zip
https://community.platformio.org/t/arduino-nano-33-ble-tensorflow-lite-undefined-references/14387/2
same as adafruit, Arduino_TensorFlowLite
https://zhuanlan.zhihu.com/p/228593457
https://blog.csdn.net/weixin_44507034/article/details/105602112
Hands-on introduction to machine learning on Arduino (Part 2)
https://blog.csdn.net/weixin_44507034/article/details/105613754
https://medium.com/tensorflow/how-to-get-started-with-machine-learning-on-arduino-7daf95b4157
https://cloud.tencent.com/developer/article/1534288
AliOS Things embedded voiceprint recognition project
https://github.com/SunYanCN/Voiceprint-Recognition
alibaba/AliOS-Things#976
https://github.com/xiyanxiyan10/aiBook
Speech synthesis
https://github.com/ibab/tensorflow-wavenet
https://github.com/tomlepaine/fast-wavenet
Speech recognition
https://github.com/buriburisuri/speech-to-text-wavenet
https://github.com/pannous/tensorflow-speech-recognition
http://aibook.cslt.org/slides/index.html
http://aibook.cslt.org/aidemo/demo.html
Chapter 3: Listening to your voice, see
https://github.com/jcsilva/deep-clustering
search baidupan, aibook_speech.zip
https://github.com/pchao6/LSTM_PIT_Speech_Separation
https://github.com/ododoyo/DANet