pluteski edited this page Apr 23, 2017 · 13 revisions

speech2text.py

Basic use case

$ python speech2text.py -i ~/temp/transcription/2001 -b ~/temp/transcription/ -o=/tmp/transcription/testout/ --verbose

In the above command:

  • input folder is ~/temp/transcription/2001
  • base folder is ~/temp/transcription
  • output folder is /tmp/transcription/testout
  • verbose mode is enabled.

Handling filepaths containing spaces

Reading from flash drive:

$ ls /Volumes/Samsung\ USB/
$ python speech2text.py -i /Volumes/Samsung\ USB/AudioJournals_TEST/ICD-BP100\ 2002_PARTIAL/ -b /Volumes/Samsung\ USB/AudioJournals_TEST/ -o=/tmp/transcription/stt_test2

Reading from and writing back to flash drive:

$ python speech2text.py -i /Volumes/Samsung\ USB/AudioJournals_TEST/ICD-BP100\ 2002_PARTIAL/ -b /Volumes/Samsung\ USB/AudioJournals_TEST/ -o=/Volumes/Samsung\ USB/AudioJournals_TEST/ibm_stt
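
The backslashes in the commands above escape the spaces for the shell, so each path reaches the script as a single argument. As a small illustration (not part of speech2text.py), Python's standard shlex module shows how such a command line is split, and how a path containing spaces can be quoted when a command is assembled programmatically. The command string is a shortened version of the one above.

```python
import shlex

# Shortened version of the command above; the backslash keeps
# "Samsung USB" together as one argument.
command = r"python speech2text.py -i /Volumes/Samsung\ USB/AudioJournals_TEST/ -o=/tmp/transcription/stt_test2"
args = shlex.split(command)
print(args[3])  # /Volumes/Samsung USB/AudioJournals_TEST/

# Quote a path with spaces before placing it in a command string.
path = "/Volumes/Samsung USB/AudioJournals_TEST/"
quoted = shlex.quote(path)
print(quoted)  # '/Volumes/Samsung USB/AudioJournals_TEST/'
```

Wrapping the path in single or double quotes on the command line has the same effect as the backslash escapes.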

Command Options

The -b option allows specifying a prefix of the input filepath that is used to organize the output results.

This lets me process a subset of files while keeping the results organized within a larger context. An early challenge was processing the data piecemeal and selectively while keeping the output organized so that it (a) mirrors the organization of the input data, which makes it easy to locate the input file for a given output file (important when audio filenames are not unique), and (b) simplifies later processing steps such as reprocessing, tallying results, and performing file-level analyses.
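
One way to picture the effect of the base folder is as a path-mirroring rule: the part of the input path that follows the base folder is re-created under the output folder. The sketch below is an assumption about that mapping, not code taken from speech2text.py. The helper name mirrored_output_path, the ".txt" suffix, and the example paths are all hypothetical.

```python
import os

def mirrored_output_path(input_file, base_folder, output_folder, suffix=".txt"):
    """Hypothetical helper: place the result for input_file under
    output_folder at the same relative location it has under base_folder."""
    relative = os.path.relpath(input_file, start=base_folder)
    # Reject inputs that are not located inside the base folder.
    if relative == os.pardir or relative.startswith(os.pardir + os.sep):
        raise ValueError("input file is not inside the base folder")
    return os.path.join(output_folder, relative + suffix)

# Hypothetical paths, written for a POSIX system.
print(mirrored_output_path(
    "/data/journals/2016_Partial/Family/rec01.mp3",
    "/data/journals/",
    "/data/journals/ibm_stt",
))
# /data/journals/ibm_stt/2016_Partial/Family/rec01.mp3.txt
```

Because the relative path is preserved, two recordings with the same filename in different folders end up in different output locations.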

Example

The following command handles a small subset of files deeply nested in a folder while organizing the output to fit into a larger dataset that was processed previously or will be processed later, whether in large batches, small batches, or one file at a time.

$ python speech2text.py -i /Volumes/Samsung\ USB/AudioJournals_TEST/2016_Partial/Family/ -b /Volumes/Samsung\ USB/AudioJournals_TEST/ -o /Volumes/Samsung\ USB/AudioJournals_TEST/ibm_stt

The -s option allows specifying the speech-to-text client filepath, overriding the default location.

The following example uses a Python client script on the laptop drive instead of on the flash drive where the data resides:

$ python speech2text.py -i /Volumes/Samsung\ USB/AudioJournals_TEST/2016_Partial/Family/ -b /Volumes/Samsung\ USB/AudioJournals_TEST/ -o /Volumes/Samsung\ USB/AudioJournals_TEST/ibm_stt -s ~/code/speech-to-text/speech-to-text-websockets-python/sttClient.py

The -m option specifies a maximum number of items to be processed. This is handy for development, or where you want to avoid overusing a processing allowance.

The -k option tells the script not to overwrite previous results. This avoids reprocessing files that have already been transcribed successfully.
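
The combined effect of -m and -k can be sketched as a simple selection loop. This is a hypothetical illustration rather than the script's actual logic. The function select_items and its parameters are invented for the example, and the source does not say whether skipped files count toward the maximum (here they do not).

```python
import os

def select_items(input_files, output_path_for, max_items=None, keep_existing=False):
    """Hypothetical sketch of -m / -k: return the files still to be processed.

    output_path_for maps an input path to the path of its result file.
    """
    selected = []
    for path in input_files:
        # -m: stop once the maximum number of items has been selected.
        if max_items is not None and len(selected) >= max_items:
            break
        # -k: skip inputs whose result already exists, so nothing is overwritten.
        if keep_existing and os.path.exists(output_path_for(path)):
            continue
        selected.append(path)
    return selected
```

Under these assumptions, rerunning the same command with -k picks up where the previous run stopped, which suits working through a large dataset in small batches.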

The following command processes at most 10 items and does not overwrite previous results:

$ time python speech2text.py \
-i /Volumes/Samsung\ USB/AudioJournals_TEST/2016_Partial/ \
-b /Volumes/Samsung\ USB/AudioJournals_TEST/ \
-o /Volumes/Samsung\ USB/AudioJournals_TEST/ibm_stt \
-s ~/code/speech-to-text/speech-to-text-websockets-python/sttClient.py \
-m 10 \
-k

Example IBM runs

$ time python speech2text.py \
-i /Volumes/Samsung\ USB/AudioJournals/ICD-BP100\ 2003/ \
-b /Volumes/Samsung\ USB/AudioJournals/ \
-o /Volumes/Samsung\ USB/AudioJournals/ibm_stt \
-s ~/code/speech-to-text/speech-to-text-websockets-python/sttClient.py

$ time python speech2text.py \
-i /Volumes/Samsung\ USB/AudioJournals/2009/ \
-b /Volumes/Samsung\ USB/AudioJournals/ \
-o /Volumes/Samsung\ USB/AudioJournals/ibm_stt \
-s ~/code/speech-to-text/speech-to-text-websockets-python/sttClient.py

Example Google runs

$ time python /Users/mark/code/speech-to-text/speech2text.py \
-i ~/temp/transcription/2001/ \
-b ~/temp/transcription/ \
-o ~/temp/transcription/google_stt \
-s ~/temp/gcloud/python-docs-samples/speech/grpc/transcribe_async.py \
-g -k -m8

$ time python /Users/mark/code/speech-to-text/speech2text.py \
-i /Volumes/Samsung\ USB/AudioJournals/ICD-BP100\ 2003/ \
-b /Volumes/Samsung\ USB/AudioJournals/ \
-o /Volumes/Samsung\ USB/AudioJournals/google_stt \
-s ~/temp/gcloud/python-docs-samples/speech/grpc/transcribe_async.py \
-g -k -m100

compare.py

Prerequisites

compare.py depends on the output of the following scripts:

  • text2stats.py
  • tally_audio.py

Example usage

$ python compare.py -f /temp/stt/AudioJournals/text2stats