Some utilities
ML - Machine Learning
pdfsplit.py - PDF splitting using Python
subtitle-processing.py - Subtitle text extraction using a graphical interface
weekly_lsp_stats.py - MySQL logs database processing
print_unicode_range.py - print the Unicode range from start to end, both taken from argv
#mongorestore.py - to restore mongo collection
#pet2concordance.py - postedit to concordance db
subtitle.py - extract text from .srt file, create four types of files [placeholder, new lines, without new lines, story mode]
data_range_API2concordance.py - parsing API response
list_match.py - find and replace words in an input file using a mapping file (of mapping words)
ngram-generator.py - Generate n-gram frequencies from an input text file
tmx2tab.py - Extract text from tmx into tab-separated format
pdf2text/doc - Extract text from pdf into doc/txt
mongo_restore.py/sh - restoring mongo collections one by one
pet2concordance.py - Extract from postedit db and insert into concordance db
docxtable2text.py - convert table text in doc/docx into tab-separated text
generate_docx.py - generate docx file from txt file
bulk_pdf_extract_api.py - extract text from pdf using Tika from input folder
create_srt.py - create srt file from transcription file with timeline
prime.py - script to determine prime number
vlc_mp4_flac.py - convert mp4 to flac files using vlc command from input folder to output folder
normalizer.py - remove white spaces, tabs, new lines
remove_rich_text.py - remove tags and text between tags from rich text transcription file
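
As an illustration of the kind of processing listed above, here is a minimal sketch of word n-gram frequency counting, the task ngram-generator.py is described as doing. It is not taken from that script: the function name ngram_frequencies, the lowercase whitespace tokenization, and the sample text are all assumptions made for the example.

```python
from collections import Counter


def ngram_frequencies(text, n=2):
    """Count word n-grams in text.

    Hypothetical helper for illustration only; the real
    ngram-generator.py may tokenize and count differently.
    """
    # Assumption: tokens are lowercase, whitespace-separated words.
    tokens = text.lower().split()
    # Zip n shifted copies of the token list to form sliding windows.
    grams = zip(*(tokens[i:] for i in range(n)))
    return Counter(" ".join(gram) for gram in grams)


if __name__ == "__main__":
    sample = "the cat sat on the mat the cat ran"
    # Print the most frequent bigrams as tab-separated "ngram<TAB>count".
    for gram, count in ngram_frequencies(sample, 2).most_common(3):
        print(f"{gram}\t{count}")
```

To run it on a file, the text could be read with open(path, encoding="utf-8").read() and passed to ngram_frequencies with the desired n.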