Project author: motazsaad
Project description: language modeling
Primary language: Shell
Project URL: git://github.com/motazsaad/language-modeling.git
language-modeling
scripts for language modeling
This project is a collection of scripts that help with language modeling. These scripts cover:
- text cleaning
- text normalization
- vocabulary counts and frequencies
- building language models
- testing language models
steps for preparing texts:
- source ~/py3env/bin/activate
- prepare/prepare_text_for_lm.sh
- prepare/normalize_months.sh
steps for building Vocab:
A vocabulary can be built in two ways:
- Based on a frequency threshold (build_vocab/get_vocabs_greater_than_n.sh)
- Based on most frequent N terms (build_vocab/get_vocabs_most_freq.sh)
Both scripts use build_vocab/wordfreq2vocab.py
usage: wordfreq2vocab.py [-h] -t TEXT -v VOCABULARY -f FREQUENCY
[-top TOP | -gt GT | -all]
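The two selection strategies above can be sketched in a few lines of Python. This is only an illustration of the idea, not the code of build_vocab/wordfreq2vocab.py: the function names `count_words`, `vocab_greater_than`, and `vocab_most_frequent` are hypothetical, and whitespace tokenization is an assumption.

```python
from collections import Counter


def count_words(lines):
    """Count whitespace-separated tokens over an iterable of text lines."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


def vocab_greater_than(counts, n):
    """Keep words whose frequency is strictly greater than n (hypothetical helper)."""
    return sorted(word for word, freq in counts.items() if freq > n)


def vocab_most_frequent(counts, top):
    """Keep the `top` most frequent words; ties are broken alphabetically."""
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:top]]


corpus = ["the cat sat", "the cat ran", "the dog"]
counts = count_words(corpus)
print(vocab_greater_than(counts, 1))   # ['cat', 'the']
print(vocab_most_frequent(counts, 2))  # ['the', 'cat']
```

A threshold keeps the vocabulary size dependent on the corpus, while a top-N cut fixes the size in advance; which one suits a given LM depends on the downstream decoder's limits.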
steps for building LM:
build_lm/build_lm.sh
Additional scripts:
- run_build_lm_v1.1.sh: build the LM
- test_LM_decoding.sh: decode using DMP
- sclite.sh: test the results
- formating.sh: reformat utterance IDs and test the result
- mix_lm.sh: interpolate two language models
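For readers unfamiliar with language model interpolation, the idea can be sketched as below. How mix_lm.sh actually mixes models is not described here, so this example assumes simple linear interpolation over toy unigram distributions; the function `interpolate` and the weight `lam` are hypothetical names used for illustration.

```python
def interpolate(p1, p2, lam):
    """Linearly mix two unigram distributions: lam * p1 + (1 - lam) * p2.

    Words missing from one model are treated as having probability 0 there
    (an assumption; real toolkits usually apply smoothing or backoff).
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must be in [0, 1]")
    words = set(p1) | set(p2)
    return {w: lam * p1.get(w, 0.0) + (1.0 - lam) * p2.get(w, 0.0) for w in words}


lm_a = {"hello": 0.5, "world": 0.5}
lm_b = {"hello": 0.25, "there": 0.75}
mixed = interpolate(lm_a, lm_b, 0.5)
print(sorted(mixed.items()))
```

Because both inputs sum to 1 and the weights sum to 1, the mixed distribution also sums to 1. In practice the weight is typically tuned to minimize perplexity on held-out text.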