Multi-language Forced Alignment in a Heterogeneous Corpus
by Sai Krishna for CCExtractor development
The current transcripts for the videos are imperfect in two ways: they contain out-of-vocabulary (OOV) words, and they lag relative to the audio. This project seeks to correct the transcripts by first developing techniques to detect alignment errors and then designing correction algorithms that reduce how often these errors occur. Combining these techniques will yield an accurate forced-alignment package that can operate under the adverse conditions found in both the transcripts and the audio.