Integrated Speech Corpus Analysis (ISCAN)
is a web application to manage corpora and perform large-scale analyses
through PolyglotDB. It includes both a REST API and a front-end application
that lets non-technical users work with PolyglotDB.
- McAuliffe, M., Coles, A., Goodale, M., Mihuc, S., Wagner, M., Stuart-Smith, J., & Sonderegger, M. (2019). ISCAN: a system for integrated phonetic analyses across speech corpora. In Proceedings of the International Congress of Phonetic Sciences 2019.
- McAuliffe, M., Goodale, M., Tanner, J., Coles, A., Willerton, V., & Sonderegger, M. (2019). Integrated Speech Corpus Analysis (ISCAN). [https://github.com/MontrealCorpusTools/ISCAN]
- Stuart-Smith, J., Sonderegger, M., & McAuliffe, M. (2018). Integrated Speech Corpus ANalysis – ISCAN: A new tool for large-scale, cross-corpus, sociolinguistic analysis. New York, NY. [PDF]
Speech Corpus Tools
is a graphical application for interacting with, querying, and visualizing large speech
corpora. It parses a wide range of formats into a database, which allows for fast
and consistent queries across corpora from different sources.
- McAuliffe, M., Stengel-Eskin, E., Socolof, M., & Sonderegger, M. (2017). Polyglot and Speech Corpus Tools: A system for representing, integrating, and querying speech corpora. In Proceedings of Interspeech 2017. http://doi.org/10.21437/Interspeech.2017-1390
- McAuliffe, M., & Sonderegger, M. (2016). Easier speech corpus analysis: A practical introduction to Montreal Corpus Tools (including Speech Corpus Tools). Glasgow, UK: Scottish Graduate School of Social Science; University of Glasgow.
- McAuliffe, M., Stengel-Eskin, E., Socolof, M., & Sonderegger, M. (2016). Speech Corpus Tools. [http://montrealcorpustools.github.io/speechcorpustools/]
Montreal Forced Aligner
is a command line utility for performing forced alignment on audio datasets using
orthographic transcriptions and a pronunciation dictionary. It is trainable on
larger datasets and can align smaller datasets through pretrained models.
It is built using Kaldi.
- McAuliffe, M., Socolof, M., Mihuc, S., Wagner, M., & Sonderegger, M. (2019). Montreal Forced Aligner. [https://montrealcorpustools.github.io/Montreal-Forced-Aligner/]
- McAuliffe, M., Socolof, M., Mihuc, S., Wagner, M., & Sonderegger, M. (2017). Montreal Forced Aligner: Trainable text-speech alignment using Kaldi. In Proceedings of Interspeech 2017. http://doi.org/10.21437/Interspeech.2017-1386
Phonological CorpusTools
has Python implementations of algorithms reported in the linguistic literature
with the ability to run these algorithms on a wide variety of corpora.
The primary contributors to this project are Kathleen Currie Hall,
Blake Allen, Michael Fry,
Scott Mackie, and me.
- McAuliffe, M. (2015). Statistical phonological analysis in corpora using Phonological Corpus Tools. Montreal, CA.
- Hall, K. C., Allen, B., Fry, M., Mackie, S., & McAuliffe, M. (2015). Phonological CorpusTools. [https://github.com/PhonologicalCorpusTools/CorpusTools/releases]
- Hall, K. C., Allen, B., Fry, M., Mackie, S., & McAuliffe, M. (2014). Phonological CorpusTools: A free, open-source tool for phonological analysis. In 14th Conference for Laboratory Phonology. Tokyo, Japan.
Omnic Intelligence is a web application
for automatically and manually annotating events in professional Overwatch
matches on Twitch and YouTube. Automatic annotation is done through deep neural
PolyglotDB is the
package responsible for the storage and database aspects of Speech Corpus Tools.
represents most of my work in signal processing for creating MFCC, amplitude envelope,
and gammatone representations of speech. Future versions will also include algorithms
to calculate linguistically-relevant measurements such as pitch and formants.
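As a rough illustration of one ingredient of MFCC-style representations, the sketch below converts between Hz and the mel scale and spaces filter center frequencies evenly in mel. It uses the common 2595 * log10(1 + f/700) formula; the function names, the 26-filter count, and the 0 to 8000 Hz range are assumptions chosen for the example and are not taken from the package itself, which may do this differently.

```python
import math


def hz_to_mel(hz):
    """Convert a frequency in Hz to mels (common 2595/700 formulation)."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel: convert mels back to Hz."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_center_frequencies(num_filters, low_hz, high_hz):
    """Return filter center frequencies (Hz) spaced evenly on the mel scale.

    The band edges low_hz and high_hz are excluded, so the centers lie
    strictly between them. All parameter names here are hypothetical.
    """
    low_mel = hz_to_mel(low_hz)
    high_mel = hz_to_mel(high_hz)
    step = (high_mel - low_mel) / (num_filters + 1)
    return [mel_to_hz(low_mel + step * (i + 1)) for i in range(num_filters)]


# Example values only: 26 filters between 0 Hz and 8000 Hz.
centers = mel_center_frequencies(num_filters=26, low_hz=0.0, high_hz=8000.0)
print([round(c) for c in centers[:5]])
```

Because the spacing is uniform in mel rather than in Hz, the centers sit close together at low frequencies and spread out at high frequencies, which is the property a mel filterbank relies on.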
Python package for calling Praat scripts, available here.
Python port of Bruce Hayes’ BLICK for calculating phonotactic probability
in English, available here.