Michael McAuliffe


Integrated Speech Corpus Analysis (ISCAN)

ISCAN is a web application for managing corpora and performing large-scale analyses through PolyglotDB. It includes both a REST API and a front-end application that lets non-technical users work with PolyglotDB.


  1. McAuliffe, M., Coles, A., Goodale, M., Mihuc, S., Wagner, M., Stuart-Smith, J., & Sonderegger, M. (2019). ISCAN: a system for integrated phonetic analyses across speech corpora. In Proceedings of the International Congress of Phonetic Sciences 2019.
  2. McAuliffe, M., Goodale, M., Tanner, J., Coles, A., Willerton, V., & Sonderegger, M. (2019). Integrated Speech Corpus Analysis (ISCAN). [https://github.com/MontrealCorpusTools/ISCAN]
  3. Stuart-Smith, J., Sonderegger, M., & McAuliffe, M. (2018). Integrated Speech Corpus ANalysis – ISCAN: A new tool for large-scale, cross-corpus, sociolinguistic analysis. New York, NY.

Speech Corpus Tools

Speech Corpus Tools is a graphical application for interacting with, querying, and visualizing large speech corpora. It parses a wide range of formats into a database, which allows for fast and consistent queries across corpora from different sources.

  1. McAuliffe, M., Stengel-Eskin, E., Socolof, M., & Sonderegger, M. (2017). Polyglot and Speech Corpus Tools: A system for representing, integrating, and querying speech corpora. In Proceedings of Interspeech 2017. http://doi.org/10.21437/Interspeech.2017-1390
  2. McAuliffe, M., & Sonderegger, M. (2016). Easier speech corpus analysis: A practical introduction to Montreal Corpus Tools (including Speech Corpus Tools). Glasgow, UK: Scottish Graduate School of Social Science; University of Glasgow.
  3. McAuliffe, M., Stengel-Eskin, E., Socolof, M., & Sonderegger, M. (2016). Speech Corpus Tools. [http://montrealcorpustools.github.io/speechcorpustools/]

Montreal Forced Aligner

Montreal Forced Aligner is a command line utility for performing forced alignment on audio datasets using orthographic transcriptions and a pronunciation dictionary. It is trainable on larger datasets and can align smaller datasets through pretrained models. It is built using Kaldi.

  1. McAuliffe, M., Socolof, M., Mihuc, S., Wagner, M., & Sonderegger, M. (2019). Montreal Forced Aligner. [https://montrealcorpustools.github.io/Montreal-Forced-Aligner/]
  2. McAuliffe, M., Socolof, M., Mihuc, S., Wagner, M., & Sonderegger, M. (2017). Montreal Forced Aligner: Trainable text-speech alignment using Kaldi. In Proceedings of Interspeech 2017. http://doi.org/10.21437/Interspeech.2017-1386

Phonological CorpusTools

Phonological CorpusTools provides Python implementations of algorithms reported in the linguistic literature, along with the ability to run these algorithms on a wide variety of corpora. The primary contributors to this project are Kathleen Currie Hall, Blake Allen, Michael Fry, Scott Mackie, and me.

  1. McAuliffe, M. (2015). Statistical phonological analysis in corpora using Phonological Corpus Tools. Montreal, CA.
  2. Hall, K. C., Allen, B., Fry, M., Mackie, S., & McAuliffe, M. (2015). Phonological CorpusTools. [https://github.com/PhonologicalCorpusTools/CorpusTools/releases]
  3. Hall, K. C., Allen, B., Fry, M., Mackie, S., & McAuliffe, M. (2014). Phonological CorpusTools: A free, open-source tool for phonological analysis. In 14th Conference for Laboratory Phonology. Tokyo, Japan.

Omnic Intelligence

Omnic Intelligence is a web application for automatically and manually annotating events in professional Overwatch matches on Twitch and YouTube. Automatic annotation is done through deep neural network models.

Python packages


PolyglotDB is the package responsible for the storage and database aspects of Speech Corpus Tools.


Python-acoustic-similarity represents most of my work in signal processing for creating MFCC, amplitude envelope, and gammatone representations of speech. Future versions will also include algorithms to calculate linguistically relevant measurements such as pitch and formants.


Python package for calling Praat scripts, available here.


Python port of Bruce Hayes’ BLICK for calculating phonotactic probability in English, available here.