• dmortens@cs.cmu.edu

Research Interests

Computational linguistics and NLP

My computational interests center on applying computational morphology and phonology to problems in natural language processing. While linguistic representations have long been employed in a variety of NLP tasks, these representations have been primarily syntactic, semantic, and discourse-analytic. My mission has been to expand the role of morphological, and especially phonological, representations in NLP. Some specific research directions include the following:

  • Phonological representations for cross-lingual transfer learning
  • Phonological representations (segments, features) in named entity recognition (NER)
  • Predicting the forms of loanwords from phonological models
  • Approximate phonological matching and entity linking
  • The implementation and application of automatic (rule-based, unsupervised, and semi-supervised) syllabification
  • Tools for compiling metarules (with phonological feature specifications) into finite-state transducers

My interests in computational morphology are perhaps more straightforward:

  • Developing more user-friendly and expressive frameworks for writing morphological analyzers, particularly for languages with non-concatenative morphology
  • Rich morphological representations for machine translation from morphologically rich source languages
  • Machine translation into morphologically rich target languages

Finally, I have long-standing interests in computational historical phonology:

  • Testing hypothetical proto-language reconstructions and sequences of sound changes by computational means
  • Cognate prediction problem: given two corpora of related languages and a word from one corpus, predict the form of its cognate in the other language, even if it is not present in the corpus
  • Proto-language inference problem: given two corpora of related languages, infer a proto-language and two transducers (representing sequences of sound changes) such that the number of unique cognate pairs predicted from the proto-language and the two transducers is maximized

Theoretical and descriptive linguistics

I have long worked on languages of East and Southeast Asia. The specific languages and language groups that interest me, and on which I have worked, are as follows:

  • Hmong-Mien
    • Western Hmongic
  • Tibeto-Burman
    • Tangkhulic
    • Kuki-Chin
    • Jingpho

Aside from specific languages, I am interested in a variety of linguistic subfields and issues. Here is a representative outline of my theoretical interests:

  • Phonology
    • Tone
    • Phonation type/register
    • Chain shifts and other forms of counterfeeding opacity
    • Abstractness of phonological relationships
    • Phonetics-Phonology interface
    • Phonology-Morphology interface
  • Morphology
    • Compounding
    • Process morphology
    • Reduplication
    • Phonological constraints on morphotactics
    • Affix ordering
  • Historical Linguistics
    • Comparative reconstruction
    • Reconstructing phonological grammars
    • Speaker misunderstanding and misinterpretation as a source of linguistic innovation
    • Language contact
  • Language Description