Projects

Built an NLP system to track correlations between COVID-19 case numbers and social media discussion topics extracted from a corpus of Twitter data

Built a multiclass text classifier using a recurrent neural network (RNN)

Used a support vector machine (SVM) to build a multiclass text classifier

Built a part-of-speech (POS) tagger using a variant of beam search

Built a text classifier using a maximum entropy model

Built a text classifier using the k-nearest neighbors algorithm

Built a multiclass text classifier using naïve Bayes

Used the Cocke-Kasami-Younger (CKY) algorithm to write a parser for context-free grammars (CFGs) rendered in Chomsky Normal Form (CNF)

Built a naïve Bayes classifier to classify text based on language models of 15 languages

Used a prefix trie to search for target sequences in a single pass through a complete human genome corpus. Used memory-mapped files to reduce search time on large data files

Used a finite-state transducer (FST) to implement a tagger that identifies syllables in Thai

An Evaluation of an Automatically Generated Grammar of Ik

Built an HPSG-based grammar of the endangered Kuliak language Ik from field linguists’ data using the LinGO Grammar Matrix. Used it to parse, generate, and translate Ik sentences, and performed error analysis