Projects
Built an NLP system to track correlations between COVID-19 case numbers and social media discussion topics extracted from a corpus of Twitter data
Built a multiclass text classifier using a recurrent neural network (RNN)
Used a support vector machine (SVM) to build a multiclass text classifier
Built a part-of-speech (POS) tagger using a version of a beam-search algorithm
Built a text classifier using a maximum entropy (MaxEnt) model
Used the k-nearest neighbors (k-NN) algorithm to build a text classifier
Used naive Bayes to build a multiclass text classifier
Used the Cocke-Kasami-Younger (CKY) algorithm to write a parser for context-free grammars (CFGs) rendered in Chomsky Normal Form (CNF)
Built a variant of a naive Bayes classifier that classifies text using language models of 15 different languages
Used a prefix trie to search for target sequences in a single pass through a complete human genome corpus. Used memory-mapped files to reduce search time on large data files
Used a finite-state transducer (FST) to implement a tagger that identifies syllables in Thai
An Evaluation of an Automatically Generated Grammar of Ik
Built an HPSG-based grammar of Ik, an endangered Kuliak language, from field linguists’ data using the LinGO Grammar Matrix. Used the grammar to parse, generate, and translate Ik sentences, and performed error analysis