I'm a machine learning + data engineering consultant. My professional experience ranges across data engineering, natural language processing, and artificial intelligence. My weapon of choice is Python.
Built test cases for a web vulnerability scanner using C# .NET
Supported an internal procurement web application in Java
2003-07
Pennsylvania State University, University Park
BSc, Information Science and Technology, Design and Development
Decisive
Predictive analytics for social marketing content
2014-2015
Data Engineer
I was part of a fantastic startup, working directly with founders David Dundas and Ryan Witt. During my time there, we went through a product pivot, from a self-serve mobile advertising platform to a social content analytics product. The two products presented different challenges and allowed me to grow in both breadth and depth.
For the advertising platform, I developed a data pipeline to consume more than 500M events daily. It processed them in near real time, with less than 1 hour from ingestion to availability through the RESTful API. The endpoints were further designed for efficient reporting and data analysis.
During the pivot to social analytics, I delivered an analytics dashboard MVP within two weeks. The MVP included an extract-transform-load pipeline for numerous social platforms. The collected social content was profiled into various parameters for the data science team to model.
Soshio
Social analytics for Chinese language and market
2012-2013
Founder, CTO
Bootstrapped SaaS Startup
For my previous startup, I built a social analytics product with a bespoke NLP engine to analyze Chinese social media. It collected the social conversations surrounding a keyword and determined the prevailing emotions.
I developed the core text mining engine to extract sentiment and emotions from text. Working with a language very different from English presented challenges not covered by the popular tools. I was able to explore new techniques for this problem.
The task was further complicated by the volume of data to ingest. At nearly 1M weekly, there was a lot of data to acquire and process through the language engine. I handled this by leveraging crawler queues and analytics workers.
To communicate the results effectively, I used visualization tools, from basic trend plots to more complex heat maps. Users are able to drill into the analytics to compare results across social keywords. With responsive design, users can also access it on devices of all sizes.