Hello

About me

Trained as a computer scientist, I’ve spent a significant portion of my career working as a data scientist. My work has focused on building practical machine learning solutions for complex problems, particularly in the healthcare sector. I’ve developed personalized recommender systems to enhance user engagement, implemented customer segmentation models to identify behavioral patterns, and built scalable pipelines for processing large datasets. My projects often involve translating business needs into actionable machine learning models and deploying them as production-ready solutions.

Along the way, I’ve developed a passion for building things: tools, websites, and tutorials, sometimes useful and sometimes just for fun. What I enjoy most is the process of bringing ideas to life. Check out my GitHub page for some of the things I’ve built.

Outside of work, I’m always up for an adventure. I love traveling to new places and, like everyone else, I take way too many photos (I know, but trust me, I actually go back and look at them!). When I’m not traveling, you can find me tying knots (yes, there’s more to knots than just tying shoes) or folding origami. These activities are my way of staying zen—though occasionally, my attempts at a perfect crane end up looking more like abstract art.

Beyond personal hobbies, I’ve found that some of my best work happens when I collaborate with others. I’ve contributed to various volunteer efforts, including community websites, open-source tools, and research projects. If you have an interesting project or idea, feel free to reach out; I’m always open to new collaborations.

Profiles

LinkedIn

GitHub

Papers & Publications

Defining, Predicting, and Preventing Disengaged Users

This project presents a practical machine learning approach to managing user engagement, from defining engagement metrics to evaluating the solution through experimental design. We built a data pipeline for feature extraction and model training using PySpark and Scikit-Learn, and used Seaborn for data visualization to assess performance.
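As an illustration of the first step, defining what "disengaged" means, here is a minimal sketch that labels a user as disengaged when their most recent transaction falls outside a fixed inactivity window. The 30-day window, the `label_disengaged` helper, and the `(user_id, date)` input format are hypothetical assumptions for illustration, not the definition used in the project.

```python
from datetime import date, timedelta

# Hypothetical inactivity window; the project's actual definition may differ.
INACTIVITY_WINDOW = timedelta(days=30)


def label_disengaged(transactions, as_of, window=INACTIVITY_WINDOW):
    """Map each user to True if their last transaction is older than `window`.

    `transactions` is an iterable of (user_id, transaction_date) pairs.
    """
    last_seen = {}
    for user_id, tx_date in transactions:
        # Keep only the most recent transaction date per user.
        if user_id not in last_seen or tx_date > last_seen[user_id]:
            last_seen[user_id] = tx_date
    # Disengaged = strictly more than `window` days since the last transaction.
    return {user: (as_of - last) > window for user, last in last_seen.items()}


if __name__ == "__main__":
    txs = [
        ("alice", date(2024, 1, 5)),
        ("alice", date(2024, 3, 20)),
        ("bob", date(2024, 1, 10)),
    ]
    print(label_disengaged(txs, as_of=date(2024, 3, 31)))
```

A label like this can then serve as the target for a supervised model that predicts which users are at risk of disengaging.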

A contextual recommender system is then implemented with Alternating Least Squares (ALS) to personalize interactions, and its effectiveness is evaluated through A/B testing. Throughout, the focus is on building an ML pipeline that turns a 19-month dataset of financial transactions into actionable insights, from analysis to A/B testing plans.
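To make the ALS idea concrete: with item factors held fixed, each user's factor vector has a closed-form ridge-regression solution, and vice versa, so the algorithm alternates between the two solves. The pure-Python sketch below factorizes a small explicit-feedback ratings list this way; the function names and default parameters are hypothetical, and it is not the project's implementation. Spark MLlib provides a distributed version of the same alternating scheme as `pyspark.ml.recommendation.ALS`, including an implicit-feedback mode.

```python
import random


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def half_step(fixed, observed, k, reg):
    """Solve one ridge regression per row while the other side's factors stay fixed."""
    factors = []
    for entries in observed:  # entries: list of (other_index, rating)
        # Normal equations over observed entries only: (F^T F + reg*I) x = F^T r
        a = [[reg if p == q else 0.0 for q in range(k)] for p in range(k)]
        b = [0.0] * k
        for j, rating in entries:
            f = fixed[j]
            for p in range(k):
                b[p] += rating * f[p]
                for q in range(k):
                    a[p][q] += f[p] * f[q]
        factors.append(solve(a, b))
    return factors


def als(ratings, n_users, n_items, k=2, reg=0.1, iters=20, seed=0):
    """Factorize sparse (user, item, rating) triples into user and item factors."""
    rng = random.Random(seed)
    items = [[rng.random() for _ in range(k)] for _ in range(n_items)]
    by_user = [[] for _ in range(n_users)]
    by_item = [[] for _ in range(n_items)]
    for u, i, r in ratings:
        by_user[u].append((i, r))
        by_item[i].append((u, r))
    for _ in range(iters):
        users = half_step(items, by_user, k, reg)  # items fixed: solve users
        items = half_step(users, by_item, k, reg)  # users fixed: solve items
    return users, items


def predict(users, items, u, i):
    """Predicted rating is the dot product of the user and item factors."""
    return sum(a * b for a, b in zip(users[u], items[i]))


if __name__ == "__main__":
    # Toy (user, item, rating) triples; user 2 has not rated item 1.
    ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 1, 2.0), (2, 0, 1.0)]
    users, items = als(ratings, n_users=3, n_items=2)
    print(round(predict(users, items, 2, 1), 2))
```

The regularization term `reg` keeps every per-row system well-conditioned, including for users or items with no observed ratings, whose factors then stay at zero.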

GitHub Repository | Read on Medium

Medium Article · Data Science · Hands-On Tutorials · FinTech · Supervised ML · Unsupervised ML · Recommender Systems · A/B Testing · ALS · PySpark · Scikit-Learn · Seaborn · Python

NLP: Text Summarization and Keyword Extraction on Property Rental Listings — Part 1

Using a dataset of Airbnb rental listings in Tokyo, this project explores applications of Natural Language Processing (NLP). This hands-on tutorial walks through data preparation, custom lemmatization, and various NLP techniques for analyzing rental listing descriptions, focusing on keyword extraction and text summarization. We use spaCy and Scikit-Learn for methods such as Named Entity Recognition (NER) and TF-IDF, along with Google’s T5 language model.
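TF-IDF keyword extraction scores a term highly when it is frequent within one listing but rare across the whole collection. The pure-Python sketch below illustrates the idea; the stop-word list, the sample listings, and the `top_keywords` helper are hypothetical, and the tutorial itself relies on spaCy and Scikit-Learn rather than hand-rolled code. The IDF formula matches the smoothed default of Scikit-Learn's `TfidfVectorizer`, although term frequency here is simply normalized by listing length.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list; spaCy and Scikit-Learn ship much fuller ones.
STOP_WORDS = {"a", "an", "and", "the", "in", "with", "near", "of", "to", "for"}


def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, and drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def top_keywords(docs, n=3):
    """Return the n highest-scoring TF-IDF terms for each document."""
    tokenized = [tokenize(doc) for doc in docs]
    # Document frequency: the number of listings each term appears in.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    n_docs = len(docs)
    keywords = []
    for tokens in tokenized:
        counts = Counter(tokens)
        scores = {
            # Length-normalized term frequency times smoothed IDF:
            # idf = ln((1 + n) / (1 + df)) + 1
            term: (count / len(tokens)) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in counts.items()
        }
        # Highest score first; ties broken alphabetically for stable output.
        ranked = sorted(scores, key=lambda term: (-scores[term], term))
        keywords.append(ranked[:n])
    return keywords


if __name__ == "__main__":
    listings = [
        "Cozy apartment near Shinjuku station with fast wifi",
        "Spacious apartment in Shibuya, quiet and cozy",
        "Traditional tatami room near Asakusa temple",
    ]
    for words in top_keywords(listings):
        print(words)
```

Words shared by several listings, such as "apartment" and "cozy" in the sample above, receive a lower IDF and drop out of the top keywords.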

GitHub Repository | Read on Medium

Medium Article · Data Science · Hands-On Tutorials · NLP · Text Summarization · spaCy · Scikit-Learn · NER · Text Analytics · LLM · Python

Consolidating Next.js Logging: From Winston to Google Cloud

This article outlines the process of centralizing Next.js logs in the cloud using Google Cloud Logging and Winston. It covers the necessary steps, practical considerations, and detailed configurations required for seamless integration. By the end, readers will have a robust logging system designed to enhance the reliability and performance of a Next.js application.

Medium Article · Data Engineering · Hands-On Tutorials · Observability · Next.js · Google Cloud · TypeScript

Building Containerized Workflows Using the BioDepot-Workflow-Builder

We present the BioDepot-workflow-builder (Bwb), a software tool that allows users to create and execute reproducible bioinformatics workflows using a drag-and-drop interface. Each graphical widget represents a Docker container that executes a modular task. Widgets are linked graphically to build bioinformatics workflows that can be deployed reproducibly across different local and cloud platforms.

Each widget contains a form-based user interface for entering parameters and a console for displaying intermediate results. Bwb provides tools for rapid customization of widgets, containers, and workflows. Saved workflows can be shared using Bwb’s native format or exported as shell scripts.
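Because widgets form a graph and workflows can be exported as shell scripts, one way to picture that export is a topological sort of the widget graph followed by one `docker run` command per widget. The sketch below is a hypothetical illustration under that assumption: the widget names, images, commands, shared volume path, and `export_shell_script` helper are all invented, and Bwb's actual export format and scheduling may differ.

```python
import shlex
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical workflow: each widget names a Docker image and the command it runs.
# This is NOT Bwb's native format, only an illustration of the concept.
WIDGETS = {
    "download": {"image": "example/downloader", "cmd": ["fetch", "--out", "/data/raw"]},
    "align": {"image": "example/aligner", "cmd": ["align", "/data/raw", "/data/aligned"]},
    "quantify": {"image": "example/quant", "cmd": ["quant", "/data/aligned"]},
}

# Each widget maps to the set of upstream widgets whose output it consumes.
DEPENDS_ON = {"align": {"download"}, "quantify": {"align"}}


def export_shell_script(widgets, depends_on, data_dir="/tmp/bwb-data"):
    """Emit one `docker run` line per widget, ordered so dependencies run first."""
    graph = {name: depends_on.get(name, set()) for name in widgets}
    lines = ["#!/bin/sh", "set -e"]  # stop at the first failing step
    for name in TopologicalSorter(graph).static_order():
        spec = widgets[name]
        # Mount a shared host directory so containers can pass files to each other.
        args = ["docker", "run", "--rm", "-v", f"{data_dir}:/data", spec["image"], *spec["cmd"]]
        lines.append(" ".join(shlex.quote(arg) for arg in args))
    return "\n".join(lines)


if __name__ == "__main__":
    print(export_shell_script(WIDGETS, DEPENDS_ON))
```

A sequential script like this gives up any parallelism between independent branches, but it can be rerun anywhere Docker is installed, which is the point of exporting for reproducibility.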

GitHub Repository | Read Publication

Research Paper · Bioinformatics · Reproducibility of Research · Software Development · Cloud Computing · Docker · Flask · RNA sequencing · R · Python

GUIdock-VNC: using a graphical desktop sharing system to provide a browser-based interface for containerized software

We introduce GUIdock-VNC, a tool for running bioinformatics workflows that have graphical user interfaces inside Docker containers and accessing them through a web browser, which improves the reproducibility of research. GUIdock-VNC also supports cloud deployment with OAuth2 for secure access, simplifying the setup process. We tested the tool on gene network inference and observed minimal performance overhead compared to native applications.

Read Publication

Research Paper · Bioinformatics · Reproducibility of Research · Software Development · Cloud Computing · Docker · Flask · RNA sequencing · R · Python

GUIdock: Using Docker Containers with a Common Graphics User Interface to Address the Reproducibility of Research

GUIdock allows for the facile distribution of a systems biology application along with its graphics environment. Complex graphics-based workflows, ubiquitous in systems biology, can now be easily exported and reproduced on many different platforms. GUIdock uses Docker, an open-source project, to provide a container with only the strictly necessary software dependencies, and configures a common X Window System (X11) graphical interface on Linux, Macintosh, and Windows platforms.

As a proof of concept, we present a Docker package containing networkBMA, a Bioconductor application written in R and C++ for gene network inference. The package also includes Cytoscape, a Java-based platform with a graphical user interface for visualizing and analyzing gene networks, and CyNetworkBMA, a Cytoscape app that makes networkBMA available through the user-friendly Cytoscape interface.

GitHub Repository | Read Publication

Research Paper · Bioinformatics · Reproducibility of Research · Software Development · Cloud Computing · Docker · Flask · RNA sequencing · Python · C++

Predicting discontinuation of docetaxel treatment for metastatic castration-resistant prostate cancer (mCRPC) with random forest

We applied the k-nearest neighbor method for missing data imputation, the hill climbing algorithm and random forest importance for feature selection, and the random forest algorithm for classification. We also empirically compared the performance of many classification algorithms, including support vector machines and neural networks. Additionally, we found that using random forest importance for feature selection gave slightly better results than the more computationally expensive hill climbing method.
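The k-nearest neighbor imputation step fills a missing value with the average of that feature across the most similar records (for example, patients). The pure-Python sketch below shows one common variant. The `knn_impute` name, the `k=2` default, and the distance rescaling are assumptions for illustration, not the settings used in the study. The rescaling, which inflates distances computed over fewer shared features, mirrors the `nan_euclidean` metric that Scikit-Learn's `KNNImputer` uses by default.

```python
import math


def knn_impute(rows, k=2):
    """Return a copy of `rows` with each None replaced by a k-nearest-neighbor mean.

    Distances use only the columns both rows have observed, scaled up by
    (total columns / shared columns) so sparse overlaps are not favored.
    """
    n_cols = len(rows[0])
    filled = [list(row) for row in rows]  # never mutate the input
    for i, row in enumerate(rows):
        for c, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for j, other in enumerate(rows):
                # Neighbors must be different rows that observed column c.
                if j == i or other[c] is None:
                    continue
                shared = [(a, b) for a, b in zip(row, other) if a is not None and b is not None]
                if not shared:
                    continue
                sq = sum((a - b) ** 2 for a, b in shared)
                candidates.append((math.sqrt(sq * n_cols / len(shared)), other[c]))
            candidates.sort(key=lambda pair: pair[0])
            nearest = candidates[:k]
            if nearest:  # leave the gap if no usable neighbor exists
                filled[i][c] = sum(v for _, v in nearest) / len(nearest)
    return filled
```

Distances are computed from the original observed values rather than from earlier imputations, so the result does not depend on the order in which gaps are filled.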

Read Publication

Research Paper · Data Science · Bioinformatics · Cancer Research · Supervised ML · R