Some of my work

Most of my work is in private codebases, but you can find some public code on my GitHub.

Analytics platform

I was technical lead for the Analytics Platform at Squarespace, which consisted of BigQuery, Trino, dbt, Airflow, and Looker. Key efforts included enabling dynamic Airflow DAG creation from dbt artifacts, building CI/CD processes to check dbt code quality, and setting up pipelines to move on-prem data to the cloud with Trino, dbt, and Airflow. This platform supported hundreds of data pipelines written by dozens of analysts and data scientists.
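To give a flavor of what generating DAGs from dbt artifacts involves, here is a minimal sketch of one step: reading model dependencies out of a dbt `manifest.json` and ordering them so upstream models run first. It is not the Squarespace implementation. The function names and the tiny hand-written manifest are hypothetical, and a real setup would turn each model into an Airflow task rather than just returning a list.

```python
from graphlib import TopologicalSorter


def model_dependencies(manifest: dict) -> dict[str, set[str]]:
    """Map each dbt model's unique_id to the model ids it depends on.

    Sources, tests, and other non-model nodes are ignored in this sketch.
    """
    nodes = manifest.get("nodes", {})
    models = {
        uid for uid, node in nodes.items() if node.get("resource_type") == "model"
    }
    return {
        uid: {
            dep
            for dep in nodes[uid].get("depends_on", {}).get("nodes", [])
            if dep in models
        }
        for uid in models
    }


def task_order(manifest: dict) -> list[str]:
    """Return model ids in an order where every model follows its upstreams."""
    return list(TopologicalSorter(model_dependencies(manifest)).static_order())


# Hypothetical, hand-written fragment shaped like a dbt manifest.json.
example_manifest = {
    "nodes": {
        "model.shop.stg_orders": {
            "resource_type": "model",
            "depends_on": {"nodes": ["source.shop.raw.orders"]},
        },
        "model.shop.orders": {
            "resource_type": "model",
            "depends_on": {"nodes": ["model.shop.stg_orders"]},
        },
        "test.shop.not_null_orders_id": {
            "resource_type": "test",
            "depends_on": {"nodes": ["model.shop.orders"]},
        },
    }
}

if __name__ == "__main__":
    for model_id in task_order(example_manifest):
        print(model_id)
```

In practice the manifest would be loaded with `json.load` from dbt's `target/manifest.json`, and the dependency map would drive task creation and upstream/downstream wiring in the orchestrator.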

Analytics platform

I designed and built the analytics platform at Saturn Cloud from scratch using Snowflake, Superset, and Prefect. I built a Python library to encapsulate all the components, with a CLI to deploy Snowflake views and Prefect flows. I managed the Superset installation and set up charts and dashboards for the company’s analytics efforts.

Spark ETL Framework

I built a Python and Spark-based ETL framework for the data pipeline at Modernizing Medicine. The main concept is to put object-oriented wrappers around the functional logic of ETL transformations. This allowed for reusable logic, storage/IO logic decoupled from the transformations, and a framework for unit testing. It also incorporated data cataloging and quality checks by keeping table and column metadata as part of the code. I presented an initial iteration of the framework at Spark Summit 2018.

Predicting melanoma risk

My PhD research focused on machine learning models to predict individual patient risk of developing melanoma from routinely captured electronic health records. The work involved processing de-identified data from over 20 million patients into a research dataset with over 100,000 features used for building predictive models. I processed the raw datasets with Spark, performed machine learning in the PyData stack, and analyzed the results with R.

PyData Miami

I was a co-organizer and frequent speaker for PyData Miami from 2018 to 2021 and was a big part of making the 2019 conference happen.

I am open to consulting via hourly or milestone-based projects. Get in touch if you want to work together!