Data Science, MS

Internship Practicum

All students gain real-world experience for nine months of the program (15 hours/week) tackling data science and analytics problems at organizations around the San Francisco Bay Area and beyond. Each year, roughly 60 companies come to pitch their projects to our students during “Pitch Week” (past and current partners include those shown below). Students with complementary strengths are matched up to form a team. To ensure the success of the projects, each team is actively mentored by a faculty member who participates in weekly meetings to supervise and provide technical and mathematical expertise.

Following an initial hypothesis, students typically engage in data acquisition, exploratory data analysis, feature extraction, model development and evaluation, as well as oral and written communication of results. Class schedules are set so that students can work onsite two days per week.

  • Internships begin in mid-October.
  • Students devote 15 hours a week to internship work on average.
  • Projects may be paid or unpaid.

How Can Your Organization Participate?

Interested in working with a group of highly motivated and committed students from this exciting graduate program?


  • A select list:

    • American Civil Liberties Union (ACLU) Foundation of Northern California
    • Bay Area Rapid Transit (BART)
    • BlackRock
    • Blueboard
    • Boost Sport
    • California Forward
    • Cerenetics
    • Environmental Defense Fund
    • Facebook
    • California Department of Fish and Wildlife
    • First Republic Bank
    • Freedom Financial Network
    • Golden State Warriors
    • Hims & Hers
    • Metromile
    • NexTracker
    • Nisum
    • New York Mets
    • Oportun
    • Orange Silicon Valley
    • Pocket Gems
    • Propeller Health
    • Recology
    • Reddit
    • Reputation
    • Stanford Graduate School of Business
    • Stanford Medicine
    • Subwifi
    • Target
    • The Nature Conservancy
    • UCSF Department of Radiation Oncology
    • UCSF Brain Networks Laboratory
    • UCSF Bakar Computational Health Sciences Institute (Gastro)
    • UCSF Hospital Medicine
    • Velux
    • W.L. Gore & Associates

Past Projects


    Our Team: Qianyun Li

    Goal: At the ACLU, the student identified potential discrimination in school suspensions by performing feature importance analysis with machine learning models and statistical tests.


    Our Team: Max Shinnerl

    Goal: At the ACLU, the student analyzed data on the equitable distribution of COVID-19 vaccines. They developed interactive maps with Leaflet to visualize shortcomings of the distribution algorithm and automated the cleaning of legislative record data. They also developed a pipeline for storing data to enable remote SQL queries using Amazon RDS and S3 from AWS.


    Our Team: Suren Gunturu

    Goal: At AWS, the student employed machine learning techniques to translate users' natural language questions into SQL queries, mapping features such as database information and input questions to queries. They reviewed the available architectures on the topic and implemented them both from scratch, using a Seq2Seq architecture, and by calling Hugging Face pretrained transformers.


    Our Team: Sophie Wang, Eriko Funasato

    Goal: Students at Bold developed an end-to-end machine learning pipeline using Python’s Scikit-learn to classify churned customers. They also presented feature importance from the model to aid decision making. After being deployed in production, the pipeline increased the customer retention rate. Their work also included collaboration with the customer success team and performing A/B testing on email campaigns.


    Our Team: Veeral Shah, Ricky Zhang

    Goal: At Boost, students built and deployed a logistic regression pipeline to dynamically predict college basketball in-game win probability using Python and PostgreSQL. They established novel metrics for efficiency, excitement, and tension by analyzing mean, variance, and volatility trends of in-game win probability output.


    Our Team: Nicolas Decavel-Bueff, Taince Tan

    Goal: Students at engineered and integrated machine learning techniques to perform NER as a tool to better collect and preprocess data. On another project, they worked on creating a content-based recommendation system to help identify competitors.


    Our Team: Zhimin Lyu, Victor Palacios, Daniel Carrera

    Goal: At Cerenetics, students developed and deployed a multithreaded Python application for a brain functional MRI data preprocessing pipeline (DICOM to BIDS to normalized time series) to extract voxel signals and predict the presence of mental health disorders. They also created and implemented a novel Iterative Spectral Clustering algorithm for brain functional MRI voxel clustering.


    Our Team: Emre Okcular, Yue Zhao

    Goal: Students at applied machine learning to website ad clicks and inner clicks data using Python's Scikit-learn and Matplotlib for visualization.


    Our Team: Kexin Wang, Wenyao Zhang

    Goal: At Electronic Arts, students built an anomaly detection process with supervised models (2D CNN) and improved model robustness with an unsupervised algorithm (Autoencoder) using Keras.


    Our Team: Yihong Shen, Jordan Uyeki

    Goal: Students at Eventbrite used SQL and Python to compare revenue opportunities across different creator segments and to better understand creator behavior over time. They also compared various methods for event recommendation systems (collaborative filtering, networks, ERGM models, etc).


    Our Team: Zixi Luo

    Goal: At Facebook, the student worked on the Facebook Community Product Group team to understand how businesses use Facebook groups. Their ultimate goal was to build a machine learning model to predict Facebook groups run by businesses and understand how they can improve the user experience.


    Our Team: Flora Chen, Hsuan-Yu Lin

    Goal: At Jumio, students conducted EDA to identify thresholds that were effective at catching financial fraud. On another project, they built a Flask app and set up modeling endpoints on AWS.


    Our Team: Shiqi Tao, Rahul Bethavalli

    Goal: Students at LaHaus employed NLP and deep learning techniques to identify description quality using Python. They also conceptualized and developed a suggestion system to recommend the most relevant custom page tags for real estate listings using a probabilistic random forest model. This resulted in an increase in the click-through rate by 70% post-deployment in production. On another project, they worked on improving the existing image captions for listings and leveraged zero-shot transfer learning of CLIP from OpenAI to generate qualitative and diverse captions. They implemented the end-to-end production pipeline using AWS, PyTorch, OpenAI, and Airflow.


    Our Team: Ye Tao, Michelle JanneyCoyle

    Goal: At LexisNexis, students used machine learning techniques to perform legal analytics and built a deep learning model for a classification and text generation task. Additionally, they used matrix factorization to build a recommendation system in Python, and on another project they built a deep learning NLP API accessed by a distributed Spark job.


    Our Team: Catie Cronister

    Goal: At MedStar, the student built a deep learning model to predict the proper radiology protocol that a physician would prescribe and authored a paper based on their work.


    Our Team: Weronica Green, Huidon Xu

    Goal: Students at Metromile built and deployed a deep learning-based end-to-end computer vision system to identify vehicle quality issues using ResNet in PyTorch. They used the model predictions to run statistical analysis on various business metrics using SQL and Python. Lastly, they created an app that allows stakeholders to interact with the model predictions.


    Our Team: Okeefe Niemann, Danh Nguyen

    Goal: At the Metropolitan Transportation Commission, students created data pipelines to both organize and quality check jurisdiction entries. In addition, they created and fine-tuned deep learning models to classify buildings into zones.


    Our Team: Moh Kaddoura, Trevor Santiago

    Goal: Students at the New York Mets created an outfield defense model using multivariate distributions, powerful classifiers (RF and XGBoost) and clustering. They also used SciPy and NumPy to create a matchup model that accurately predicts success rates for a certain batter against a certain pitcher, or vice versa.


    Our Team: Vaishnavi Kashyap, Phillip Navo, Sandhya Kiran Reddy Donthireddy

    Goal: At Novi, students engineered a pipeline to automate extraction of applicable columns from Excel files using Pandas and FuzzyMatch. Additionally, they conducted funnel analysis to understand customer engagement with the company platform. On another project, they used Google Data Studio and Google Analytics to power web analytics dashboards showing high-level business metrics and user engagement.


    Our Team: Tian Qi, Matthew Hui

    Goal: Students at PG&E conducted exploratory data analysis to discover power outage patterns and employed machine learning techniques to identify assets likely to experience high-risk events in the future using Python, SQL, AWS and Palantir Foundry.


    Our Team: Audrey Barszcz

    Goal: At Phylagen, the student utilized multiple machine learning models along with SHAP feature importance to identify a subset of features that were the most predictive for classifying an outcome. On another project, they trained embeddings using a GloVe neural network model on genetic sequences.


    Our Team: Yi Huang, Siwei Ma

    Goal: Students at Pocket Gems used reinforcement learning to build a dragon agent that flies, follows and attacks in Unity. They also developed a search engine and web server from scratch with NLP techniques.


    Our Team: Noah Matsuyoshi

    Goal: At Propeller Health, the student predicted early life failures of sensors for medical device monitoring using Redshift (SQL) and Python.


    Our Team: Yueling Wu, Hashneet Kaur

    Goal: At Ranker, students prototyped a video recommendation engine using LightFM’s collaborative filtering model based on users' implicit feedback on various website events such as trailer viewed or item clicked / added to watchlist. On other projects, they generated a script to minimize the "position on list" bias issue using descriptive statistics and SQL to increase the reliability of crowdsourced lists, performed an audit of the current ranking algorithm, and identified discrepancies for the engineering team to resolve. They also identified trending shows by scraping data from Twitter, applying NLP techniques (e.g., part-of-speech (POS) analysis, fuzzy string matching and sentiment analysis) and leveraging the number of tweets and sentiment score.


    Our Team: Amee Tan, Shruti Roy 

    Goal: Students at Recology automated sequencing of garbage pickup using telematics data, DBSCAN Clustering and Haversine Distance calculation in Python. On another project, they predicted garbage collection time using XGBoost and Isolation Forest.


    Our Team: Lucia Page-Harley, Maruo Napoli

    Goal: At Reddit, students built a time series forecasting dashboard to understand and predict different video metrics. On another project, they performed analyses using SQL and Python visualizations to understand the German user-base at Reddit and planned/analyzed experiments to improve their product experience.


    Our Team: Kaiqi Guo

    Goal: At the Stanford Graduate School of Business, the student explored different approaches such as BERT to detect and correct errors in the digitization of historical documents.


    Our Team: Daniel Blessing, Victor Nazlukhanyan

    Goal: Students at the Stanford Medicine Department of Radiology conducted deep learning research and implemented computer vision methods to synthetically produce contrast-enhanced MRI images. Architectures included generative adversarial networks and U-Nets.


    Our Team: Anni Liu, Aneri Dand

    Goal: Students at Syrup employed machine learning techniques to forecast sales for Syrup's retailer clients. They used Jinja3 and Plotly to build dashboards for tracking metrics, providing insights to retailers, as well as logging the results of machine learning experiments.


    Our Team: Elyse Cheung-Sutton, Yingtong Lin, Eileen Wang, Remi LeBlanc 

    Goal: Students at the Schmidt Family Foundation's 11th Hour mBio project built web scrapers used on websites for African GMOs, IRS financial data, and news articles and created visualizations displaying the scraped information. They built a website to serve the analysis results using React and Django and trained a language model using and Pytorch to support classification of African news articles. In order to serve information about the uses of agricultural biotechnology, they also consolidated data into one central hub to serve through a web application and deployed this containerized web application with Docker.


    Our Team: Christabelle Pabalan

    Goal: At UCSF, the student used computer vision and deep learning techniques, including multitask learning and ensemble learning, to predict cognitive scores for Alzheimer's patients.


    Our Team: Berkay Canogullari, Tianxiang Zhou

    Goal: Students at UCSF predicted the outcome (local failure and patient survival) for large brain metastasis treated with radiation. The project consisted of performing tumor segmentation using deep learning followed by extraction of imaging features for prediction of treatment outcomes.


    Our Team: Jared Mlekush, Shuyan Li, Dashiell Brookhart, Min Che

    Goal: Students at UCSF worked with physicians to predict the likelihood of success of salvage radiation treatment to help oncologists determine treatment options for prostate cancer patients. They utilized logistic regression, Cox Proportional-Hazards models, and feature importance analysis to create Kaplan-Meier estimators for patients. They also analyzed physicians' notes to create a predictive model for determining diagnostic error using techniques from Natural Language Processing (NLP), including Bag of Words and Word2vec, and machine learning models such as Random Forest, XGBoost, and Logistic Regression.


    Our Team: Evan Chen 

    Goal: At UCSF, the student engaged in medical image preprocessing and deep learning (image segmentation) utilizing Python, SQL, Linear/Logistic Regression, more advanced Machine Learning, and Radiation Oncology treatment planning software.


    Our Team: Sicheng Zhou, Christopher Pang

    Goal: At UCSF, students built a data pipeline to automatically generate datasets for cross-validation by pulling samples from the main dataset. They developed deep learning solutions to generate high-quality synthetic x-ray images from Digitally Reconstructed Radiographs (DRRs) using Cycle-Consistent Generative Adversarial Networks (CycleGAN), which improved middle frequency power, an image quality score, by 20% on average compared with baseline Histogram Matching. This model could improve real-time x-ray imaging tracking during radiation therapy. They also visualized and compared synthetic x-ray images and Fourier Analysis results using customized HTML and Jinja templates with the Flask framework and presented the results to principal investigators.


    Our Team: Patrick Poon, Boliang Liu

    Goal: Students at UCSF collaborated with UCSF researchers to query patient information and engineer features using SQL and Spark. With this data, they used multiple machine learning models (Logistic Regression, Random Forest, XGBoost, and neural networks in PyTorch) to forecast, from information gathered in the first 24 hours, whether these patients would need antibiotics within 2-3 days.


    Our Team: Efrem Ghebreab, Anawat Putwanphen

    Goal: Students at Virgo developed a classification system for Ulcerative Colitis and Crohn's Disease utilizing deep learning and video image processing techniques.


    Our Team: Youchen Zhang, Kristofor Johnson 

    Goal: Students at W.L. Gore & Associates deployed Deep Learning Computer Vision techniques with Python's PyTorch package to segment microscopic images. They also built a Python package for internal deployment to easily train new models and architectures on different hyperparameters.


    Our Team: Grant Phillips, Stephen Embry

    Goal: Students at W.L. Gore developed deep learning models to perform image classification, image segmentation, and keypoint detection on cornea image datasets using PyTorch.


    Our Team: Luke Thomas 

    Goal: At W.L. Gore, the student built a table extraction and merger system leveraging an AWS service for OCR, and IPython Widgets as a GUI.


    Our Team: Zachary Dougherty

    Goal: At Wanamaker, the student developed architecture for analyzing and preprocessing Google Analytics data through a Markov chain attribution model.


    Our Team: Kyle Brooks, Joshua Majano

    Goal: Students at Washington State University utilized web scraping technologies to scrape international league data to be utilized in a model to predict an international player's projected performance in the NCAA. Additionally, they built out models to predict the same performance metric for NCAA transfer players.


    Our Team: Daren Ma, Ming-Chuan Tsai, Haree Srinivasan

    Goal: Students at ABC News used Python to write a machine learning model to predict election results and used Docker and AWS to deploy the pipeline.


    Our Team: Jacob Goffin

    Goal: At Accountability Counsel, Jacob created web-scraping scripts in Python & Selenium to build a first-of-its-kind database of human rights complaints. He also built a document search tool (using Django/Elasticsearch) over thousands of .pdf documents, allowing users to quickly find relevant human rights cases to support their research.


    Our Team: Ivette Sulca, Hoda Noorian

    Goal: Students at Airbnb developed an evaluation tool prototype that identifies socioeconomic bias on Airbnb algorithms and experiments. They analyzed past A/B tests and built a dashboard using Python and Superset.


    Our Team: Esther Liu, Jack Dong

    Goal: At Beam Solutions, students used machine learning techniques to classify transaction data and perform text clustering. They also worked on industry research and database mapping for potential new customers.


    Our Team: Hannah Lyon

    Goal: At Cuyana, Hannah used Markov chains to develop a data-driven marketing attribution model that informed marketing spend. She created a customer propensity model using gradient boosting to determine critical site features that were then enhanced by the digital team to improve conversion. Additionally, she combined SQL and Tableau data for ad-hoc analysis of payment methods, trained neural networks to produce product embeddings used for a recommendation system on website product pages, and modeled repeat purchaser behavior predicting second purchases.


    Our Team: Maxine Liu, Zhentao Hou

    Goal: Students at Eventbrite built a classifier and a deep learning model to improve event recommendations. They also researched cases for and against investing in online events from the perspectives of opportunity size, product data, and potential revenue impact. On another project, they analyzed text data with NLP libraries to identify features that are indicative of event listing quality.


    Our Team: Kevin Wong

    Goal: At Faire, Kevin developed a SQL-based outlier flagging mechanism. Additionally, he conducted a deep-dive analysis of the effectiveness of the Faire mobile app on retailer behavior using SQL, Python, statistics, and propensity-score matching.


    Our Team: Peng Liu, Wenjie Duan

    Goal: Students at FLYR developed a SQL/Python workflow that predicted flight revenue by finding similar flights with clustering and Random Forest models.


    Our Team: Vivian Chu

    Goal: Vivian worked with FracTracker on the collection and aggregation of oil and gas data for the state of California, before conducting production analysis of oil wells at the pool level. Financial data was then added to predict the status of each of the oil wells as an asset or liability.


    Our Team: Kyrill Rekun, Xueying Li

    Goal: At the Golden State Warriors, students used machine learning techniques to create a last-minute ticket buyer model that predicts the probability of a person being a last-minute, planner, or in-between buyer. Using the lifetimes Python package, they built a proxy lifetime value spend model for customers to aid in marketing and ticket targeting. These projects utilized tools such as Pandas, Seaborn, and sklearn.


    Our Team: Peng Liu, Wenjie Duan

    Goal: Students at Gore Medical developed PyTorch CNN models using the API to detect key points in medical optical coherence tomography images, thus allowing for automated assessment of an implant. They achieved these results using transfer learning and data augmentation.


    Our Team: Ariana Moncada, Matthew Sarmiento

    Goal: At Hohonu at the University of Hawaii, students created a tidal forecasting pipeline that helps populate a Django web application and Plotly plots for forecasts. They clustered multiple time series datasets together to increase the performance of their multivariate time series models in R and Python.


    Our Team: Bing Wang

    Goal: At the Human Rights Data Analysis Group (HRDAG), Bing gleaned critical location of death information from unstructured text fields in Arabic using Google Translate and Python Pandas, adding identifiable records to Syrian conflict data. She wrote R scripts and bash Makefiles to create blocks of similar records on killings in the Sri Lankan conflict to reduce the size of search space in the semi-supervised machine learning record linkage (database de-duplication) process.


    Our Team: Shreejaya Bharathan, Geoffrey Hung

    Goal: Students at Manifold developed a Python library that utilizes machine learning and deep learning to solve for the parameters of dynamical systems defined by differential equations using PyTorch, Docker and MLflow.


    Our Team: Matthew King, Lin Meng

    Goal: At Metromile, students created a crash classification model to predict the primary point of impact during a collision using telematics data collected from customers. On another project, they used deep learning to classify images of fraudulent cars.


    Our Team: Rushil Sheth

    Goal: At the New York Mets, Rushil created infield and outfield shift models using multivariate distributions, powerful classifiers (RF and XGBoost) and clustering.


    Our Team: Kamron Afshar, Michael Schulze

    Goal: Students at MTC used deep learning to train a Neural Net Image Classifier on images of buildings to classify their use. They generated the data set using Google API. They also built a Selenium crawler data pipeline that scrapes legal codes and collected them in a Redshift database to track changes.


    Our Team: Lisa Chua, Shane Buchanan

    Goal: At NakedPoppy, students improved the recommendation system for new customers by incorporating content-based and collaborative filtering trained on clickstream data. They used NLP techniques to extract key aspects from Google reviews and implemented feature-based opinion mining on product reviews to assist in the scoring of new products. Later, they conducted market basket analysis on transaction data to provide customers with “pair with” recommendations and increase engagement.


    Our Team: Collin Prather

    Goal: At the Baltimore Orioles, Collin implemented a Deep Recurrent Survival Analysis model (LSTM in PyTorch) to predict the probability that an American League manager will remove their pitcher using in-game time series data. Another prominent project was developing a model to predict relief pitchers’ level of fatigue, then deploying a containerized (Docker) web application on AWS to host the model and explanatory visualizations to communicate the analysis to key stakeholders in the Orioles front office.


    Our Team: Kathy Yi, Sean Sturtevant, Jingwen Yu, Nithish Kumar Bolleddula

    Goal: Students at PG&E used SQL, Python and AWS SageMaker to employ machine learning techniques to predict whether or not a PG&E asset is likely to experience a failure. On another project at PG&E, students built computer vision models on drone imagery to identify defects in power grid lines.


    Our Team: Nicholas Parker, Mundy Reimer

    Goal: Students at Phylagen worked on projects with data from microbiome samples and laboratory processes that involved software development, data analysis, and machine learning.


    Our Team: Qingmengting Wang, Tian (Arthur) Qin

    Goal: At Pocket Gems, students completed two NLP projects using LSTM and Dialogflow.


    Our Team: Andrew Eaton, Xuxu Pan

    Goal: Students at Propeller Health built a Random Forest model to predict how long it would take to solve a customer support ticket using word embeddings from the ticket texts and a Continuous Bag of Words (CBOW) model. They also published live dashboards with information on ticket counts and complaint rates on a Tableau Server.


    Our Team: Yunzheng Zhao, Shishir Kumar

    Goal: At Recology, students used linear regression to generate route statistics and service time estimation from GIS and trash collection data. They also analyzed routing data and identified anomalies in the reporting and data-capturing process.


    Our Team: Kevin Loftis, Esme Luo

    Goal: Students at Reddit worked on graph-based subreddit community detection. They developed a subreddit graph based on user view overlap and performed community detection on the graph to cluster similar subreddits using Python and NetworkX. This doubled the subscription rate of subreddits compared to the existing system. On another project, they worked on a streaming feature extraction pipeline where they architected and developed a Flink streaming data processor in Scala using Docker, Flink, Kafka, CircleCI, and Kubernetes.


    Our Team: Meng Lin, Hao Xu

    Goal: At Reputation, students used entity matching in deep learning for matching addresses and performed topic modeling to analyze topic trends in reviews.


    Our Team: Alaa Abdel Latif, Annette (Zijun) Lin

    Goal: Students at the Salk Institute for Biological Studies built super-resolution deep learning models using and PyTorch.


    Our Team: Sunny Kwong

    Goal: At Sparta Science, Sunny worked on improving the reliability of balance tests by performing multiscale entropy analysis with R and Python on force plate scans.


    Our Team: Jiaqi Chen, Sakshi Singla

    Goal: At Specialty's Cafe & Bakery, Jiaqi performed revenue forecasting employing time series analysis and EDA and also worked on building a recommendation engine using machine learning.


    Our Team: Jingxian Li

    Goal: Students at the Stanford Graduate School of Business cleaned SEC 10-K documents and built word2vec models based on this corpus. They also came up with different ways to evaluate models and learned to use the BERT model.


    Our Team: Lea Genuit, Alan Flint

    Goal: At Trulia, Lea employed deep learning techniques using PyTorch to identify scanned documents rotated by multiples of 90 degrees. She also implemented an improvement of the current solution (Tesseract, an OCR engine) by working on a patch of the image using Python. Then, she compared the results of Tesseract and the CNN models. On another project at Trulia, Alan built a power analysis tool in Python for Trulia's A/B testing platform. This entailed coding and deploying an ETL pipeline and designing an interactive application using Streamlit. His second project involved employing an interpretable machine learning model to identify site features that influence positive outcomes for interested home buyers.


    Our Team: Dillon Quan

    Goal: At TruStar, Dillon built parsers to normalize data ingested into the data lake to centralize samples into one format for predictive analytics usage downstream using Spark and Scala. His second project focused on analyzing URLs and how to generate scores to determine their level of maliciousness using Python and PyTorch.


    Our Team: Qingyi Sun, Akanksha

    Goal: Working with the Brain Networks Laboratory at UCSF and the Wicklow AI in Medicine Research Initiative (WAMRI), students focused on characterizing diseases, such as Autism and Alzheimer’s disease, making diagnosis and prognosis from multi-channel brain Magnetoencephalography (MEG) data. They built an LSTM (Long Short-Term Memory) model using PyTorch to analyze brain MEG data and extract information to make predictions on characteristic parameters of interest. On another project, they worked on pretraining 3D Convolutional Neural Networks with brain MRI data. The models were pretrained using a segmentation task.


    Our Team: Linqi Sheng

    Goal: Working with UCSF and the Wicklow AI in Medicine Research Initiative (WAMRI), Linqi built an LSTM (Long Short-Term Memory) model using PyTorch to analyze brain MEG data, extract information, and make predictions on characteristic parameters of interest.


    Our Team: Roja Immanni

    Goal: Working with the UCSF Radiation Oncology Department, Roja found that medical image datasets are fundamentally different from natural image datasets in terms of the number of available training observations and the number of classes for the classification task. She hypothesized that compared to architectures used for natural images, those needed for medical imaging can be simpler. She proposed smaller architectures and showed how they perform similarly while significantly saving training time and memory. This is joint work with Gilmer Valdes at UCSF.


    Our Team: Zachary Barnes

    Goal: Working with UCSF and the Wicklow AI in Medicine Research Initiative (WAMRI), Zachary used UCSF's Spark environment for EHR data to create a data set, generate labels for hospital-acquired sepsis patients, and create prediction models using sklearn and PyTorch.


    Our Team: Sihan Chen

    Goal: Working with the Morin Lab at UCSF and the Wicklow AI in Medicine Research Initiative (WAMRI), Sihan built a 3D Residual U-net to precisely segment metastases from brain MRI images with PyTorch. He evaluated the effects of number, size, and locations of metastases on the accuracy, which has resulted in a scientific conference presentation and a manuscript and helped UCSF design a state-of-the-art model.


    Our Team: Shrikar Thodla

    Goal: Working with the Vasant Lab at UCSF and the Wicklow AI in Medicine Research Initiative (WAMRI), Shrikar worked on multiple projects. These included using deep learning to segment and classify medical images, attempting to generate 3D images from multiple 2D image views, leading migration of full-stack components from GCP to IBM, detecting accidental rotations in images using CNNs built in PyTorch, and optimizing code to read images from a database.


    Our Team: Srikar Murali, Sean Tey

    Goal: Students at United Healthcare cleaned and processed millions of insurance claims transactions with SQL and did hypothesis testing on demographics-related data. On another project, they predicted members who are likely to be hospitalized in the near future as part of a system for identifying administratively complex members with a Gradient Boosting Trees model using the CatBoost library.


    Our Team: Andrew Young, Charles Siu

    Goal: At Valimail, students tackled the problem of classifying a backlog of 100K+ unknown internet domains generated by Valimail Defend. They developed an end-to-end machine learning pipeline that classifies trusted domains by detecting whether they belong to low-risk categories such as real estate. The Gradient Boosting Machine (GBM) model achieved a 95%+ precision rate with test data when classifying real estate domains using Natural Language Processing (NLP) for web content analysis. On another project, they designed and implemented REST APIs using Flask in Dockerized modules in the pipeline and built web scrapers using BeautifulSoup to gather multiple external data sources for ML model training.


    Our Team: Mikio Tada, Stephanie Jung

    Goal: Students at Virgo developed a Python script to extract data frames from 120 hours of video. They used Google AutoML to train deep learning models to automate video recording during endoscopic medical procedures and to develop an automatic procedure type tagging system. On another project, they built a prototype object detection tool for real-time polyp tracking during a colonoscopy using CVAT for data labeling and Google AutoML to train the deep learning model.


    Our Team: Samarth Inani, Akansha Shrivastava

    Goal: At Walmart Labs, students developed an image inpainting tool to remove occlusions from high-resolution furniture images using partial convolutions. They also worked on a research-oriented project to enhance the color detection algorithm, improving the accuracy of the color attribute in furniture product descriptions, using PyTorch and OpenCV.


    Our Team: Max Calehuff, Xintao (Todd) Zhang, Wendeng Hu

    Goal: Students working with the Wicklow AI in Medicine Research Initiative (WAMRI) and MedStar Georgetown University Hospital used NLP to create an automated grading program for medical student imaging reports.


    Our Team: Andy Cheon, Aakanksha Nallabothula Surya

    Goal: At Zyper, students built and deployed an image classification convolutional neural network (CNN) with PyTorch to help brands efficiently recruit fans with desired aesthetic types on social media. They applied feature importance methods using machine learning in Python to identify top factors that drive engagement rates of user-generated content. They also developed a user location prediction pipeline using NLP tools (NLTK, spaCy) to improve upon the existing location predictor and discovered and visualized trends from group chat content from 15 brand communities using mainly Pandas and ggplot.





    Our Team: Sankeerti Haniyur

    Goal: On this project, the student employed deep learning & NLP techniques to automatically tag cybersecurity documents. She then built a named entity recognition model to detect indicators of compromise in the documents.


    Our Team: Darren Thomas, Liying Li

    Goal: Students employed NLP techniques in Python for name recognition and used PyTorch and an LSTM to detect fraudulent transactions. On another project, they scraped data using a RESTful API and created an application using Flask in Python. They also applied unsupervised machine learning models to build clustering and anomaly detection models using Python.


    Our Team: Benjamin Khuong, Ziqi Pan

    Goal: Students worked on an object detection project to detect defects in CT scans of machine parts. Their project focused on designing computer-vision-based solutions for automatic defect detection on industrial devices. They implemented state-of-the-art deep learning algorithms such as Faster R-CNNs, R-FCNs, and 3D convolutional neural networks.


    Our Team: Wenkun Xiao, Nicole Kacirek

    Goal: Students worked closely with the marketing team to optimize campaign messages by applying NLP and machine learning techniques to competitors’ product reviews and social media posts. They also built a CLTV (customer lifetime value) and revenue prediction model, which was put into production.


    Our Team: Brian Chivers, Evan Liu

    Goal: Students developed an unsupervised learning algorithm to detect anomalies in AWS network traffic.


    Our Team: Rebecca Reilly, Minchen Wang

    Goal: Students focused on increasing revenue using topic modeling, employing Python and the spaCy library to discover industry relationships using advertiser behavior. They employed machine learning technologies to predict online ad prices and identify important features. On another project, they created an NLP classifier to correctly identify acceptable and appropriate sentences.


    Our Team: Nan Lin, Lance Fernando

    Goal: Students built machine learning models to predict the LTV (lifetime value) of customers. On another project, they deduplicated over 5 million venue addresses using fuzzy string similarity metrics and a hidden Markov model (HMM), then used this data to create a search ranking method to recommend venues to event creators.
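    The fuzzy-matching step of address deduplication can be sketched as follows. This is only an illustration, not the students' implementation: it uses the `SequenceMatcher` ratio from Python's standard library as the similarity metric, the `normalize` and `dedupe` helpers and the 0.9 threshold are hypothetical, and the HMM component is omitted. Pairwise comparison like this would also need a blocking step to scale to millions of addresses.

```python
from difflib import SequenceMatcher


def normalize(address):
    """Lowercase an address and collapse punctuation and whitespace (hypothetical helper)."""
    cleaned = address.lower().replace(",", " ").replace(".", " ")
    return " ".join(cleaned.split())


def dedupe(addresses, threshold=0.9):
    """Keep the first address of each group of near-duplicates.

    Two addresses count as duplicates when the similarity ratio of their
    normalized forms is at least `threshold` (an assumed cutoff).
    """
    kept = []
    for address in addresses:
        norm = normalize(address)
        is_duplicate = any(
            SequenceMatcher(None, norm, normalize(k)).ratio() >= threshold
            for k in kept
        )
        if not is_duplicate:
            kept.append(address)
    return kept


# Made-up example addresses, for illustration only.
venues = [
    "2130 Fulton St, San Francisco",
    "2130 fulton st. san francisco",
    "101 Howard Street, San Francisco",
]
unique_venues = dedupe(venues)
```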


    Our Team: Aditi Sharma, Zhi Li

    Goal: Students built a content-based recommendation system for cars and predicted auction prices.


    Our Team: Byron Han, Yuhan Wang

    Goal: Students used SQL to extract data from AWS, then employed NLP techniques to build a text classification pipeline.


    Our Team: Connor Swanson

    Goal: The student built anomaly detection systems in Python for environmental data. He also built time series forecasting models to predict future environmental shifts and built dashboards to host his findings.


    Our Team: Tyler Ursuy, Anush Kocharyan

    Goal: Students classified each Kiva partner into risk categories by implementing a Random Forest risk detection model that monitors the financial, geographic, and economic information of Kiva’s global partners. They also built an interactive online dashboard to provide easy access to data analyses, data visualizations, and model predictions which will help Kiva reduce the amount of time and money spent on manually inspecting partner information and conducting scheduled in-person visits.


    Our Team: Hongdou Li, Zhe Yuan

    Goal: Students employed machine learning techniques to predict solar panel performance across the country and provided business inference.


    Our Team: Hai Le, Jon-Ross Presta

    Goal: Students automated the data generation process for a dashboard with a Python script. They also trained an NLP model which takes the subject line, information about the app that sends the email, and information about the recipient segment to predict email open rates using PyTorch. On another project, the students used Python/PyTorch to build an NLP model to predict user engagement based on message content.


    Our Team: Edward Richard Owens, Prakhar Agrawal 

    Goal: Students created a system that optimizes the operation of HVAC systems by detecting the stabilization of building temperature from sensor data. On another project, they built a golf simulator whose model takes a video of a person hitting a golf ball and outputs the ball’s trajectory using machine learning and physics. They employed computer vision methods and architectures such as background removal, Darknet (YOLO), and optical flow.


    Our Team: Shivee Singh, Xiao Han

    Goal: Students used machine learning and deep learning to identify microplastics in ocean water using OpenCV, Python, and PyTorch. Their main focus was to build object detection models that locate microfibers in underwater images in order to approximate the total volume and distribution of microfibers in the ocean.


    Our Team: Christopher Olley, Wei Wei

    Goal: Students used machine learning and deep learning to identify drivers based on their telematics data (speed and acceleration). On another project, the students extracted events and created features based on this data to train tree-based models using Python. They extracted labeled trip data from SQL and Amazon S3 storage and built the ML/DL models to identify users using Python and SQL.


    Our Team: Sarah Melancon, Brian Wright

    Goal: Students used Python and Spark to build an ETL pipeline that combined and aggregated add-on-related data from a variety of data sources into a single data source. They also built a dashboard based on this data source using Redash.


    Our Team: Jacques Sham, Quinn Keck

    Goal: Students built a data lake on AWS, involving S3 and Redshift, using tools available in the market (Trifacta and Python). On another project, they analyzed Clipper and FasTrak data, tracked key performance indicators, and built dashboards. They developed machine learning and time series models to predict daily Clipper Card usage to within 4%.


    Our Team: Chong Geng

    Goal: The student developed metrics to define the success of the product in terms of user engagement and answering efficiency. He also applied NLP techniques to upgrade the recommender system and built a dashboard to visualize the results.


    Our Team: Nina Hua, Donya Fozoonmayeh

    Goal: Students employed machine learning for product recommendations and used PySpark to apply a model in a distributed environment. They also implemented machine learning techniques to classify skin color from an image and worked on a recommendation system to improve user experience.


    Our Team: Evan Calkins, Jinghui Zhao, Ran Huang

    Goal: Students developed an algorithm to support targeted marketing campaigns, which identifies similar mobile users based on their location patterns. They built an n-gram language model for the African language of Wolof to improve functionality of a chatbot using Python. On another project, they calculated relative store location optimality by comparing user movements and travel patterns using a large dataset (4TB) of mobile user information processed on a 9-node Spark cluster.
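    As a rough illustration of what an n-gram language model involves, here is a minimal bigram model with add-one smoothing. It is a sketch under assumptions, not the students' model: the function names are hypothetical, the smoothing choice is ours, and the training sentences are placeholder tokens rather than Wolof text.

```python
from collections import Counter, defaultdict


def train_bigram(sentences):
    """Count how often each token follows each other token."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        # Pad with start and end markers so sentence boundaries are modeled.
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts


def bigram_prob(counts, prev, nxt, vocab_size):
    """Estimate P(nxt | prev) with add-one (Laplace) smoothing."""
    total = sum(counts[prev].values())
    return (counts[prev][nxt] + 1) / (total + vocab_size)


# Placeholder corpus; a real model would be trained on actual chat text.
corpus = ["a b", "a c", "a b"]
model = train_bigram(corpus)
p_ab = bigram_prob(model, "a", "b", vocab_size=4)
```

    Smoothing keeps unseen word pairs from receiving zero probability, which matters for a low-resource language where most pairs never appear in the training data.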


    Our Team: Gokul Krishna Guruswamy, Louise Lai

    Goal: Students used PyTorch to train deep learning object detection and classification models to identify faults in equipment and to detect small-scale objects in millions of large drone images. They worked extensively in the AWS cloud environment (EC2, S3, Lambda, SageMaker, etc.) to productionize these models.


    Our Team: Paul Kim, Katja Wittfoth

    Goal: Students used deep learning techniques to identify different types of contaminants in waste bins. They also automated identification of contaminants in complex images of waste bins by developing a multi-label image classification model using deep learning, PyTorch, Python, and AWS.


    Our Team: Xu Lian, Philip Trinh

    Goal: Students built a machine learning model to predict truck accident occurrence using scikit-learn. They used data analytics and machine learning methods to provide policy recommendations on how Recology can increase safety when collection drivers are out in the city. They also merged sheets from different sources using Pandas and PySpark.


    Our Team: Yixin Sun, Julia Amaya Tavares

    Goal: Students built a machine learning pipeline on Airflow to estimate subreddit retention ability. They used the Python spaCy package to build a small tool to extract keywords from post comments. On another project, they used TensorFlow to create a multi-label classifier for post titles, and SQL and Pandas for data acquisition and pre-processing.


    Our Team: Randy Ma, Xi Yang

    Goal: Students developed a review sentiment classifier using a deep learning model with LSTM and Self-Attention to improve reputation assessment (Python, PyTorch). They extracted customer concerns by building a multi-gram keyword extraction tool using syntactic dependency analysis. They also built an automated operational insight reporting tool (SQL, Python) to assess strengths & weaknesses of the client’s user experiences.


    Our Team: Crystal Sun, Marwa Oussaifi

    Goal: Students used D3.js to create web-based visualization tools for presenting the number of accessible jobs and trip patterns within San Francisco. They automated complex data preprocessing and data pipelines to accommodate different scenarios when collecting, processing, and piping the data using Python. On another project, they implemented different ML algorithms to predict auto ownership per household.


    Our Team: Xinran Zhang, Zitong Zeng

    Goal: Students developed a Scala notebook to help the customer service team analyze user-retention metrics such as DAU and Return Retention. They provided an anonymization routine for sensitive impressions and events data using a Spark UDF and MurmurHash3. They explored alternatives to traditional parametric tests to improve the credibility of A/B test analysis. They also researched and implemented outlier detection methods in Scala.


    Our Team: Xinke Sun, Jyoti Prakash Maheswari

    Goal: Students used SQL to track KPIs and built tables to store daily metrics using Python. The students applied deep learning techniques to understand the content of real-estate listings consisting of images and text and to predict lead submission.


    Our Team: Viviana M. Peña-Márquez, Neha Tevathia

    Goal: Students built an NLP model to identify malware names using a CBOW model, built in PyTorch, and leveraged open source data from Twitter. They created and implemented a pipeline to automatically collect tweets using Twitter’s API, applied machine learning and natural language processing algorithms to detect entities, and fed daily detections to a dashboard.


    Our Team: Tian Qi, Jessica Wang

    Goal: The students deployed a machine learning pipeline to predict which users would become paid users within the next two weeks using Python and SQL. In another project, the students predicted short-term purchases using Python.


    Our Team: Jenny Kong

    Goal: The student used machine learning with fMRI data to classify network patterns of concurrently activating brain regions that arise during successful high-fidelity memory retrieval.


    Our Team: Miguel Romero Calvo

    Goal: The student employed deep learning techniques to improve the performance of neural networks on small datasets. He also conducted research on training and transfer learning methodologies.


    Our Team: Anish Dalal, Robert Sandor

    Goal: Students employed deep learning techniques in computer vision to accurately segment ventricles in the brain using PyTorch. On another project, they built a text classifier that predicts cancer patient survival from physician notes using Python, PyTorch, Bash, and FastAI.


    Our Team: Alan Perry, Tianqi Wang

    Goal: Using Python, students employed deep learning techniques to segment different organs, make dose-volume diagnoses, and transform MRI images into CT images.


    Our Team: Max Alfaro, Divya Bhargavi

    Goal: Students built deep learning models to classify different views of echocardiograms. They performed exploratory data analysis to become familiar with medical terminology.


    Our Team: Victoria Suarez, Harrison Mamin

    Goal: Students built a recommender system in Python to match candidates to job postings, which improved recruiters' efficiency by 56%. They researched methods of detecting unconscious gender bias in performance reviews using word embeddings and neural networks. On another project, the students worked on two approaches to extract causal language pairs from text: one using a deterministic rule-based engine and one using a neural network. They integrated both into a web-based UI using Flask.


    Our Team: Adam Reevesman, Meng-Ting Chang

    Goal: Using Python, students built a rule-based algorithm to identify when a user finished a route but forgot to stop their tracker in the MapMyFitness app. They also performed exploratory data analysis (EDA).


    Our Team: Tomohiko Ishihara, Maria Vasilenko

    Goal: Students gathered user reviews of Personal Health Record apps from the Apple App Store and Google Play Store and used Latent Dirichlet Allocation (LDA) topic modeling, as implemented in Gensim, to find which app features users talk about most. On another project, they built models to predict whether a member is likely to get pregnant by creating a data set, performing feature engineering, and building machine learning models.


    Our Team: Joy Qi, Jialiang Shi

    Goal: Students built machine learning classification models to distinguish legitimate email domains from fraudulent ones. They employed machine learning techniques to classify whether an unknown domain is trusted or untrusted. On another project, they created a script to scrape social links from web pages.


    Our Team: Yihan Wang, Jian Wang

    Goal: Students predicted water utility customer nonpayment with a Random Forest model and implemented the model in Python in Valor’s codebase. They segmented utility customers with K-means clustering to understand their behavior. On another project, they applied multiple time series models to identify malfunctioning water meters. They used SQL and Python to build an end-to-end workflow for the project.


    Our Team: Shulun Chen

    Goal: The student used SQL, Python, and Swagger to build data pipelines.


    Our Team: Ziyu Fan

    Goal: The student applied data science and machine learning techniques to forecast e-commerce retailer sales using Python. On another project, she used machine learning and NLP to find anomalies in product matching.


    Our Team: Brian Dorsey, Fiorella Tenorio

    Goal: Students used Python, TensorFlow, and time series methods to build prediction models. They worked on a model to predict the probability of client purchases and on a demand prediction model.



    Our Team: Arpita Jena, Devesh Maheshwari, Alexander Howard

    Goal: Students employed NLP and deep learning techniques to classify sensitive information in Capital One's internal domain using Python. The result was wrapped in a Flask web app. Another project involved software engineering with the goal of automating Capital One's AWS authentication process.


    Our Team: Yiqiang Zhao, Gongting Peng

    Goal: Students employed machine learning methods to build a data pipeline for anomaly detection. They also used Python for data exploration.


    Our Team: Stephen Hsu

    Goal: Students worked within a multidisciplinary team to offer data science services to a nonprofit organization. Specifically, students developed an NLP-based model in Python to classify forum posts so that forum questions could be appropriately matched with professionals who are best positioned to answer them.


    Our Team: Timothy Lee

    Goal: Students did data pipeline work using the Python API service. Their work involved classifying PDF files using XGBoost in Python and collecting research data samples using Python.


    Our Team: Holly Capell

    Goal: Students at Eventbrite used machine learning in Python to model ticket sell-through rates in order to help the company identify platform features that drive event sell-out. They performed cohort analyses using Python to help understand the revenue life-cycle of Eventbrite customers and investigated seasonality in ticket sales, using SQL to query data and R to create data visualizations.


    Our Team: Bingyi Li, Christopher Csiszar

    Goal: Students built a web-based system to classify municipal bonds in order to assure government compliance using Python and Flask. They used big data analytics, machine learning and clustering algorithms to automate the classification of the bank's municipal bond portfolio into High Quality Liquid Asset bonds. This work replaced the need for inefficient and costly external consultants to perform this task quarterly.


    Our Team: Yue Lan, Akshay Tiwari

    Goal: Students wrote SQL scripts to perform exploratory data analysis and built a data pipeline to ingest airline customer data. They also employed machine learning techniques to build and validate models using Python to predict bookings and cancellations of airline tickets as part of the Flyr airline revenue management system. They also worked on another project that used machine learning techniques to predict customer budget and price sensitivity.


    Our Team: Jake Toffler

    Goal: Students clustered individual pitchers' pitches by pitch type using level-set trees, a density-based clustering method, in Python.


    Our Team: Shikhar Gupta, Fei Liu

    Goal: Students used deep learning CNN techniques to identify diseases in chest X-rays.


    Our Team: Ting Ting Liu, Jose Antonio Rodilla Xerri

    Goal: Students employed machine learning techniques to identify relevant factors that may affect whether or not a Kiva loan will reach full funding. They developed a web application powered by a random forest model in order to predict the success of loans, highlight which factors are driving those loans, and provide suggestions on how to improve them.


    Our Team: Vinay Patlolla, Jason Carpenter

    Goal: Students worked on two projects with Manifold. In the first project, they used machine learning models such as Logistic Regression, Random Forest, and XGBoost to detect faults in oil pipelines using Python. In the second project, they developed a multi-camera multitracking pipeline to track people in a scene using deep learning and clustering techniques.


    Our Team: Chenxi Ge

    Goal: Students worked on a complex computer vision problem using deep learning with the goal of locating characters to decode the character sequence.


    Our Team: Tyler White, Jing Song

    Goal: Students used Spark to obtain data to build a public-facing Firefox Health report dashboard. They used time series analysis to predict ESR usage and checked the validity of t-tests with non-parametric tests.


    Our Team: Danai Avgerinou, Shannon McNish

    Goal: Students worked on a data engineering project to build a small centralized data warehouse to host MTC's data. They also worked on a data science project using NLP with FasTrak survey data and made discoveries involving ridership patterns of Clipper users.


    Our Team: Natalie Ha, Christopher Dong

    Goal: Students built a text classification model to categorize survey responses and found correlations with NPS. On another project, they built a Tableau dashboard for funnel analysis on reported content in the platform. They also built and deployed (with Airflow) a machine learning model using Spark ML to predict survey text responses and created complex SQL queries to calculate metrics regarding content moderation.


    Our Team: Guoqiang Liang

    Goal: Students employed machine learning techniques to assign probabilities of churn using Python and Spark. On another project, they used NLP techniques to classify legal documents.

    Our Team: Ernest Kim, Davi Alexander Schumacher


    Our Team: Dixin Yan, Spencer Stanley

    Goal: At Pocket Gems, students employed machine learning techniques to build a churn model and a matchmaking model for a newly developed game. They also researched and developed models to help the marketing team with channel attribution and creatives optimization. On another project, they used time series methods to predict the impact of paid advertising channels on organic install volume.

    PRICE F(X)

    Our Team: Neerja Doshi, Alvira Swalin

    Goal: Students employed machine learning (Python) and deep learning (PyTorch) techniques to build a product recommendation system.


    Our Team: Khoury Ibrahim, Danielle Savage

    Goal: Students used PyTorch to build a multi-label image recognition CNN that identifies contaminants in Recology's images of landfill, recycling, and compost waste.


    Our Team: Sara Mahar, Nicha Ruchirawat

    Goal: Students automated the real-time detection of a data feed failure from Google, Bing and Facebook sources using a suite of standardized hypothesis tests. On another project, they identified significant clusters of words from tens of thousands of omni-channel reviews with Latent Dirichlet Allocation (LDA) topic modeling and k-means clustering.


    Our Team: Kishan Panchal

    Goal: Students used machine learning techniques to create a weekly cohort-based churn prediction system for season ticket holders. On another project, they created a data ingestion system to get external ticket data into the team's data warehouse.


    Our Team: John Rumpel, Kaya Tollas

    Goal: Students used Python to compute accessibility metrics for transit stops (this was later used in their study on TNCs and ridership). On another project, they prepared data for input into the SFCTA travel model. On a third project, they visualized traffic incidents with an interactive map using JavaScript.


    Our Team: Mathew Shaw, Cara Qin

    Goal: Students employed machine learning techniques to identify suspicious users, predict LTV, and classify game themes.


    Our Team: Daniel Grzenda, Jade Yun

    Goal: Students employed graph theory to quantify variants and analyze protein data from the blood of patients using Python.


    Our Team: Nimesh Sinha, Zizhen Song

    Goal: Students used natural language processing and machine learning techniques to build a data pipeline recommendation engine. On another project, they worked on clustering customers based on login data.


    Our Team: Ker-Yu Ong, Chen Wang

    Goal: Students compared cloud databases (AWS, Google BigQuery, Snowflake, and Databricks) by running benchmarking queries for research use cases. They also ran machine learning models to classify WSJ articles and used NLP techniques to extract information from news articles and identify topics in Amazon product reviews.


    Our Team: David Kes

    Goal: Students developed an exponentially weighted moving average (EWMA) control charting scheme in Python to detect bus detours for a variety of transit agencies. The algorithm was used to help automate the customer success team's process for detecting defaults in any transit agency's systems.
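    For readers unfamiliar with EWMA control charts, the sketch below shows the general technique. It is not the students' implementation: the monitored quantity (for example, a bus's distance from its scheduled route), the smoothing weight `lam`, the limit `width`, and the function name are all assumptions made for illustration.

```python
from statistics import mean, stdev


def ewma_alarms(values, baseline, lam=0.2, width=3.0):
    """Return indices of `values` where the EWMA statistic leaves its control limits.

    `baseline` is a sample of in-control observations used to estimate the
    process mean and standard deviation.
    """
    mu = mean(baseline)
    sigma = stdev(baseline)
    z = mu  # the EWMA statistic starts at the in-control mean
    alarms = []
    for t, x in enumerate(values, start=1):
        z = lam * x + (1 - lam) * z
        # Variance factor of the EWMA statistic after t observations.
        var_factor = (lam / (2 - lam)) * (1 - (1 - lam) ** (2 * t))
        limit = width * sigma * var_factor ** 0.5
        if abs(z - mu) > limit:
            alarms.append(t - 1)
    return alarms


# Made-up numbers: the last three readings drift far from the baseline level.
in_control = [0.9, 1.1, 1.0, 1.05, 0.95, 1.0, 1.1, 0.9]
readings = [1.0, 1.05, 0.95, 3.0, 3.2, 3.1]
flagged = ewma_alarms(readings, in_control)
```

    Because the EWMA statistic averages over recent observations, it reacts to sustained shifts (such as a bus staying off its route) while damping isolated noisy readings.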


    Our Team: Thy Khue Ly, Beiming Liu

    Goal: Students used machine learning to predict default risks of customers and also to cluster them into groups based on their credit card transactions using Python. On another project they used NLP to predict transaction categories, and on a final project they used time-series and machine learning to predict user annual income with transactional data.


    Our Team: Feiran Ji, Lingzhi Du

    Goal: Students predicted users’ purchasing behavior for future games using machine learning techniques and deployed an end-to-end pipeline to put the model into production on Hadoop clusters using Spark. Additionally, they visualized insights and developed an interactive dashboard to be used in conjunction with the predictive model.


    Our Team: Siavash Mortezavi, Kerem Can Turgutlu

    Goal: Students used traditional machine learning techniques to predict overall survival of meningioma cancer patients and used deep learning and computer vision to automatically segment brain structures.


    Our Team: Sangyu Shen, Qian Li

    Goal: Students employed machine learning techniques to classify patients with side effects from radiation therapy using Python.


    Our Team: Ryan Campa, Zhengjie Xu

    Goal: Students used machine learning to predict stride and cadence to help runners improve their form. They also used unsupervised learning to identify organized race events from millions of rows of workout data.


    Our Team: Savannah Logan, Sooraj Mangalath Subrahmannian

    Goal: Students applied NLP techniques in Python to identify the main complaints in a website survey. They then employed machine learning techniques to identify areas of possible improvement in coverage rejection time.


    Our Team: Taylor Pellerin, Devin Bowers

    Goal: Students employed machine learning techniques to help identify fraudulent email sending behavior. They prototyped internal tooling, documentation, and more. Additionally, they built a machine learning classifier to help identify new legitimate email services. This allows Valimail to quickly scan through email aggregate reports to identify legitimate services that email on a customer's behalf.


    Our Team: Jingjue Wang, Kunal Kotian

    Goal: Students trained a recurrent neural network to forecast water consumption and flagged unusual water meter readings by comparing the deviation of forecasts from true values. They wrote production code for a pipeline to extract and transform data, train deep learning models using TensorFlow, and generate forecasts for several water consumption time series.


    Our Team: Nishan Madawanarachchi, Chengcheng Xu

    Goal: Students predicted weight loss among customers using linear regression with R. On another project, they used logistic regression in Python to predict the urgency level of clients' messages. They also built a chatbot to help new users with the onboarding process.


    Our Team: Ford Higgins, Ian Pieter Smeenk

    Goal: Students contributed to a 'football genome' project for stylistic classification of teams using Python. They built a college basketball statistical model that builds on top of existing models in order to improve them and designed tools for football coaches to use as an aid in scouting opposing teams. These projects were completed using Python, R, SQL, and D3.js.


    Our Team: Deena Liz John, Patrick Yang

    Goal: Students used Python, SQL, and Looker to implement A/B testing at Vungle, comparing different ad templates, levels of compression, and more. They also aided in the development of an in-house A/B testing platform.


    Our Team: Liz Chen, Yu Tian

    Goal: Students developed an end-to-end pipeline in Python using computer vision and deep learning technologies for a company promotional product to recognize online promotions from images. On another project, they deployed REST APIs into production and designed experiments to compare the results from different methods.


    Our Team: Vanessa Zheng

    Goal: Students developed fraud detection models on a high-dimensional imbalanced dataset using Python. On another project, they devised and evaluated global risk metrics to monitor, condition and strengthen fraud models with SQL & Python.


    Our Team: Sri Santhosh Hari

    Goal: Students used time series techniques to forecast customer churn. Additionally, they used machine learning techniques like Random Forest and XGBoost to identify key features affecting bookings to predict members' likelihood of booking a car.