Practicum Project Program

One of the most meaningful ways we collaborate with our Data Institute member organizations is through our esteemed nine-month practicum program. This program gives MS in Data Science students the opportunity to see the societal impact of their work as they apply skills learned in the classroom to real-world challenges in real time, delivering tangible value to our project partners.

Members are guaranteed access to faculty-led project teams of one to six students, depending on their membership agreement. These teams are equipped to address data science problems across a range of industries, applying advanced AI techniques to complex, nuanced problems that align with your organization's broader business context and objectives.

Through this program, our students gain invaluable experience while providing innovative solutions that drive meaningful impact for our member organizations.

Have a challenge you'd like solved?

The first step is to discuss possible projects from your organization and how they align with faculty and student expertise.

Submit a Project

Past Projects

As you assess your needs and develop your project, we also encourage you to explore some of our past projects. This will provide insight into the diverse range of challenges our member organizations and student teams have successfully tackled together.

  • AgMonitor

    Student Team: Chenxi Li, Theodore Mefford
    Faculty Mentor(s): Shan Wang
    Company Liaison(s): Stanley Knutsen, Dr. Tim Hartz

    Project Outcomes: The "Crop Alert to Protect Farms and Save Water" project aimed to decrease water usage during droughts while preserving crop yields and quality. Using AgMonitor's extensive data resources, students developed and validated predictors of water stress and soil moisture. This environmentally beneficial initiative addressed agricultural water consumption, benefiting 200,000 acres in California, and drew on the expansive OpenET dataset covering 14 states.

    Alaska Airlines

    Student Team: Joren James, Haonan Li, Anirav Jain
    Faculty Mentor(s): Shan Wang
    Company Liaison(s): Tak Wong

    Project Outcomes: In two projects, students worked to strengthen Alaska Airlines' marketing approach and enhance the guest experience. Project 1 focused on refining the promotion of the Mileage Plan program and the Alaska Airlines Visa Signature Card. Through careful data analysis, students identified optimal moments for marketing, considering guest interactions, flight frequency, geographical relevance, and signup likelihood, with the goal of maximizing the impact of marketing efforts. Project 2 focused on audience segmentation, uncovering diverse guest preferences, from fare-conscious travelers to those seeking amenities, so that promotions could be tailored to distinct guest segments and improve the overall Alaska Airlines experience.


    Student Team: Adit Shrimal, Kuan Pin Chen, Maneel Karri, Ajayeswar Peddyreddy
    Faculty Mentor(s): Robert Clements
    Company Liaison(s): Brad Kenstler, Anila Joshi, Vidya Sagar Ravipati, Divya Bhargavi

    Project Outcomes: MLSL enlisted students to develop modular ML solutions for targeted industries (healthcare and life sciences, media and entertainment, and manufacturing). Their goals included collaborating with MLSL's repeatable solutions team on projects spanning multimodal solutions, computer vision, forecasting, and knowledge graph modeling to address specific industry needs and challenges.


    Student Team: Johnny Ka Chun Chau, Yuan Yao
    Faculty Mentor(s): Robert Clements
    Company Liaison(s): Chayan Chakrabarti

    Project Outcomes: In this project, students were tasked with using machine learning to build prototype features designed to enhance user productivity and satisfaction. Students worked on various ML models, including deep learning and gradient boosted trees, experimenting with new approaches. They also played a role in designing advanced features and embeddings, evaluated model performance, and collaborated closely with experienced machine learning scientists, engineers, and data scientists to contribute to prototype platform features.


    Student Team: Amy Tang, Theo Byunghyn Kim
    Faculty Mentor(s): Jeff Hamrick
    Company Liaison(s): Srividya Krithivasan, Victor Mora

    Project Outcomes: Students collaborated with internal data science teams to create a Finance Chatbot. The project aimed to enhance sales analytics by using NLP/AI technology to answer queries. Students explored various NLP algorithms and datasets, and the project concluded with creative visualizations for stakeholder communication and a successful deployment within the firm's infrastructure.


    Student Team: Matt Marwedel, Jazz Sun 
    Faculty Mentor(s): Robert Clements
    Company Liaison(s): Michael Su, Jason Weiner

    Project Outcomes: Students undertook a project involving NLP analysis of client feedback surveys. Their goal was to extract features from unstructured feedback and create a localized model to differentiate between experience provider-related issues, concierge-related issues, and external problems. Additionally, they worked on data ETL, focusing on transitioning ETL processes from cloud-based no-code tools to an Airflow-based pipeline for tools like Zendesk and Salesforce. They also planned a data mart exercise to determine tables for prosumer usage, serving COO, engineering, data analysts, and others.

    Boston Children’s Hospital

    Student Team: Yu-Chuan Chiu, Deepak Singh
    Faculty Mentor(s): William Bosl
    Company Liaison(s): Michelle Bosquet Enlow, PhD

    Project Outcomes: Students engaged in a project titled "supervised tensor and matrix joint factorization for multimodal data fusion and biomarker extraction." They used Python, tensor and matrix factorization, Bayesian statistics, and machine learning to analyze EEG data for early prediction of mental and neurodevelopmental disorders. Their computational objective was to develop a coupled tensor and matrix factorization algorithm (SupCP+M) and apply it to a neurodevelopmental dataset containing EEG, clinical measures, sociodemographic indicators, and genetic data. The project aimed to extract interpretable nonlinear EEG features as potential biomarkers for brain-based disorders, with a focus on childhood anxiety and cognitive neurodevelopment. Students also worked on graphical representations of latent features, and the project offered opportunities to learn about nonlinear dynamical analysis and computational neuroscience.

    Buck Institute

    Student Team: Lingraj Vannur    
    Faculty Mentor(s): Daniel O’Connor
    Company Liaison(s): Chunkai Zhou, PhD

    Project Outcomes: In the first project, students in the Zhou lab developed a deep learning-based imaging analysis platform to map aging-related protein changes in cells, with the aim of creating a molecular roadmap of aging. Using Python, Java, and TensorFlow, they enhanced existing neural networks, streamlined data analysis, and co-authored research papers. In the second project, they explored the potential of AlphaFold2 and molecular dynamics simulations to predict protein folding and assist drug and antibody selection, contributing to advances in structural biology with machine learning tools.

    California Department of Fish and Wildlife

    Student Team: Xin Ai, Sharon Dodda
    Faculty Mentor(s): James Wilson 
    Company Liaison(s): Alex Heeren, Brett Furnas

    Project Outcomes: Working at the Wildlife Health Laboratory (WHL) in collaboration with CDFW scientists, students focused on resolving human-wildlife conflicts, particularly with black bears. Their research aimed to support an update to the state's black bear conservation plan. Using text and sentiment analysis, they examined social media data from platforms such as Twitter and Nextdoor, expanding previous work on coyotes. Students also aimed to identify patterns in black bear discussions and develop a real-time data dashboard for wildlife monitoring.

    Candid

    Student Team: Zemin Cai, Harrison Jinglun Yu
    Faculty Mentor(s): Shan Wang 
    Company Liaison(s): Cathleen Clerkin

    Project Outcomes: Candid's Insights department engaged students in impactful research projects in data ethics. These projects included an examination of diversity, equity, and inclusion within nonprofits, an exploration of nonprofits' societal impact, and an investigation into real-time grantmaking data, particularly in relation to issues like racial equity. Students were tasked with identifying factors influencing organizations' willingness to share demographic data and analyzing data to predict nonprofits' societal impact. Additionally, they explored methodologies to provide real-time insights into philanthropic trends while addressing potential biases and confounding factors. These projects harnessed various data science techniques and underscored the importance of ethical considerations in data analysis.

    Carbon Health

    Student Team: Guru Gopalakrish
    Faculty Mentor(s): Mustafa Hajij
    Company Liaison(s): Hoda Noorian

    Project Outcomes: This project addressed the prediction of no-show appointments in urgent care: the team researched industry best practices and built a minimum viable product (MVP) model. The team also sought to personalize appointment-reason lists based on user data using recommendation systems, with potential production implementation and an analysis of the impact on appointment conversions.

    DagsHub

    Student Team: Kang-Chi Ho, Yichen Zhao
    Faculty Mentor(s): Robert Clements
    Company Liaison(s): Nir Barazida, Guy Smoilovsky, Dean Pleban 

    Project Outcomes: Students in these projects undertook a wide range of tasks. In the first project, they explored the integration of machine learning tools with DagsHub, contributing novel integrations and content. The second project centered on replicating and extending the Chinchilla research, which involved tracking components and a comprehensive review of prior work, with the aim of making large language models more accessible. In the third project, students extended the functionality of a Hacker News bot, adding user input on content preferences and developing a recommendation engine, with the goal of delivering valuable contributions to the technology community.

    Environmental Defense Fund

    Student Team: Varun Hande, Adam Ansari
    Faculty Mentor(s): Mustafa Hajij
    Company Liaison(s): Christopher Cusack

    Project Outcomes: Students improved fishery monitoring by enhancing ML algorithms for SmartPass, a smart camera system. The aim was to democratize these AI algorithms, making them accessible to more practitioners and boosting global fisheries management.


    Student Team: Akshay Pamnani, Patricia Ornelas
    Faculty Mentor(s): Victor Palacios
    Company Liaison(s): Thiago Marzagão, Esther Liu

    Project Outcomes: Students used Python, SQL (with Google BigQuery), basic statistics (mostly hypothesis testing), machine learning, and Tableau. In the first project, they improved calorie burn estimation to enable more accurate user tracking and better recommendations. In the second project, they used machine learning to predict workout duration, optimizing exercise recommendations.

    Four Analytics

    Student Team: Ensun Park, Nischal Mishra
    Faculty Mentor(s): Jeff Hamrick
    Company Liaison(s): Kirby Zhang

    Project Outcomes: Students aimed to enhance a pricing system based on labor hours. They considered factors like client history, scope, location, and space size. In cases with ample historical data, they sought a real-time ML model, incorporating market rates, square footage, days, etc., to align prices with client expectations. They were also tasked with using clustering techniques for cases with less historical data.

    W.L. Gore & Associates

    Student Team: Cho Hsum Yang, Camilo Chaves
    Faculty Mentor(s): Daniel O’Connor
    Company Liaison(s): Vasu Venkateshwaran, Noah Hodgson, James Cronin

    Project Outcomes: Students worked with image data from microscopy and pathology experiments at Gore, aiming to relate material structure to properties. They utilized ML and computer vision techniques for semantic/panoptic segmentation, boundary/key point detection, and practical metric extraction. They also explored data augmentation and synthetic generation. Finally, they developed user-friendly ML model training and usage code within an existing Python library.

    Kidas Inc.

    Student Team: Raghavendra Kommavarapu
    Faculty Mentor(s): Mustafa Hajij
    Company Liaison(s): Amit Yungman

    Project Outcomes: Students optimized point-of-interest detection algorithms, including hate speech and sexual content detection, using data and metadata. They took part in developing age detection in audio and text, emotion detection in audio and text, and voice changer detection in audio. Additionally, they worked on displaying data visualizations on personal pages based on user activity and algorithm results using Python.


    Student Team: Jinwei Sun
    Faculty Mentor(s): Victor Palacios
    Company Liaison(s): Victor Palacios

    Project Outcomes: The student team learned KNIME and PyTorch, focusing on graph neural networks. They produced business-oriented articles and videos showcasing tool usage, gaining skills in explaining deep learning to non-technical audiences. The role also involved teaching KNIME in paid courses, emphasizing the intersection of education and data science, including public speaking and business engagement.

    Metaphor Data

    Student Team: Aydin Schwartz, Prithvi Nuthanakalva
    Faculty Mentor(s): Diane Woodbridge
    Company Liaison(s): Kirit Basu, Mars Lan

    Project Outcomes: The team developed a Q&A Slack/Teams bot using OpenAI's ChatGPT LLM to answer natural language questions about customers' datasets, dashboards, and knowledge bases. They also added a generative AI feature that summarizes long Slack threads into digestible knowledge that can be persisted for future reference. Both features have since been rolled out to customers for testing.


    Metropolitan Transportation Commission

    Student Team: Akul Bajaj, Lantin Su
    Faculty Mentor(s): Cody Carroll
    Company Liaison(s): Kearey Smith, Kaya Tollas, Aksel Olsen

    Project Outcomes: Students undertook four projects for the Metropolitan Transportation Commission (MTC), encompassing data engineering, machine learning, and data analysis. Their primary objective was to automate data processes, enhance data accuracy, and facilitate informed decision-making. These projects involved diverse tools and techniques such as Python, AWS, natural language processing, data visualization, image classification, and machine learning. The students contributed to improving regional planning, resilience evaluation, data management, and predictive modeling within MTC, aligning with the organization's mission to enhance transportation infrastructure and resilience.

    Oportun Inc.

    Student Team: Hanna Siew Tsien Lee, Shubhangi Badwaik
    Faculty Mentor(s): Jeff Hamrick
    Company Liaison(s): Jonathan Sage

    Project Outcomes: Students used Python, SQL, AWS Cloud, and machine learning in two projects. The first, "Member re-engagement Propensity Modeling," aimed to understand customer behavior and engagement across Oportun's ecosystem, enabling better personalization. Techniques included graph analysis and building a re-engagement propensity model. The second project involved migrating Credit Card/Embedded Finance to a containerized infrastructure, enhancing workflow and reducing costs while providing hands-on experience with modern data infrastructure.


    Student Team: Kyle Kayhan Eryilmaz, Youshi Zhang
    Faculty Mentor(s): Daniel Jerison
    Company Liaison(s): Tristin Beckman 

    Project Outcomes: Students collected video transcripts and metadata from various media platforms, using pretrained language models such as BERT, RoBERTa, and BART for sentiment analysis, topic modeling, entity recognition, and narrative detection. They used SQL and Python for data extraction and analysis, and frameworks such as Hugging Face, PyTorch, scikit-learn, and Metaflow, alongside AWS, for model training and deployment. Their projects aimed to identify influential content creators and extract interview details from video content, deepening understanding of content dissemination and creator communities.


    Student Team: Matthew Wheeler, Nhi Pham Nguyen
    Faculty Mentor(s): Jeff Hamrick
    Company Liaison(s): Michael Signorotti

    Project Outcomes: Students worked on the Image Labeling Infrastructure Development project, which aimed to streamline the collection, quality control, and use of labeled data for the computer vision team. They enhanced existing code, created labeling and quality control scripts, and planned to migrate this work to a workflow execution tool. Tools used included SageMaker, SageMaker Ground Truth, Jenkins, JupyterLab, GitHub, and Python.

    Propeller Health

    Student Team: Preetham Pathi, Manish Vuppugandla
    Faculty Mentor(s): Shan Wang
    Company Liaison(s): Connelly Doan, Noah Matsuyoshi

    Project Outcomes: The students' project at Propeller focused on deriving insights from behavioral analytics data on respiratory disease patients who use the mobile app. They built a Patient Experience Product Metrics Tableau workbook, digging into app behavior data and exploring creative ways to display and analyze metrics. They also conducted exploratory modeling of the relationship between app engagement and patient retention, providing direction for patient engagement strategies. Technologies included Redshift (SQL) for reporting queries and Python/Amazon SageMaker for modeling.

    Salk Institute 

    Student Team: Yu-Hsin Wang, Mohana Medisetty
    Faculty Mentor(s): Cody Carroll
    Company Liaison(s): Uri Manor

    Project Outcomes: The students worked on projects at the WABC involving large image datasets from various sample types, including brain, tumor, and plant tissues. They used Python-based deep learning libraries to address tasks such as disease state prediction, development of a deep learning-based image degradation tool, object tracking in live-cell videomicroscopy data, and motion prediction from single snapshots. They also explored new loss functions for super-resolution to enhance image quality. The goal was to package these tasks into accessible pipelines in tools such as ImJoy or napari.

    San Francisco County Transportation Authority

    Student Team: Pei Wang, Madhav Ponnudurai 
    Faculty Mentor(s): James Wilson
    Company Liaison(s): Dan Tischler

    Project Outcomes: The students worked on three projects for SFCTA. Project #1 involved building a public-facing count portal to facilitate identification and dissemination of vehicle, pedestrian, and bicycle counts collected over a decade. Project #2 utilized the SimWrapper platform to create dashboards reporting travel demand forecasting model outputs and facilitating scenario comparisons. Project #3 focused on developing methods to enhance SimWrapper's capacity to display large skim datasets for better QA/QC and analysis of transportation network changes.

    SoFi Stadium

    Student Team: Ity Soni, Justin Can
    Faculty Mentor(s): Daniel Jerison
    Company Liaison(s): Melanie Palmer

    Project Outcomes: The students contributed to the Data Strategy team at SoFi Stadium and Hollywood Park, utilizing Google Analytics Suite, Python, R, and machine learning techniques. They worked on three projects: creating an internal pricing tool for events, conducting consumer market basket analysis to optimize marketing strategies, and performing sentiment analysis on event surveys to identify guest pain points and improve operational workflows. These projects aimed to enhance revenue generation and customer experience.

    Stanford Graduate School of Business

    Student Team: Rushil Manglik
    Faculty Mentor(s): Victor Palacios
    Company Liaison(s): Natalya Rapstine, Amy Ng

    Project Outcomes: The students worked on a project called "Layout Parser" at the GSB, tackling the challenge of parsing table text and numbers from old documents, some predating 1900. They explored deep learning approaches using modern layout parsers to automate the extraction of information from tables with varying layouts. The goal was to improve accuracy and efficiency when dealing with old or misformatted tables, for which manual transcription was time-consuming and costly.

    Stanford University, Ophthalmic Informatics and Artificial Intelligence Group

    Student Team: Vichitra Kumar, Devendra Govil
    Faculty Mentor(s): Cody Carroll
    Company Liaison(s): Sophia Wang

    Project Outcomes: Students explored the integration of various data modalities, including electronic health records, free-text data, and ophthalmic patient images, to create predictive models for glaucoma progression.  They also worked on enhancing model trustworthiness by developing approaches for explaining complex clinical prediction models that use multiple data modalities, such as structured data, text data, and imaging data from electronic health records.

    SubWire

    Student Team: Bharadwaj Allu, Harsh Praharaj
    Faculty Mentor(s): Mustafa Hajij
    Company Liaison(s): Michael Terry, Alex Davidoff

    Project Outcomes: The students worked on two projects within the context of SubWire. One project involved creating a model to collect and analyze user behavior metrics on the SubWire app, including watch time, shares, and their impact on user retention. The other project utilized web scraping techniques to gather user data from various social media platforms, aiming to develop a predictive model for virality based on relationships and engagement metrics.


    Student Team: Tejashree Ladhake, Akhil Gopi, Abhradeep Mukherjee
    Faculty Mentor(s): Diane Woodbridge
    Company Liaison(s): Joey Ahnn 

    Project Outcomes: The students designed and developed algorithms for generating complementary recipes based on user-entered recipes. They created an automated and scalable data pipeline that collects recipe and review data from various sources. This data was then integrated with a neural network-based flavor graph to calculate candidate recipes that pair well with the user's input. The resulting output takes into account both complementarity and diversity to enhance the overall user experience.

    The Nature Conservancy

    Student Team: Wan Chun Liao, Jessica Xinyi Wang
    Faculty Mentor(s): Cody Carroll
    Company Liaison(s): Kirk Klausmeyer, Nathaniel Rindlaub

    Project Outcomes: Students collaborated with The Nature Conservancy's Conservation Technology team, contributing to environmental conservation through data science. In Project 1, they developed a data pipeline to estimate flooding extent on fields used to support migratory wetland birds. In Project 2, they refined a wireless camera trap system using machine learning to identify invasive species and protect endemic wildlife on islands, focusing on Santa Cruz Island off California's coast. Their work helped enhance monitoring and conservation efforts.

    University of California, San Francisco: Clinical Informatics

    Student Team: Ankit Gupta, Joy Chuyi Huang
    Faculty Mentor(s): Shan Wang
    Company Liaison(s): Xinran Liu, MD, MS, FAMIA

    Project Outcomes: Students at UCSF collaborated on two projects. In the first project, they aimed to revolutionize physician evaluation metrics, similar to how sabermetrics transformed baseball. They explored various data science techniques, from traditional statistics to NLP, to assess physician discharge effectiveness. In the second project, students worked on predicting acute postpartum care utilization to reduce maternal morbidity. They refined an existing model using clinical data and machine learning, ultimately striving to optimize outpatient postpartum visits. Their work aimed to enhance healthcare practices and patient outcomes.

    University of California, San Francisco: Gastroenterology

    Student Team: Daniel Tinoco, Tzu An Wang
    Faculty Mentor(s): Shan Wang
    Company Liaison(s): Vivek Rudrapatna

    Project Outcomes: Students contributed to two projects. In the first project, they aimed to assess the environmental and economic implications of different colon cancer screening methods. They used Markov modeling and Bayesian methods to estimate carbon emissions associated with screening options, potentially influencing healthcare decisions and policy. In the second project, students worked on information extraction from clinical notes to enhance patient-level prediction modeling using electronic health records. Their contributions supported the development of algorithms for transforming unstructured clinical data into analyzable formats, improving patient care.

    University of California, San Francisco: Oncology (NLP)

    Student Team: Max Yizhi Ma, Sanchita Jain 
    Faculty Mentor(s): Carlos Garcia
    Company Liaison(s): Dr. Hui Lin, Dr. Jorge Barrios 

    Project Outcomes: Students participated in a project focused on developing Natural Language Processing (NLP) transformer models for estimating the prognosis of cancer patients using Electronic Health Record (EHR) clinical notes. They utilized various transformer models, including ClinicalBERT and XLNet, to analyze over 160,000 oncology data registries collected over a decade. The project aimed to enhance cancer care by predicting overall survival across multiple cancer sites and provided valuable experience in NLP and data mining in the medical field.

    University of California, San Francisco: Oncology (CV)

    Student Team: Andres Martinez, Riley Tianrui Hu, Yusong Wang
    Faculty Mentor(s): Carlos Garcia 
    Company Liaison(s): Dr. Tomi Nano, Dr. Hui Lin, Dr. Dante Capaldi

    Project Outcomes: Students participated in a project focused on automating the identification and segmentation of brain lesions in magnetic resonance (MR) images for radiosurgery. They utilized deep learning techniques with PyTorch, working with 3D MR images. The project aimed to enhance efficiency in radiosurgery treatment workflows, with guidance from experienced medical physicists.

    YLabs (Youth Development Labs)

    Student Team: Tejaswi Dasari
    Faculty Mentor(s): Diane Woodbridge
    Company Liaison(s): Robert On

    Project Outcomes: Students in the CyberRwanda project used various technologies and techniques to measure project progress and effectiveness. They used Google Analytics to track engagement metrics and designed KPI dashboards that generate data automatically. Challenges included manual data tracking, discrepancies between Google Analytics versions, and gaps in tracking product pick-ups. Integrating and using data from different sources, including the MongoDB pharmacy backend, for decision-making was identified as a crucial goal. In addition, the students developed an automated chatbot that generates answers from existing documents using natural language processing, reducing response wait times.

  • ACLU

    Our Team: Joleena Marshall
    Faculty Mentor(s): Michael Ruddy
    Company Liaison(s): Linnea Nelson, Tedde Simon, Brandon Greene

    Project Outcomes: The team developed a Python tool to acquire and preprocess publicly available data related to the Oakland Unified School District (OUSD) to investigate whether OUSD’s allocation of resources results in inequities between schools. The team also provided an updated analysis, including data visualizations, of educational outcomes for Indigenous students in a select number of Humboldt County unified school districts.

    Bay Area Rapid Transit (BART)

    Our Team: Zihao Ren, Yunhe Jia, Zipeng Hong
    Faculty Mentor(s): Steve Devlin
    Company Liaison(s): Wendy Wheeler, Yu Shen, Herbert Diamant

    Project Outcomes: The team analyzed BART train location data and location-related station message announcements across multiple data sources and tables within the BART system. The project began with exploratory data analysis to pinpoint and diagnose issues such as mismatched location and messaging information for a given train, error-prone lines and stations, and lines or trains with unusually variable arrival times. The team then identified and fixed data engineering issues that often led to these problems, and built statistical models to predict errors and quickly identify them as they occur. Finally, the team built an extract/transform/load (ETL) pipeline and a train movement dashboard for identifying and communicating estimated-time-of-arrival issues.


    Our Team: Abdus Khan, Isabella Zhai
    Faculty Mentor(s): Jeff Hamrick
    Company Liaison(s): Victor Mora

    Project Outcomes: The team developed a data-driven forecasting system for exchange-traded fund (ETF) flows. It performed feature importance analysis to identify the market and macroeconomic factors affecting the flows and experimented with different machine learning models to generate the forecasts. The team also provided a sensitivity analysis showing how each market and macroeconomic factor affects ETF flows.


    Our Team: Xinming Wang, Yufeng Xing
    Faculty Mentor(s): Diane Woodbridge
    Company Liaison(s): Michael Su, Taylor Smith

    Project Outcomes: The team developed a natural language processing (NLP) model to perform sentiment analysis on customer reviews. The team also developed and maintained Airflow pipelines for data management.


    Our Team: Marti Heit
    Faculty Mentor(s): Steve Devlin
    Company Liaison(s): Mustafa Abdul-Hamid, Christian Hanish, Jorge Costa

    Project Outcomes: The team worked on a series of small projects, including probabilistic predictions of professional soccer matches in the English Premier League (EPL); clustering of NCAA basketball players based on their style of play; translation of player clusters into context-relevant skill sets; a pipeline that automatically generates visualizations of shooting efficiency per shot zone in NCAA basketball; a metric to quantify and predict game excitement in different sports; and auto-generation of NCAA game reports with relevant match recap data and insights obtained using natural language processing techniques.

    California Department of Fish and Wildlife

    Our Team: Chandan Nayak, Isaac Lo
    Company Liaison(s): Brett Furnas, Christina Sloop

    Project Outcomes: The team used machine learning and natural language processing (NLP) techniques on social media data (e.g., scraped from Twitter) to better understand human-wildlife interactions.

    California Forward

    Our Team: Evie Klaassen
    Faculty Mentor(s): Michael Ruddy
    Company Liaison(s): Patrick Atwater

    Project Outcomes: The team built a Python tool to determine where high-wage jobs are located in California. This tool extends the data tools currently created and maintained by the organization. The team also developed a pipeline that cleans and prepares new public data when it is released, so that the tool’s outputs are regularly updated as new data become available.


    Our Team: Rachit Yadav, Cameron Meziere
    Faculty Mentor(s): James Wilson
    Company Liaison(s): Skyler Cranmer

    Project Outcomes: The team applied various statistical methods, as well as neural network models, to detect the presence of mental illness using fMRI (functional magnetic resonance imaging) data.

    Environmental Defense Fund

    Our Team: Ankush Gupta
    Faculty Mentor(s): Michael Ruddy
    Company Liaison(s): Christopher Cusack

    Project Outcomes: The team worked on a computer vision project aimed at enhancing an object detection system. The team developed an object detection model that detects small fishing vessels entering and leaving a port with high precision and high inference speed, even in harsh weather conditions. In addition, the team developed a tool that automates the preprocessing step of converting a custom dataset to an object detection dataset format, saving the annotation team manual effort.


    Our Team: Edith Lee, Mateen Saifyan
    Faculty Mentor(s): Yannet Interian
    Company Liaison(s): Claire Broad, Anne Chittum, Mike Fahey

    Project Outcomes: Students built a daily landing extract/transform/load pipeline to query and aggregate internal pipeline metadata to assist in pipeline ownership assignment and pipeline deprecation. The team then designed and built a drill-down dashboard to effectively visualize the granularity of the generated data. Other tasks addressed by the team included updating existing data pipelines to meet current coding standards and constructing metrics to evaluate pipelines.

    First Republic Bank

    Our Team: Ronica Gupta, Arman Hashemizadeh 
    Faculty Mentor(s): Jeff Hamrick
    Company Liaison(s): Aaron Frank, Xu Liu, Chris Csiszar, Mark Woodworth 

    Project Outcomes: Embedded within the financial planning and analysis unit, the team used natural language processing (NLP) to solve the unit’s named entity recognition (NER) problem. The team developed an end-to-end machine learning pipeline using NLP techniques, Bidirectional Encoder Representations from Transformers (BERT), and tree-based models to extract relevant information from 200-page portable document format (PDF) files.

    Freedom Financial Network

    Our Team: Jaysen Shi, Surbhi Prasad
    Faculty Mentor(s): Jeff Hamrick
    Company Liaison(s): James Olness

    Project Outcomes: The team built a price optimizer model to recommend optimal loan rates, with the aim of maximizing the total number of loans provided by the company. The team queried and organized the data with BigQuery from Google Cloud Storage. The model was built in Python using machine learning and optimization techniques. The proposed loan rates replaced the recommendations of a third-party analytics partner after the new model demonstrated an improvement in funded loans.

    Golden State Warriors

    Our Team: David Lyu, Britta Goldman
    Faculty Mentor(s): Steve Devlin
    Company Liaison(s): Ray Yocke

    Project Outcomes: The team focused on combining disparate data sources, including Warriors internal data from summer camp enrollment, season ticket purchases, and Chase Center retail sales, with external data from Ticketmaster and third-party ticketing apps. Once the data were combined and cleaned, the team built a model to predict future purchases from past purchase history over various time frames. Finally, the team worked with the engineering team to streamline and productionize the model, and with the marketing team to interpret actionable results.

    Hims and Hers

    Our Team: Karishma Chauhan, Jason Yu
    Faculty Mentor(s): Diane Woodbridge
    Company Liaison(s): Yao Liu, Long Nguyen

    Project Outcomes: The team developed and productized time series models to predict the impacts of television advertisements. Additionally, the team developed and productized machine learning and deep learning models to predict customer lifetime value.


    Our Team: Kooha Kwon, Srividya Krithivasan
    Faculty Mentor(s): Michael Ruddy
    Company Liaison(s): Edwin Zhang, Colleen Qiu, Christopher Olley, Lindsay Orr

    Project Outcomes: The team improved a risk prediction model, which estimates the total loss each policy will claim, through feature engineering, hyperparameter tuning, and experimentation with preprocessing methods. The team also developed a new model that identifies the precise location of a street-parked vehicle and alerts the mobile app user to upcoming parking restrictions, such as street sweeping.

    New York Mets

    Our Team: Brendan Jenkins, Seungju Han
    Faculty Mentor(s): Daniel Jerison
    Company Liaison(s): Jake Toffler

    Project Outcomes: In baseball, the fielding team wants to know where the ball is likely to be hit so that the fielders can be positioned in the best locations. For this project, the team applied machine learning techniques to predict the distribution of balls in play based on characteristics of the pitcher and batter. Their method substantially improved prediction accuracy, even in situations with limited historical data.

    Nextracker (Abnormal Detection Methods Team)

    Our Team: Tong Wang, Xinyue Wang
    Faculty Mentor(s): Jeff Hamrick
    Company Liaison(s): Chennan Li, Peng Liu

    Project Outcomes: The team developed anomaly detection methods for solar and wind trackers and sensors. The team defined abnormal behavior using time series techniques, including correlation coefficients and different notions of “distance” in the data set.

    Nextracker (Irradiance Forecasting Team)

    Our Team: Lucas Oliveira
    Faculty Mentor(s): Jeff Hamrick
    Company Liaison(s): Chennan Li, Peng Liu

    Project Outcomes: The team developed a library for analyzing and optimizing the performance of control software for trackers. The team also developed libraries for preprocessing irradiance data and forecasting irradiance, using both statistical and deep learning models.

    Nextracker (Solar Panel Design Team)

    Our Team: Michael Reigelman
    Faculty Mentor(s): Jeff Hamrick
    Company Liaison(s): Chennan Li, Peng Liu

    Project Outcomes: The team performed exploratory data analysis to help engineers identify areas of improvement for new solar panel designs. The team also created dashboards and libraries that enable engineers to continuously monitor specific features of the structural integrity of their designs.


    Our Team: Kyril Panilov
    Faculty Mentor(s): Daniel O’Connor
    Company Liaison(s): Ravi Narayanan

    Project Outcomes: The team researched recommender systems and machine learning applications in finance. The team also implemented content-based filtering, collaborative filtering, and hybrid approaches to recommender systems. Finally, the team presented a recommender model to potential Nisum clients.


    Our Team: Wei He, Mengting Xu
    Faculty Mentor(s): Jeff Hamrick
    Company Liaison(s): Christine Walsh, Ajish George

    Project Outcomes: The team used multiple machine learning models to generate user engagement analytics and predict credit card transaction amounts. On another project, the team improved the customer identification matching system by building a set of rules and tracking evaluation metrics for the identification algorithm.


    Our Team: Jih-Chin Chen, Derek Wolfgang Herwald
    Faculty Mentor(s): David Guy Brizan
    Company Liaison(s): Sarah Luger

    Project Outcomes: The team curated a dataset for a French-Bambara translation model by finding and cleaning existing translation data. This task involved researching aligners and implementing them in an alignment pipeline for unaligned data, as well as researching social strategies for annotating untranslated Bambara data. The team then designed a Kaggle-style competition for the translation models. Finally, given the lack of available lemmatization, the team tuned the hyperparameters of byte pair encoding.

    Pocket Gems

    Our Team: Shambhavi Gupta
    Faculty Mentor(s): Daniel O’Connor
    Company Liaison(s): Maxim Levet, Dixin Yan, Byron Han

    Project Outcomes: The team built and deployed language models to generate animation code scripts for content writers at Pocket Gems. The team also developed a churn prediction model to identify features contributing to player churn in a mobile game.

    Propeller Health

    Our Team: Cassidy Newberry, Anthony Wang
    Faculty Mentor(s): Diane Woodbridge
    Company Liaison(s): Ian Smeenk, Ben Theye, Connelly Doan

    Project Outcomes: The team developed a data pipeline and dashboard to analyze screen usage in an application. The deployed dashboard was delivered to the internal product team for feature improvement and key performance indicator (KPI) evaluation.


    Our Team: Dominnic Chant, Monashree Sanil
    Faculty Mentor(s): Diane Woodbridge
    Company Liaison(s): Minna Tao, Aijaz Patel, John LaBarge

    Project Outcomes: The team built a text classifier to automate the manual process of identifying customer locking accounts from comments data, using natural language processing (NLP) and machine learning models. Additionally, the team designed and developed a user interface to facilitate easy use of route sequencing tools. The team deployed their model as an application programming interface (API) on the Azure platform. Finally, the team designed and developed key performance indicators (KPIs) and Qlik Sense dashboards to help general managers optimize and manage routes more effectively.


    Our Team: Tongyao (Nancy) Ruan, Ka Yam
    Faculty Mentor(s): Yannet Interian
    Company Liaison(s): Mackenzie Greene, Jose Lobez, Deitrick Franklin, Cynthia Li

    Project Outcomes: Using A/B testing, the team analyzed how users interact with different interest groups across time, and assessed the depth of user interactions. The team developed a dashboard to share insights into the popularity of particular search terms and various topics among different interest groups.


    Our Team: Karsten Kao
    Faculty Mentor(s): David Guy Brizan
    Company Liaison(s): Kellie Meckenstock, Rui Li, Allie Akridge, Brad Null, Marine Lin, Sonika Cottmar, Hao Xu

    Project Outcomes: The team raised recall on neutral reviews from 7.7% to 61.5% by developing and tuning a Bidirectional Encoder Representations from Transformers (BERT) sentiment model. The team extended this project by building an MLflow pipeline for faster machine learning experimentation. Finally, after identifying issues in an analytics report, the team used Python to build a brand-extraction pipeline for Twitter text that improved recall by 19%.


    Our Team: Fan Li, Chandrish Ambati
    Faculty Mentor(s): Tahir Bachar Issa
    Company Liaison(s): Uri Mano

    Project Outcomes: The team re-implemented a previously published deep learning method for super-resolution of brain microscope images, using convolutional neural network (CNN) models built on fastai and PyTorch. The team improved on the resolution quality of the previous approach by using a perceptual loss function combined with self-supervised learning techniques such as contrastive learning and inpainting.

    Stanford Graduate School of Business

    Our Team: Neset Aydin
    Faculty Mentor(s): Steve Devlin
    Company Liaison(s): Brian Chiver, Natalya Igorevna Rapstine

    Project Outcomes: The team built an end-to-end automated extract/transform/load (ETL) pipeline using Python and the Redivis API to support faculty data needs, for example, scraping, organizing, and storing periodic Securities and Exchange Commission (SEC) reports in Redivis for faculty analysis. The team also created tutorials and demonstrations to help faculty make better use of the pipeline and the Redivis platform.

    Stanford Medicine

    Our Team: Sneha Kumari, Sunil Kumar J S
    Faculty Mentor(s): Michael Ruddy
    Company Liaison(s): Sophia Ying Wang, Wendeng Hu

    Project Outcomes: The team explored multimodal deep learning models to identify glaucoma patients who would need surgery in the near future. The team built a fusion model combining text data, image data, and structured data to enhance model performance. They also performed explainability studies to better understand which features the model relied upon to make predictions.


    Our Team: Arman Tavana, Kaihang Zhao
    Faculty Mentor(s): Danielle Savage
    Company Liaison(s): Michael Terry

    Project Outcomes: The team built a data pipeline to extract, transform, and store user data using Python and Redis, and performed feature engineering, including BERT-based feature extraction from users’ biographical data. Using random forest and gradient boosting models together with A/B testing, the team lifted marketing campaign performance by approximately 15%.


    Our Team: Melvin Vellera, Chahak Sethi
    Faculty Mentor(s): Diane Woodbridge
    Company Liaison(s): Joey Jonghoon Ahnn

    Project Outcomes: The team used natural language processing (NLP) techniques to develop a system that recommends product bundles based on recipes. The output included ingredients, ingredient substitutes, and kitchen gadgets. Outputs were optimized by quantity and personalized to the user’s dietary restrictions.

    The Nature Conservancy

    Our Team: Zhiyi Ren
    Faculty Mentor(s): Michael Ruddy
    Company Liaison(s): Kirk Klausmeyer

    Project Outcomes: The team estimated natural river flows in the West Coast region to help state agency staff set flow targets for efficient water management. The team used random forest models, along with techniques such as hyperparameter tuning and feature importance analysis, to improve the model’s estimates of monthly natural river flow. They also used natural language processing (NLP) algorithms to evaluate sustainability reports more efficiently.

    University of California, San Francisco, Auto-Planning Radiosurgery

    Our Team: Christopher Pang
    Faculty Mentor(s): Yannet Interian
    Company Liaison(s): Tomi Nano

    Project Outcomes: The team collaborated with researchers to build a deep learning model that takes three-dimensional brain tumor images (i.e., magnetic resonance images) and predicts three-dimensional radiation shot locations. The model was built with PyTorch using a 3D U-Net architecture.

    University of California, San Francisco, Brain Metastasis

    Our Team: Nestor Teodoro Chavez
    Faculty Mentor(s): Yannet Interian
    Company Liaison(s): Tomi Nano

    Project Outcomes: The team leveraged convolutional neural network (CNN) model architectures to accurately segment small lesions in the brain for radiosurgery. The project consisted of building upon an established auto-segmentation pipeline to increase the robustness of the model by using computer vision and deep learning techniques.

    University of California, San Francisco, Chest X-Rays

    Our Team: Charudatta Manwatkar
    Faculty Mentor(s): Yannet Interian 
    Company Liaison(s): Tomi Nano

    Project Outcomes: The team developed a generative adversarial network (GAN) using PyTorch to enhance the visualization of cancer tumors in chest x-ray images. The team explored multiple deep learning architectures for paired (e.g., pix2pix) as well as unpaired (e.g., CycleGAN) image-to-image translation. Using a single-energy x-ray image as input, the model outputs a synthetic dual-energy image with enhanced tumor visualization. The approach could also help reduce patients’ exposure to harmful x-ray radiation.

    University of California, San Francisco, Cognitive Decline

    Our Team: Jeffery Ott, Chenjia Guo
    Faculty Mentor(s): Yannet Interian
    Company Liaison(s): Ashish Raj

    Project Outcomes: The team created a computer vision model to predict memory and speech degradation in dementia and Alzheimer’s patients. Using magnetic resonance imaging (MRI) scans from patients, the team created a pipeline to produce parcellation results, segmentation results, and cognitive scores, in the hope of eventually speeding diagnosis and treatment planning for patients suffering from cognitive decline.

    University of California, San Francisco, Division of Gastroenterology

    Our Team: Yangzhou Tang, Mitch Veele
    Faculty Mentor(s): Shan Wang, Yannet Interian
    Company Liaison(s): Vivek Rudrapatna

    Project Outcomes: The team collaborated with UCSF faculty on a pilot study of ulcerative colitis that aimed to enhance inference from real-world data using an externally derived missing data model. Students preprocessed clinical trial data in Python (pandas) and imputed missing data. Quality control and data harmonization were used to benchmark results against the original publications. Various classification algorithms, including logistic regression, random forest, and XGBoost, were employed to predict multiclass disease severity scores.

    University of California, San Francisco, Division of Hospital Medicine

    Our Team: Amanda Li Luo
    Faculty Mentor(s): Shan Wang, Yannet Interian
    Company Liaison(s): Xinran Liu

    Project Outcomes: The team collaborated with UCSF researchers to predict patient readmission rates. An extract/transform/load (ETL) pipeline was built using SQL, Python, and Spark for exploratory data analysis and model building. The team then predicted whether patients would be readmitted within 30 days of discharge, using tools and techniques such as AutoML, logistic regression, random forest, gradient boosting, and XGBoost with the scikit-learn package.

    University of California, San Francisco, Lung Cancer

    Our Team: Lakshmi Manne, You Wu
    Faculty Mentor(s): Yannet Interian
    Company Liaison(s): Gilmer Valdes

    Project Outcomes: The team developed machine learning models for predicting toxicities in lung cancer patients treated with proton radiotherapy, taking advantage of the largest proton therapy database in the world. The team extracted features from medical image datasets and improved baseline models through feature engineering.

    University of California, San Francisco, Natural Language Processing

    Our Team: Haotian Gong, Ruifeng Luo
    Faculty Mentor(s): Yannet Interian
    Company Liaison(s): Jorge Ginart, Hui Lin

    Project Outcomes: The team predicted the overall survival rate of brain tumor patients based on their electronic health record notes. The team built and calibrated neural network models, such as Bidirectional Encoder Representations from Transformers (BERT) and Long Short-Term Memory (LSTM) models. To support this work, the team also refactored code, preprocessed data, and created data visualizations.

    University of California, San Francisco, Oncology

    Our Team: Young Zeng, Anish Mukherjee
    Faculty Mentor(s): Michael Ruddy, Yannet Interian
    Company Liaison(s): Benjamin Ziemer

    Project Outcomes: The team developed new cancer severity indices and predicted tumor growth in patients with brain metastases. The team used decision tree models to create interpretable severity indices and used random forest and gradient boosting models to predict survival. Additionally, the team utilized convolutional neural network (CNN) models to predict tumor growth using unstructured three-dimensional brain magnetic resonance imaging (MRI) data.


    Our Team: Jeff Yeh
    Faculty Mentor(s): Diane Woodbridge
    Company Liaison(s): Jesper Frederiksen, Gabriele Fusta

    Project Outcomes: The team implemented a data pipeline using the Kafka ecosystem to extract, process, and visualize data from Salesforce.

    W.L. Gore

    Our Team: Ashwani Rajan, Harshit Singh, Tanjin Sharma
    Faculty Mentor(s): Daniel O’Connor
    Company Liaison(s): Gen Gurczenski, Sharna Sattiraju, Vasudevan Venkateshwaran

    Project Outcomes: The team extended an internal PyTorch-based deep learning package with preprocessing pipelines and model architectures that support image segmentation tasks on microscopy and microCT data. The team used this package to build semantic segmentation workflows for histology and 3D polymer images. Finally, the team refactored existing code to use PyTorch Lightning in order to improve usability, reproducibility, and readability.

    Walmart Labs

    Our Team: Yanan Cao, Lawrence Lin
    Faculty Mentor(s): Diane Woodbridge
    Company Liaison(s): Louise Lai

    Project Outcomes: The team implemented machine learning models to recommend grocery repurchases on Walmart’s e-commerce website. Additionally, the team developed a deep learning model for time-aware sequential recommendations.

  • ACLU Criminal Justice

    Our Team: Qianyun Li

    Goal: At the ACLU, the student identified potential discrimination in school suspensions by performing feature importance analysis with machine learning models and statistical tests.

    ACLU Micromobility

    Our Team: Max Shinnerl

    Goal: At the ACLU, the student analyzed data on the equitable distribution of COVID-19 vaccines. They developed interactive maps with Leaflet to visualize shortcomings of the distribution algorithm and automated the cleaning of legislative record data. They also developed a data storage pipeline that enables remote SQL queries, using Amazon RDS and Amazon S3 on AWS.


    Our Team: Suren Gunturu

    Goal: At AWS, the student employed machine learning techniques to translate users’ natural language questions into SQL queries. They did this by encoding features such as database information and the input question and mapping them to queries. They reviewed existing architectures on the topic and implemented approaches both from scratch, using a Seq2Seq architecture, and with pretrained Hugging Face transformers.


    Our Team: Sophie Wang, Eriko Funasato

    Goal: Students at Bold developed an end-to-end machine learning pipeline using Python’s Scikit-learn to classify churned customers. They also presented feature importance from the model to aid decision making. After being deployed in production, the pipeline increased the customer retention rate. Their work also included collaboration with the customer success team and performing A/B testing on email campaigns.


    Our Team: Veeral Shah, Ricky Zhang

    Goal: At Boost, students built and deployed a logistic regression pipeline to dynamically predict college basketball in-game win probability using Python and PostgreSQL. They established novel metrics for efficiency, excitement, and tension by analyzing mean, variance, and volatility trends of in-game win probability output.


    Our Team: Nicolas Decavel-Bueff, Taince Tan

    Goal: Students engineered and integrated machine learning techniques to perform named entity recognition (NER) as a tool to better collect and preprocess data. On another project, they worked on a content-based recommendation system to help identify competitors.


    Our Team: Zhimin Lyu, Victor Palacios, Daniel Carrera

    Goal: At Cerenetics, students developed and deployed a Python multi-threaded application for a brain functional MRI data preprocessing pipeline (DICOM to BIDS to normalized time series) to extract voxel signals and predict the presence of mental health disorders. They also created and implemented a novel Iterative Spectral Clustering algorithm for brain functional MRI voxel clustering.

    Our Team: Emre Okcular, Yue Zhao

    Goal: Students applied machine learning to website ad click and inner click data using scikit-learn, with Matplotlib for visualization.

    Electronic Arts

    Our Team: Kexin Wang, Wenyao Zhang

    Goal: At Electronic Arts, students built an anomaly detection process with supervised models (2D CNN) and improved model robustness with an unsupervised algorithm (Autoencoder) using Keras.


    Our Team: Yihong Shen, Jordan Uyeki

    Goal: Students at Eventbrite used SQL and Python to compare revenue opportunities across different creator segments and to better understand creator behavior over time. They also compared various methods for event recommendation systems (collaborative filtering, network methods, ERGM models, etc.).


    Our Team: Zixi Luo

    Goal: At Facebook, the student worked with the Facebook Community Product Group team to understand how businesses use Facebook groups. The ultimate goal was to build a machine learning model to identify Facebook groups run by businesses and to understand how to improve the user experience for these groups.


    Our Team: Flora Chen, Hsuan-Yu Lin

    Goal: At Jumio, students conducted exploratory data analysis (EDA) to identify thresholds that were effective at catching financial fraud. On another project, they built a Flask app and set up modeling endpoints on AWS.


    Our Team: Shiqi Tao, Rahul Bethavalli

    Goal: Students at LaHaus employed NLP and deep learning techniques in Python to assess the quality of listing descriptions. They also conceptualized and developed a suggestion system that recommends the most relevant custom page tags for real estate listings using a probabilistic random forest model, which increased the click-through rate by 70% after deployment to production. On another project, they worked on improving the existing image captions for listings, leveraging OpenAI’s CLIP for zero-shot transfer learning to generate high-quality, diverse captions. They implemented the end-to-end production pipeline using AWS, PyTorch, OpenAI, and Airflow.


    Our Team: Ye Tao, Michelle JanneyCoyle

    Goal: At LexisNexis, students used machine learning techniques to perform legal analytics and built a deep learning model for classification and text generation tasks. Additionally, they used matrix factorization to build a recommendation system in Python, and on another project they built a deep learning NLP API accessed by a distributed Spark job.


    Our Team: Catie Cronister

    Goal: At MedStar, the student built a deep learning model to predict the proper radiology protocol that a physician would prescribe and authored a paper based on their work.


    Our Team: Weronica Green, Huidon Xu

    Goal: Students at Metromile built and deployed a deep learning-based end-to-end computer vision system to identify vehicle quality issues using ResNet in PyTorch. They used the model predictions to run statistical analyses of various business metrics using SQL and Python. Lastly, they created an app that allows stakeholders to interact with the model predictions.

    Metropolitan Transportation Commission

    Our Team: Okeefe Niemann, Danh Nguyen

    Goal: At the Metropolitan Transportation Commission, students created data pipelines to both organize and quality check jurisdiction entries. In addition, they created and fine-tuned deep learning models to classify buildings into zones.

    New York Mets

    Our Team: Moh Kaddoura, Trevor Santiago

    Goal: Students at the New York Mets created an outfield defense model using multivariate distributions, classifiers (random forest and XGBoost), and clustering. They also used SciPy and NumPy to create a matchup model that predicts success rates for a given batter against a given pitcher, or vice versa.

    Novi Connect

    Our Team: Vaishnavi Kashyap, Phillip Navo, Sandhya Kiran Reddy Donthireddy

    Goal: At Novi, students engineered a pipeline to automate the extraction of applicable columns from Excel files using pandas and FuzzyMatch. Additionally, they conducted funnel analysis to understand customer engagement with the company platform. On another project, they used Google Data Studio and Google Analytics to build web analytics dashboards showing high-level business metrics and user engagement.


    Our Team: Tian Qi, Matthew Hui

    Goal: Students at PG&E conducted exploratory data analysis to discover power outage patterns and employed machine learning techniques to identify assets likely to experience high-risk events in the future, using Python, SQL, AWS, and Palantir Foundry.


    Our Team: Audrey Barszcz

    Goal: At Phylagen, the student used multiple machine learning models along with SHAP feature importance to identify the subset of features most predictive for classifying an outcome. On another project, they trained GloVe embeddings on genetic sequences.

    Pocket Gems

    Our Team: Yi Huang, Siwei Ma

    Goal: Students at Pocket Gems used reinforcement learning to build a dragon agent that flies, follows, and attacks in Unity. They also developed a search engine and web server from scratch using NLP techniques.

    Propeller Health

    Our Team: Noah Matsuyoshi

    Goal: At Propeller Health, the student predicted early life failures of sensors for medical device monitoring using Redshift (SQL) and Python.


    Our Team: Yueling Wu, Hashneet Kaur

    Goal: At Ranker, students prototyped a video recommendation engine using LightFM’s collaborative filtering model based on users' implicit feedback on website events such as trailer views, item clicks, and additions to a watchlist. On another project, they wrote a script that uses descriptive statistics and SQL to minimize "position on list" bias and increase the reliability of crowdsourced lists, performed an audit of the current ranking algorithm, and identified discrepancies for the engineering team to resolve. They also identified trending shows by scraping data from Twitter, applying NLP techniques (e.g., part-of-speech (POS) analysis, fuzzy string matching, and sentiment analysis), and leveraging tweet counts and sentiment scores.


    Our Team: Amee Tan, Shruti Roy 

    Goal: Students at Recology automated the sequencing of garbage pickups using telematics data, DBSCAN clustering, and haversine distance calculations in Python. On another project, they predicted garbage collection time using XGBoost and Isolation Forest.


    Our Team: Lucia Page-Harley, Maruo Napoli

    Goal: At Reddit, students built a time series forecasting dashboard to understand and predict different video metrics. On another project, they performed analyses using SQL and Python visualizations to understand Reddit’s German user base, and planned and analyzed experiments to improve the product experience.

    Stanford Graduate School of Business

    Our Team: Kaiqi Guo

    Goal: At the Stanford Graduate School of Business, the student explored different approaches, such as BERT, to detect and correct errors in the digitization of historical documents.

    Stanford Medicine

    Our Team: Daniel Blessing, Victor Nazlukhanyan

    Goal: Students at the Stanford Medicine Department of Radiology conducted deep learning research and implemented computer vision methods to synthetically produce contrast-enhanced MRI images. Architectures included generative adversarial networks and U-Nets.

    Our Team: Anni Liu, Aneri Dand

    Goal: Students employed machine learning techniques to forecast sales for Syrup's retailer clients. They used Jinja3 and Plotly to build dashboards that track metrics, provide insights to retailers, and log the results of machine learning experiments.

    The Schmidt Family Foundation 11th Hour mBio Project

    Our Team: Elyse Cheung-Sutton, Yingtong Lin, Eileen Wang, Remi LeBlanc 

    Goal: Students at the Schmidt Family Foundation's 11th Hour mBio project built web scrapers for websites on African GMOs, IRS financial data, and news articles, and created visualizations of the scraped information. They built a website to serve the analysis results using React and Django, and trained a language model using PyTorch to support classification of African news articles. To share information about the uses of agricultural biotechnology, they also consolidated data into a central hub served through a web application, which they containerized and deployed with Docker.

    UCSF Brain Networks Laboratory

    Our Team: Christabelle Pabalan

    Goal: At UCSF, the student used computer vision and deep learning techniques, including multitask learning and ensemble learning, to predict cognitive scores for Alzheimer's patients.

    UCSF Department of Radiation Oncology - Brain Metastasis

    Our Team: Berkay Canogullari, Tianxiang Zhou

    Goal: Students at UCSF predicted outcomes (local failure and patient survival) for large brain metastases treated with radiation. The project consisted of performing tumor segmentation using deep learning, followed by extraction of imaging features to predict treatment outcomes.

    UCSF Department of Radiation Oncology - Prostate Cancer

    Our Team: Jared Mlekush, Shuyan Li, Dashiell Brookhart, Min Che

    Goal: Students at UCSF worked with physicians to predict the likelihood that salvage radiation treatment would succeed, helping oncologists choose treatment options for prostate cancer patients. They used logistic regression, Cox proportional-hazards models, feature importance analysis, and Kaplan-Meier estimators to analyze patient outcomes. They also analyzed physicians' notes to build a predictive model for diagnostic error, using Natural Language Processing (NLP) techniques such as Bag of Words and Word2vec together with machine learning models such as Random Forest, XGBoost, and Logistic Regression.

    UCSF Department of Radiation Oncology - Spinal Metastatic Cancer

    Our Team: Evan Chen 

    Goal: At UCSF, the student worked on medical image preprocessing and deep learning-based image segmentation using Python, SQL, linear and logistic regression, more advanced machine learning, and radiation oncology treatment planning software.

    UCSF Department of Radiation Oncology - Auto-Planning Radiosurgery

    Our Team: Sicheng Zhou, Christopher Pang

    Goal: At UCSF, students built a data pipeline that automatically generates cross-validation datasets by sampling from the main dataset. They developed deep learning solutions that generate high-quality synthetic X-ray images from digitally reconstructed radiographs (DRRs) using Cycle-Consistent Generative Adversarial Networks (CycleGAN); this improved middle-frequency power, an image quality score, by 20% on average compared with a histogram-matching baseline. The model could improve real-time X-ray image tracking during radiation therapy. They also visualized and compared the synthetic X-ray images and Fourier analysis results using custom HTML and Jinja templates with the Flask framework, and presented the results to the principal investigators.

    UCSF Division of Hospital Medicine - Hospital Stays

    Our Team: Patrick Poon, Boliang Liu

    Goal: Students at UCSF collaborated with UCSF researchers to query patient information and engineer features using SQL and Spark. Using information from each patient's first 24 hours, they trained several machine learning models (Logistic Regression, Random Forest, XGBoost, and neural networks in PyTorch) to forecast whether patients would need antibiotics in 2-3 days.


    Our Team: Efrem Ghebreab, Anawat Putwanphen

    Goal: Students at Virgo developed a classification system for Ulcerative Colitis and Crohn's Disease utilizing deep learning and video image processing techniques.

    W.L. Gore & Associates - Project 1

    Our Team: Youchen Zhang, Kristofor Johnson 

    Goal: Students at W.L. Gore & Associates applied deep learning computer vision techniques with PyTorch to segment microscopic images. They also built a Python package for internal use that makes it easy to train new models and architectures with different hyperparameters.

    W.L. Gore & Associates - Project 2

    Our Team: Grant Phillips, Stephen Embry

    Goal: Students at W.L. Gore developed deep learning models to perform image classification, image segmentation, and keypoint detection on cornea image datasets using PyTorch.

    W.L. Gore & Associates - Project 3

    Our Team: Luke Thomas 

    Goal: At W.L. Gore, the student built a table extraction and merging system that used an AWS OCR service, with IPython widgets as the GUI.


    Our Team: Zachary Dougherty

    Goal: At Wanamaker, the student developed an architecture for analyzing and preprocessing Google Analytics data for a Markov chain attribution model.

    Washington State University Basketball

    Our Team: Kyle Brooks, Joshua Majano

    Goal: Students at Washington State University used web scraping to collect international league data for a model that predicts an international player's projected performance in the NCAA. They also built models to predict the same performance metric for NCAA transfer players.

  • ABC News

    Our Team: Daren Ma, Ming-Chuan Tsai, Haree Srinivasan

    Goal: Students at ABC News built a machine learning model in Python to predict election results and deployed the pipeline with Docker and AWS.

    Accountability Counsel

    Our Team: Jacob Goffin

    Goal: At Accountability Counsel, Jacob wrote web-scraping scripts in Python and Selenium to build a first-of-its-kind database of human rights complaints. He also built a document search tool (using Django and Elasticsearch) over thousands of PDF documents, allowing users to quickly find relevant human rights cases to support their research.


    Our Team: Ivette Sulca, Hoda Noorian

    Goal: Students at Airbnb developed a prototype evaluation tool that identifies socioeconomic bias in Airbnb's algorithms and experiments. They analyzed past A/B tests and built a dashboard using Python and Superset.


    Our Team: Esther Liu, Jack Dong

    Goal: At Beam Solutions, students used machine learning techniques to classify transaction data and perform text clustering. They also worked on industry research and database mapping for potential new customers.


    Our Team: Hannah Lyon

    Goal: At Cuyana, Hannah used Markov chains to develop a data-driven marketing attribution model that informed marketing spend. She created a customer propensity model using gradient boosting to identify critical site features, which the digital team then enhanced to improve conversion. Additionally, she used SQL and Tableau for ad hoc analysis of payment methods, trained neural networks to produce product embeddings for a recommendation system on website product pages, and modeled repeat-purchaser behavior to predict second purchases.


    Our Team: Maxine Liu, Zhentao Hou

    Goal: Students at Eventbrite built a classifier and a deep learning model to improve event recommendations. They also researched cases for and against investing in online events from the perspectives of opportunity size, product data, and potential revenue impact. On another project, they analyzed text data with NLP libraries to identify features that are indicative of event listing quality.


    Our Team: Kevin Wong

    Goal: At Faire, Kevin developed a SQL-based outlier flagging mechanism. He also conducted a deep-dive analysis of the Faire mobile app's effect on retailer behavior using SQL, Python, statistics, and propensity-score matching.


    Our Team: Peng Liu, Wenjie Duan

    Goal: Students at FLYR developed a SQL/Python workflow that predicted flight revenue by finding similar flights with clustering and Random Forest models.


    Our Team: Vivian Chu

    Goal: Vivian worked with FracTracker on the collection and aggregation of oil and gas data for the state of California, before conducting production analysis of oil wells at the pool level. Financial data was then added to predict the status of each of the oil wells as an asset or liability.

    Golden State Warriors

    Our Team: Kyrill Rekun, Xueying Li

    Goal: At the Golden State Warriors, students used machine learning techniques to create a last-minute ticket buyer model that predicts the probability that a person is a last-minute buyer, a planner, or an in-between buyer. Using the lifetimes Python package, they built a proxy lifetime value spend model for customers to aid in marketing and ticket targeting. These projects used tools such as Pandas, Seaborn, and sklearn.

    Gore Medical

    Our Team: Peng Liu, Wenjie Duan

    Goal: Students at Gore Medical developed PyTorch CNN models using the API to detect key points in medical optical coherence tomography images, thus allowing for automated assessment of an implant. They achieved these results using transfer learning and data augmentation.


    Our Team: Ariana Moncada, Matthew Sarmiento

    Goal: At Hohonu at the University of Hawaii, students created a tidal forecasting pipeline that helps populate a Django web application and Plotly plots for forecasts. They clustered multiple time series datasets together to increase the performance of their multivariate time series models in R and Python.

    Human Rights Data Analysis Group (HRDAG)

    Our Team: Bing Wang

    Goal: At the Human Rights Data Analysis Group (HRDAG), Bing extracted critical location-of-death information from unstructured Arabic text fields using Google Translate and Python Pandas, adding identifiable records to Syrian conflict data. She wrote R scripts and bash Makefiles to create blocks of similar records on killings in the Sri Lankan conflict, reducing the size of the search space in the semi-supervised machine learning record linkage (database de-duplication) process.


    Our Team: Shreejaya Bharathan, Geoffrey Hung

    Goal: Students at Manifold developed a Python library that utilizes machine learning and deep learning to solve for the parameters of dynamical systems defined by differential equations using PyTorch, Docker and MLFlow.


    Our Team: Matthew King, Lin Meng

    Goal: At Metromile, students created a crash classification model to predict the primary point of impact during a collision using telematics data collected from customers. On another project, they used deep learning to classify images of fraudulent cars.

    New York Mets

    Our Team: Rushil Sheth

    Goal: At the New York Mets, Rushil created infield and outfield shift models using multivariate distributions, classifiers (Random Forest and XGBoost), and clustering.

    Metropolitan Transportation Commission (MTC)

    Our Team: Kamron Afshar, Michael Schulze

    Goal: Students at MTC used deep learning to train a neural network image classifier that classifies buildings by use from their images. They generated the dataset using a Google API. They also built a Selenium crawler pipeline that scrapes legal codes and stores them in a Redshift database to track changes.


    Our Team: Lisa Chua, Shane Buchanan

    Goal: At NakedPoppy, students improved the recommendation system for new customers by incorporating content-based and collaborative filtering trained on clickstream data. They used NLP techniques to extract key aspects from Google reviews and implemented feature-based opinion mining on product reviews to assist in the scoring of new products. Later, they conducted market basket analysis on transaction data to provide customers with “pair with” recommendations and increase engagement.

    Baltimore Orioles

    Our Team: Collin Prather

    Goal: At the Baltimore Orioles, Collin implemented a Deep Recurrent Survival Analysis model (LSTM in PyTorch) to predict the probability that an American League manager will remove their pitcher using in-game time series data. Another prominent project was developing a model to predict relief pitchers’ level of fatigue, then deploying a containerized (Docker) web application on AWS to host the model and explanatory visualizations to communicate the analysis to key stakeholders in the Orioles front office.


    Our Team: Kathy Yi, Sean Sturtevant, Jingwen Yu, Nithish Kumar Bolleddula

    Goal: Students at PG&E used SQL, Python, and AWS SageMaker to build machine learning models that predict whether a PG&E asset is likely to experience a failure. On another project at PG&E, students built computer vision models on drone imagery to identify defects in power lines.


    Our Team: Nicholas Parker, Mundy Reimer

    Goal: Students at Phylagen worked on projects with data from microbiome samples and laboratory processes that involved software development, data analysis, and machine learning.

    Pocket Gems

    Our Team: Qingmengting Wang, Tian (Arthur) Qin

    Goal: At Pocket Gems, students completed two NLP projects using LSTM and Dialogflow.

    Propeller Health

    Our Team: Andrew Eaton, Xuxu Pan

    Goal: Students at Propeller Health built a Random Forest model to predict how long a customer support ticket would take to resolve, using word embeddings of the ticket text learned with a Continuous Bag of Words (CBOW) model. They also published live dashboards showing ticket counts and complaint rates on a Tableau Server.


    Our Team: Yunzheng Zhao, Shishir Kumar

    Goal: At Recology, students used linear regression to generate route statistics and service time estimates from GIS and trash collection data. They also analyzed routing data and identified anomalies in the reporting and data-capture process.


    Our Team: Kevin Loftis, Esme Luo

    Goal: Students at Reddit worked on graph-based subreddit community detection. They built a subreddit graph based on user view overlap and ran community detection on the graph to cluster similar subreddits using Python and NetworkX. This doubled the subscription rate of subreddits compared with the existing system. On another project, they worked on a streaming feature extraction pipeline, architecting and developing a Flink streaming data processor in Scala using Docker, Flink, Kafka, CircleCI, and Kubernetes.


    Our Team: Meng Lin, Hao Xu

    Goal: At Reputation, students used deep learning-based entity matching to match addresses and performed topic modeling to analyze topic trends in reviews.

    Salk Institute for Biological Studies

    Our Team: Alaa Abdel Latif, Annette (Zijun) Lin

    Goal: Students at the Salk Institute for Biological Studies built super-resolution deep learning models using PyTorch.

    Sparta Science

    Our Team: Sunny Kwong

    Goal: At Sparta Science, Sunny worked on improving the reliability of balance tests by performing multiscale entropy analysis with R and Python on force plate scans.

    Specialty's Cafe & Bakery

    Our Team: Jiaqi Chen, Sakshi Singla

    Goal: At Specialty's Cafe & Bakery, Jiaqi performed revenue forecasting employing time series analysis and EDA and also worked on building a recommendation engine using machine learning.

    Stanford Graduate School of Business

    Our Team: Jingxian Li

    Goal: Students at the Stanford Graduate School of Business cleaned SEC 10-K documents and built word2vec models on this corpus. They also devised different ways to evaluate the models and learned to use BERT.


    Our Team: Lea Genuit, Alan Flint

    Goal: At Trulia, Lea used deep learning in PyTorch to identify scanned documents rotated by multiples of 90 degrees. She also improved the existing solution (Tesseract, an OCR engine) by working on a patch of the image using Python, then compared the results of Tesseract and the CNN models. On another project at Trulia, Alan built a power analysis tool in Python for Trulia's A/B testing platform, which involved coding and deploying an ETL pipeline and designing an interactive application with Streamlit. His second project used an interpretable machine learning model to identify site features that influence positive outcomes for interested home buyers.


    Our Team: Dillon Quan

    Goal: At TruStar, Dillon used Spark and Scala to build parsers that normalize data ingested into the data lake, centralizing samples into a single format for downstream predictive analytics. His second project focused on analyzing URLs and generating scores for their level of maliciousness using Python and PyTorch.

    UCSF Brain Networks Laboratory

    Our Team: Qingyi Sun, Akanksha

    Goal: Working with the Brain Networks Laboratory at UCSF and the Wicklow AI in Medicine Research Initiative (WAMRI), students focused on characterizing diseases such as autism and Alzheimer's disease and on making diagnoses and prognoses from multichannel brain magnetoencephalography (MEG) data. They built an LSTM (Long Short-Term Memory) model using PyTorch to analyze brain MEG data and extract information to predict characteristic parameters of interest. On another project, they pretrained 3D Convolutional Neural Networks on brain MRI data using a segmentation task.

    UCSF Bakar Computational Health Sciences Institute

    Our Team: Linqi Sheng

    Goal: Working with UCSF and the Wicklow AI in Medicine Research Initiative (WAMRI), Linqi built an LSTM (Long Short-Term Memory) model using PyTorch to analyze brain MEG data, extract information, and make predictions on characteristic parameters of interest.

    UCSF Radiation Oncology Department and the Wicklow AI in Medicine Research Initiative (WAMRI)

    Our Team: Roja Immanni

    Goal: Working with the UCSF Radiation Oncology Department, Roja found that medical image datasets are fundamentally different from natural image datasets in terms of the number of available training observations and the number of classes for the classification task. She hypothesized that compared to architectures used for natural images, those needed for medical imaging can be simpler. She proposed smaller architectures and showed how they perform similarly while significantly saving training time and memory. This is joint work with Gilmer Valdes at UCSF.

    UCSF and the Wicklow AI in Medicine Research Initiative (WAMRI)

    Our Team: Zachary Barnes

    Goal: Working with UCSF and the Wicklow AI in Medicine Research Initiative (WAMRI), Zachary used UCSF's Spark environment for EHR data to create a dataset, generate labels for patients with hospital-acquired sepsis, and build prediction models using sklearn and PyTorch.

    UCSF Morin Lab and the Wicklow AI in Medicine Research Initiative (WAMRI)

    Our Team: Sihan Chen

    Goal: Working with the Morin Lab at UCSF and the Wicklow AI in Medicine Research Initiative (WAMRI), Sihan built a 3D Residual U-net to precisely segment metastases from brain MRI images with PyTorch. He evaluated the effects of number, size, and locations of metastases on the accuracy, which has resulted in a scientific conference presentation and a manuscript and helped UCSF design a state-of-the-art model.

    Vasant Lab at UCSF and the Wicklow AI in Medicine Research Initiative (WAMRI)

    Our Team: Shrikar Thodla

    Goal: Working with the Vasant Lab at UCSF and the Wicklow AI in Medicine Research Initiative (WAMRI), Shrikar worked on multiple projects. These included using deep learning to segment and classify medical images, attempting to generate 3D images from multiple 2D image views, leading migration of full-stack components from GCP to IBM, detecting accidental rotations in images using CNNs built in PyTorch, and optimizing code to read images from a database.

    UnitedHealthcare

    Our Team: Srikar Murali, Sean Tey

    Goal: Students at UnitedHealthcare cleaned and processed millions of insurance claims transactions with SQL and ran hypothesis tests on demographics-related data. On another project, they built a Gradient Boosting Trees model using the CatBoost library to predict which members are likely to be hospitalized in the near future, as part of a system for identifying administratively complex members.


    Our Team: Andrew Young, Charles Siu

    Goal: At Valimail, students tackled the problem of classifying a backlog of 100K+ unknown internet domains generated by Valimail Defend. They developed an end-to-end machine learning pipeline that classifies trusted domains by detecting whether they belong to low-risk categories such as real estate. The Gradient Boosting Machine (GBM) model achieved a 95%+ precision rate with test data when classifying real estate domains using Natural Language Processing (NLP) for web content analysis. On another project, they designed and implemented REST APIs using Flask in Dockerized modules in the pipeline and built web scrapers using BeautifulSoup to gather multiple external data sources for ML model training.


    Our Team: Mikio Tada, Stephanie Jung

    Goal: Students at Virgo developed a Python script to extract data frames from 120 hours of video. They used Google AutoML to train deep learning models that automate video recording during endoscopic medical procedures and power an automatic procedure-type tagging system. On another project, they built a prototype object detection tool for real-time polyp tracking during colonoscopy, using CVAT for data labeling and Google AutoML to train the deep learning model.

    Walmart Labs

    Our Team: Samarth Inani, Akansha Shrivastava

    Goal: At Walmart Labs, students developed an image inpainting tool that uses partial convolutions to remove occlusions from high-resolution furniture images. They also worked on a research-oriented project, using PyTorch and OpenCV, to enhance the color detection algorithm and improve the accuracy of the color attribute in furniture product descriptions.

    Wicklow AI in Medicine Research Initiative (WAMRI) and MedStar Georgetown University Hospital

    Our Team: Max Calehuff, Xintao (Todd) Zhang, Wendeng Hu

    Goal: Students working with the Wicklow AI in Medicine Research Initiative (WAMRI) and MedStar Georgetown University Hospital used NLP to create an automated grading program for medical student imaging reports.


    Our Team: Andy Cheon, Aakanksha Nallabothula Surya

    Goal: At Zyper, students built and deployed an image classification convolutional neural network (CNN) with PyTorch to help brands efficiently recruit fans with desired aesthetic types on social media. They applied feature importance methods using machine learning in Python to identify top factors that drive engagement rates of user-generated content. They also developed a user location prediction pipeline using NLP tools (NLTK, spaCy) to improve upon the existing location predictor and discovered and visualized trends from group chat content from 15 brand communities using mainly Pandas and ggplot.

  • AlienVault

    Our Team: Sankeerti Haniyur

    Goal: On this project, the student employed deep learning & NLP techniques to automatically tag cybersecurity documents. She then built a named entity recognition model to detect indicators of compromise in the documents.

    Beam Solutions

    Our Team: Darren Thomas, Liying Li

    Goal: Students employed NLP techniques in Python for name recognition and used PyTorch and an LSTM to detect fraudulent transactions. On another project, they scraped data using a RESTful API and built an application with Flask in Python. They also applied unsupervised machine learning in Python to build clustering and anomaly detection models.

    General Electric

    Our Team: Benjamin Khuong, Ziqi Pan

    Goal: Students worked on an object detection project to detect defects in CT scans of machine parts, designing computer vision-based solutions for automatic defect detection on industrial devices. They implemented state-of-the-art deep learning algorithms such as Faster R-CNNs, R-FCNs, and 3D convolutional neural networks.

    Bolt Threads

    Our Team: Wenkun Xiao, Nicole Kacirek

    Goal: Students worked closely with the marketing team to optimize campaign messages by applying NLP and machine learning techniques to competitors' product reviews and social media posts. They also built and productionized a CLTV (customer lifetime value) and revenue prediction model.

    Check Point/Dome9

    Our Team: Brian Chivers, Evan Liu

    Goal: Students developed an unsupervised learning algorithm to detect anomalies in AWS network traffic.

    Our Team: Rebecca Reilly, Minchen Wang

    Goal: Students focused on increasing revenue using topic modeling, employing Python and the spaCy library to discover industry relationships using advertiser behavior. They employed machine learning technologies to predict online ad prices and identify important features. On another project, they created an NLP classifier to correctly identify acceptable and appropriate sentences.


    Our Team: Nan Lin, Lance Fernando

    Goal: Students built machine learning models to predict the LTV (lifetime value) of customers. On another project, they deduplicated over 5 million venue addresses using fuzzy string similarity metrics and an HMM, then used this data to create a search ranking method that recommends venues to event creators.


    Our Team: Aditi Sharma, Zhi Li

    Goal: Students built a content-based recommendation system for cars and an auction price prediction model.


    Our Team: Byron Han, Yuhan Wang

    Goal: Students used SQL to extract data from AWS, then employed NLP techniques to build a text classification pipeline.


    Our Team: Connor Swanson

    Goal: The student built anomaly detection systems in Python for environmental data. He also built time series forecasting models to predict future environmental shifts and built dashboards to present his findings.


    Our Team: Tyler Ursuy, Anush Kocharyan

    Goal: Students classified each Kiva partner into risk categories by implementing a Random Forest risk detection model that monitors the financial, geographic, and economic information of Kiva’s global partners. They also built an interactive online dashboard to provide easy access to data analyses, data visualizations, and model predictions which will help Kiva reduce the amount of time and money spent on manually inspecting partner information and conducting scheduled in-person visits.

    kWh Analytics

    Our Team: Hongdou Li, Zhe Yuan

    Goal: Students used machine learning techniques to predict solar panel performance across the country and provided business insights.


    Our Team: Hai Le, Jon-Ross Presta

    Goal: Students automated the data generation process for a dashboard with a Python script. They also trained an NLP model which takes the subject line, information about the app that sends the email, and information about the recipient segment to predict email open rates using PyTorch. On another project, the students used Python/PyTorch to build an NLP model to predict user engagement based on message content.

    Manifold AI

    Our Team: Edward Richard Owens, Prakhar Agrawal 

    Goal: Students created a system that optimizes the operation of HVAC systems by detecting from sensor data when building temperature stabilizes. On another project, they built a golf simulator whose model takes a video of a person hitting a golf ball and outputs the ball's trajectory using machine learning and physics. They used computer vision methods and architectures such as background removal, Darknet (YOLO), and optical flow.


    Our Team: Shivee Singh, Xiao Han

    Goal: Students used machine learning and deep learning with OpenCV, Python, and PyTorch to identify microplastics in ocean water. Their main focus was building object detection models to locate microfibers in underwater images and approximate the total volume and distribution of microfibers in the ocean.


    Our Team: Christopher Olley, Wei Wei

    Goal: Students used machine learning and deep learning to identify drivers based on their telematics data (speed and acceleration). On another project, the students extracted events and created features from this data to train tree-based models using Python. They extracted labeled trip data from SQL and Amazon S3 storage and built the ML/DL models to identify users using Python and SQL.


    Our Team: Sarah Melancon, Brian Wright

    Goal: Students built an ETL pipeline using Python and Spark that combined and aggregated add-on-related data from a variety of sources into a single dataset. They also built a dashboard on this dataset using Redash.

    Metropolitan Transportation Commission

    Our Team: Jacques Sham, Quinn Keck

    Goal: Students built a data lake on AWS, involving S3 and Redshift, using off-the-shelf tools (Trifacta and Python). On another project, they analyzed Clipper and FasTrak data, tracked key performance indicators, and built dashboards. They developed machine learning and time series models that predicted daily Clipper Card usage to within 4%.

    Delta Analytics

    Our Team: Chong Geng

    Goal: The student developed metrics to define the success of the product in terms of user engagement and answering efficiency. He also applied NLP techniques to upgrade the recommender system and built a dashboard to visualize the results.

    NakedPoppy

    Our Team: Nina Hua, Donya Fozoonmayeh

    Goal: Students employed machine learning for product recommendations and used PySpark to apply a model in a distributed environment. They also used machine learning to classify skin color from an image and worked on a recommendation system to improve user experience.

    Orange Silicon Valley

    Our Team: Evan Calkins, Jinghui Zhao, Ran Huang

    Goal: Students developed an algorithm to support targeted marketing campaigns, which identifies similar mobile users based on their location patterns. They built an n-gram language model for the African language of Wolof to improve functionality of a chatbot using Python. On another project, they calculated relative store location optimality by comparing user movements and travel patterns using a large dataset (4TB) of mobile user information processed on a 9-node Spark cluster.

    Pacific Gas and Electric Company

    Our Team: Gokul Krishna Guruswamy, Louise Lai

    Goal: Students used PyTorch to train deep learning object detection and classification models to identify faults in equipment and to detect small-scale objects in millions of large drone images. They worked extensively in the AWS cloud environment (EC2, S3, Lambda, SageMaker, etc.) to productionize these models.


    Our Team: Paul Kim, Katja Wittfoth

    Goal: Students used deep learning techniques to identify different types of contaminants in waste bins. They automated the identification of contaminants in complex images of waste bins by developing a multi-label image classification model using deep learning, PyTorch, Python, and AWS.

    Recology (Routes)

    Our Team: Xu Lian, Philip Trinh

    Goal: Students built a machine learning model with sklearn to predict truck accident occurrence. They used data analytics and machine learning methods to provide policy recommendations on how Recology can increase safety when collection drivers are out in the city. They also merged sheets from different sources using Pandas and PySpark.


    Our Team: Yixin Sun, Julia Amaya Tavares

    Goal: Students built a machine learning pipeline on Airflow to estimate subreddit retention ability. They used Python spaCy package to build a small tool to extract keywords from post comments. On another project, they used TensorFlow to create a multi-label classifier for post titles, and SQL / Pandas for data acquisition and pre-processing.

    Our Team: Randy Ma, Xi Yang

    Goal: Students developed a review sentiment classifier using a deep learning model with LSTM and Self-Attention to improve reputation assessment (Python, PyTorch). They extracted customer concerns by building a multi-gram keyword extraction tool using syntactic dependency analysis. They also built an automated operational insight reporting tool (SQL, Python) to assess strengths & weaknesses of the client’s user experiences.

    San Francisco County Transportation Authority

    Our Team: Crystal Sun, Marwa Oussaifi

    Goal: Students created web-based visualization tools with D3.js to present the number of accessible jobs and trip patterns within San Francisco. They automated complex data preprocessing and data pipelines in Python to accommodate different scenarios when collecting, processing, and piping the data. On another project, they implemented different ML algorithms to predict auto ownership per household.

    Our Team: Xinran Zhang, Zitong Zeng

    Goal: Students developed a Scala notebook to help the customer service team analyze user-retention metrics such as daily active users (DAU) and Return Retention. They wrote an anonymization routine for sensitive impressions and events data using a Spark UDF and MurmurHash3. They explored alternatives to traditional parametric tests to improve the credibility of A/B test analysis. They also researched and implemented outlier detection methods in Scala.


    Our Team: Xinke Sun, Jyoti Prakash Maheswari

    Goal: Students used SQL to track KPIs and used Python to build tables that store daily metrics. They also applied deep learning to the images and text of real-estate listings to understand their content and predict lead submissions.

    Trustar Technology

    Our Team: Viviana M. Peña-Márquez, Neha Tevathia

    Goal: Students used PyTorch to build an NLP model based on CBOW (continuous bag-of-words) that identifies malware names in open-source data from Twitter. They also created and implemented a pipeline that automatically collects tweets through Twitter's API, applies machine learning and natural language processing algorithms to detect entities, and feeds daily detections to a dashboard.


    Our Team: Tian Qi, Jessica Wang

    Goal: The students used Python and SQL to deploy a machine learning pipeline that predicts which users will become paying users within the next two weeks. On another project, they used Python to predict short-term purchases.

    UCSF Department of Neurology (Neuroscape Lab)

    Our Team: Jenny Kong

    Goal: The student used machine learning with fMRI data to classify network patterns of concurrently activating brain regions that arise during successful high-fidelity memory retrieval.

    UCSF Department of Radiation Oncology (AI)

    Our Team: Miguel Romero Calvo

    Goal: The student employed deep learning techniques to improve the performance of neural networks on small datasets. He also conducted research on training and transfer learning methodologies.

    UCSF Department of Radiation Oncology (Computer Vision Lab)

    Our Team: Anish Dalal, Robert Sandor

    Goal: Students used PyTorch and deep learning techniques in computer vision to accurately segment ventricles in the brain. On another project, they built a text classifier that predicts cancer patient survival from physician notes using Python, PyTorch, Bash, and fastai.

    UCSF Department of Radiation Oncology (Quantitative Imaging Lab)

    Our Team: Alan Perry, Tianqi Wang

    Goal: Using Python, students employed deep learning techniques to segment different organs, make dose-volume diagnoses, and transform MRI images into CT images.

    UCSF Division of Cardiology (Arnaout Laboratory)

    Our Team: Max Alfaro, Divya Bhargavi

    Goal: Students built deep learning models to classify different views of echocardiograms. They also performed exploratory data analysis, which helped them become familiar with the medical terminology.

    Ultimate Software

    Our Team: Victoria Suarez, Harrison Mamin

    Goal: Students used Python to build a recommender system that predicts which candidates match a job posting, improving recruiters' efficiency by 56%. They researched methods of detecting unconscious gender bias in performance reviews using word embeddings and neural networks. On another project, the students developed two approaches to extracting causal language pairs from text, one using a deterministic rule-based engine and one using a neural network, and integrated both into a web-based UI built with Flask.

    Under Armour

    Our Team: Adam Reevesman, Meng-Ting Chang

    Goal: Students used Python to build a rule-based algorithm that identifies when a MapMyFitness user has finished a route but forgot to stop their tracker. They also performed exploratory data analysis (EDA).

    United Health Care

    Our Team: Tomohiko Ishihara, Maria Vasilenko

    Goal: Students gathered user reviews of Personal Health Record apps from the Apple App Store and Google Play Store and performed topic modeling with Latent Dirichlet Allocation (LDA), as implemented in Gensim, to see which app features users talk about most. On another project, they created a dataset, performed feature engineering, and built machine learning models to predict whether a member is likely to become pregnant.


    Our Team: Joy Qi, Jialiang Shi

    Goal: Students built machine learning classification models to distinguish legitimate email domains from fraudulent ones, classifying whether an unknown domain is trusted or untrusted. On another project, they wrote a script to scrape social links from web pages.

    Valor Water Analytics

    Our Team: Yihan Wang, Jian Wang

    Goal: Students predicted water utility customer nonpayment with a Random Forest model and implemented the model in Python in Valor's codebase. They segmented utility customers with K-means clustering to understand their behavior. On another project, they applied multiple time series models to identify malfunctioning water meters. They used SQL and Python to build an end-to-end workflow for the project.

    Vida Health

    Our Team: Shulun Chen

    Goal: The student used SQL, Python, and Swagger to build data pipelines.

    Wiser Solutions

    Our Team: Ziyu Fan

    Goal: The student applied data science and machine learning techniques in Python to forecast e-commerce retailer sales. On another project, she used machine learning and NLP to find anomalies in product matching.

    Zume Pizza

    Our Team: Brian Dorsey, Fiorella Tenorio

    Goal: Students used Python and TensorFlow to build a model that predicts the probability of client purchases, as well as a time series demand prediction model.

  • Capital One

    Our Team: Arpita Jena, Devesh Maheshwari, Alexander Howard

    Goal: Students employed NLP and deep learning techniques in Python to classify sensitive information in Capital One's internal domain, and wrapped the result in a Flask web app. Another project involved software engineering to automate Capital One's AWS authentication process.

    Cogitativo, Inc

    Our Team: Yiqiang Zhao, Gongting Peng

    Goal: Students employed machine learning methods to build a data pipeline for anomaly detection. They also used Python for data exploration.

    Delta Analytics

    Our Team: Stephen Hsu

    Goal: Students worked within a multidisciplinary team to offer data science services to a nonprofit organization. Specifically, students developed an NLP-based model in Python to classify forum posts so that forum questions could be appropriately matched with professionals who are best positioned to answer them.


    Our Team: Timothy Lee

    Goal: Students built data pipelines using a Python API service. Their work included classifying PDF files with XGBoost in Python and collecting research data samples with Python.


    Our Team: Holly Capell

    Goal: Students at Eventbrite used machine learning in Python to model ticket sell-through rates in order to help the company identify platform features that drive event sell-out. They performed cohort analyses using Python to help understand the revenue life cycle of Eventbrite customers and investigated seasonality in ticket sales, using SQL to query data and R to create data visualizations.

    First Republic Bank

    Our Team: Bingyi Li, Christopher Csiszar

    Goal: Students used Python and Flask to build a web-based system that classifies municipal bonds to ensure regulatory compliance. They used big data analytics, machine learning, and clustering algorithms to automate the classification of the bank's municipal bond portfolio into High Quality Liquid Asset bonds. This work eliminated the need for inefficient and costly external consultants to perform this task each quarter.


    Our Team: Yue Lan, Akshay Tiwari

    Goal: Students wrote SQL scripts to perform exploratory data analysis and built a data pipeline to ingest airline customer data. They also used machine learning in Python to build and validate models that predict airline ticket bookings and cancellations as part of the Flyr airline revenue management system. On another project, they used machine learning techniques to predict customer budget and price sensitivity.

    Houston Astros

    Our Team: Jake Toffler

    Goal: Students clustered individual pitchers' pitches by pitch type using level-set trees, a density-based clustering method, in Python.

    Isazi Consulting

    Our Team: Shikhar Gupta, Fei Liu

    Goal: Students used deep learning with convolutional neural networks (CNNs) to identify diseases in chest X-rays.


    Our Team: Ting Ting Liu, Jose Antonio Rodilla Xerri

    Goal: Students employed machine learning techniques to identify factors that may affect whether a Kiva loan will reach full funding. They developed a web application powered by a random forest model that predicts the success of loans, highlights the factors driving each prediction, and suggests how to improve the loans.


    Our Team: Vinay Patlolla, Jason Carpenter

    Goal: Students worked on two projects with Manifold. In the first project, they used machine learning models such as Logistic Regression, Random Forest, and XGBoost in Python to detect faults in oil pipelines. In the second project, they developed a multi-camera, multi-object tracking pipeline to track people in a scene using deep learning and clustering techniques.


    Our Team: Chenxi Ge

    Goal: Students used deep learning to tackle a complex computer vision problem: locating characters in order to decode the character sequence.


    Our Team: Tyler White, Jing Song

    Goal: Students used Spark to gather data for a public-facing Firefox Health Report dashboard. They used time series analysis to predict ESR usage and checked the validity of t-tests against non-parametric tests.


    Our Team: Danai Avgerinou, Shannon McNish

    Goal: Students worked on a data engineering project to build a small centralized data warehouse to host MTC's data. They also worked on a data science project applying NLP to FasTrak survey data and uncovered ridership patterns among Clipper users.


    Our Team: Natalie Ha, Christopher Dong

    Goal: Students built a text classification model to categorize survey responses and found correlations with Net Promoter Score (NPS). On another project, they built a Tableau dashboard for funnel analysis of reported content on the platform. They also built and deployed (with Airflow) a machine learning model using Spark ML to predict survey text responses, and wrote complex SQL queries to calculate content moderation metrics.


    Our Team: Guoqiang Liang

    Goal: Students employed machine learning techniques in Python and Spark to estimate churn probabilities. On another project, they used NLP techniques to classify legal documents.

    Our Team: Ernest Kim, Davi Alexander Schumacher

    Pocket Gems

    Our Team: Dixin Yan, Spencer Stanley

    Goal: At Pocket Gems, students employed machine learning techniques to build a churn model and a matchmaking model for a newly developed game. They also researched and developed models to help the marketing team with channel attribution and creative optimization. On another project, they used time series methods to predict the impact of paid advertising channels on organic install volume.

    Price F(X)

    Our Team: Neerja Doshi, Alvira Swalin

    Goal: Students employed machine learning (Python) and deep learning (PyTorch) techniques to build a product recommendation system.


    Our Team: Khoury Ibrahim, Danielle Savage

    Goal: Students used PyTorch to build a multi-label image recognition CNN that identifies contaminants in Recology's images of landfill, recycling, and compost waste.

    Our Team: Sara Mahar, Nicha Ruchirawat

    Goal: Students automated the real-time detection of data feed failures from Google, Bing, and Facebook sources using a suite of standardized hypothesis tests. On another project, they identified significant clusters of words in tens of thousands of omni-channel reviews with Latent Dirichlet Allocation (LDA) topic modeling and k-means clustering.

    San Francisco 49ers

    Our Team: Kishan Panchal

    Goal: Students used machine learning techniques to create a weekly cohort-based churn prediction system for season ticket holders. On another project, they created a data ingestion system to get external ticket data into the team's data warehouse.

    San Francisco County Transportation Authority

    Our Team: John Rumpel, Kaya Tollas

    Goal: Students used Python to compute accessibility metrics for transit stops (these metrics were later used in SFCTA's study on TNCs and ridership). On another project, they prepared data for input into the SFCTA travel model. On a third project, they visualized traffic incidents on an interactive map using JavaScript.


    Our Team: Mathew Shaw, Cara Qin

    Goal: Students employed machine learning techniques to identify suspicious users, predict LTV, and classify game themes.


    Our Team: Daniel Grzenda, Jade Yun

    Goal: Students used graph theory in Python to quantify variants and analyze protein data from patients' blood.


    Our Team: Nimesh Sinha, Zizhen Song

    Goal: Students used natural language processing and machine learning techniques to build a data pipeline recommendation engine. On another project, they clustered customers based on login data.

    Stanford Graduate School of Business

    Our Team: Ker-Yu Ong, Chen Wang

    Goal: Students compared cloud databases (AWS, Google BigQuery, Snowflake, and Databricks) by running benchmarking queries for research use cases. They also ran machine learning models to classify WSJ articles and used NLP techniques to extract information from news articles and identify topics in Amazon product reviews.


    Our Team: David Kes

    Goal: Students used Python to develop an exponentially weighted moving average (EWMA) control charting scheme that detects bus detours for a variety of transit agencies. The algorithm helped automate the customer success team's process for detecting such issues in any transit agency's system.


    Our Team: Thy Khue Ly, Beiming Liu

    Goal: Students used machine learning in Python to predict customers' default risk and to cluster customers into groups based on their credit card transactions. On another project, they used NLP to predict transaction categories, and on a final project, they used time series methods and machine learning to predict users' annual income from transaction data.


    Our Team: Feiran Ji, Lingzhi Du

    Goal: Students predicted users’ purchasing behavior for future games using machine learning techniques and deployed an end-to-end pipeline to put the model into production on Hadoop clusters using Spark. Additionally, they visualized insights and developed an interactive dashboard to be used in conjunction with the predictive model.


    Our Team: Siavash Mortezavi, Kerem Can Turgutlu

    Goal: Students used traditional machine learning techniques to predict overall survival of meningioma cancer patients and used deep learning and computer vision to automatically segment brain structures.


    Our Team: Sangyu Shen, Qian Li

    Goal: Students employed machine learning techniques in Python to classify which patients experience side effects from radiation therapy.

    Under Armour

    Our Team: Ryan Campa, Zhengjie Xu

    Goal: Students used machine learning to predict stride and cadence to help runners improve their form. They also used unsupervised learning to identify organized race events from millions of rows of workout data.

    United Health Care

    Our Team: Savannah Logan, Sooraj Mangalath Subrahmannian

    Goal: Students applied NLP techniques in Python to identify the main complaints in a website survey. They then employed machine learning techniques to identify possible improvements in coverage rejection time.


    Our Team: Taylor Pellerin, Devin Bowers

    Goal: Students employed machine learning techniques to help identify fraudulent email sending behavior. They also prototyped internal tooling and wrote documentation. Additionally, they built a machine learning classifier to help identify new legitimate email services. This allows Valimail to quickly scan email aggregate reports to identify legitimate services that send email on a customer's behalf.

    Valor Water Analytics

    Our Team: Jingjue Wang, Kunal Kotian

    Goal: Students trained a recurrent neural network to forecast water consumption and flagged unusual water meter readings by measuring how far the true values deviated from the forecasts. They wrote production code for a pipeline that extracts and transforms data, trains deep learning models using TensorFlow, and generates forecasts for several water consumption time series.

    Vida Health

    Our Team: Nishan Madawanarachchi, Chengcheng Xu

    Goal: Students used linear regression in R to predict weight loss among customers. On another project, they used logistic regression in Python to predict the urgency level of clients' messages. They also built a chatbot to help new users through the onboarding process.

    Voodoo Sports

    Our Team: Ford Higgins, Ian Pieter Smeenk

    Goal: Students contributed to a 'football genome' project for the stylistic classification of teams. They built a college basketball statistical model that builds on existing models to improve them, and designed tools to help football coaches scout opposing teams. These projects were completed using Python, R, SQL, and D3.js.


    Our Team: Deena Liz John, Patrick Yang

    Goal: Students used Python, SQL, and Looker to implement A/B testing at Vungle, comparing different ad templates, levels of compression, and more. They also helped develop an in-house A/B testing platform.

    Wiser Solutions

    Our Team: Liz Chen, Yu Tian

    Goal: Students developed an end-to-end pipeline in Python using computer vision and deep learning for a company product that recognizes online promotions in images. On another project, they deployed REST APIs into production and designed experiments to compare the results of different methods.


    Our Team: Vanessa Zheng

    Goal: Students developed fraud detection models on a high-dimensional imbalanced dataset using Python. On another project, they devised and evaluated global risk metrics to monitor, condition and strengthen fraud models with SQL & Python.


    Our Team: Sri Santhosh Hari

    Goal: Students used time series techniques to forecast customer churn. Additionally, they used machine learning techniques such as Random Forest and XGBoost to identify the key features affecting bookings and to predict members' likelihood of booking a car.