Internships

The Data Institute seeks members interested in working with one or more teams of students on an MS in Data Science (MSDS) internship project. Projects are centered on creating business value through data science and last approximately nine months. Our students and faculty make a strong commitment to client success. A USF faculty member acts as project mentor, guiding students through the project. A project champion from the practicum company provides answers, business context and data.

The Membership Advantage

Our students are in high demand. Become a member to ensure placement.

Get Started

The first step is to discuss possible projects from your organization and how they might fit with the goals of the MSDS program. Our Director of Partnerships can provide additional information and answer any questions:

Sean Butcher
Director of Partnerships
sbutcher@usfca.edu

Past Projects

The MS in Data Science (MSDS) program sends internship teams to companies tackling data science problems in industries ranging from high-frequency trading to hospitality, and from energy efficiency to transportation.

Our partner organizations range from small start-ups to established Bay Area technology firms as well as large civic and nonprofit organizations.

Corporate

Our Team: Ben Miroglio and Chhavi Choudhry

Goal: Cluster web sessions to segment users and improve the flow of Airbnb's website and mobile app.

Using R and Python, Ben and Chhavi employed machine learning techniques to identify features indicative of positive outcomes. They built an interactive web session visualizer in D3.js to surface key differences among user segments and to identify bottlenecks in the session journey.

Our Team: Vincent Pham and Brynne Lycette

Goal: Employ machine learning techniques for credit card fraud detection and build a data unification platform.

Capital One's fraud team has collected and built more than two hundred features relevant to classifying fraudulent credit card transactions. Vincent and Brynne applied various machine learning techniques using H2O and Dato to evaluate software robustness and increase the accuracy of fraud prediction. They also implemented a NoSQL data store and a higher-level in-memory storage system to unify various streaming and batch processes.

Our Team: Meg Ellis and Jack Norman

Goal: Create a price-suggestion model to assist event organizers in optimizing ticket sales and revenue.

After identifying the features that most influence ticket prices, Meg and Jack implemented a k-nearest neighbors model that finds events with similar characteristics. They then used the distribution of costs across these similar, successful events to suggest an appropriate range of ticket prices that organizers can use when creating an event. Finally, they built a Flask web application that lets users interact with the model.
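
The nearest-neighbor idea described above can be sketched roughly as follows. This is a minimal illustration, not the team's implementation: the two features (capacity and duration), the sample events, the scaled Euclidean distance, the value of k, and the use of the first and third quartiles as the suggested range are all assumptions made for this example.

```python
import math
import statistics

# Hypothetical past events: (capacity, duration_hours, ticket_price).
# All values are invented for illustration only.
PAST_EVENTS = [
    (100, 2.0, 20.0),
    (120, 2.5, 25.0),
    (90, 2.0, 18.0),
    (110, 3.0, 30.0),
    (1000, 8.0, 150.0),
    (1200, 9.0, 180.0),
]


def suggest_price_range(capacity, duration_hours, k=4):
    """Suggest (low, high) from the prices of the k most similar past events.

    k must be at least 2 so that quartiles can be computed.
    """
    # Scale each feature by its maximum so neither dominates the distance.
    max_capacity = max(event[0] for event in PAST_EVENTS)
    max_duration = max(event[1] for event in PAST_EVENTS)

    def distance(event):
        return math.hypot(
            (event[0] - capacity) / max_capacity,
            (event[1] - duration_hours) / max_duration,
        )

    # The k nearest neighbors are the k past events with the smallest distance.
    neighbors = sorted(PAST_EVENTS, key=distance)[:k]
    prices = [event[2] for event in neighbors]

    # Use the first and third quartiles of neighbor prices as the range.
    q1, _median, q3 = statistics.quantiles(prices, n=4, method="inclusive")
    return q1, q3


if __name__ == "__main__":
    print(suggest_price_range(capacity=100, duration_hours=2.0))
```

Using quartiles rather than the minimum and maximum keeps a single unusually priced neighbor from stretching the suggested range.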

Our Team: Sandeep Vanga

Goal: Perform unsupervised text clustering to gain insights into representative sub-topics.

Sandeep built a baseline model using k-means clustering and TF-IDF features. He also devised two variants of models built on Word2Vec (deep learning-based) features: the first aggregates word vectors, and the second uses a Bag of Clusters (BoClu) of words. He implemented the elbow method to choose the optimal number of clusters. The algorithms were validated on 10 different brands/topics using news data collected over one year, with quantitative metrics such as entropy and silhouette score as well as visualization techniques.
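
The elbow method mentioned above can be illustrated with a small sketch. It is not Sandeep's code: the 2-D points stand in for TF-IDF or word-vector features, the farthest-point initialization is chosen only to keep the example reproducible, and picking the largest second difference of inertia is just one common way to locate the elbow.

```python
# Hypothetical 2-D feature vectors forming three well-separated groups.
POINTS = [
    (0, 0), (0, 1), (1, 0), (1, 1),
    (10, 10), (10, 11), (11, 10), (11, 11),
    (20, 0), (20, 1), (21, 0), (21, 1),
]


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_inertia(points, k, iterations=20):
    """Run a basic k-means and return the within-cluster sum of squares."""
    # Deterministic farthest-point initialization (an assumption made here;
    # random restarts are more common in practice).
    centers = [points[0]]
    while len(centers) < k:
        farthest = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(farthest)

    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster.
        centers = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster))
            if cluster else center
            for cluster, center in zip(clusters, centers)
        ]

    return sum(min(sq_dist(p, c) for c in centers) for p in points)


def elbow_k(inertias):
    """Return the k with the sharpest bend, where inertias[i] is for k = i + 1."""
    bends = [
        inertias[i - 1] - 2 * inertias[i] + inertias[i + 1]
        for i in range(1, len(inertias) - 1)
    ]
    return bends.index(max(bends)) + 2


if __name__ == "__main__":
    inertias = [kmeans_inertia(POINTS, k) for k in range(1, 7)]
    print(elbow_k(inertias))
```

On real text data the bend is rarely this clean, which is one reason to check the choice against other metrics such as the silhouette score.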

Our Team: David Reilly

Goal: Examine over 300,000 trips in the city of San Francisco to study driver behavior using SQL and R.

David constructed behavioral and situational features in order to model driver responses to dispatch requests using advanced machine learning algorithms. Using Python, he also analyzed cancellation fee refund rates across multiple cities to predict when a cancellation fee should be applied.

Our Team: Sandeep Vanga and Rachan Bassi

Goal: Automate the process of image tagging by employing image processing as well as machine learning tools.

Williams-Sonoma’s product feed contains more than a million images, and the corresponding metadata, such as color, pattern, and image type (catalog, multiproduct, or single-product), is extremely important for optimizing search and product recommendations. Sandeep and Rachan used image saliency and color histogram-based computer vision techniques to segment and identify the important regions and features of each image, then classified the images with a decision tree-based machine learning algorithm. They achieved 90% accuracy on silhouette/single-product images and 70% accuracy on complex multiproduct/catalog images.
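
As a rough illustration of the kind of color histogram feature mentioned above, the sketch below turns an image into a fixed-length vector that a classifier such as a decision tree could consume. It is not the team's pipeline: representing an image as a list of (r, g, b) tuples, the number of bins, and the function name are assumptions made for this example.

```python
def color_histogram(pixels, bins_per_channel=2):
    """Return a normalized histogram over quantized RGB colors.

    pixels is a list of (r, g, b) tuples with channel values in 0..255.
    The result has bins_per_channel ** 3 entries that sum to 1.
    """
    if not pixels:
        raise ValueError("pixels must not be empty")

    counts = [0] * (bins_per_channel ** 3)
    for r, g, b in pixels:
        # Map each 0..255 channel value to a bin index in 0..bins_per_channel - 1.
        r_bin = r * bins_per_channel // 256
        g_bin = g * bins_per_channel // 256
        b_bin = b * bins_per_channel // 256
        index = (r_bin * bins_per_channel + g_bin) * bins_per_channel + b_bin
        counts[index] += 1

    total = len(pixels)
    return [count / total for count in counts]


if __name__ == "__main__":
    # Hypothetical 4-pixel image: mostly white background with one red pixel.
    image = [(255, 255, 255)] * 3 + [(255, 0, 0)]
    print(color_histogram(image))
```

Normalizing by the pixel count makes histograms comparable across images of different sizes, which matters when the feed mixes catalog and single-product shots.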

Nonprofit and Civic Organizations