An intense review of the elementary aspects of computer programming using both R and Python, and an introduction to a variety of numerical and computational problems. Topics include functions, recursion, loops, list comprehensions, reading and writing files, importing web sites, generating random numbers, the method of inverse transformations, acceptance/rejection sampling, gradient descent, bootstrapping techniques, matrix and vector operations, and graphics.
An intense review of linear algebra. Topics include matrix operations, special matrices, linear systems of equations, the inverse matrix, and determinants; vectors, subspaces, linear independence, basis and dimension, row space, column space, rank, and the rank-nullity theorem; eigenvectors, eigenvalues, computational methods for finding eigenvectors and eigenvalues, and diagonalization of matrices; the LU, spectral, and singular value decompositions.
An introduction to the history of “big data” and four ideas driving the revolution in data analytics: volume, velocity, variety, and veracity. Students will read current newspaper and journal articles, listen to guest speakers, and complete case studies. After finishing this gateway course, students should understand how businesses, governments, and not-for-profit institutions are creating stakeholder value by more effectively capturing, curating, storing, searching, sharing, analyzing, and visualizing data.
An intense review of elementary probability and statistics. Topics include random variables, probability mass functions, density functions, the cumulative distribution function, moments, maximum likelihood estimation, and the method of moments; one- and two-sample hypothesis tests and confidence intervals involving proportions, means, and correlation coefficients; the axioms of Kolmogorov, independence, the law of total probability, and Bayes’ Theorem; and multivariate distributions, indicator random variables, and conditional expectation.
Before we can begin to apply rigorous statistical tools to data, we often need to approach our data intuitively: look for meaningful associations and surprising patterns, detect outliers and anomalies, and formulate hypotheses. This practice is commonly referred to as Exploratory Data Analysis (EDA). Successful exploratory data analysis depends on the ability to manipulate and visualize data. This class introduces various concepts in EDA with an emphasis on data manipulation in R.
This course is an intensive introduction to linear models, with a focus on both principles and practice. Examples from finance, business, marketing, and economics are emphasized. Large data sets are used frequently. Topics include simple and multiple linear regression; weighted, generalized, and outlier-resistant least squares regression; interaction terms; transformations; regression diagnostics and addressing violations of regression assumptions; variable selection techniques such as backward elimination and forward selection; and logit/probit models. Statistical packages include R and SAS.
In this course, students will read case studies and hear from guest speakers about challenges and opportunities generated by the advent of “big data.” Students will make group presentations and write critical response papers related to these case studies. Students will consider some of the traditional business frameworks (e.g., SWOT analysis) for evaluating the strategic opportunities available to a company in the “big data” space.
A survey of the theory and application of time series models, with a particular emphasis on financial and business applications (e.g., exchange rates, sales data, Value-at-Risk, etc.). Tools for model identification, estimation, and assessment are developed in depth. Smoothing methods and trend/seasonal decomposition methods are covered as well, including moving average, exponential, Holt-Winters, and Lowess smoothing techniques. Finally, volatility clustering is modeled through ARCH, GARCH, EGARCH, and GARCH-in-mean specifications. Statistical packages include R and SAS.
Student teams are placed with a client as part of a module-long analytics project with weekly deliverables and meetings. The course provides both skills and experience in working with clients and opportunities to practice the professional skills required by business. The course features significant one-on-one mentoring and integration of topics presented in the program's courses.
In this course, students will learn essential concepts related to business communication and, in particular, the communication of technical material, both spoken and written. Students will learn how to competently create, organize, and support ideas in their business presentations. They will deliver both planned and extemporaneous public presentations on topics related to data analysis and business, both individually and in groups. This course will emphasize the creation of presentation slides and other supporting materials, the correct presentation and organization of data analysis results, and listening to and critically evaluating presentations made by other students.
This course focuses on the core theory and application of classification and clustering techniques, feature selection, and performance evaluation. Algorithms discussed include logistic regression, support vector machines (SVM), k-Nearest Neighbors (kNN), Naive Bayes, association rules (Apriori algorithm), decision trees, neural networks, clustering, and ensemble methods. Using tools available in Python and R, students will gain experience applying the theory to key predictive and descriptive analytics problems in business intelligence. Special attention is given to practical issues such as class imbalance, noise, missing data, and computational complexity.
This course will address basic data visualization techniques and design principles. Students will use R with the ggplot2 and shiny packages to prototype visualizations. Students will obtain practical experience with the visualization of complex data, including multivariate data, geospatial data, textual data, time series, and network data.
This course trains students in the use of multivariate statistical methods other than multiple linear regression, which is covered in MSAN 601. Applications to finance, social science, and marketing data are emphasized (e.g., dimension reduction for Treasury yield curves and consumer microdata). Topics include principal components analysis, factor regression, linear and quadratic discriminant analysis, ANOVA and MANOVA, repeated measures ANOVA, and various clustering techniques (k-means, hierarchical, spectral, total variation, etc.). Statistical packages include R and SAS.
In this course, students will learn how companies harness their digital marketing data to drive insights that convert into better customer experiences. Topics may include survival analysis, longitudinal data analysis, heat maps, geographic information systems, fraud detection, and market basket analysis. Areas of application may include customer targeting, election management, and ecommerce.
Continuation of Practicum. Student teams extend their existing analytics project or are reassigned to new projects with a client as part of a semester-long project with weekly deliverables and meetings. Continued one-on-one mentoring and development of professional business skills are also provided, with an emphasis on "soft skills" training in creating their CV, interviewing and networking. Over the course of the semester, student teams present their Practicum I projects to the other students.
A deeper exploration of the specific properties and algorithms used in machine learning and clustering, with special attention to cutting-edge extensions of the more basic techniques learned in MSAN 621. Examples include k-means++, regularized logistic regression, topic modeling, stratified k-fold sampling, partitioning around medoids, and deep learning. Using industry-standard machine learning and natural language processing packages, students will learn how to efficiently implement advanced applications set in high-dimensional feature spaces, including text mining and image classification.
This course introduces the fundamental concepts and methods underlying the field of social network analysis including network centrality, cohesive subgroups, structural and role equivalence, visualization and hypothesis testing. Emphasis is on students learning from analyzing data and answering empirical questions using routines written in R.
Continuation of Practicum. Student teams extend their existing analytics project or are reassigned to new projects with a client as part of a module-long project with weekly deliverables and meetings. Selected student teams present their Practicum II projects over the course of the module to other students. Students also learn about the start-up process and the venture capital industry.
In this course, students receive a brief, intense, and focused review of programming in SAS Enterprise Guide. This review will augment the SAS training that students receive in other analytics courses and specifically prepare students to take the SAS Base Programming examination.
Students study key-value stores through NoSQL, with a focus on using MongoDB (including, possibly, pymongo, the Python Mongo API). Applications are used to motivate a disciplined approach to database programming with MongoDB, including the construction of indices.
Analysts spend the majority of their time just collecting data and contorting it into an appropriate or convenient form for analysis. In this course, students write programs to scrape data from websites such as Yahoo Finance and use REST APIs to extract data from Twitter. Topics also include log file filtering, table merging, data cleaning, and data reorganization.
Students learn the MapReduce technique of distributed computing. The fundamental principles are first learned with the Python multiprocessing library, with which students build their own concurrent MapReduce framework. Considerable time is spent exploring practical applications of mapping and reducing for various types of real-world data. Distributed statistical and machine learning approaches are explored. Finally, Hadoop streaming MapReduce jobs (in Python) are launched on AWS-EMR.
The study of website traffic analysis for the purpose of understanding how visitors use a site or services. Topics include Google Analytics, A/B testing, and the analysis of incoming traffic characteristics such as client browser, language, computer attributes, and geolocation.
In this class, students learn how to prepare for technical interviews. We review varied information on how to succeed in an interview through research, practical application of coursework, practice, interviewing tips, and plenty of sample questions. Students work on their communication, presentation, and technical problem-solving skills in homework and mock interviews.