# CS6375: Machine Learning

3 Credit Course, *JSOM 2.802*, Spring 2019

**This is a previous offering of this class.** See the teaching page for current courses.

# Course Overview

**Class Hours**: Mo/We 1:00–2:15pm

**Class Room**: JSOM 2.802

**Instructor**: Gautam Kunapuli

**Office**: ECSS 2.717

**Email**: Gautam-dot-Kunapuli-@-utdallas-dot-edu

**Office Hours**: Wednesdays, 2:30pm-4:30pm; and by appointment

**Teaching Assistant**: TBA

**Email**: TBA

**Office Hours**: TBA

## Course Description

The main aim of this course is to provide an introduction to, and a **hands-on understanding** of, a broad variety of **machine-learning algorithms** applied to **real applications**. In addition to delving into the underlying mathematical and algorithmic details of many learning methods, we will also explore the practical aspects of applying machine learning to real-world data through **programming assignments**.

## Prerequisites

The mandatory prerequisite is **CS5343: Algorithm Analysis and Data Structures**.

In addition, many concepts in this class require a comfortable grasp of basic **probability theory**, **linear algebra**, **multivariate calculus** and **optimization**. Garrett Thomas' *Mathematics for Machine Learning* is a superb **review of essential mathematical background** and is freely available online.

## Python Resources

The programming assignments will require coding in **Python**. The following books may be useful as a **quick introduction to Python**:

- *A Whirlwind Tour of Python* (and its companion repository of Jupyter notebooks) by Jake VanderPlas is “…a fast-paced introduction to essential components of the Python language for researchers and developers who are already familiar with programming in another language” [author];
- *Python Data Science Handbook* by Jake VanderPlas (and its companion repository of Jupyter notebooks) “introduces the core libraries essential for working with data in Python: particularly IPython, NumPy, Pandas, Matplotlib, Scikit-Learn, and related packages” [author].

The following books are also useful references if you want to learn Python from scratch:

- *Think Python: How to Think Like a Computer Scientist* by Allen B. Downey;
- *Automate the Boring Stuff with Python* by Al Sweigart.
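Since linear regression opens the syllabus, a short plain-Python warm-up like the one below can double as a check that your interpreter is set up for the assignments. The function name and the sample data are purely illustrative, not part of any assignment:

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between targets and predictions."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

if __name__ == "__main__":
    # Made-up targets and predictions for a quick sanity check.
    y_true = [1.0, 2.0, 3.0]
    y_pred = [1.5, 2.0, 2.0]
    print(mean_squared_error(y_true, y_pred))  # (0.25 + 0.0 + 1.0) / 3
```

If snippets like this feel unfamiliar (functions, `zip`, generator expressions), the introductory books above are the place to start.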

## Textbooks and Course Materials

There is **no required textbook** for this class. However, the following textbooks are useful references for various topics we will cover in this course:

- *Pattern Recognition and Machine Learning* by Christopher M. Bishop: a standard textbook and reference for introductory machine learning that covers a large part of our syllabus;
- *Machine Learning: A Probabilistic Perspective* by Kevin Murphy: another excellent book and reference, especially for probabilistic graphical models.

The following books are **available online, free for personal use**. Supplemental reading material will be assigned from these sources as often as possible.

- *The Elements of Statistical Learning: Data Mining, Inference, and Prediction* by Trevor Hastie, Robert Tibshirani and Jerome Friedman (available online);
- *Bayesian Reasoning and Machine Learning* by David Barber (available online);
- *Understanding Machine Learning: From Theory to Algorithms* by Shai Shalev-Shwartz and Shai Ben-David (available online) introduces machine learning from a theoretical perspective;
- *Deep Learning* by Ian Goodfellow, Yoshua Bengio and Aaron Courville (available online) is an excellent introductory textbook for a wide variety of deep learning methods and applications;
- *Reinforcement Learning: An Introduction* by Richard S. Sutton and Andrew G. Barto (available online) is the *de facto* textbook and reference for reinforcement learning.

# Syllabus and Schedule

| Week | Date | Topic | Readings | Notes |
|---|---|---|---|---|
| 1 | Jan 14 (mo) | Introduction & Linear Regression | Bishop, Ch. 1 | |
| | Jan 16 (we) | Linear Regression (continued) | Andrew Ng’s Lecture Notes, Part I; Shalev-Shwartz & Ben-David, Ch. 9.2; Kilian Weinberger’s Lecture Notes (probabilistic view) | |
| 2 | Jan 21 (mo) | Martin Luther King Day; no class | | |
| | Jan 23 (we) | Perceptron | Shalev-Shwartz & Ben-David, Ch. 9.1; Kilian Weinberger’s Lecture Notes | |
| 3 | Jan 28 (mo) | Perceptron (continued) | Computational complexity of GD vs. Stochastic GD | HW 1 Out |
| | Jan 30 (we) | Support Vector Machines | Andrew Ng’s Lecture Notes; Bishop, Ch. 7; Barber, Ch. 17.5; Shalev-Shwartz & Ben-David, Ch. 15 | |
| 4 | Feb 4 (mo) | Support Vector Machines (continued) | | |
| | Feb 6 (we) | Decision Trees | Mitchell, Ch. 3; Kilian Weinberger’s Lecture Notes | |
| 5 | Feb 11 (mo) | Decision Trees (continued) | | HW 1 Due; HW 2 Out |
| | Feb 13 (we) | Nearest Neighbor Methods | Bishop, Ch. 14.4; Daumé III, Ch. 3 | |
| 6 | Feb 18 (mo) | Good Machine Learning Practices (pre-processing, model selection, cross-validation, missing data, evaluation) | Kotsiantis et al., 2006 | |
| | Feb 20 (we) | Good Machine Learning Practices (continued) | | |
| 7 | Feb 25 (mo) | Naive Bayes | Jerry Zhu’s Lecture Notes; Mitchell 2nd ed., Ch. 3.1–3.2; Daumé III, Ch. 9 | |
| | Feb 27 (we) | Logistic Regression | Mitchell 2nd ed., Ch. 3.3–3.5; Bishop, Ch. 8.4.1, 9.2–9.4; Andrew Ng’s Lecture Notes, Pt. II; Kilian Weinberger’s Lecture Notes | HW 2 Due; HW 3 Out |
| 8 | Mar 4 (mo) | Mid-Term Exam Prep | | |
| | Mar 6 (we) | Mid-Term Exam | | JSOM 2.115, 1–2:15pm |
| 9 | Mar 11 (mo) | Ensemble Methods: Bagging | Bishop, Ch. 14; Hastie et al., Ch. 7.1–7.6, 8.7; Visualization of the Bias-Variance Tradeoff | |
| | Mar 13 (we) | Ensemble Methods: Boosting | Hastie et al., Ch. 15; Freund and Schapire, 1999 | |
| 10 | Mar 18 (mo) | Spring Break; no class | | |
| | Mar 20 (we) | Spring Break; no class | | |
| 11 | Mar 25 (mo) | Ensemble Methods: Boosting (continued) | | |
| | Mar 27 (we) | Ensemble Methods: Gradient Boosting | Friedman, 1999; Mason et al., 1999; Visualizing Gradient Boosting; Tong He’s Presentation on XGBoost | |
| 12 | Apr 1 (mo) | Principal Components Analysis | Andrew Ng’s Lecture Notes, Pt. II | HW 3 Due; HW 4 Out |
| | Apr 3 (we) | Clustering | Tan et al., Ch. 8 | |
| 13 | Apr 8 (mo) | Clustering (continued) | | |
| | Apr 10 (we) | Neural Networks | Goodfellow et al., Ch. 6 | |
| 14 | Apr 15 (mo) | Neural Networks (continued) | | HW 4 Due; HW 5 Out |
| | Apr 17 (we) | Convolutional Neural Networks | Goodfellow et al., Ch. 9 | |
| 15 | Apr 22 (mo) | Reinforcement Learning | Sutton and Barto, Ch. 1, 3; Andrew Ng’s Lecture Notes | Slides updated |
| | Apr 24 (we) | Reinforcement Learning (continued) | | |
| 16 | Apr 29 (mo) | Final Exam Prep | | HW 5 Due |
| | May 1 (we) | Final Exam | | JSOM 2.804, 1–2:15pm |

The topic schedule is subject to change at the instructor’s discretion. Please check this page regularly for lecture slides, additional references and reading materials.

# Grading

- 50%: Homework Problem Sets/Programming Assignments (5 assignments, 10% each)
- 20%: Mid-Term Exam
- 30%: Final Exam
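The weighted total implied by this breakdown can be sketched as follows; the component scores are made-up examples, and the function name is illustrative rather than an official grading tool:

```python
# Weights from the grading breakdown above; they sum to 1.0.
WEIGHTS = {
    "homework": 0.50,  # average of the five assignments, 10% each
    "midterm": 0.20,
    "final": 0.30,
}

def course_score(homework_avg, midterm, final):
    """Weighted course total on a 0-100 scale."""
    return (WEIGHTS["homework"] * homework_avg
            + WEIGHTS["midterm"] * midterm
            + WEIGHTS["final"] * final)

# Example: homework average 90, midterm 80, final 85.
print(course_score(90, 80, 85))  # 0.5*90 + 0.2*80 + 0.3*85 = 86.5
```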

# Course Policies

## Attendance Policy

Classroom attendance at all lectures is mandatory. Prolonged absence from the lectures may lead to substantial grade penalties:

- two consecutive absences: no penalty;
- three consecutive absences: one letter grade drop;
- four consecutive absences: F grade.

Absence due to emergency or extenuating circumstances can be excused, but proof may be required.

## Homework Policy

Homework assignments are **due at the start of class on the due date** without exceptions, unless permission was obtained from the instructor **in advance**. Homework and assignment deadlines will not be extended except under extreme university-wide circumstances such as weather emergencies.

All homeworks, programming projects, take-home exams (if any) **are to be written up and completed individually**. You **may discuss, collaborate, brainstorm and strategize** ideas, concepts and problems with other students. However, all written solutions and coded programs **must be your own**. Copying another student’s work or allowing other students to copy your work is academically dishonest.

## Academic Integrity

All students are responsible for adhering to UT Dallas Community Standards and Conduct, particularly regarding Academic Integrity and Academic Dishonesty. Any academic dishonesty, including, but not restricted to, plagiarism (including from internet sources), collusion, cheating, or fabrication, will result in a zero score on the assignment/project/exam and possible disciplinary action.

## Students with Disabilities

UT Dallas is committed to equal access in all endeavors for students with disabilities. The Office of Student AccessAbility (OSA) provides academic accommodations for eligible students with a documented disability. Accommodations for each student are determined by OSA on an individual basis, with input from qualified professionals. Accommodations are intended to level the playing field for students with disabilities, while maintaining the academic integrity and standards set by the University. If you think you qualify for an academic accommodation, please visit OSA to determine eligibility.

If you have already received academic accommodation, please contact me by e-mail to schedule an appointment **before classes start**, if possible.