# IMW - Classification Modelling

## Speakers and Syllabus

Detailed Syllabus

20-02-2023 (Day 1)
Lecture 1 & 2. Random experiment, sample space, events, probability and its properties, conditional probability, independent events, Bayes probability, random variables, standard probability density functions, independence, law of large numbers, central limit theorem.
Lecture 3 & 4. Basics of matrix algebra, square symmetric matrices, eigenvalues and eigenvectors, positive definite matrices, spectral decomposition, square root of a square symmetric matrix and its properties, quadratic forms, matrix inequalities and optimization.
Lab: Basic vector computation, matrix operations (addition, multiplication, inverse, determinant, rank, eigenvalues, eigenvectors, SVD for square symmetric matrices, square root of matrices), generating random variables, plotting of pdfs/pmfs, sample mean and its closeness to the population mean.

21-02-2023 (Day 2)
Lecture 1 & 2. Data matrix, measures of central tendency, dispersion, skewness, kurtosis, some graphical tools like box plot, histogram, scatterplot, likelihood function, estimation strategies, application of central limit theorem.
Lecture 3 & 4. Basics of hypothesis testing, type I and type II errors, power, p-value, connection between hypothesis testing and classification, z-test, t-test, paired t-test.
Lab: Exploration of a few real/simulated data sets, simulation and confirmatory analysis of the empirical significance level for mean testing, along with the power curve.

22-02-2023 (Day 3)
Lecture 1 & 2. General regression setup, logistic regression, parameter estimation and diagnostics of logistic regression model.
Lecture 3 & 4. Linear discriminant analysis of Gaussian populations, misclassification probability matrix, ROC, quadratic discriminant analysis, cross validation method.
Lab: Implementations of logistic regression, LDA and QDA on real/simulated data.

23-02-2023 (Day 4)
Lecture 1 & 2. Naïve Bayes classifier and comparison with Logistic, LDA and QDA.
Lecture 3 & 4. Support vector classifier, support vector machine with linear/nonlinear boundaries, cross validation method.
Lab: Implementations of Naïve Bayes classifier and SVM on real/simulated data.

24-02-2023 (Day 5)
Lecture 1 & 2. Classification using decision trees, boosting, regularization, random forests, variable importance.
Lecture 3 & 4. K-Nearest Neighbours (KNN) classifier, notion of distance measures, cross validation, advantages of KNN, comparison, real data application.
Lab: Implementations of decision tree, random forest and KNN on real/simulated data.

25-02-2023 (Day 6)
Lecture 1. Applications of unsupervised learning, principal component analysis and factor analysis.
Lecture 2. K-means clustering algorithm.
Lecture 3 & 4. Hierarchical clustering algorithms, dendrogram.
Lab: Implementations of K-means clustering and hierarchical clustering algorithms on real/simulated data.

Book Reference:

• Searle, S.R. and Khuri, A.I., Matrix Algebra Useful for Statistics, 2nd Edition, Wiley, New York, 2017.
• Hogg, R., McKean, J. and Craig, A., Introduction to Mathematical Statistics, 8th Edition, Pearson, Boston, 2019.
• Johnson, R.A. and Wichern, D.W., Applied Multivariate Statistical Analysis, 6th Edition, Prentice Hall, Upper Saddle River, New Jersey, 2007.
• Hastie, T., Tibshirani, R. and Friedman, J., The Elements of Statistical Learning, 2nd Edition, Springer, New York, 2016.
• James, G., Witten, D., Hastie, T. and Tibshirani, R., An Introduction to Statistical Learning, 1st Edition, Springer, New York, 2013.

Note: Lab sessions will be held in R/Python. Participants are required to know the basics of R/Python.
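As a rough indication of the level of programming expected, the Day 1 lab item "sample mean and its closeness to the population mean" could be explored along the following lines. This is only an illustrative sketch, not the actual lab material: the function name `sample_mean_error`, the normal distribution, and the values of `mu`, `sigma` and `seed` are assumptions chosen for the example.

```python
import random
import statistics


def sample_mean_error(n, mu=5.0, sigma=2.0, seed=0):
    """Draw n values from N(mu, sigma^2) and return |sample mean - mu|.

    mu, sigma and seed are illustrative defaults, not values from the syllabus.
    """
    rng = random.Random(seed)  # local generator so results are reproducible
    draws = [rng.gauss(mu, sigma) for _ in range(n)]
    return abs(statistics.fmean(draws) - mu)


if __name__ == "__main__":
    # By the law of large numbers the error typically shrinks as n grows.
    for n in (10, 1_000, 100_000):
        print(n, round(sample_mean_error(n), 4))
```

Running the script prints the absolute error of the sample mean for increasing sample sizes; the error is typically on the order of sigma divided by the square root of n.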

## Time Table

Each course will be scheduled for six consecutive days in a week and will comprise lectures, tutorials, lab and computing components. The course is intended for participants with an Engineering/Science background and/or work experience in Data Science.

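| Date | Lecture 1 (9:30 am to 10:30 am) | Lecture 2 (11:00 am to 12:00 pm) | Lecture 3 (12:00 pm to 1:00 pm) | Lecture 4 (2:30 pm to 3:30 pm) | Lab Session (4:00 pm to 5:00 pm) |
|---|---|---|---|---|---|
| 20-02-2023 | RS | RS | SS | SS | RS+SS+SR+JD |
| 21-02-2023 | RS | RS | SS | SS | RS+SS+SR+JD |
| 22-02-2023 | SS | SS | RS | RS | RS+SS+SR+JD |
| 23-02-2023 | SS | SS | RS | RS | RS+SS+SR+JD |
| 24-02-2023 | SR | SR | SS | SS | RS+SS+SR+JD |
| 25-02-2023 | SS | RS | SR | SR | RS+SS+SR+JD |

Breaks: tea between Lecture 1 and Lecture 2, lunch between Lecture 3 and Lecture 4, tea between Lecture 4 and the lab session, and snacks after the lab session.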

SS - Prof. Sanjeev Sabnis (IITB) (11 Lectures + 6 Labs)
RS - Prof. Radhendushka Srivastava (IITB) (9 Lectures + 6 Labs)
SR - Dr. Siddhartha Roy (Industry Expert) (4 Lectures + 6 Labs)
JD - Dr. Jovi D’Silva (NIO) (TA) (6 Labs)
