Classification Modelling

Convener(s)

 
Name: Prof. Sanjeev Sabnis
Mailing Address: IIT Bombay (Mathematics)
Email: svs at iitb.ac.in

Name: Prof. Radhendushka Srivastava
Mailing Address: IIT Bombay (Mathematics)
Email: rsrivastava at iitb.ac.in

Please Note:

  • NCM participants are requested to register on the NCM website and then on the CEP website: http://www.cep.iitb.ac.in (using the Google Chrome browser).
  • Shortlisted NCM participants will have to pay a nominal course fee of Rs 1000 + 18% GST = Rs 1180.
  • The last date for receiving online applications from participants is 31 Dec 2022, for both the NCM and CEP websites.

Note: Lab sessions will be held in Python. Participants are required to know the basics of Python.

Description

In statistical modelling, ‘classification’ means assigning new observations to relevant classes using a statistical decision rule built from training data on a particular phenomenon or field.

Classification is pervasive across fields such as medicine (for example, classifying individuals with or without COVID into classes such as ‘severe symptoms’, ‘non-severe symptoms’ and ‘absence of symptoms’), manufacturing (classifying newly manufactured items as ‘defective’ or ‘non-defective’), banking (classifying clients as ‘fraudulent’ or ‘non-fraudulent’), the social sciences, law, and many others.

The classification methods that will be covered in this workshop include (i) logistic regression, (ii) linear and quadratic discriminant analysis, (iii) naïve Bayes, (iv) K-nearest neighbours, (v) decision trees, (vi) random forests, and (vii) support vector machines. Each of these methods will be demonstrated on real and simulated data using the open-source software R/Python. In machine learning parlance, these classification methods are also referred to as supervised learning methods.

Some unsupervised learning techniques (mainly, clustering methods) will also be covered in the workshop. The K-means clustering and hierarchical clustering algorithms will be demonstrated through real data. 

 

 

 

Dates: 

Monday, February 20, 2023 - 07:00 to Saturday, February 25, 2023 - 19:00

Venue Address: 

Indian Institute of Technology Bombay, Powai
Mumbai-400076

 

PIN: 

400076

Syllabus: 

Detailed Syllabus

20-02-2023 (Day 1)
Lecture 1 & 2. Random experiment, sample space, events, probability and its properties, conditional probability, independent events, Bayes' theorem, random variables, standard probability density functions, independence, law of large numbers, central limit theorem.
Lecture 3 & 4. Basics of matrix algebra, square symmetric matrices, eigenvalues and eigenvectors, positive definite matrices, spectral decomposition, square root of a square symmetric matrix and its properties, quadratic forms, matrix inequalities and optimization.
Lab: Basic vector computation, matrix operations (addition, multiplication, inverse, determinant, rank, eigenvalues, eigenvectors, SVD for square symmetric matrices, square root of matrices), generating random variables, plotting of PDFs/PMFs, sample mean and its closeness to the population mean.
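As a hedged illustration of the last lab item (the sample mean and its closeness to the population mean), the sketch below uses only the Python standard library. The Normal distribution with mean 2 and standard deviation 3, the sample sizes, the seed and the function name `sample_mean_error` are assumptions made for this example; the lab sessions themselves may use different tools and data.

```python
import random
from statistics import fmean

def sample_mean_error(n, mu=2.0, sigma=3.0, rng=None):
    """Absolute gap between the mean of n Normal(mu, sigma) draws and mu."""
    if rng is None:
        rng = random.Random()
    draws = [rng.gauss(mu, sigma) for _ in range(n)]
    return abs(fmean(draws) - mu)

rng = random.Random(2023)  # fixed seed so the run is reproducible
errors = {n: sample_mean_error(n, rng=rng) for n in (10, 1_000, 100_000)}

for n, err in errors.items():
    print(f"n = {n:>6}: |sample mean - population mean| = {err:.4f}")
```

By the law of large numbers the gap tends to shrink as n grows, although for any single run it need not decrease at every step.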

21-02-2023 (Day 2)
Lecture 1 & 2. Data matrix, measures of central tendency, dispersion, skewness and kurtosis; graphical tools such as box plots, histograms and scatterplots; likelihood function, estimation strategies, application of the central limit theorem.
Lecture 3 & 4. Basics of hypothesis testing, type I and type II errors, power, p-value, connection between hypothesis testing and classification, z-test, t-test, paired t-test.
Lab: Exploration of a few real/simulated data sets; simulation and confirmatory analysis of the empirical significance level for mean testing, along with the power curve.
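The simulation part of this lab can be sketched roughly as follows, assuming a two-sided z-test with known standard deviation. The sample size, number of replications, seed, alternative means and the helper names `z_test_rejects` and `rejection_rate` are illustrative assumptions, and only the Python standard library is used.

```python
import random
from statistics import NormalDist, fmean

def z_test_rejects(sample, mu0, sigma, alpha=0.05):
    """Two-sided z-test of H0: mean == mu0, with known sigma."""
    z = (fmean(sample) - mu0) / (sigma / len(sample) ** 0.5)
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    return abs(z) > critical

def rejection_rate(true_mean, mu0=0.0, sigma=1.0, n=30, reps=4000, seed=1):
    """Fraction of simulated samples for which H0 is rejected."""
    rng = random.Random(seed)
    rejections = sum(
        z_test_rejects([rng.gauss(true_mean, sigma) for _ in range(n)], mu0, sigma)
        for _ in range(reps)
    )
    return rejections / reps

# H0 is true here, so the rejection rate estimates the significance level (0.05).
empirical_level = rejection_rate(true_mean=0.0)

# H0 is false here, so each rejection rate estimates the power at that mean.
power_curve = {m: rejection_rate(true_mean=m) for m in (0.2, 0.5, 0.8)}

print(f"empirical significance level: {empirical_level:.3f}")
for m, power in power_curve.items():
    print(f"true mean {m}: estimated power {power:.3f}")
```

The estimated power should rise towards 1 as the true mean moves away from the hypothesised value.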

22-02-2023 (Day 3)
Lecture 1 & 2. General regression setup, logistic regression, parameter estimation and diagnostics of the logistic regression model.
Lecture 3 & 4. Linear discriminant analysis of Gaussian populations, misclassification probability matrix, ROC, quadratic discriminant analysis, cross-validation method.
Lab: Implementation of logistic regression, LDA and QDA on real/simulated data.
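A minimal sketch of logistic regression on simulated data is given below, assuming a single predictor and plain gradient ascent on the mean log-likelihood. The true coefficients, learning rate, number of epochs and the function names `sigmoid` and `fit_logistic` are assumptions for illustration; a statistical package would normally use a different fitting routine.

```python
import math
import random

def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def fit_logistic(xs, ys, lr=0.5, epochs=1000):
    """Fit P(y = 1 | x) = sigmoid(b0 + b1 * x) by gradient ascent."""
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(epochs):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            resid = y - sigmoid(b0 + b1 * x)  # observed minus fitted probability
            g0 += resid
            g1 += resid * x
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1

# Simulated data with assumed true coefficients b0 = -1, b1 = 2
rng = random.Random(0)
xs = [rng.uniform(-3, 3) for _ in range(500)]
ys = [1 if rng.random() < sigmoid(-1.0 + 2.0 * x) else 0 for x in xs]

b0, b1 = fit_logistic(xs, ys)

# Classify as 1 when the fitted probability is at least 0.5
correct = sum((sigmoid(b0 + b1 * x) >= 0.5) == (y == 1) for x, y in zip(xs, ys))
accuracy = correct / len(xs)
print(f"b0 = {b0:.2f}, b1 = {b1:.2f}, training accuracy = {accuracy:.3f}")
```

The fitted coefficients should land near the values used to simulate the data, and the 0.5 threshold turns the fitted probabilities into a classification rule.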

23-02-2023 (Day 4)
Lecture 1 & 2. Naïve Bayes classifier and comparison with logistic regression, LDA and QDA.
Lecture 3 & 4. Support vector classifier, support vector machine with linear/nonlinear boundaries, cross-validation method.
Lab: Implementation of the naïve Bayes classifier and SVM on real/simulated data.
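To make the naïve Bayes idea concrete, here is a small sketch of a Gaussian naïve Bayes classifier written with the standard library only. It assumes continuous features that are treated as independent within each class; the simulated two-class data, the seed and the names `fit_gaussian_nb` and `predict_gaussian_nb` are hypothetical choices for this example.

```python
import math
import random
from statistics import fmean, pvariance

def fit_gaussian_nb(X, y):
    """Estimate, per class, the log prior and each feature's mean and variance."""
    model = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        cols = list(zip(*rows))  # one tuple of values per feature
        model[label] = {
            "log_prior": math.log(len(rows) / len(X)),
            "means": [fmean(c) for c in cols],
            "vars": [pvariance(c) + 1e-9 for c in cols],  # floor avoids division by zero
        }
    return model

def predict_gaussian_nb(model, x):
    """Return the class with the largest log posterior (up to a constant)."""
    def log_posterior(params):
        total = params["log_prior"]
        for value, mean, var in zip(x, params["means"], params["vars"]):
            total += -0.5 * math.log(2 * math.pi * var) - (value - mean) ** 2 / (2 * var)
        return total
    return max(model, key=lambda label: log_posterior(model[label]))

# Simulated data: two Gaussian classes centred at (0, 0) and (3, 3)
rng = random.Random(42)

def blob(center, n):
    return [tuple(rng.gauss(c, 1.0) for c in center) for _ in range(n)]

X = blob((0.0, 0.0), 200) + blob((3.0, 3.0), 200)
y = [0] * 200 + [1] * 200

model = fit_gaussian_nb(X, y)
accuracy = sum(predict_gaussian_nb(model, x) == lab for x, lab in zip(X, y)) / len(X)
print(f"training accuracy = {accuracy:.3f}")
```

The "naïve" part is the sum over features inside `log_posterior`: each feature contributes its own one-dimensional Gaussian log density, with no covariance terms.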

24-02-2023 (Day 5)
Lecture 1 & 2. Classification using decision trees, boosting, regularization, random forests, variable importance.
Lecture 3 & 4. K-nearest neighbours classifier, notion of distance measures, cross-validation, advantages of KNN, comparison, real data application.
Lab: Implementation of decision trees, random forests and KNN on real/simulated data.
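The KNN classifier and its tuning by cross-validation can be sketched in a few lines. The toy measurements below are made up, the class labels borrow the ‘defective’/‘non-defective’ example from the description, and the function names `knn_predict` and `loo_accuracy` are hypothetical; Euclidean distance and leave-one-out cross-validation are assumed.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Majority vote among the k training points nearest to `query`."""
    by_distance = sorted(zip(train_X, train_y),
                         key=lambda pair: math.dist(pair[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

def loo_accuracy(X, y, k):
    """Leave-one-out cross-validation: predict each point from all the others."""
    hits = 0
    for i, (x_i, y_i) in enumerate(zip(X, y)):
        rest_X = X[:i] + X[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        hits += knn_predict(rest_X, rest_y, x_i, k) == y_i
    return hits / len(X)

# Toy data (made up): two measurements per manufactured item
X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (4.0, 4.2), (4.1, 3.9), (3.8, 4.0)]
y = ["defective"] * 3 + ["non-defective"] * 3

print(knn_predict(X, y, (1.1, 1.0), k=3))  # defective
print(loo_accuracy(X, y, k=3))             # 1.0
print(loo_accuracy(X, y, k=5))             # 0.0
```

With only three items per class, k = 5 forces every held-out point to be outvoted by the other class, which is why cross-validation is used to choose k rather than fixing it in advance.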

25-02-2023 (Day 6)
Lecture 1. Applications of unsupervised learning, principal component analysis and factor analysis.
Lecture 2. K-means clustering algorithm.
Lecture 3 & 4. Hierarchical clustering algorithms, dendrogram.
Lab: Implementations of K-means clustering and hierarchical clustering algorithms on real/simulated data.
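As a rough sketch of the K-means part of this lab, the code below implements the usual alternating assign/update iteration (Lloyd's algorithm) with the standard library only. The simulated two-cluster data, the seeds, the random choice of starting centroids and the function name `k_means` are assumptions for illustration.

```python
import math
import random
from statistics import fmean

def k_means(points, k, iters=100, seed=0):
    """Alternate between assigning points to the nearest centroid and
    moving each centroid to the mean of its assigned points."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            clusters[nearest].append(p)
        new_centroids = [
            tuple(fmean(coord) for coord in zip(*cluster)) if cluster else centroids[j]
            for j, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:  # assignments have stabilised
            break
        centroids = new_centroids
    return centroids, clusters

# Simulated data: two well-separated groups around (0, 0) and (5, 5)
rng = random.Random(7)
points = [(rng.gauss(0, 0.5), rng.gauss(0, 0.5)) for _ in range(50)]
points += [(rng.gauss(5, 0.5), rng.gauss(5, 0.5)) for _ in range(50)]

centroids, clusters = k_means(points, k=2)
centroids_sorted = sorted(centroids)
for c, members in zip(centroids, clusters):
    print(f"centroid ({c[0]:.2f}, {c[1]:.2f}) with {len(members)} points")
```

Unlike the classifiers above, no class labels are used here: the groups are recovered from the distances between observations alone, which is what makes this an unsupervised method.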

Book Reference:

  • Searle, S.R. and Khuri, A.I., Matrix Algebra Useful for Statistics, 2nd Edition, Wiley, New York, 2017.
  • Hogg, R., McKean, J. and Craig, A., Introduction to Mathematical Statistics, 8th Edition, Pearson, Boston, 2019.
  • Johnson, R.A. and Wichern, D.W., Applied Multivariate Statistical Analysis, 6th Edition, Prentice Hall, Upper Saddle River, New Jersey, 2007.
  • Hastie, T., Tibshirani, R. and Friedman, J., The Elements of Statistical Learning, 2nd Edition, Springer, New York, 2016.
  • James, G., Witten, D., Hastie, T. and Tibshirani, R., An Introduction to Statistical Learning, 1st Edition, Springer, New York, 2013.

Note: Lab sessions will be held in R/Python. Participants are required to know the basics of R/Python.

Time Table: 

Time Table

Each course will be scheduled for six consecutive days in a week and will comprise lectures, tutorials, labs and computing components. The course is aimed at participants with an Engineering/Science background and/or work experience in Data Science.

 

Date         Lecture 1             Lecture 2              Lecture 3             Lecture 4            Lab Session
             9:30 am to 10:30 am   11:00 am to 12:00 pm   12:00 pm to 1:00 pm   2:30 pm to 3:30 pm   4:00 pm to 5:00 pm
20-02-2023   RS                    RS                     SS                    SS                   RS+SS+SR+JD
21-02-2023   RS                    RS                     SS                    SS                   RS+SR+SS+JD
22-02-2023   SS                    SS                     RS                    RS                   RS+SS+SR+JD
23-02-2023   SS                    SS                     RS                    RS                   RS+SS+SR+JD
24-02-2023   SR                    SR                     SS                    SS                   RS+SS+SR+JD
25-02-2023   SS                    RS                     SR                    SR                   RS+SS+SR+JD

Breaks: tea after Lecture 1, lunch after Lecture 3, tea after Lecture 4, snacks after the lab session.

SS - Prof. Sanjeev Sabnis (IITB) (11 lectures + 6 labs)
RS - Prof. Radhendushka Srivastava (IITB) (9 lectures + 6 labs)
SR - Dr. Siddhartha Roy (Industry Expert) (4 lectures + 6 labs)
JD - Dr. Jovi D’Silva (NIO) (TA) (6 labs)
 

Selected Applicants: 

 

Serial | SID | Full Name | Gender | Affiliation | Position in College/University | University/Institute (M.Sc./M.A.) | Ph.D. Degree Date
1 | 45141 | Ms Arulmani Komarasamy | Female | Bharathiar University PG Extension and Research Centre | PhD | Bharathiar University |
2 | 45173 | Mrs. D. Poongodi | Female | Bharathiar University PG Extension and Research Centre | Ph.D | L. R. G Govt Arts College for Women |
3 | 45217 | Ms. Sunita Rani | Female | IIT Bhubaneswar | PhD Student | Kurukshetra University, Kurukshetra |
4 | 45344 | Mr. Tamil Selvan T | Male | NIT Calicut | PhD Student | PSG College of Arts & Science, Coimbatore |
5 | 45465 | Ms. Krishna Mahapatra | Female | Vellore Institute of Technology, Vellore Campus | Research Scholar (PhD) | Diamond Harbour Women's University |
6 | 45559 | Mr Tamizhazhagan S | Male | National Institute of Technology | PhD | Pondicherry University |
7 | 45574 | Mr. Muzammil Khan | Male | Maulana Azad National Institute of Technology Bhopal | Research Scholar | Bundelkhand University |
8 | 45585 | Ms. Bhavana Singh | Female | Maulana Azad National Institute of Technology, Bhopal | Ph.D | National Institute of Technology, Hamirpur |
9 | 45606 | Mr. Nitish Kumar Mahala | Male | Maulana Azad National Institute of Technology Bhopal | PhD | National Institute of Technology Jamshedpur |
10 | 45610 | Ms. Nikita Yadav | Female | Maulana Azad National Institute of Technology, Bhopal | PhD | Malaviya National Institute of Technology, Jaipur |
11 | 45634 | Mr. Abhinav Gupta | Male | Maulana Azad National Institute of Technology Bhopal | PhD | VBSPU JAUNPUR |
12 | 45659 | Mr. Sandeep Kumar | Male | National Institute of Technology Rourkela | PhD | Pondicherry University |

 

 

How to Reach: 

Click on the following link for more details:

https://www.iitb.ac.in/en/about-iit-bombay/getting-to-iit-bombay

 

Last Date Application: 

Saturday, December 31, 2022

School Type: 

IMW
