Machine Learning @ Applied Statistics with Network Analysis, HSE, Moscow

This repository contains teaching materials for the elective course Machine Learning, taught in the HSE-Moscow master's programme Applied Statistics with Network Analysis. The materials are organized in sections corresponding to lecture days. Each section provides a brief outline of the topics addressed, access to the lecture slides, an outline of the practical exercises and seminars, and references to the relevant literature.

For further information on the course, students can contact the lecturers by email: Nada Lavrač ([email protected]) and Ljupčo Todorovski ([email protected]).

Grading

Grading for this course is based on two types of assignments: homework and a written exam. The schedule of the assignments is summarized in the following table:

Assignment	Grading	Submission Deadline (Date)
Homework 1	25%	16th of February 2021
Homework 2	25%	25th of February 2021
Written Exam	50%	2nd of March 2021 at 12:00 (12:00 pm) Moscow time

Please note that the schedule is now final. We are going to organize an additional hour of discussion on the second homework, with feedback on the first one, on Tuesday, 23rd of February 2021 at 18:30 (6:30 pm) Moscow time.

Handling Late Submissions

For the first homework, you can delay the submission by up to ten days, so the ultimate deadline for submitting it is the 26th of February. Each day of delay after the 16th of February reduces your homework score by one percentage point. For example, if you submit your homework on the 21st of February (five days late) and it was initially evaluated at 20%, your final score for it will be 15%.
The ultimate deadline for submitting the second homework is the 2nd of March. The same penalty applies: one percentage point per day of delay.
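
As a worked illustration of the late-submission rule above, the following Python sketch computes a penalized homework score. The names `HOMEWORK_DEADLINES` and `penalized_score` are hypothetical and only serve this example. Two behaviours are assumptions, because the rules above do not spell them out: submissions after the ultimate deadline are rejected with an error, and the score is never allowed to drop below zero.

```python
from datetime import date

# (regular deadline, ultimate deadline) per homework, taken from the text above.
HOMEWORK_DEADLINES = {
    1: (date(2021, 2, 16), date(2021, 2, 26)),
    2: (date(2021, 2, 25), date(2021, 3, 2)),
}


def penalized_score(homework, raw_score, submitted):
    """Return the homework score (in percentage points) after the late penalty.

    One percentage point is deducted per day of delay after the regular deadline.
    """
    regular, ultimate = HOMEWORK_DEADLINES[homework]
    if submitted > ultimate:
        # Assumption: the rules do not define a score for this case.
        raise ValueError("submitted after the ultimate deadline")
    days_late = max(0, (submitted - regular).days)
    # Assumption: the score cannot become negative.
    return max(0.0, float(raw_score) - days_late)


# The example from the text: homework 1 evaluated at 20%, submitted five days late.
print(penalized_score(1, 20, date(2021, 2, 21)))  # 15.0
```

A submission on or before the regular deadline keeps its full score, since the number of late days is clamped at zero.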

Student Clusters and Groups

We have noticed that there are two clusters of students attending the course:

Seventeen (17) students who have chosen the course officially. These students can work on the homework assignments and submit their solutions in groups of up to three students, formed only with other students from this cluster. All students from this cluster are expected to take the written exam.
This cluster includes the following students: Borisyuk Anna, Vidovic Milica, Vorobeva Maria, Danilova Kseniia, Eremenko Alexandra, Kuzina Maria, Makhsudova Elvina, Parkhaeva Olga, Khairullina Dinara, Shabanova Ekaterina, Shakhova Anna, Petrov Gleb, Vladimirova Ksenia, Kozlova Yulia, Li Ling, Stremousov Alexander, and Chzu Chongrui.
Other students who attend the course voluntarily and are not on the official list of course students. These students can work on the homework assignments and submit their solutions in groups of up to three students, formed only with other students from this cluster. The students from this cluster will not be able to take the written exam.

Other Information

The written exam will consist of three types of questions: (1) multiple-choice questions and (2) short-answer questions, both on the methodology and theory of machine learning, and (3) a practical exercise that requires performing a given learning task on given data and answering specific questions about the obtained models and results.

Tentative Course Schedule for the Academic Year 2020/21

You can follow the lectures using the following Zoom link, https://fmf-uni-lj-si.zoom.us/j/97756216461 or join the Zoom meeting using the ID 977 562 16461.

Date	Topic/Section
Thursday, 14th of January 2021	Introduction to Machine Learning
Tuesday, 19th of January 2021	Learning Rules
Thursday, 21st of January 2021	Relational Learning
Tuesday, 26th of January 2021 and Thursday, 28th of January 2021	Learning from Heterogeneous Data
Thursday, 28th of January 2021 and Tuesday, 2nd of February 2021	Learning Ensembles
Thursday, 4th of February 2021	Artificial Neural Networks and Deep Learning
Tuesday, 9th of February 2021 and Thursday, 11th of February 2021	Embedding Complex Data Types
Thursday, 11th of February 2021	Dimensionality Reduction with Autoencoders
Tuesday, 16th of February 2021	Literature-Based Discovery and Support Vector Machines

1: Introduction to Machine Learning

Basic definitions and taxonomy of learning tasks
Three generations of machine learning and data mining methods
Understanding the error of machine learning models
The curse of dimensionality
Rough overview of the course topics

Lecture Slides

First part, Nada Lavrač
Second part, Ljupčo Todorovski
Last update: 15th of January 2021, 9:10 CET

Literature

James G, Witten D, Hastie T and Tibshirani R (2013) An Introduction to Statistical Learning. Springer, New York. Available at https://statlearning.com/. Sections 1 and 2, check also the exercises at the end of Section 2.
Bramer M (2007) Principles of Data Mining. Springer, Berlin. DOI:10.1007/978-1-84628-766-4. An introductory textbook for refreshing your knowledge on basics of data mining. The first edition of the textbook is also available at ResearchGate, https://www.researchgate.net/publication/220688376_Principles_of_Data_Mining

2: Learning Rules

Learning rules from decision trees
Covering algorithm and its variants
Association rules and subgroup discovery
Evaluating rules and rule sets

Lecture Slides

Learning Rules
Last update: 20th of January 2021, 15:00 CET

Exercise Materials

Learning Decision Trees and Rules in R
Last update: 28th of January 2021, 15:20 CET

Literature

Fürnkranz J, Gamberger D and Lavrač N (2012) Foundations of Rule Learning. Springer, Berlin. DOI:10.1007/978-3-540-75197-7. Chapters 1 and 2, available here.

3: Relational Learning

Learning relational rules
Inductive logic programming
Propositionalization
Wordification and Python-RDM

Lecture Slides

Relational Learning
Last update: 26th of January 2021, 20:10 CET

Exercise Materials

Relational Learning in Python
Last update: 26th of January 2021, 15:20 CET

Literature

Džeroski S and Lavrač N (2001) Relational Data Mining. Springer, Berlin. DOI:10.1007/978-3-662-04599-2. Chapter 1, available here.
Perovšek M, Vavpetič A, Kranjc J, Cestnik B and Lavrač N (2015) Wordification: Propositionalization by unfolding relational data into bags of words. DOI:10.1016/j.eswa.2015.04.017. Available here.
Železný F and Lavrač N (2006) Propositionalization-based relational subgroup discovery with RSD. DOI:10.1007/s10994-006-8633-8. Available here.

4: Learning from Heterogeneous Data

Semantic relational learning with ontologies
- Propositionalization, Hedwig and NetSDM
Propositionalization of heterogeneous information networks
- TEHmINE and HINMINE
Practical exercises with HINMINE

Lecture Slides

Semantic Relational Learning
Last update: 26th of January 2021, 20:20 CET
Heterogeneous Information Networks
Last update: 23rd of February 2021, 17:40 CET

Exercise Materials

HINMINE in Python
Last update: 28th of January 2021, 15:20 CET

Literature

Kralj J, Robnik-Šikonja M and Lavrač N (2019) NetSDM: semantic data mining with network analysis. Journal of Machine Learning Research 20: 1-50.
Kralj J, Robnik-Šikonja M and Lavrač N (2018) HINMINE: Heterogeneous Information Network Mining with Information Retrieval Heuristics. DOI:10.1007/s10844-017-0444-9. Available here.

5: Learning Ensembles

Why ensembles: variance reduction
Boosting, bagging, feature subspaces, random forests
Out-of-bag error estimate, attribute importance
Bagging and random forests in R

Lecture Slides

Learning Ensembles
Last update: 2nd of February 2021, 22:10 CET

Exercise Materials

Bagging and Random Forests in R
Last update: 4th of February 2021, 15:40 CET

Literature

James G, Witten D, Hastie T and Tibshirani R (2013) An Introduction to Statistical Learning. Springer, New York. Available at https://statlearning.com/. Section 8.2 (Bagging, Random Forest, Boosting), check also exercises 5 and 7-12 at the end of Section 8.

6: Neural Networks and Deep Learning

General introduction to NNs
Feed-forward networks and back propagation
Towards deep networks: Convolutional networks
Neural networks in R

Lecture Slides

Neural Networks and Deep Learning
Last update: 4th of February 2021, 15:40 CET

Exercise Materials

Neural Networks in R
Last update: 4th of February 2021, 15:40 CET

Literature

Hastie T, Tibshirani R and Friedman J (2009) The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, New York. Available at https://web.stanford.edu/~hastie/ElemStatLearn/. Sections 11.2 to 11.8 of Chapter 11.
Nielsen M (2019) Neural Networks and Deep Learning. Available at http://neuralnetworksanddeeplearning.com/. Excellent and highly recommended further reading.

7: Embedding Complex Data Types

Complex data types: semi-structured data and networks
Embedding of semi-structured data, bag-of-words
Embedding of words and text documents, word2vec and doc2vec
Classifying text documents in R
Embedding network nodes, node2vec
node2vec in R

Lecture Slides

Embedding Semi-Structured Data
Last update: 9th of February 2021, 22:20 CET
Embedding Networks
Last update: 11th of February 2021, 11:10 CET

Exercise Materials

Classifying Text Documents in R
Last update: 9th of February 2021, 22:20 CET
Classifying Network Nodes in R
Last update: 11th of February 2021, 15:00 CET

Literature

Mikolov T, Chen K, Corrado G and Dean J (2013) Efficient estimation of word representations in vector space. arXiv:1301.3781.
Le QV and Mikolov T (2014) Distributed representations of sentences and documents. arXiv:1405.4053.
Grover A and Leskovec J (2016) node2vec: Scalable feature learning for networks. arXiv:1607.00653.
Perozzi B, Al-Rfou R and Skiena S (2014) DeepWalk: Online learning of social representations. arXiv:1403.6652.

8: Dimensionality Reduction and Autoencoders

Classic methods for dimensionality reduction, PCA
Autoencoders as general embedding approach
Taxonomy of autoencoders: regularization and de-noising

Lecture Slides

Dim Reduction and Autoencoders
Last update: 11th of February 2021, 13:20 CET

Literature

James G, Witten D, Hastie T and Tibshirani R (2013) An Introduction to Statistical Learning. Springer, New York. Available at https://statlearning.com/. Section 10.2: Principal Components Analysis.
Goodfellow I, Bengio Y and Courville A (2016) Deep Learning. MIT Press. Available at https://www.deeplearningbook.org/. Introductory part of Section 14.
Charte D, Charte F, García S, del Jesus MJ and Herrera F (2018) A practical tutorial on autoencoders for nonlinear feature fusion: Taxonomy, models, software and guidelines. arXiv:1801.01586.

9: Literature-Based Discovery and Support Vector Machines

Literature-based discovery
- Connecting unrelated terms across domains
Support vector machines and kernels
- Linear support vector machine
- Non-linearity and kernel functions
- Selecting kernels, setting hyper-parameters

Lecture Slides

Literature-Based Discovery
Last update: 16th of February 2021, 17:30 CET
Support Vector Machines and Kernels
Last update: 16th of February 2021, 10:20 CET

Literature

James G, Witten D, Hastie T and Tibshirani R (2013) An Introduction to Statistical Learning. Springer, New York. Available at https://statlearning.com/. Section 9, check also exercises 1-8 in the same section.
