Topics to cover as a beginner in Machine Learning.

Sayantan Sadhu
Jan 2, 2021

Machine learning is a vast field, and a lot of people are enthusiastic about it. But what I have found as I have moved into this domain is that it is still pretty unclear which concepts or topics are an absolute must to learn. We all know that mathematics is very important, but most people sign up for a 10-hour course in statistics, probability or differential calculus, stick with it for 4 hours, gradually get bored and quit. Sometimes they start with Python instead, and something similar happens. So this article is meant to be a kind of syllabus for getting from being an enthusiast in machine learning to having some projects in it.

1. Basics required to get started
  • Get an introduction to the field (the life cycle of a data science project; supervised, unsupervised and reinforcement learning; feature engineering; and Kaggle).
  • Some knowledge of mathematics, especially probability, statistics, matrices and determinants, and differential calculus. (No worries if you are Indian and have prepared for JEE Main and Advanced: you are good enough in all of it except statistics. The point is that if you have given some time to mathematics in high school, you should be good enough.)
  • Learn Python (taking input, conditional statements, loops, lists, dictionaries).
  • Move on to Python libraries (NumPy, Matplotlib, Seaborn, pandas).
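To give a rough idea of where the Python and library basics lead, here is a small warm-up sketch. It uses only the plain-Python topics listed (lists, dictionaries, loops, conditionals) plus NumPy and pandas; the names and scores are made up for illustration, and the data is hard-coded instead of read with `input()` so the script runs unattended. Matplotlib and Seaborn are left out to keep it print-only.

```python
import numpy as np
import pandas as pd

# Plain Python: a list of dictionaries (made-up example data)
students = [
    {"name": "Asha", "score": 82},
    {"name": "Ravi", "score": 45},
    {"name": "Meera", "score": 67},
]

# Loop + conditional: label each student as pass or fail (50 is an arbitrary cutoff)
for s in students:
    if s["score"] >= 50:
        s["result"] = "pass"
    else:
        s["result"] = "fail"

# NumPy: vectorised arithmetic on the scores, no explicit loop
scores = np.array([s["score"] for s in students])
print("mean score:", scores.mean())

# pandas: the same records as a table, then filter and summarise
df = pd.DataFrame(students)
print(df[df["result"] == "pass"])
print(df["result"].value_counts())
```

Once this kind of code feels natural, moving on to plotting the same DataFrame with Matplotlib or Seaborn is a small step.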

2. Machine learning algorithms: learn the theory very well. Although you can implement a machine learning algorithm without knowing the theory behind it, you won't be able to go very far in the domain without it. You will also miss out on a lot of fun.

  • Linear regression (also try implementing it in Python using only NumPy and pandas)
  • Logistic regression
  • K-nearest neighbors
  • Decision trees and random forests
  • Naive Bayes (revise Bayes' theorem and conditional probability)
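Following the suggestion to implement linear regression by hand, here is one minimal NumPy-only sketch (pandas turns out not to be needed). It fits the weights by batch gradient descent on mean squared error, using synthetic data whose true coefficients are known (3, -2 and an intercept of 5, all chosen arbitrarily), and then cross-checks the result against `np.linalg.lstsq`, NumPy's least-squares solver. The learning rate and number of epochs are assumptions that happen to suit this well-scaled data, not general-purpose defaults.

```python
import numpy as np

def fit_linear_regression(X, y, lr=0.1, epochs=2000):
    """Fit y ≈ X @ w + b by batch gradient descent on mean squared error."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(epochs):
        y_pred = X @ w + b
        error = y_pred - y
        # Gradients of MSE = mean(error**2) with respect to w and b
        grad_w = (2 / n_samples) * (X.T @ error)
        grad_b = (2 / n_samples) * error.sum()
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Synthetic data: y = 3*x1 - 2*x2 + 5 plus a little Gaussian noise
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = X @ np.array([3.0, -2.0]) + 5.0 + rng.normal(scale=0.1, size=200)

w, b = fit_linear_regression(X, y)
print("gradient descent -> weights:", w, "bias:", b)

# Cross-check against the closed-form least-squares solution
X_aug = np.column_stack([X, np.ones(len(X))])  # extra column of ones for the bias
coef, *_ = np.linalg.lstsq(X_aug, y, rcond=None)
print("lstsq            -> weights:", coef[:2], "bias:", coef[2])
```

Both approaches should land close to the true coefficients. With unscaled features the same learning rate can diverge, which is one practical reason feature scaling shows up later in this syllabus.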

3. Feature engineering: the processing we apply to the data before fitting a model to it. It is mostly data cleaning, analysis and visualization, followed by preparing the features.

  • Missing values: why values go missing, the types of missing values, and ways of handling them.
  • Outliers: what an outlier is, how to find outliers and how to handle them.
  • Handling imbalanced datasets.
  • Handling categorical features (the types of encodings and what they do).
  • Understand one-hot encoding very well; it will be used again in NLP.
  • Feature selection.
  • Normalization and standardization (feature scaling techniques).
  • Finally, do some feature engineering on the Titanic dataset. It will teach you a lot.
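Several of these steps can be tried in a few lines of pandas before touching the real Titanic data. The sketch below uses a tiny made-up frame that only borrows Titanic-style column names (`Age`, `Fare`, `Sex`, `Embarked`); the values are invented. The specific choices (median and mode imputation, capping at the 1.5 × IQR fences, one-hot encoding with `pd.get_dummies`) are common options, not the only correct ones.

```python
import numpy as np
import pandas as pd

# A tiny made-up frame with Titanic-style columns (not the real dataset)
df = pd.DataFrame({
    "Age": [22.0, 38.0, np.nan, 35.0, np.nan, 54.0],
    "Fare": [7.25, 71.28, 7.92, 53.10, 8.05, 512.33],
    "Sex": ["male", "female", "female", "female", "male", "male"],
    "Embarked": ["S", "C", "S", None, "S", "C"],
})

# 1. Missing values: median for a numeric column, mode for a categorical one
df["Age"] = df["Age"].fillna(df["Age"].median())
df["Embarked"] = df["Embarked"].fillna(df["Embarked"].mode()[0])

# 2. Outliers: cap Fare at the 1.5 * IQR fences instead of dropping rows
q1, q3 = df["Fare"].quantile([0.25, 0.75])
iqr = q3 - q1
df["Fare"] = df["Fare"].clip(lower=q1 - 1.5 * iqr, upper=q3 + 1.5 * iqr)

# 3. Categorical features: one-hot encode into 0/1 indicator columns
df = pd.get_dummies(df, columns=["Sex", "Embarked"], dtype=int)

# 4a. Standardization: zero mean, unit (population) standard deviation
df["Age_std"] = (df["Age"] - df["Age"].mean()) / df["Age"].std(ddof=0)

# 4b. Normalization (min-max scaling): rescale into [0, 1]
fare_min, fare_max = df["Fare"].min(), df["Fare"].max()
df["Fare_norm"] = (df["Fare"] - fare_min) / (fare_max - fare_min)

print(df)
```

On the real dataset, the scaling statistics (mean, standard deviation, min, max) should be computed on the training split only and then reused on the test split, so that no information leaks from the test data.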

4. Some projects to finally master these topics

  • Work on the Titanic dataset.
  • The house price prediction dataset (Kaggle's "House Prices - Advanced Regression Techniques").
  • The "All Space Missions from 1957" dataset.
  • The MNIST dataset (create a binary classifier from it).
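Turning MNIST into a binary problem also ties back to handling imbalanced datasets: if you pick one digit against all the others (5 here, an arbitrary choice), only about a tenth of the labels are positive, so a model that always answers "not a 5" already scores around 90% accuracy while finding none of the 5s. The sketch below shows this with randomly generated labels standing in for the real MNIST labels; it does not load MNIST itself, and the label array is hypothetical.

```python
import numpy as np

# Hypothetical stand-in for MNIST labels: 1,000 random digits 0-9.
# With the real dataset you would use its actual label array instead.
rng = np.random.default_rng(42)
y = rng.integers(0, 10, size=1000)

# Binary target: True for the digit 5, False for every other digit
y_is_5 = (y == 5)
print("fraction of positives:", y_is_5.mean())

# A useless "classifier" that always predicts "not a 5"
y_pred = np.zeros_like(y_is_5, dtype=bool)

accuracy = (y_pred == y_is_5).mean()
true_pos = np.sum(y_pred & y_is_5)
recall = true_pos / y_is_5.sum()  # share of actual 5s that were found
print(f"accuracy: {accuracy:.2f}, recall: {recall:.2f}")
```

This is why, for a binary classifier like this, it is worth reporting precision and recall (or a confusion matrix) alongside accuracy.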

Thank you so much for your time. I really hope this article gives you some insight or helps you on your machine learning journey. Kindly consider hitting the clap button. Happy exploring the world of data science!
