Some ML queries that I received!
Should we build the machine learning model on a sample of the population or on the whole population's cleansed data?
We build the ML model (especially in supervised learning) on a sample of the population to predict on the rest of the population, because the sample is all we have. Think about it: if we had data for the whole population, we wouldn't need to build a model at all; we could simply look up the desired value in the population data.
What are some machine learning projects primarily focused on the banking sector?
Machine learning projects in the banking sector could include:
- Transaction Fraud detection
- Predicting which customers will default on their loans
- Recommending customised policies or programs to customers
- Predicting which customers will buy a scheme shown to them
Is math necessary for learning Machine Learning / Data Science / Deep Learning?
Not necessarily. These days, most machine learning work is done using prebuilt modules (sklearn, keras, pytorch, etc.) which only have to be tuned a little (of course). You can use them without much knowledge of math. However, if you want to reach the point where you create your own modules, you will need math. You will also need math for an in-depth understanding of the concepts. But if you just want to apply the concepts to some business case and get it done, you can do without math.
Parameters vs. Hyperparameters in ML
Parameters are the values that are learned during the training process of an ML model, for example the weights in a neural net or a regression model.
Hyperparameters are the values that govern the training process and have to be specified manually, for example the learning rate, the number of epochs/iterations, or the regularization strength.
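As a quick illustration, here is a minimal sklearn sketch (the dataset and the chosen values of C and max_iter are just assumptions for demonstration): the hyperparameters are set by hand before training, while the parameters come out of fit().

```python
# Hyperparameters (C, max_iter) are set by us before training;
# parameters (coef_, intercept_) are learned from the data by fit().
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=5, random_state=0)

model = LogisticRegression(C=0.5, max_iter=200)  # hyperparameters, chosen manually
model.fit(X, y)                                  # parameters are learned here

print("learned weights (parameters):", model.coef_)
print("learned intercept (parameter):", model.intercept_)
```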
Multiclass vs Multilabel Classification
Consider a classification task with target variable y. Suppose that this task involves 3 classes A, B and C.
In multiclass classification, y can take the value of only one class at a time for any given instance (it is either A or B or C, never more than one).
In multilabel classification, y can belong to several of the given classes simultaneously for any given instance (it could be both A and C, for example).
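A small sketch of how the two targets typically look in practice (the classes A, B, C and the values below are made up purely for illustration):

```python
import numpy as np

# Multiclass: each instance is assigned exactly one of A, B, C.
y_multiclass = np.array(["A", "C", "B", "A"])

# Multilabel: each instance gets a 0/1 indicator per class, so one row
# can switch on several classes (row 0 is both A and C here).
#                          A  B  C
y_multilabel = np.array([[1, 0, 1],
                         [0, 1, 0],
                         [1, 1, 1],
                         [0, 0, 0]])

print(y_multiclass.shape)   # (4,)   -> one label per instance
print(y_multilabel.shape)   # (4, 3) -> one indicator per class per instance
```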
Pseudo Labelling in machine learning
Pseudo Labelling is a technique for supervised learning (generally classification) where
- you train a model on training data (which is labelled)
- then use this model to generate predictions for the test data (which are called pseudo labels)
- then retrain the final model on the combination of training data and pseudo-labelled test data.
It could be useful in situations where you have limited labelled training data and a lot of unlabelled data.
I have my own doubts about this method; it almost sounds like something that shouldn't work at all (at least to me!), but somehow it does (sometimes!). So be careful when using such strategies to increase the model's accuracy (output performance).
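For what it's worth, here is a rough sketch of the procedure using sklearn; the RandomForestClassifier, the 10% labelled split and the 0.8 confidence threshold are my own assumptions for illustration, not part of any fixed recipe.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=0)
# Pretend only a small slice is labelled; the rest acts as unlabelled/test data.
X_lab, X_unlab, y_lab, _ = train_test_split(X, y, train_size=0.1, random_state=0)

# Step 1: train a model on the labelled data.
model = RandomForestClassifier(random_state=0).fit(X_lab, y_lab)

# Step 2: generate pseudo labels for the unlabelled data,
# keeping only the confident predictions (0.8 is an assumed threshold).
proba = model.predict_proba(X_unlab)
confident = proba.max(axis=1) > 0.8
pseudo_labels = model.classes_[proba.argmax(axis=1)][confident]

# Step 3: retrain the final model on labelled + pseudo-labelled data combined.
X_combined = np.vstack([X_lab, X_unlab[confident]])
y_combined = np.concatenate([y_lab, pseudo_labels])
final_model = RandomForestClassifier(random_state=0).fit(X_combined, y_combined)
```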