Prerequisites for Learning ML in a month
- Python 3
- High School mathematics
- 16+ hours/week (2 to 2.5 hours/day)
# Note: Reaching a junior level in ML takes about 1,000 hours of practice. Three hours of practice a day is enough for the first three months.
Machine Learning: a subset of Artificial Intelligence
Artificial Intelligence and its algorithms are beating humans in many areas, e.g. chess, better medical diagnosis, intelligent assistants, Tesla's self-driving cars, and many more. So here a question arises: what is Artificial Intelligence?
Artificial Intelligence is a huge set of tools that make machines/computers behave intelligently.
And what is Machine Learning in Artificial Intelligence?
Machine learning is a powerful subset of Artificial Intelligence. Defining ML is not simple: its many applications overlap with several other fields, and because the field is growing so rapidly, its boundaries are blurry.
So, I’ll define ML as the ability to learn automatically from experience without being explicitly programmed. “Explicitly programmed” means writing if/else conditions for every possible situation. Before ML became part of the AI toolbox, problems were solved by explicit programming.
As a rough illustration, nine digits (1-9) can be paired in 36 ways (9 × 8 / 2 = 36), so for a data set of just 9 images we could end up writing 36 if/elif/else conditions. Thanks to Machine Learning we don’t have to work this hard: the model learns by itself and acts accordingly. All we need to do is train and retrain the model, updating its parameters as required, and leave the learning and the prediction to the model.
Data science is about making discoveries and creating meaningful insights from data. ML is often an important tool for Data Science for making predictions from data and creating insights.
Before moving on to hands-on practice, we need to know:
“How does ML work?”
ML is interdisciplinary: it is a mixture of statistics and computer science. ML models can learn without step-by-step instructions, that is, without being explicitly programmed. The model learns patterns from existing data and then applies what it has learned to new input data to make predictions.
Example: When we mark emails as spam, the model learns from this experience. In the future, if an incoming email resembles the ones we marked, it is moved to the Spam folder automatically. This is how ML works: it learns from experience and then classifies new data (spam mails). Keep in mind, though, that ML needs high-quality data to be successful.
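To make the idea of learning from examples concrete, here is a minimal sketch in plain Python. It is not how real spam filters work: the example emails are invented, and the word-counting rule in the hypothetical `train` and `predict` helpers is a deliberately simplified stand-in for a real model.

```python
from collections import Counter

# Invented training emails: (text, label). Illustrative assumptions only.
training_emails = [
    ("win a free prize now", "spam"),
    ("free money click now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("project report attached for review", "ham"),
]

def train(emails):
    """Count how often each word appears in spam and in normal (ham) emails."""
    counts = {"spam": Counter(), "ham": Counter()}
    for text, label in emails:
        counts[label].update(text.lower().split())
    return counts

def predict(counts, text):
    """Label a new email by the class that has seen its words more often."""
    words = text.lower().split()
    spam_score = sum(counts["spam"][w] for w in words)
    ham_score = sum(counts["ham"][w] for w in words)
    return "spam" if spam_score > ham_score else "ham"

model = train(training_emails)
print(predict(model, "claim your free prize"))
print(predict(model, "agenda for the review meeting"))
```

Nobody wrote an if/else rule for the word "free" here; the counts come entirely from the labeled examples, so better examples give better predictions.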
Machine Learning Process Step by Step
How do we use ML in practice? Or, what is the workflow when ML is applied to a problem?
Step_1: Define the objectives of the problem
· What are we trying to predict?
For beginners, the first step is to know and write down what we are trying to predict.
· What are the target features?
If we are working on a dataset of fruits and want to predict how rotten each fruit is as a percentage (50% rotten, 60% rotten, 99.95% rotten), then the target is the rottenness of the fruit.
· What is the input data?
Input data can be anything that is available as a dataset (images, text, video, or audio). Here we have a data set of images of fruits in different conditions (fresh, not fresh, rotten, and extremely rotten).
· What kind of problem are we working with?
ML problems can be binary classification, clustering, or other types.
Binary means 1 or 0, fresh or rotten. In clustering, the data is not labeled, so the computer forms groups from the dataset itself: the fresh fruits end up in one group and the rotten ones in another.
Step_2: Data Gathering
There are different ways to gather data; we usually prefer Kaggle.
But some problems have no existing data, and for those we have to collect the data ourselves.
So here are some ways to collect data:
2. Surveys and questionnaires
3. Online quizzes
4. Google Forms
There are many more ways; go with whichever suits your problem and gives you more data that is accurate and of high quality.
Step_3: Data Preprocessing
Data cleaning is an important part of ML: the cleaner the data, the more accurately the model works. Even data from Kaggle needs to be cleaned, because data sets often have
. Missing values
. Corrupted values
. Unnecessary data that needs to be removed.
And if we have gathered our own data set through interviews, Google Forms, or any other method, it also needs to be transformed into the desired format so that it is easy to work with later.
. Making dimensions consistent: if we have a dataset of images, we make sure every image that enters the model has the same dimensions. If some images differ in size from the others, preprocessing must convert them to the common size; this gives better, more accurate results.
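The cleaning steps listed above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the fruit records are invented, a negative weight stands in for a "corrupted" value, and filling gaps with the mean is one common choice among several.

```python
# Invented raw records: (fruit, weight in grams). None marks a missing value,
# and a negative weight stands in for a corrupted one.
raw_rows = [
    ("apple", 150.0),
    ("banana", None),
    ("mango", -20.0),
    ("orange", 130.0),
    ("apple", 170.0),
]

def clean(rows):
    """Drop corrupted rows, then fill missing weights with the mean of valid ones."""
    valid = [w for _, w in rows if w is not None and w > 0]
    mean_weight = sum(valid) / len(valid)
    cleaned = []
    for name, weight in rows:
        if weight is not None and weight <= 0:
            continue  # corrupted value: remove the row
        if weight is None:
            weight = mean_weight  # missing value: fill with the mean
        cleaned.append((name, weight))
    return cleaned

cleaned_rows = clean(raw_rows)
print(cleaned_rows)
```

In a real project a library such as pandas would usually do this work, but the logic is the same: decide what counts as bad data, then remove or repair it before training.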
Step_4: Exploratory Data Analysis / Visualization
After preprocessing the data, we analyze which useful features the data set has.
To understand the data, we need to visualize it.
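As a first look at a feature, a few summary statistics and a rough histogram already tell us a lot. The sketch below uses only Python's standard library; the weights are invented, and the text histogram is a stand-in for the charts a plotting library would draw.

```python
import statistics

# Invented feature values (fruit weights in grams), for illustration only.
weights = [150.0, 150.0, 130.0, 170.0, 145.0, 160.0]

summary = {
    "count": len(weights),
    "min": min(weights),
    "max": max(weights),
    "mean": statistics.mean(weights),
    "median": statistics.median(weights),
}
for name, value in summary.items():
    print(f"{name}: {value}")

# A crude text histogram: one '#' per value in each 10-gram bucket.
buckets = {}
for w in weights:
    start = int(w // 10) * 10
    buckets[start] = buckets.get(start, 0) + 1
for start in sorted(buckets):
    print(f"{start}-{start + 9} g: {'#' * buckets[start]}")
```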
Step_5: Building ML model
Data is split into two parts, training and testing. We use 67% of the data for training and the remaining 33% for testing.
A model is a machine learning algorithm that predicts an output from the data given to it.
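Splitting the data can be sketched with nothing but the standard library. The `split_data` helper, its `test_fraction` and `seed` parameters, and the 100-row stand-in data are my own assumptions for illustration; libraries such as scikit-learn provide a ready-made function for the same job.

```python
import random

def split_data(rows, test_fraction=0.33, seed=42):
    """Shuffle a copy of the rows and split it into training and testing parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

data = list(range(100))  # stand-in for 100 data rows
train_rows, test_rows = split_data(data)
print(len(train_rows), len(test_rows))
```

Shuffling before splitting matters: if the rows are sorted in some way, taking the last third as-is would give the model a test set that looks nothing like its training data.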
Step_6: Model Evaluation & Optimization
Testing data is used to evaluate the model. The model is never perfect on the first attempt; we need to adjust its parameters according to the results we see during evaluation.
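Here is a minimal sketch of evaluating a model and adjusting a parameter. Everything in it is an assumption made for illustration: the "model" is just a threshold on a rottenness score, the data is invented, and the candidate thresholds are arbitrary. The parameter is chosen on the training data and the final score is reported on the testing data.

```python
# Invented (rottenness score, label) pairs; 1 = rotten, 0 = fresh.
train_data = [(0.1, 0), (0.2, 0), (0.35, 0), (0.6, 1), (0.7, 1), (0.9, 1)]
test_data = [(0.15, 0), (0.4, 0), (0.55, 1), (0.8, 1)]

def predict(score, threshold):
    """Call a fruit rotten when its score is at or above the threshold."""
    return 1 if score >= threshold else 0

def accuracy(data, threshold):
    """Fraction of examples whose predicted label matches the true label."""
    correct = sum(predict(score, threshold) == label for score, label in data)
    return correct / len(data)

# Try several values of the parameter and keep the best one on training data.
candidates = [0.2, 0.3, 0.5, 0.8]
best_threshold = max(candidates, key=lambda t: accuracy(train_data, t))

print("best threshold:", best_threshold)
print("test accuracy:", accuracy(test_data, best_threshold))
```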
This is the point where the model is ready to be used: we can give it new inputs and let it make predictions, which will be as good as its accuracy allows.
Step_8: Building a model package for production
Once the code for the model is done, we need to save the trained model to a file so that the pre-trained model can be used anywhere we want. If we only copy the code, we have to train the model again and repeat the whole process from the start, and the accuracy can change every time the model is trained. Therefore we reuse the saved model with the best accuracy we have achieved. Just as we import libraries (import pandas as pd), we load our saved model and use it.
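One common way to save and reload a trained model in Python is the standard `pickle` module. The sketch below is an assumption-laden illustration: the "model" is just a dictionary with invented values, and the file name `fruit_model.pkl` is hypothetical.

```python
import os
import pickle
import tempfile

# A stand-in for a trained model: here just a dictionary of learned parameters.
trained_model = {"threshold": 0.5, "accuracy": 0.92}  # invented values

# Save the model to a file once training is finished.
model_path = os.path.join(tempfile.gettempdir(), "fruit_model.pkl")  # hypothetical name
with open(model_path, "wb") as f:
    pickle.dump(trained_model, f)

# Later, possibly in another program, load it back instead of retraining.
with open(model_path, "rb") as f:
    loaded_model = pickle.load(f)

print(loaded_model == trained_model)
```

Only load pickle files that you created yourself or that come from a source you trust, because loading a pickle can run arbitrary code.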
Types of Machine Learning
- Supervised learning
- Unsupervised learning
- Reinforcement learning
Supervised learning is like a labeling machine: it takes an observation and assigns a label to it. There are two flavors of supervised learning.
- Classification: Classification means assigning each data point a category.
- What day is today?
Monday, Tuesday, Friday
- Is it cold outside?
Let’s take an example of supervised learning (labeled data) using classification.
Here I have taken a data set of students. A data set for supervised learning consists of feature columns and a target column. Our students data set has two feature columns (GPA and results) and a target column (Accepted). The students in it have applied for admission; the Accepted column contains True and False, where True means the student was admitted and False means rejected.
We can use a Support Vector Machine (SVM) to classify our data. (The graphs are only meant to give a feel for how models work.)
The classifier shows two blue points (accepted) on the red (rejected) side, which means our model has wrongly predicted two applicants as rejected. So a linear classifier such as this SVM does not fit this problem well.
We can improve the model’s behavior by allowing curves in its decision boundary so that it makes better predictions.
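To show classification in code without any extra libraries, here is a small sketch. Note that it uses a k-nearest-neighbours vote, not the SVM mentioned above; I swapped it in because it fits in a few lines and its decision boundary is not restricted to a straight line. The applicants, the scaling constants, and the `scale` and `predict` helpers are all invented for illustration.

```python
import math

# Invented applicants: (GPA, test result) and whether they were accepted.
applicants = [
    ((3.9, 90), True), ((3.7, 85), True), ((3.5, 88), True), ((3.8, 78), True),
    ((2.5, 60), False), ((2.8, 55), False), ((3.0, 50), False), ((2.2, 70), False),
]

def scale(features):
    """Put GPA (0-4) and test result (0-100) on a comparable 0-1 scale."""
    gpa, result = features
    return (gpa / 4.0, result / 100.0)

def predict(train, features, k=3):
    """Majority vote among the k training points closest to the new applicant."""
    target = scale(features)
    by_distance = sorted(train, key=lambda item: math.dist(scale(item[0]), target))
    votes = [label for _, label in by_distance[:k]]
    return votes.count(True) > k / 2

print(predict(applicants, (3.6, 82)))
print(predict(applicants, (2.6, 58)))
```

Scaling the two features first keeps the test result, which has much larger numbers, from drowning out the GPA when distances are computed.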
- Regression: Regression means predicting a continuous value.
- How tall will this child be as an adult?
- Can we predict temperature on the basis of humidity?
80% of the data is used for training the model. It seems that as humidity (x-axis) rises, temperature (y-axis) rises too: the higher the humidity, the higher the temperature.
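A regression model of this kind can be sketched as a straight line fitted by ordinary least squares. The readings below are invented and unrealistically tidy, and `fit_line` is a hypothetical helper written for this illustration, so treat the numbers as a demonstration of the method rather than a statement about weather.

```python
# Invented (humidity in percent, temperature in Celsius) readings.
readings = [(30, 18.0), (40, 20.0), (50, 22.0), (60, 24.0), (70, 26.0)]

def fit_line(points):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in points)
    variance = sum((x - mean_x) ** 2 for x, _ in points)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, intercept = fit_line(readings)
predicted = slope * 55 + intercept  # predict the temperature at 55% humidity
print(f"slope={slope:.2f}, intercept={intercept:.2f}, prediction={predicted:.1f}")
```

A positive slope is the code's way of saying what the plot shows: in this data, higher humidity goes with higher temperature.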
In unsupervised learning, the data set is not labeled, so clusters are created from the available data. Clustering identifies groups in the dataset: the observations within a group share stronger similarities with each other than with members of other groups.
Example: Customer Segmentation
If we have 5,000 customers, we don’t sit and label each buyer; we would rather use unsupervised learning to find out which countries most of our buyers come from and which item sells the most.
Given a dataset with 6 observations, what clusters could the algorithm detect? It depends! Here are some possibilities.
Cluster by Species
Cluster by Color
Cluster by Origin
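Clustering can be sketched with the classic k-means algorithm in plain Python. This is a minimal illustration, not a production implementation: the six customers are invented, the number of clusters (two) is chosen by hand, and the starting centroids are an arbitrary pick.

```python
import math

# Invented customers: (orders per year, average order value in dollars).
customers = [(2, 20), (3, 25), (1, 22), (20, 200), (22, 210), (19, 190)]

def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then move
    each centroid to the mean of its assigned points, and repeat."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else c
            for cluster, c in zip(clusters, centroids)
        ]
    return centroids, clusters

# Start from the first and last customer as initial guesses (an arbitrary choice).
centroids, clusters = kmeans(customers, [customers[0], customers[-1]])
print(clusters)
```

No labels were given, yet the occasional small buyers and the frequent big spenders end up in separate groups. With different features the same six observations could be grouped differently, which is exactly why the answer to "what clusters?" is "it depends".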
· Reinforcement learning
In reinforcement learning, an agent is placed in an environment and performs actions in it. If the agent does what it should, it is given a reward; if it does nothing, or does something wrong or contrary to what was expected, it gets no reward. This is how the agent learns and works better in the future.
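The reward idea can be sketched with a toy agent that has only two actions to choose from. This is an epsilon-greedy strategy, one simple approach among many; the environment, the reward values, the seed, and the settings (`epsilon`, 1000 steps) are all assumptions made up for this illustration.

```python
import random

def reward_for(action):
    """A toy environment: only action 1 is the 'right' one and earns a reward."""
    return 1.0 if action == 1 else 0.0

rng = random.Random(0)   # fixed seed so the run is repeatable
values = [0.0, 0.0]      # the agent's current estimate of each action's reward
counts = [0, 0]          # how many times each action has been tried
epsilon = 0.2            # how often the agent explores at random

for step in range(1000):
    if rng.random() < epsilon:
        action = rng.randrange(2)                        # explore
    else:
        action = max(range(2), key=lambda a: values[a])  # exploit the best estimate
    reward = reward_for(action)
    counts[action] += 1
    # Move the estimate toward the observed reward (a running average).
    values[action] += (reward - values[action]) / counts[action]

print(values, counts)
```

At first the agent knows nothing and keeps picking the unrewarded action. Once random exploration stumbles on the rewarded one, its estimate rises and the agent chooses it most of the time from then on.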
Applications of ML
1. Google Maps
Takes your input and shows the fastest route accordingly.
2. Facebook — Face recognition / Ads
If anyone posts a picture with you, your face will be recognized and you will get a notification.
If you like posts about football, Facebook shows you more ads about football.
3. Google Translate
Language is no longer a barrier. A language can be detected and translated into whichever language is required.
Full self-driving car.
5. Netflix — recommendation
Suggests something that will appeal to you, based on your previous viewing.
Robotic kitchen chef.
Here the brief introduction to machine learning ends. In the next story I’ll start with preprocessing: what is preprocessing, why is it used, and what does it do to the model?