Types of Machine Learning AlgorithmsMachine Learning algorithms can be classified into:
1.Supervised Algorithms - Linear Regression, Logistic
Regression, KNN classification, Support Vector Machine (SVM),
Decision Trees, Random Forest, Naive Bayes’ theorem
2. Unsupervised Algorithms - K Means Clustering
Let us dig a bit deeper in these machine learning basics algorithms
Supervised Machine Learning Algorithms
In this type of algorithm, the data set on which the machine is trained consists of labelled data or simply said, consists both the input parameters as well as the required output.
Let’s take the example of facial recognition and once we have identified the people in the photos, we will try to classify them as baby, teenager or adult. Here baby, teenager and adult will be our labels and our training dataset will already be classified into the given labels based on certain parameters through which the machine will learn these features and patterns and classify some new input data based on the learning from this training data.
Supervised Machine Learning Algorithms can be broadly divided into two types of algorithms; Classification and Regression.
Just as the name suggests, these algorithms are used to classify data into predefined classes or labels. We will discuss one of the most used classification algorithm known as the K-Nearest Neighbor (KNN) Classification Algorithm.
KNN Classification Machine Learning Algorithm
This algorithm is used to classify a set of data points into specific groups or classes based on the similarities between the data points.
Regression Machine Learning Algorithms
These algorithms are used to determine the mathematical
relationship between two or more variables and the level of
dependency between variables. These can be used for predicting
an output based on the interdependency of two or more
We have two types of regression algorithms: Linear Regression and Logistic Regression
Linear Regression Machine Learning
Initially developed in statistics to study the relationship
between input and output numerical variables, it was adopted
by the machine learning community to make predictions based on
the linear regression equation.
y = β0 + β1x1
A graph of the linear regression equation model is as shown below:
Linear regression can be used to find the general price trend of a stock over a period of time. This helps us understand if the price movement is positive or negative.
In logistic regression, our aim is to produce a discrete value, either 1 or 0. This helps us in finding a definite answer to our scenario.
Logistic regression can be mathematically represented as,
The logistic regression model computes a weighted sum of the input variables similar to the linear regression, but it runs the result through a special non-linear function, the logistic function or sigmoid function to produce the output y.
Support Vector Machine (SVM) Learning Algorithm
Support Vector Machine was initially used for data analysis. Initially, a set of training examples is fed into the SVM algorithm, belonging to one or the other category. The algorithm then builds a model that starts assigning new data to one of the categories that it has learned in the training phase. In the SVM algorithm, a hyperplane is created which serves as a demarcation between the categories. When the SVM algorithm processes a new data point and depending on the side on which it appears it will be classified into one of the classes.
When related to trading, an SVM algorithm can be built which categorises the equity data as a favourable buy, sell or neutral classes and then classifies the test data according to the rules.
Decision trees are basically a tree-like support tool which can be used to represent a cause and its effect. Since one cause can have multiple effects, we list them down (quite like a tree with its branches).
We can build the decision tree by organising the input data and predictor variables, and according to some criteria that we will specify.
The main steps to build a decision tree are:
1. Retrieve market data for a financial instrument.
2. Introduce the Predictor variables (i.e. Technical indicators, Sentiment indicators, Breadth indicators, etc.)
3. Setup the Target variable or the desired output.
4. Split data between training and test data.
5. Generate the decision tree training the model.
6. Testing and analyzing the model. The disadvantage of decision trees is that they are prone to overfitting due to their inherent design structure.
A random forest algorithm was designed to address some of the limitations of decision trees.Random Forest comprises of decision trees which are graphs of decisions representing their course of action or statistical probability. These multiple trees are mapped to a single tree which is called Classification and Regression (CART) Model.
To classify an object based on its attributes, each tree gives a classification which is said to “vote” for that class. The forest then chooses the classification with the greatest number of votes. For regression, it considers the average of the outputs of different trees.
Naive Bayes theorem
Now, if you remember basic probability, you would know that Bayes theorem was formulated in a way where we assume we have prior knowledge of any event that related to the former event.
For example, to check the probability that you will be late to the office, one would like to know if you face any traffic on the way.
However, the Naive Bayes classifier algorithm assumes that two events are independent of each other and thus, this simplifies the calculations to a large extent. Initially thought of nothing more than an academic exercise, Naive Bayes has shown that it works remarkably well in the real world as well.Naive Bayes algorithm can be used to find simple relationships between different parameters without having complete data
We will now look at the next type of Machine learning algorithms, ie Unsupervised machine learning algorithms.
Unsupervised Machine Learning Algorithms
Unlike supervised learning algorithms, where we deal with labelled data for training, the training data will be unlabelled for Unsupervised Machine Learning Algorithms. The clustering of data into a specific group will be done on the basis of the similarities between the variables. Some of the unsupervised machine learning algorithms are K-means clustering, neural networks.
K-means clustering Machine Learning Algorithm
Before we understand the working of the K-means clustering algorithm, let us first break down the word K-means clustering to understand what it means.
Clustering: In this algorithm, we form clusters which are a collection of data points grouped together due to their similarities.
K refers to the number of centroids which will be considered for a specific problem whereas ‘means’ refers to a centroid which is considered as the central point of any cluster.
A simple example would be that given the data of football players, we will use K-means clustering and label them according to their similarity. Thus, these clusters could be based on the strikers preference to score on free kicks or successful tackles, even when the algorithm is not given pre-defined labels to start with.K-means clustering would be beneficial to traders who feel that there might be similarities between different assets which cannot be seen on the surface.
While we did mention neural networks in unsupervised machine learning algorithms, it can be debated that they can be used for both supervised as well as unsupervised learning algorithms. Let’s understand Artificial and Recurrent Neural networks now.
Reinforcement Machine Learning Algorithms
Reinforcement Learning is a type of Machine Learning in which the machine is required to determine the ideal behaviour within a specific context, in order to maximize its rewards. It works on the rewards and punishment principle which means that for any decision which a machine takes, it will be either be rewarded or punished. Thus, it will understand whether or not the decision was correct. This is how the machine will learn to take the correct decisions to maximize the reward in the long run.
For reinforcement algorithm, a machine can be adjusted and programmed to focus more on either the long-term rewards or the short-term rewards. When the machine is in a particular state and has to be the action for the next state in order to achieve the reward, this process is called the Markov Decision Process.
We have now covered most of the popular machine learning algorithms which are used today. As you have understood them, it is imperative that we go through a few terms to make sure we are well versed with machine learning basics.