{"id":413,"date":"2019-07-09T11:01:12","date_gmt":"2019-07-09T11:01:12","guid":{"rendered":"http:\/\/nitk.acm.org\/blog\/?p=413"},"modified":"2019-07-09T11:35:43","modified_gmt":"2019-07-09T11:35:43","slug":"a-quick-intro-to-machine-learning-techniques","status":"publish","type":"post","link":"https:\/\/nitk.acm.org\/blog\/2019\/07\/09\/a-quick-intro-to-machine-learning-techniques\/","title":{"rendered":"A Quick Intro to Machine Learning Techniques"},"content":{"rendered":"<p>Machine learning is one of the hot topics everyone wants on their resume. Machine learning focuses on the development of computer programs that can use data to learn for themselves. Here is a quick introduction to various machine learning techniques:<\/p>\n<p>Machine learning is classified as follows:<\/p>\n<ol>\n<li><b>Supervised Learning<\/b><b><i>:<\/i><\/b> Machines are trained using well labelled data i.e. we already know the correct output for a portion of the input data and have to predict in injunction more such right answers for the remaining input. Supervised Learning is further classified as:<\/li>\n<\/ol>\n<ol>\n<li>Regression: The output variable is real and continuous like \u201csalary\u201d or \u201cweight\u201d.<\/li>\n<li>Classification: The output variable is discrete. We need to classify the input into categories like &#8220;spam\/not&#8221; or &#8220;benign\/malignant&#8221;.<\/li>\n<\/ol>\n<ol>\n<li><b>Unsupervised Learning:<\/b> Machines are trained using unlabelled data i.e\u00a0 we only have input data and no corresponding output. The main goal of this technique is to learn about structures in data. Unsupervised Learning is further classified as:<\/li>\n<\/ol>\n<ol>\n<li>Clustering: The objective is to discover the inherent groups in the data. For example, classifying customers based on their purchases.<\/li>\n<li>Association: The objective is to discover rules that describe a large portion of data. For example, people who buy cars also tend to buy petrol\/diesel.<\/li>\n<\/ol>\n<ol start=\"3\">\n<li><b>Semi-supervised Learning<\/b><b><i>:<\/i><\/b> The input data consists of a small amount of labeled data and a large amount of unlabelled data. We use Unsupervised Learning to learn about structures in data and Supervised Learning to make the best guess using the small amount of labeled data. We then feed the output back into the Supervised Learning algorithm as training data, and use the model to make predictions on new unseen data.<\/li>\n<\/ol>\n<p><b>Overfitting vs Underfitting:<\/b><\/p>\n<p>&nbsp;<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-414\" src=\"https:\/\/nitk.acm.org\/blog\/wp-content\/uploads\/2019\/06\/1.png\" alt=\"\" width=\"580\" height=\"201\" srcset=\"https:\/\/nitk.acm.org\/blog\/wp-content\/uploads\/2019\/06\/1.png 381w, https:\/\/nitk.acm.org\/blog\/wp-content\/uploads\/2019\/06\/1-300x104.png 300w\" sizes=\"auto, (max-width: 580px) 100vw, 580px\" \/><\/p>\n<p>&nbsp;<\/p>\n<p>Overfitting and underfitting are widely used terminology in ML. Our main aim is to build a generalized model which fits the training as well as the test dataset accurately.<\/p>\n<ol>\n<li><b>Overfitting<\/b>: The function is too closely fit to a limited set of data points. This results in a model biased towards the training data. 
**Supervised Learning**

The cost function measures the difference between the actual output and the output predicted by the model. It can be the absolute difference, the mean squared error, etc.; generally, the mean squared error is used. The main aim of supervised learning is to minimize the cost function.

**Linear Regression:** In this algorithm, we try to linearly map the input to the output. The predicted line should be the one that minimizes the cost function.

Let y = b1·x + b0 be the equation of the line that minimizes the cost function. Then, writing x̄ and ȳ for the means of x and y:

b1 = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)²

b0 = ȳ − b1·x̄

A short NumPy sketch of these estimates appears after the logistic regression section below.

[Figure: scatter plot of data points with the fitted regression line]

**Logistic Regression:** Although named regression, this is a classification algorithm, used for example to predict whether an email is spam (1) or not (0). We define the hypothesis as:

Z = WX + B

hΘ(x) = sigmoid(Z)

where the sigmoid function is:

sigmoid(z) = 1 / (1 + e^(−z))

[Figure: the S-shaped sigmoid curve, rising from 0 to 1]

If hΘ(x) ≥ 0.5, the output y is 1; otherwise y is 0.

Here the cost function is the cross-entropy loss:

J(Θ) = −(1/m) Σ [ y·log(hΘ(x)) + (1 − y)·log(1 − hΘ(x)) ]
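Going back to linear regression, a minimal sketch of the closed-form estimates above in plain NumPy (the sample data is illustrative):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.3, 6.2, 8.1, 9.9])

x_bar, y_bar = x.mean(), y.mean()
b1 = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)  # slope
b0 = y_bar - b1 * x_bar                                            # intercept
print(b0, b1)  # the line y = b1*x + b0 minimizes the mean squared error
```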
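And a sketch of the logistic hypothesis and cross-entropy cost (the weights and data are illustrative stand-ins, not learned parameters from the post):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict(X, W, b):
    # hypothesis h(x) = sigmoid(WX + B), thresholded at 0.5
    return (sigmoid(X @ W + b) >= 0.5).astype(int)

def cross_entropy(y, h):
    # J = -(1/m) * sum(y*log(h) + (1-y)*log(1-h))
    return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))

X = np.array([[0.5], [1.5], [2.5], [3.5]])
y = np.array([0, 0, 1, 1])
W, b = np.array([2.0]), -4.0          # illustrative parameters
h = sigmoid(X @ W + b)
print(predict(X, W, b), cross_entropy(y, h))
```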
**SVM:**

SVMs, also known as large margin classifiers, can be used for both classification and regression challenges, though they are mostly used for classification. In this algorithm, we plot each data item as a point in n-dimensional space (where n is the number of features), with the value of each feature being the value of a particular coordinate. We then perform classification by finding the hyperplane that best separates the two classes.

If multiple hyperplanes can separate the data, we choose the one that maximizes the distance between the hyperplane and the nearest data point of either class. This distance is called the **margin** (a short sketch follows the naive Bayes section below).

[Figure: an example of a maximum-margin classifier separating two classes]

**Naive Bayes Classifier:**

Naive Bayes classifiers are a family of classification algorithms based on **Bayes' theorem**. The fundamental naive Bayes assumption is that each feature makes an independent and equal contribution to the outcome.

**Bayes' theorem:**

P(A|B) = P(B|A)·P(A) / P(B)

This gives the probability of A happening given that B has already occurred; here, B is the evidence and A is the hypothesis. The assumption that the features are independent, i.e. the presence of one particular feature does not affect another, is what makes the method "naive". In our model, A is the output and B is the feature vector, so P(A|B) is the probability of getting output A when B is the input.
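A minimal sketch of the SVM classifier described above, assuming scikit-learn (the linear kernel and toy data are illustrative choices):

```python
import numpy as np
from sklearn.svm import SVC

# Two illustrative, linearly separable classes
X = np.array([[1, 2], [2, 3], [3, 3], [6, 5], [7, 8], [8, 8]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="linear")  # large-margin linear separator
clf.fit(X, y)
print(clf.support_vectors_)          # the points that define the margin
print(clf.predict([[4, 4], [7, 7]]))
```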
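And a sketch for the naive Bayes section, using scikit-learn's GaussianNB, which applies Bayes' theorem with the independence assumption to continuous features (the data is illustrative):

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

X = np.array([[1.0, 2.1], [1.2, 1.9], [3.8, 4.2], [4.1, 3.9]])
y = np.array([0, 0, 1, 1])

nb = GaussianNB()
nb.fit(X, y)
print(nb.predict([[1.1, 2.0]]))        # predicted class
print(nb.predict_proba([[1.1, 2.0]]))  # P(class | features) via Bayes' theorem
```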
**KNN:**

The KNN algorithm is one of the simplest classification algorithms and can be used for regression problems as well. It assumes that similar things exist in close proximity to each other, i.e. similar objects are near each other. The value of K determines how many of the nearest points are considered when deciding the class of a sample.

As we decrease the value of K towards 1, our predictions become less stable. If K = 1, the sample is simply assigned the class of the single nearest point; it is usually better to assign the class from an average of a few points rather than just one.

On increasing K, our predictions become more stable due to majority voting / averaging, and thus more accurate, up to a certain point. Eventually we begin to see an increasing number of errors; at that point we know we have pushed the value of K too far. In cases where we take a majority vote (e.g. picking the mode in a classification problem) among labels, we usually make K an odd number to have a tiebreaker (a sketch follows the anomaly detection section below).

[Figure: a new point classified by the majority vote of its K nearest neighbours]

[Figure: error rate as a function of K]

**Unsupervised Learning**

**Clustering:**

Clustering is the task of dividing the data points into a number of groups (clusters) such that points in the same group are more similar to each other than to points in other groups. It is basically a grouping of objects on the basis of the similarity and dissimilarity between them.

Let's understand this with an example. Suppose you are the head of a rental store and wish to understand the preferences of your customers to scale up your business. Is it possible for you to look at the details of each customer and devise a unique business strategy for each one of them? Definitely not. But what you can do is cluster all of your customers into, say, 10 groups based on their purchasing habits, and use a separate strategy for the customers in each of these 10 groups (see the k-means sketch below).

[Figure: data points grouped into distinct clusters]

**Anomaly Detection:**

Anomaly detection refers to the problem of finding patterns in data that do not conform to expected behaviour. This is important because anomalies in data translate to significant (often critical) actionable information in a wide variety of application domains. Generally, anomalies are detected and removed; supervised machine learning techniques are then applied to the remaining sample.
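For the KNN section above, a minimal sketch assuming scikit-learn (K = 3 and the data are illustrative):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

X = np.array([[1, 1], [1, 2], [2, 1], [6, 6], [6, 7], [7, 6]])
y = np.array([0, 0, 0, 1, 1, 1])

knn = KNeighborsClassifier(n_neighbors=3)  # odd K acts as a tiebreaker
knn.fit(X, y)
print(knn.predict([[2, 2], [6, 5]]))  # majority vote among the 3 nearest points
```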
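For the clustering section, a sketch using k-means, one common clustering algorithm (the post does not name a specific one, so this is an illustrative choice):

```python
import numpy as np
from sklearn.cluster import KMeans

# Illustrative "customers" described by two purchasing features
X = np.array([[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]])

km = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = km.fit_predict(X)            # group assignment for each customer
print(labels, km.cluster_centers_)
```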
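And for anomaly detection, a sketch using an isolation forest (again an illustrative choice of algorithm, not one named in the post), which flags points that deviate from the bulk of the data so they can be removed before further training:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.RandomState(0)
X = np.vstack([rng.normal(0, 1, size=(100, 2)),  # expected behaviour
               [[8, 8], [-9, 7]]])               # two anomalies

iso = IsolationForest(random_state=0)
flags = iso.fit(X).predict(X)   # +1 = normal, -1 = anomaly
X_clean = X[flags == 1]         # remove anomalies before supervised learning
print(np.where(flags == -1)[0])
```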
*– Chetan Agarwal*