{"id":5050,"date":"2021-09-27T06:33:48","date_gmt":"2021-09-27T06:33:48","guid":{"rendered":"https:\/\/www.aiproblog.com\/index.php\/2021\/09\/27\/machine-learning-model-selection-strategy-for-data-scientists-and-ml-engineers\/"},"modified":"2021-09-27T06:33:48","modified_gmt":"2021-09-27T06:33:48","slug":"machine-learning-model-selection-strategy-for-data-scientists-and-ml-engineers","status":"publish","type":"post","link":"https:\/\/www.aiproblog.com\/index.php\/2021\/09\/27\/machine-learning-model-selection-strategy-for-data-scientists-and-ml-engineers\/","title":{"rendered":"Machine Learning Model Selection strategy for Data Scientists and ML Engineers"},"content":{"rendered":"<p>Author: Shanthababu P<\/p>\n<div>\n<p><a href=\"https:\/\/storage.ning.com\/topology\/rest\/1.0\/file\/get\/9602746855?profile=original\" target=\"_blank\" rel=\"noopener\"><br \/><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/storage.ning.com\/topology\/rest\/1.0\/file\/get\/9602746855?profile=RESIZE_710x\" width=\"558\" height=\"404\" class=\"align-center\"><\/a><\/p>\n<p style=\"text-align: center;\"><span style=\"font-size: 18pt;\"><strong>Machine Learning Model Selection strategy for Data Scientists and ML Engineers<\/strong><\/span><\/p>\n<\/p>\n<p align=\"center\"><b>&#8220;Thus learning is not possible without inductive bias, and now the question is how to c<span class=\"comment-highlite\" id=\"spancomment2729\">hoose the<\/span><span>\u00a0<\/span>right bias. This is called model selection.<\/b>&#8221;\u00a0<span>ETHEN ALPAYDIN (2004) p33 (Introduction to Machine Learning)<\/span><\/p>\n<h3><span class=\"comment-highlite\" id=\"spancomment2472\">Guys!\u00a0<\/span><\/h3>\n<p>Really there are many more definitions concerning Model Selection. 
In this article, we discuss Model Selection and its strategy for Data Scientists and Machine Learning Engineers.<\/p>\n<p class=\"\">ML models are constructed using various mathematical frameworks; they generate predictions by finding patterns in the dataset.<\/p>\n<h3><b>Understand Machine Learning Model and Algorithm<\/b><\/h3>\n<p class=\"\">Many of us are confused between two terminologies in machine learning &#8211; ML-Model and ML-Algorithm. I was too, but over time I came to understand the thin line between these two terms. Understanding this difference is very important during the machine learning life cycle.<\/p>\n<p class=\"\"><span>An\u00a0<\/span><b>ALGORITHM<span>\u00a0<\/span><\/b><span>is always RUN ON THE DATA to create a stabilized model.<\/span><\/p>\n<p class=\"\"><span><a href=\"https:\/\/storage.ning.com\/topology\/rest\/1.0\/file\/get\/9602750270?profile=original\" target=\"_blank\" rel=\"noopener\"><img decoding=\"async\" src=\"https:\/\/storage.ning.com\/topology\/rest\/1.0\/file\/get\/9602750270?profile=RESIZE_710x\" class=\"align-center\"><\/a><\/span><\/p>\n<p class=\"\"><span>The\u00a0<b>MODEL\u00a0<\/b>includes both the\u00a0<i><b>DATA\u00a0<\/b><\/i>and a\u00a0<i><b>PROCEDURE\u00a0<\/b><\/i>(algorithm) that uses the data to predict on a new set of data.\u00a0The ML model represents what was learned over time by the algorithm that was run on the data. 
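To make this distinction concrete, here is a minimal sketch (assuming scikit-learn and a small synthetic dataset, not anything from this article): the algorithm is run on the data via fit(), and the fitted object is the model that predicts for new data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data (illustrative only): y = 2x + 1, no noise
X = np.arange(10).reshape(-1, 1)
y = 2 * X.ravel() + 1

# The ALGORITHM (ordinary least squares) is run on the data...
algorithm = LinearRegression()
model = algorithm.fit(X, y)  # ...and the fitted estimator is the MODEL

# The model packages what was learned and predicts for new data
print(model.coef_[0], model.intercept_)  # ~2.0, ~1.0
print(model.predict([[100]]))            # ~[201.0]
```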
We could say that the model is the packaged outcome of the learning process, used to make predictions on new data in the future.<\/span><\/p>\n<p class=\"\"><span><a href=\"https:\/\/storage.ning.com\/topology\/rest\/1.0\/file\/get\/9602750477?profile=original\" target=\"_blank\" rel=\"noopener\"><img decoding=\"async\" src=\"https:\/\/storage.ning.com\/topology\/rest\/1.0\/file\/get\/9602750477?profile=RESIZE_710x\" class=\"align-center\"><\/a><\/span><\/p>\n<h2><b>Model Selection is a mantra<\/b><\/h2>\n<p>Selection, in general, means choosing the best option by comparing and validating candidates against various parameters; model selection in machine learning is no exception!\u00a0<\/p>\n<p>In the Data Science and ML world, we have a collection of available algorithms with different names, purposes and assumptions about the nature of the data. They are applied to the training dataset, the BEST ONE is selected and promoted into PRODUCTION for the live\/streaming dataset, and the outcome is monitored. 
On top of that, we make sure of the best results and sustain the same model on new data for regression, classification and clustering problem statements.<\/p>\n<p>Model Selection is one of the critical stages in the Data Science and ML life cycle: the process of selecting the best machine learning model (data and algorithm combination).\u00a0<\/p>\n<p><b>Where is Model Selection in the ML life cycle?<\/b><\/p>\n<p>After successful completion of the EDA and Feature Engineering process, it is safe to start model selection on the fine-tuned dataset, with the following key considerations.<\/p>\n<ul>\n<li>Nature of the dataset<\/li>\n<li>Column collections<\/li>\n<li>Type of individual columns\/features<\/li>\n<li>Given problem statement(s)<\/li>\n<\/ul>\n<p class=\"\">The lifecycle diagram below represents the entire flow, with the model selection area highlighted.<\/p>\n<div class=\"medium-insert-images medium-insert-images-wide\"><a href=\"https:\/\/storage.ning.com\/topology\/rest\/1.0\/file\/get\/9602753662?profile=original\" target=\"_blank\" rel=\"noopener\"><img decoding=\"async\" src=\"https:\/\/storage.ning.com\/topology\/rest\/1.0\/file\/get\/9602753662?profile=RESIZE_710x\" class=\"align-center\"><\/a><\/div>\n<div class=\"medium-insert-images medium-insert-images-wide\">\n<p>The highlight here is that we may have a dataset to which all the preprocessing techniques and various machine learning nuances have been applied, but no concrete idea which model &#8211; classification, regression, etc. &#8211; to use.<\/p>\n<p class=\"\">The bottom line is<span>\u00a0<\/span><b>&#8220;We have<span>\u00a0<\/span><\/b><b>to compare the relative performance of two or more models for the given, cleaned data set&#8221;<\/b><\/p>\n<blockquote><p><b>The model should be flexible enough to fit the data and explain its insights in the form of statistical estimates.<\/b><\/p><\/blockquote>\n<p class=\"\">Before 
we go for model selection, we have to find out the level of data distribution and the other dependency factors involved in the given dataset. These factors are commonly known as\u00a0<strong><em>bias<\/em><\/strong><span>\u00a0<\/span>and<span>\u00a0<\/span><strong><em>variance<\/em><\/strong>. Let&#8217;s discuss how these factors impact model selection.\u00a0<\/p>\n<p>For model selection, we are supposed to have sufficient data ready in hand.\u00a0In an ideal situation, we split the data into three different sets:\u00a0<\/p>\n<ul>\n<li>Training\u00a0set\u00a0\n<ul>\n<li>Used to fit the models<\/li>\n<\/ul>\n<\/li>\n<li>Validation\u00a0set\n<ul>\n<li>Used to estimate the prediction error of each model<\/li>\n<\/ul>\n<\/li>\n<li>Test set\n<ul>\n<li>Used to assess the generalization error<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<p>Once all of the above process flow has been completed, the final model can be selected from the list of models.<\/p>\n<p><a href=\"https:\/\/storage.ning.com\/topology\/rest\/1.0\/file\/get\/9602754279?profile=original\" target=\"_blank\" rel=\"noopener\"><img decoding=\"async\" src=\"https:\/\/storage.ning.com\/topology\/rest\/1.0\/file\/get\/9602754279?profile=RESIZE_710x\" class=\"align-center\"><\/a><\/p>\n<h3>What are Bias and Variance, and how do they play into Model Selection?<\/h3>\n<p id=\"916a\"><strong>Bias:<\/strong><span>\u00a0<\/span>Bias is an error introduced into our model by the oversimplification of the machine learning algorithm used. The basic problem here is that the algorithm is not strong enough to capture the patterns or trends in the fine-tuned data set; the root cause of this error is data that is too complex for the algorithm to understand. 
So it ends up with low accuracy, and this leads to<span>\u00a0<\/span><b>underfitting<span>\u00a0<\/span><\/b>the model.<\/p>\n<p id=\"916a\">Generally, the following algorithms tend toward low bias:\u00a0<\/p>\n<ul>\n<li>Decision Trees<\/li>\n<li>k-NN<\/li>\n<li>SVM<\/li>\n<\/ul>\n<p id=\"916a\">And the high-bias machine learning algorithms are:<\/p>\n<ul>\n<li>Linear Regression<\/li>\n<li>Logistic Regression<\/li>\n<\/ul>\n<p id=\"ceae\"><strong>Variance:<\/strong><span>\u00a0<\/span>Variance is an error introduced into our model by the selection of a complex machine learning algorithm that fits the noise in the given dataset, resulting in high sensitivity and<span>\u00a0<\/span><b>overfitting<\/b>. You can observe that the model performs well on the training dataset but poorly on the testing dataset.<\/p>\n<p id=\"ceae\" class=\"\">The visualization below depicts the relation between<span>\u00a0<\/span><strong>Bias and<span>\u00a0<\/span><\/strong><strong>Variance<\/strong>.<\/p>\n<p class=\"\"><strong><a href=\"https:\/\/storage.ning.com\/topology\/rest\/1.0\/file\/get\/9602754868?profile=original\" target=\"_blank\" rel=\"noopener\"><img decoding=\"async\" src=\"https:\/\/storage.ning.com\/topology\/rest\/1.0\/file\/get\/9602754868?profile=RESIZE_710x\" class=\"align-center\"><\/a><\/strong><\/p>\n<p id=\"ceae\" class=\"\">You can observe from the above visualization that once model complexity increases past a certain point, your model becomes overfit.\u00a0<\/p>\n<blockquote><p>We have to keep in mind that &#8220;Increasing the bias will decrease the variance. 
Increasing the variance will decrease the bias&#8221;, so<span>\u00a0<\/span><b>Bias<span>\u00a0<\/span><\/b>and<span>\u00a0<\/span><b>Variance<span>\u00a0<\/span><\/b>are inversely proportional to each other.<\/p><\/blockquote>\n<h2>Types of Model Selection<\/h2>\n<p class=\"\">There are two major techniques in model selection; as mentioned earlier, these are<span>\u00a0<\/span><b>mathematical techniques<\/b><span>\u00a0by which patterns are extracted from the given dataset.<\/span><\/p>\n<ul>\n<li>Resampling<\/li>\n<li>Probabilistic<\/li>\n<\/ul>\n<p class=\"\"><b>Resampling<\/b>: These are simple techniques of rearranging data samples and inspecting whether the model performs well or badly on the data set.<\/p>\n<p class=\"\"><a href=\"https:\/\/storage.ning.com\/topology\/rest\/1.0\/file\/get\/9602755486?profile=original\" target=\"_blank\" rel=\"noopener\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/storage.ning.com\/topology\/rest\/1.0\/file\/get\/9602755486?profile=RESIZE_710x\" class=\"align-center\" width=\"472\" height=\"267\"><\/a><\/p>\n<p class=\"\">Let&#8217;s focus on a few samples now, to understand the concepts that we have discussed above.<span><br \/><\/span><\/p>\n<h2>I.<span class=\"comment-highlite\" id=\"spancomment2709\">Bias &amp;\u00a0<\/span>Variance<\/h2>\n<p>The code below estimates the bias and variance of a linear regression model on the housing dataset, using mlxtend&#8217;s bias_variance_decomp.<\/p>\n<p>print(&#8220;############################################&#8221;)<br \/> print(&#8221; Importing required library &#8220;)<br \/> print(&#8220;############################################&#8221;)<br \/> import mlxtend<br \/> from pandas import read_csv<br \/> from sklearn.model_selection import train_test_split<br \/> from sklearn.linear_model import LinearRegression<br \/> from mlxtend.evaluate import bias_variance_decomp<br \/> # load dataset<br \/> print(&#8220;############################################&#8221;)<br \/> print(&#8221; Data Loading &#8220;)<br \/> print(&#8220;############################################&#8221;)<br \/> dataframe = 
read_csv(&#8220;housing.csv&#8221;)<br \/> data = dataframe.values<br \/> X, y = data[:, :-1], data[:, -1]<br \/> print(X)<br \/> print(&#8220;#######################################################&#8221;)<br \/> print(&#8221; Spliting the data for test and train &#8220;)<br \/> print(&#8220;#######################################################&#8221;)<br \/> X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=1)<br \/> print(&#8220;#######################################################&#8221;)<br \/> print(&#8221; Applying LinearRegression &#8220;)<br \/> print(&#8220;#######################################################&#8221;)<br \/> model = LinearRegression()<br \/> # Estimating Bias and Variance bias_variance_decomp<br \/> print(&#8220;#######################################################&#8221;)<br \/> print(&#8220;Estimating Bias and Variance using Bias Variance Decomp&#8221;)<br \/> print(&#8220;#######################################################&#8221;)<br \/> mse, bias, var = bias_variance_decomp(model, X_train, y_train, X_test, y_test, loss=&#8217;mse&#8217;, num_rounds=200, random_seed=1)<br \/> print(&#8220;############################################&#8221;)<br \/> print(&#8216;Mean Squared Error (MSE): %.2f&#8217; % mse)<br \/> print(&#8216;Bias of the given data: %.2f&#8217; % bias)<br \/> print(&#8216;Variance of the given data: %.2f&#8217; % var)<br \/> print(&#8220;############################################&#8221;)<\/p>\n<p><span style=\"font-size: 14pt;\"><strong>Output<\/strong><\/span><\/p>\n<p>############################################<br \/> Importing required library <br \/> ############################################<br \/> ############################################<br \/> Data Loading <br \/> ############################################<br \/> [[2.7310e-02 0.0000e+00 7.0700e+00 &#8230; 1.7800e+01 3.9690e+02 9.1400e+00]<br \/> [2.7290e-02 0.0000e+00 7.0700e+00 &#8230; 1.7800e+01 
3.9283e+02 4.0300e+00]<br \/> [3.2370e-02 0.0000e+00 2.1800e+00 &#8230; 1.8700e+01 3.9463e+02 2.9400e+00]<br \/> &#8230;<br \/> [6.0760e-02 0.0000e+00 1.1930e+01 &#8230; 2.1000e+01 3.9690e+02 5.6400e+00]<br \/> [1.0959e-01 0.0000e+00 1.1930e+01 &#8230; 2.1000e+01 3.9345e+02 6.4800e+00]<br \/> [4.7410e-02 0.0000e+00 1.1930e+01 &#8230; 2.1000e+01 3.9690e+02 7.8800e+00]]<br \/> #######################################################<br \/> Spliting the data for test and train <br \/> #######################################################<br \/> #######################################################<br \/> Applying LinearRegression <br \/> #######################################################<br \/> #######################################################<br \/> Estimating Bias and Variance using Bias Variance Decomp<br \/> #######################################################<br \/> ############################################<br \/> Mean Squared Error (MSE): 26.25<br \/> Bias of the given data: 25.08<br \/> Variance of the given data: 1.17<br \/> ############################################<\/p>\n<p><span style=\"font-size: 14pt;\"><b>II.Random split<\/b><\/span><\/p>\n<p>print(&#8220;############################################&#8221;)<br \/> print(&#8221; Random split &#8220;)<br \/> print(&#8220;############################################&#8221;)<br \/> print(&#8220;############################################&#8221;)<br \/> print(&#8221; Importing required library &#8220;)<br \/> print(&#8220;############################################&#8221;)<br \/> import numpy as np<br \/> from sklearn.model_selection import train_test_split<br \/> print(&#8220;############################################&#8221;)<br \/> print(&#8221; Creating array,reshaping and arraging &#8220;)<br \/> print(&#8220;############################################&#8221;)<br \/> X, y = np.arange(10).reshape((5, 2)), range(5)<br \/> 
print(&#8220;############################################&#8221;)<br \/> print(&#8220;y &#8211; values:&#8221;,list(y))<br \/> print(&#8220;############################################&#8221;)<br \/> print(&#8220;X &#8211; values:&#8221;,list(X))<br \/> print(&#8220;############################################&#8221;)<\/p>\n<p><span style=\"font-size: 14pt;\"><strong>Output<\/strong><\/span><\/p>\n<p>############################################<br \/> Random split <br \/> ############################################<br \/> ############################################<br \/> Importing required library <br \/> ############################################<br \/> ############################################<br \/> Creating array,reshaping and arraging <br \/> ############################################<br \/> ############################################<br \/> y &#8211; values: [0, 1, 2, 3, 4]<br \/> ############################################<br \/> X &#8211; values: [array([0, 1]), array([2, 3]), <br \/> array([4, 5]), array([6, 7]), array([8, 9])]<br \/> ############################################<\/p>\n<p class=\"\"><strong style=\"font-size: 14pt;\">III. 
KFold<\/strong><\/p>\n<div>print(&#8220;############################################&#8221;)<\/div>\n<div>print(&#8221; K-Fold &#8220;)<\/div>\n<div>print(&#8220;############################################&#8221;)<\/div>\n<div>print(&#8220;############################################&#8221;)<\/div>\n<div>print(&#8221; Importing required library &#8220;)<\/div>\n<div>print(&#8220;############################################&#8221;)<\/div>\n<div>import numpy as np<\/div>\n<div>from sklearn.model_selection import KFold<\/div>\n<div>print(&#8220;############################################&#8221;)<\/div>\n<div>print(&#8221; Defining array &#8220;)<\/div>\n<div>print(&#8220;############################################&#8221;)<\/div>\n<div>X = np.array([[1, 5], [2, 4], [2, 4], [3, 6]])<\/div>\n<div>y = np.array([1, 2, 3, 4])<\/div>\n<div>print(&#8220;############################################&#8221;)<\/div>\n<div>print(&#8221; No. Of Fold in the given array &#8220;)<\/div>\n<div>print(&#8220;############################################&#8221;)<\/div>\n<div>kf = KFold(n_splits=4)<\/div>\n<div>kf.get_n_splits(X)<\/div>\n<div>print(kf)<\/div>\n<div>for train_index, test_index in kf.split(X):<\/div>\n<div>\u00a0\u00a0\u00a0\u00a0print(&#8220;TRAIN:&#8221;, train_index, &#8220;TEST:&#8221;, test_index)<\/div>\n<div>\u00a0\u00a0\u00a0\u00a0X_train, X_test = X[train_index], X[test_index]<\/div>\n<div>\u00a0\u00a0\u00a0\u00a0y_train, y_test = y[train_index], y[test_index]<\/div>\n<div><span style=\"font-size: 14pt;\"><b>Output\u00a0<\/b><\/span><\/div>\n<div>############################################<br \/> K-Fold <br \/> ############################################<br \/> ############################################<br \/> Importing required library <br \/> ############################################<br \/> ############################################<br \/> Defining array <br \/> ############################################<br \/> 
############################################<br \/> No. Of Fold in the given array <br \/> ############################################<br \/> KFold(n_splits=4, random_state=None, shuffle=False)<br \/> TRAIN: [1 2 3] TEST: [0]<br \/> TRAIN: [0 2 3] TEST: [1]<br \/> TRAIN: [0 1 3] TEST: [2]<br \/> TRAIN: [0 1 2] TEST: [3]<\/div>\n<h2><b>IV. Bootstrapping with Array<\/b><\/h2>\n<div>print(&#8220;############################################&#8221;)<\/div>\n<div>print(&#8221; Bootstrapping with Array &#8220;)<\/div>\n<div>print(&#8220;############################################&#8221;)<\/div>\n<div>print(&#8220;############################################&#8221;)<\/div>\n<div>print(&#8221; Importing required library &#8220;)<\/div>\n<div>print(&#8220;############################################&#8221;)<\/div>\n<div>import pandas as pd<\/div>\n<div>import numpy as np<\/div>\n<div>from sklearn.model_selection import train_test_split<\/div>\n<div>from sklearn import tree<\/div>\n<div>print(&#8220;############################################&#8221;)<\/div>\n<div>print(&#8221; Loading sample.csv file &#8220;)<\/div>\n<div>print(&#8220;############################################&#8221;)<\/div>\n<div>GymData = pd.read_csv(&#8220;sample.csv&#8221;)<\/div>\n<div>print(&#8220;############################################&#8221;)<\/div>\n<div>print(&#8221; Top 5 records from dataset &#8220;)<\/div>\n<div>print(&#8220;############################################&#8221;)<\/div>\n<div>print(GymData.head(5))<\/div>\n<div>print(&#8220;############################################&#8221;)<\/div>\n<div>print(&#8221; Defining Target Variable &#8211; Weight &#8220;)<\/div>\n<div>print(&#8220;############################################&#8221;)<\/div>\n<div>TargetVariable=&#8217;Weight&#8217;<\/div>\n<div>y=GymData[TargetVariable].values<\/div>\n<div>print(y)<\/div>\n<div>print(&#8220;#################################################&#8221;)<\/div>\n<div>print(&#8221; Defining Predictor Variables &#8211; Hours &amp; Calories &#8220;)<\/div>\n<div>print(&#8220;#################################################&#8221;)<\/div>\n<div>Predictors=[&#8216;Hours&#8217;,&#8217;Calories&#8217;]<\/div>\n<div>X=GymData[Predictors].values<\/div>\n<div>print(X)<\/div>\n<div>print(&#8220;#################################################&#8221;)<\/div>\n<div>print(&#8221; Let&#8217;s perform Bootstrapping &#8211; Using simple loop &#8220;)<\/div>\n<div>print(&#8220;#################################################&#8221;)<\/div>\n<div>AccuracyValues=[] # to hold the accuracy values<\/div>\n<div>n_times=5 # no. of loop iterations<\/div>\n<div>print(&#8220;###########################################################&#8221;)<\/div>\n<div>print(&#8221; Increasing the seed value for each iteration and building&#8221;)<\/div>\n<div>print(&#8221; DecisionTreeRegressor model&#8221;)<\/div>\n<div>print(&#8220;###########################################################&#8221;)<\/div>\n<div>for i in range(n_times):<\/div>\n<div>\u00a0\u00a0\u00a0\u00a0X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42+i)<\/div>\n<div>\u00a0\u00a0\u00a0\u00a0RegModel = tree.DecisionTreeRegressor(max_depth=3,criterion=&#8217;mse&#8217;)<\/div>\n<div>\u00a0\u00a0\u00a0\u00a0DTree=RegModel.fit(X_train,y_train)<\/div>\n<div>\u00a0\u00a0\u00a0\u00a0prediction=DTree.predict(X_test)<\/div>\n<div>\u00a0\u00a0\u00a0\u00a0Accuracy=100 - (np.mean(np.abs((y_test - prediction) \/ y_test)) * 100)<\/div>\n<div>\u00a0\u00a0\u00a0\u00a0AccuracyValues.append(np.round(Accuracy))<\/div>\n<div>print(&#8220;#################################################&#8221;)<\/div>\n<div>print(&#8220;Bootstrapping trials :&#8221;)<\/div>\n<div>print(&#8220;List of Accuracy:&#8221;,AccuracyValues)<\/div>\n<div>print(&#8220;#################################################&#8221;)<\/div>\n<div>print(&#8216;Final Average(Mean) accuracy&#8217;,np.mean(AccuracyValues))<\/div>\n<p class=\"\"><span style=\"font-size: 14pt;\"><b>Output<\/b><\/span><\/p>\n<p>############################################<br \/> Bootstrapping with 
Array <br \/> ############################################<br \/> ############################################<br \/> Importing required library <br \/> ############################################<br \/> ############################################<br \/> Loading sample.csv file <br \/> ############################################<br \/> ############################################<br \/> Top 5 records from dataset <br \/> ############################################<br \/> Hours Calories Weight<br \/> 0 1.0 2500 95<br \/> 1 2.0 2000 85<br \/> 2 2.5 1900 83<br \/> 3 3.0 1850 81<br \/> 4 3.5 1600 80<br \/> ############################################<br \/> Defining Target Variable &#8211; Weight <br \/> ############################################<br \/> [95 85 83 81 80 78 77 80 75 70]<br \/> #################################################<br \/> Defining Predictor Variables &#8211; Hours &amp; Calories <br \/> #################################################<br \/> [[1.00e+00 2.50e+03]<br \/> [2.00e+00 2.00e+03]<br \/> [2.50e+00 1.90e+03]<br \/> [3.00e+00 1.85e+03]<br \/> [3.50e+00 1.60e+03]<br \/> [4.00e+00 1.50e+03]<br \/> [5.00e+00 1.50e+03]<br \/> [5.50e+00 1.60e+03]<br \/> [6.00e+00 1.70e+03]<br \/> [6.50e+00 1.50e+03]]<br \/> #################################################<br \/> Let&#8217;s perform Bootstrapping &#8211; Using simple loop <br \/> #################################################<br \/> ###########################################################<br \/> Increasing the seed value for each iteration and building<br \/> DecisionTreeRegressor modle<br \/> ###########################################################<br \/> #################################################<br \/> Bootstrapping trials :<br \/> List of Accuracy: [94.0, 95.0, 98.0, 98.0, 93.0]<br \/> #################################################<br \/> Final Average(Mean) accuracy 95.6<\/p>\n<pre><span style=\"font-size: 12pt;\"><strong>What is Model Evaluation Metrics in 
machine learning?<\/strong><\/span><\/pre>\n<p class=\"\">The straightforward answer: to evaluate the performance of the machine learning model we selected in the previous steps. The objective is to estimate the accuracy of a model on future (or unseen)\u00a0data, so the model should generalize well. Without performing this evaluation step, we shouldn&#8217;t move the model into production on unseen data; otherwise, we end up with poor predictions, classifications, etc.\u00a0<\/p>\n<blockquote><p>&#8220;We could say that the<span>\u00a0<\/span><b>MODEL<\/b>\u00a0has to be trained in such a way as to\u00a0<b>Learn,<\/b><span>\u00a0<\/span>but not<b>\u00a0<\/b>to<b>\u00a0<\/b><b>Memorize<\/b>&#8221;\n<\/p><\/blockquote>\n<blockquote><p><b>Generalization of the model is the key takeaway<\/b><\/p><\/blockquote>\n<p id=\"dc1d\" class=\"\">Guys! I trust you got some understanding from this article regarding &#8220;Machine Learning Model Selection &amp; Evaluation Strategy&#8221;. I have tried to cover it as quickly as possible without taking too much of your time. Thanks for reading; I hope it was useful. I will get back to you with a new and interesting topic. Until then, Cheers! 
Shanthababu,<\/p>\n<p class=\"\"><a href=\"https:\/\/storage.ning.com\/topology\/rest\/1.0\/file\/get\/9602755486?profile=original\" target=\"_blank\" rel=\"noopener\"><\/a><\/p>\n<\/div>\n<\/div>\n<p><a href=\"https:\/\/www.datasciencecentral.com\/xn\/detail\/6448529:BlogPost:1069948\">Go to Source<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Author: Shanthababu P Machine Learning Model Selection strategy for Data Scientists and ML Engineers &#8220;Thus learning is not possible without inductive bias, and now the [&hellip;] <span class=\"read-more-link\"><a class=\"read-more\" href=\"https:\/\/www.aiproblog.com\/index.php\/2021\/09\/27\/machine-learning-model-selection-strategy-for-data-scientists-and-ml-engineers\/\">Read More<\/a><\/span><\/p>\n","protected":false},"author":1,"featured_media":462,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_bbp_topic_count":0,"_bbp_reply_count":0,"_bbp_total_topic_count":0,"_bbp_total_reply_count":0,"_bbp_voice_count":0,"_bbp_anonymous_reply_count":0,"_bbp_topic_count_hidden":0,"_bbp_reply_count_hidden":0,"_bbp_forum_subforum_count":0,"footnotes":""},"categories":[26],"tags":[],"_links":{"self":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/5050"}],"collection":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/comments?post=5050"}],"version-history":[{"count":0,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/5050\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media\/457"}],"wp:attachment":[{"href":"https:\/\/www.aiproblog.com\
/index.php\/wp-json\/wp\/v2\/media?parent=5050"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/categories?post=5050"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/tags?post=5050"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}
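The selection-and-evaluation strategy the article describes can be sketched end to end. This is a hedged illustration on synthetic data with assumed candidate estimators (not the article's housing or gym datasets): candidate models are compared by cross-validated score, and the best one is selected for promotion.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression data (illustrative stand-in for a real dataset)
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=1)

# Candidate models to compare (assumed examples)
candidates = {
    "LinearRegression": LinearRegression(),
    "DecisionTree(depth=3)": DecisionTreeRegressor(max_depth=3, random_state=1),
}

# Compare relative performance via 5-fold cross-validated R^2
cv = KFold(n_splits=5, shuffle=True, random_state=1)
scores = {name: cross_val_score(est, X, y, cv=cv, scoring="r2").mean()
          for name, est in candidates.items()}

best = max(scores, key=scores.get)
for name, s in scores.items():
    print(f"{name}: mean R^2 = {s:.3f}")
print("Selected model:", best)
```

On linear data like this, the linear model should win; the same loop extends to any list of candidate estimators.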