Programming Tips and Tricks

Environment: RStudio

Make sure you have R and RStudio installed on your machine. If not, use the links below to install them.

Install R

Install R on your machine

For macOS: https://cran.r-project.org/bin/macosx/

For Windows: https://cran.r-project.org/bin/windows/base/

Install RStudio

Install RStudio on your machine.

https://www.rstudio.com/products/rstudio/download/

We'll build a predictive model using the "mtcars" dataset, one of the datasets built into R. It has 11 variables ("mpg", "cyl", "disp", "hp", "drat", "wt", "qsec", "vs", "am", "gear" and "carb") for 32 car models.

Load this dataset into the variable myData.
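A minimal sketch of this step. The dataset ships with R's built-in datasets package, and the variable name myData is the one used in the text.

```r
# mtcars is bundled with R's built-in "datasets" package
data(mtcars)
myData <- mtcars

# Quick sanity checks: dimensions, column names and the first rows
dim(myData)
names(myData)
head(myData)
```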

Let's build a model to predict "mpg" from the "qsec", "hp" and "wt" attributes. We'll use the "caret" package to build the models. Install and load the caret package.
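One way to do this is sketched below. The requireNamespace() guard and the explicit CRAN mirror URL are assumptions added so the snippet can be re-run safely; a plain install.packages("caret") typed in the RStudio console works as well.

```r
# Install caret only when it is missing (assumed mirror: cloud.r-project.org)
if (!requireNamespace("caret", quietly = TRUE)) {
  install.packages("caret", repos = "https://cloud.r-project.org")
}

# Attach the package so train(), trainControl() etc. are available
library(caret)
```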

Keep only the columns we need: "mpg", "qsec", "hp" and "wt".
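A possible version of this step, assuming the subset is stored back into myData (the helper name keepCols is illustrative only):

```r
myData <- mtcars

# Outcome (mpg) plus the three predictors used in this post
keepCols <- c("mpg", "qsec", "hp", "wt")
myData <- myData[, keepCols]

str(myData)
```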

Replace all blank cells with NA, then remove every row that contains an NA.
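A sketch of the cleaning step. mtcars has no blank or missing cells, so here it leaves the data unchanged; the pattern is shown because it matters for messier datasets.

```r
myData <- mtcars[, c("mpg", "qsec", "hp", "wt")]

# Turn empty-string cells into NA (no cell matches in mtcars)
myData[myData == ""] <- NA

# Drop every row that has at least one NA
myData <- na.omit(myData)

nrow(myData)
```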

Define the train control for training the model using repeated k-fold cross-validation.
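A minimal sketch using caret's trainControl(). The text does not state the number of folds or repeats, so the values 10 and 3 below are assumptions.

```r
library(caret)

# Repeated k-fold CV: k = 10 folds, repeated 3 times (assumed values)
ctrl <- trainControl(method = "repeatedcv", number = 10, repeats = 3)
```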

Let's build a model using linear regression.
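The step could look like the sketch below. The seed value, the fold settings and the name lmModel are assumptions; method = "lm" is caret's identifier for ordinary linear regression.

```r
library(caret)

myData <- mtcars[, c("mpg", "qsec", "hp", "wt")]
ctrl <- trainControl(method = "repeatedcv", number = 10, repeats = 3)

# Fix the seed so the cross-validation folds are reproducible
set.seed(42)
lmModel <- train(mpg ~ ., data = myData, method = "lm", trControl = ctrl)

# Printing the object shows the cross-validated RMSE and R-squared
print(lmModel)
```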

View the model summary.
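A sketch under the same assumptions as before (seed, fold settings, variable names). The fitted lm object that caret keeps in the finalModel element is summarised directly.

```r
library(caret)

myData <- mtcars[, c("mpg", "qsec", "hp", "wt")]
ctrl <- trainControl(method = "repeatedcv", number = 10, repeats = 3)
set.seed(42)
lmModel <- train(mpg ~ ., data = myData, method = "lm", trControl = ctrl)

# Coefficients, standard errors and R-squared of the final linear model
lmSummary <- summary(lmModel$finalModel)
print(lmSummary)
```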

Predict "mpg" using the linear regression model.
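A minimal sketch of the prediction step. The text does not say which rows are scored, so predicting on the same 32 rows used for training is an assumption.

```r
library(caret)

myData <- mtcars[, c("mpg", "qsec", "hp", "wt")]
ctrl <- trainControl(method = "repeatedcv", number = 10, repeats = 3)
set.seed(42)
lmModel <- train(mpg ~ ., data = myData, method = "lm", trControl = ctrl)

# One predicted mpg value per car
lmPred <- predict(lmModel, newdata = myData)
head(lmPred)
```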

Calculate the accuracy of the model. For regression, two main metrics are used: RMSE (Root Mean Square Error) and R2 (R squared). RMSE is always greater than or equal to 0 and is expressed in the units of the outcome (here, miles per gallon); R2 usually lies between 0 and 1. The higher the R2 and the lower the RMSE, the better the model. R2 is often the easier metric to read because it is unit-free and so can be compared across different outcomes.
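Both metrics can be computed with caret's RMSE() and R2() helpers, as sketched below. The variable names are assumptions, and because the predictions are made on the training rows the values will look more optimistic than the cross-validated ones printed with the model.

```r
library(caret)

myData <- mtcars[, c("mpg", "qsec", "hp", "wt")]
ctrl <- trainControl(method = "repeatedcv", number = 10, repeats = 3)
set.seed(42)
lmModel <- train(mpg ~ ., data = myData, method = "lm", trControl = ctrl)
lmPred <- predict(lmModel, newdata = myData)

# RMSE: square root of the mean squared difference, in mpg units
lmRMSE <- RMSE(lmPred, myData$mpg)

# R2: by default the squared correlation between predictions and observations
lmR2 <- R2(lmPred, myData$mpg)

c(RMSE = lmRMSE, R2 = lmR2)
```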

Let us make another model using the Random Forest algorithm. Only one step changes. Because the algorithm is randomized, call set.seed() before training to get the same metric values every time you run the model.
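The effect of set.seed() can be seen with a small base R illustration; the seed value 123 is arbitrary.

```r
# Same seed, same random draw
set.seed(123)
firstDraw <- sample(1:32, 5)

set.seed(123)
secondDraw <- sample(1:32, 5)

# TRUE: resetting the seed reproduces the draw exactly
identical(firstDraw, secondDraw)
```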

Train the model using the Random Forest algorithm.
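A sketch of the training call, assuming the same train control as for the linear model. method = "rf" makes caret use the randomForest package, which must be installed; the seed and the name rfModel are assumptions.

```r
library(caret)

myData <- mtcars[, c("mpg", "qsec", "hp", "wt")]
ctrl <- trainControl(method = "repeatedcv", number = 10, repeats = 3)

# Seed set right before training so folds and trees are reproducible
set.seed(42)
rfModel <- train(mpg ~ ., data = myData, method = "rf", trControl = ctrl)

print(rfModel)
```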

Predict "mpg" using the random forest model.

Measure the accuracy of the model using RMSE and R2.
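The prediction and measurement steps for the random forest might look like this. Scoring the training rows is an assumption, and caret's postResample() is used here as one convenient way to get RMSE and R2 in a single call.

```r
library(caret)

myData <- mtcars[, c("mpg", "qsec", "hp", "wt")]
ctrl <- trainControl(method = "repeatedcv", number = 10, repeats = 3)
set.seed(42)
rfModel <- train(mpg ~ ., data = myData, method = "rf", trControl = ctrl)

# Predicted mpg for every car
rfPred <- predict(rfModel, newdata = myData)

# Named vector containing RMSE, Rsquared and MAE
rfMetrics <- postResample(pred = rfPred, obs = myData$mpg)
print(rfMetrics)
```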

To see the metrics of both models in a single graph, use the following:
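A sketch of the comparison using caret's resamples() and bwplot(). The labels "LM" and "RF", the seed and the fold settings are assumptions. The same seed is set before each train() call so that both models are evaluated on the same folds.

```r
library(caret)

myData <- mtcars[, c("mpg", "qsec", "hp", "wt")]
ctrl <- trainControl(method = "repeatedcv", number = 10, repeats = 3)

set.seed(42)
lmModel <- train(mpg ~ ., data = myData, method = "lm", trControl = ctrl)

set.seed(42)
rfModel <- train(mpg ~ ., data = myData, method = "rf", trControl = ctrl)

# Collect the per-resample metrics of both models
results <- resamples(list(LM = lmModel, RF = rfModel))
summary(results)

# Box-and-whisker plot of the resampled metrics
print(bwplot(results))
```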

This plot shows the metrics of both models as box-and-whisker plots (bwplot) and arranges the models in order of accuracy, with the most accurate model at the bottom. With repeated cross-validation there is one RMSE per resample, so each box summarises the spread of RMSE values across all resamples, which helps identify the model that is best overall rather than on a single split.


Monday, August 28, 2017

Predictive Models in R