edifytubers

Today I am going to kick-start machine learning, and I am going to do it using the Python programming language.

Prerequisites:

To download Python: https://www.python.org/

After installing Python, we need to install scikit-learn.

Scikit-learn is a free software machine learning library for the Python programming language. It features various classification, regression and clustering algorithms.

It is built on NumPy, SciPy, and matplotlib, so we install all of these before installing scikit-learn.

To install them, run the command:

python -m pip install --user numpy scipy matplotlib ipython jupyter pandas sympy nose

Note: IPython and Jupyter will be used in future posts; pandas is used in the code below.

Now, to install scikit-learn, use the command below:

pip install -U scikit-learn
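
To check that the installation worked, a quick sketch like the following can be run. It only imports the packages installed above and prints their versions; the version numbers you see will depend on your machine.

```python
# Sanity check: import the libraries installed above and print their versions.
import numpy
import scipy
import sklearn

versions = {
    "numpy": numpy.__version__,
    "scipy": scipy.__version__,
    "scikit-learn": sklearn.__version__,
}
for name, version in versions.items():
    print(name, version)
```

If any of the imports fails with ModuleNotFoundError, that package was not installed for the Python interpreter you are running.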

Basics of ML: We work with data sets that are split into two parts, train data and test data. A common ratio is 80:20, although the code below uses 70:30. The machine learns from the train data. After training, the test data is passed in and the machine predicts the output, which we compare with the known answers to measure accuracy.
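
To make the split idea concrete, here is a minimal sketch in plain Python. The function name split_train_test, the seed value, and the stand-in data are all made up for illustration; in the real code further down, scikit-learn's train_test_split does this job.

```python
import random

def split_train_test(rows, test_ratio=0.2, seed=0):
    """Shuffle a copy of rows and split it into (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed so the split is repeatable
    n_test = int(round(len(shuffled) * test_ratio))
    return shuffled[n_test:], shuffled[:n_test]

rows = list(range(100))  # stand-in for 100 data records
train, test = split_train_test(rows)
print(len(train), len(test))  # 80 20
```

Shuffling before splitting matters because many data sets, including iris, are stored sorted by class.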

To understand machine learning easily, I decided to work through a problem statement.

Challenge 1:

Problem: Iris Flower Classification

Description:

This problem is a simple one; it is like the HELLO WORLD problem of machine learning. We need to classify iris flowers based on the length and width of their sepals and petals. The data set contains 3 classes of 50 instances each, where each class refers to a type of iris plant. One class is linearly separable from the other two; the latter are not linearly separable from each other.

The data set contains the following attributes:
1). sepal length in cm
2). sepal width in cm
3). petal length in cm
4). petal width in cm
5). class:
– Iris Setosa
– Iris Versicolour
– Iris Virginica
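
The attributes listed above can be checked directly against the copy of the data set that ships with scikit-learn. This is a small sketch using datasets.load_iris; note that scikit-learn spells the second class "versicolor".

```python
from collections import Counter
from sklearn import datasets

iris = datasets.load_iris()

# 150 rows (flowers) and 4 numeric columns (the measurements in cm)
print(iris.data.shape)
print(iris.feature_names)

# Count how many flowers belong to each class
counts = Counter(str(iris.target_names[label]) for label in iris.target)
print(dict(counts))
```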

Algorithm used: Random forest

Random Forest is a supervised learning algorithm. It makes use of many decision trees and works as follows:

1). Select random samples from the given dataset.
2). Construct a decision tree for each sample and get a prediction result from each decision tree.
3). Perform a vote over the predicted results.
4). Select the prediction result with the most votes as the final prediction.
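
The procedure just described can be sketched by hand with single decision trees. This is only an illustration under some assumptions of mine: the number of trees (25), the seeds, and the helper majority_vote are arbitrary choices, the samples are drawn with replacement (bootstrap samples), and the result is checked on the same data the trees were trained on. It is not how RandomForestClassifier is implemented internally; as far as I know, scikit-learn averages the trees' predicted probabilities instead of counting hard votes.

```python
import numpy as np
from sklearn import datasets
from sklearn.tree import DecisionTreeClassifier

iris = datasets.load_iris()
X, y = iris.data, iris.target
rng = np.random.default_rng(0)  # arbitrary seed, for repeatability

# Select random samples (rows drawn with replacement) and
# construct one decision tree per sample.
trees = []
for _ in range(25):
    idx = rng.integers(0, len(X), size=len(X))
    tree = DecisionTreeClassifier(max_features="sqrt", random_state=0)
    trees.append(tree.fit(X[idx], y[idx]))

# Get a prediction from every tree; shape is (n_trees, n_flowers).
all_preds = np.array([tree.predict(X) for tree in trees])

def majority_vote(column):
    """Return the class that received the most votes in one column."""
    return np.bincount(column, minlength=3).argmax()

# Vote per flower and keep the class with the most votes.
final = np.apply_along_axis(majority_vote, 0, all_preds)
print("Agreement with true labels:", (final == y).mean())
```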

[Image: Random_Forest]

Coding:
#Import scikit-learn dataset library which contains the iris data set
from sklearn import datasets

#Import Random Forest Model
from sklearn.ensemble import RandomForestClassifier

# Import train_test_split function
from sklearn.model_selection import train_test_split

#Import metrics for calculating accuracy
from sklearn import metrics

# For DataFrame creation
import pandas as pd

# Load dataset
iris = datasets.load_iris()

# Print the species labels (setosa, versicolor, virginica)
print(iris.target_names)

# print the names of the four features
print(iris.feature_names)

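# Create a DataFrame with one column per feature plus the species label
data = pd.DataFrame({
    'sepal length': iris.data[:, 0],
    'sepal width': iris.data[:, 1],
    'petal length': iris.data[:, 2],
    'petal width': iris.data[:, 3],
    'species': iris.target
})
# Show the first five rows (print is needed when running as a script)
print(data.head())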

# Separate the features (X) from the labels (y)
X = data[['sepal length', 'sepal width', 'petal length', 'petal width']]
y = data['species']

# Split dataset into training set and test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)  # 70% training and 30% test

# Create a Random Forest classifier with 100 trees
clf = RandomForestClassifier(n_estimators=100)

#Train the model using the training set
clf.fit(X_train,y_train)

# Predict the species of the flowers in the test set
y_pred = clf.predict(X_test)

# Model Accuracy, how often is the classifier correct?
print("Accuracy:", metrics.accuracy_score(y_test, y_pred))

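# Predict the species of one new flower
# (sepal length, sepal width, petal length, petal width in cm)
print(clf.predict([[3, 5, 4, 2]]))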
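
The accuracy printed by the code above changes from run to run, because train_test_split shuffles the data randomly and the forest itself is built with randomness. The prediction is also printed as a class number rather than a name. The following variant is a sketch that addresses both points; the value 42 for random_state and the use of stratify are my own choices, not part of the original code.

```python
import pandas as pd
from sklearn import datasets, metrics
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
X = pd.DataFrame(iris.data, columns=iris.feature_names)
y = iris.target

# random_state fixes the shuffle; stratify keeps the three classes
# in the same proportion in the train and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)

clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)

accuracy = metrics.accuracy_score(y_test, clf.predict(X_test))
print("Accuracy:", accuracy)

# Passing a DataFrame with the training column names avoids the
# feature-name warning that a plain list can trigger.
sample = pd.DataFrame([[3, 5, 4, 2]], columns=iris.feature_names)
species = iris.target_names[clf.predict(sample)[0]]
print("Predicted species:", species)
```

With the seeds fixed, repeated runs on the same scikit-learn version should print the same accuracy.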

Output

[Image: Random_Forest2]
