
machine learning algorithms


Introduction

In this report, we build two classification models for the heart disease data. Each model uses the patients’ features to classify a patient as having heart disease or not. After fitting, the two models are compared by their accuracy.

 

Data

The data contains 303 observations and 13 variables. The variables are described below:

  • age: The person’s age in years
  • sex: The person’s sex (1 = male, 0 = female)
  • cp: The chest pain experienced (Value 1: typical angina, Value 2: atypical angina, Value 3: non-anginal pain, Value 4: asymptomatic)
  • trestbps: The person’s resting blood pressure (mm Hg on admission to the hospital)
  • chol: The person’s cholesterol measurement in mg/dl
  • fbs: The person’s fasting blood sugar (> 120 mg/dl, 1 = true; 0 = false)
  • restecg: Resting electrocardiographic measurement (0 = normal, 1 = having ST-T wave abnormality, 2 = showing probable or definite left ventricular hypertrophy by Estes’ criteria)
  • thalach: The person’s maximum heart rate achieved
  • exang: Exercise-induced angina (1 = yes; 0 = no)

  • oldpeak: ST depression induced by exercise relative to rest (‘ST’ relates to positions on the ECG plot)
  • slope: The slope of the peak exercise ST segment (Value 1: upsloping, Value 2: flat, Value 3: downsloping)
  • ca: The number of major vessels (0-3)
  • thal: A blood disorder called thalassemia (3 = normal; 6 = fixed defect; 7 = reversible defect)
  • target: Heart disease (0 = no, 1 = yes)

The dependent variable is target, which records whether or not the patient has heart disease. The remaining variables are the independent variables. The only preprocessing step applied when cleaning the data was changing the data type of some variables. The following variables were converted from numeric to object type:

  1. sex
  2. cp
  3. fbs
  4. restecg
  5. exang
  6. slope
  7. thal
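
The type conversion described above can be sketched as follows. This is a minimal illustration that assumes the data is held in a pandas DataFrame, which the report does not state; the three-row table is made up, and only the column names come from the variable list.

```python
import pandas as pd

# Hypothetical three-row sample for illustration; the real data has 303 rows.
df = pd.DataFrame({
    "age": [63, 37, 41],
    "sex": [1, 1, 0],
    "cp": [1, 2, 4],
    "fbs": [1, 0, 0],
    "restecg": [0, 1, 2],
    "exang": [0, 0, 1],
    "slope": [1, 2, 3],
    "thal": [3, 6, 7],
    "target": [1, 0, 1],
})

# The seven variables the report lists as converted from numeric to object.
categorical_cols = ["sex", "cp", "fbs", "restecg", "exang", "slope", "thal"]

# astype with a column-to-dtype mapping converts only the listed columns.
df = df.astype({col: "object" for col in categorical_cols})

print(df.dtypes)
```

Storing the coded variables as objects signals that values such as cp = 4 are labels rather than quantities, so later steps do not treat them as numbers on a scale.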

Model

The two classification models used in the modeling are:

  1. Logistic regression: This model is used when the dependent variable is binary (dichotomous). It is a predictive model that estimates the dependent variable from the independent variables. It is similar to linear regression; the main difference is that its dependent variable is binary, while the dependent variable in linear regression is continuous. It describes the data and explains the relationship between the binary dependent variable and the independent variables. However, its coefficients are expressed on the log-odds scale, which can make logistic regression harder to interpret than linear regression. Logistic regression models the probability of the default class, that is, one of the two values of the dependent variable (for example, target = 1).
  2. Naïve Bayes: This classification technique is derived from Bayes’ theorem, a result from probability theory. In machine learning, the goal is often to select the most probable hypothesis given the data. For classification, the hypotheses are the target classes, and Bayes’ theorem gives the probability of each class given the observed features. The method is called naïve because it assumes that the features are independent of one another given the class. As with logistic regression, the dependent variable in this report is binary, which is why Naïve Bayes is applied here as a classification technique.
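
To make the two descriptions concrete, here is a minimal sketch of the calculation behind each model, using only the Python standard library. It is not the code used in this report: the function names, weights, priors, and likelihoods are all made-up values chosen for illustration.

```python
import math


def sigmoid(z):
    """Logistic function: maps any real number to a value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def logistic_probability(features, weights, intercept):
    """P(target = 1 | features) under a logistic regression model."""
    z = intercept + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


def naive_bayes_posterior(priors, likelihoods):
    """P(class | features) via Bayes' theorem.

    priors:      {class: P(class)}
    likelihoods: {class: [P(feature_i | class), ...]}
    The "naive" step multiplies the per-feature likelihoods as if the
    features were independent given the class.
    """
    unnormalised = {c: priors[c] * math.prod(likelihoods[c]) for c in priors}
    total = sum(unnormalised.values())
    return {c: value / total for c, value in unnormalised.items()}


# Made-up numbers, for illustration only.
p = logistic_probability([1.0, 2.0], weights=[0.5, -0.25], intercept=0.0)
post = naive_bayes_posterior(
    priors={0: 0.5, 1: 0.5},
    likelihoods={0: [0.2, 0.5], 1: [0.6, 0.5]},
)
print(p)     # probability of class 1 from the logistic model
print(post)  # posterior probability of each class from naive Bayes
```

In both cases the model outputs a probability for each class, and the predicted class is the one with the higher probability.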

Before modeling, the dependent and independent variables were selected from the data. The data was then partitioned into training and testing sets, with 80 % of the observations assigned to training and 20 % to testing. The split was made with a fixed random seed, which makes it reproducible and ensures that both models are trained and evaluated on the same observations when they are compared. When the number of variables is very large, feature selection is used to pick the variables that best predict the dependent variable. In this scenario, there are only 12 independent variables, so feature selection was not needed.
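
The seeded 80/20 split can be sketched as below. This is a standard-library illustration under stated assumptions: the report does not say which tool or seed value was used, so the helper name `train_test_split_indices` and the seed 42 are hypothetical.

```python
import random


def train_test_split_indices(n_rows, test_fraction=0.2, seed=42):
    """Return (train_indices, test_indices) for a reproducible random split.

    The seed value is an arbitrary assumption; any fixed value works.
    """
    indices = list(range(n_rows))
    # A dedicated Random instance keeps the shuffle independent of global state.
    random.Random(seed).shuffle(indices)
    n_test = round(n_rows * test_fraction)
    return indices[n_test:], indices[:n_test]


# 303 observations, as in the heart disease data.
train_idx, test_idx = train_test_split_indices(303)
print(len(train_idx), len(test_idx))

# Repeating the call with the same seed reproduces the same split, so both
# models can be trained and tested on identical rows.
train_again, test_again = train_test_split_indices(303)
```

Because the indices are fixed by the seed, any difference in accuracy between the two models comes from the models themselves and not from a different draw of test rows.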

Results

The accuracy obtained from logistic regression was 100 %, while the accuracy obtained from Naïve Bayes was 87.19 %. By this measure, logistic regression was better suited than Naïve Bayes to predicting the heart data. The results can be compared graphically using the bar chart, as shown in the figure below.

The above bar chart shows that the accuracy level of the logistic regression was higher than the accuracy level of the Naïve Bayes.
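
Accuracy is the share of observations whose predicted class matches the true class. The sketch below shows that calculation and the comparison step on made-up labels; the function name, the model names, and the predictions are hypothetical and are not the results reported above.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one observation is required")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Made-up labels and predictions, for illustration only.
y_true = [1, 0, 1, 1, 0]
pred_a = [1, 0, 1, 1, 0]  # hypothetical model A: all five correct
pred_b = [1, 0, 0, 1, 0]  # hypothetical model B: four of five correct

scores = {
    "model A": accuracy(y_true, pred_a),
    "model B": accuracy(y_true, pred_b),
}

# Pick the model with the highest accuracy.
best = max(scores, key=scores.get)
print(scores, best)
```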

Conclusion

Classification is a supervised machine learning task in which a model is trained on labeled data and its accuracy is then measured. It is not easy to know in advance which model will best predict a given data set, so it is good practice to compare two or more classification models and choose the most accurate one. Accuracy therefore plays a significant role in choosing the model used to classify the dependent variable. In our case, we used two classification models: logistic regression and Naïve Bayes. The accuracy of logistic regression was 100 %, while the accuracy of Naïve Bayes was 87.19 %. Therefore, logistic regression is the preferred model for classifying whether patients have heart disease or not.
