Breast Cancer Prediction using Machine Learning

Machine learning is now used in many different domains. From recommending movies to detecting diseases, it is one of the most talked-about topics in today's world of technology. Breast cancer is one of the most common diseases among women. Early diagnosis can reduce the risk and increase the chance of survival. Diagnosis at the right time can also spare patients unnecessary treatments and operations.

Objectives

The goal of the project is to classify whether a cancer is malignant or benign, and to find out which features help predict breast cancer.

Risk factors

Some risk factors are directly or indirectly associated with breast cancers. We have listed down a few:

  1. Age can be a risk factor for breast cancer. Women aged 50 and older are more likely to have breast cancer.
  2. Women who have had breast cancer in the past are more likely to develop it again in the future.
  3. The older a woman is when she gives birth to her first child, the higher her chance of developing breast cancer.
  4. Three more categories of women are also at higher risk:
     - Women who menstruate at an early age
     - Women who go through menopause late
     - Women who never had children

Concepts used:

  1. Python programming language
  2. Machine learning basics
Hardware and software specification:

  1. OS of your choice (Windows/Linux/Mac)
  2. A desktop or laptop
  3. Your preferred text editor
  4. Python 3 or later installed on your system
Implementations:

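  1. install NumPy from the terminal, then import it in your script as np
pip install numpy
import numpy as np

  1. install pandas from the terminal, then import it in your script as pd
pip install pandas
import pandas as pd

  1. install matplotlib from the terminal, then import matplotlib.pyplot in your script as plt
pip install matplotlib
import matplotlib.pyplot as plt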

  1. download the breast cancer dataset
  2. read the CSV file with pandas and store it in a variable
example – data = pd.read_csv("../input/data.csv")
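
As a minimal sketch of the loading step, the snippet below reads a tiny in-memory CSV instead of the downloaded file, so it runs without the dataset. The sample rows, the column names (including the empty `Unnamed: 32` column) and the helper `load_data` are assumptions made for illustration; the real file may look different.

```python
import io

import pandas as pd

# Hypothetical three-row sample standing in for data.csv.
# The column names are assumptions, not taken from the real dataset.
SAMPLE_CSV = """id,diagnosis,radius_mean,texture_mean,Unnamed: 32
1,M,17.99,10.38,
2,B,13.54,14.36,
3,M,20.57,17.77,
"""


def load_data(source):
    """Read a CSV file (a path or a file-like object) into a DataFrame."""
    return pd.read_csv(source)


data = load_data(io.StringIO(SAMPLE_CSV))

# A quick look at the table helps decide which columns to drop later.
print(data.shape)             # number of rows and columns
print(data.columns.tolist())  # column names
print(data.isna().sum())      # missing values per column
```

With the real file, the same helper would be called with the path to the CSV instead of the in-memory sample.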

  1. drop the columns that are unnamed, have no values associated with them, or are not needed for prediction
  2. the diagnosis is recorded as M for malignant and B for benign
  3. convert M and B to integer values
  4. if the value is in the range 1-5, denote it as malignant, and if the value is above 6, denote it as benign
  5. if the diagnosis is malignant, denote it with 1, else 0
  6. plot the data using seaborn's lmplot and look at the distribution of malignant cancer (1) and benign cancer (0)
  7. take two variables, one for input and one for output: let X be the input and y be the output
  8. split the data into training data and test data
Use the following code:

from sklearn.model_selection import train_test_split

# Hold out 33% of the rows for testing; random_state makes the split repeatable.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
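
The preparation steps listed above, from dropping columns to splitting the data, can be sketched end to end on a small made-up table. The six rows and the column names (`id`, `diagnosis`, `radius_mean`, `texture_mean`, `Unnamed: 32`) are assumptions for illustration only; with the real CSV the columns to drop may differ.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy table; real column names depend on the downloaded CSV.
data = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6],
    "diagnosis": ["M", "B", "B", "M", "B", "M"],
    "radius_mean": [17.99, 13.54, 12.45, 20.57, 11.42, 19.69],
    "texture_mean": [10.38, 14.36, 15.70, 17.77, 20.38, 21.25],
    "Unnamed: 32": [None] * 6,
})

# Drop columns that hold no values at all, plus the identifier column,
# which carries no information about the diagnosis.
data = data.dropna(axis=1, how="all").drop(columns=["id"])

# Encode the diagnosis: malignant (M) becomes 1, benign (B) becomes 0.
data["diagnosis"] = data["diagnosis"].map({"M": 1, "B": 0})

# X holds the input features, y the output label.
X = data.drop(columns=["diagnosis"])
y = data["diagnosis"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42
)
print(X_train.shape, X_test.shape)
```

Using `dropna(axis=1, how="all")` removes only the columns that are completely empty, so a feature with a few missing values is not discarded by accident.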

  1. Check the prediction score on the test data using a fitted KNN classifier (here called knn)
knn.score(X_test, y_test)
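
The scoring call above needs a classifier that has already been fitted. A minimal sketch of that step follows, under two assumptions: scikit-learn's bundled breast cancer dataset stands in for the CSV file, and the number of neighbours is set to 5, a value the article does not specify. The accuracy it prints will not necessarily match the figure reported in the conclusion.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Assumption: the dataset bundled with scikit-learn stands in for data.csv.
X, target = load_breast_cancer(return_X_y=True)

# The bundled data codes malignant as 0 and benign as 1;
# flip it so that malignant is 1, as in the steps above.
y = 1 - target

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42
)

# n_neighbors=5 is an assumed value, not taken from the article.
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)

# score() returns the fraction of test samples classified correctly.
score = knn.score(X_test, y_test)
print(f"Test accuracy: {score:.3f}")
```

KNN relies on distances between samples, so scaling the features before fitting is a common next experiment when trying to improve the score.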

  1. Perform cross-validation
  2. Create a confusion matrix to check the correctness of the prediction.
from sklearn.metrics import confusion_
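
The last two steps can be sketched as follows, assuming the truncated import above refers to scikit-learn's `confusion_matrix`. As further assumptions, the bundled scikit-learn dataset again stands in for the CSV file, and both the 5 folds and the 5 neighbours are illustrative values.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Assumption: the dataset bundled with scikit-learn stands in for data.csv.
X, target = load_breast_cancer(return_X_y=True)
y = 1 - target  # flip the labels so that malignant is 1 and benign is 0

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42
)

knn = KNeighborsClassifier(n_neighbors=5)  # assumed number of neighbours

# 5-fold cross-validation on the training data: one accuracy value per fold.
cv_scores = cross_val_score(knn, X_train, y_train, cv=5)
print("Cross-validation accuracy per fold:", cv_scores)
print("Mean cross-validation accuracy:", cv_scores.mean())

# Fit on the full training set, then compare predictions with the true labels.
knn.fit(X_train, y_train)
y_pred = knn.predict(X_test)

# Rows are the true classes (0 = benign, 1 = malignant),
# columns are the predicted classes.
cm = confusion_matrix(y_test, y_pred)
print(cm)
```

The diagonal of the matrix counts the correct predictions. The off-diagonal entry in the malignant row counts malignant cases predicted as benign, which is usually the more costly mistake in a screening setting.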

Conclusion:

Using KNN, we reached a prediction accuracy of 95.1%, which is a decent result, though the accuracy could still be improved. Several algorithms can be used to predict breast cancer; for this project we used KNN.

