Liwaiwai Liwaiwai
  • Machine Learning
  • Programming

Machine Learning Made Easy With Python

  • February 3, 2021
  • admin

Solve real-world machine learning problems with Naïve Bayes classifiers.

Naïve Bayes is a classification technique that serves as the basis for implementing several classifier modeling algorithms. Naïve Bayes-based classifiers are considered some of the simplest, fastest, and easiest-to-use machine learning techniques, yet are still effective for real-world applications.

Naïve Bayes is based on Bayes’ theorem, formulated by 18th-century statistician Thomas Bayes. This theorem assesses the probability that an event will occur based on conditions related to the event. For example, an individual with Parkinson’s disease typically has voice variations; hence such symptoms are considered related to the prediction of a Parkinson’s diagnosis. The original Bayes’ theorem provides a method to determine the probability of a target event, and the Naïve variant extends and simplifies this method.
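As a minimal sketch of the theorem itself, P(event | evidence) = P(evidence | event) × P(event) / P(evidence), the snippet below works through one calculation. The function name and all three input probabilities are hypothetical, chosen only for illustration; they do not come from the Parkinson's dataset.

```python
def posterior(prior, p_evidence_given_event, p_evidence_given_not_event):
    """Bayes' theorem: probability of the event once the evidence is seen."""
    # Total probability of observing the evidence at all
    p_evidence = (p_evidence_given_event * prior
                  + p_evidence_given_not_event * (1.0 - prior))
    return p_evidence_given_event * prior / p_evidence


# Hypothetical numbers, for illustration only
p_condition = 0.01             # prior: 1% of people have the condition
p_symptom_if_condition = 0.90  # symptom shows up in 90% of people who have it
p_symptom_if_healthy = 0.10    # symptom shows up in 10% of people who do not

p = posterior(p_condition, p_symptom_if_condition, p_symptom_if_healthy)
print(f"P(condition | symptom) = {p:.3f}")  # prints 0.083
```

Even with a symptom that is common among affected people, the low prior keeps the posterior small, which is why the prior matters as much as the evidence.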

 

Solving a real-world problem

This article demonstrates a Naïve Bayes classifier’s capabilities to solve a real-world problem (as opposed to a complete business-grade application). I’ll assume you have basic familiarity with machine learning (ML), so some of the steps that are not primarily related to ML prediction, such as data shuffling and splitting, are not covered here.

The Naïve Bayes classifier is supervised, generative, non-linear, parametric, and probabilistic.

In this article, I’ll demonstrate using Naïve Bayes with the example of predicting a Parkinson’s diagnosis. The dataset for this example comes from the UCI Machine Learning Repository. The data includes several measures of speech signal variation that are used to assess the likelihood of the medical condition; this example uses the first eight of them:

  • MDVP:Fo(Hz): Average vocal fundamental frequency
  • MDVP:Fhi(Hz): Maximum vocal fundamental frequency
  • MDVP:Flo(Hz): Minimum vocal fundamental frequency
  • MDVP:Jitter(%), MDVP:Jitter(Abs), MDVP:RAP, MDVP:PPQ, and Jitter:DDP: Five measures of variation in fundamental frequency

The dataset used in this example, shuffled and split for use, is available in my GitHub repository.

 

ML with Python

I’ll use Python to implement the solution. The software I used for this application is:

  • Python 3.8.2
  • Pandas 1.1.1
  • scikit-learn 0.22.2.post1

There are several open source Naïve Bayes classifier implementations available in Python, including:

  • NLTK Naïve Bayes: Based on the standard Naïve Bayes algorithm for text classification
  • NLTK Positive Naïve Bayes: A variant of NLTK Naïve Bayes that performs binary classification with partially labeled training sets
  • Scikit-learn Gaussian Naïve Bayes: Provides partial fit to support a data stream or very large dataset
  • Scikit-learn Multinomial Naïve Bayes: Optimized for discrete features such as counts or frequencies
  • Scikit-learn Bernoulli Naïve Bayes: Designed for binary/Boolean features
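The "partial fit" mentioned for Gaussian Naïve Bayes can be sketched roughly as follows. This is an illustration on synthetic data, not the Parkinson's dataset; the chunk size of 100 and the two shifted Gaussian clusters are assumptions made only to have something to stream.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)

# Synthetic stand-in data: two classes drawn from shifted Gaussians
X = np.vstack([rng.normal(0.0, 1.0, size=(200, 3)),
               rng.normal(2.0, 1.0, size=(200, 3))])
y = np.array([0] * 200 + [1] * 200)

# Shuffle so every chunk contains a mix of both classes
order = rng.permutation(len(y))
X, y = X[order], y[order]

# Feed the data to the model one chunk at a time
stream_model = GaussianNB()
for start in range(0, len(y), 100):
    chunk = slice(start, start + 100)
    # The full list of classes is required on the first call
    stream_model.partial_fit(X[chunk], y[chunk], classes=np.array([0, 1]))

# Compare with a model trained on all the data at once
batch_model = GaussianNB().fit(X, y)
agreement = np.mean(stream_model.predict(X) == batch_model.predict(X))
print("Agreement between streamed and batch models:", agreement)
```

Because each call only updates running means and variances, the whole dataset never has to sit in memory at once.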

I will use sklearn Gaussian Naive Bayes for this example.

Here is my Python implementation of naive_bayes_parkinsons.py:

import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import GaussianNB

# Feature columns we use
x_rows = ['MDVP:Fo(Hz)', 'MDVP:Fhi(Hz)', 'MDVP:Flo(Hz)',
          'MDVP:Jitter(%)', 'MDVP:Jitter(Abs)', 'MDVP:RAP', 'MDVP:PPQ', 'Jitter:DDP']
# Target column: the diagnosis label
y_rows = ['status']

# Train

# Read train data
train_data = pd.read_csv('parkinsons/Data_Parkinsons_TRAIN.csv')
train_x = train_data[x_rows]
train_y = train_data[y_rows]
print("train_x:\n", train_x)
print("train_y:\n", train_y)

# Fit sklearn Gaussian Naive Bayes (ravel() passes the labels as a 1-D array)
gnb = GaussianNB()
gnb.fit(train_x, train_y.values.ravel())

# Prediction on train data
predict_train = gnb.predict(train_x)
print('Prediction on train data:', predict_train)

# Accuracy score on train data
accuracy_train = accuracy_score(train_y, predict_train)
print('Accuracy score on train data:', accuracy_train)

# Test

# Read test data
test_data = pd.read_csv('parkinsons/Data_Parkinsons_TEST.csv')
test_x = test_data[x_rows]
test_y = test_data[y_rows]

# Prediction on test data
predict_test = gnb.predict(test_x)
print('Prediction on test data:', predict_test)

# Accuracy score on test data
accuracy_test = accuracy_score(test_y, predict_test)
print('Accuracy score on test data:', accuracy_test)

Run the Python application:

$ python naive_bayes_parkinsons.py

train_x:
     MDVP:Fo(Hz)  MDVP:Fhi(Hz)  ...  MDVP:RAP  MDVP:PPQ  Jitter:DDP
0        152.125       161.469  ...   0.00191   0.00226     0.00574
1        120.080       139.710  ...   0.00180   0.00220     0.00540
2        122.400       148.650  ...   0.00465   0.00696     0.01394
3        237.323       243.709  ...   0.00173   0.00159     0.00519
..           ...           ...  ...       ...       ...         ...
155      138.190       203.522  ...   0.00406   0.00398     0.01218

[156 rows x 8 columns]

train_y:
     status
0         1
1         1
2         1
3         
..      ...
155       1

[156 rows x 1 columns]

Prediction on train data: [1 1 1  … 1]
Accuracy score on train data: 0.6666666666666666

Prediction on test data: [1 1 1 1 … 1
1 1]
Accuracy score on test data: 0.6666666666666666

The accuracy scores on the train and test sets are 67% in this example; its performance can be optimized. Do you want to give it a try? If so, share your approach in the comments below.
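Before tuning anything, it can help to check what an accuracy figure means relative to a trivial baseline. The sketch below is one possible way to do that, on synthetic, imbalanced stand-in data (the sample size, class weights, and split ratio are assumptions, not properties of the Parkinson's dataset). It compares GaussianNB with a classifier that always predicts the most frequent class and prints a confusion matrix.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic, imbalanced stand-in data (roughly 75% of the labels are 1)
X, y = make_classification(n_samples=400, n_features=8, n_informative=4,
                           weights=[0.25, 0.75], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# Baseline that always predicts the most frequent class
baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
model = GaussianNB().fit(X_train, y_train)

baseline_acc = accuracy_score(y_test, baseline.predict(X_test))
model_acc = accuracy_score(y_test, model.predict(X_test))
# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_test, model.predict(X_test))

print("Baseline accuracy:", baseline_acc)
print("GaussianNB accuracy:", model_acc)
print("Confusion matrix:\n", cm)
```

If the model's accuracy is close to the baseline's, the confusion matrix usually shows why: most predictions fall into a single column.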

 

Under the hood

The Naïve Bayes classifier is based on Bayes’ rule or theorem, which computes conditional probability, or the likelihood for an event to occur when another related event has occurred. Stated in simple terms, it answers the question: If we know the probability that event x occurred before event y, then what is the probability that y will occur when x occurs again? The rule uses a prior-prediction value that is refined gradually to arrive at a final posterior value. A fundamental assumption of Naïve Bayes is that all features are of equal importance.

At a high level, the steps involved in Bayes’ computation are:

  1. Compute the overall prior probability of each outcome (“Has Parkinson’s” and “Doesn’t have Parkinson’s”)
  2. Compute the likelihood of the observed feature values under each outcome, jointly across all features and every possible combination of their values
  3. Compute the final posterior probability by multiplying the results of #1 and #2 for the desired events

Step 2 can be computationally quite arduous. Naïve Bayes simplifies it:

  1. Compute the overall prior probability of each outcome (“Has Parkinson’s” and “Doesn’t have Parkinson’s”)
  2. Compute the likelihood of each observed feature value under each outcome, one feature at a time
  3. Compute the final posterior probability by multiplying the results of #1 and #2 for the desired events

This is a very basic explanation, and several other factors must be considered, such as data types, sparse data, missing data, and more.
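The simplified steps above can be sketched in a few lines of NumPy. This is an illustrative sketch under stated assumptions, not how scikit-learn implements GaussianNB: the function names, the tiny dataset, and the fixed 1e-9 variance floor are all made up for the example. It models each feature with a per-class Gaussian and works in log space, where multiplying probabilities becomes adding logs.

```python
import numpy as np


def fit_gaussian_nb(X, y):
    """Return per-class priors, feature means, and feature variances."""
    params = {}
    for label in np.unique(y):
        rows = X[y == label]
        params[label] = {
            "prior": len(rows) / len(X),     # step 1: class prior
            "mean": rows.mean(axis=0),
            "var": rows.var(axis=0) + 1e-9,  # small floor avoids division by zero
        }
    return params


def predict_gaussian_nb(params, X):
    """Pick, for each row of X, the class with the highest posterior score."""
    labels = list(params)
    scores = []
    for label in labels:
        p = params[label]
        # Step 2: log Gaussian likelihood of each feature, summed over features
        # (summing logs equals multiplying independent probabilities)
        log_likelihood = -0.5 * np.sum(
            np.log(2.0 * np.pi * p["var"]) + (X - p["mean"]) ** 2 / p["var"],
            axis=1)
        # Step 3: prior times likelihood, in log space
        scores.append(np.log(p["prior"]) + log_likelihood)
    return np.array(labels)[np.argmax(np.vstack(scores), axis=0)]


# Tiny made-up dataset: two features, two classes
X_demo = np.array([[1.0, 2.1], [1.2, 1.9], [0.8, 2.0],
                   [3.0, 4.2], [3.2, 3.9], [2.9, 4.1]])
y_demo = np.array([0, 0, 0, 1, 1, 1])

params = fit_gaussian_nb(X_demo, y_demo)
predictions = predict_gaussian_nb(params, np.array([[1.1, 2.0], [3.1, 4.0]]))
print(predictions)  # prints [0 1]
```

The score computed in step 3 is proportional to the posterior rather than equal to it, which is enough for choosing the most likely class.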

 

Hyperparameters

Naïve Bayes, being a simple and direct algorithm, does not need hyperparameters. However, specific implementations may provide advanced features. For example, GaussianNB has two:

  • priors: Prior probabilities can be specified instead of the algorithm taking the priors from data.
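  • var_smoothing: The portion of the largest feature variance that is added to every variance for calculation stability (the default is 1e-9). Larger values widen the fitted Gaussian curves, which can help when the data does not follow a typical Gaussian distribution.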
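A short sketch of how these two parameters might be used, on synthetic stand-in data rather than the Parkinson's dataset. The prior values, the grid of candidate var_smoothing values, and the 5-fold setting are assumptions picked for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import GaussianNB

# Synthetic stand-in data
X, y = make_classification(n_samples=300, n_features=8, n_informative=4,
                           random_state=0)

# Explicit priors replace the class frequencies found in the training data
model_with_priors = GaussianNB(priors=[0.3, 0.7]).fit(X, y)
print("Priors used:", model_with_priors.class_prior_)

# Try several var_smoothing values with 5-fold cross-validation
search = GridSearchCV(
    GaussianNB(),
    param_grid={"var_smoothing": np.logspace(-12, 0, 13)},
    cv=5,
    scoring="accuracy",
)
search.fit(X, y)
print("Best var_smoothing:", search.best_params_["var_smoothing"])
print("Best cross-validated accuracy:", search.best_score_)
```

Searching on a logarithmic grid is a common choice here because useful var_smoothing values can span many orders of magnitude.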

 

Loss functions

Maintaining its philosophy of simplicity, Naïve Bayes uses a 0-1 loss function. If the prediction correctly matches the expected outcome, the loss is 0, and it’s 1 otherwise.
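To make the 0-1 loss concrete, the sketch below averages it over a handful of predictions with scikit-learn's zero_one_loss and shows that the result is one minus the accuracy. The labels and predictions are hypothetical, written by hand for illustration.

```python
from sklearn.metrics import accuracy_score, zero_one_loss

# Hypothetical labels and predictions, for illustration only
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 1, 1, 0, 0, 1]

# Fraction of wrong predictions: each miss costs 1, each hit costs 0
loss = zero_one_loss(y_true, y_pred)
accuracy = accuracy_score(y_true, y_pred)

print("0-1 loss:", loss)      # 2 misses out of 6
print("Accuracy:", accuracy)  # 4 hits out of 6
```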

 

Pros and cons

Pro: Naïve Bayes is one of the easiest and fastest algorithms.
Pro: Naïve Bayes gives reasonable predictions even with less data.
Con: Naïve Bayes predictions are estimates, not precise. It favors speed over accuracy.
Con: A fundamental Naïve Bayes assumption is the independence of all features, but this may not always be true.

In essence, Naïve Bayes is an extension of Bayes’ theorem. It is one of the simplest and fastest machine learning algorithms, intended for easy and quick training and prediction. Naïve Bayes provides good-enough, reasonably accurate predictions. One of its fundamental assumptions is the independence of prediction features. Several open source implementations are available with features beyond what the basic algorithm provides.

This feature originally appeared on opensource.com.