
The Weird Ones: How To Handle Outliers In Your Data

  • September 3, 2019
  • admin

Outliers are the weird ones in a data set: observations whose values lie far from the rest of the sample. They can seriously distort your analysis, especially if you are using methods that are sensitive to extreme values, such as the mean.
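To make that sensitivity concrete, here is a minimal sketch comparing the mean and the median on a small sample. The numbers are hypothetical and chosen only for illustration; they do not come from any real data set.

```python
import statistics

# Hypothetical sample: nine typical values plus one extreme value.
typical = [48, 50, 51, 49, 52, 50, 47, 53, 50]
with_outlier = typical + [500]

# The mean is pulled strongly toward the extreme value...
print("mean without outlier:  ", statistics.mean(typical))
print("mean with outlier:     ", statistics.mean(with_outlier))

# ...while the median barely reacts to it.
print("median without outlier:", statistics.median(typical))
print("median with outlier:   ", statistics.median(with_outlier))
```

A single extreme value nearly doubles the mean here, while the median stays where it was. This is one reason the choice of method matters as much as the choice of whether to keep the point.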

Given this, many analysts are inclined to simply remove these observations. While this may be convenient, it may end up yielding false claims.

How exactly do we deal with these troublemakers?

Understanding outliers

For starters, we have to identify why these values occur in the first place. Some candidates for removal or correction are cases where the outliers are produced by human or measurement errors.

On the other hand, there are cases where these aberrations are just true observations from the data set.

Consider a data set containing the net worth of US citizens. The net worth of Bill Gates is no measurement error; it is simply an observation of a real yet rare event.

Given this, once you’ve identified potential outliers, check whether they are mere errors that you can omit or correct.
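As a rough sketch of that first step, the snippet below flags candidates for review using the 1.5 × IQR rule (Tukey's fences). This rule is one common convention and an assumption on our part, not the only way to spot outliers. The function name `flag_outliers`, the multiplier `k`, and the sample values are all hypothetical.

```python
import statistics

def flag_outliers(values, k=1.5):
    """Return the values lying outside the fences Q1 - k*IQR and Q3 + k*IQR.

    This only flags candidates for review; it does not decide
    whether they are errors or true observations.
    """
    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]

# Hypothetical measurements with one suspicious reading.
readings = [47, 48, 49, 50, 50, 50, 51, 52, 53, 500]
print(flag_outliers(readings))
```

Whatever the rule flags still needs a human look: a flagged value may be a typo, a faulty sensor, or a perfectly real rare event.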

To keep or not to keep

Now, if the observation turns out to be unusual yet true, you have to assess whether retaining or omitting the data point will be beneficial for your analysis.

There really is no quick fix for outliers. It heavily depends on the context of your analysis as well as the needs of the problem at hand. However, here are some measures you might want to consider:

  • Assess the importance of the outlier. Some outliers are produced by events that arise from peculiar conditions. For instance, a decline in a company’s stock value may be due to a controversy that we do not expect to recur regularly. In this case, omitting the outlier may be reasonable.

On the other hand, some extreme values are better left in the data set. For instance, an unusually high earthquake magnitude in time series data should be retained, since such an event could occur again. Retaining it also allows such a damaging event to be taken into account in the decision-making that the analysis supports.

  • Consider data transformations. In some instances, a proper transformation negates the impact of the outlying value or reduces it to a negligible level. Trying a few might just do the trick.
  • Consider reporting both cases. If you are not sure whether omission or retention is the way to go, you may report both the case where the outliers are retained and the case where they are omitted. This way, you retain the insights from both analyses. Doing this may also help you decide whether to omit or retain the outlying values.
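To illustrate the transformation idea from the list above, here is a small sketch using a base-10 logarithm, which is one common choice for strongly right-skewed, strictly positive data such as wealth. The choice of transformation and the figures below are our own assumptions for illustration, not real data.

```python
import math
import statistics

# Hypothetical net-worth figures in dollars, with one extreme value.
net_worth = [20_000, 35_000, 50_000, 80_000, 120_000, 100_000_000_000]

# The log transform requires strictly positive values.
log_worth = [math.log10(x) for x in net_worth]

# Ratio of mean to median: a rough gauge of how far the
# extreme value drags the mean away from the typical value.
raw_ratio = statistics.mean(net_worth) / statistics.median(net_worth)
log_ratio = statistics.mean(log_worth) / statistics.median(log_worth)

print(f"mean/median on raw scale: {raw_ratio:,.1f}")
print(f"mean/median on log scale: {log_ratio:.2f}")
```

On the raw scale the mean is several orders of magnitude above the median; on the log scale the two are close. Keep in mind that results are then stated in log units, so conclusions have to be interpreted on that scale.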

These are some soft guidelines you can consider. Again, these are NOT strict rules that you should follow all the time. Dealing with outliers is highly context-dependent. Data analysis is not a straight road; it is an art.
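If you go the route of reporting both cases, a side-by-side summary could look something like the sketch below. The helper `summarize`, the sample of daily returns, and the hand-picked suspected value are all hypothetical; in practice, the suspected points would come from your own inspection of the data.

```python
import statistics

def summarize(values):
    """Return a few summary statistics for a list of numbers."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),
    }

# Hypothetical daily returns (in percent) with one sharp drop.
daily_returns = [0.4, -0.2, 0.1, 0.3, -0.1, 0.2, -9.5, 0.0]
suspected = {-9.5}  # points singled out for review, chosen by hand here

retained = summarize(daily_returns)
omitted = summarize([x for x in daily_returns if x not in suspected])

# Report both versions side by side instead of silently picking one.
for label, stats in (("outliers retained", retained), ("outliers omitted", omitted)):
    print(f"{label}: n={stats['n']}, mean={stats['mean']:.2f}, "
          f"median={stats['median']:.2f}, stdev={stats['stdev']:.2f}")
```

Seeing how much the conclusions shift between the two rows is itself informative: if they barely move, the decision matters little; if they move a lot, the outlier deserves a closer look.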

Weird is NOT wrong

To wrap things up, we see that outliers provide helpful insights that typical values may not provide. Therefore, we should not see these extremely different values as a nuisance.

Instead, we should examine why these values occur. Doing this will also give us the best way to deal with outliers. Let the data speak to you.

Removing the weirdos is not always the way to go. Trying to understand them might help you out more than you think.
