The Weird Ones: How To Handle Outliers In Your Data

September 3, 2019

Outliers are the weird ones in a data set: their values lie far from the rest of the sample. They can seriously distort your analysis, especially if you are using methods that are sensitive to outliers, such as the sample mean.
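To make that sensitivity concrete, here is a minimal sketch using made-up numbers (the sample values are an assumption for illustration, not data from this article). It compares how the mean and the median react when a single extreme value joins an otherwise ordinary sample.

```python
from statistics import mean, median

# Hypothetical sample: five typical values, then the same sample plus one extreme value.
typical = [48, 50, 51, 52, 49]
with_outlier = typical + [500]

mean_typical = mean(typical)
mean_with = mean(with_outlier)
median_typical = median(typical)
median_with = median(with_outlier)

# The mean jumps from 50 to 125, while the median barely moves (50 to 50.5).
print("mean:  ", mean_typical, "->", mean_with)
print("median:", median_typical, "->", median_with)
```

One unusual value is enough to more than double the mean here, while the median hardly notices it. Which of the two behaviours you want depends on the question you are asking.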

Given this, many analysts are inclined to simply remove these observations. While this may be convenient, it can end up yielding false claims.

How exactly do we deal with these troublemakers?

Understanding outliers

For starters, we have to identify why these values occur in the first place. Some candidates for removal are cases where the outliers are produced by human or measurement errors.

On the other hand, there are cases where these aberrations are just true observations from the data set.

Consider a data set containing the net worth of US citizens. The net worth of Bill Gates is not a measurement error; it is simply an observation of a real yet rare event.

Given this, once you’ve identified potential outliers, check whether they are mere errors which you can omit or correct.
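This article does not prescribe a particular way of spotting potential outliers. As one possible starting point, the sketch below uses the common 1.5 × IQR rule of thumb (Tukey's fences), which is an assumption brought in for illustration. The function name `flag_outliers` and the sample readings are hypothetical. The values it flags are candidates for a closer look, not values to delete automatically.

```python
from statistics import quantiles

def flag_outliers(data, k=1.5):
    """Return values lying more than k * IQR outside the quartiles.

    This is the conventional 1.5 * IQR rule of thumb, used here only
    as an illustrative assumption; it flags values for review.
    """
    q1, _, q3 = quantiles(data, n=4)  # quartile cut points
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [x for x in data if x < lower or x > upper]

# Hypothetical readings with one value far from the rest.
readings = [12, 14, 15, 15, 16, 17, 18, 19, 95]
suspects = flag_outliers(readings)
print(suspects)  # [95]
```

Whether the flagged 95 is a typo, a faulty sensor, or a genuine rare event is something the rule cannot tell you. That still requires going back to how the data was collected.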

To keep or not to keep

Now, if the value turns out to be an unusual yet true observation, you have to assess whether retaining or omitting that data point will benefit your analysis.

There really is no quick fix for outliers. It heavily depends on the context of your analysis as well as the needs of the problem at hand. However, here are some measures you might want to consider:

Assess the importance of the outlier. Some outliers are produced by events that arise from peculiar conditions. For instance, a decline in a company’s stock value may be due to a controversy that we do not expect to happen regularly. In this case, omitting the outlier may be reasonable.

On the other hand, some extreme values are better left in the data set. For instance, an unusually high earthquake magnitude in time series data should be retained, since such an event could occur again. Keeping it also allows such a damaging event to be taken into account in the decisions that the analysis informs.

Consider data transformations. In some instances, a suitable transformation negates the impact of the outlying value or reduces it to a negligible level. Trying a few might just do the trick.

Consider reporting both cases. If you are not sure whether omission or retention is the way to go, you may report both the results with the outliers retained and the results with the outliers omitted. This way, you keep the insights from both versions. Doing this may also help you decide whether to omit or retain the outlying values.
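The last two suggestions can be sketched together. The example below uses a made-up sample and assumes the extreme value was already flagged in an earlier review; the helper `summarize` is hypothetical. It reports summary statistics with the outlier retained and omitted, and then applies a base-10 log transform, which is only one of several transformations you could try.

```python
import math
from statistics import mean, median

# Hypothetical right-skewed sample with one extreme but genuine value.
values = [10, 12, 15, 11, 14, 1000]
outliers = {1000}  # assumed to have been flagged during an earlier review

def summarize(data):
    """Return the size, mean, and median of a sample as a small report."""
    return {"n": len(data), "mean": mean(data), "median": median(data)}

# Report both cases side by side instead of silently picking one.
retained = summarize(values)
omitted = summarize([v for v in values if v not in outliers])

# Log transform: compresses large values, so the extreme point pulls less.
# Note that it requires strictly positive data.
logged = summarize([math.log10(v) for v in values])

for label, report in [("retained", retained), ("omitted", omitted), ("log10", logged)]:
    print(label, report)
```

With the outlier retained, the mean (177) sits far from the median (13). With it omitted, the two nearly agree (12.4 and 12). On the log scale the gap shrinks considerably even though the extreme value is still in the sample.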

These are some soft guidelines you can consider. Again, these are NOT strict rules that you should follow all the time. Dealing with outliers is highly context-dependent. Data analysis is not a straight road; it is an art.

Weird is NOT wrong

To wrap things up, outliers can provide helpful insights that typical values cannot. Therefore, we should not see these extremely different values as a mere nuisance.

Instead, we should examine why these values occur. Doing this will also give us the best way to deal with outliers. Let the data speak to you.

Removing the weirdos is not always the way to go. Trying to understand them might help you out more than you think.

For enquiries, product placements, sponsorships, and collaborations, connect with us at [email protected]. We'd love to hear from you!

admin
