Posts in tag

Statistics


You often hear about Type I and Type II errors in statistics classes. There is good reason for that: keeping these two errors in check lies at the heart of statistical hypothesis testing. Preliminaries: Type I and Type II errors are tied to the concept of hypothesis testing. In hypothesis testing, we have two hypotheses: …
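
To make the two error types concrete, here is a minimal simulation sketch (not taken from the post). It assumes a two-sided z-test of H0: mu = 0 with a known standard deviation; the function name `rejection_rate` and all parameter values are illustrative choices. When H0 is true, every rejection is a Type I error, so the rejection rate should land near alpha; when H0 is false, every failure to reject is a Type II error.

```python
import random
from statistics import NormalDist

def rejection_rate(true_mean, n=30, sigma=1.0, alpha=0.05, trials=10_000, seed=0):
    """Fraction of simulated two-sided z-tests of H0: mu = 0 that reject H0."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        z = (sum(sample) / n) / (sigma / n ** 0.5)  # standardized sample mean
        if abs(z) > z_crit:
            rejections += 1
    return rejections / trials

# H0 is true here, so any rejection is a Type I error.
type_i = rejection_rate(true_mean=0.0)
# H0 is false here (true mean 0.3), so failing to reject is a Type II error.
type_ii = 1 - rejection_rate(true_mean=0.3)
print(f"Type I error rate ~ {type_i:.3f}, Type II error rate ~ {type_ii:.3f}")
```

Lowering alpha shrinks the Type I rate but, for the same sample size, raises the Type II rate, which is why the two are usually discussed together.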

Outliers are the odd ones out in a data set: their values lie far from the rest of the sample. They can really ruin your analysis, especially if you are using methods that are sensitive to outliers. Given this, many are inclined to remove these observations. While this …
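
One common way to flag candidate outliers, shown here only as an illustration and not necessarily the approach the post takes, is Tukey's 1.5 x IQR rule. The sketch below assumes that rule; the helper `iqr_outliers` and the sample values are hypothetical. It only flags values; whether to remove them is a separate decision.

```python
from statistics import quantiles

def iqr_outliers(values, k=1.5):
    """Return values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    # method="inclusive" interpolates between data points (like NumPy's default).
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

sample = [12, 14, 13, 15, 14, 13, 12, 16, 15, 98]
print(iqr_outliers(sample))  # [98]
```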

Normality is an assumption you will typically encounter in the statistical methods you employ. Many tests, especially parametric ones, assume that your data are normally distributed. Ordinary least squares regression assumes it for its error terms, too (at least for exact small-sample inference). You can …
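
As one illustration of checking normality (a swapped-in example, not necessarily a method the post covers), the Jarque-Bera test compares sample skewness and kurtosis with the values 0 and 3 expected under normality. The sketch assumes the large-sample version, whose statistic is approximately chi-square with 2 degrees of freedom under H0; the function name and simulated data are hypothetical.

```python
import math
import random

def jarque_bera(x):
    """Jarque-Bera statistic and its asymptotic chi-square(2) p-value."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2  # equals 3 for a normal distribution
    jb = n / 6 * (skew ** 2 + (kurt - 3) ** 2 / 4)
    p_value = math.exp(-jb / 2)  # survival function of chi-square with 2 df
    return jb, p_value

rng = random.Random(1)
normal_data = [rng.gauss(0, 1) for _ in range(500)]
skewed_data = [rng.expovariate(1.0) for _ in range(500)]
print(jarque_bera(normal_data))  # p-value usually not small
print(jarque_bera(skewed_data))  # p-value essentially zero
```

Because it relies on an asymptotic approximation, this test is better suited to larger samples; for small samples, tests such as Shapiro-Wilk are more commonly used.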

Arguably one of the most important skills for getting started with statistical methods is knowing the scale, or level of measurement, of your data. The appropriate method of analysis depends on the scale your data were measured on. Here’s a quick rundown of the four levels …

The normal distribution is arguably the most widely used distribution in statistics. Many statistical methods rely on the assumption that your data are normally distributed. What is so special about it? The infamous bell: the normal distribution is characterized by its trademark bell-shaped curve. The bell's center and width are dictated by two parameters. …
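
For reference, the standard density of the normal distribution with mean mu and standard deviation sigma is given below. mu sets where the bell is centered and sigma sets how wide it is.

```latex
f(x \mid \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left( -\frac{(x - \mu)^2}{2\sigma^2} \right)
```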

PyCon 2019 | Bayesian Data Science by Simulation. Speakers: Eric Ma, Hugo Bowne-Anderson. This tutorial is an introduction to Bayesian data science through the lens of simulation, or hacker statistics. We will become familiar with many common probability distributions by i) matching them to real-world stories and ii) simulating them. We will work with …

PyCon 2019 | Statistical Profiling (and other fun with the sys module). Speaker: Emin Martinian. Profiling involves computing a set of data about how often and how long various parts of your program are executed. Profiling is useful for understanding what makes your program slow and how you can improve it. After a quick …

You might have encountered it in class or in your own research: hypothesis testing. Once you have established a problem and a question you want answered about it, hypothesis testing essentially involves the following steps (putting the technicalities aside): forming a guess, or conjecture, the alternative hypothesis, Ha, based on your prior …
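
The general flow of a test can be sketched with a deliberately simple example. Everything here is an assumption for illustration: a one-sided exact binomial test of whether a coin favours heads, a hypothetical helper `binomial_p_value`, and made-up counts of 62 heads in 100 flips. It is not the specific procedure the post walks through.

```python
from math import comb

def binomial_p_value(successes, trials, p0=0.5):
    """One-sided p-value P(X >= successes) when X ~ Binomial(trials, p0)."""
    return sum(
        comb(trials, k) * p0 ** k * (1 - p0) ** (trials - k)
        for k in range(successes, trials + 1)
    )

# Hypotheses: H0, the coin is fair (p = 0.5); Ha, it favours heads (p > 0.5).
# Choose the significance level before looking at the data.
alpha = 0.05
# Collect data (hypothetical counts) and compute the p-value under H0.
heads, flips = 62, 100
p_value = binomial_p_value(heads, flips)
# Decide: reject H0 if the data would be this extreme less than alpha of the time.
reject_h0 = p_value < alpha
print(f"p-value = {p_value:.4f}; reject H0: {reject_h0}")
```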

R is a popular open-source programming language and software environment geared towards statistical computing and graphics. Learning R is helpful because it offers an array of packages that can assist with data cleanup, analysis, and presentation, among other tasks. Another point in R’s favour is that it is free. Now, we will go through how …
