Probability Theory & Descriptive Statistics

Tawheed Yousuf
11 min read · Jun 29, 2021

Mathematics behind the unintuitive study of uncertainties and the meaningful summarization of a bulk of data

Descriptive Statistics

Descriptive statistics is the term given to the analysis of data that helps describe, show or summarize data in a meaningful way such that, for example, patterns might emerge from the data. Descriptive statistics does not, however, allow us to make conclusions beyond the data we have analyzed or reach conclusions regarding any hypotheses we might have made. They are simply a way to describe our data.

Descriptive statistics are very important because if we simply presented our raw data it would be hard to visualize what the data is showing, especially if there is a lot of it. Descriptive statistics therefore enable us to present the data in a more meaningful way, which allows simpler interpretation. For example, if we had the results of 100 students’ homework, we might be interested in the overall performance of those students. We would also be interested in the distribution or spread of the marks. Descriptive statistics allow us to do this. Typically, two general types of statistics are used to describe data:

Measures of central tendency

These are the ways of describing the central position of a frequency distribution for a group of data. In this case, the frequency distribution is simply the distribution and pattern of marks scored by the 100 students from the lowest to the highest. We can describe this central position using a number of statistics, including the mode, median, and mean.

  • Mean: the sum of the observations divided by the sample size.
  • Median: the middle value of the data. It splits the data in half and is also called the 50th percentile.
  • Mode: the value that occurs most frequently in a dataset.

Measures of spread

These are ways of summarizing a group of data by describing how spread out the scores are. For example, the mean score of our 100 students may be 65 out of 100. However, not all students will have scored 65 marks. Rather, their scores will be spread out. Some will be lower and others higher. Measures of spread help us to summarize how spread out these scores are. To describe this spread, a number of statistics are available to us, including the range, quartiles, variance and standard deviation.

  • Range: the difference between the largest and the smallest points in your data. The bigger the range, the more spread out the data is.
  • IQR: the interquartile range (IQR) is a measure of statistical dispersion, equal to the difference between the upper quartile (75th percentile, Q3) and the lower quartile (25th percentile, Q1). The example below should make this clear.

An example to figure out the various quartiles and the interquartile range for a given dataset:
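As a concrete sketch of this (with a made-up dataset, not one taken from the article), the common "median-of-halves" textbook method for finding quartiles can be worked through in Python; note that other quartile conventions exist and give slightly different values:

```python
from statistics import median

# Hypothetical dataset, invented for illustration.
data = sorted([5, 7, 1, 3, 4, 6, 5, 11])   # -> [1, 3, 4, 5, 5, 6, 7, 11]

# Median-of-halves method: split the sorted data in half and take
# the median of each half to get the lower and upper quartiles.
n = len(data)
lower_half = data[: n // 2]                 # [1, 3, 4, 5]
upper_half = data[(n + 1) // 2 :]           # [5, 6, 7, 11]

q1 = median(lower_half)                     # 3.5
q2 = median(data)                           # 5.0  (the 50th percentile)
q3 = median(upper_half)                     # 6.5
iqr = q3 - q1                               # 3.0
print(q1, q2, q3, iqr)
```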

  • Variance: the average squared deviation from the mean. The variance is computed by finding the difference between every data point and the mean, squaring those differences, summing them up, and then taking the average. The unit of measurement for variance is different from that of the original data.
  • Standard Deviation: simply the square root of the variance, which is why its unit of measurement is the same as that of the original data. A low standard deviation means your data points are close to the mean; a high standard deviation means they are spread out over a wide range.
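These two measures can be checked with Python's statistics module. The sketch below uses the population variance (dividing by the sample size) to match the "average squared deviation" definition above; the scores are invented for illustration:

```python
from statistics import mean, pvariance, pstdev

# Hypothetical scores, chosen so the numbers come out cleanly.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

mu = mean(scores)            # 5
var = pvariance(scores)      # average squared deviation from the mean -> 4
sd = pstdev(scores)          # square root of the variance -> 2.0
print(mu, var, sd)
```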

What is probability ?

It’s the weekend and you are planning a trip to the nearby mountains: what is the chance that it will rain today? Or maybe you are at a soccer game and looking to bet on one of the teams: what are the odds of your team actually winning? The element of uncertainty that exists within such questions is precisely what we call probability.

Probability of an Event

Probability is a measure that quantifies the likelihood that an event will occur. The probability of an event can be calculated directly by counting all of the occurrences of the event and dividing by the total number of possible outcomes.

The assigned probability is a fractional value and is always in the range between 0 and 1, where 0 indicates no probability and 1 represents full probability.
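As a small sketch of this counting definition, here is the probability of rolling an even number on a fair six-sided die:

```python
from fractions import Fraction

# Count favourable outcomes over total outcomes for a fair six-sided die.
outcomes = [1, 2, 3, 4, 5, 6]
favourable = [o for o in outcomes if o % 2 == 0]   # the event "roll is even"

p_even = Fraction(len(favourable), len(outcomes))  # 3 out of 6
print(p_even)                                      # 1/2
```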

Probability Distributions

Probability can be used for more than calculating the likelihood of one event; it can summarize the likelihood of all possible outcomes. A thing of interest in probability is called a random variable, and the relationship between each possible outcome for a random variable and their probabilities is called a probability distribution.

What is a Random variable?

In probability and statistics, a random variable is described informally as a variable whose values depend on outcomes of a random phenomenon. A random variable is understood as a function defined on a probability space that maps from the sample space to the real numbers. In simple terms a random variable is a rule for associating a number with each element in a sample space.

Let’s make the jargon clearer with the help of an example. Suppose you flip a coin twice in a row. What are all the possible outcomes of these two coin tosses? If we take all of these outcomes and put them inside a set ‘S’, we have the sample space given below:

S = {HH, HT, TH, TT}

Now suppose we are only interested in finding out the number of heads we may obtain in the two flips of the coin you just did. We will denote this number by the term ‘X’. Clearly, X here can take values from the set {0, 1, 2}. I think you already get the idea of where we are going with this, but let’s make sure you did.

The term ‘X’ here is the number that is being assigned to each random outcome of interest which of course will be a part of the sample space we previously encountered. This mysterious ‘X’ is nothing but the random variable we were talking about and this is exactly what we meant by the mapping of sample space to real numbers.

The word ‘may’ above deserves attention: it reminds us that we are dealing with probabilities here. Whenever we talk about a random variable, we will also talk about the probability of that random variable, which is nothing but the likelihood of a random variable ‘X’ taking some value ‘x’, denoted by P(X=x).

Take a look at the table below to get a grasp of what we are exactly talking about:

x (number of heads)   outcomes    P(X = x)
0                     TT          1/4
1                     HT, TH      1/2
2                     HH          1/4
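The same distribution can be computed by brute-force enumeration of the sample space in Python:

```python
from itertools import product
from fractions import Fraction

# Enumerate the sample space of two coin flips: HH, HT, TH, TT.
sample_space = list(product("HT", repeat=2))

# X = number of heads in the two flips.
counts = {x: 0 for x in (0, 1, 2)}
for outcome in sample_space:
    counts[outcome.count("H")] += 1

pmf = {x: Fraction(c, len(sample_space)) for x, c in counts.items()}
print(pmf)   # P(X=0) = 1/4, P(X=1) = 1/2, P(X=2) = 1/4
```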

Types of Random Variables

The random variable we described above took countably many values (finite or not) from the set of real numbers. But can we have a situation where we encounter a random variable that takes an uncountably infinite number of values? With that in mind, let’s formally describe the two kinds of random variables we can have.

  • Discrete Random variable: a random variable which can take a finite number of values, or a countably infinite number of values, is called a discrete random variable. For example, the number of heads in N coin tosses, or the number of 4’s we get when we roll a die N times.
  • Continuous Random variable: a random variable which can take on an uncountably infinite number of values is called a continuous random variable.

Since it is impossible to assign a positive probability to each of uncountably many values, in the case of a continuous random variable we talk about the probability of the variable taking a value in a range, say from a to b. This is formally denoted by P(a<X<b).

Discrete Probability Distribution

A discrete probability distribution summarizes the probabilities for a discrete random variable. The probability mass function, or PMF, defines the probability distribution for a discrete random variable. It is a function that assigns a probability for specific discrete values. A discrete probability distribution has a cumulative distribution function, or CDF. This is a function that assigns a probability that a discrete random variable will have a value of less than or equal to a specific discrete value.

Discrete probability distribution of previously calculated probabilities

Bernoulli Distribution

A Bernoulli trial is an experiment which has only two possible outcomes, viz. success and failure. The labels “success” and “failure” are just names given to the two outcomes and don’t necessarily carry their original meaning.

The Bernoulli distribution is a discrete probability distribution that covers a case where an event will have a binary outcome as either a 0 or 1.

Some common examples of Bernoulli trials include:

  • The single flip of a coin that may have a heads (0) or a tails (1) outcome
  • A single birth of either a boy (0) or a girl (1)
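A Bernoulli trial is straightforward to simulate. The sketch below uses Python's random module; the success probability 0.3 and the number of trials are arbitrary choices for illustration:

```python
import random

def bernoulli_trial(p: float) -> int:
    """Return 1 ("success") with probability p, else 0 ("failure")."""
    return 1 if random.random() < p else 0

random.seed(0)                      # fixed seed for a reproducible run
p = 0.3
trials = [bernoulli_trial(p) for _ in range(100_000)]
print(sum(trials) / len(trials))    # empirical frequency, close to 0.3
```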

Binomial Distribution

The probability distribution of n successive Bernoulli trials follows a Binomial distribution. The random variable X associated with a binomial distribution represents the number of successes that occur in these n trials.

The probability mass function for a binomial distribution is given as:

P(X = k) = C(n, k) * p^k * (1 - p)^(n - k),   k = 0, 1, ..., n

where C(n, k) = n! / (k!(n - k)!) counts the ways to choose which k of the n trials are successes, and p is the probability of success on each trial.
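In code, the binomial PMF C(n, k) p^k (1 - p)^(n - k) can be evaluated with Python's math.comb; for example, the probability of exactly 2 heads in 4 fair coin flips:

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k): probability of exactly k successes in n Bernoulli trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Exactly 2 heads in 4 fair flips: C(4, 2) / 2^4 = 6/16.
print(binomial_pmf(2, 4, 0.5))                          # 0.375
# Sanity check: the PMF sums to 1 over k = 0..n.
print(sum(binomial_pmf(k, 4, 0.5) for k in range(5)))   # 1.0
```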

Geometric Distribution

The Geometric distribution gives the probability that the first occurrence of success requires k independent trials, each with success probability p. If the probability of success on each trial is p, then the probability that the kth trial is the first success is:

P(X = k) = (1 - p)^(k - 1) * p,   k = 1, 2, 3, ...

The PMF of a geometric distribution follows the pattern given below; as can be observed, the graph somewhat approximates an exponential decay. We will soon get to see the relationship between a geometric distribution and an exponential distribution.
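A quick sketch of this decay, using the chance that the first 6 appears on the kth roll of a fair die (the example is invented, not from the article):

```python
def geometric_pmf(k: int, p: float) -> float:
    """P(X = k): the first success occurs on trial k (k = 1, 2, ...)."""
    return (1 - p) ** (k - 1) * p

p = 1 / 6                        # probability of rolling a 6
print(geometric_pmf(3, p))       # (5/6)^2 * (1/6), about 0.1157
for k in range(1, 5):            # the probabilities decay geometrically
    print(k, round(geometric_pmf(k, p), 4))
```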

Poisson Distribution

Consider a repeating experiment that happens completely randomly in time. The probability distribution of the number of such events that occur during a given time interval takes the form of the Poisson distribution.

The PMF of a Poisson random variable X, counting the number of events in a given time interval, is given as:

P(X = k) = λ^k * e^(-λ) / k!,   k = 0, 1, 2, ...

where λ is the average number of events in the interval.
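The formula λ^k e^(-λ) / k! can be evaluated directly with Python's standard library; the call-centre numbers below are an invented illustration:

```python
from math import exp, factorial

def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k): probability of k events in the interval, with average rate lam."""
    return lam**k * exp(-lam) / factorial(k)

# If a call centre receives on average 4 calls per minute (lam = 4),
# the probability of exactly 2 calls in a given minute:
print(poisson_pmf(2, 4.0))                          # about 0.1465
# Sanity check: the PMF sums to 1 over all k.
print(sum(poisson_pmf(k, 4.0) for k in range(60)))  # about 1.0
```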

Continuous Probability Distribution

A continuous probability distribution summarizes the probability for a continuous random variable. The probability density function, or PDF, defines the probability distribution for a continuous random variable. Note the difference in the name from the discrete random variable, which has a probability mass function, or PMF. Like a discrete probability distribution, the continuous probability distribution also has a cumulative distribution function, or CDF, that defines the probability of a value less than or equal to a specific numerical value from the domain.

Probability Density function or PDF

The probability associated with a random variable taking on an interval of values, say (a, b), represented as P(a<X<b), is the area under the probability density function (PDF) from a to b.

This relationship defines the PDF f(x):

P(a < X < b) = ∫ f(x) dx,  integrated from a to b
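This "area under the curve" idea can be sketched numerically with a simple midpoint Riemann sum; the standard uniform density on [0, 1] is used here as a toy example, since its areas are easy to check by hand:

```python
# Density of the standard uniform distribution on [0, 1].
def pdf_uniform(x: float) -> float:
    return 1.0 if 0.0 <= x <= 1.0 else 0.0

# Approximate P(a < X < b) as the area under the PDF f between a and b
# using a midpoint Riemann sum.
def prob_between(f, a: float, b: float, steps: int = 100_000) -> float:
    dx = (b - a) / steps
    return sum(f(a + (i + 0.5) * dx) for i in range(steps)) * dx

print(prob_between(pdf_uniform, 0.2, 0.5))   # about 0.3
```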

Normal Distribution

The normal distribution also called the Gaussian distribution (named for Carl Friedrich Gauss) or the bell curve distribution covers the probability of real-valued events from many different problem domains, making it a common and well-known distribution, hence the name normal. A continuous random variable that has a normal distribution is said to be normal or normally distributed.

The two important parameters of a normal distribution are

  1. Mean: The expected value
  2. Variance: The spread from the mean

In a normal distribution, approximately 34% of the data points lie between the mean and one standard deviation above (or below) it, which means that about 68% of the data points fall within one standard deviation of the mean. Approximately 95% fall between two standard deviations below the mean and two standard deviations above it, and approximately 99.7% fall within three standard deviations of the mean. This is illustrated below:
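These percentages can be verified from the normal CDF, which Python exposes through math.erf: the fraction of a normal distribution within k standard deviations of the mean is erf(k / sqrt(2)).

```python
from math import erf, sqrt

# P(|X - mu| < k * sigma) for a normal distribution equals erf(k / sqrt(2)).
for k in (1, 2, 3):
    print(k, round(erf(k / sqrt(2)), 4))
# 1 -> 0.6827, 2 -> 0.9545, 3 -> 0.9973
```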

Z scores

The standard score, or Z score, is the distance of an observed value from the mean, measured in number of standard deviations. Z-scores allow us to compare data points coming from distributions that have different means and standard deviations. Note that the Z-score uses the population mean and standard deviation; if you need a score based on the sample mean and standard deviation, what is used instead is the t-score, or t-statistic.

Given that the mean and standard deviation of the population are denoted by μ and σ respectively, the Z-score (standard score) for a particular value X can be calculated with the following formula:

Z = (X - μ) / σ

A positive Z score indicates that the observed value is Z standard deviations above the mean; a negative Z score indicates that the value is below the mean.
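A quick sketch of the formula in code (the test scores, mean, and standard deviation below are hypothetical):

```python
def z_score(x: float, mu: float, sigma: float) -> float:
    """Number of standard deviations the value x lies from the mean mu."""
    return (x - mu) / sigma

# A score of 85 on a test with population mean 70 and standard deviation 10
# is 1.5 standard deviations above the mean:
print(z_score(85, 70, 10))    # 1.5
print(z_score(55, 70, 10))    # -1.5 (below the mean)
```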

Exponential Distribution

The exponential distribution is a continuous probability distribution where a few outcomes are the most likely with a rapid decrease in probability to all other outcomes. It is the continuous random variable equivalent to the geometric probability distribution for discrete random variables.

The exponential distribution is used as a simple model for the lifetime of certain types of equipment. More importantly, it gives the waiting time from one event to the next in a Poisson process.

The PDF of an exponentially distributed random variable X with rate λ is given as:

f(x) = λ * e^(-λx),   for x ≥ 0

Exponential distribution of random variable X for various values of rate (i.e. lambda) is given below:
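A minimal sketch of the density and of the waiting-time interpretation; the rate of 2 events per hour is an invented example:

```python
from math import exp

def exponential_pdf(x: float, lam: float) -> float:
    """Density of the exponential distribution with rate lam, for x >= 0."""
    return lam * exp(-lam * x) if x >= 0 else 0.0

# The CDF is P(X <= x) = 1 - exp(-lam * x). If events arrive at rate
# lam = 2 per hour, the chance of waiting at most half an hour for the
# next event is:
lam = 2.0
print(1 - exp(-lam * 0.5))       # about 0.632
```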

