Common Probability Distributions
Probability distributions describe the probability of observing a particular event. There are several probability distributions that are important to the art and science of predicting. In order to understand the statistical methods of dealing with random processes and how some predictability can be garnered from such chance events, we will examine some common distributions.
The Gaussian Distribution
The Gaussian, or normal, distribution is the most important, as it is the one most often used to describe the distribution of results for any measurement subject to small, random errors. It is also commonly used to represent real-valued random variables whose distributions are not known. Normal distributions arise naturally whenever we examine a collection of observations (for example, people’s heights) that represent the sum of a very large number of small, contributing factors (in this case, the thousands of hereditary and environmental factors that all make small contributions to someone’s height), provided these factors are for the most part independent of one another and combine additively.
This result is formally called the Central Limit Theorem, which states that, roughly speaking, if a random variable is the sum of a large number of small and independent random variables, then, almost no matter how the small variables are distributed, the sum will be approximately normally distributed.
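The Central Limit Theorem can be checked informally with a small simulation. The sketch below is illustrative only: it assumes each contributing factor is uniform on [0, 1] (any distribution with finite variance would do), and the function names, sample sizes, and seed are hypothetical choices, not part of the original text. It sums 50 such factors many times and measures what fraction of the sums fall within one standard deviation of their mean; for a normal distribution that fraction is about 0.68.

```python
import random
import statistics


def simulate_sums(n_terms=50, n_samples=20000, seed=0):
    """Draw n_samples sums, each the sum of n_terms uniform(0, 1) factors."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [sum(rng.random() for _ in range(n_terms)) for _ in range(n_samples)]


def fraction_within_one_sd(samples):
    """Fraction of samples lying within one standard deviation of the mean."""
    mu = statistics.fmean(samples)
    sd = statistics.pstdev(samples)
    inside = sum(1 for x in samples if abs(x - mu) <= sd)
    return inside / len(samples)


sums = simulate_sums()
frac = fraction_within_one_sd(sums)
print(f"mean of sums: {statistics.fmean(sums):.2f}")
print(f"fraction within one standard deviation: {frac:.3f}")
```

Although each individual factor is flat rather than bell-shaped, the fraction printed should land close to the normal value of roughly 0.68, and the mean of the sums close to 25.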
Consider this question: Will Bitcoin either boom or bust (but not just boringly wander up or down) during 2018? This question resolves when the price of Bitcoin either i) exceeds $16400 or ii) falls below $4100. Suppose that Bitcoin's price represents the sum of a very large number of small, contributing factors that sum together to determine Bitcoin's price:
\[ \text{Price} = X_1 + X_2 + \cdots + X_n \]
These contributing factors may include mining costs, awareness and hype, regulations of Bitcoin transactions, technological changes to Bitcoin, hacks of Bitcoin exchanges, and so on. If these factors are independent, have finite variances, and each has a small variance compared to the variance of their sum (so that a condition for the Central Limit Theorem, known as the Lindeberg condition, holds), then Bitcoin's price over some time period will be approximately normal. Using data from the period August 2017 to July 2018, I charted the price data in ascending order and confirmed that this seemed roughly to be the case. Over this period, the price of Bitcoin had a mean of $8045 and a standard deviation of $3542. Using these parameters, the probability of the price being outside the $4100-$16400 range can be estimated with the following interactive tool; it turns out to be around 0.15.
You can also specify different values here, where σ is the standard deviation and μ the mean. However, in this example, many contributing factors to Bitcoin's price are clearly not independent. Demand and mining costs, for instance, are likely to be correlated.
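The tail probability quoted for the Bitcoin question can also be reproduced without the interactive tool. The following is a minimal sketch, assuming the price really is normally distributed with the mean and standard deviation given in the text; the helper name `prob_outside` is hypothetical, and `statistics.NormalDist` requires Python 3.8 or later.

```python
from statistics import NormalDist


def prob_outside(mean, sd, low, high):
    """Probability that a normal variable falls below `low` or above `high`."""
    dist = NormalDist(mu=mean, sigma=sd)
    below = dist.cdf(low)          # P(X < low)
    above = 1.0 - dist.cdf(high)   # P(X > high)
    return below + above


# Parameters taken from the text: mean $8045, standard deviation $3542,
# resolution thresholds $4100 and $16400.
p = prob_outside(mean=8045, sd=3542, low=4100, high=16400)
print(f"P(price outside range) = {p:.3f}")
```

Under these assumptions the result comes out at roughly 0.14, in line with the figure of around 0.15 above. Almost all of that probability comes from the lower tail, since $4100 is only about 1.1 standard deviations below the mean while $16400 is about 2.4 standard deviations above it.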
The Binomial Distribution
The binomial distribution is used to determine the probability of k successes in n independent trials. Each trial produces a Boolean-valued outcome, a random variable carrying a single bit of information: success/yes/true/one (with probability p) or failure/no/false/zero (with probability 1 − p). Note that it is a discrete distribution; it is defined only at integer values of the variable k.
Since the individual trials are independent, the probability of a particular sequence of k successes among n trials is the product of the individual probabilities. If k trials succeed, then n − k do not, and the probability of that sequence is \( p^k (1-p)^{n-k} \). To get the total probability of exactly k successes, we multiply this by the number of ways of choosing which k of the n trials succeed, \( \binom{n}{k} \). The probability distribution is therefore given by
\[ P(k) = \binom{n}{k} \, p^k (1-p)^{n-k}. \]
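The binomial probability described here, the number of ways to choose which k of the n trials succeed multiplied by p^k (1 − p)^(n − k), is short enough to compute directly. The sketch below is illustrative: the function name `binomial_pmf` and the example values n = 10, p = 0.5 are assumptions chosen for demonstration, and `math.comb` requires Python 3.8 or later.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials,
    each succeeding with probability p."""
    if not 0 <= k <= n:
        return 0.0  # the distribution is defined only for k = 0, 1, ..., n
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Hypothetical example: 10 fair coin flips.
n, p = 10, 0.5
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

for k, prob in enumerate(pmf):
    print(f"P({k} successes) = {prob:.4f}")
```

Because the distribution is discrete, the probabilities over k = 0, ..., n sum to 1, which is a quick sanity check on any implementation.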
Consider this question: