Fundamental Terms in Distributions

In the previous post, we talked about descriptive statistics, which summarizes the features of a dataset, and inferential statistics, which makes inferences about a population based on sample data. Both branches rely on distributions. A distribution is a function that describes the possible values of a variable and how often they occur. Distributions come in two types: discrete and continuous. A discrete distribution has distinct, countable values. Think of the number of times you get tails in a series of coin tosses, which is always a whole number. A continuous distribution can take any value in a range, such as the volume of water in a container. We’ll discuss these ideas in detail later.

Distributions in Descriptive and Inferential Statistics

In descriptive statistics, distributions help visualize data and measure central tendency (mean, median, mode) and dispersion (range, variance, standard deviation). A histogram, for instance, can show the distribution of test scores in a class. In inferential statistics, on the other hand, theoretical distributions are used to make estimates, conduct hypothesis tests, and calculate confidence intervals. In an experiment, a distribution shows the frequency or probability of each possible outcome. A business, for example, can use a sample of customer satisfaction scores to infer the overall satisfaction level of all its customers.
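To make the descriptive measures above concrete, here is a minimal Python sketch using the standard library's statistics module. The list of test scores is made up for illustration and is not data from this post.

```python
import statistics

# Hypothetical test scores for a small class (illustrative data only).
scores = [70, 85, 85, 90, 100]

# Measures of central tendency
mean_score = statistics.mean(scores)
median_score = statistics.median(scores)
mode_score = statistics.mode(scores)

# Measures of dispersion (treating the class as the whole population)
score_range = max(scores) - min(scores)
variance = statistics.pvariance(scores)
std_dev = statistics.pstdev(scores)

print("mean:", mean_score, "median:", median_score, "mode:", mode_score)
print("range:", score_range, "variance:", variance, "std dev:", round(std_dev, 2))
```

If the scores were a sample from a larger group, statistics.variance and statistics.stdev would be the more usual choice, since they divide by n - 1 instead of n.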

Understanding distributions in statistics involves knowing several fundamental terms by heart. It’s important to have a solid grasp of these concepts to make sense of the information that you are collecting and analyzing. Let’s start with variables.

Random Variable

A random variable is a variable whose values depend on the outcomes of a random phenomenon or experiment. In essence, the concept is a way to quantify random outcomes.

Examples:

  • When rolling a six-sided die, the outcome, which can be 1, 2, 3, 4, 5, or 6, is a random variable because it varies with each roll.
  • Let’s say 100 students are about to take an exam. The number of students who pass, which can be any whole number from 0 to 100, is a random variable.

Random variables can be either discrete or continuous.

Discrete Random Variable

A discrete random variable takes on a countable number of distinct values. These values are often whole numbers. When a store counts the number of customers that visit its premises in a day, the result is a whole number. When one considers the number of insurance claims that a driver can file in a month, the answer can be 0, 1, or more. Both quantities are discrete random variables.

Continuous Random Variable

In contrast, a continuous random variable can take on any value within a given range. Its possible values cannot be counted one by one, because there are infinitely many values between any two points in the range. Let’s say that visitors typically spend anywhere from 5.23 minutes to 10.5 minutes browsing a particular shop. The time that a particular customer spends there can be any value within this range. If you have a cup with a capacity of 120.5 mL and you want to measure the volume of liquid in it, the number that you are looking for can be anywhere from 0 to 120.5 mL. Both quantities are continuous random variables.
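The difference between the two types can be sketched in Python with the standard random module. The seed value and the choice to draw the browsing time uniformly are assumptions made only for this illustration; real browsing times need not be uniform.

```python
import random

# Fixed seed so the sketch gives the same draws on every run (arbitrary value).
rng = random.Random(42)

# Discrete random variable: the face shown by a six-sided die.
die_roll = rng.randint(1, 6)  # one of 1, 2, 3, 4, 5, 6

# Continuous random variable: browsing time in minutes, anywhere in the range
# from the text. Drawing it uniformly is an assumption for illustration.
browsing_minutes = rng.uniform(5.23, 10.5)

print("die roll:", die_roll)
print("browsing time (minutes):", browsing_minutes)
```

The die roll is always one of six whole numbers, while the browsing time is a floating-point value that can land anywhere between the two bounds.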

Now that we’ve covered the different types of random variables, let’s focus on probability distributions.

Probability Distribution

A probability distribution is a mathematical function that gives the probability of each possible outcome of an experiment. It tells us how the values of a random variable are distributed and helps us understand how likely different outcomes are. Probabilities are never negative, and the total probability across all possible outcomes must equal one.
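For a discrete case, the two rules above (no negative probabilities, total of one) can be checked with a few lines of Python. The helper name is_valid_distribution and the example dictionaries are hypothetical, introduced only for this sketch.

```python
import math

def is_valid_distribution(probabilities, tolerance=1e-9):
    """Return True if every probability is non-negative and they sum to one."""
    values = list(probabilities)
    non_negative = all(p >= 0 for p in values)
    # Compare with a tolerance because floating-point sums are rarely exact.
    sums_to_one = math.isclose(sum(values), 1.0, abs_tol=tolerance)
    return non_negative and sums_to_one

# Hypothetical examples
fair_coin = {"heads": 0.5, "tails": 0.5}
not_a_distribution = {"heads": 0.7, "tails": 0.5}  # sums to 1.2

print(is_valid_distribution(fair_coin.values()))
print(is_valid_distribution(not_a_distribution.values()))
```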

Discrete Probability Distribution

A discrete probability distribution applies to discrete random variables and has an associated probability mass function (PMF). The PMF gives the probability that a discrete random variable is exactly equal to a particular value. Returning to the six-sided die, the PMF assigns a probability of 1/6 to each of the outcomes 1, 2, 3, 4, 5, and 6, as each outcome is equally likely.
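The die example can be written out as a small lookup table in Python. This is only a sketch; it uses exact fractions from the standard library so that 1/6 is not rounded.

```python
from fractions import Fraction

# PMF of a fair six-sided die: each face maps to a probability of 1/6.
faces = range(1, 7)
pmf = {face: Fraction(1, 6) for face in faces}

# Probability of one exact value, P(X = 3)
p_three = pmf[3]

# Probability of an event made of several values, P(X is even)
p_even = sum(pmf[face] for face in faces if face % 2 == 0)

# The PMF values over all outcomes add up to one.
total = sum(pmf.values())

print(p_three, p_even, total)
```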

Continuous Probability Distribution

A continuous probability distribution applies to continuous random variables and has an associated probability density function (PDF). The probability that a continuous random variable lies between two given values is the area under the PDF between those values. For the heights of people in a city, the PDF could be used to find the probability that a person’s height falls within a specific range, such as between 150 cm and 182 cm.
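As a rough sketch of the height example, suppose heights follow a normal distribution with a mean of 166 cm and a standard deviation of 8 cm. Both numbers are assumptions chosen for illustration, not measurements. The probability of a range is then the difference between two values of the cumulative distribution function, which corresponds to the area under the PDF.

```python
from statistics import NormalDist

# Assumed parameters for illustration only: mean 166 cm, standard deviation 8 cm.
heights = NormalDist(mu=166, sigma=8)

# P(150 <= X <= 182) is the area under the PDF between 150 and 182,
# computed here as a difference of cumulative probabilities.
p_between = heights.cdf(182) - heights.cdf(150)

# The density at a single point is not a probability by itself.
density_at_mean = heights.pdf(166)

print(round(p_between, 4))
print(density_at_mean)
```

With these assumed parameters, the range covers two standard deviations on each side of the mean, so the probability comes out at roughly 0.95.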

Commonly Occurring Distributions

There are various kinds of probability distributions. Among the most common ones are:

Bernoulli Distribution

A Bernoulli distribution is defined by a single parameter, p, which is the probability of success. It has only two possible outcomes: success (1) or failure (0). A commonly used example of this distribution is a coin toss, where success is heads and failure is tails.

Formula:

P(X = x) = p^x (1 - p)^{1 - x}

Where:

  • X = The random variable that can be 1 or 0
  • p = The probability of success, where 0 \leq p \leq 1
  • 1 – p = The probability of failure
  • x = either 1 or 0
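The terms listed above can be combined in a short Python function. This is a sketch of the standard Bernoulli PMF; the function name bernoulli_pmf is a hypothetical helper.

```python
def bernoulli_pmf(x, p):
    """Standard Bernoulli PMF: P(X = x) = p^x * (1 - p)^(1 - x), x in {0, 1}."""
    if x not in (0, 1):
        raise ValueError("x must be 0 (failure) or 1 (success)")
    if not 0 <= p <= 1:
        raise ValueError("p must be between 0 and 1")
    return p ** x * (1 - p) ** (1 - x)

# Fair coin toss with heads counted as success
p_heads = bernoulli_pmf(1, 0.5)
p_tails = bernoulli_pmf(0, 0.5)

print(p_heads, p_tails)
```

When x is 1 the second factor becomes 1 and the result is p. When x is 0 the first factor becomes 1 and the result is 1 - p.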

Binomial Distribution

A binomial distribution describes the number of successes in a fixed number of independent Bernoulli trials. It is defined by two parameters: n, the number of trials, and p, the probability of success in each trial. One situation where this applies is counting the number of defective items in a batch of 100 items, where each item has a 2% chance of being defective.

Formula:

P(X = k) = \binom{n}{k} p^k (1 - p)^{n - k}

Where:

  • X = The random variable representing the number of successes or outcomes of interest
  • n = The number of trials
  • k = The number of successes (where 0 \leq k \leq n)
  • p = The probability of success in each trial
  •  \binom{n}{k} = The binomial coefficient, calculated as  \frac{n!}{k! (n-k)!}
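The defective-items example (100 items, 2% chance each) can be worked through in Python. This is a sketch of the standard binomial PMF using math.comb for the binomial coefficient. The function and variable names are hypothetical, and it assumes items are defective independently of one another.

```python
import math

def binomial_pmf(k, n, p):
    """Standard binomial PMF: P(X = k) = C(n, k) * p^k * (1 - p)^(n - k)."""
    if not 0 <= k <= n:
        raise ValueError("k must be between 0 and n")
    if not 0 <= p <= 1:
        raise ValueError("p must be between 0 and 1")
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

n_items = 100       # batch size from the example
p_defective = 0.02  # chance that any single item is defective

# Probability that the batch contains no defective items
p_no_defects = binomial_pmf(0, n_items, p_defective)

# Probability of at most two defective items: add the PMF for k = 0, 1, 2
p_at_most_two = sum(binomial_pmf(k, n_items, p_defective) for k in range(3))

# Over all possible k, the probabilities add up to one
total = sum(binomial_pmf(k, n_items, p_defective) for k in range(n_items + 1))

print(round(p_no_defects, 4), round(p_at_most_two, 4), round(total, 4))
```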

Uniform Distribution

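A uniform distribution has constant probability across all values in its range. In the discrete case, for example, when rolling a fair die, each outcome (1 through 6) has an equal probability of 1/6. In the continuous case, the density is constant over an interval from a to b.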

Formula:

f(x) = \frac{1}{b - a} for a \leq x \leq b, and f(x) = 0 otherwise

Where:

  •  f(x) = The probability density function (PDF)
  • a and  b = The lower and upper bounds of the distribution
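A continuous uniform density can be sketched in Python as follows, reusing the 0 to 120.5 mL cup from earlier as the bounds. The function names are hypothetical, and treating the volume as uniformly distributed is an assumption for illustration.

```python
def uniform_pdf(x, a, b):
    """Continuous uniform density: 1 / (b - a) inside [a, b], 0 outside."""
    if not a < b:
        raise ValueError("a must be less than b")
    return 1 / (b - a) if a <= x <= b else 0.0

def uniform_probability(low, high, a, b):
    """P(low <= X <= high): the width of the overlap with [a, b] times the density."""
    if not a < b:
        raise ValueError("a must be less than b")
    overlap = max(0.0, min(high, b) - max(low, a))
    return overlap / (b - a)

a, b = 0.0, 120.5  # cup bounds in mL, taken from the earlier example

density_inside = uniform_pdf(50.0, a, b)
density_outside = uniform_pdf(200.0, a, b)
p_lower_half = uniform_probability(0.0, 60.25, a, b)

print(density_inside, density_outside, p_lower_half)
```

Because the density is flat, the probability of any sub-range depends only on its width relative to the full range.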

Normal Distribution

A normal distribution is a continuous probability distribution that is symmetrical around its mean, forming a bell-shaped curve. It is defined by two parameters: mean (\mu) and standard deviation (\sigma). This can be applied to the distribution of heights, test scores, or measurement errors where most values cluster around the mean and probabilities taper off symmetrically.

Formula:

f(x) = \frac{1}{\sigma \sqrt{2\pi}} \exp\left( -\frac{(x - \mu)^2}{2\sigma^2} \right)

Where:

  •  f(x) = The probability density function (PDF)
  •  \mu = The mean
  •  \sigma = The standard deviation
  •  \sigma^2 = The variance
  • exp denotes the exponential function  e^{(\cdot)}
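The terms above translate directly into Python. This sketch writes out the standard normal PDF by hand and compares it with the standard library's statistics.NormalDist. The function name normal_pdf and the example parameters (mean 166, standard deviation 8) are assumptions for illustration.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """Standard normal density: exp(-(x - mu)^2 / (2 sigma^2)) / (sigma sqrt(2 pi))."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    coefficient = 1 / (sigma * math.sqrt(2 * math.pi))
    exponent = -((x - mu) ** 2) / (2 * sigma ** 2)
    return coefficient * math.exp(exponent)

mu, sigma = 166, 8  # assumed example parameters

# The hand-written formula and the library should agree.
by_hand = normal_pdf(170, mu, sigma)
from_library = NormalDist(mu, sigma).pdf(170)

# Symmetry: points equally far from the mean have the same density.
left = normal_pdf(mu - 5, mu, sigma)
right = normal_pdf(mu + 5, mu, sigma)

print(by_hand, from_library)
print(left, right)
```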

By mastering these fundamental terms and understanding their applications, you can better analyze and interpret data. This knowledge allows you to make data-driven decisions, optimizing various aspects of your business operations. Next time, we’ll be taking a closer look at commonly occurring distributions, specifically the Bernoulli distribution and binomial distribution.

About Glen Dimaandal

Glen Dimaandal is a data scientist from the Philippines. He has a post-graduate degree in Data Science and Business Analytics from the prestigious McCombs School of Business in the University of Texas, Austin. He has nearly 20 years of experience in the field as he worked with major brands from the US, UK, Australia and the Asia-Pacific. Glen is also the CEO of SearchWorks.PH, the Philippines' most respected SEO agency.