Statistics Part 4: Probability
This will be a mostly theoretical blog, focused on concepts rather than on PySpark or Python code, but the ideas are important for future blogs.
Probability is easily one of the most important branches of statistics and data science, so we can’t skip it. This blog covers only the basics of probability; if you are already comfortable with them, feel free to skip it.
What is probability?
Probability is a measure of how likely it is that something will occur. We can write a probability as a fraction, a decimal, or a percentage, but every probability is a number between 0 and 1, inclusive.
A probability of 0 means there is a 0% chance that something will occur, whereas a probability of 1 means there is a 100% chance that it will occur.
Simple probability formula
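P(E) = number of outcomes that meet our criteria / total number of outcomes
The set of all possible outcomes, counted in the denominator as the “total number of outcomes”, is called the sample space.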
Let’s say we have a standard deck of cards and are asked to pull out one card, so there are 52 possible outcomes. If we want to pull out a queen, there are only 4 queens in the deck, which means only 4 of the possible outcomes meet our criteria.
So P(Q) = 4/52 = 1/13.
Keep in mind that the simple probability formula only works when all the possible outcomes are equally likely to occur. In other words,
P(E) = outcomes that meet our criteria / all possible equally likely outcomes
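To make the formula concrete, here is a small Python sketch (an illustration, not part of the original post) that enumerates a standard 52-card deck and counts the outcomes that meet the criterion “the card is a queen”. The names `deck` and `simple_probability` are hypothetical helpers; `Fraction` keeps the answer exact instead of a rounded decimal.

```python
from fractions import Fraction

# Build a standard 52-card deck as (rank, suit) pairs; every card is equally likely.
RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["clubs", "diamonds", "hearts", "spades"]
deck = [(rank, suit) for suit in SUITS for rank in RANKS]


def simple_probability(outcomes, meets_criteria):
    """P(E) = outcomes that meet the criteria / all equally likely outcomes."""
    favorable = sum(1 for outcome in outcomes if meets_criteria(outcome))
    return Fraction(favorable, len(outcomes))


p_queen = simple_probability(deck, lambda card: card[0] == "Q")
print(len(deck), p_queen)  # 52 1/13
```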
Experimental and theoretical probability
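Theoretical probability is what we expect from the formula; experimental probability is the fraction of trials in which the event actually occurred when we ran the experiment. As the number of experiments grows, the experimental probability gets closer and closer to the theoretical probability, and if we could run an infinite number of experiments the two would converge. This is the law of large numbers.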
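A quick way to see the law of large numbers is to simulate it. This Python sketch (an illustration, not part of the original post) rolls a simulated fair die with the standard `random` module and compares the fraction of sixes with the theoretical 1/6. The function name `experimental_probability` and the seed value are arbitrary, hypothetical choices.

```python
import random


def experimental_probability(trials, seed=0):
    """Roll a fair 6-sided die `trials` times and return the fraction of rolls that came up 6."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    sixes = sum(1 for _ in range(trials) if rng.randint(1, 6) == 6)
    return sixes / trials


theoretical = 1 / 6
for n in (10, 1_000, 100_000):
    estimate = experimental_probability(n)
    print(f"{n:>7} rolls: experimental = {estimate:.4f}, theoretical = {theoretical:.4f}")
```

The exact numbers depend on the seed, but the gap between the experimental and theoretical values typically shrinks as the number of rolls grows.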
The Addition rule
To calculate the probability that either of two events occurs, add the probabilities of the individual events and then subtract the probability that both events occur (otherwise the overlap is counted twice).
P(A or B) = P(A) + P(B) - P(A and B)
If there is no overlap between A and B, meaning they cannot both occur, then A and B are called mutually exclusive events. In that case P(A and B) = 0, and the rule simplifies to:
P(A or B) = P(A) + P(B)
Union and Intersection
In the first version of the addition rule formula, we use the words “or” and “and.” But we can also write the formula as:
P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
P(A ∪ B) is called the union of A and B, and it means the probability that A or B (or both) occurs. P(A ∩ B) is called the intersection of A and B, and it means the probability that A and B both occur.
Experiment 1: A single 6-sided die is rolled. What is the probability of rolling a 2 or a 5?
Probabilities:
P(2) = 1/6
P(5) = 1/6
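Since a single roll cannot be both a 2 and a 5, the events are mutually exclusive and we can simply add:
P(2 or 5) = P(2) + P(5) = 1/6 + 1/6 = 2/6 = 1/3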
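Python sets are a natural way to model events as subsets of the sample space: the union ∪ is `|` and the intersection ∩ is `&`. Here is a small sketch for Experiment 1, assuming all six faces are equally likely; the helper `p` is a hypothetical name, not a library function.

```python
from fractions import Fraction

die = {1, 2, 3, 4, 5, 6}  # sample space of one fair 6-sided die


def p(event):
    """Probability of an event (a subset of the die's faces), assuming equally likely faces."""
    return Fraction(len(event & die), len(die))


roll_2 = {2}
roll_5 = {5}

# The events share no outcomes, so they are mutually exclusive...
print(roll_2 & roll_5)  # set()
# ...and the probability of the union is just the sum of the two probabilities.
print(p(roll_2 | roll_5), p(roll_2) + p(roll_5))  # 1/3 1/3
```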
Experiment 2: A single card is chosen at random from a standard deck of 52 playing cards. What is the probability of choosing a king or a club?
Probabilities:
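P(king or club) = P(king) + P(club) - P(king of clubs)
= 4/52 + 13/52 - 1/52 = 16/52 = 4/13
The king of clubs is both a king and a club, so we subtract it once to avoid counting it twice.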
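The king-or-club example can be checked by brute force: build the deck as a set of cards, count the union directly, and compare with the addition rule. This is an illustrative sketch; the names `kings`, `clubs` and `p` are hypothetical, and every card is assumed equally likely.

```python
from fractions import Fraction

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["clubs", "diamonds", "hearts", "spades"]
deck = {(rank, suit) for suit in SUITS for rank in RANKS}

kings = {card for card in deck if card[0] == "K"}
clubs = {card for card in deck if card[1] == "clubs"}


def p(event):
    """Probability of drawing a card from `event`, assuming all 52 cards are equally likely."""
    return Fraction(len(event), len(deck))


by_counting = p(kings | clubs)  # count the union directly
by_addition_rule = p(kings) + p(clubs) - p(kings & clubs)
print(kings & clubs)  # {('K', 'clubs')}
print(by_counting, by_addition_rule)  # 4/13 4/13
```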
Independent and dependent events and conditional probability
Independent Probability
In probability, we say two events are independent if knowing one event occurred doesn’t change the probability of the other event.
The Multiplication Rule
When we want to find the probability that several independent events all occur (also called a joint occurrence), we multiply their probabilities. This is called the multiplication rule:
P(A and B) = P(A) · P(B)
For example, the probability of getting heads twice when we flip a coin two times in a row is:
P(HH) = (1/2) · (1/2) = 1/4
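To check the multiplication rule on the coin example, this sketch (illustrative, assuming a fair coin) lists every equally likely outcome of two flips with `itertools.product` and confirms that P(HH) equals P(first flip is heads) · P(second flip is heads), which is exactly what independence means here. The helper names are hypothetical.

```python
from fractions import Fraction
from itertools import product

# All equally likely results of flipping a fair coin twice: HH, HT, TH, TT.
outcomes = list(product("HT", repeat=2))


def p(condition):
    """Fraction of the equally likely outcomes that satisfy `condition`."""
    return Fraction(sum(1 for o in outcomes if condition(o)), len(outcomes))


def first_heads(o):
    return o[0] == "H"


def second_heads(o):
    return o[1] == "H"


def both_heads(o):
    return first_heads(o) and second_heads(o)


print(p(both_heads))  # 1/4
print(p(first_heads) * p(second_heads))  # 1/4, so the two flips are independent
```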
Dependent probability
Two or more events that depend on one another are called dependent events: the outcome of one changes the probability of the other. Important result: when two events A and B are dependent, the probability that both A and B occur is:
P(A and B) = P(A) · P(B|A)
where P(B|A) is the conditional probability of B given that A has already occurred.
Example 1: Shareen has to select two students at random from a class of 23 girls and 25 boys. What is the probability that both students chosen are boys?
Solution: Total number of students = 23 + 25 = 48
Probability of choosing the first boy, say Boy 1 = 25/48
Probability of choosing a second boy, say Boy 2, given that the first was a boy = 24/47 (24 boys are left among the remaining 47 students)
Now,
P(Boy 1 and Boy 2) = P(Boy 1) × P(Boy 2|Boy 1)
= (25/48) × (24/47)
= 600/2256
= 25/94 ≈ 0.266
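We can confirm this answer by brute force: list every ordered pair of two different students (selection without replacement) and count the pairs in which both are boys. This is an illustrative sketch; the student labels are hypothetical placeholders, and each pair is assumed equally likely. It also recovers the conditional probability P(Boy 2|Boy 1) as P(both boys) / P(first is a boy).

```python
from fractions import Fraction
from itertools import permutations

# Hypothetical labels: 23 girls and 25 boys, 48 students in total.
students = [("girl", i) for i in range(23)] + [("boy", i) for i in range(25)]

# Ordered pairs of two *different* students: selection without replacement.
pairs = list(permutations(students, 2))
first_boy = [pair for pair in pairs if pair[0][0] == "boy"]
both_boys = [pair for pair in first_boy if pair[1][0] == "boy"]

print(len(pairs), len(both_boys))  # 2256 600
print(Fraction(len(both_boys), len(pairs)))  # 25/94
print(Fraction(len(both_boys), len(first_boy)))  # 24/47, i.e. P(Boy 2 | Boy 1)
print(Fraction(25, 48) * Fraction(24, 47))  # 25/94, matches the multiplication rule
```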
Bayes Theorem
Bayes’ Theorem, also known as Bayes’ Law or Bayes’ Rule, tells us the probability of an event, given prior knowledge of related events that occurred earlier.
Example
In a particular pain clinic, 10% of patients are prescribed narcotic pain killers. Overall, five percent of the clinic’s patients are addicted to narcotics (including pain killers and illegal substances). Out of all the people prescribed pain pills, 8% are addicts. If a patient is an addict, what is the probability that they will be prescribed pain pills?
Step 1: Figure out what your event “A” is from the question. The event that happens first (A) is being prescribed pain pills. That’s given as 10%.
Step 2: Figure out what your event “B” is from the question. Event B is being an addict. That’s given as 5%.
Step 3: Figure out the probability of event B (Step 2) given event A (Step 1). In other words, find P(B|A). We want to know: “Given that people are prescribed pain pills, what’s the probability they are an addict?” That is given in the question as 8%, or 0.08.
Step 4: Insert your answers from Steps 1, 2 and 3 into the formula and solve.
P(A|B) = P(B|A) * P(A) / P(B) = (0.08 * 0.1) / 0.05 = 0.16
The probability of an addict being prescribed pain pills is 0.16 (16%).
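The same calculation as a small Python sketch (an illustration, not from the original post). The `bayes` function is a hypothetical helper that just encodes P(A|B) = P(B|A) · P(A) / P(B); `Fraction` avoids floating-point rounding. The second half sanity-checks the result by counting in a hypothetical clinic of 1,000 patients.

```python
from fractions import Fraction


def bayes(p_b_given_a, p_a, p_b):
    """Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)."""
    return p_b_given_a * p_a / p_b


p_a = Fraction(10, 100)  # P(A): prescribed pain pills
p_b = Fraction(5, 100)  # P(B): addict
p_b_given_a = Fraction(8, 100)  # P(B|A): addict, given prescribed pain pills

result = bayes(p_b_given_a, p_a, p_b)
print(result, float(result))  # 4/25 0.16

# Sanity check by counting, in a hypothetical clinic of 1,000 patients.
patients = 1000
prescribed = patients * p_a  # 100 patients are prescribed pain pills
prescribed_addicts = prescribed * p_b_given_a  # 8 of them are addicts
addicts = patients * p_b  # 50 addicts in total
print(prescribed_addicts / addicts)  # 4/25
```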
Follow me on LinkedIn: https://www.linkedin.com/in/shorya-sharma-b94161121/