Objectives
- Statistical Methods
- Population and Samples
- Sampling Plan and Sampling Methods
- Descriptive Statistics and components
- Probability Theory and Distributions
- Confidence Interval
- Concept of Tests of Significance
- One Sided and Two Sided Hypothesis Testing
- Various Tests of Significance
- Non-Parametric Testing
2.1 Statistical Methods
Statistics is a branch of applied/business mathematics in which we collect, organize, analyze, and interpret numerical facts. It has two main branches:
- Descriptive Statistics (describes the sample)
  - Measure of Central Tendency
  - Measure of Dispersion
- Inferential Statistics (draws conclusions about the population)
  - Estimation
  - Hypothesis Testing
Population and Samples
- A population is the entire collection of objects or observations from which we may collect data. It is the entire group we are interested in, which we wish to describe or draw conclusions about.
- For each population there are many possible samples.
- It is important that the investigator carefully and completely defines the population before collecting the sample, including a description of the members to be included.
- A sample is a group of units selected from a larger group (the population). By studying the sample, we hope to draw valid conclusions about the larger group.
- A sample is generally selected for study because the population is too large to study in its entirety. The sample should be representative of the general population. This is often best achieved by random sampling.
Developing a sampling plan
- Define the target population - in terms of elements, sampling unit, extent, and time.
- Select a sampling method - probability or non-probability sampling.
- Obtain the sampling frame - a list that should contain all elements of the target population.
- Determine the sample size - for the desired level of accuracy.
- Choose the data collection method - the procedure used to obtain the data.
- Develop the operational plan - decide which technique fits best.
- Execute the operational plan - verify that the specified procedure is followed.
Sampling Techniques
- Sampling
  - Probability
    - Simple Random - Purest form, where every member has an equal probability of being selected.
    - Systematic - Elements are selected at a fixed interval from an ordered sampling frame.
    - Stratified - The members of the population are divided into homogeneous subgroups (strata) before sampling.
    - Cluster - Used when natural groupings (clusters) that are similar to one another are evident in a population; whole clusters are then selected at random.
  - Non-Probability
    - Convenience - A sample that is convenient to collect.
    - Judgemental - The sample is selected for a specific attribute, based on the judgement of the researcher.
    - Quota - Two-stage judgemental sampling: first develop control categories or quotas, then collect the sample by convenience or judgement to fill each quota.
    - Snowball - An initial group of respondents is selected, usually at random or from contacts of existing customers; further respondents are then obtained through referrals from the initial group.
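The three most mechanical probability methods above can be sketched with Python's standard `random` module. This is a minimal illustration, assuming a hypothetical sampling frame of 100 numbered customer IDs and a hypothetical North/South region used as the stratum; a real study would draw from its actual sampling frame.

```python
import random

def simple_random_sample(frame, n, rng):
    # Every member has the same chance of selection (sampling without replacement).
    return rng.sample(frame, n)

def systematic_sample(frame, n, rng):
    # Take every k-th element of the ordered frame, starting at a random offset.
    k = len(frame) // n
    start = rng.randrange(k)
    return frame[start::k][:n]

def stratified_sample(frame, stratum_of, n_per_stratum, rng):
    # Split the frame into homogeneous strata, then sample randomly within each stratum.
    strata = {}
    for unit in frame:
        strata.setdefault(stratum_of(unit), []).append(unit)
    return {label: rng.sample(units, n_per_stratum) for label, units in strata.items()}

rng = random.Random(42)                      # fixed seed so the sketch is reproducible
frame = list(range(1, 101))                  # hypothetical frame: customer IDs 1..100
region = lambda cid: "North" if cid <= 60 else "South"  # hypothetical strata

print(simple_random_sample(frame, 5, rng))
print(systematic_sample(frame, 5, rng))
print(stratified_sample(frame, region, 3, rng))
```

Cluster sampling would follow the same pattern as stratified sampling, except that whole groups are picked with `rng.sample` and every member of each chosen group is kept.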
Descriptive Statistics
Analyze data to extract meaningful information
Non-conclusive, as it is limited to the data being analyzed
| Score Range | Number of Students |
| Below 40 | 20 |
| 40-50 | 22 |
| 50-60 | 33 |
| 60-70 | 21 |
| 70-80 | 13 |
| >80 | 5 |
| Total | 114 |
Histograms are used to graphically represent the data.
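As a rough sketch, the grouped score table above can be drawn as a text histogram with one `#` per student. The counts are the ones from the table; the bar character and layout are arbitrary choices, not a standard format.

```python
# Score ranges and student counts from the table above.
score_table = [
    ("Below 40", 20),
    ("40-50", 22),
    ("50-60", 33),
    ("60-70", 21),
    ("70-80", 13),
    (">80", 5),
]

def text_histogram(rows, symbol="#"):
    # One line per class interval: right-aligned label, then a bar whose length equals the count.
    width = max(len(label) for label, _ in rows)
    return [f"{label:>{width}} | {symbol * count} ({count})" for label, count in rows]

for line in text_histogram(score_table):
    print(line)

print("Total:", sum(count for _, count in score_table))
```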
Measures of Central Tendencies
A measure of central tendency is a descriptive statistic that:
- Identifies the center of a data set with a single value
- Is also called a measure of central location
- Falls into the category of summary statistics
- Mean
- Median
- Mode
Mean
Mean is the average of the numbers
A calculated "central" value of a set of numbers
Mean = (Sum of all numbers)/(Count of all numbers)
Example: 2, 2, 6, 10
Mean = (2+2+6+10)/4 = 20/4 = 5
Median
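Median is the middle value when the numbers are arranged in order
The number of values above and below the median is the same
Example (already sorted): 3 3 4 5 7 8
With an even count of values, the median is the average of the two middle values: 3 3 (4 5) 7 8
Median = (4+5)/2 = 4.5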
Mode
Calculate the frequency of occurrence of each value
Mode is the value that occurs most often
Consider the following frequency table (the data behind a histogram with x = data value, y = frequency):
| Data Value | Frequency |
| 0 | 20 |
| 1 | 23 |
| 2 | 33 |
| 3 | 22 |
| 4 | 17 |
| 5 | 6 |
| 6 | 2 |
The value 2 has the highest frequency (33), so the mode of this data set is 2.
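The three examples above can be checked with Python's standard `statistics` module. This sketch reuses the document's own numbers; for the mode it shows both reading the frequency table directly and expanding it back into raw observations.

```python
import statistics

mean_data = [2, 2, 6, 10]                 # mean example
median_data = [3, 3, 4, 5, 7, 8]          # median example (even count)
freq = {0: 20, 1: 23, 2: 33, 3: 22, 4: 17, 5: 6, 6: 2}  # mode example: value -> frequency

print(statistics.mean(mean_data))         # sum / count
print(statistics.median(median_data))     # even count: average of the two middle values

# From a frequency table, the mode is the value with the largest frequency.
mode_value = max(freq, key=freq.get)
print(mode_value)

# From raw observations, statistics.mode counts occurrences itself.
raw = [value for value, count in freq.items() for _ in range(count)]
print(statistics.mode(raw))
```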
When to use what?
Mean
- The average is required for statistical analysis
- The variable is continuous/ discrete
- Mean is commonly used in case of quantitative variables
Median
- The variable is discrete
- There are abnormal extreme values/non-normal data
- The characteristic under study is qualitative but ordered (ordinal)
Mode
- Least commonly used
- The variable is discrete
- There are abnormal extreme values
- The characteristic under study is qualitative
Measure of Dispersion
Describes the amount of heterogeneity or variation within a distribution of scores, i.e., the spread or dispersion of a set of scores around some central value
Measures of dispersion include:
- Variance
- Standard Deviation
Variance and Standard Deviation
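Variance is the average of the squared deviations about the mean. For a whole population of N values with mean μ:
$\sigma^2 = \frac{\sum (x - \mu)^2}{N}$
For a sample of n values with mean $\bar{x}$, the sample variance divides by n - 1 instead of n:
$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$
Standard deviation is the square root of the variance:
$\sigma = \sqrt{\sigma^2}, \quad s = \sqrt{s^2}$
Example data (treated as a sample): 2, 5, 5, 4, 6, 8
- n = 6
- Mean = ( 2 + 5 + 5 + 4 + 6 + 8 ) / 6 = 30 / 6 = 5
- Variance (s²) = ( (2-5)² + (5-5)² + (5-5)² + (4-5)² + (6-5)² + (8-5)² ) / (6 - 1)
= ( (-3)² + (0)² + (0)² + (-1)² + (1)² + (3)² ) / 5
= ( 9 + 0 + 0 + 1 + 1 + 9 ) / 5 = 20/5 = 4
- Std. Deviation (s) = √Variance = √4 = 2
- Dividing by n = 6 instead would give the population variance, 20/6 ≈ 3.33.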
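The difference between the sample (n - 1) and population (n) formulas is easy to see with the standard `statistics` module, which provides both. This sketch uses the example data above and also recomputes the sample variance straight from the definition.

```python
import math
import statistics

data = [2, 5, 5, 4, 6, 8]   # example data from the text

# Sample variance and standard deviation: divide by n - 1 (matches the worked example).
s2 = statistics.variance(data)
s = statistics.stdev(data)

# Population variance and standard deviation: divide by n (data = whole population).
sigma2 = statistics.pvariance(data)
sigma = statistics.pstdev(data)

print(s2, s)
print(sigma2, sigma)

# The same sample variance from the definition: sum of squared deviations / (n - 1).
mean = sum(data) / len(data)
manual_s2 = sum((x - mean) ** 2 for x in data) / (len(data) - 1)
print(math.isclose(manual_s2, s2))
```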
Case Study - Descriptive Statistics
Business Case: A telecommunications company maintains a customer database that includes, among other things, information on how much each customer spent on long distance, toll-free, equipment rental, calling card, and wireless services in the previous month.
The telecom company surveyed 1000 of its customers on all the above services.
Use descriptive analysis to study customer spending and determine which services are most profitable.
| Service | N | Valid N | Min | Max | Mean | Std. Dev. |
| Long Distance last month | 1000.00 | 1000.00 | 0.90 | 99.95 | 11.72 | 10.36 |
| Toll free last month | 1000.00 | 475.00 | 0.00 | 173.00 | 13.27 | 16.90 |
| Equipment last month | 1000.00 | 386.00 | 0.00 | 77.70 | 14.21 | 19.07 |
| Calling card last month | 1000.00 | 678.00 | 0.00 | 109.25 | 13.78 | 14.08 |
| Wireless last month | 1000.00 | 296.00 | 0.00 | 111.95 | 11.58 | 19.72 |
From the above table we can make following insights:
- On average, customers spend the most on equipment rental, but there is a lot of variation in the amount spent.
- Customers with calling card service spend only slightly less, on average, than equipment rental customers, and there is much less variation in the values.
- The real problem here is that most customers don't have every service, so a lot of 0's are being counted. One solution is to treat 0's as missing values so that the analysis for each service becomes conditional on having that service.
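A minimal sketch of the "treat 0 as missing" idea, using a small hypothetical list of equipment-rental charges (not the survey data): the unconditional mean averages over every customer, while the conditional mean averages only over customers who actually have the service.

```python
import statistics

# Hypothetical monthly equipment-rental charges; 0 means the customer has no rental.
equipment = [0.0, 32.5, 0.0, 0.0, 41.0, 28.5, 0.0, 36.0]

# Mean over all customers: the zeros pull the average down.
overall_mean = statistics.mean(equipment)

# Treat 0 as missing: keep only customers who have the service.
users = [x for x in equipment if x != 0]
conditional_mean = statistics.mean(users)

print(f"valid N = {len(users)} of {len(equipment)}")
print(f"mean over all customers: {overall_mean:.2f}")
print(f"mean over service users: {conditional_mean:.2f}")
```

In a real dataset the zeros would more typically be replaced by a missing-value marker so that every downstream statistic (min, standard deviation, valid N) becomes conditional at once.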
2.2 Probability Theory
Probability is a branch of mathematics that deals with the uncertainty of an event happening in the future. Probability values always lie between 0 and 1 (inclusive).
Probability of an event, P(E) = (No. of favorable outcomes)/(No. of possible outcomes), when all outcomes are equally likely
Let us take a simple example of tossing an unbiased coin.
The probability of getting a head is 1/2 (and likewise for a tail).
Assigning Probabilities
Classical method - based on equally likely outcomes, e.g. rolling a die.
The probability of each number 1, 2, 3, 4, 5, 6 occurring out of total 6 outcomes is 1/6.
Relative frequency method - based on experimentation or historical data.
E.g. A car rental agency has 5 cars. The table below shows how many of its cars were used on each of the past 60 days.
| No. of Cars Used | No. of days | Probability |
| 0 | 3 | (3/60) = 0.05 |
| 1 | 10 | (10/60) = 0.17 |
| 2 | 16 | (16/60) = 0.27 |
| 3 | 15 | (15/60) = 0.25 |
| 4 | 9 | (9/60) = 0.15 |
| 5 | 7 | (7/60) = 0.12 |
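To see how the classical and relative-frequency methods relate, the following sketch simulates rolls of a fair die and compares each face's relative frequency with the classical value 1/6. It assumes Python's pseudo-random generator is an adequate stand-in for a physical die.

```python
import random
from collections import Counter

def relative_frequencies(n_rolls, seed=0):
    # Roll a simulated fair die n_rolls times; return face -> relative frequency.
    rng = random.Random(seed)
    counts = Counter(rng.randint(1, 6) for _ in range(n_rolls))
    return {face: counts[face] / n_rolls for face in range(1, 7)}

classical = 1 / 6
for n in (60, 600, 60000):
    freqs = relative_frequencies(n)
    worst = max(abs(p - classical) for p in freqs.values())
    print(f"n={n:>5}: largest gap from 1/6 = {worst:.4f}")
```

As the number of rolls grows the gap typically shrinks, which is why relative frequencies from a long historical record, like the 60-day car table, are a reasonable way to assign probabilities.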
Subjective method - based on judgement.
E.g.: a 75% chance that England will adopt the euro currency by 2020.
Probability Distribution
Probability distribution for a random variable gives information about how the probabilities are distributed over the values of that random variable.
It is defined by a probability function f(x), which gives the probability of each value.
E.g. Suppose we have AC sales data for the last 300 days.
| Units Sold (x) | No. of Days | f(x) |
| 0 | 10 | 10/300 = 0.03 |
| 1 | 55 | 55/300 = 0.18 |
| 2 | 150 | 150/300 = 0.50 |
| 3 | 55 | 55/300 = 0.18 |
| 4 | 25 | 25/300 = 0.08 |
| 5 | 5 | 5/300 = 0.02 |
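The relative-frequency probability function can be computed exactly with `fractions.Fraction`. Exact fractions make it visible that f(x) sums to exactly 1, whereas the rounded column in the table adds to 0.99. The same helper (a name chosen here for illustration) works unchanged for the car-agency table.

```python
from fractions import Fraction

def pmf_from_counts(counts):
    # Relative-frequency probability function: f(x) = (days with x) / (total days).
    total = sum(counts.values())
    return {x: Fraction(c, total) for x, c in counts.items()}

# AC units sold per day -> number of days, from the table above (300 days in total).
ac_days = {0: 10, 1: 55, 2: 150, 3: 55, 4: 25, 5: 5}
f = pmf_from_counts(ac_days)

for x, p in f.items():
    print(x, p, round(float(p), 2))

# A valid probability function is non-negative and sums to exactly 1.
print(all(p >= 0 for p in f.values()), sum(f.values()) == 1)
```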
Binomial Distribution
A binomial experiment satisfies:
- A fixed number of trials
- Each trial has only two possible outcomes (success or failure)
- Each trial is independent of the others
- The probability of each outcome remains constant from trial to trial
Examples of binomial experiments:
- Tossing a coin 20 times: what is the probability of getting exactly 5 heads?
- Drawing a card, with replacement, from a pack of 52 cards a fixed number of times and counting how often the king of diamonds appears.
Case Study - Binomial distribution
Example of binomial distribution: Amir buys a chocolate bar every day during a promotion that says one out of six chocolate bars has a gift coupon inside.
Answer the following questions:
- What is the distribution of the number of chocolates with gift coupons in seven days?
- What is the probability that Amir gets no chocolates with gift coupons in seven days?
- Amir gets no gift coupons for the first six days of the week. What is the chance that he will get one on the seventh day?
- Amir buys a bar every day for six weeks. What is the probability that he gets at least three gift coupons?
- How many days of purchase are required so that Amir's chance of getting at least one gift coupon is 0.95 or greater?
(Assume that the conditions of binomial distributions apply: the outcomes for Amir's purchases are independent, and the population of chocolate bars is effectively infinite.)
Steps:
Formula: $P(X = r) = {}^{n}C_{r}\, p^{r} q^{n-r}$
Where,
n is the no. of trials
r is the number of successful outcomes
p is the probability of success
q is the probability of failure
Other important formulas include
p+q=1
q=1-p
Thus,
p=1/6
q=1-(1/6)=5/6
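1. Distribution of the number of chocolates with gift coupons in 7 days (binomial with n = 7, p = 1/6):
$P(X = r) = {}^{7}C_{r} (1/6)^{r} (5/6)^{7-r}, \quad r = 0, 1, \ldots, 7$
2. Probability of no coupon in 7 days:
$P(X = 0) = {}^{7}C_{0} (1/6)^{0} (5/6)^{7-0} = (5/6)^{7} \approx 0.279$
3. Probability of winning a coupon on the 7th day: 1/6, because purchases are independent, so the first six days do not change the chance on the seventh.
4. Probability of winning at least 3 coupons in six weeks (n = 42 days):
P(X>=3) = 1 - P(X<=2)
= 1 - (P(X=0)+P(X=1)+P(X=2))
= 1 - (0.0005 + 0.0040 + 0.0163)
= 0.979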
5. Number of purchase days required so that the probability of at least one coupon is 0.95 or greater:
P(X>=1) >= 0.95 (as per the binomial distribution)
>> P(X=1) + P(X=2) + ... + P(X=n) >= 0.95, but since the sum of P(X=r) over all r is 1:
>> 1 - P(X=0) >= 0.95
>> P(X=0) <= 0.05
>> $(5/6)^{n} \le 0.05$; take logs of both sides to solve this exponential inequality
>> $n \log(5/6) \le \log(0.05)$
>> Since log(5/6) is negative, dividing by it reverses the inequality: $n \ge \log(0.05)/\log(5/6) \approx 16.43$
>> That is, n = 17 days minimum.
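All five answers can be reproduced with the binomial formula and `math.comb`. This is a small sketch of the calculation, with `binom_pmf` a helper name chosen here for illustration.

```python
from math import ceil, comb, log

def binom_pmf(r, n, p):
    # P(X = r) = nCr * p^r * q^(n - r), with q = 1 - p
    return comb(n, r) * p**r * (1 - p) ** (n - r)

p = 1 / 6  # chance that a bar contains a coupon

# 1. Distribution of coupons in 7 days: P(X = r) for r = 0..7
dist_7 = [binom_pmf(r, 7, p) for r in range(8)]

# 2. No coupon in 7 days
p_none_7 = binom_pmf(0, 7, p)

# 3. Coupon on day 7 given none on days 1-6: independence means it is simply p
p_day7 = p

# 4. At least 3 coupons in six weeks (42 daily purchases)
p_at_least_3 = 1 - sum(binom_pmf(r, 42, p) for r in range(3))

# 5. Smallest n with 1 - (5/6)^n >= 0.95
n_min = ceil(log(0.05) / log(5 / 6))

print([round(x, 4) for x in dist_7])
print(round(p_none_7, 3), round(p_day7, 3), round(p_at_least_3, 3), n_min)
```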
Normal Distribution
- A normal distribution is a theoretical model of the whole population.
- It is perfectly symmetrical about its central value, the mean μ.
- It is also called the bell curve.
- The standard normal distribution is the normal distribution with mean 0 and std. dev. of 1; any normal variable can be converted to it with z = (x - μ)/σ.
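Python's `statistics.NormalDist` is enough to illustrate symmetry and standardization. The height distribution below is hypothetical: the mean of 69 inches echoes the later hypothesis-testing example, and the standard deviation of 3 inches is an assumption made only for illustration.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)   # the standard normal distribution Z

# Symmetry about the mean: P(Z <= -z) equals P(Z >= z).
z = 1.5
print(std_normal.cdf(-z), 1 - std_normal.cdf(z))

# Hypothetical heights: mean 69 inches, standard deviation 3 inches.
heights = NormalDist(mu=69, sigma=3)
x = 72
z_score = (x - heights.mean) / heights.stdev   # z = (x - mu) / sigma
print(z_score)                                  # one standard deviation above the mean
print(heights.cdf(x), std_normal.cdf(z_score))  # the same probability either way
```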
Poisson distribution
Discrete probability distribution for events that happen randomly in time (or space). The following conditions need to be satisfied:
- Each event results in a success or failure
- The average number of successes, μ, in the interval is known
- The probability of success is proportional to the size of the region/time interval
- The probability of success in an extremely small region/time interval is almost zero
- Properties: the mean and variance are equal, and both are denoted by μ; the probability of exactly k successes is $P(X = k) = e^{-\mu}\mu^{k}/k!$
Examples:
- The average number of houses sold by a company is 5 per day. What is the probability that exactly 4 houses will be sold tomorrow?
- The average number of births in a hospital is 2.1 per hour. What is the probability that there will be exactly 6 births in the next 2 hours?
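Both examples can be answered with the Poisson formula. Note that for the births example the mean must be scaled to the 2-hour window (2 * 2.1 = 4.2). The last lines numerically check the property that the mean and variance are both μ.

```python
from math import exp, factorial

def poisson_pmf(k, mu):
    # P(X = k) = e^(-mu) * mu^k / k!, where mu is the mean number of events per interval
    return exp(-mu) * mu**k / factorial(k)

# Houses: mean 5 per day, probability of exactly 4 sold tomorrow.
p_houses = poisson_pmf(4, 5)

# Births: 2.1 per hour, so the mean for a 2-hour window is 2 * 2.1 = 4.2.
p_births = poisson_pmf(6, 2 * 2.1)

print(round(p_houses, 4), round(p_births, 4))

# Mean and variance are both mu (checked numerically for mu = 5; the tail beyond 60 is negligible).
ks = range(60)
mean = sum(k * poisson_pmf(k, 5) for k in ks)
var = sum((k - mean) ** 2 * poisson_pmf(k, 5) for k in ks)
print(round(mean, 6), round(var, 6))
```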
Skewness and Kurtosis
Skewness: Measure of deviation from symmetry
- Related to the difference between the mean and the mode (or median)
- Right or left skewed
- Negative skewness - longer tail on the left (left-skewed)
- Positive skewness - longer tail on the right (right-skewed)
Kurtosis: measure of the peakedness (and tail heaviness) of the distribution
- High kurtosis - tall peak, rapid decline in the tails
- Low kurtosis - flat peak, gradual decline in the tails
- Extreme case of flatness - the uniform distribution
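A sketch of the simple moment-based formulas: skewness as m3 / m2^(3/2) and excess kurtosis as m4 / m2^2 - 3 (so a normal distribution scores about 0). Statistical packages often report small-sample-adjusted versions, so their figures may differ somewhat. The right-skewed list is hypothetical; the symmetric one is the variance example from earlier.

```python
def moments(data):
    # Central moments m2, m3, m4 about the mean (population form, dividing by n).
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    return m2, m3, m4

def skewness(data):
    # > 0: longer right tail; < 0: longer left tail; 0: symmetric.
    m2, m3, _ = moments(data)
    return m3 / m2 ** 1.5

def excess_kurtosis(data):
    # > 0: more peaked / heavier tails than normal; < 0: flatter than normal.
    m2, _, m4 = moments(data)
    return m4 / m2 ** 2 - 3

symmetric = [2, 5, 5, 4, 6, 8]        # data from the variance example
right_skewed = [1, 1, 2, 2, 3, 15]    # hypothetical: one large value stretches the right tail

print(skewness(symmetric), skewness(right_skewed))
print(excess_kurtosis(symmetric), excess_kurtosis(right_skewed))
```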
Case Study: Skewness and Kurtosis
| Service | N | Skewness Statistic | Skewness Std. Error | Kurtosis Statistic | Kurtosis Std. Error |
| Long Distance last month | 1000 | 2.966 | 0.077 | 14.012 | 0.155 |
| Toll free last month | 475 | 3.465 | 0.112 | 26.735 | 0.224 |
| Equipment last month | 386 | 0.756 | 0.124 | 0.641 | 0.248 |
| Calling card last month | 678 | 2.15 | 0.094 | 7.572 | 0.187 |
| Wireless last month | 296 | 1.359 | 0.142 | 3.079 | 0.282 |
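Equipment last month has by far the lowest skewness (0.756) and kurtosis (0.641), so its distribution is the closest to symmetric and normal-shaped. The other services are strongly right-skewed with heavy tails (e.g. toll-free: skewness 3.465, kurtosis 26.735).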
Confidence interval
- A confidence interval is a range of values, computed from sample information, that is likely to include an unknown population parameter.
- Suppose random samples were taken repeatedly from the population and an interval was computed from each sample in the same way; a certain percentage of those intervals would contain the unknown parameter.
- For a 95% confidence interval, about 95% of the intervals calculated in that fashion would contain the true value of the unknown parameter.
- Such an interval is called a 95% confidence interval for the parameter (e.g. a population mean or proportion).
- The upper and lower limits of the 95% confidence interval are called the confidence limits.
The confidence level is the probability value associated with a confidence interval.
The probability value is (1 - α). This value is often expressed as a percentage.
For example, for α = 0.05 the confidence level is 0.95, i.e., a 95% confidence level.
A confidence interval has three components:
- Confidence level
- Statistic
- Margin of error
Range of the confidence interval = sample statistic ± margin of error.
The uncertainty associated with the confidence interval is specified by the confidence level.
How to construct a Confidence Interval
- Identify a sample statistic - Choose the statistic that will be used to estimate a population parameter.
- The statistic is generally the mean or the median or the mode in some cases.
- Select the confidence level - It describes the uncertainty of the sampling method.
- Find the margin of error = Critical value * Standard error of the statistic.
- Specify the confidence interval - The range of the confidence interval is defined by the following equation.
- Confidence interval = sample statistic ± margin of error.
e.g. Margin of error = 1.86 and Sample statistic = 150
Confidence interval = (150 - 1.86) to (150 + 1.86)
Confidence interval = 148.14 to 151.86
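The construction steps can be sketched for a mean, assuming the population standard deviation is known so that a z critical value from `statistics.NormalDist` applies (with an unknown σ and a small sample, a t critical value would be used instead). The inputs σ = 10 and n = 100 are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def z_confidence_interval(sample_mean, sigma, n, confidence=0.95):
    # Critical value: z with probability 1 - alpha/2 below it (about 1.96 for 95%).
    alpha = 1 - confidence
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    standard_error = sigma / sqrt(n)
    margin = z_crit * standard_error     # margin of error = critical value * standard error
    return sample_mean - margin, sample_mean + margin

# Worked example from the text: statistic 150, margin of error 1.86.
print(round(150 - 1.86, 2), round(150 + 1.86, 2))

# Hypothetical inputs: sample mean 150, known sigma 10, n = 100.
low, high = z_confidence_interval(150, sigma=10, n=100)
print(round(low, 2), round(high, 2))
```

Raising the confidence level (e.g. to 0.99) increases the critical value and so widens the interval.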
2.3 Tests of Significance
- Tests used to assess the evidence in favor of or against a given assumption
- Begin with a null hypothesis, H0
- Tests either fail to reject the null hypothesis, or reject it in favor of an alternative hypothesis, Ha
- Two types of tests:
  - One-sided tests
  - Two-sided tests
- Results are decided by calculating the "p-value"
- The p-value is the probability, assuming the null hypothesis is true, of obtaining a test statistic at least as extreme as the one actually calculated.
- Interpretation:
  - If the p-value is less than the significance level α, reject the null hypothesis.
  - Common values of α are 0.05 and 0.01.
- General assumptions:
  - The distribution is approximately normal
  - The samples have approximately equal variances.
One sided hypothesis testing
- μ0 = null value
- Null hypothesis: μ = μ0
- Alternative hypothesis: μ < μ0 or μ > μ0
Example: Given a sample of heights of 100 males in New York, decide whether the height has increased in general from a given average height of 5 feet 9 inches (69 inches).
- Null value: μ0 = 69 inches
- Null hypothesis: μ = 69 inches
- Alternative hypothesis: μ > 69 inches
Using one of various hypothesis tests, calculate the p-value and reject the null hypothesis if the p-value is less than 0.05.
Two sided hypothesis testing
- μ0 = null value
- Null hypothesis: μ = μ0
- Alternative hypothesis: μ ≠ μ0
Example: Given a sample of heights of 100 males in New York, decide whether the height has increased or decreased in general from a given average height of 5 feet 9 inches (69 inches).
- Null value: μ0 = 69 inches
- Null hypothesis: μ = 69 inches
- Alternative hypothesis: μ ≠ 69 inches
Using one of various hypothesis tests, calculate the p-value and reject the null hypothesis if the p-value is less than 0.05.
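The practical difference between one-sided and two-sided tests shows up in the p-value. This sketch assumes the test statistic is a z statistic (standard normal under H0) and uses a hypothetical value z = 1.8; with α = 0.05 the one-sided test rejects H0 while the two-sided test does not.

```python
from statistics import NormalDist

def p_values(z):
    # p-values for a z test statistic, using the standard normal reference distribution.
    phi = NormalDist().cdf
    return {
        "greater": 1 - phi(z),               # Ha: mu > mu0
        "less": phi(z),                      # Ha: mu < mu0
        "two-sided": 2 * (1 - phi(abs(z))),  # Ha: mu != mu0
    }

# Hypothetical test statistic, e.g. from the heights example.
for label, p in p_values(1.8).items():
    decision = "reject H0" if p < 0.05 else "fail to reject H0"
    print(f"{label}: p = {p:.4f} -> {decision}")
```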
2.4 Various Tests of Significance
- Z test
  - One Sample z test - Used to compare a sample mean with a given standard (hypothesized value).
  - Two Sample z test - Used to compare the means of two groups.
  - The population standard deviation must be known (or the sample must be large enough to estimate it reliably) to calculate the z statistic.
  - The z test is generally used when the sample size is greater than 30.
- T test
  - The t test is also used with means, but when the population standard deviation is unknown and is estimated from the sample. It is preferred when the sample size is less than 30. Like the z test, the t test can be one-sample, two-sample, or paired.
  - One Sample t test - Compares a sample mean with a given value.
  - Two Sample t test - When the compared groups are independent, e.g. to compare the marks of students of two different schools.
  - Paired t test - When the compared groups are paired, e.g. to compare the marks of the same students before and after a training class.
- Chi-Squared test - The test for goodness of fit is used to test whether there is a difference between the observed values and the values expected under a particular hypothesis.
- F test - Analysis of Variance (ANOVA) - Compares the means of two or more groups by analyzing their variances. ANOVA is the most commonly used F test.
- F test - Regression - A less common use is testing the overall significance of a regression analysis.
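The key difference between the z and t statistics is which standard deviation goes into the standard error. This sketch computes one-sample z, one-sample t, and paired t statistics with the standard library only; the marks are hypothetical. Turning a t statistic into a p-value needs the t distribution with n - 1 degrees of freedom, which SciPy provides (e.g. `scipy.stats.ttest_1samp`, `ttest_ind`, `ttest_rel`).

```python
from math import sqrt
from statistics import mean, stdev

def z_statistic(sample, mu0, sigma):
    # One-sample z: uses the KNOWN population standard deviation sigma.
    return (mean(sample) - mu0) / (sigma / sqrt(len(sample)))

def t_statistic(sample, mu0):
    # One-sample t: sigma unknown, so the sample standard deviation s (n - 1 divisor) is used.
    return (mean(sample) - mu0) / (stdev(sample) / sqrt(len(sample)))

def paired_t_statistic(before, after):
    # Paired t: a one-sample t test on the within-pair differences, against 0.
    diffs = [a - b for b, a in zip(before, after)]
    return t_statistic(diffs, 0)

# Hypothetical marks of the same five students before and after a training class.
before = [62, 70, 58, 75, 66]
after = [68, 74, 61, 79, 70]
print(round(paired_t_statistic(before, after), 3))
```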
Chi-Squared Tests
Steps:
- State the null hypothesis
- Prepare the contingency table for the variable
- Determine the expected results
- Calculate the chi-squared value
- Calculate the degrees of freedom
- Based on the above, calculate the p-value
- If p-value < 0.05, reject the null hypothesis
The chi-squared test is also used to verify whether two variables are independent (test of independence), following the same steps as above.
Case Study - Chi Squared Test
A city has a newly opened nuclear plant, and there are families staying dangerously close to the plant. A health safety officer wants to take up this case to provide relocation for the families that live in the surrounding area. To make a strong case, he wants to prove with numbers that exposure to radiation is leading to an increase in the diseased population. He formulates a contingency table of exposure and disease.
| Exposure | Disease Yes | Disease No | Total |
| Yes | 37 | 13 | 50 |
| No | 17 | 53 | 70 |
| Total | 54 | 66 | 120 |
Does the data suggest an association between the disease and exposure?
Steps:
- Calculate the number of individuals of exposed and unexposed groups expected in each disease category (yes or no) if the probabilities were the same.
- If there were no effect of exposure, the probabilities would be the same and the chi-squared statistic would have a very low value.
Proportion of population exposed = (50/120) = 0.42
Proportion of population not exposed = (70/120) = 0.58
Thus, expected values:
Population with disease = 54
Exposure Yes: 54 * (50/120) = 22.5
Exposure No: 54 * (70/120) = 31.5
Population without disease = 66
Exposure Yes: 66 * (50/120) = 27.5
Exposure No: 66 * (70/120) = 38.5
| Exposure | Disease Yes | Disease No | Total | Proportion of Total |
| Yes Actual | 37 | 13 | 50 | 50/120 = 0.42 |
| Yes Expected | 54 * 50/120 = 22.5 | 66 * 50/120 = 27.5 | 50 | |
| No Actual | 17 | 53 | 70 | 70/120 = 0.58 |
| No Expected | 54 * 70/120 = 31.5 | 66 * 70/120 = 38.5 | 70 | |
| Total | 54 | 66 | 120 | |
- Calculate the chi-squared statistic, χ² = Σ (O - E)² / E, where O is the observed and E the expected count:
= ((37-22.5)^2 / 22.5) + ((13-27.5)^2 / 27.5) + ((17-31.5)^2 / 31.5) + ((53-38.5)^2 / 38.5)
= 29.1
- Calculate the degrees of freedom, df = (rows - 1) x (columns - 1):
df = (2-1) x (2-1)
df = 1
- Calculate the p-value from a chi-squared distribution table.
For a chi-squared value of 29.1 with 1 degree of freedom, the table gives a p-value < 0.001.
- Interpretation: There is less than a 0.1% chance of obtaining such a discrepancy between expected and observed values if there is no association. Since p < 0.05, we reject the null hypothesis and conclude that disease is associated with exposure.
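The whole calculation can be sketched in a few lines of standard Python. Instead of a printed table, the p-value uses a special case that holds only for 1 degree of freedom: a chi-squared(1) variable is the square of a standard normal Z, so P(χ² > x) = 2 * (1 - Φ(√x)).

```python
from math import sqrt
from statistics import NormalDist

observed = [[37, 13],   # exposed:   disease yes, disease no
            [17, 53]]   # unexposed: disease yes, disease no

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count under independence: row total * column total / grand total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Chi-squared statistic: sum of (O - E)^2 / E over all cells.
chi2 = sum((o - e) ** 2 / e
           for o_row, e_row in zip(observed, expected)
           for o, e in zip(o_row, e_row))

df = (len(observed) - 1) * (len(observed[0]) - 1)

# Valid for df = 1 only: P(chi2_1 > x) = P(|Z| > sqrt(x)) = 2 * (1 - Phi(sqrt(x))).
p_value = 2 * (1 - NormalDist().cdf(sqrt(chi2)))

print(expected)
print(round(chi2, 2), df, p_value)
```

If SciPy is used instead, note that `scipy.stats.chi2_contingency` applies Yates' continuity correction to 2x2 tables by default; passing `correction=False` matches the hand calculation above.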
ANOVA
- Analysis of Variance - used to compare the means of more than two groups
- Extension of the independent two-sample t test
- Factor variable - variable defining the groups
- Response variable - variable being compared
- One-way ANOVA
  - Groups defined by a single variable
  - E.g.: Is there a difference in students' marks based on the row they are seated in - front / middle / back?
- Two-way ANOVA
  - Two independent variables
  - E.g.: Do race and gender affect a person's yearly income?
Case Study - One-way ANOVA
- Marks obtained in the same subject by three students from each of three different schools are given below.
- Does the data suggest any association between school and marks?
| School | A | B | C |
| Marks 1 | 82 | 83 | 38 |
| Marks 2 | 83 | 78 | 59 |
| Marks 3 | 97 | 68 | 55 |
The basic idea in ANOVA: partition the total variation in the data into the variation between groups and the variation within groups.
Steps:
- Calculate the means
School A: mean(82, 83, 97) = 87.3
School B: mean(83, 78, 68) = 76.3
School C: mean(38, 59, 55) = 50.7
- Calculate the grand mean
Grand: mean(82, 83, 97, 83, 78, 68, 38, 59, 55) = 71.4
- Calculate the variations
Sum of squared deviations of all observed values about the grand mean: SS_total = 2630.2
Sum of squared deviations of the three group means about the grand mean, each weighted by its group size: SS_between = 2124.2
Sum of squared deviations of the observations within each group about their group mean, added across all groups: SS_within = 506
- Calculate the degrees of freedom for each variation:
df_total = number of observations - 1 = 9 - 1 = 8
df_between = number of groups - 1 = 3 - 1 = 2
df_within = number of observations - number of groups = 9 - 3 = 6
- Calculate the mean squares
Mean square between groups: MS_between = SS_between / df_between = 2124.2/2 = 1062.1
Mean square within groups: MS_within = SS_within / df_within = 506/6 = 84.3
- Calculate the F-statistic
F-value = MS_between/MS_within = 1062.1/84.3 = 12.59
- Calculate the p-value from the F-table
The p-value for F = 12.59 with 2 and 6 degrees of freedom is 0.007.
- Conclusion: The null hypothesis is that all three school means are equal. Since the p-value is less than α = 0.05, we reject the null hypothesis and conclude that there is a difference in the marks obtained by students from different schools.
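The one-way ANOVA steps above can be reproduced with the standard library. For the p-value this sketch relies on a closed form that holds only when df_between = 2: P(F > f) = (df_within / (df_within + 2f))^(df_within / 2). For other group counts, `scipy.stats.f_oneway` returns the F statistic and p-value directly.

```python
from statistics import mean

groups = {
    "A": [82, 83, 97],
    "B": [83, 78, 68],
    "C": [38, 59, 55],
}

all_marks = [x for marks in groups.values() for x in marks]
grand_mean = mean(all_marks)

# Partition the total variation into between-group and within-group parts.
ss_total = sum((x - grand_mean) ** 2 for x in all_marks)
ss_between = sum(len(m) * (mean(m) - grand_mean) ** 2 for m in groups.values())
ss_within = sum((x - mean(m)) ** 2 for m in groups.values() for x in m)

k, n = len(groups), len(all_marks)
df_between, df_within = k - 1, n - k

ms_between = ss_between / df_between
ms_within = ss_within / df_within
f_value = ms_between / ms_within

# Closed-form upper tail, valid ONLY when df_between == 2.
assert df_between == 2
p_value = (df_within / (df_within + 2 * f_value)) ** (df_within / 2)

print(round(ss_total, 1), round(ss_between, 1), round(ss_within, 1))
print(round(f_value, 2), round(p_value, 4))
```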
2.5 Non Parametric Testing
- Referred to as "distribution free", as they make few or no assumptions about the distribution of the data (e.g. normality).
- They generally have lower power than parametric tests when the parametric assumptions hold, and hence are usually given second preference after parametric tests.
- These tests typically focus on the median rather than the mean.
- They involve straightforward procedures like counting and ordering.
- There is at least one non-parametric test for each common parametric test, and they are classified into the following categories:
- Tests of differences between groups (independent samples)
- Tests of differences between variables (dependent variables)
- Tests of relationship between variables
For relationships between variables, one usually computes a correlation coefficient.
Non-parametric equivalents of the standard correlation coefficient are:
- Spearman's R
- Kendall's Tau
- Gamma coefficient
If the two variables are categorical, appropriate non-parametric tests of the relationship between them are the chi-squared test, the phi coefficient, and the Fisher exact test. In addition, a simultaneous test for relationships between multiple cases is available: the Kendall coefficient of concordance. This test is often used to express inter-rater agreement among independent judges who are rating (ranking) the same stimuli.
Non Parametric Tests
| Tests | Parametric | Non Parametric |
| One Quantitative Response Variable | One Sample t test | Sign Test |
| One Quantitative Response Variable - Two Values from Paired Samples | Paired Sample t test | Wilcoxon Signed Rank Test |
| One Quantitative Response Variable - One Qualitative Independent Variable with Two Groups | Two Independent Sample t test | Wilcoxon Rank Sum or Mann-Whitney U Test |
| One Quantitative Response Variable - One Qualitative Independent Variable with Three or More Groups | One-way ANOVA | Kruskal-Wallis Test |
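The sign test, the simplest entry in the table, shows how non-parametric tests rely on counting: under H0 each value is equally likely to fall above or below the hypothesized median, so the number of "+" signs follows a binomial distribution with p = 0.5. This is a minimal two-sided sketch (ties dropped) on hypothetical marks.

```python
from math import comb

def sign_test(data, hypothesized_median):
    # Count values above / below the hypothesized median; ties are dropped.
    plus = sum(x > hypothesized_median for x in data)
    minus = sum(x < hypothesized_median for x in data)
    n = plus + minus
    k = min(plus, minus)
    # Two-sided p-value: twice the lower Binomial(n, 0.5) tail, capped at 1.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2**n
    return plus, minus, min(1.0, 2 * tail)

# Hypothetical marks; H0: the median mark is 60.
marks = [72, 65, 58, 81, 69, 77, 60, 74, 66, 70]
print(sign_test(marks, 60))
```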
Correlation
Measure of association between variables
Positive and negative correlation, ranging between -1 and +1
A value of +1 (or, more generally, a positive correlation) implies that as the value of the independent variable increases, the value of the response variable also increases.
Similarly, a value of -1 (or a negative correlation) implies that as the value of the independent variable increases, the value of the response variable decreases.
Positive Correlation Example:
Earnings and expenditure - the more a person earns, the more he/she spends.
Negative Correlation Example:
Speed and time - As the speed of the vehicle increases the time taken to cover a given distance decreases.
Parametric (assumes normal distribution and homogeneous variance):
- Pearson correlation
Non Parametric (no distributional assumption; suitable for ordinal/ranked variables):
- Spearman correlation
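Spearman's correlation is simply Pearson's correlation computed on ranks, which is why it needs no normality assumption and is less affected by extreme values. This sketch implements both from scratch; the earnings and expenditure figures are hypothetical, chosen to echo the positive-correlation example with one unusually high earner.

```python
from math import sqrt

def pearson(x, y):
    # Pearson r: sum of cross-deviations divided by the root of the product of squared deviations.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def ranks(values):
    # Rank 1..n; tied values share the average of their positions.
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1     # positions i..j are 0-based
        for idx in order[i:j + 1]:
            result[idx] = avg_rank
        i = j + 1
    return result

def spearman(x, y):
    # Spearman's rho: Pearson correlation of the ranks.
    return pearson(ranks(x), ranks(y))

# Hypothetical earnings and expenditure (in thousands) for six people.
earning = [20, 25, 30, 40, 60, 200]
spending = [15, 18, 24, 30, 41, 60]
print(round(pearson(earning, spending), 3), spearman(earning, spending))
```

Here the relationship is perfectly monotonic, so Spearman's rho is 1, while Pearson's r is pulled below 1 by the single extreme earner.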
Correlation Coefficient
r: correlation coefficient (the ranges below apply to the absolute value |r|)
-1: Perfectly Negative
+1: Perfectly Positive
0 - 0.2 : No or very weak association
0.2 - 0.4 : Weak association
0.4 - 0.6 : Moderate association
0.6 - 0.8 : Strong association
0.8 - 1 : Very strong to perfect association
Summary
- Overview of Statistical Methods
- Population, Samples & Sampling Plan and Sampling Methods
- Descriptive Statistics - Measure of Central Tendency and Measure of Dispersion
- Probability Theory and Distributions
- Confidence Interval
- What are Tests of Significance
- The process flow of hypothesis testing
- One Sided and Two Sided Hypothesis Testing
- Various Tests used in calculating p-value
- What is Non-Parametric Testing and why it is used.
- Non-parametric alternatives for the usual tests of significance





