Z-test
Q1. Short state
A country's population has an average height
of 65 inches with a standard deviation of 2.5 inches.
A person feels that people from his state are shorter. He measures a sample of 20 people from his state
and finds that their average height is 64.5 inches.
At a 5% significance
level (or 95% confidence level), can we conclude that people from his state are
shorter, using the Z-test? What is the p-value?
A.
p-value = 0.186, and hence people from
the state are shorter
B.
p-value = 0.815, and hence people from
the state are shorter
C.
p-value = 0.186, and hence people from
the state are not shorter
D.
p-value = 0.815, and hence people from the state are not shorter
Correct Answer: p-value = 0.186, and hence people from the state are not shorter
Explanation:
# Null Hypothesis (H0): People from the person's state have the same average height as the national average. (i.e., μ = 65)
# Alternative Hypothesis (Ha): People from the person's state are shorter than the national average. (i.e., μ < 65)
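import scipy.stats as stats
# Population parameters
population_mean = 65  # Population average height
population_stddev = 2.5  # Population standard deviation
# Sample statistics
sample_mean = 64.5
sample_size = 20  # Average of 20 people
# Calculate the standard error of the sample mean
standard_error = population_stddev / (sample_size ** 0.5)
# Calculate the Z-score
z_score = (sample_mean - population_mean) / standard_error
# Calculate the p-value for a left-tailed test (Ha: μ < 65)
p_value = stats.norm.cdf(z_score)
# Compare the p-value with the 5% significance level
alpha = 0.05
if p_value < alpha:
    print(f"p-value: {p_value:.3f}, Reject the null hypothesis. People from the state are shorter")
else:
    print(f"p-value: {p_value:.3f}, Fail to reject the null hypothesis. People from the state are not shorter")
Output:
p-value: 0.186, Fail to reject the null hypothesis. People from the state are not shorter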
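The same decision can be reached with the critical-value approach instead of the p-value. A minimal sketch (our addition, not part of the original solution), assuming a left-tailed test at α = 0.05, where scipy's `norm.ppf(alpha)` gives the lower cut-off:

```python
from scipy.stats import norm

# Values taken from the question
population_mean = 65      # national average height (inches)
population_stddev = 2.5   # population standard deviation (inches)
sample_mean = 64.5        # average height of the sampled people (inches)
sample_size = 20
alpha = 0.05              # 5% significance level

# z-score of the observed sample mean
standard_error = population_stddev / sample_size ** 0.5
z_score = (sample_mean - population_mean) / standard_error

# Left-tailed test: reject H0 only if z falls below the alpha-quantile of N(0, 1)
critical_z = norm.ppf(alpha)  # about -1.645

if z_score < critical_z:
    print(f"z = {z_score:.3f} < {critical_z:.3f}: reject H0, the state is shorter")
else:
    print(f"z = {z_score:.3f} >= {critical_z:.3f}: fail to reject H0")
```

Here z is about -0.894, which lies above -1.645, so the observed mean is not far enough below 65 to reject H0.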
Q2. Pastries Produced Per Day
A French cafe has historically
maintained that their average daily pastry production is at most
500.
With the installation of a new
machine, they assert that the average daily pastry production has increased.
The average number of pastries produced per day over
a 70-day period was found to be 530.
Assume that the population
standard deviation for the pastries produced per day is 125.
Perform a z-test with the
critical z-value = 1.64 at the alpha (significance level) = 0.05 to evaluate if
there's sufficient evidence to support their claim of the new machine
producing more than 500 pastries daily.
Note: Round off the z-score to two decimal places.
A.
The computed z-score is 0.24, and since
0.24 is less than 1.64, the null hypothesis cannot be rejected.
B.
The computed z-score is 0.24, and since
0.24 is less than 1.64, the null hypothesis is rejected.
C.
The computed z-score is 2.01, and since
1.64 is less than 2.01, the null hypothesis cannot be rejected.
D.
The computed z-score is 2.01, and since 1.64 is less than 2.01, the null hypothesis is rejected.
Correct option: The computed z-score is 2.01, and since 1.64 is less than 2.01, the null hypothesis is rejected.
Explanation:
The null and alternate hypotheses are:
Null hypothesis (H0): the average number of pastries produced per day is less than or equal to 500 (μ≤500)
Alternate hypothesis (Ha): the average number exceeds 500 (μ>500)
Approach 1:
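z-score: $z = \dfrac{\bar{x} - \mu}{\mathrm{SEM}}$, where $\bar{x}$ is the sample mean and $\mu$ is the population mean
where SEM = standard error of the mean: $\mathrm{SEM} = \dfrac{\sigma}{\sqrt{n}}$, with $\sigma$ the population standard deviation and $n$ the sample size
$\mathrm{SEM} = \dfrac{125}{\sqrt{70}} = 14.940$
Therefore, $z = \dfrac{530 - 500}{14.940} = 2.01$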
Also, the critical z-score for the given significance level (0.05) can be obtained using the following code:
from scipy.stats import norm
norm.ppf(0.95)
Thus, we get the critical z-value = 1.64.
Since the observed z-score (2.01) > critical z-value (1.64), we reject the null hypothesis for this right-tailed test.
Thus, we conclude that the average number exceeds 500.
Approach 2:
# Import necessary libraries
import numpy as np
import scipy.stats as stats
# Define sample mean, population standard deviation, sample size, and hypothesized mean
sample_mean = 530
population_std = 125
sample_size = 70
population_mean = 500
# Calculate z-score
z_score = (sample_mean - population_mean) / (population_std / np.sqrt(sample_size))
# Round z-score to two decimal places
z_score = round(z_score, 2)
print(f"z-score: {z_score}")
# Set confidence level and compute the one-tailed critical z-value
confidence_level = 0.95
critical_z = stats.norm.ppf(confidence_level)
print("critical z-value:", critical_z)
# Check if the z-score is greater than the critical z-value
if z_score > critical_z:
    print("Reject the null hypothesis. The shop's claim is supported by the data.")
else:
    print("Fail to reject the null hypothesis. There is not enough evidence to support the shop's claim.")
Output:
z-score: 2.01
critical z-value: 1.6448536269514722
Reject the null hypothesis. The shop's claim is supported by the data.
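Both approaches above compare the z-score with a critical value. As a cross-check, here is a sketch of the equivalent p-value approach (our addition). It uses scipy's `norm.sf`, the survival function 1 - cdf, to get the right-tail area:

```python
import numpy as np
from scipy.stats import norm

sample_mean = 530
population_mean = 500
population_std = 125
sample_size = 70
alpha = 0.05

z_score = (sample_mean - population_mean) / (population_std / np.sqrt(sample_size))
# Right-tailed test: the p-value is the area above the observed z
p_value = norm.sf(z_score)
print(f"z-score: {z_score:.2f}, p-value: {p_value:.4f}")
print("Reject H0" if p_value < alpha else "Fail to reject H0")
```

The p-value comes out at roughly 0.022, below 0.05, which agrees with rejecting H0.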
Q3. India runs on Chai
The Chai Point stall at Bengaluru
airport estimates that each person visiting the store drinks an average
of 1.7 small cups of tea.
Assume a population standard
deviation of 0.5 small cups. A sample
of 30 customers collected over a few days
averaged 1.85 small cups of tea per person.
Test the claim using an
appropriate test at a significance level of alpha = 0.05, with critical z-score
values of ±1.96.
Note: Round off the z-score to two decimal places.
A.
The computed z-score is 1.64, and since
1.64 is less than 1.96, the null hypothesis cannot be rejected.
B.
The computed z-score is 1.64, and since
1.64 is less than 1.96, the null hypothesis is rejected.
C.
The computed z-score is 2.33, and since
1.96 is less than 2.33, the null hypothesis cannot be rejected.
D.
The computed z-score is 2.33, and since 1.96 is less than 2.33, the null hypothesis is rejected.
Correct option: The computed z-score is 1.64, and since 1.64 is less than 1.96, the
null hypothesis cannot be rejected.
Explanation:
The null and alternate hypotheses are:
Null hypothesis (H0): the average number of small
cups of tea per customer is equal to 1.7 (μ = 1.7)
Alternate hypothesis (Ha): the average number is
different from 1.7, either greater or smaller (μ ≠ 1.7)
Approach 1:
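z-score: $z = \dfrac{\bar{x} - \mu}{\mathrm{SEM}}$, where $\bar{x}$ is the sample mean and $\mu$ is the population mean
where SEM = standard error of the mean: $\mathrm{SEM} = \dfrac{\sigma}{\sqrt{n}}$
$\mathrm{SEM} = \dfrac{0.5}{\sqrt{30}} = 0.0913$
Therefore, $z = \dfrac{1.85 - 1.7}{0.0913} = 1.64$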
Since |1.64| < 1.96, the z-score does not fall in either rejection region, so we fail to reject the null hypothesis.
Thus, there is not enough evidence to conclude that the average number of small cups of tea per customer
differs from 1.7.
Approach 2:
# Import necessary library
import numpy as np
# Define sample mean, population standard deviation, sample size, and hypothesized mean
sample_mean = 1.85
population_std = 0.5
sample_size = 30
population_mean = 1.7
# Calculate z-score
z_score = (sample_mean - population_mean) / (population_std / np.sqrt(sample_size))
# Round z-score to two decimal places
z_score = np.round(z_score, 2)
# Set alpha and critical z-score (two-tailed, since Ha does not specify a direction)
alpha = 0.05
critical_z = 1.96
print(f"z-score: {z_score}")
# Check if the absolute z-score exceeds the critical z-value
if abs(z_score) > critical_z:
    print("Reject the null hypothesis. The average tea consumption is likely different from the estimate.")
else:
    print("Fail to reject the null hypothesis. There is not enough evidence to support a difference from the estimated average.")
Output:
z-score: 1.64
Fail to reject the null hypothesis. There is not enough evidence to support a difference from the estimated average.
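For a two-tailed test the p-value doubles the tail area beyond |z|. A sketch of that route (our addition), using scipy's `norm.sf` for the upper-tail area:

```python
import numpy as np
from scipy.stats import norm

sample_mean = 1.85
population_mean = 1.7
population_std = 0.5
sample_size = 30
alpha = 0.05

z_score = (sample_mean - population_mean) / (population_std / np.sqrt(sample_size))
# Two-tailed test: area beyond |z| in both tails
p_value = 2 * norm.sf(abs(z_score))
print(f"z-score: {z_score:.2f}, two-tailed p-value: {p_value:.4f}")
print("Reject H0" if p_value < alpha else "Fail to reject H0")
```

The p-value is about 0.10, above 0.05, which is consistent with failing to reject H0.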
Q4. Web Application Response
A data scientist is analyzing the response times of a web application,
which have an average of 250 milliseconds and a standard deviation of 30 milliseconds.
Find the (upper) critical value of the response time for
a 96% confidence level.
A.
311.6125
B.
278.5203
C.
412.8915
D.
262.6293
Correct Option: 311.6125
Explanation:
We can calculate the critical value for a 96% confidence level, using Python
code as shown below:
import scipy.stats as stats
# Given values
confidence_level = 0.96
mean = 250 # Mean response time in milliseconds
std_deviation = 30 # Standard deviation in milliseconds
# A two-tailed test, considering both possibilities: the average response time could be higher or lower than 250 milliseconds.
# Calculate the critical Z-score for a 96% confidence level
critical_z = stats.norm.ppf(1 - (1 - confidence_level) / 2)
# Calculate the critical value using Z-score formula
critical_value = (critical_z * std_deviation) + mean
print(f"Critical Value: {critical_value:.4f}")
Output:
Critical Value: 311.6125
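Because the code treats this as two-tailed, the 96% level leaves 2% in each tail, so there is also a lower critical value. A sketch (our addition) using scipy's `norm.interval`, which returns both bounds of the central interval at once:

```python
from scipy.stats import norm

mean = 250            # mean response time (ms)
std_deviation = 30    # standard deviation (ms)
confidence_level = 0.96

# Central interval that holds 96% of the probability mass of N(250, 30^2)
lower, upper = norm.interval(confidence_level, loc=mean, scale=std_deviation)
print(f"Lower critical value: {lower:.4f}")
print(f"Upper critical value: {upper:.4f}")
```

The upper bound matches the 311.6125 above. The lower bound is its mirror image around 250, at about 188.3875.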
Q5. CI and Conclusion
A marketing team aims
to estimate the average time visitors spend on their website.
They gathered a
random sample of 100 visitors and determined that the average time spent on the
website was 4.5
minutes.
The team is working
under the assumption that the population's mean time spent on the website
is 4.0
minutes, with a standard deviation of 1.2 minutes.
Their goal is to
estimate the true time spent on the website with a 95% confidence level.
Calculate the confidence
interval values and make a conclusion based
on the calculated interval.
A.
(3.892,4.287),
The average time spent on the website is not 4.0 minutes
B.
(3.892,4.287),
The average time spent on the website is 4.0 minutes
C.
(4.264,4.735),
The average time spent on the website is not 4.0 minutes
D.
(4.264,4.735),
The average time spent on the website is 4.0 minutes
Correct Option: (4.264,4.735),
The average time spent on the website is not 4.0 minutes
Explanation:
Based on the given problem, we define our hypothesis as:
- Null Hypothesis (H0): The population mean time
spent on the website is 4.0 minutes.
- Alternative Hypothesis (H1): The population
mean time spent on the website is not 4.0 minutes.
Now we need to:
- Calculate the confidence interval with 95% confidence
- Perform the hypothesis test and draw a conclusion on its basis
We can use Python code such as:
import numpy as np
from scipy.stats import norm
# Given data
population_mean = 4.0
sample_mean = 4.5
population_stddev = 1.2
sample_size = 100
alpha = 0.05  # Significance level (1 - alpha gives the confidence level)
# Calculate the critical value (Z) for a two-tailed test at the given alpha level
z_critical = norm.ppf(1 - alpha / 2)
# Calculate the margin of error
margin_of_error = z_critical * (population_stddev / np.sqrt(sample_size))
# Calculate the confidence interval
confidence_interval = (sample_mean - margin_of_error, sample_mean + margin_of_error)
print("Confidence Interval:", confidence_interval)
# Check if the population mean (4.0) falls within the confidence interval
if confidence_interval[0] <= population_mean <= confidence_interval[1]:
    print("The population mean falls within the confidence interval. Then we fail to reject the null hypothesis")
else:
    print("The population mean does not fall within the confidence interval. Then we reject the null hypothesis")
Output:
Confidence Interval: (4.264804321855194, 4.735195678144806)
The population mean does not fall within the confidence interval. Then we reject the null hypothesis
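The interval check works because a two-sided z-test at level α rejects H0 exactly when the hypothesised mean falls outside the (1 - α) confidence interval. The sketch below (our addition) shows both sides of that equivalence. It builds the interval with scipy's `norm.interval` and computes the matching two-tailed p-value with `norm.sf`:

```python
from scipy.stats import norm

population_mean = 4.0   # hypothesised mean (minutes)
sample_mean = 4.5
population_stddev = 1.2
sample_size = 100
alpha = 0.05

standard_error = population_stddev / sample_size ** 0.5

# 95% confidence interval centred on the sample mean
lower, upper = norm.interval(1 - alpha, loc=sample_mean, scale=standard_error)

# Equivalent two-tailed z-test against the hypothesised mean
z_score = (sample_mean - population_mean) / standard_error
p_value = 2 * norm.sf(abs(z_score))

print(f"95% CI: ({lower:.3f}, {upper:.3f})")
print(f"z-score: {z_score:.3f}, p-value: {p_value:.2e}")
```

The interval excludes 4.0 and the p-value (about 3e-05) is far below 0.05, so both views reject H0.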
Q6. Significance of Power
A researcher is
conducting a hypothesis test with a significance level (α) of 0.05.
The null hypothesis
is that there is no effect, and the alternative hypothesis is that there is a
significant effect. The researcher calculates the power of the test to be 0.80.
What does a power of 0.80 signify
in this context?
A.
The probability of committing a Type I error.
B.
The probability of correctly rejecting a false null hypothesis.
C.
The probability of failing to reject a true null hypothesis.
D.
The significance level used in the test.
Correct Option: The probability of correctly rejecting a false null hypothesis.
Explanation:
The power of a statistical test is the probability of correctly rejecting a
false null hypothesis (i.e., detecting a true effect if it exists).
- If β denotes the probability of a Type II error (failing to reject a false null hypothesis), then power can be found as: Power = 1 − β
In this case, a power of 0.80 indicates an 80%
chance of correctly finding a significant effect when the alternative
hypothesis is true.
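To make Power = 1 − β concrete, here is a minimal sketch for a right-tailed one-sample z-test. The scenario is hypothetical: H0 mean 100, true mean 105, σ = 15, n = 50, α = 0.05. The function name `z_test_power` is ours for illustration only. Power is the chance that the sample mean lands beyond the rejection cut-off when the true mean is the alternative value:

```python
from scipy.stats import norm


def z_test_power(mu0, mu_true, sigma, n, alpha=0.05):
    """Power of a right-tailed one-sample z-test of H0: mu = mu0 vs Ha: mu > mu0."""
    standard_error = sigma / n ** 0.5
    # Smallest sample mean that leads to rejecting H0 at level alpha
    critical_mean = mu0 + norm.ppf(1 - alpha) * standard_error
    # Probability that the sample mean exceeds that cut-off when the true mean is mu_true
    return norm.sf((critical_mean - mu_true) / standard_error)


# Hypothetical scenario (not from the question): H0 mean 100, true mean 105
power = z_test_power(mu0=100, mu_true=105, sigma=15, n=50)
beta = 1 - power  # probability of a Type II error
print(f"power = {power:.3f}, beta = {beta:.3f}")
```

In this made-up setting the power is about 0.76, slightly under the 0.80 from the question. If the true mean equals mu0, the function returns α, since the only way to reject H0 then is a Type I error.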
Q7. Institution's Claim
It is known that the
mean IQ of high school students is 100,
and the standard deviation is 15.
A coaching institute
claims that candidates who study there have a higher IQ than an average high school
student. When the IQ of 50 candidates
was measured, the average turned out to be 110.
Conduct an
appropriate hypothesis test to test the institute's claim, with a significance
level of 5%.
A.
p-value:
1.214e-06, Candidates who study at this coaching institution have more IQ than
an average high school student.
B.
p-value:
1.451e-02, Candidates who study at this coaching institution have more IQ than
an average high school student.
C.
p-value:
1.451e-02, Candidates who study at this coaching institution have the same IQ
as an average high school student.
D.
p-value:
1.214e-06, Candidates who study at this coaching institution have the same IQ
as an average high school student.
Correct Answer: p-value: 1.214e-06, Candidates who study at this
coaching institution have more IQ than an average high school student.
Explanation:
# Null Hypothesis (H0): The average IQ of candidates from the institution is the same as the population's average IQ.(μ = 100)
# Alternative Hypothesis (Ha): The average IQ of candidates from the institution is higher than the population's average IQ.(μ > 100)
import scipy.stats as stats
# Given values
population_mean = 100
population_std = 15
sample_mean = 110
sample_size = 50
significance_level = 0.05
# Calculate the standard error (standard deviation of the sample mean)
standard_error = population_std / (sample_size**0.5)
# Calculate the Z-score
Z = (sample_mean - population_mean) / standard_error
# Calculate the p-value for a one-tailed (right-tailed) test
p_value = 1 - stats.norm.cdf(Z)
# Determine whether to reject the null hypothesis
if p_value < significance_level:
    conclusion = "Reject the null hypothesis. Candidates who study at this coaching institution have more IQ than an average high school student."
else:
    conclusion = "Fail to reject the null hypothesis. Candidates who study at this coaching institution have the same IQ as an average high school student."
print(f"Z-score: {Z}")
print(f"P-value: {p_value}")
print(f"Conclusion: {conclusion}")
Output:
Z-score: 4.714045207910317
P-value: 1.2142337364462463e-06
Conclusion: Reject the null hypothesis. Candidates who study at this coaching institution have more IQ than an average high school student.
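The code above computes the p-value as 1 - stats.norm.cdf(Z). That subtracts a number very close to 1 from 1, so it loses precision as Z grows. scipy's `norm.sf` (the survival function) computes the upper-tail area directly. A sketch (our addition) comparing the two:

```python
from scipy.stats import norm

population_mean = 100
population_std = 15
sample_mean = 110
sample_size = 50

z = (sample_mean - population_mean) / (population_std / sample_size ** 0.5)

# Right-tailed p-value computed two ways
p_cdf = 1 - norm.cdf(z)  # subtracts a value very close to 1 from 1
p_sf = norm.sf(z)        # survival function: upper-tail area computed directly
print(f"1 - cdf: {p_cdf:.6e}")
print(f"sf:      {p_sf:.6e}")

# For a more extreme z, the subtraction collapses to zero while sf() does not
tail_cdf_9 = 1 - norm.cdf(9)
tail_sf_9 = norm.sf(9)
print(f"z = 9 -> 1 - cdf: {tail_cdf_9}, sf: {tail_sf_9:.3e}")
```

At z ≈ 4.71 both give about 1.214e-06. At z = 9 the true tail area, about 1.1e-19, is far below the spacing of doubles near 1, so 1 - norm.cdf(9) evaluates to 0.0 while norm.sf(9) still returns a usable value.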
Q8. Smokers
When smokers smoke,
nicotine is transformed into cotinine, which can be tested.
The average cotinine level in
a group of 50 smokers was 243.5 ng/ml.
Assume that the population standard
deviation is known to be 229.5 ng/ml.
Test the assertion
that the mean cotinine level of all smokers is equal to 300.0 ng/ml,
at 95% confidence.
A.
P-value: 0.0408, the mean cotinine level of all smokers is not equal to 300.0 ng/ml
B.
P-value: 0.0408, the mean cotinine level of all smokers is equal to 300.0 ng/ml.
C.
P-value: 0.0817, the mean cotinine level of all smokers is not equal to 300.0 ng/ml
D.
P-value: 0.0817, the mean cotinine level of all smokers is equal to 300.0 ng/ml.
Correct Answer: P-value: 0.0817, the mean cotinine level of all smokers is equal to 300.0 ng/ml.
Explanation:
# Null Hypothesis (H0): The mean cotinine level of all smokers is equal to 300.0 ng/ml. (µ = 300.0 ng/ml)
# Alternative Hypothesis (Ha): The mean cotinine level of all smokers is not equal to 300.0 ng/ml. (µ ≠ 300.0 ng/ml)
import scipy.stats as stats
# Given values
sample_mean = 243.5  # Sample mean cotinine level
population_std = 229.5  # Known population standard deviation
population_mean = 300.0  # Hypothesized population mean
sample_size = 50  # Sample size
confidence_level = 0.95  # 95% confidence level
# Calculate the standard error and the Z-score
standard_error = population_std / (sample_size**0.5)
Z = (sample_mean - population_mean) / standard_error
# Calculate the p-value for a two-tailed test
p_value = 2 * (1 - stats.norm.cdf(abs(Z)))
# Determine whether to reject the null hypothesis
alpha = 1 - confidence_level
if p_value < alpha:
    conclusion = "Reject the null hypothesis, which means the mean cotinine level of all smokers is not equal to 300.0 ng/ml."
else:
    conclusion = "Fail to reject the null hypothesis, which means the mean cotinine level of all smokers is equal to 300.0 ng/ml."
print(f"Z-score: {Z}")
print(f"P-value: {p_value}")
print(f"Conclusion: {conclusion}")
Output:
Z-score: -1.7408075440976007
P-value: 0.08171731915149638
Conclusion: Fail to reject the null hypothesis, which means the mean cotinine level of all smokers is equal to 300.0 ng/ml.
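The same conclusion follows from a 95% confidence interval for the mean: 300.0 lies inside it, so the two-sided test cannot reject H0 at α = 0.05. A sketch of that check (our addition), using scipy's `norm.interval` and `norm.sf` for the two-tailed p-value:

```python
from scipy.stats import norm

sample_mean = 243.5
population_std = 229.5
population_mean = 300.0
sample_size = 50
confidence_level = 0.95

standard_error = population_std / sample_size ** 0.5
lower, upper = norm.interval(confidence_level, loc=sample_mean, scale=standard_error)

# Two-tailed p-value using the survival function
z = (sample_mean - population_mean) / standard_error
p_value = 2 * norm.sf(abs(z))

print(f"95% CI: ({lower:.2f}, {upper:.2f})")
print(f"300.0 inside CI: {lower <= population_mean <= upper}, p-value: {p_value:.4f}")
```

The interval runs from roughly 179.9 to 307.1 ng/ml and contains 300.0, which matches the p-value of about 0.0817.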
Q1. Quality Control Analysis
For a quality control
analysis, a factory assesses the tensile strength of a sample of steel rods.
The sample exhibits
a mean tensile
strength of 750
MPa with a sample standard deviation of 50 MPa, while
the known population mean is 800 MPa.
Calculate Cohen's
d for this quality control study.
A.
0.5
B.
-1.0
C.
-0.5
D.
1.0
E.
Insufficient Information
Correct Option: -1.0
Explanation:
For a one-sample test comparing a sample mean to a known population mean, the
effect size can be calculated using:
d
= (Sample Mean - Population Mean) / Sample Standard Deviation
= (750 - 800) / 50
= -1.0
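The same calculation as a small function. This is a sketch (our addition); the names `cohens_d_one_sample` and `describe_effect` are hypothetical helpers. The magnitude labels use Cohen's conventional rule-of-thumb benchmarks (0.2 small, 0.5 medium, 0.8 large), which are guidelines rather than strict cut-offs:

```python
def cohens_d_one_sample(sample_mean, population_mean, sample_std):
    """Cohen's d for one sample: mean difference expressed in sample-SD units."""
    return (sample_mean - population_mean) / sample_std


def describe_effect(d):
    """Label |d| using Cohen's conventional benchmarks (0.2, 0.5, 0.8)."""
    size = abs(d)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "negligible"


d = cohens_d_one_sample(sample_mean=750, population_mean=800, sample_std=50)
# The negative sign shows the sample mean is below the population mean
print(d, describe_effect(d))
```

This prints -1.0 and "large": the rods are one sample standard deviation weaker than the population mean.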
Q2. Water Regulation
The student hostel
office at IIT Madras estimates that each student uses more than 3.5 buckets
of water per day.
In order to verify
this claim, the college trustees decide to monitor the water consumption over
the next 45 days, and find that, on average, a student consumes 3.72 buckets of
water per day.
Assume that the
population standard
deviation is 0.7 buckets.
What is the critical
sample mean, assuming a critical z-value of 1.28?
Note: The critical sample mean is defined as the mean
value for which the z-score is equal to the critical value. Also, round off the
final answer to three
decimal places.
A.
3.634
B.
3.511
C.
3.720
D.
3.691
Correct option: 3.634
Explanation:
Approach 1:
z-score: $z = \dfrac{\bar{x} - \mu}{\mathrm{SEM}}$
where SEM = standard error of the mean: $\mathrm{SEM} = \dfrac{\sigma}{\sqrt{n}}$
Rearranging the terms, we have:
$\bar{x}_{\text{critical}} = z_{\text{critical}} \times \mathrm{SEM} + \mu$
$\bar{x}_{\text{critical}} = 1.28 \times \dfrac{0.7}{\sqrt{45}} + 3.5 = 1.28 \times 0.1043 + 3.5 = 3.634$
Approach 2:
import math
# Given values
population_mean = 3.5
population_std = 0.7
critical_z_value = 1.28
sample_size = 45
# Calculate the critical sample mean
critical_sample_mean = population_mean + (critical_z_value * (population_std / math.sqrt(sample_size)))
# Round off the answer to three decimal places
critical_sample_mean = round(critical_sample_mean, 3)
print("Critical Sample Mean:", critical_sample_mean)
Output:
Critical Sample Mean: 3.634
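The question only asks for the critical sample mean, but the observed average of 3.72 buckets can be compared against it to reach a decision. A sketch (our addition), assuming a right-tailed test of H0: μ ≤ 3.5 against Ha: μ > 3.5. Note that a critical z of 1.28 corresponds to a one-tailed α of about 0.10:

```python
import math

population_mean = 3.5
population_std = 0.7
critical_z_value = 1.28
sample_size = 45
observed_mean = 3.72   # average daily consumption found over the 45 days

standard_error = population_std / math.sqrt(sample_size)
critical_sample_mean = population_mean + critical_z_value * standard_error
observed_z = (observed_mean - population_mean) / standard_error

# Right-tailed test: reject H0 when the observed mean lies beyond the cut-off
reject = observed_mean > critical_sample_mean
print(f"critical mean = {critical_sample_mean:.3f}, observed z = {observed_z:.2f}, reject H0: {reject}")
```

Since 3.72 > 3.634 (equivalently, z ≈ 2.11 > 1.28), the data support the office's estimate at this significance level.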
Q3. Testing Efficacy of Improving GRE Score
The verbal reasoning
section of the GRE has an average score of 150 and
a standard
deviation of 8.5.
A coaching centre
claims that its students are better. A sample of 10 students
from this coaching centre showed an average score of 155.
At a 5% significance
level (or 95% confidence level), can we conclude that students from the
coaching centre are better? Use the Z-test, and compute the p-value.
A.
p-value = 0.03, and hence students from the coaching center are better
B.
p-value = 0.96, and hence students from the coaching center are better
C.
p-value = 0.03, and hence students from the coaching center are not better
D.
p-value = 0.96, and hence students from the coaching center are not better
Correct Answer: p-value = 0.03, and hence students from the coaching center are better
Explanation:
# Null Hypothesis (H0): The average verbal reasoning score of students from the coaching centre is the same as the national average verbal reasoning score.(μ = 150)
# Alternative Hypothesis (Ha): The average verbal reasoning score of students from the coaching centre is better than the national average verbal reasoning score. (μ > 150)
import scipy.stats as stats
# Given data
mu = 150  # Population average (national)
sigma = 8.5  # Population standard deviation
n = 10  # Sample size
sample_mean = 155  # Sample mean
# Calculate the standard error of the mean (SEM)
sem = sigma / (n**0.5)
# Calculate the Z-score
Z = (sample_mean - mu) / sem
# Calculate the p-value for the right-tailed test
p_value = 1 - stats.norm.cdf(Z)
# Set the significance level (alpha)
alpha = 0.05
# Compare the p-value to the significance level
if p_value < alpha:
    print(f"p-value: {p_value}, Reject the null hypothesis. Hence students from the coaching center are better")
else:
    print(f"p-value: {p_value}, Fail to reject the null hypothesis. Hence students from the coaching center are not better")
Output:
p-value: 0.031431210741779014, Reject the null hypothesis. Hence students from the coaching center are better
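An equivalent view works on the score scale: find the smallest sample mean that would reject H0 at α = 0.05 (right-tailed) and compare 155 with it. A sketch (our addition), using scipy's `norm.ppf` for the critical z:

```python
from scipy.stats import norm

mu = 150            # national average score
sigma = 8.5         # population standard deviation
n = 10              # sample size
sample_mean = 155
alpha = 0.05

sem = sigma / n ** 0.5
critical_z = norm.ppf(1 - alpha)          # right-tailed cut-off, about 1.645
critical_mean = mu + critical_z * sem     # smallest sample mean that rejects H0
print(f"critical sample mean: {critical_mean:.2f}")
print("Reject H0" if sample_mean > critical_mean else "Fail to reject H0")
```

The cut-off is about 154.42, so an average of 155 just clears it, which is consistent with a p-value a little below 0.05.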
Q4. What is the delivery time?
A company claims that
the average time it takes to deliver a product to customers is 3 days.
The company's
delivery process is under scrutiny, and a sample of 25 delivery times is
collected. The sample mean delivery time is 3.5 days, and the population
standard deviation is known to be 0.8 days.
At a 5%
significance level, can we conclude that the average delivery
time is greater
than 3 days?
Conduct a one-sample
Z-test to answer this, and compute the z-score for the observed average
delivery time.
A.
pvalue = 0.000889 ; Reject the null hypothesis; The average delivery time
is greater than 3 days
B.
pvalue = 0.000889 ; Reject the null hypothesis; The average delivery time
is greater than 3 days
C.
Z-score = 3.125
D.
Z-score = 1.5
Correct Options:
- pvalue = 0.000889 ; Reject the null
hypothesis; The average delivery time is greater than 3 days
- Z-score = 3.125
Explanation:
Based on the given problem, we define our hypothesis as:
- H0: The average delivery time is 3 days
- Ha: The average delivery time is greater than
3 days
We can check whether the data give enough evidence to reject the null hypothesis
by conducting a one-sample Z-test.
Code:
from scipy.stats import norm
# Population parameters
population_mean = 3  # Population average (claimed) delivery time
population_stddev = 0.8  # Population standard deviation
# Sample statistics
sample_mean = 3.5  # Observed sample mean delivery time
sample_size = 25
# Calculate the standard error of the sample mean
standard_error = population_stddev / (sample_size ** 0.5)
# Calculate the Z-score
z_score = (sample_mean - population_mean) / standard_error
# Calculate the p-value: right-tailed test
p_value = 1 - norm.cdf(z_score)
# Significance level
alpha = 0.05
# Compare p-value with the significance level
if p_value < alpha:
    print("Reject the null hypothesis. The average delivery time is greater than 3 days")
else:
    print("Fail to reject the null hypothesis. The average delivery time is 3 days")
print(f"Z-score: {z_score}")
print(f"P-value: {p_value}")
Output:
Reject the null hypothesis. The average delivery time is greater than 3 days
Z-score: 3.125
P-value: 0.0008890252991083925
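A right-tailed test at α = 0.05 can also be read off a one-sided 95% lower confidence bound: if the bound sits above 3 days, H0 is rejected. A sketch (our addition), using scipy's `norm.ppf`:

```python
from scipy.stats import norm

population_mean = 3      # claimed average delivery time (days)
population_stddev = 0.8
sample_mean = 3.5
sample_size = 25
alpha = 0.05

standard_error = population_stddev / sample_size ** 0.5
# One-sided 95% lower confidence bound for the true mean delivery time
lower_bound = sample_mean - norm.ppf(1 - alpha) * standard_error
print(f"95% lower bound: {lower_bound:.3f} days")
print("Reject H0" if lower_bound > population_mean else "Fail to reject H0")
```

The bound is about 3.237 days, above the claimed 3 days, which agrees with the very small p-value.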