Z - Test Contd.
Q1. Average hourly wage
The average hourly wage of a sample of 150 workers in plant 'A' was Rs. 2.87 with a standard deviation of Rs. 1.08.
The average wage of a sample of 200 workers in plant 'B' was Rs. 2.56 with a standard deviation of Rs. 1.28.
(i) Calculate the Z-score for this scenario.
(ii) Can an applicant safely assume that the hourly wages paid by plant 'A' are higher than those paid by plant 'B' at a 1% significance level?
A. 1.0071, Hourly wages in plant 'A' are not higher than those in plant 'B'.
B. 1.0071, Hourly wages in plant 'A' are higher than those in plant 'B'.
C. 2.4532, Hourly wages in plant 'A' are not higher than those in plant 'B'.
D. 2.4532, Hourly wages in plant 'A' are higher than those in plant 'B'.
Correct Answer: 2.4532, Hourly wages in plant 'A' are higher than those in plant 'B'.
Explanation:
Based on the given problem, we define our hypotheses as:
- H0: μ1 = μ2, i.e., the mean hourly wage paid by plant 'A' is equal to that paid by plant 'B'.
- H1: μ1 > μ2, i.e., the mean hourly wage paid by plant 'A' is higher than that paid by plant 'B'.
Based on this, we need to perform a right-tailed (one-tailed) test.
Further, we can calculate the Z-score for a two-sample Z-test using the following formula:
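$$Z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$
where,
- $Z$: The z-score, a standard normal variable used to determine the probability of the observed difference between the two samples.
- $\bar{x}_1$: The mean of the first sample.
- $\bar{x}_2$: The mean of the second sample.
- $\sigma_1$: The standard deviation of the first population.
- $\sigma_2$: The standard deviation of the second population.
- $n_1$: The size of the first sample.
- $n_2$: The size of the second sample.
The population standard deviations are not known here, so the sample standard deviations are used in their place, which is reasonable for samples of this size.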
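As a worked check, substituting the figures from the question into this formula (with the sample standard deviations standing in for the population values) gives approximately the following. The intermediate values are rounded.

```latex
Z = \frac{2.87 - 2.56}{\sqrt{\dfrac{1.08^2}{150} + \dfrac{1.28^2}{200}}}
  = \frac{0.31}{\sqrt{0.007776 + 0.008192}}
  = \frac{0.31}{\sqrt{0.015968}}
  \approx \frac{0.31}{0.12636}
  \approx 2.4532
```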
Code:
import numpy as np
from scipy import stats

# Define the function to calculate the two-sample z-test statistic
def TwoSampZTest(samp_mean_1, samp_mean_2, samp_std_1, samp_std_2, n1, n2):
    denominator = np.sqrt((samp_std_1**2 / n1) + (samp_std_2**2 / n2))
    z_score = (samp_mean_1 - samp_mean_2) / denominator
    return z_score

# Given data for plant A
sample_mean_A = 2.87
sample_std_A = 1.08
sample_size_A = 150

# Given data for plant B
sample_mean_B = 2.56
sample_std_B = 1.28
sample_size_B = 200

# Set the significance level
significance_level = 0.01

# Calculate the z-score using the function
z_score = TwoSampZTest(sample_mean_A, sample_mean_B, sample_std_A, sample_std_B, sample_size_A, sample_size_B)

# Calculate the one-tailed (right-tailed) p-value
p_value = 1 - stats.norm.cdf(z_score)

# Compare the p-value to the significance level
if p_value < significance_level:
    conclusion = "Reject the null hypothesis. Hourly wages in plant 'A' are higher than those in plant 'B' at a 1% significance level."
else:
    conclusion = "Fail to reject the null hypothesis. There is no significant evidence that hourly wages in plant 'A' are higher than those in plant 'B' at a 1% significance level."

# Print the results
print(f'z-score: {z_score:.4f}')
print(f'p-value: {p_value:.4f}')
print('Conclusion:', conclusion)
Output:
z-score: 2.4532
p-value: 0.0071
Conclusion: Reject the null hypothesis. Hourly wages in plant ‘A’ are higher
than those in plant ‘B’ at a 1% significance level.
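The same decision can also be reached by comparing the Z-score with a critical value instead of comparing the p-value with the significance level. The snippet below is a minimal sketch of that approach, using only the Python standard library and taking the Z-score from the output above; the variable names are illustrative.

```python
from statistics import NormalDist

# Values taken from the worked solution above
z_score = 2.4532
alpha = 0.01

# Critical value for a right-tailed test: the point with area alpha to its right
z_critical = NormalDist().inv_cdf(1 - alpha)

# Reject H0 when the observed Z-score falls in the right-hand rejection region
reject_null = z_score > z_critical

print(f"critical value: {z_critical:.4f}")
print("Reject H0" if reject_null else "Fail to reject H0")
```

Both routes agree for a given test, because the p-value falls below alpha exactly when the Z-score lies beyond the critical value.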
Q2. Complexity of SQL queries
The Head of the Data Analyst Department is conducting a comparative analysis of the complexity of SQL queries written by two analysts, namely Analyst X and Analyst Y.
He has gathered data on the number of lines of code for each SQL query.
Analyst X's SQL lines of code: [15, 18, 20, 17, 16, 19, 22, 16, 18, 21]
Analyst Y's SQL lines of code: [14, 17, 19, 16, 15, 18, 21, 15, 17, 20]
The department head hypothesizes that Analyst Y writes less complex code than Analyst X. To investigate this hypothesis, conduct an appropriate test at a 90% confidence level.
A. P-value: 0.8345, There is significant evidence that Analyst Y writes less complex code compared to Analyst X.
B. P-value: 0.1654, There is no significant evidence that Analyst Y writes less complex code compared to Analyst X.
C. P-value: 0.1654, There is significant evidence that Analyst Y writes less complex code compared to Analyst X.
D. P-value: 0.8345, There is no significant evidence that Analyst Y writes less complex code compared to Analyst X.
Correct Answer: P-value: 0.1654, There is no significant evidence that Analyst Y writes less complex code compared to Analyst X.
Explanation:
Based on the given question, we define our hypotheses as:
- Null Hypothesis: Analyst Y writes code with the same complexity as Analyst X (μY = μX)
- Alternative Hypothesis: Analyst Y writes less complex code than Analyst X (μY < μX)
Hence, we would have to conduct a left-tailed two-sample Z-test.
Code:
from statsmodels.stats import weightstats as stests

# Number of lines of code for SQL queries by Analyst X
sql_lines_X = [15, 18, 20, 17, 16, 19, 22, 16, 18, 21]

# Number of lines of code for SQL queries by Analyst Y
sql_lines_Y = [14, 17, 19, 16, 15, 18, 21, 15, 17, 20]

# Perform a left-tailed two-sample Z-test (H1: mean of Y < mean of X)
z_score, p_value = stests.ztest(sql_lines_Y, sql_lines_X, alternative='smaller')

# Confidence level and the corresponding significance level
confidence_level = 0.90
alpha = 1 - confidence_level

# Print the results
print(f"Z-score: {z_score}")
print(f"P-value: {p_value}")

# Decision Rule
if p_value < alpha:
    print("Reject the null hypothesis. There is significant evidence that Analyst Y writes less complex code compared to Analyst X.")
else:
    print("Fail to reject the null hypothesis. There is no significant evidence that Analyst Y writes less complex code compared to Analyst X.")
Output:
Z-score: -0.9723055853282467
P-value: 0.1654492730143623
Fail to reject the null hypothesis. There is no significant evidence that
Analyst Y writes less complex code compared to Analyst X.
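To see where this Z-score comes from, the statistic can be recomputed by hand. The sketch below assumes that `ztest` pools the two sample variances, which is its default behaviour as far as the statsmodels documentation describes it, and uses only the standard library. The names `z_manual` and `p_manual` are illustrative.

```python
from math import sqrt
from statistics import NormalDist, mean, variance

sql_lines_X = [15, 18, 20, 17, 16, 19, 22, 16, 18, 21]
sql_lines_Y = [14, 17, 19, 16, 15, 18, 21, 15, 17, 20]
n_x, n_y = len(sql_lines_X), len(sql_lines_Y)

# Pooled sample variance (statistics.variance uses the n - 1 denominator)
pooled_var = ((n_x - 1) * variance(sql_lines_X)
              + (n_y - 1) * variance(sql_lines_Y)) / (n_x + n_y - 2)

# Standard error of the difference between the two sample means
std_error = sqrt(pooled_var * (1 / n_x + 1 / n_y))

# Z-score for mean(Y) - mean(X), and the left-tailed p-value
z_manual = (mean(sql_lines_Y) - mean(sql_lines_X)) / std_error
p_manual = NormalDist().cdf(z_manual)

print(f"Z-score: {z_manual:.4f}")
print(f"P-value: {p_manual:.4f}")
```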
Q3. Rice and Wheat
Out of a sample of 1,000 people residing in Maharashtra, 540 are rice eaters, while the rest consume wheat primarily.
Can we assume that rice and wheat are equally popular in this state at a 5% significance level?
A. P-value: 0.01115, Rice and wheat are not equally popular
B. P-value: 0.02149, Rice and wheat are equally popular
C. P-value: 0.02149, Rice and wheat are not equally popular
D. P-value: 0.01115, Rice and wheat are equally popular
Correct option: P-value: 0.01115, Rice and wheat are not equally popular
Explanation:
import statsmodels.api as sm

# H0: Rice and wheat are equally popular in the state (P = 0.5)
# Ha: Rice and wheat are not equally popular in the state (P ≠ 0.5); two-tailed test

# Given data
total_population = 1000  # sample size
rice_eaters = 540
wheat_eaters = total_population - rice_eaters
assumed_proportion = 0.5  # Assuming equal popularity of rice and wheat

# Hypothesis test
z_stat, p_value = sm.stats.proportions_ztest(rice_eaters, total_population, assumed_proportion, alternative='two-sided')
print("Z-statistic:", z_stat)
print("P-value:", p_value)

alpha = 0.05
if p_value < alpha:
    print("Reject the null hypothesis. Rice and wheat are not equally popular in Maharashtra at a 5% significance level.")
else:
    print("Fail to reject the null hypothesis. There is no significant difference in the popularity of rice and wheat in Maharashtra at a 5% significance level.")
Output:
Z-statistic: 2.537956625422939
P-value: 0.011150180283180655
Reject the null hypothesis. Rice and wheat are not equally popular in
Maharashtra at a 5% significance level.
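The Z-statistic above corresponds to a standard error built from the sample proportion (0.54), not from the hypothesised proportion (0.5) used in the textbook formula; statsmodels documents a `prop_var` argument for choosing the latter. The sketch below computes both versions by hand with the standard library so the difference is visible. The helper `two_sided_p` is a hypothetical name introduced for illustration.

```python
from math import sqrt
from statistics import NormalDist

n, successes, p0 = 1000, 540, 0.5
p_hat = successes / n

def two_sided_p(z):
    """Two-tailed p-value for a standard normal test statistic."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Standard error from the sample proportion (matches the output above)
z_sample_se = (p_hat - p0) / sqrt(p_hat * (1 - p_hat) / n)

# Standard error from the hypothesised proportion (textbook form)
z_null_se = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)

print(f"sample-proportion SE: z = {z_sample_se:.4f}, p = {two_sided_p(z_sample_se):.5f}")
print(f"null-proportion SE:   z = {z_null_se:.4f}, p = {two_sided_p(z_null_se):.5f}")
```

For this data both versions give a p-value below 0.05, so the conclusion does not depend on the choice.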
Q4. Environmental protection bill
A state senator cannot decide how to vote on an environmental protection bill.
The senator decides to request a survey; if the proportion of registered voters supporting the bill exceeds 0.60, she will vote for it.
A random sample of 750 voters is selected, and 495 are found to support the bill.
Conduct an appropriate test at a 90% confidence level.
A. P-value: 0.00039, There is no evidence to suggest that the proportion of registered voters supporting the bill is greater than 0.60
B. P-value: 0.00039, There is evidence to suggest that the proportion of registered voters supporting the bill is greater than 0.60
C. P-value: 0.999, There is no evidence to suggest that the proportion of registered voters supporting the bill is greater than 0.60
D. P-value: 0.999, There is evidence to suggest that the proportion of registered voters supporting the bill is greater than 0.60
Correct Option: P-value: 0.00039, There is evidence to suggest that the proportion of registered voters supporting the bill is greater than 0.60
Explanation:
Based on the given problem, we define our hypotheses as:
Null Hypothesis: The proportion of registered voters supporting the bill is less than or equal to 0.60 (p ≤ 0.60)
Alternative Hypothesis: The proportion of registered voters supporting the bill is greater than 0.60 (p > 0.60)
Hence, we need to perform a right-tailed Z proportion test.
We can solve this problem using the following code:
Code:
import scipy.stats as stats
import math

# Given data
sample_size = 750
observed_support = 495
hypothesized_proportion = 0.60
confidence_level = 0.90

# Calculate the sample proportion
sample_proportion = observed_support / sample_size

# Calculate the standard error under the null hypothesis
standard_error = math.sqrt((hypothesized_proportion * (1 - hypothesized_proportion)) / sample_size)

# Calculate the Z-score
z_stat = (sample_proportion - hypothesized_proportion) / standard_error

# Calculate the p-value for a right-tailed test
p_value = 1 - stats.norm.cdf(z_stat)

print("Z-statistic:", z_stat)
print("P-value:", p_value)

alpha = 1 - confidence_level
if p_value < alpha:
    print("Reject the null hypothesis. There is evidence to suggest that the proportion of registered voters supporting the bill is greater than 0.60.")
else:
    print("Fail to reject the null hypothesis. There is no evidence to suggest that the proportion of registered voters supporting the bill is greater than 0.60.")
Output:
Z-statistic: 3.354101966249688
P-value: 0.0003981150787953913
Reject the null hypothesis. There is evidence to suggest that the proportion of registered voters supporting the bill is greater than 0.60.
Q5. Find the Hypotheses
A fair coin should land showing tails with a relative frequency of 50% in a long series of flips.
John was told by a friend that spinning a coin on a flat surface, rather than flipping it, would not be fair: spinning would bias the coin towards tails.
To test this claim, he spun his own penny 100 times. The penny showed tails in 60% of the spins.
Let p represent the proportion of spins in which this penny would land showing tails.
What are appropriate hypotheses for John's significance test?
A. Null: p = 50%, Alternative: p > 50%
B. Null: p = 50%, Alternative: p > 50%
C. Null: p = 60%, Alternative: p > 60%
D. Null: p = 60%, Alternative: p < 60%
Correct option: Null: p = 50%, Alternative: p > 50%
Explanation:
Null Hypothesis (H0):
- The null hypothesis represents the assumption that there is no difference from the expected proportion of tails for a fair coin,
- i.e. H0: p = 50%
Alternative Hypothesis (H1):
- The alternative hypothesis expresses the claim being tested, which is that spinning the penny makes it more likely to land showing tails, implying that the proportion of tails is greater than 50%,
- i.e. H1: p > 50%
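Although the question only asks for the hypotheses, they can be carried through to a test. The following is a minimal sketch of the right-tailed one-sample Z proportion test for John's data, using the standard library. The 5% significance level is an assumption made for illustration, since the question does not specify one.

```python
from math import sqrt
from statistics import NormalDist

spins = 100
tails = 60            # 60% of 100 spins showed tails
p0 = 0.50             # proportion under H0 (fair outcome)
alpha = 0.05          # assumed significance level, not given in the question

p_hat = tails / spins

# Standard error under the null hypothesis
std_error = sqrt(p0 * (1 - p0) / spins)

# Right-tailed test because H1 is p > 50%
z_stat = (p_hat - p0) / std_error
p_value = 1 - NormalDist().cdf(z_stat)

print(f"Z-statistic: {z_stat:.4f}")
print(f"P-value: {p_value:.4f}")
print("Reject H0" if p_value < alpha else "Fail to reject H0")
```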
Q6. Quidditch teams
The Quidditch teams at Hogwarts conducted tryouts for two
positions: Chasers and Seekers.
In Group Chasers, out of 90 students who tried out, 57
were selected. In Group Seekers, out of 120 students who tried out, 98 were
selected.
Is there a significant difference in the proportion of
students selected for Chasers and Seekers positions?
Conduct a test at 90% confidence level.
A. P-value: 0.00278, There is a significant difference in the proportion of students selected for Chasers and Seekers positions.
B. P-value: 0.00278, There is no significant difference in the proportion of students selected for Chasers and Seekers positions.
C. P-value: 0.00461, There is a significant difference in the proportion of students selected for Chasers and Seekers positions.
D. P-value: 0.00461, There is no significant difference in the proportion of students selected for Chasers and Seekers positions.
Correct Answer: P-value: 0.00278, There is a significant difference in the proportion of students selected for Chasers and Seekers positions.
Explanation:
# Null Hypothesis: There is no significant difference in the proportion of students selected for Chasers and Seekers positions at Hogwarts.
# Alternative Hypothesis: There is a significant difference in the proportion of students selected for Chasers and Seekers positions at Hogwarts.
import statsmodels.api as sm
# Data for Chasers
selected_chasers = 57
total_chasers = 90
# Data for Seekers
selected_seekers = 98
total_seekers = 120
# Perform two-sample Z-proportion test
z_stat, p_value = sm.stats.proportions_ztest([selected_chasers, selected_seekers], [total_chasers, total_seekers], alternative = 'two-sided')
# Confidence level
confidence_level = 0.90
# Significance level corresponding to the confidence level
alpha = 1 - confidence_level
# Print the results
print(f"Z-statistic: {z_stat}")
print(f"P-value: {p_value}")
# Decision Rule
if p_value < alpha:
    print("Reject the null hypothesis. There is a significant difference in the proportion of students selected for Chasers and Seekers positions.")
else:
    print("Fail to reject the null hypothesis. There is no significant difference in the proportion of students selected for Chasers and Seekers positions.")
Output:
Z-statistic: -2.990306921349541
P-value: 0.002786972588958094
Reject the null hypothesis. There is a significant difference in the proportion
of students selected for Chasers and Seekers positions.
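The Z-statistic above can be reproduced by hand, which makes the pooling step explicit. This is a sketch assuming the usual pooled-proportion form of the two-sample test, written with the standard library only; `z_manual` and `p_manual` are illustrative names.

```python
from math import sqrt
from statistics import NormalDist

selected_chasers, total_chasers = 57, 90
selected_seekers, total_seekers = 98, 120

p_chasers = selected_chasers / total_chasers
p_seekers = selected_seekers / total_seekers

# Under H0 both groups share one proportion, estimated by pooling the counts
pooled = (selected_chasers + selected_seekers) / (total_chasers + total_seekers)
std_error = sqrt(pooled * (1 - pooled) * (1 / total_chasers + 1 / total_seekers))

# Two-tailed test on the difference in proportions
z_manual = (p_chasers - p_seekers) / std_error
p_manual = 2 * (1 - NormalDist().cdf(abs(z_manual)))

print(f"Z-statistic: {z_manual:.4f}")
print(f"P-value: {p_manual:.6f}")
```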
Q7. Best Season of Naruto
As a product manager, you want to evaluate the user
satisfaction for two different seasons of Naruto Shippuden (Season 1 and Season
2).
You collected feedback from 250 viewers who watched Season
1 of Naruto Shippuden, and 120 expressed satisfaction. Similarly, for Season 2,
you gathered data from 300 viewers, and 150 of them expressed satisfaction.
Conduct an appropriate test at a 95% confidence level to determine if there's a higher user satisfaction for Season 2 than for Season 1.
A. P-value: 0.3202. There is no significant evidence of higher user satisfaction for Season 2.
B. P-value: 0.3202. There is significant evidence of higher user satisfaction for Season 2.
C. P-value: 0.6798. There is no significant evidence of higher user satisfaction for Season 2.
D. P-value: 0.6798. There is significant evidence of higher user satisfaction for Season 2.
Correct Option: P-value: 0.3202. There is no significant evidence of higher user satisfaction for Season 2.
Explanation:
Based on the given problem, we define our hypotheses as:
- Null Hypothesis: The proportion of satisfied users for Season 2 is equal to or less than the proportion for Season 1.
- Alternative Hypothesis: The proportion of satisfied users for Season 2 is higher than the proportion for Season 1.
We can solve this problem using a right-tailed two-sample Z proportion test.
Code:
from statsmodels.stats.proportion import proportions_ztest

# Given data
n_season1, x_season1 = 250, 120
n_season2, x_season2 = 300, 150

# Perform a right-tailed z-test for proportions (H1: Season 2 > Season 1)
z_stat, p_value = proportions_ztest(count=[x_season2, x_season1], nobs=[n_season2, n_season1], alternative='larger')

# Display results
print(f"Z-statistic: {z_stat:.4f}")
print(f"P-value: {p_value:.4f}")

# Compare the p-value with the significance level (95% confidence level)
alpha = 0.05
if p_value < alpha:
    print("Reject the null hypothesis. There is evidence of higher user satisfaction for Season 2 than Season 1.")
else:
    print("Fail to reject the null hypothesis. There is no significant evidence of higher user satisfaction for Season 2.")
Output:
Z-statistic: 0.4672
P-value: 0.3202
Fail to reject the null hypothesis. There is no significant evidence of higher
user satisfaction for Season 2.
Q8. Assess Customer Satisfaction
A company is running a survey to assess customer satisfaction with two different support approaches.
The company collects feedback from customers subjected to each approach and wants to compare the proportion of satisfied customers.
Which statistical test would be most appropriate for the company to compare the proportion of satisfied customers between the two support approaches, and what would be the relevant null hypothesis?
A. One-sample z-test for mean, H0: The proportion of satisfied customers is different for the two customer support approaches.
B. Two-sample z-test for mean, H0: The proportion of satisfied customers is the same for both customer support approaches.
C. One-sample z-proportion test, H0: The proportion of satisfied customers is different for the two customer support approaches.
D. Two-sample z-proportion test, H0: The proportion of satisfied customers is the same for both customer support approaches.
Correct Answer: Two-sample z-proportion test, H0: The proportion of satisfied customers is the same for both customer support approaches.
Explanation:
In this scenario, the company is comparing the proportion of satisfied customers between two different groups (support approaches). Therefore, we need a statistical test that compares the proportions between two independent samples.
- One-sample z-test for mean: This is not suitable as it compares the mean of a single sample to a known mean.
- Two-sample z-test for mean: This is not applicable as we are dealing with proportions, not means.
- One-sample z-proportion test: This is only suitable for comparing the proportion of a single sample to a known proportion.
- Two-sample z-proportion test: This is the best option as it specifically compares the proportions of two independent samples.
Null Hypothesis (H0): The proportion of satisfied customers is the same for both customer support approaches.
Alternative Hypothesis (H1): The proportion of satisfied customers is different for the two customer support approaches.
By performing a two-sample z-proportion test, the company can statistically assess whether the observed difference in customer satisfaction between the two support approaches is simply due to chance or reflects a real difference in the effectiveness of the approaches.
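A two-sample z-proportion test of this kind can be sketched as a small reusable function. The version below is illustrative only: the function name `two_proportion_ztest`, its `alternative` argument, and the survey counts are all hypothetical, since the question gives no data. It relies on the standard library alone.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_ztest(x1, n1, x2, n2, alternative="two-sided"):
    """Pooled two-sample z-test for H0: p1 == p2.

    x1, x2 are the counts of satisfied customers; n1, n2 are the sample sizes.
    """
    p1, p2 = x1 / n1, x2 / n2
    # Under H0 both groups share one proportion, estimated by pooling
    pooled = (x1 + x2) / (n1 + n2)
    std_error = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / std_error
    cdf = NormalDist().cdf
    if alternative == "two-sided":      # H1: p1 != p2
        p_value = 2 * (1 - cdf(abs(z)))
    elif alternative == "larger":       # H1: p1 > p2
        p_value = 1 - cdf(z)
    elif alternative == "smaller":      # H1: p1 < p2
        p_value = cdf(z)
    else:
        raise ValueError("alternative must be 'two-sided', 'larger' or 'smaller'")
    return z, p_value

# Hypothetical survey counts (not from the question):
# approach 1: 160 of 200 satisfied, approach 2: 140 of 200 satisfied
z_stat, p_value = two_proportion_ztest(160, 200, 140, 200)
print(f"Z-statistic: {z_stat:.4f}, P-value: {p_value:.4f}")
```

The two-sided alternative matches H1 as stated above; the one-sided options are included only because other questions in this set use them.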
Q1. Server A or B?
An IT team is comparing the response times of two different web servers, Server A and Server B, under a specific load. They have collected response time data for a sample of requests.
- Server A: Mean response time of 120 milliseconds from 30 requests, with a standard deviation of 15 milliseconds.
- Server B: Mean response time of 110 milliseconds from 35 requests, with a standard deviation of 12 milliseconds.
Conduct an appropriate test to determine if there is a significant difference in the mean response times between the two servers. Assume a 5% significance level.
A. p-value: 0.0017, There is a significant difference in the mean response times between Server A and Server B.
B. p-value: 0.0017, There is no significant difference in the mean response times between Server A and Server B.
C. p-value: 0.0033, There is a significant difference in the mean response times between Server A and Server B.
D. p-value: 0.0033, There is no significant difference in the mean response times between Server A and Server B.
Correct Option: p-value: 0.0033, There is a significant difference in the mean response times between Server A and Server B.
Explanation:
Based on the given problem, we define our hypotheses as:
- Null Hypothesis: The mean response time of Server A is equal to the mean response time of Server B (μA = μB).
- Alternative Hypothesis: The mean response time of Server A is not equal to the mean response time of Server B (μA ≠ μB).
We can solve this problem using a two-tailed two-sample Z-test.
Code:
import numpy as np
from scipy import stats

# Define the function to calculate the two-sample Z-test statistic
def TwoSampZTest(samp_mean_1, samp_mean_2, samp_std_1, samp_std_2, n1, n2):
    # Calculate the test statistic
    denominator = np.sqrt((samp_std_1**2 / n1) + (samp_std_2**2 / n2))
    z_score = (samp_mean_1 - samp_mean_2) / denominator
    return z_score

# Given data for Server A
mean_A = 120
std_dev_A = 15
sample_size_A = 30

# Given data for Server B
mean_B = 110
std_dev_B = 12
sample_size_B = 35

# Significance level
significance_level = 0.05

# Calculate the z-score using the function
z_score = TwoSampZTest(mean_A, mean_B, std_dev_A, std_dev_B, sample_size_A, sample_size_B)

# Calculate the two-tailed p-value
p_value = 2 * (1 - stats.norm.cdf(abs(z_score)))

# Compare the p-value to the significance level
if p_value < significance_level:
    conclusion = "Reject the null hypothesis. There is a significant difference in the mean response times between Server A and Server B."
else:
    conclusion = "Fail to reject the null hypothesis. There is no significant difference in the mean response times between Server A and Server B."

# Print the results
print(f'z-score: {z_score:.4f}')
print(f'p-value: {p_value:.4f}')
print('Conclusion:', conclusion)
Output:
z-score: 2.9343
p-value: 0.0033
Conclusion: Reject the null hypothesis. There is a significant difference in
the mean response times between Server A and Server B.
Q2. Bullseye
A group of archers claims that they can hit the bullseye
with a success rate of 70%. To test this claim, a random sample of 100 shots is
taken, and 65 of them hit the bullseye.
Is there significant evidence to suggest that the archers' actual success rate is greater than 70% at a 95% confidence level?
A. P-value: 0.1376, There is no significant evidence to suggest that the archers' actual success rate is greater than 70%
B. P-value: 0.8623, There is no significant evidence to suggest that the archers' actual success rate is greater than 70%.
C. P-value: 0.1376, There is no significant evidence to suggest that the archers' actual success rate is greater than 70%.
D. P-value: 0.8623, There is significant evidence to suggest that the archers' actual success rate is greater than 70%.
Correct Answer: P-value: 0.8623, There is no significant evidence to suggest that the archers' actual success rate is greater than 70%.
Explanation:
Based on the given question, we define our hypotheses as:
- Null Hypothesis: The success rate of this group of archers is 70%, i.e. p = 0.7
- Alternative Hypothesis: The success rate of this group of archers is greater than 70%, i.e. p > 0.7
Code:
import numpy as np
import scipy.stats as stats

# Data
successes = 65               # Number of successful shots
total_shots = 100            # Total number of shots
claimed_success_rate = 0.70  # Claimed success rate by the archers

# Calculate the sample proportion
sample_proportion = successes / total_shots

# Calculate the standard error under the null hypothesis
standard_error = np.sqrt((claimed_success_rate * (1 - claimed_success_rate)) / total_shots)

# Calculate the Z-score
z_stat = (sample_proportion - claimed_success_rate) / standard_error

# Calculate the p-value for a right-tailed test
p_value = 1 - stats.norm.cdf(z_stat)

print("Z-statistic:", z_stat)
print(f"P-value: {p_value}")

# Confidence level and the corresponding significance level
confidence_level = 0.95
alpha = 1 - confidence_level

if p_value < alpha:
    print("Reject the null hypothesis. There is significant evidence to suggest that the archers' actual success rate is greater than 70%.")
else:
    print("Fail to reject the null hypothesis. There is no significant evidence to suggest that the archers' actual success rate is greater than 70%.")
Output:
Z-statistic: -1.0910894511799603
P-value: 0.8623832379625824
Fail to reject the null hypothesis. There is no significant evidence to suggest
that the archers’ actual success rate is greater than 70%.
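The p-value is large here because the sample proportion (0.65) lies below the hypothesised 0.70, so the Z-statistic is negative and most of the normal curve sits to its right. The short sketch below, using only the standard library, shows the two tail areas side by side; the left-tail area is the value that appears in the distractor options.

```python
from math import sqrt
from statistics import NormalDist

successes, total_shots, p0 = 65, 100, 0.70
p_hat = successes / total_shots

# Z-statistic with the standard error taken under the null hypothesis
z_stat = (p_hat - p0) / sqrt(p0 * (1 - p0) / total_shots)

# Right tail answers "is p greater than 0.70?"; left tail would answer "is p less than 0.70?"
p_right = 1 - NormalDist().cdf(z_stat)
p_left = NormalDist().cdf(z_stat)

print(f"Z-statistic: {z_stat:.4f}")
print(f"right-tailed p-value: {p_right:.4f}")
print(f"left-tailed p-value:  {p_left:.4f}")
```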
Q3. Are they comparable
You are testing two drugs as a remedy. Drug A is effective in 41 out of a sample of 195 people. Drug B works on 351 out of 605 people.
Are the two drugs comparable in terms of effectiveness? Perform an appropriate test at a 5% significance level.
A. P-value: 2.566e-19, the proportions of Drug A and Drug B are significantly different.
B. P-value: 2.566e-19, no significant difference in the proportions of Drug A and Drug B
C. P-value: 1.896e-9, the proportions of Drug A and Drug B are significantly different.
D. P-value: 1.896e-9, no significant difference in the proportions of Drug A and Drug B
Correct option: P-value: 2.566e-19, the proportions of Drug A and Drug B are significantly different.
Explanation:
import numpy as np
import statsmodels.api as sm
# H0: The proportions are the same.
# H1: The proportions are different.
# Given data for Drug A
success_A = 41
sample_size_A = 195
# Given data for Drug B
success_B = 351
sample_size_B = 605
# Perform the two-proportion Z-test
z_stat, p_value = sm.stats.proportions_ztest([success_A, success_B], [sample_size_A, sample_size_B], alternative='two-sided')
# Significance level
alpha = 0.05
# Print the results
print(f"Z-statistic: {z_stat}")
print(f"P-value: {p_value}")
# Decision Rule
if p_value < alpha:
    print("Reject the null hypothesis. The proportions of Drug A and Drug B are significantly different.")
else:
    print("Fail to reject the null hypothesis. There is no significant difference in the proportions of Drug A and Drug B.")
Output:
Z-statistic: -8.985900954503084
P-value: 2.566230446480293e-19
Reject the null hypothesis. The proportions of Drug A and Drug B are
significantly different.
Q4. Post Engagement Rate
As a social media analyst, you want to compare the
engagement rates of posts from two different accounts (Account X and Account
Y).
You collected data on 180 posts from Account X, where 40
received high engagement. Similarly, you collect data on 200 posts from Account
Y, where 60 received high engagement.
Conduct an appropriate test at a 95% confidence level to determine if there's a significant difference in high engagement proportions between the two accounts.
A. P-value: 0.042. There is a significant difference in high engagement proportions between Account X and Account Y.
B. P-value: 0.042. There is no significant difference in high engagement proportions between Account X and Account Y.
C. P-value: 0.085. There is a significant difference in high engagement proportions between Account X and Account Y.
D. P-value: 0.085. There is no significant difference in high engagement proportions between Account X and Account Y.
Correct Option: P-value: 0.085. There is no significant difference in high engagement proportions between Account X and Account Y.
Explanation:
Based on the given problem, we define our hypotheses as:
- Null Hypothesis: The proportion of posts with high engagement is the same for Account X and Account Y.
- Alternative Hypothesis: The proportion of posts with high engagement is different for Account X and Account Y.
We can solve this using a two-tailed two-sample Z proportion test.
Code:
import numpy as np
from statsmodels.stats.proportion import proportions_ztest

# Given data
posts_X = 180
high_engagement_X = 40
posts_Y = 200
high_engagement_Y = 60

# Calculate sample proportions (for reference)
p_X = high_engagement_X / posts_X
p_Y = high_engagement_Y / posts_Y

# Conduct a two-sample z-proportion test
count = np.array([high_engagement_X, high_engagement_Y])
nobs = np.array([posts_X, posts_Y])
z_stat, p_value = proportions_ztest(count, nobs, alternative='two-sided')

# Display results
print("Z-statistic:", z_stat)
print("P-value:", p_value)

# Check for significance
alpha = 0.05
if p_value < alpha:
    print("Reject the null hypothesis. There is a significant difference in high engagement proportions between Account X and Account Y.")
else:
    print("Fail to reject the null hypothesis. There is no significant difference in high engagement proportions between Account X and Account Y.")
Output:
Z-statistic: -1.7191729277636834
P-value: 0.08558288874449103
Fail to reject the null hypothesis. There is no significant difference in high
engagement proportions between Account X and Account Y.