Machine Learning - Deep Learning: -Z-test

Z-test

Q1. Short state

A country has a population average height of 65 inches with a standard deviation of 2.5. A person feels people from his state are shorter. He takes the average of 20 people and sees that it is 64.5.

At a 5% significance level (or 95% confidence level), can we conclude that people from his state are shorter, using the Z-test? What is the p-value?

A. p-value = 0.186, and hence people from the state are shorter

B. p-value = 0.815, and hence people from the state are shorter

C. p-value = 0.186, and hence people from the state are not shorter

D. p-value = 0.815, and hence people from the state are not shorter

Correct Answer: p-value = 0.186, and hence people from the state are not shorter

Explanation:

# Null Hypothesis (H0): People from the person's state have a height equal to the national average height (i.e., μ = 65).

# Alternative Hypothesis (Ha): People from the person's state are shorter than the national average height (i.e., μ < 65).

import scipy.stats as stats

# Population parameters

population_mean = 65 # Population average height

population_stddev = 2.5 # Population standard deviation

# Sample statistics

sample_mean = 64.5

sample_size = 20 # Average of 20 people

# Calculate the standard error of the sample mean

standard_error = population_stddev / (sample_size ** 0.5)

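# Calculate the Z-score

z_score = (sample_mean - population_mean) / standard_error

# Calculate the p-value for a left-tailed test (Ha: μ < 65)

p_value = stats.norm.cdf(z_score)

print(f"Z-score: {z_score:.3f}, p-value: {p_value:.3f}")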

Q2. Pastries Produced Per Day

A French cafe has historically maintained that their average daily pastry production is at most 500.

With the installation of a new machine, they assert that the average daily pastry production has increased. The average number of pastries produced per day over a 70-day period was found to be 530.

Assume that the population standard deviation for the pastries produced per day is 125.

Perform a z-test with the critical z-value = 1.64 at the alpha (significance level) = 0.05 to evaluate if there's sufficient evidence to support their claim of the new machine producing more than 500 pastries daily.

Note: Round off the z-score to two decimal places.

A. The computed z-score is 0.24, and since 0.24 is less than 1.64, the null hypothesis cannot be rejected.

B. The computed z-score is 0.24, and since 0.24 is less than 1.64, the null hypothesis is rejected.

C. The computed z-score is 2.01, and since 1.64 is less than 2.01, the null hypothesis cannot be rejected.

D. The computed z-score is 2.01, and since 1.64 is less than 2.01, the null hypothesis is rejected.

Correct option: The computed z-score is 2.01, and since 1.64 is less than 2.01, the null hypothesis is rejected.



Explanation:



The null and alternate hypotheses are:



Null hypothesis (H₀): the average number of pastries produced per day is less than or equal to 500 (μ≤500)

Alternate hypothesis (Ha): the average number exceeds 500 (μ>500)



Approach 1:



z-score = (sample mean − population mean) / SEM

where SEM = standard error of the mean = population std / √(sample size)

SEM = 125 / √70 = 14.940

Therefore, z-score = (530 − 500) / 14.940 = 2.01



Also, a critical z-score for the given significance level (0.05) can be obtained using the following code:

from scipy.stats import norm

norm.ppf(0.95)

Thus, we get the critical z-value = 1.64

Since the observed z-score (2.01) > critical z-value (1.64), we reject the null hypothesis for this right-tailed test.
Thus, we conclude that the average number exceeds 500.

Approach 2:

# Import necessary libraries

import numpy as np

import scipy.stats as stats



# Define sample mean, standard deviation, and sample size

sample_mean = 530

population_std = 125

sample_size = 70

population_mean = 500



# Calculate z-score

z_score = (sample_mean - population_mean) / (population_std / np.sqrt(sample_size))



# Round z-score to two decimal places

z_score = round(z_score, 2)

print(f"z-score: {z_score}")



# Set critical z-value and confidence level

confidence_level = 0.95

critical_z = stats.norm.ppf(confidence_level)

print("critical z-value:",critical_z)



# Check if the z-score is greater than the critical z-value



if z_score > critical_z:

    print("Reject the null hypothesis. The shop's claim is supported by the data.")

else:

    print("Fail to reject the null hypothesis. There is not enough evidence to support the shop's claim.")

Output:

z-score: 2.01
critical z-value: 1.6448536269514722
Reject the null hypothesis. The shop’s claim is supported by the data.
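
The same decision can also be reached with a p-value instead of a critical value. The sketch below is a minimal illustration using the numbers from this question; the variable names are chosen here for illustration, and `norm.sf` is SciPy's survival function (1 − CDF).

```python
import math
from scipy.stats import norm

# Values from the pastry question
sample_mean = 530
population_mean = 500
population_std = 125
sample_size = 70
alpha = 0.05

# z-score of the observed sample mean
z_score = (sample_mean - population_mean) / (population_std / math.sqrt(sample_size))

# Right-tailed p-value: probability of a z-score at least this large under H0
p_value = norm.sf(z_score)

# Rejecting when p_value < alpha is equivalent to z_score > norm.ppf(1 - alpha)
reject_null = p_value < alpha

print(f"z-score: {z_score:.2f}, p-value: {p_value:.4f}, reject H0: {reject_null}")
```

Both routes should always agree for a right-tailed test, since the p-value drops below alpha exactly when the z-score passes the critical value.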

Q3. India runs on Chai

The Chai Point stall at Bengaluru airport estimates that each person visiting the store drinks an average of 1.7 small cups of tea.

Assume a population standard deviation of 0.5 small cups. A sample of 30 customers collected over a few days averaged 1.85 small cups of tea per person.

Test the claim using an appropriate test at an alpha = 0.05 significance value, with a critical z-score value of ±1.96.

Note: Round off the z-score to two decimal places.

A. The computed z-score is 1.64, and since 1.64 is less than 1.96, the null hypothesis cannot be rejected.

B. The computed z-score is 1.64, and since 1.64 is less than 1.96, the null hypothesis is rejected.

C. The computed z-score is 2.33, and since 1.96 is less than 2.33, the null hypothesis cannot be rejected.

D. The computed z-score is 2.33, and since 1.96 is less than 2.33, the null hypothesis is rejected.

Correct option: The computed z-score is 1.64, and since 1.64 is less than 1.96, the null hypothesis cannot be rejected.

Explanation:

The null and alternate hypotheses are:

Null hypothesis (H₀): the average number of small cups of tea per customer is equal to 1.7 (μ=1.7)
Alternate hypothesis (Ha): the average number is either greater than or less than 1.7 (μ≠1.7)

Approach 1:

z-score = (sample mean − population mean) / SEM

where SEM = standard error of the mean = population std / √(sample size)

SEM = 0.5 / √30 = 0.0913

Therefore, z-score = (1.85 − 1.7) / 0.0913 = 1.64

Since |1.64| < 1.96, we fail to reject the null hypothesis.
Thus, there is not enough evidence to conclude that the average number of small cups of tea per customer differs from 1.7.

Approach 2:

# Import necessary library

import scipy.stats as stats

import numpy as np



# Define sample mean, standard deviation, and sample size

sample_mean = 1.85

population_std = 0.5

sample_size = 30

population_mean = 1.7



# Calculate z-score

z_score = (sample_mean - population_mean) / (population_std / np.sqrt(sample_size))



# Round z-score to two decimal places

z_score = np.round(z_score, 2)



# Set alpha and critical z-score (use two-tailed since direction is unknown)

alpha = 0.05

critical_z = 1.96



# Check if the absolute z-score is greater than the critical z-value (two-tailed test)

if abs(z_score) > critical_z:

   print(f"z-score: {z_score}")

   print("Reject the null hypothesis. The average tea consumption is likely different from the estimate.")

else:

   print(f"z-score: {z_score}")

   print("Fail to reject the null hypothesis. There is not enough evidence to support a difference from the estimated average.")

Output:

z-score: 1.64
Fail to reject the null hypothesis. There is not enough evidence to support a difference from the estimated average.

Q4. Web Application Response

A data scientist is looking at how a web application responds, with an average response time of 250 milliseconds and a standard deviation of 30 milliseconds.

Find the critical value for a 96% confidence level.

A. 311.6125

B. 278.5203

C. 412.8915

D. 262.6293

Correct Option: 311.6125

Explanation:
We can calculate the critical value for a 96% confidence level, using Python code as shown below:

import scipy.stats as stats

# Given values

confidence_level = 0.96

mean = 250  # Mean response time in milliseconds

std_deviation = 30  # Standard deviation in milliseconds

# A two-tailed test, considering both possibilities: the average response time could be higher or lower than 250 milliseconds.

# Calculate the critical Z-score for a 96% confidence level

critical_z = stats.norm.ppf(1 - (1 - confidence_level) / 2)

# Calculate the critical value using Z-score formula

critical_value = (critical_z * std_deviation) + mean

print(f"Critical Value: {critical_value:.4f}")

Output:
Critical Value: 311.6125
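
Because the test is two-tailed, there is also a lower critical value on the other side of the mean. A minimal sketch, assuming the response times are normally distributed, uses `scipy.stats.norm.interval` to get both bounds at once:

```python
from scipy.stats import norm

mean = 250           # Mean response time in milliseconds
std_deviation = 30   # Standard deviation in milliseconds
confidence_level = 0.96

# Central interval containing 96% of a Normal(250, 30) distribution
lower, upper = norm.interval(confidence_level, loc=mean, scale=std_deviation)

print(f"Lower critical value: {lower:.4f}")
print(f"Upper critical value: {upper:.4f}")
```

The upper bound matches the answer above, and the lower bound sits the same distance below the mean.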

Q5. CI and Conclusion

A marketing team aims to estimate the average time visitors spend on their website.

They gathered a random sample of 100 visitors and determined that the average time spent on the website was 4.5 minutes.

The team is working under the assumption that the population's mean time spent on the website is 4.0 minutes, with a standard deviation of 1.2 minutes.

Their goal is to estimate the true time spent on the website with a 95% confidence level. Calculate the confidence interval values and make a conclusion based on the calculated interval.

A. (3.892,4.287), The average time spent on the website is not 4.0 minutes

B. (3.892,4.287), The average time spent on the website is 4.0 minutes

C. (4.264,4.735), The average time spent on the website is not 4.0 minutes

D. (4.264,4.735), The average time spent on the website is 4.0 minutes

Correct Option: (4.264,4.735), The average time spent on the website is not 4.0 minutes

Explanation:
Based on the given problem, we define our hypothesis as:

Null Hypothesis (H0): The population mean time spent on the website is 4.0 minutes.
Alternative Hypothesis (H1): The population mean time spent on the website is not 4.0 minutes.

Now we need to:

Calculate the confidence interval at a 95% confidence level.
Perform the hypothesis test and draw a conclusion on its basis.

We can use Python code such as:

import scipy.stats as stats

import numpy as np

from scipy.stats import norm

# Given data

population_mean = 4.0

sample_mean = 4.5

population_stddev = 1.2

sample_size = 100

alpha = 0.05 # Significance level (1 - alpha will give us the confidence level)

# Calculate the critical value (Z) for a two-tailed test at the given alpha level

z_critical = norm.ppf(1 - alpha / 2)

# Calculate the margin of error

margin_of_error = z_critical * (population_stddev / np.sqrt(sample_size))

# Calculate the confidence interval

confidence_interval = (sample_mean - margin_of_error, sample_mean + margin_of_error)

print("Confidence Interval:", confidence_interval)

# Check if the population mean (4.0) falls within the confidence interval

if confidence_interval[0] <= population_mean <= confidence_interval[1]:

    print("The population mean falls within the confidence interval. Then we fail to reject the null hypothesis")

else:

    print("The population mean does not fall within the confidence interval. Then we reject the null hypothesis")

Output:
Confidence Interval: (4.264804321855194, 4.735195678144806)
The population mean does not fall within the confidence interval. Then we reject the null hypothesis
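
The confidence-interval check is equivalent to a two-tailed z-test at the same alpha. As a rough cross-check (a sketch using the same given data, not part of the original solution), the z-score and p-value can be computed directly:

```python
import math
from scipy.stats import norm

population_mean = 4.0
sample_mean = 4.5
population_stddev = 1.2
sample_size = 100
alpha = 0.05

standard_error = population_stddev / math.sqrt(sample_size)
z_score = (sample_mean - population_mean) / standard_error

# Two-tailed p-value: unusually high and unusually low means both count as evidence
p_value = 2 * norm.sf(abs(z_score))

print(f"z-score: {z_score:.3f}, p-value: {p_value:.6f}")
print("Reject H0" if p_value < alpha else "Fail to reject H0")
```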

Q6. Significance of Power

A researcher is conducting a hypothesis test with a significance level (α) of 0.05.

The null hypothesis is that there is no effect, and the alternative hypothesis is that there is a significant effect. The researcher calculates the power of the test to be 0.80.

What does a power of 0.80 signify in this context?

A. The probability of committing a Type I error.

B. The probability of correctly rejecting a false null hypothesis.

C. The probability of failing to reject a true null hypothesis.

D. The significance level used in the test.

Correct Option: The probability of correctly rejecting a false null hypothesis.

Explanation:
The power of a statistical test is the probability of correctly rejecting a false null hypothesis (i.e., detecting a true effect if it exists).

If the probability of type II error is given by the value of β, then power can be found as: Power=1−β

In this case, a power of 0.80 indicates an 80% chance of correctly finding a significant effect when the alternative hypothesis is true.
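
To make the definition concrete, here is a minimal sketch of the power of a right-tailed one-sample z-test. The effect size (0.5 standard deviations) and the sample size (25) are hypothetical values chosen for illustration; they do not come from the question.

```python
import math
from scipy.stats import norm

alpha = 0.05
effect_size = 0.5   # hypothetical: (true mean - null mean) / population std
sample_size = 25    # hypothetical

# Critical z-value for a right-tailed test
z_critical = norm.ppf(1 - alpha)

# Under the alternative, the z-statistic is centered at effect_size * sqrt(n)
shift = effect_size * math.sqrt(sample_size)

# Power = P(reject H0 | H0 is false); beta = P(Type II error)
power = norm.sf(z_critical - shift)
beta = 1 - power

print(f"power: {power:.3f}, beta: {beta:.3f}")
```

With these assumed values the power comes out at roughly 0.80. A larger sample or a larger effect pushes it higher.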

Q7. Institution's claim

It is known that the mean IQ of high school students is 100, and the standard deviation is 15.

A coaching institute claims that candidates who study there have more IQ than an average high school student. When the IQ of 50 candidates was calculated, the average turned out to be 110.

Conduct an appropriate hypothesis test to test the institute’s claim, with a significance level of 5%.

A. p-value: 1.214e-06, Candidates who study at this coaching institution have more IQ than an average high school student.

B. p-value: 1.451e-02, Candidates who study at this coaching institution have more IQ than an average high school student.

C. p-value: 1.451e-02, Candidates who study at this coaching institution have the same IQ as an average high school student.

D. p-value: 1.214e-06, Candidates who study at this coaching institution have the same IQ as an average high school student.

Correct Answers: p-value: 1.214e-06, Candidates who study at this coaching institution have more IQ than an average high school student.

Explanation:

# Null Hypothesis (H0): The average IQ of candidates from the institution is the same as the population's average IQ.(μ = 100)

# Alternative Hypothesis (Ha): The average IQ of candidates from the institution is higher than the population's average IQ.(μ > 100)

import scipy.stats as stats

# Given values

population_mean = 100

population_std = 15

sample_mean = 110

sample_size = 50

significance_level = 0.05

# Calculate the standard error (standard deviation of the sample mean)

standard_error = population_std / (sample_size**0.5)

# Calculate the Z-score

Z = (sample_mean - population_mean) / standard_error

# Calculate the p-value for a one-tailed test

p_value = 1 - stats.norm.cdf(Z)

# Determine whether to reject the null hypothesis

if p_value < significance_level:

    conclusion = "Reject the null hypothesis. Candidates who study at this coaching institution have more IQ than an average high school student."

else:

    conclusion = "Fail to reject the null hypothesis. Candidates who study at this coaching institution have the same IQ as an average high school student."

print(f"Z-score: {Z}")

print(f"P-value: {p_value}")

print(f"Conclusion: {conclusion}")

Output:

Z-score: 4.714045207910317
P-value: 1.2142337364462463e-06
Conclusion: Reject the null hypothesis. Candidates who study at this coaching institution have more IQ than an average high school student.

Q8. Smokers

When smokers smoke, nicotine is transformed into cotinine, which can be tested.

The average cotinine level in a group of 50 smokers was 243.5 ng/ml.

Assume that the standard deviation is known to be 229.5 ng/ml.

Test the assertion that the mean cotinine level of all smokers is equal to 300.0 ng/ml, at 95% confidence.

A. P-value: 0.0408, the mean cotinine level of all smokers is not equal to 300.0 ng/ml

B. P-value: 0.0408 , the mean cotinine level of all smokers is equal to 300.0 ng/ml.

C. P-value: 0.0817 , the mean cotinine level of all smokers is not equal to 300.0 ng/ml

D. P-value: 0.0817 , the mean cotinine level of all smokers is equal to 300.0 ng/ml.

Correct Answer: P-value: 0.0817, the mean cotinine level of all smokers is equal to 300.0 ng/ml.

Explanation:

# Null Hypothesis (H0): The mean cotinine level of all smokers is equal to 300.0 ng/ml. (µ = 300.0 ng/ml)

# Alternative Hypothesis (Ha): The mean cotinine level of all smokers is not equal to 300.0 ng/ml. (µ ≠ 300.0 ng/ml)

import scipy.stats as stats

# Given values

sample_mean = 243.5  # Sample mean cotinine level

population_std = 229.5  # Known population standard deviation

population_mean = 300.0  # Hypothesized population mean

sample_size = 50  # Sample size

confidence_level = 0.95  # 95% confidence level

# Calculate the Z-score

standard_error = population_std / (sample_size**0.5)

Z = (sample_mean - population_mean) / standard_error

# Calculate the p-value for a two-tailed test

p_value = 2 * (1 - stats.norm.cdf(abs(Z)))

# Determine whether to reject the null hypothesis

alpha = 1 - confidence_level

if p_value < alpha:

    conclusion = "Reject the null hypothesis which means the mean cotinine level of all smokers is not equal to 300.0 ng/ml "

else:

    conclusion = "Fail to reject the null hypothesis which means the mean cotinine level of all smokers is equal to 300.0 ng/ml. "

print(f"Z-score: {Z}")

print(f"P-value: {p_value}")

print(f"Conclusion: {conclusion}")

Output:

Z-score: -1.7408075440976007
P-value: 0.08171731915149638
Conclusion: Fail to reject the null hypothesis which means the mean cotinine level of all smokers is equal to 300.0 ng/ml.
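
A 95% confidence interval around the sample mean gives the same picture: if the hypothesized mean lies inside the interval, the two-tailed test at alpha = 0.05 fails to reject. A minimal sketch with the values from this question:

```python
from scipy.stats import norm

sample_mean = 243.5
population_std = 229.5
population_mean = 300.0   # Hypothesized mean
sample_size = 50
confidence_level = 0.95

standard_error = population_std / (sample_size ** 0.5)

# z-value leaving (1 - confidence_level) / 2 in each tail
z_critical = norm.ppf(1 - (1 - confidence_level) / 2)
margin_of_error = z_critical * standard_error

lower = sample_mean - margin_of_error
upper = sample_mean + margin_of_error

print(f"95% CI: ({lower:.2f}, {upper:.2f})")
print("Hypothesized mean inside CI:", lower <= population_mean <= upper)
```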

Q1. Quality control analysis

For a quality control analysis, a factory assesses the tensile strength of a sample of steel rods.

The sample exhibits a mean tensile strength of 750 MPa with a sample standard deviation of 50 MPa, while the known population mean is 800 MPa.

Calculate Cohen's d for this quality control study.

A. 0.5

B. -1.0

C. -0.5

D. 1.0

E. Insufficient Information

Correct Option: -1.0

Explanation:
For a one-sample test comparing a sample mean to a known population mean, the effect size can be calculated using:

d = (Sample Mean - Population Mean) / Sample Standard Deviation
= (750 - 800) / 50
= -1.0
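
The same calculation as a short Python sketch (the function name `cohens_d_one_sample` is made up for this example):

```python
def cohens_d_one_sample(sample_mean, population_mean, sample_std):
    """Effect size for a one-sample comparison against a known population mean."""
    return (sample_mean - population_mean) / sample_std

d = cohens_d_one_sample(sample_mean=750, population_mean=800, sample_std=50)
print(f"Cohen's d: {d}")
```

The negative sign indicates that the sample mean lies below the population mean.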

Q2. Water Regulation

The student hostel office at IIT Madras estimates that each student uses more than 3.5 buckets of water per day.

In order to verify this claim, the college trustees decide to monitor the water consumption over the next 45 days, and it is found that, on average, 3.72 buckets of water are consumed by a student per day.

Assume that the population standard deviation is 0.7 buckets. What is the critical sample mean, assuming a critical z-value of 1.28?

Note: The critical sample mean is defined as the mean value for which the z-score is equal to the critical value. Also, round off the final answer to three decimal places.

A. 3.634

B. 3.511

C. 3.720

D. 3.691

Correct option: 3.634

Explanation:
Approach 1:
z-score = (sample mean − population mean) / SEM

where SEM = standard error of the mean = population std / √(sample size)

Rearranging the terms, we have:

sample mean = (critical z-value) × SEM + population mean

sample mean = 1.28 × 0.7 / √45 + 3.5 = 1.28 × 0.1043 + 3.5 = 3.634

Approach 2:

import math

# Given values

population_mean = 3.5

population_std = 0.7

critical_z_value = 1.28

sample_size = 45

# Calculate the critical sample mean

critical_sample_mean = population_mean + (critical_z_value * (population_std / math.sqrt(sample_size)))

# Round off the answer to three decimal places

critical_sample_mean = round(critical_sample_mean, 3)

print("Critical Sample Mean:", critical_sample_mean)

Output:
Critical Sample Mean: 3.634
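
Once the critical sample mean is known, the decision only requires comparing it with the observed mean of 3.72 buckets. A minimal sketch, assuming a right-tailed test (Ha: μ > 3.5), as the hostel office's claim suggests:

```python
import math

population_mean = 3.5
population_std = 0.7
sample_size = 45
critical_z_value = 1.28
observed_mean = 3.72

standard_error = population_std / math.sqrt(sample_size)
critical_sample_mean = population_mean + critical_z_value * standard_error

# Comparing means is equivalent to comparing z-scores
observed_z = (observed_mean - population_mean) / standard_error
reject_null = observed_mean > critical_sample_mean

print(f"Critical sample mean: {critical_sample_mean:.3f}")
print(f"Observed z-score: {observed_z:.2f}")
print("Reject H0" if reject_null else "Fail to reject H0")
```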

Q3. Testing efficacy of improving GRE score

The verbal reasoning in the GRE has an average score of 150 and a standard deviation of 8.5.

A coaching centre claims that its students are better. A sample of 10 students from this coaching centre has an average score of 155.

At a 5% significance level (or 95% confidence level), can we conclude that students from the coaching centre are better? Use the Z-test, and compute the p-value.

A. p-value = 0.03, and hence students from the coaching center are better

B. p-value = 0.96, and hence students from the coaching center are better

C. p-value = 0.03, and hence students from the coaching center are not better

D. p-value = 0.96, and hence students from the coaching center are not better

Correct Answer: p-value = 0.03, and hence students from the coaching center are better

Explanation:

# Null Hypothesis (H0): The average verbal reasoning score of students from the coaching centre is the same as the national average verbal reasoning score.(μ = 150)

# Alternative Hypothesis (Ha): The average verbal reasoning score of students from the coaching centre is better than the national average verbal reasoning score. (μ > 150)

import scipy.stats as stats

# Given data

mu = 150  # Population average (national)

sigma = 8.5  # Population standard deviation

n = 10  # Sample size

sample_mean = 155  # Sample mean

# Calculate the standard error of the mean (SEM)

sem = sigma / (n**0.5)

# Calculate the Z-score

Z = (sample_mean - mu) / sem

# Calculate the p-value for the right-tailed test

p_value = 1 - stats.norm.cdf(Z)

# Set the significance level (alpha)

alpha = 0.05

# Compare the p-value to the significance level

if p_value < alpha:

    print(f"p-value: {p_value}, Reject the null hypothesis. Hence students from the coaching center are better")

else:

    print(f"p-value: {p_value}, Fail to reject the null hypothesis. Hence students from the coaching center are not better")

Output:

p-value: 0.031431210741779014, Reject the null hypothesis. Hence students from the coaching center are better

Q4. What is the delivery time?

A company claims that the average time it takes to deliver a product to customers is 3 days.

The company's delivery process is under scrutiny, and a sample of 25 delivery times is collected. The sample mean delivery time is 3.5 days, and the population standard deviation is known to be 0.8 days.

At a 5% significance level, can we conclude that the average delivery time is greater than 3 days?

Conduct a one-sample Z-test to determine the same. Also, evaluate the z-score for observed average time.

A. pvalue = 0.000889 ; Reject the null hypothesis; The average delivery time is greater than 3 days

B. pvalue = 0.000889 ; Reject the null hypothesis; The average delivery time is greater than 3 days

C. Z-score = 3.125

D. Z-score = 1.5

Correct Options:

pvalue = 0.000889 ; Reject the null hypothesis; The average delivery time is greater than 3 days
Z-score = 3.125

Explanation:
Based on the given problem, we define our hypothesis as:

H0: The average delivery time is 3 days
Ha: The average delivery time is greater than 3 days

We can decide whether to reject the null hypothesis by conducting a Z-test.

Code:

from scipy.stats import norm

# Population parameters

population_mean = 3 # Population average (Claimed) delivery time

population_stddev = 0.8 # Population standard deviation

# Sample statistics

sample_mean = 3.5 # Sample delivery time observed

sample_size = 25

# Calculate the standard error of the sample mean

standard_error = population_stddev / (sample_size ** 0.5)

# Calculate the Z-score

z_score = (sample_mean - population_mean) / standard_error

# Calculate the p-value: Right Tailed test

p_value = 1 - norm.cdf(z_score)

# Significance level

alpha = 0.05

# Compare p-value with the significance level

if p_value < alpha:

    print("Reject the null hypothesis. The average delivery time is greater than 3 days")

else:

    print("Fail to reject the null hypothesis. The average delivery time is 3 days")

print(f"Z-score: {z_score}")

print(f"P-value: {p_value}")

Output:
Reject the null hypothesis. The average delivery time is greater than 3 days
Z-score: 3.125
P-value: 0.0008890252991083925
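
All of the one-sample z-tests on this page follow the same pattern, so they can be wrapped in one helper. The function below is a sketch; its name `one_sample_z_test` and its `tail` parameter are hypothetical conveniences, not part of SciPy.

```python
import math
from scipy.stats import norm

def one_sample_z_test(sample_mean, population_mean, population_std, sample_size, tail="two-sided"):
    """Return (z_score, p_value) for a one-sample z-test with known population std.

    tail: "left" (Ha: mu < mu0), "right" (Ha: mu > mu0) or "two-sided" (Ha: mu != mu0).
    """
    standard_error = population_std / math.sqrt(sample_size)
    z_score = (sample_mean - population_mean) / standard_error
    if tail == "left":
        p_value = norm.cdf(z_score)
    elif tail == "right":
        p_value = norm.sf(z_score)
    elif tail == "two-sided":
        p_value = 2 * norm.sf(abs(z_score))
    else:
        raise ValueError("tail must be 'left', 'right' or 'two-sided'")
    return z_score, p_value

# Delivery-time question: right-tailed test
z, p = one_sample_z_test(3.5, 3, 0.8, 25, tail="right")
print(f"Z-score: {z:.3f}, P-value: {p:.6f}")
```

Choosing the tail from the alternative hypothesis is the only step that changes between questions; the z-score formula stays the same.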
