2 sample t test for sample sizes - 30,000 and 150,000 Announcing the arrival of Valued Associate #679: Cesar Manara Planned maintenance scheduled April 23, 2019 at 23:30 UTC (7:30pm US/Eastern)How to perform t-test with huge samples?Safely determining sample size for A/B testingPower of the t-test under unequal sample sizesA/B test with unequal sample sizeIndependent samples t-test with unequal sample sizesHow to test for significance if groups differed at baseline?Sample size for A/B testingPaired T-Test and general linear hypothesisDetermining minimum required sample size for control (for purposes of measuring lift)Which hypothesis test to use to compare two data sets that have a lot of zeros?Non-inferiority margin and minimum detectable effect vs sample size
Can the van der Waals coefficients be negative in the van der Waals equation for real gases?
Protagonist's race is hidden - should I reveal it?
Why did Europeans not widely domesticate foxes?
lm and glm function in R
How to charge percentage of transaction cost?
Does traveling In The United States require a passport or can I use my green card if not a US citizen?
Would I be safe to drive a 23 year old truck for 7 hours / 450 miles?
Assertions In A Mock Callout Test
How do I overlay a PNG over two videos (one video overlays another) in one command using FFmpeg?
Determine the generator of an ideal of ring of integers
2 sample t test for sample sizes - 30,000 and 150,000
When does Bran Stark remember Jamie pushing him?
Can I ask an author to send me his ebook?
Why do C and C++ allow the expression (int) + 4*5?
Are there any AGPL-style licences that require source code modifications to be public?
Why isn't everyone flabbergasted about Bran's "gift"?
How can I introduce the names of fantasy creatures to the reader?
Why does my GNOME settings mention "Moto C Plus"?
How to leave only the following strings?
What is the evidence that custom checks in Northern Ireland are going to result in violence?
How to mute a string and play another at the same time
“Since the train was delayed for more than an hour, passengers were given a full refund.” – Why is there no article before “passengers”?
Why are two-digit numbers in Jonathan Swift's "Gulliver's Travels" (1726) written in "German style"?
Converting a text document with special format to Pandas DataFrame
2 sample t test for sample sizes - 30,000 and 150,000
Announcing the arrival of Valued Associate #679: Cesar Manara
Planned maintenance scheduled April 23, 2019 at 23:30 UTC (7:30pm US/Eastern)How to perform t-test with huge samples?Safely determining sample size for A/B testingPower of the t-test under unequal sample sizesA/B test with unequal sample sizeIndependent samples t-test with unequal sample sizesHow to test for significance if groups differed at baseline?Sample size for A/B testingPaired T-Test and general linear hypothesisDetermining minimum required sample size for control (for purposes of measuring lift)Which hypothesis test to use to compare two data sets that have a lot of zeros?Non-inferiority margin and minimum detectable effect vs sample size
.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty,.everyoneloves__bot-mid-leaderboard:empty margin-bottom:0;
$begingroup$
I have 2 samples, one with sample size of 30,000 customers and the other with 150,000. I have to perform a 2 sample t test(on conversion rates of the 2 groups). My question is, will t test in this case be biased towards the smaller sample? If yes, what is the correct approach to perform a test?
hypothesis-testing statistical-significance t-test ab-test
New contributor
$endgroup$
add a comment |
$begingroup$
I have 2 samples, one with sample size of 30,000 customers and the other with 150,000. I have to perform a 2 sample t test(on conversion rates of the 2 groups). My question is, will t test in this case be biased towards the smaller sample? If yes, what is the correct approach to perform a test?
hypothesis-testing statistical-significance t-test ab-test
New contributor
$endgroup$
6
$begingroup$
Samples of that size will almost certainly result in statistically significant findings, but the differences may not be of any practical significance. See here for another discussion about this: stats.stackexchange.com/questions/4075/…. What are the actual goals of your analysis too?
$endgroup$
– StatsStudent
4 hours ago
add a comment |
$begingroup$
I have 2 samples, one with sample size of 30,000 customers and the other with 150,000. I have to perform a 2 sample t test(on conversion rates of the 2 groups). My question is, will t test in this case be biased towards the smaller sample? If yes, what is the correct approach to perform a test?
hypothesis-testing statistical-significance t-test ab-test
New contributor
$endgroup$
I have 2 samples, one with sample size of 30,000 customers and the other with 150,000. I have to perform a 2 sample t test(on conversion rates of the 2 groups). My question is, will t test in this case be biased towards the smaller sample? If yes, what is the correct approach to perform a test?
hypothesis-testing statistical-significance t-test ab-test
hypothesis-testing statistical-significance t-test ab-test
New contributor
New contributor
edited 6 hours ago
Shivam Tiwari
New contributor
asked 6 hours ago
Shivam TiwariShivam Tiwari
62
62
New contributor
New contributor
6
$begingroup$
Samples of that size will almost certainly result in statistically significant findings, but the differences may not be of any practical significance. See here for another discussion about this: stats.stackexchange.com/questions/4075/…. What are the actual goals of your analysis too?
$endgroup$
– StatsStudent
4 hours ago
add a comment |
6
$begingroup$
Samples of that size will almost certainly result in statistically significant findings, but the differences may not be of any practical significance. See here for another discussion about this: stats.stackexchange.com/questions/4075/…. What are the actual goals of your analysis too?
$endgroup$
– StatsStudent
4 hours ago
6
6
$begingroup$
Samples of that size will almost certainly result in statistically significant findings, but the differences may not be of any practical significance. See here for another discussion about this: stats.stackexchange.com/questions/4075/…. What are the actual goals of your analysis too?
$endgroup$
– StatsStudent
4 hours ago
$begingroup$
Samples of that size will almost certainly result in statistically significant findings, but the differences may not be of any practical significance. See here for another discussion about this: stats.stackexchange.com/questions/4075/…. What are the actual goals of your analysis too?
$endgroup$
– StatsStudent
4 hours ago
add a comment |
2 Answers
2
active
oldest
votes
$begingroup$
I can hardly imagine any worthwhile effect size that requires such a large sample size to be decently powered. There's no "bias" of having unequal sample sizes$^1$. The only disadvantage is that the power of the test tends to be somewhat limited by the smaller group. For even very small effects, 30,000 observations may confer quite a powerful test.
$^1$ except if you inappropriately use the "equal variance" assumption, in which case the "pooled variance" estimate is more heavily weighted toward the larger group (not toward the smaller as you suggested).
$endgroup$
add a comment |
$begingroup$
Maybe a couple of examples will help to illustrate some of the issues.
Suppose the two populations are $X sim mathsfNorm(mu = 500, sigma =30)$
and $Y sim mathsfNorm(mu = 501, sigma = 20.)$
If both sample sizes are $150,000,$ then there is sufficient power to detect
the small difference in means.
set.seed(422)
x = rnorm(150000, 500, 30)
y = rnorm(150000, 501, 20)
t.test(x, y)
Welch Two Sample t-test
data: x and y
t = -10.983, df = 261530, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-1.2042715 -0.8395487
sample estimates:
mean of x mean of y
499.9804 501.0023
If we use only the first 30,000 values in the first sample, results are
very nearly the same for most practical purposes.
t.test(x[1:30000], y)
Welch Two Sample t-test
data: x[1:30000] and y
t = -6.3728, df = 35463, p-value = 1.879e-10
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-1.5126269 -0.8010336
sample estimates:
mean of x mean of y
499.8455 501.0023
Here is a boxplot of the data used in the second t test (the wider box indicates a larger sample):
Issues of minimal concern:
Even though labeled as 'Welch t tests', sample sizes are sufficiently large
that these are essentially t tests. Unless the data are very far from normal,
we would still detect the small difference in means.The power of the test is heavily dependent on the smaller sample size. But
power is not a concern here.
Issues warranting attention:
With such large samples
in the real world (not the simulation world),
one is entitled to wonder whether data are truly simple random samples from
their respective populations. Could smaller, more carefully collected samples provide better information?Although we did not do a formal test to confirm that variances differ, it seems clear from the boxplot that they do. In the Welch test,
it is OK for variances to differ. But would different variances have important practical implications?Although the null hypothesis that the two population means are equal is soundly rejected with minuscule P-values, it is important to realize that "statistically significant" differences (by whatever definition) are not necessarily differences of practical importance or interest. For what purpose are you
taking the effort of check whether means are different? And what do the results
of the t test actually contribute to that purpose?
$endgroup$
add a comment |
Your Answer
StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "65"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);
else
createEditor();
);
function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);
);
Shivam Tiwari is a new contributor. Be nice, and check out our Code of Conduct.
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstats.stackexchange.com%2fquestions%2f404439%2f2-sample-t-test-for-sample-sizes-30-000-and-150-000%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
2 Answers
2
active
oldest
votes
2 Answers
2
active
oldest
votes
active
oldest
votes
active
oldest
votes
$begingroup$
I can hardly imagine any worthwhile effect size that requires such a large sample size to be decently powered. There's no "bias" of having unequal sample sizes$^1$. The only disadvantage is that the power of the test tends to be somewhat limited by the smaller group. For even very small effects, 30,000 observations may confer quite a powerful test.
$^1$ except if you inappropriately use the "equal variance" assumption, in which case the "pooled variance" estimate is more heavily weighted toward the larger group (not toward the smaller as you suggested).
$endgroup$
add a comment |
$begingroup$
I can hardly imagine any worthwhile effect size that requires such a large sample size to be decently powered. There's no "bias" of having unequal sample sizes$^1$. The only disadvantage is that the power of the test tends to be somewhat limited by the smaller group. For even very small effects, 30,000 observations may confer quite a powerful test.
$^1$ except if you inappropriately use the "equal variance" assumption, in which case the "pooled variance" estimate is more heavily weighted toward the larger group (not toward the smaller as you suggested).
$endgroup$
add a comment |
$begingroup$
I can hardly imagine any worthwhile effect size that requires such a large sample size to be decently powered. There's no "bias" of having unequal sample sizes$^1$. The only disadvantage is that the power of the test tends to be somewhat limited by the smaller group. For even very small effects, 30,000 observations may confer quite a powerful test.
$^1$ except if you inappropriately use the "equal variance" assumption, in which case the "pooled variance" estimate is more heavily weighted toward the larger group (not toward the smaller as you suggested).
$endgroup$
I can hardly imagine any worthwhile effect size that requires such a large sample size to be decently powered. There's no "bias" of having unequal sample sizes$^1$. The only disadvantage is that the power of the test tends to be somewhat limited by the smaller group. For even very small effects, 30,000 observations may confer quite a powerful test.
$^1$ except if you inappropriately use the "equal variance" assumption, in which case the "pooled variance" estimate is more heavily weighted toward the larger group (not toward the smaller as you suggested).
edited 4 hours ago
answered 6 hours ago
AdamOAdamO
35.2k265143
35.2k265143
add a comment |
add a comment |
$begingroup$
Maybe a couple of examples will help to illustrate some of the issues.
Suppose the two populations are $X sim mathsfNorm(mu = 500, sigma =30)$
and $Y sim mathsfNorm(mu = 501, sigma = 20.)$
If both sample sizes are $150,000,$ then there is sufficient power to detect
the small difference in means.
set.seed(422)
x = rnorm(150000, 500, 30)
y = rnorm(150000, 501, 20)
t.test(x, y)
Welch Two Sample t-test
data: x and y
t = -10.983, df = 261530, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-1.2042715 -0.8395487
sample estimates:
mean of x mean of y
499.9804 501.0023
If we use only the first 30,000 values in the first sample, results are
very nearly the same for most practical purposes.
t.test(x[1:30000], y)
Welch Two Sample t-test
data: x[1:30000] and y
t = -6.3728, df = 35463, p-value = 1.879e-10
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-1.5126269 -0.8010336
sample estimates:
mean of x mean of y
499.8455 501.0023
Here is a boxplot of the data used in the second t test (the wider box indicates a larger sample):
Issues of minimal concern:
Even though labeled as 'Welch t tests', sample sizes are sufficiently large
that these are essentially t tests. Unless the data are very far from normal,
we would still detect the small difference in means.The power of the test is heavily dependent on the smaller sample size. But
power is not a concern here.
Issues warranting attention:
With such large samples
in the real world (not the simulation world),
one is entitled to wonder whether data are truly simple random samples from
their respective populations. Could smaller, more carefully collected samples provide better information?Although we did not do a formal test to confirm that variances differ, it seems clear from the boxplot that they do. In the Welch test,
it is OK for variances to differ. But would different variances have important practical implications?Although the null hypothesis that the two population means are equal is soundly rejected with minuscule P-values, it is important to realize that "statistically significant" differences (by whatever definition) are not necessarily differences of practical importance or interest. For what purpose are you
taking the effort of check whether means are different? And what do the results
of the t test actually contribute to that purpose?
$endgroup$
add a comment |
$begingroup$
Maybe a couple of examples will help to illustrate some of the issues.
Suppose the two populations are $X sim mathsfNorm(mu = 500, sigma =30)$
and $Y sim mathsfNorm(mu = 501, sigma = 20.)$
If both sample sizes are $150,000,$ then there is sufficient power to detect
the small difference in means.
set.seed(422)
x = rnorm(150000, 500, 30)
y = rnorm(150000, 501, 20)
t.test(x, y)
Welch Two Sample t-test
data: x and y
t = -10.983, df = 261530, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-1.2042715 -0.8395487
sample estimates:
mean of x mean of y
499.9804 501.0023
If we use only the first 30,000 values in the first sample, results are
very nearly the same for most practical purposes.
t.test(x[1:30000], y)
Welch Two Sample t-test
data: x[1:30000] and y
t = -6.3728, df = 35463, p-value = 1.879e-10
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-1.5126269 -0.8010336
sample estimates:
mean of x mean of y
499.8455 501.0023
Here is a boxplot of the data used in the second t test (the wider box indicates a larger sample):
Issues of minimal concern:
Even though labeled as 'Welch t tests', sample sizes are sufficiently large
that these are essentially t tests. Unless the data are very far from normal,
we would still detect the small difference in means.The power of the test is heavily dependent on the smaller sample size. But
power is not a concern here.
Issues warranting attention:
With such large samples
in the real world (not the simulation world),
one is entitled to wonder whether data are truly simple random samples from
their respective populations. Could smaller, more carefully collected samples provide better information?Although we did not do a formal test to confirm that variances differ, it seems clear from the boxplot that they do. In the Welch test,
it is OK for variances to differ. But would different variances have important practical implications?Although the null hypothesis that the two population means are equal is soundly rejected with minuscule P-values, it is important to realize that "statistically significant" differences (by whatever definition) are not necessarily differences of practical importance or interest. For what purpose are you
taking the effort of check whether means are different? And what do the results
of the t test actually contribute to that purpose?
$endgroup$
add a comment |
$begingroup$
Maybe a couple of examples will help to illustrate some of the issues.
Suppose the two populations are $X sim mathsfNorm(mu = 500, sigma =30)$
and $Y sim mathsfNorm(mu = 501, sigma = 20.)$
If both sample sizes are $150,000,$ then there is sufficient power to detect
the small difference in means.
set.seed(422)
x = rnorm(150000, 500, 30)
y = rnorm(150000, 501, 20)
t.test(x, y)
Welch Two Sample t-test
data: x and y
t = -10.983, df = 261530, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-1.2042715 -0.8395487
sample estimates:
mean of x mean of y
499.9804 501.0023
If we use only the first 30,000 values in the first sample, results are
very nearly the same for most practical purposes.
t.test(x[1:30000], y)
Welch Two Sample t-test
data: x[1:30000] and y
t = -6.3728, df = 35463, p-value = 1.879e-10
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-1.5126269 -0.8010336
sample estimates:
mean of x mean of y
499.8455 501.0023
Here is a boxplot of the data used in the second t test (the wider box indicates a larger sample):
Issues of minimal concern:
Even though labeled as 'Welch t tests', sample sizes are sufficiently large
that these are essentially t tests. Unless the data are very far from normal,
we would still detect the small difference in means.The power of the test is heavily dependent on the smaller sample size. But
power is not a concern here.
Issues warranting attention:
With such large samples
in the real world (not the simulation world),
one is entitled to wonder whether data are truly simple random samples from
their respective populations. Could smaller, more carefully collected samples provide better information?Although we did not do a formal test to confirm that variances differ, it seems clear from the boxplot that they do. In the Welch test,
it is OK for variances to differ. But would different variances have important practical implications?Although the null hypothesis that the two population means are equal is soundly rejected with minuscule P-values, it is important to realize that "statistically significant" differences (by whatever definition) are not necessarily differences of practical importance or interest. For what purpose are you
taking the effort of check whether means are different? And what do the results
of the t test actually contribute to that purpose?
$endgroup$
Maybe a couple of examples will help to illustrate some of the issues.
Suppose the two populations are $X sim mathsfNorm(mu = 500, sigma =30)$
and $Y sim mathsfNorm(mu = 501, sigma = 20.)$
If both sample sizes are $150,000,$ then there is sufficient power to detect
the small difference in means.
set.seed(422)
x = rnorm(150000, 500, 30)
y = rnorm(150000, 501, 20)
t.test(x, y)
Welch Two Sample t-test
data: x and y
t = -10.983, df = 261530, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-1.2042715 -0.8395487
sample estimates:
mean of x mean of y
499.9804 501.0023
If we use only the first 30,000 values in the first sample, results are
very nearly the same for most practical purposes.
t.test(x[1:30000], y)
Welch Two Sample t-test
data: x[1:30000] and y
t = -6.3728, df = 35463, p-value = 1.879e-10
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-1.5126269 -0.8010336
sample estimates:
mean of x mean of y
499.8455 501.0023
Here is a boxplot of the data used in the second t test (the wider box indicates a larger sample):
Issues of minimal concern:
Even though labeled as 'Welch t tests', sample sizes are sufficiently large
that these are essentially t tests. Unless the data are very far from normal,
we would still detect the small difference in means.The power of the test is heavily dependent on the smaller sample size. But
power is not a concern here.
Issues warranting attention:
With such large samples
in the real world (not the simulation world),
one is entitled to wonder whether data are truly simple random samples from
their respective populations. Could smaller, more carefully collected samples provide better information?Although we did not do a formal test to confirm that variances differ, it seems clear from the boxplot that they do. In the Welch test,
it is OK for variances to differ. But would different variances have important practical implications?Although the null hypothesis that the two population means are equal is soundly rejected with minuscule P-values, it is important to realize that "statistically significant" differences (by whatever definition) are not necessarily differences of practical importance or interest. For what purpose are you
taking the effort of check whether means are different? And what do the results
of the t test actually contribute to that purpose?
edited 4 hours ago
answered 4 hours ago
BruceETBruceET
7,1561721
7,1561721
add a comment |
add a comment |
Shivam Tiwari is a new contributor. Be nice, and check out our Code of Conduct.
Shivam Tiwari is a new contributor. Be nice, and check out our Code of Conduct.
Shivam Tiwari is a new contributor. Be nice, and check out our Code of Conduct.
Shivam Tiwari is a new contributor. Be nice, and check out our Code of Conduct.
Thanks for contributing an answer to Cross Validated!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
Use MathJax to format equations. MathJax reference.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstats.stackexchange.com%2fquestions%2f404439%2f2-sample-t-test-for-sample-sizes-30-000-and-150-000%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
6
$begingroup$
Samples of that size will almost certainly result in statistically significant findings, but the differences may not be of any practical significance. See here for another discussion about this: stats.stackexchange.com/questions/4075/…. What are the actual goals of your analysis too?
$endgroup$
– StatsStudent
4 hours ago