Perform one-sample and two-sample Welch's t-tests on raw data, then review the t-statistic, p-value, and degrees of freedom.
T-test calculator for raw-data hypothesis checks, confidence bounds, and significance planning
Use this t-test calculator to run a one-sample or independent-samples Welch t-test from raw data, then inspect the selected-tail p-value, rejection region, effect size, confidence interval, and how far the observed mean difference sits from statistical significance.
A useful t-test workflow should do more than return a t-statistic. This page helps you decide whether your design needs a one- or two-tailed test, how large the current mean gap really is, and whether a statistically significant result would also be practically meaningful.
Before you trust a t-test result
Use independent groups for the two-sample mode. Paired before-and-after data need a paired t-test instead.
Choose a one-tailed hypothesis only if the direction was justified before you looked at the data.
Interpret the p-value alongside the confidence interval and effect size, not in isolation.
Test summary
One-sample t-test using the two-sided alternative. The sample mean is 2.12 against μ₀ = 0. The observed gap already clears the current significance threshold.
t-statistic: 18.31
Selected-tail p-value: < 0.001
Degrees of freedom: 4
Decision at α = 0.05: Reject H₀
Effect size: 6.55 (large)
Mean gap needed at this α: 0.32
A statistically significant result answers the directional test you selected, while the effect size helps decide whether the observed gap is large enough to matter outside the hypothesis test.
T-test calculator — one-sample and two-sample Welch's t-test
A t-test calculator answers a practical hypothesis-testing question: does one sample mean differ from a hypothesised value, or do two independent groups differ from each other enough to matter? This page works as a one-sample t-test calculator and an independent-samples Welch t-test calculator from raw data, then extends the output into selected-tail p-values, confidence bounds, effect-size context, and significance-planning guidance.
When to use a t-test calculator
Use a one-sample t-test when you want to compare one sample mean against a fixed or hypothesised value. Use a two-sample t-test when you want to compare two independent groups and see whether their means differ.
The inputs here should be continuous numeric values entered as raw data, not percentages, proportions, or paired measurements. If your data are before-and-after readings from the same people or matched pairs, a paired t-test is the correct analysis instead.
One-sample t-test
A one-sample t-test asks: "Is the mean of my sample significantly different from a specific value μ₀?" For example, you may measure the weight of 10 packages and want to know whether the mean differs from the labelled weight of 500 g.
t = (x̄ − μ₀) / (s / √n), where x̄ is the sample mean, μ₀ is the hypothesised mean, s is the sample standard deviation, and n is the sample size. The t-statistic is compared against the t-distribution with df = n − 1 to obtain the p-value for the selected alternative (two-sided by default).
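The formula above can be sketched in a few lines of Python using only the standard library. This is a minimal illustration, not the page's actual implementation; the function name `one_sample_t` is hypothetical. It reproduces the t-statistic and degrees of freedom shown in the test summary (default data, μ₀ = 0).

```python
import math
import statistics

def one_sample_t(values, mu0):
    """Return (t, df) for a one-sample t-test of the mean against mu0."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least 2 observations")
    mean = statistics.mean(values)
    s = statistics.stdev(values)      # sample SD (n - 1 denominator)
    se = s / math.sqrt(n)             # standard error of the mean
    return (mean - mu0) / se, n - 1

# The page's default one-sample data, tested against mu0 = 0 as in the test summary.
t_stat, df = one_sample_t([2.0, 2.5, 1.8, 2.2, 2.1], mu0=0.0)
print(round(t_stat, 2), df)  # 18.31 4
```

The sketch does not guard against a zero standard deviation (all values identical), which would make the statistic undefined.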
Two-sample Welch's t-test
A two-sample t-test asks: "Are the means of two independent groups significantly different?" Welch's variant does not assume equal variances, so it is more robust than the classic Student's t-test when spreads differ or sample sizes are uneven.
t = (x̄₁ − x̄₂) / √(s₁²/n₁ + s₂²/n₂). Degrees of freedom are computed using the Welch–Satterthwaite equation, df = (s₁²/n₁ + s₂²/n₂)² / [(s₁²/n₁)²/(n₁ − 1) + (s₂²/n₂)²/(n₂ − 1)], which accounts for unequal variances and sample sizes and usually gives a non-integer value. That is why many t-test calculator searches now specifically ask for Welch's t-test.
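As a rough sketch of the Welch statistic and the Welch–Satterthwaite degrees of freedom, the following uses only the Python standard library. The function name `welch_t` and the two groups are hypothetical (the groups are illustrative, not the page's default data).

```python
import math
import statistics

def welch_t(group1, group2):
    """Return (t, df) for Welch's two-sample t-test."""
    n1, n2 = len(group1), len(group2)
    if n1 < 2 or n2 < 2:
        raise ValueError("need at least 2 observations per group")
    v1 = statistics.variance(group1) / n1   # s1^2 / n1
    v2 = statistics.variance(group2) / n2   # s2^2 / n2
    t = (statistics.mean(group1) - statistics.mean(group2)) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# Hypothetical groups: one clustered near 2, one near 3.
a = [2.0, 2.5, 1.8, 2.2, 2.1]
b = [3.0, 3.5, 2.8, 3.2, 3.1]
t_stat, df = welch_t(a, b)
print(round(t_stat, 2), round(df, 2))  # -6.11 8.0
```

Here the two groups have the same size and the same spread, so the Welch degrees of freedom coincide with the pooled value n₁ + n₂ − 2 = 8. When spreads or sizes differ, the result drops below that and is generally not an integer.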
Reading the output
The t-statistic expresses the observed difference in units of its standard error, so it grows when the gap is large or the data are tightly clustered. Degrees of freedom affect the shape of the t-distribution, and the p-value tells you how unusual the result would be under the null hypothesis.
If p is below your significance level α, usually 0.05, you reject H₀. If p is larger, the data do not provide strong evidence of a difference. Statistical significance is not the same thing as practical importance, so effect size still matters.
Choosing a two-sided or one-sided t-test
A two-sided t-test asks whether the mean difference is simply non-zero. That is the conservative default and the right choice whenever higher and lower outcomes would both matter. A one-sided t-test asks a narrower question: is the sample mean specifically greater than the target, or specifically lower than it? The same logic applies to two independent groups.
Choose a one-tailed test only when the direction was justified before you looked at the data. Otherwise, keep the two-sided alternative. Switching to a one-sided alternative after seeing the sign of the sample difference is a common way to overstate evidence.
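To see how the choice of alternative changes the p-value for the same t-statistic, here is a small standard-library sketch. It approximates the t-distribution tail by numerically integrating the density with Simpson's rule, which is an illustrative choice rather than how the page computes it. The function names and the labels "two-sided", "greater", and "less" are hypothetical.

```python
import math

def t_upper_tail(t, df, steps=2000):
    """P(T >= t) for Student's t with df degrees of freedom.

    Integrates the density from 0 to |t| with Simpson's rule (steps must be even).
    """
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    a = abs(t)
    h = a / steps
    total = pdf(0.0) + pdf(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3              # P(0 <= T <= |t|)
    tail = max(0.0, 0.5 - area)       # P(T >= |t|)
    return tail if t >= 0 else 1.0 - tail

def p_value(t, df, alternative="two-sided"):
    if alternative == "two-sided":
        return min(1.0, 2 * t_upper_tail(abs(t), df))
    if alternative == "greater":
        return t_upper_tail(t, df)
    if alternative == "less":
        return 1.0 - t_upper_tail(t, df)
    raise ValueError("alternative must be 'two-sided', 'greater', or 'less'")

# t = 1.0366 on 4 df: the page's default data tested against mu0 = 2.0.
for alt in ("two-sided", "greater", "less"):
    print(alt, round(p_value(1.0366, 4, alt), 3))
# two-sided 0.358
# greater 0.179
# less 0.821
```

The one-sided p-value in the observed direction is half the two-sided one, which is why picking the direction after seeing the data makes a result look stronger than it is.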
Worked example
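Take the default one-sample dataset on the page: 2.0, 2.5, 1.8, 2.2, and 2.1, tested against a hypothesised mean of 2.0. The sample mean is 2.12, so the observed difference is small (0.12). The calculator converts that difference into a t-statistic by dividing by the standard error (about 0.116), giving t ≈ 1.04, then uses the t-distribution with n − 1 = 4 degrees of freedom to compute the p-value for the selected alternative, about 0.36 for the two-sided test. (The test summary above uses μ₀ = 0 with the same data, which is why its t-statistic is far larger.)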
For the default two-sample example, the page compares one cluster around 2 with another cluster around 3. Because the groups are clearly separated, the t-statistic becomes large in magnitude and the p-value drops well below common α thresholds. The value of the calculator is that it shows the numeric evidence directly instead of stopping at a generic statement that the groups are different.
Confidence intervals and effect size matter as much as the p-value
A p-value only tells you how compatible the observed data are with the null model. It does not tell you how large the mean gap is or whether that gap is useful in practice. That is why the page now pairs the test result with a confidence interval on the mean difference and an effect-size estimate.
If a two-sided confidence interval for the mean difference still crosses zero, the result is not statistically significant at the matching α level. If it excludes zero, the direction and plausible size of the effect become easier to explain. Effect size adds another layer: a tiny p-value with a negligible standardized difference can still describe a result that is real but operationally small.
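A hedged sketch of both quantities for the page's default one-sample data follows. The critical value 2.776 is the tabulated two-sided 95% value for 4 degrees of freedom and is hard-coded here as an assumption. The page does not say which effect-size estimator it uses; the small-sample corrected value (Hedges' g) happens to match the 6.55 shown in the test summary, but that is an observation rather than a confirmed detail.

```python
import math
import statistics

values = [2.0, 2.5, 1.8, 2.2, 2.1]   # the page's default one-sample data
mu0 = 0.0                             # hypothesised mean used in the test summary
T_CRIT = 2.776                        # tabulated two-sided 95% critical value, df = 4

n = len(values)
mean = statistics.mean(values)
s = statistics.stdev(values)
se = s / math.sqrt(n)

gap = mean - mu0
ci = (gap - T_CRIT * se, gap + T_CRIT * se)   # 95% CI for the mean difference

d = gap / s                                   # Cohen's d (one-sample form)
g = d * (1 - 3 / (4 * (n - 1) - 1))           # Hedges' small-sample correction

print([round(x, 2) for x in ci])  # [1.8, 2.44]
print(round(d, 2), round(g, 2))   # 8.19 6.55
```

The interval excludes zero, which agrees with the "Reject H₀" decision for the two-sided test at α = 0.05.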
Using the calculator as a planning tool
Many searchers are not only checking whether the current dataset is significant. They also want to know how close the result is to the decision boundary. The planning rows on this page answer that follow-up question by showing the minimum mean gap needed at the current α level and how much additional difference would be required in the tested direction if the present result falls short.
That is especially useful when you are interpreting a borderline p-value. Instead of saying only that the test is or is not significant, you can estimate whether the missing gap is trivial, moderate, or so large that the current design is unlikely to produce a meaningful signal without a different intervention or much lower noise.
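One plausible way to compute such planning figures is sketched below, assuming the standard error stays at its current value and the test is two-sided. How the page derives its own planning rows is not documented here, so treat `gap_plan` as a hypothetical helper. The minimum gap it returns (0.32) does match the "Mean gap needed at this α" value in the test summary.

```python
import math
import statistics

def gap_plan(values, mu0, t_crit):
    """Return (observed_gap, minimum_gap, shortfall) for a two-sided one-sample test."""
    se = statistics.stdev(values) / math.sqrt(len(values))
    observed = abs(statistics.mean(values) - mu0)
    minimum = t_crit * se             # smallest |gap| that reaches significance
    return observed, minimum, max(0.0, minimum - observed)

# Page's default data against mu0 = 2.0 (the worked example). 2.776 is the tabulated
# two-sided critical value for alpha = 0.05 and df = 4.
observed, minimum, shortfall = gap_plan([2.0, 2.5, 1.8, 2.2, 2.1], 2.0, 2.776)
print(round(observed, 2), round(minimum, 2), round(shortfall, 2))  # 0.12 0.32 0.2
```

In this case the observed gap would need to be roughly 0.20 larger, at the same noise level, before the two-sided test rejects at α = 0.05.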
What to do with paired data
If your observations are before-and-after measurements on the same subjects, or matched pairs, use a paired t-test instead of this calculator. Pairing changes the structure of the data: treating the samples as independent groups ignores the within-pair correlation and typically overstates the standard error, which can hide a real change.
That same rule applies to repeated measures from the same units over time. In those cases, a paired test or a repeated-measures method is the better choice.
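A paired t-test is equivalent to a one-sample t-test on the within-pair differences. The sketch below contrasts that with an independent-groups (Welch) analysis of the same numbers. The before/after readings are hypothetical and chosen only for illustration.

```python
import math
import statistics

def t_from_mean(values):
    """t-statistic for testing whether the mean of `values` is zero."""
    return statistics.mean(values) / (statistics.stdev(values) / math.sqrt(len(values)))

# Hypothetical before/after readings on the same five subjects.
before = [10.0, 12.0, 9.0, 11.0, 13.0]
after = [11.0, 13.5, 9.5, 12.0, 14.5]

# Paired analysis: one-sample t-test on the within-subject differences.
diffs = [a - b for a, b in zip(after, before)]
paired_t = t_from_mean(diffs)

# Inappropriate independent-groups analysis of the same numbers (Welch form).
se_indep = math.sqrt(statistics.variance(after) / len(after)
                     + statistics.variance(before) / len(before))
indep_t = (statistics.mean(after) - statistics.mean(before)) / se_indep

print(round(paired_t, 2), round(indep_t, 2))  # 5.88 0.97
```

The mean change is the same in both analyses (1.1), but the independent-groups standard error absorbs the subject-to-subject variation, so the t-statistic shrinks sharply.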
How to prepare raw data for a t-test calculator
This page expects raw numeric observations separated by commas, spaces, or line breaks. Do not enter summary statistics, percentages, or already-averaged values unless those numbers really are the full observations. A t-test from raw data works because the calculator can estimate the mean, variability, and standard error directly from the observations.
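If you prepare the data in a script first, the input format described above can be parsed along these lines. This is a minimal sketch with a hypothetical function name, not the page's own parser. It rejects tokens such as "50%" rather than silently converting them.

```python
import math
import re

def parse_observations(text):
    """Split raw input on commas, spaces, or line breaks and convert to floats."""
    tokens = [tok for tok in re.split(r"[,\s]+", text.strip()) if tok]
    values = []
    for tok in tokens:
        try:
            value = float(tok)
        except ValueError:
            raise ValueError(f"not a number: {tok!r}") from None
        if not math.isfinite(value):
            raise ValueError(f"not a finite number: {tok!r}")
        values.append(value)
    return values

print(parse_observations("2.0, 2.5\n1.8 2.2,2.1"))  # [2.0, 2.5, 1.8, 2.2, 2.1]
```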
Before interpreting the result, inspect the data for obvious entry errors, impossible values, or strong outliers. A t-test is fairly robust, especially with moderate sample sizes, but one or two extreme points can still shift the mean and inflate the standard deviation enough to change the decision.
Common t-test calculator mistakes
A common mistake is mixing up independent groups with paired observations. Another is entering summary statistics when the page expects raw values. A third is focusing only on whether the p-value is smaller than 0.05 without checking whether the effect itself is large enough to matter in practice.
It is also easy to overstate the certainty of a result when sample sizes are tiny or the data contain strong outliers. Even though a t-test is quite robust, unusual values can still drag the mean and standard deviation enough to change the final conclusion. That is why it helps to inspect the data, not just the final p-value.
Frequently asked questions
When should I use a one-sample vs two-sample t-test?
Use a one-sample t-test when you are comparing a sample mean to a known or theoretical value (μ₀). Use a two-sample t-test when you have two independent groups and want to compare their means.
What is the difference between a two-sided and a one-sided t-test?
A two-sided t-test checks whether the mean difference is non-zero in either direction. A one-sided t-test checks only one directional claim, such as whether Group 1 is greater than Group 2 or whether the sample mean is below a target. Use a one-sided test only when that direction was justified before seeing the data; otherwise a two-sided test is the safer default.
Why Welch's t-test instead of Student's t-test?
Student's t-test assumes the two groups have equal variances (homoscedasticity). Welch's test makes no such assumption and performs better when variances are unequal. It has similar power when variances are equal, so it is preferred as the default.
Can I use this for paired samples?
No. Paired data need a paired t-test because the observations are linked. This calculator is for one-sample tests and two-sample tests with independent groups.
Can I enter summary statistics instead of raw values?
No. This page is built as a t-test calculator from raw data, so it needs the actual observations to estimate the mean, sample standard deviation, and standard error. If you only have n, mean, and SD, you need a summary-statistics workflow instead.
What sample size do I need for a t-test?
A minimum of 2 values per group is required. For reliable results, n ≥ 20–30 per group is generally recommended. Smaller samples have less power and the t-distribution approximation is less accurate if the data are highly non-normal.
What does the p-value mean in a t-test?
The p-value is the probability of seeing a t-statistic at least this extreme if the null hypothesis were true. A small p-value suggests the observed mean difference would be unusual under the null, which is why results below the chosen α threshold are usually treated as statistically significant.
What does it mean if the confidence interval still crosses zero?
If the confidence interval for the mean difference crosses zero, the data are still compatible with both a positive and a negative effect at that confidence level. That usually lines up with a non-significant result for the matching two-sided test. The interval is useful because it shows not only whether zero is plausible, but also how large the effect could still be.
Should I use a one-tailed or two-tailed t-test?
Use a two-tailed t-test when any difference matters, whether the sample mean is higher or lower. Use a one-tailed test only when the direction was specified before seeing the data and only one direction is relevant to the research question. This calculator reports the p-value for the alternative you select; the two-sided option is the more conservative choice.
Why can a result be statistically significant but still not practically important?
Statistical significance only tells you that the data are hard to explain under the null model at the chosen α. It does not say whether the mean difference is large enough to matter in real life. That is why this page also reports effect-size context and the observed gap in the original units.
How should I choose between α = 0.10, 0.05, and 0.01?
Lower α values make the test stricter and reduce false positives, but they also require stronger evidence. α = 0.05 is the common default. α = 0.10 is sometimes used for exploratory work, while α = 0.01 is more conservative when false positives are costly. The right choice depends on the decision context, not on which threshold makes the result look better.