T-test with Population Data

Zachariah Rutledge

Join Date: Jun 2019

Posts: 44
#1

30 Apr 2025, 17:35

I have payroll data for 2 years for a particular firm (representing data from the population and not just a sample). I want to test whether the average share of female hours worked is different between the two years. Do I need to run a T-test or is there a different method for testing differences between averages using population data?
Richard Williams

Join Date: Apr 2014

Posts: 5024
#2

30 Apr 2025, 18:40

That is a good question. If you have the entire population, why use significance tests? But some have argued that you should. For example, see

Williams, R., & Bornmann, L. (2016). Sampling issues in bibliometric analysis. Journal of Informetrics. http://dx.doi.org/10.1016/j.joi.2015.11.004

Berk, R. A., Western, B., & Weiss, R. E. (1995b). Statistical inference for apparent populations. Sociological Methodology, 25, 421–458. https://doi.org/10.2307/271073

Bollen, K. A. (1995). Apparent and nonapparent significance tests. Sociological Methodology, 25, 459–468. https://doi.org/10.2307/271074

-------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
StataNow Version: 19.5 MP (2 processor)
WWW: https://academicweb.nd.edu/~rwilliam/
Zachariah Rutledge

Join Date: Jun 2019

Posts: 44
#3

30 Apr 2025, 21:18

Hi Richard,
Thanks for taking the time to respond. I reviewed the sources you sent. It seems that these papers suggest we can treat a population as a sample if we are willing to make the argument that the data represent some type of sample from a super-population (such that the data on the population are not really a true population). If I am not willing to make that assumption, can I just simply identify the means in the population data and show that they are different over time? On a related note, if I want to identify whether the share of females has decreased over time, can I use a regression model and identify the coefficient on a trend variable? If yes to the latter, do I just ignore the standard errors and p-values and just rely upon the coefficient value, or should I also rely upon the significance level to determine if the trend variable is significantly different from zero?
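For readers following the trend-regression question, here is a minimal sketch of what "the coefficient on a trend variable" and its standard error look like when computed by hand. The quarterly shares are made-up numbers and the helper name `ols_trend` is hypothetical; nothing here comes from the actual payroll data. The sketch stops at the t statistic, which a statistical package would then compare against a t distribution with n - 2 degrees of freedom.

```python
import math

def ols_trend(y):
    """OLS of y on a time index 0, 1, ..., n-1.

    Returns (slope, standard error of the slope, t statistic).
    """
    n = len(y)
    x = list(range(n))
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Residual sum of squares, then the usual n - 2 degrees of freedom
    ssr = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(ssr / (n - 2) / sxx)
    return slope, se, slope / se

# Hypothetical female share of hours worked in eight quarters (made-up numbers)
share = [0.42, 0.41, 0.43, 0.40, 0.39, 0.40, 0.38, 0.37]
slope, se, t_stat = ols_trend(share)
print(f"trend per quarter: {slope:.4f}, SE: {se:.4f}, t: {t_stat:.2f}")
```

The slope alone describes what happened in the observed data; the standard error and t statistic only carry meaning if the observed periods are treated as one realization of a process that could have turned out differently.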

Last edited by Zachariah Rutledge; 30 Apr 2025, 21:34.
Rich Goldstein

Join Date: Mar 2014

Posts: 4493
#4

01 May 2025, 05:43

The use, even the requirement, of p-values and CIs with a "population" can be defended on many other grounds as well. As an example, note that in the US, courts always want p-values/CIs even with a full population (e.g., all of a company's employees when looking at possible discrimination in pay, promotion, or termination); they recognize that the world is stochastic and I, as a statistician, agree with that. In my more than 30 years serving as a statistical expert in all kinds of court cases (something I no longer do), I did see people attempt to get away without p-values/CIs, and I know of no court that accepted that. This does not mean that you have to use them; I have no idea who your audience is.
Richard Williams

Join Date: Apr 2014

Posts: 5024
#5

01 May 2025, 08:00

Rich G is right. Regardless of what you personally think, people expect P values and you should accommodate them.

Unless your firm began and died during the period studied, I think treating the data as a sample from a superpopulation is reasonable. Quoting my paper,

As Bollen (1995) notes, data from one year may be followed by data from later years. Hence, you really do not have the entire population, you just have information from one or more points in time.

Researchers from Canada's Manitoba Center for Health Policy (2001) (p. 1) reached similar conclusions: [A majority of us] reached the conclusion that even when one has data on the full population, one only has that data cross-sectionally in time. In a sense, the data can be viewed as a sample from possible states in the Province as they unfold over time. Therefore, it made sense to us to try to indicate whether differences which are certainly real across units are statistically significant when one considers the data to be a one-time sample of the unfolding of the universe.

A second rationale, and a perhaps more compelling one, is to think of observed cases as repeated trials that are products of an underlying stochastic process. If we tossed a coin 100 times, we would not think that we had the entire population of coin tosses; a different set of tosses is possible and, because of chance factors, would likely yield somewhat different results. As Berk, Western, and Weiss (1995b) (p. 423) explain, …the data are treated as a ‘realization’ of some set of social process that could have in principle produced a very large number of other realizations. These realizations, in turn, constitute a super population. That is, the data could have been different as a result of random sampling from the ‘super population.’ Then, conventional statistical inference is applied as usual. An apparent population has now become a random sample.
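The coin-toss argument can be illustrated with a small simulation. This is only a sketch: the seed, the number of replications, and the function name `toss_heads` are assumptions made for illustration and do not come from the quoted papers.

```python
import random
import statistics

def toss_heads(n_tosses, rng):
    """Count heads in n_tosses flips of a fair coin."""
    return sum(rng.random() < 0.5 for _ in range(n_tosses))

rng = random.Random(12345)  # arbitrary seed, used only for reproducibility

# Each replication is one "realization" of the same underlying process.
realizations = [toss_heads(100, rng) for _ in range(1000)]

mean_heads = statistics.mean(realizations)
sd_heads = statistics.stdev(realizations)

print("first five realizations:", realizations[:5])
print("mean heads per 100 tosses:", round(mean_heads, 2))
print("SD across realizations:", round(sd_heads, 2))
# For comparison, the binomial SD is sqrt(100 * 0.5 * 0.5) = 5.
```

Any single set of 100 tosses is complete in itself, yet the spread across realizations is exactly what a standard error tries to capture.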

As per Rich's point,

By way of analogy, a public opinion poll may estimate, subject to some degree of sampling error, who is leading in an election. But, once the election has been held we no longer need to estimate the levels of support because we know who actually got the most votes. However, in situations similar to institutional evaluations, it is actually quite common to go ahead and perform significance tests and compute CIs anyway. Bielby (2013), for example, notes that significance tests are widely used in class action employment lawsuits even when all employee records are available for analysis.

As my paper also notes, there are strong counter-arguments. See Soc Methodology 1995 for a lively back and forth discussion.

But, even if you totally disagree with the superpopulation idea, I recommend humoring those who believe otherwise. They won't believe you if you don't, and the significance tests will usually make your case more compelling anyway. You can always summarize the debate in your paper and argue that no matter what you believe, the evidence supports your argument.

Zachariah Rutledge

Join Date: Jun 2019

Posts: 44
#6

01 May 2025, 08:04

Hi Rich Goldstein,
Thanks for the response. My audience is indeed a court, and I am being asked to test for significance on the change in an outcome variable over time. I just wanted to make sure that using CIs and p-values with data from the full population is appropriate. I am an applied econometrician, and I rarely have access to data from a full population, so I am just doing my due diligence to make sure I can defend my interpretation of the results. Do you have any references I can cite that use full population data and report p-values or significance levels? That would give me an option to cite existing sources that have used the same approach.

Last edited by Zachariah Rutledge; 01 May 2025, 08:06.
Zachariah Rutledge

Join Date: Jun 2019

Posts: 44
#7

01 May 2025, 08:07

Hi Richard Williams,
Thanks for the response. Can you provide full citations of the sources you cited that lean towards using significance tests with population data? It will help me as I organize my argument. Thanks in advance.
Rich Goldstein

Join Date: Mar 2014

Posts: 4493
#8

01 May 2025, 08:42

I presume that one of the lawyers is your client. That person should have lots of citations that they think are directly relevant to the issue in front of the court, and I think that is the best way to go if you feel you need references. As I said, when I was doing litigation support, no references were needed, as the court always expected to see these values.
Zachariah Rutledge

Join Date: Jun 2019

Posts: 44
#9

01 May 2025, 08:58

Hi Rich Goldstein,
Thanks for the response. I will ask the lawyers for references.
Richard Williams

Join Date: Apr 2014

Posts: 5024
#10

01 May 2025, 12:09

Zachariah - I already provided the URLs, which link to the full articles. If you can’t download or otherwise access them, message me and I can probably get you copies, at least of my own article.

I doubt any court will blast you for using significance tests. You might get in trouble, say, if you claimed the differences between men and women were statistically insignificant, and it was countered that the tests didn’t matter because the differences exist in the population. But even if it was agreed that the differences were real, the substantive significance or causes of any differences might be argued over.

A point that may or may not matter to you: a company argued that each branch location needed to be examined separately, and because the Ns were so small it was often difficult to get statistically significant results. But if you looked at the company as a whole, it was clear that significant discrimination was going on. People get paid big bucks for these things and I imagine it is challenging for expert witnesses to make clear what the right way and the wrong way to do things are.
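A rough sketch of why branch-by-branch testing can hide a company-wide pattern. The pay figures are made up, every branch is deliberately identical to keep the arithmetic transparent, and `welch_t` is a hypothetical helper; a real analysis would also have to deal with differences between branches.

```python
import math
import statistics

def welch_t(a, b):
    """Welch two-sample t statistic for mean(a) - mean(b)."""
    se = math.sqrt(
        statistics.variance(a) / len(a) + statistics.variance(b) / len(b)
    )
    return (statistics.mean(a) - statistics.mean(b)) / se

# Made-up hourly pay: 10 branches, 3 men and 3 women each,
# with the same 3-unit gap in every branch.
branches = [
    {"men": [19, 23, 27], "women": [16, 20, 24]}
    for _ in range(10)
]

# Test each branch on its own (n = 3 per group)
branch_ts = [welch_t(b["men"], b["women"]) for b in branches]

# Test the company as a whole (n = 30 per group)
all_men = [x for b in branches for x in b["men"]]
all_women = [x for b in branches for x in b["women"]]
pooled_t = welch_t(all_men, all_women)

print("largest branch-level t:", round(max(branch_ts), 2))
print("company-wide t:", round(pooled_t, 2))
```

With only three people per group, each branch-level t statistic stays below 1, far short of the 5% two-sided critical value of about 2.78 on 4 degrees of freedom. The company-wide statistic, built from the same gap, is about 3.5, well above the critical value of roughly 2.0 on 58 degrees of freedom.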
