Sample Size

Murry Siyasiya

Join Date: May 2022

Posts: 6
#1

Sample Size

08 Jun 2025, 11:21

Dear All,

I have 5 independent variables in my model, most of which have 15 yearly data points. Am I justified in running this model? Of course, the data conforms to all CNLRM assumptions. Please provide the necessary reference materials for your answer.

Thanks!
George Ford

Join Date: Aug 2014

Posts: 3187
#2

08 Jun 2025, 16:34

The standard rule of thumb is 10 observations per independent variable, but recommendations differ:
Peduzzi, P., et al. (1996). A simulation study of the number of events per variable in logistic regression analysis. Journal of Clinical Epidemiology.

Green, S.B. (1991). How many subjects does it take to do a regression analysis. Multivariate Behavioral Research, 26(3), 499-510.

Austin, P.C., & Steyerberg, E.W. (2015). The number of subjects per variable required in linear regression analyses. Journal of Clinical Epidemiology, 76, 16-28.

Jenkins, David G. & Quintana-Ascencio, Pedro F. (2020). A solution to minimum sample size for regressions. Plos One, 15, 1-15.
Maarten Buis

Join Date: Mar 2014

Posts: 3467
#3

09 Jun 2025, 02:31

Also, the statement that your data conforms to all the CNLRM assumptions seems overly optimistic to me. That is never true in real data. The best you can hope for is a reasonable approximation. Your sample size is so small that it becomes hard for you to detect even sizable deviations. So in your case it is hard to determine whether the unavoidable deviations are reasonable or not.

This is an unfortunate situation: in large samples the assumptions are easier to check, but mostly irrelevant. In small samples the assumptions are hard to check, but important.

---------------------------------
Maarten L. Buis
University of Konstanz
Department of history and sociology
box 40
78457 Konstanz
Germany
http://www.maartenbuis.nl
---------------------------------
Murry Siyasiya

Join Date: May 2022

Posts: 6
#5

11 Jun 2025, 03:23

Thanks for the references, George Ford. I will take a look at them. Am I right to read your advice as saying that my model is okay?

Maarten Buis, please clarify your point. Do you mean that the standard CNLRM assumptions always hold for small samples? Please provide references.
Maarten Buis

Join Date: Mar 2014

Posts: 3467
#6

11 Jun 2025, 05:47

No, these assumptions are hardly ever true in real data. I was doubting your statement that "Of course, the data conforms to all CNLRM assumptions." I do believe that you could not find any deviations from those assumptions. However, that is just because it is extremely hard to detect deviations from assumptions in such small samples. Trying to find deviations from assumptions in a small dataset is like searching for something while blindfolded. If you can't find the thing you are looking for, then it is theoretically possible that it does not exist, but the more likely explanation is that it has something to do with the blindfold... The same goes for not finding deviations from assumptions in a small dataset: the most likely reason is that your dataset is too small to reveal them. The bad news is that these deviations will still mess up your model even if you cannot detect them.

I was also commenting on the tragedy that the assumptions become more important (deviations from the assumptions are more likely to influence the results) in smaller samples, while at the same time detecting deviations from these assumptions becomes harder in smaller samples.
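To make the blindfold point concrete, here is a minimal simulation sketch. It is not from any of the references above, and everything in it is an assumption chosen for illustration: the errors are drawn from an exponential distribution (clearly non-normal), the check is a hand-rolled Jarque-Bera statistic compared against the asymptotic 5% chi-squared critical value, and the function names are made up. It tests raw draws rather than regression residuals, and the asymptotic critical value is itself only approximate at n = 15, so treat the numbers as a rough indication only.

```python
import random

# 5% critical value of a chi-squared distribution with 2 degrees of freedom,
# i.e. -2 * ln(0.05). This is the asymptotic reference for Jarque-Bera.
CHI2_2DF_CRIT_5PCT = 5.991


def jarque_bera(x):
    """Jarque-Bera statistic: n/6 * (skewness^2 + (kurtosis - 3)^2 / 4)."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    return n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)


def rejection_rate(n, reps=2000, seed=12345):
    """Share of simulated samples of size n in which normality is rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        # Hypothetical, strongly skewed "errors": the deviation really exists.
        errors = [rng.expovariate(1.0) for _ in range(n)]
        if jarque_bera(errors) > CHI2_2DF_CRIT_5PCT:
            rejections += 1
    return rejections / reps


# Compare how often the (real) deviation is detected at each sample size.
rates = {n: rejection_rate(n) for n in (15, 50, 200)}
for n, rate in rates.items():
    print(f"n = {n:3d}: normality rejected in {rate:.0%} of samples")
```

The deviation from normality is present in every simulated sample, so any run that fails to reject is a case of the blindfold, not of well-behaved data.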

As to references: any decent intro stats book.

In short: small samples suck

A little bit less short: In small samples the assumptions are more important, but harder to check.

George Ford can answer for himself, but the rule of thumb I am familiar with is 10 observations per independent variable. So for 5 independent variables you need at least 50 observations. Alternatively, with 15 observations you can have 1 independent variable. So your study is in real trouble.
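The arithmetic behind that rule of thumb can be sketched in a few lines. The function names are hypothetical, the ratio of 10 is just the rule of thumb mentioned above (the cited papers argue for different values), and the residual degrees of freedom line is an addition that assumes an ordinary linear regression with an intercept.

```python
def min_observations(n_predictors, per_variable=10):
    """Observations the rule of thumb asks for, given the number of predictors."""
    return n_predictors * per_variable


def max_predictors(n_observations, per_variable=10):
    """Predictors the rule of thumb allows, given the number of observations."""
    return n_observations // per_variable


def residual_df(n_observations, n_predictors):
    """Residual degrees of freedom in a linear regression with an intercept."""
    return n_observations - n_predictors - 1


print(min_observations(5))   # observations needed for 5 predictors
print(max_predictors(15))    # predictors supported by 15 observations
print(residual_df(15, 5))    # degrees of freedom left with 15 obs, 5 predictors
```

With 15 observations and 5 predictors plus a constant, only 9 residual degrees of freedom remain to estimate the error variance.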
