  • Sample Size

    Dear All,

    I have 5 independent variables in my model, most of which have 15 yearly data points. Am I justified in running this model? Of course, the data conform to all classical normal linear regression model (CNLRM) assumptions. Please provide the necessary reference materials for your answer.

    Thanks!

  • #2
    The standard rule of thumb is 10 observations per independent variable, but recommendations differ:
    • Peduzzi, P., et al. (1996). A simulation study of the number of events per variable in logistic regression analysis. Journal of Clinical Epidemiology.
    • Green, S.B. (1991). How many subjects does it take to do a regression analysis. Multivariate Behavioral Research, 26(3), 499-510.
    • Austin, P.C., & Steyerberg, E.W. (2015). The number of subjects per variable required in linear regression analyses. Journal of Clinical Epidemiology, 68(6), 627-636.
    • Jenkins, D.G., & Quintana-Ascencio, P.F. (2020). A solution to minimum sample size for regressions. PLOS ONE, 15, 1-15.

    • #3
      Also, the statement that your data conform to all the CNLRM assumptions seems overly optimistic to me. That is never true in real data. The best you can hope for is a reasonable approximation. Your sample size is so small that it becomes hard to detect even sizable deviations. So in your case it is hard to determine whether the unavoidable deviations are reasonable or not.

      This is an unfortunate situation: in large samples the assumptions are easier to check, but mostly irrelevant. In small samples the assumptions are hard to check, but important.
      ---------------------------------
      Maarten L. Buis
      University of Konstanz
      Department of history and sociology
      box 40
      78457 Konstanz
      Germany
      http://www.maartenbuis.nl
      ---------------------------------


        • #5
          Thanks for the references, George Ford. I will take a look at them. I hope your advice is that I am okay, right?

          Maarten Buis, please clarify your point. Do you mean that the standard CNLRM assumptions always hold for small samples? Please provide references.


          • #6
            No, these assumptions are hardly ever true in real data. I was doubting your statement that "Of course, the data conforms to all CNLRM assumptions." I do believe that you could not find any deviations from those assumptions. However, that is just because it is extremely hard to detect deviations from assumptions in such small samples. Trying to find deviations from assumptions in a small dataset is like searching for something while blindfolded. If you can't find the thing you are looking for, then it is theoretically possible that it does not exist, but the more likely explanation is that it has something to do with the blindfold... The same goes for not finding deviations from assumptions in a small dataset: the most likely reason is that your dataset is too small to reveal them. The bad news is that these deviations will still mess up your model even if you cannot detect them.

            I was also commenting on the tragedy that the assumptions become more important (deviations from the assumptions are more likely to influence the results) in smaller samples, while at the same time detecting deviations from these assumptions becomes harder in smaller samples.
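
            The "blindfold" point can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions, not anything from the original poster's data: it generates data whose error standard deviation grows with x (a deliberate violation of homoskedasticity), then counts how often a Breusch-Pagan-type LM test in Koenker's n*R^2 form rejects at the 5% level. The data-generating process, the function names, and the choice of test are all assumptions made for the example, and the chi-square critical value is only an asymptotic approximation at n = 15.

```python
import random

CHI2_1DF_5PCT = 3.841  # 5% critical value of a chi-square with 1 degree of freedom


def mean(values):
    return sum(values) / len(values)


def pearson_r(a, b):
    """Sample Pearson correlation between two equal-length lists."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def ols_residuals(x, y):
    """Residuals from a simple OLS regression of y on x with an intercept."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    return [yi - intercept - slope * xi for xi, yi in zip(x, y)]


def rejection_rate(n, reps=300, seed=12345):
    """Share of simulated samples of size n in which the test rejects."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        x = [rng.uniform(1.0, 3.0) for _ in range(n)]
        # Hypothetical data-generating process: error SD equals x,
        # so the homoskedasticity assumption is violated by construction.
        y = [1.0 + 0.5 * xi + rng.gauss(0.0, xi) for xi in x]
        squared_resid = [e * e for e in ols_residuals(x, y)]
        # With one regressor, R^2 of squared residuals on x is r^2,
        # so the LM statistic is n * r^2.
        lm = n * pearson_r(x, squared_resid) ** 2
        if lm > CHI2_1DF_5PCT:
            rejections += 1
    return rejections / reps


if __name__ == "__main__":
    print("n = 15: ", rejection_rate(15))
    print("n = 500:", rejection_rate(500))
```

            The violation is identical in both runs; only the sample size changes. Under these assumptions the test should flag the problem far more often at n = 500 than at n = 15, which is the sense in which a "clean" diagnostic in a tiny sample says little.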

            As to references: any decent intro stats book.

            In short: small samples suck

            A little bit less short: In small samples the assumptions are more important, but harder to check.

            George Ford can answer for himself, but the rules of thumb I am familiar with call for 10 observations per independent variable. So for 5 independent variables you need at least 50 observations. Alternatively, with 15 observations you can have 1 independent variable. So your study is in real trouble.
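
            The arithmetic behind that rule of thumb is spelled out below as a minimal sketch. The function names are made up for illustration, and the ratio of 10 is only the heuristic quoted in this thread, not a formal requirement.

```python
OBS_PER_PREDICTOR = 10  # rule-of-thumb ratio quoted in this thread


def min_observations(n_predictors, ratio=OBS_PER_PREDICTOR):
    """Smallest sample size the rule of thumb allows for n_predictors."""
    return ratio * n_predictors


def max_predictors(n_observations, ratio=OBS_PER_PREDICTOR):
    """Largest number of predictors the rule of thumb allows for a sample."""
    return n_observations // ratio


print(min_observations(5))  # 50 observations for 5 independent variables
print(max_predictors(15))   # 1 independent variable for 15 observations
```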
            ---------------------------------
            Maarten L. Buis
            University of Konstanz
            Department of history and sociology
            box 40
            78457 Konstanz
            Germany
            http://www.maartenbuis.nl
            ---------------------------------
