  • Panel data estimator with small sample size, which to use?

    Dear all,

    I have an unbalanced panel (xtset id year) of 22 countries over 25 years (286 observations in total). I am looking for the best estimator for a regression with at most 7-8 regressors. I am also considering nonparametric estimators (npregress kernel), but only as a comparison, since I am not sure they perform well in small samples. So far, my understanding is that OLS (xtreg, fe), being the only estimator with known small-sample properties, is comparatively better than others (such as FGLS) in small samples.

    Two questions:
    1) Which estimator is most appropriate? (More than one suggestion is welcome.)
    2) Another issue concerns the number of regressors: are they too many?

    Additional information:
    a) Mundlak test to determine whether I should use a fixed- or random-effects model: with chi2(7) = 29.47 and Prob > chi2 = 0.0001, the test suggests using a fixed-effects model.
    b) Autocorrelation (xtserial): the null hypothesis of no serial correlation is strongly rejected.
    c) Heteroskedasticity (lrtest, xttest3): the null hypothesis of homoskedasticity is strongly rejected.

    Thanks for the support,
    Alessandro
    Last edited by Alessandro Franconi; 09 Dec 2020, 09:11.
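
    A minimal sketch of the workflow described in this post, under stated assumptions: the variable names y and x1-x7 are hypothetical (the post does not list them), xttest3 and xtserial are community-contributed commands that must be installed first, and the Mundlak test is done by hand by adding group means to a random-effects model, which may differ from how the poster ran it.

    ```stata
    * Hypothetical names: y (outcome), x1-x7 (regressors), id (country), year
    xtset id year

    * Baseline within (fixed-effects) estimator
    xtreg y x1-x7, fe

    * Groupwise heteroskedasticity (modified Wald test); run right after xtreg, fe
    xttest3

    * Wooldridge test for first-order serial correlation in panel data
    xtserial y x1-x7

    * Mundlak-type test: add the group means of the regressors to an RE model
    * (means are taken over all available rows; with missing values one may
    *  prefer to restrict them to the estimation sample)
    foreach v of varlist x1-x7 {
        bysort id: egen double mean_`v' = mean(`v')
    }
    xtreg y x1-x7 mean_x*, re

    * Joint significance of the means points toward the fixed-effects model
    testparm mean_x*
    ```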

  • #2
    Alessandro:
    welcome to this forum.
    Since you have a T>N panel dataset, I would consider -xtregar-.
    Kind regards,
    Carlo
    (Stata 19.0)
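
    A minimal sketch of the suggestion above, assuming the same hypothetical variable names (y, x1-x7, id, year); it is meant as a starting point rather than a recommended specification.

    ```stata
    * xtregar needs the panel declared with xtset
    xtset id year

    * Fixed-effects model with AR(1) disturbances; lbi additionally requests
    * the Baltagi-Wu LBI test statistic for serial correlation
    xtregar y x1-x7, fe lbi
    ```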


    • #3
      Thanks Carlo. I now have a strongly balanced panel with T=13 and N=22. Same question: which estimator would you advise?

      What about the number of regressors, do you think with such a small sample 7 regressors are too many?

      Thanks a lot,
      Alessandro
      Last edited by Alessandro Franconi; 10 Dec 2020, 08:40.


      • #4
        Below is an extract from my recently revised lecture notes; I am no longer teaching but the purpose of posting it is to point out that there are no magic solutions. If Jeff Wooldridge sees this he may (want to) react.

        The basic assumption underlying the panel data models developed above is that the number of groups (G) is large, and the number of periods (T) within each group small and fixed. The statistical properties of panel data models are derived, with a few exceptions, assuming that G goes to infinity (asymptotic theory) with T fixed. The assumption that T is small and fixed means that we need not be concerned about the temporal characteristics of the sample.
        At the opposite end we have large T and small G, for instance when we have data on a small number of regions in a country but a long time series for each region. In such cases we cannot robustify the variances and covariances of the residuals because we have a large number of variances and covariances and few clusters. One alternative is to model the time series characteristics of the residuals, for instance with the Stata command -xtgee- or the user-written command -xtscc-. This way we have fewer parameters to estimate. If the number of groups is small, one may want to model each group separately, allowing the regression coefficients to vary between groups and including dynamic effects. In this case, asymptotic properties will rely on T going to infinity. Another possibility is to model the groups together using multi-equation (systems) methods.
        Where the borderline lies between small, fixed T with large G and large, increasing T with small, fixed G is not clear. If the groups are too big relative to the number of groups (T is too big relative to G), the estimates of the variances and covariances will be very imprecise.
        A further complication arises if we have cross dependence between groups as often arises with regional data: two contiguous regions, especially if they are small, will most likely have correlated residuals. This issue is treated in spatial econometrics.

        Note that I do not refer to xtregar because of hesitations concerning its asymptotic properties.
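
        The alternatives named in the extract above can be sketched as follows. This is an illustration only: the variable names (y, x1-x7, id, year) and the lag length are assumptions, xtscc is a community-contributed command that must be installed, and none of these lines is presented as the right specification for any particular dataset.

        ```stata
        * Hypothetical names: y (outcome), x1-x7 (regressors), id (group), year
        xtset id year

        * Fixed-effects regression with Driscoll-Kraay standard errors;
        * lag(2) is an arbitrary illustrative choice
        xtscc y x1-x7, fe lag(2)

        * Population-averaged GEE with an AR(1) working correlation
        * (needs equally spaced data without gaps inside panels)
        xtgee y x1-x7, family(gaussian) link(identity) corr(ar 1)

        * Group-by-group regressions, collecting coefficients and standard
        * errors in bygroup.dta; with short T and many regressors each
        * regression has few degrees of freedom
        statsby _b _se, by(id) saving(bygroup, replace): regress y x1-x7
        ```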


        • #5
          Alessandro:
          as an aside to Eric's excellent teaching note, 7 regressors are probably OK. However, before the quantitative aspect, the qualitative feature of those estimators is worth considering: do they support a fair and true view of the data generating process under investigation?
          As far as the panel model is concerned, I would take a look at the literature in your research field and see what others did in the past when presented with the same research topic.
          Kind regards,
          Carlo
          (Stata 19.0)


          • #6
            Eric, thanks a lot for the comment. However, that was my starting point (my lecture notes, indeed), and I was feeling a bit lost in trying to reconcile this knowledge with the data that I have. In fact, for the panel of 22 countries over 13 years (because I removed years where I did not have information on two key variables), the notes that I was reviewing were not very conclusive. As far as I know, OLS is the only estimator that has known and favourable small-sample properties, but I am not 100% sure about other estimators. Hence, I employed an OLS within-group estimator as a baseline, and I was wondering whether, given the data structure that I have, there is anything else to do.

            Carlo, thanks again, that is indeed a good point. Unfortunately I am introducing some econometric analysis into a sociology study, so the reference literature lacks the technical detail that I am asking the forum about. Concerning the qualitative side: the estimates are very interesting, and I believe that the "literature" would be quite intrigued by my results. That is why I want to be very prudent in computing the estimates.


            • #7
              Alessandro:
              it is a good habit to be cautious if you're opening a new research field (by the way, there's an interesting paper about "the economics of more research is needed": https://academic.oup.com/ije/article/30/4/771/705915).
              That said, removing observations if the missing values are not missing completely at random can capture reviewers' attention.
              Kind regards,
              Carlo
              (Stata 19.0)


              • #8
                Thanks Carlo, I found the paper quite interesting.
                As for the new research field, I am just supporting some social policy claims with quantitative evidence. So, thanks for the concern (highly appreciated), but unfortunately the real problem here is with small-sample estimators.


                • #9
                  Alessandro:
                  the issue rests on the fact that the most supportive literature in panel data regression focuses on N>T.
                  As Eric wisely pointed out, there's no hard and fast rule to decide, if T approaches N, when it is the right time to switch to long panel (i.e., T>N) estimators.
                  That's why I previously recommended skimming through the literature in your research field to see if it provides you with some methodological hints for your panel data regression.
                  Kind regards,
                  Carlo
                  (Stata 19.0)


                  • #10
                    I guess that is a good point, thanks again Carlo.

                    Have a nice day,
                    Alessandro


                    • #11
                      I am conducting research on the impact of new technology on employment in developing countries.
                      I have N = 33 (individual dimension: number of countries) and T = 10 (time period from 2010 to 2020).

                      When I use the SYS-GMM technique (via the xtabond2 command in Stata) to estimate my model, I run into a problem of over-identification, as indicated by the Hansen test, and the number of instruments exceeds the number of groups (number of instruments = 71 > number of groups = 33).

                      Please,
                      could you direct me to another (sophisticated) estimation method that would allow me to estimate my model?

                      Could you give me a list, in order, of estimators that I can use?

                      THANK YOU IN ADVANCE
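
                      One commonly used way to bring the instrument count below the number of groups, before switching estimators, is to collapse the instrument matrix and restrict the lag range in xtabond2. The sketch below is an illustration under assumptions: the variable names (emp, tech, id, year) are hypothetical, treating tech as exogenous in iv() is an assumption that may not hold, and the lag limits are arbitrary. xtabond2 is a community-contributed command.

                      ```stata
                      * Hypothetical names: emp (employment), tech (technology measure)
                      xtset id year

                      * Year dummies yr1, yr2, ... used as exogenous regressors/instruments
                      tabulate year, generate(yr)

                      * System GMM with a collapsed, lag-limited GMM-style instrument set;
                      * collapse creates one instrument per lag distance rather than one per
                      * period and lag, and laglimits(1 2) keeps only the first two lags
                      xtabond2 emp L.emp tech yr*, ///
                          gmm(L.emp, laglimits(1 2) collapse) ///
                          iv(tech yr*) twostep robust small
                      ```

                      The output header reports the number of instruments next to the number of groups, so the two can be compared directly after each change to the instrument set.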
