  • Possibility of adopting System GMM (xtabond2) for panel data with N=30 and T=18

    Dear Statalist members,

    I have been struggling with a panel dataset with N=30 and T=18. My professor suggested that I use system GMM (xtabond2) for these data, but many sources say that N=30 with T=18 means 'too small N and too large T' for system GMM.

    My basic model is as follows (y is the dependent variable; x1 is an endogenous variable):

    xtabond2 y L.y x1 x2 x3 x4 x5, gmm(L.y x1, lag(2 3) collapse) iv(x2 x3 x4 x5, eq(level)) iv(x2 x3 x4 x5, eq(diff)) robust twostep small

    Could this model produce trustworthy results with my panel of N=30 and T=18? If not, would you please suggest how to correct it? Should I use difference GMM instead, or one-step system GMM?

    Thanks in advance.

    from Andy, in the labyrinth of System GMM
    Last edited by Andy Oh; 12 Jun 2025, 00:55.

  • #2
    If you want to use GMM with these dimensions (either system or difference GMM), then you need to keep the number of instruments very small: use collapsing and restrict the lag length as much as possible. These estimators are designed for large N; their small-N performance can be quite poor. I would also recommend using only the one-step GMM estimator. It is asymptotically inefficient, but with N=30 you are very far from asymptopia anyway, and the two-step estimator would probably suffer substantially from a poorly estimated weighting matrix. Moreover, you might want to adopt some stricter assumptions (e.g., strict exogeneity for some of your regressors) to gain instrument strength.
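
    As a rough illustration only, not a recommendation for your particular data, a parsimonious one-step variant of your command could look like the sketch below (instruments collapsed, a single lag per instrumenting variable, and the level-equation IVs dropped; the lag choice is a placeholder you would need to justify):

    * one-step system GMM (no twostep option), collapsed instruments, very short lag range
    xtabond2 y L.y x1 x2 x3 x4 x5, gmm(L.y x1, lag(2 2) collapse) iv(x2 x3 x4 x5, eq(diff)) robust small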

    Note that by specifying iv(x2 x3 x4 x5, eq(level)), you are effectively assuming that those variables are uncorrelated with the unobserved group-specific effects, which is akin to a random-effects assumption.

    If you are happy to assume that all regressors (besides the lagged dependent variable) are strictly exogenous, a better alternative would be a bias-corrected estimator; see the reference below.

    More on dynamic panel data estimation in Stata:
    https://www.kripfganz.de/stata/



    • #3
      Thanks for your kind advice, Dr. Kripfganz. N=30 does indeed seem too small for GMM. But despite this limitation, if I have to work with this N=30 & T=18 panel, should I adopt difference GMM with the one-step estimator?

      And if I can increase N from 30 to 50 (although in this case about 53% of the observations of the dependent variable are 0), would it be possible to use system GMM with the two-step estimator?



      • #4
        The system GMM estimator is more demanding than the difference GMM estimator in terms of assumptions and data requirements, but the difference GMM estimator might be more biased under some constellations (which we cannot observe). A priori, it is not entirely clear whether the difference GMM estimator should be preferred when N is very small, although I am tempted to say 'yes'.
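
        For comparison, a difference GMM counterpart of the earlier sketch is obtained by adding xtabond2's noleveleq option, which drops the level equation and its instruments altogether (again only a hedged sketch with placeholder lag choices):

        * one-step difference GMM: noleveleq removes the level equation
        xtabond2 y L.y x1 x2 x3 x4 x5, gmm(L.y x1, lag(2 2) collapse) iv(x2 x3 x4 x5, eq(diff)) noleveleq robust small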

        Increasing N of course always helps. In any case, you should make sure to keep the number of instruments small. If your dependent variable has many 0s, the question arises whether this is due to some type of selection, which could create other biases in the estimation.
        https://www.kripfganz.de/stata/



        • #5
          Thank you again for your kind advice, Dr. Kripfganz.

          The panel with N=50 has far more zeros in the dependent variable.

          The dependent variable in the panel data I have created is the log of foreign direct investment (lnFDIkrj) made by Korean investors in an African country (denoted as Country j) from 2006 to 2023, and the key explanatory variable is the official development assistance (lnODAkrj) provided by Korea to Country j from 2006 to 2023.

          1) Panel data of N=30 and T=18
          N=30 : 30 African sovereign countries that have received FDI from Korea for at least 3 years
          - The total number of observations is 30 x 18 (from 2006 to 2023) = 540
          * Kimura & Todo (2010), a previous study on a similar theme, used this approach.
          - 178 of 540 obs. (32.96%) have 0 as the dependent variable
          - In this panel, lnFDIkrj is stationary.

          2) Panel data of N=50 and T=18
          N=50 : All 50 African sovereign countries minus 3 countries whose data are unavailable
          - The total number of observations is 50 x 18 (from 2006 to 2023) = 900
          - 529 of 900 obs. (58.78%) have 0 as the dependent variable
          - In this panel, lnFDIkrj exhibits cross-sectional dependence (CSD) and is nonstationary; the first difference of lnFDIkrj is stationary.



          • #6
            A lot of 0s for a variable in logs raises some serious eyebrows. This looks like a coding error. It is hard to believe that there are lots of observations (before taking logs) of FDI with a value of 1. If FDI was 0 before taking logs, this would result in a missing observation after taking logs. I am afraid this is a question beyond the topic of dynamic panel data estimators. You would need to check how this is typically handled in the respective literature on FDI investment.
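
            To see why: in Stata, ln(0) evaluates to missing, so genuinely zero FDI flows would simply drop out of a logged dependent variable unless some transformation (such as adding a constant) was applied first. A minimal illustration, assuming a raw (unlogged) FDI variable; the names fdi, lnfdi_raw, and lnfdi_plus1 are hypothetical:

            * ln(0) is missing; ln(fdi + 1) keeps zero-FDI rows (at the cost of an arbitrary shift)
            generate double lnfdi_raw = ln(fdi)        // fdi == 0 -> lnfdi_raw = . (missing)
            generate double lnfdi_plus1 = ln(fdi + 1)  // fdi == 0 -> lnfdi_plus1 = 0
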
            https://www.kripfganz.de/stata/



            • #7
              I added 1 to all FDI amounts (in thousands of US dollars) because there are indeed 529 observations with 0 FDI in the N=50 panel; Korean investors never invested in about 20 African countries between 2006 and 2023.

              I am considering changing the dependent variable from 'FDI from Korea to Country j' to 'FDI from Korea to Country j + exports of goods from Country j to Korea'. In that case, the number of zero observations of the dependent variable would decrease to 14.

              P.S. The CD test shows that several variables in my panel data exhibit cross-sectional dependence (CSD). My colleagues warned me that using GMM on a panel with substantial CSD will lead to biased results. They recommended xtdcce2 (Ditzen, 2016). Should I adopt xtdcce2 instead of GMM?
              Last edited by Andy Oh; 15 Jun 2025, 07:28.



              • #8
                Personally, I would be worried about sample selection issues if you are including all the observations with an FDI of 0. The model might not be representative of those observations.

                There are limits to what you can do with such small N and small T. Yes, ideally, you would like to account for cross-sectional dependence, but you would need to impose other strong assumptions, such as strict exogeneity of the independent variables. Moreover, T might still be too small to obtain reliable estimates from xtdcce2. Having said this, it would do no harm to compare your results from different specifications. If they are similar to each other, this might be somewhat reassuring. If they are very different, it will be hard to decide which are more reliable because both methods have their pros and cons.
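
                If you do want to attempt such a comparison, a rough sketch of a dynamic common correlated effects specification with xtdcce2 might look like the line below; the crosssectional() and cr_lags() option names and the number of lags of the cross-sectional averages are stated from memory and should be checked against help xtdcce2:

                * dynamic CCE sketch: cross-sectional averages of all variables, with lags
                xtdcce2 y L.y x1 x2 x3 x4 x5, crosssectional(y x1 x2 x3 x4 x5) cr_lags(2)
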
                https://www.kripfganz.de/stata/



                • #9
                  I considered restricting the sample to African countries that have received FDI from Korean investors for more than 7 consecutive years between 2006 and 2023, in case I use gmm(L.y, lag(2 6) collapse). But this selection reduces N further. In the N=30, T=18 panel, 178 of 540 obs. (32.96%) have 0 as the dependent variable. Is this still far from enough? As for strict exogeneity, I ran a DWH test on all the independent variables, and the results indicated they are exogenous.



                  • #10
                    I am afraid this becomes more of a question related to the empirical FDI literature. In my view, your model cannot explain those 0s. I understand that leaving them out reduces the sample size much further, but there is no gain from increasing the sample size artificially with those questionable observations. Your best chance might be to resort to a very simple IV estimator, possibly just-identified (i.e., as many instruments as regressors).
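
                    One concrete just-identified option, shown only as a sketch, is an Anderson-Hsiao-type estimator using Stata's built-in xtivreg in first differences. Here L.y and x1 are instrumented by their own second lags; the lag depth is a placeholder, and deeper lags may be needed (compare the Anderson-Hsiao example in the xtivreg manual entry):

                    * just-identified IV in first differences: two endogenous regressors, two excluded instruments
                    xtivreg y x2 x3 x4 x5 (L.y x1 = L2.y L2.x1), fd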
                    https://www.kripfganz.de/stata/



                    • #11
                      Do you mean that using as many instruments as regressors might be the best choice, rather than keeping the number of instruments as small as possible? Frankly, I am worried about whether I should change the whole panel dataset... attached are the two panel datasets - 30 countries and 50 countries. Would you please point out their problems?
                      Attached Files



                      • #12
                        You can still try a small degree of overidentification, but I would recommend sticking to the one-step estimator.

                        Additionally, you might want to consider estimation by Poisson pseudo-maximum likelihood (PPML), which can accommodate the many zeros you have.
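
                        For instance, leaving the dynamics aside, a fixed-effects PPML sketch on the unlogged FDI variable could look like the line below; fdi and year are hypothetical variable names, and you should confirm that your Stata version supports vce(robust) with xtpoisson, fe:

                        * fixed-effects Poisson (PPML) on FDI in levels; zeros are handled naturally
                        xtpoisson fdi x1 x2 x3 x4 x5 i.year, fe vce(robust)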

                        I am afraid I am unable to provide more detailed help.
                        https://www.kripfganz.de/stata/
