  • Many thanks, Tom. Your paper does indeed address my issue very specifically and will be very helpful in writing up the discussion of these estimation techniques in my article.

    I have some further questions now, regarding your ppmlhdfe command.
    For my project, I divide the world's countries into 2 groups (A and B).
    I also divide trade value into two groups (interesting and not-interesting), so I have two observations for each country pair and year.
    The question I have is whether trade in the goods of interest is higher (i.e., has higher shares) between A->A, A->B, B->A, or B->B.

    For this purpose I create dummies for the interaction (as ppml did not allow factor variables), and drop dummies that I consider baselines (AA and 'not-interesting').
    I then estimate:
    Code:
    ppmlhdfe tradevalueusd AB BA BB interesting intAB intBA intBB, a(expf#year impf#year expf#impf) cluster(expf#impf) nolog
    If I do this in ppml (with time-varying country dummies, but excluding country-pair dummies), I get an estimate for all dummies.
    If I use ppmlhdfe with the same variables, two dummies are dropped because of collinearity, and I cannot figure out why this would be. Do you have any insight?


    Code:
    . ppmlhdfe tradevalueusd AB BA BB interesting intAB intBA intBB if year>2010, a(expf#year impf#year expf#impf) cluster(expf#impf) nolog
    (dropped 181656 observations that are either singletons or separated by a fixed effect)
    warning: dependent variable takes very low values after standardizing (2.1388e-10)
    note: 2 variables omitted because of collinearity: BA BB
    Converged in 20 iterations and 104 HDFE sub-iterations (tol = 1.0e-08)
    
    HDFE PPML regression                              No. of obs      =    142,680
    Absorbing 3 HDFE groups                           Residual df     =     25,452
    Statistics robust to heteroskedasticity           Wald chi2(5)    =    8356.41
    Deviance             =  8.93317e+11               Prob > chi2     =     0.0000
    Log pseudolikelihood = -4.46659e+11               Pseudo R2       =     0.9975
    
    Number of clusters (expf#impf)=    25,453
                             (Std. Err. adjusted for 25,453 clusters in expf#impf)
    ------------------------------------------------------------------------------
                 |               Robust
    tradevalue~d |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
              AB |  -.0059766   .0827166    -0.07   0.942    -.1680982     .156145
              BA |          0  (omitted)
              BB |          0  (omitted)
     interesting |   -4.99851   .0725082   -68.94   0.000    -5.140623   -4.856396
           intAB |  -.2686544   .1358391    -1.98   0.048    -.5348942   -.0024146
           intBA |   .0959186   .1709414     0.56   0.575    -.2391204    .4309577
           intBB |  -.0540459   .2374066    -0.23   0.820    -.5193544    .4112625
           _cons |   23.61517   .0165171  1429.74   0.000      23.5828    23.64754
    ------------------------------------------------------------------------------
    
    Absorbed degrees of freedom:
    -----------------------------------------------------+
     Absorbed FE | Categories  - Redundant  = Num. Coefs |
    -------------+---------------------------------------|
       expf#year |       453           0         453     |
       impf#year |       693           3         690     |
       expf#impf |     25453       25453           0    *|
    -----------------------------------------------------+
    * = FE nested within cluster; treated as redundant for DoF computation
    Last edited by Jorrit Gosens; 09 Aug 2019, 09:02.
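The dummy construction described in this post can be sketched as follows. This is a minimal pure-Python illustration, not Jorrit's actual code. The country codes, the `group` mapping and the rows are hypothetical. In Stata, the same step is a series of `generate` statements such as `generate intAB = interesting * AB`.

```python
# Hypothetical country-to-group mapping (A or B) and a few illustrative rows.
group = {"c1": "A", "c2": "A", "c3": "B", "c4": "B"}

rows = [
    {"exp": "c1", "imp": "c3", "year": 2012, "interesting": 1},
    {"exp": "c1", "imp": "c3", "year": 2012, "interesting": 0},
    {"exp": "c3", "imp": "c2", "year": 2012, "interesting": 1},
    {"exp": "c4", "imp": "c3", "year": 2012, "interesting": 0},
    {"exp": "c2", "imp": "c1", "year": 2012, "interesting": 1},
]

def add_dummies(row):
    """Add pair-type dummies (AA omitted as baseline) and their interactions with `interesting`."""
    pair_type = group[row["exp"]] + group[row["imp"]]  # "AA", "AB", "BA" or "BB"
    for t in ("AB", "BA", "BB"):
        row[t] = int(pair_type == t)
        row["int" + t] = row["interesting"] * row[t]
    return row

data = [add_dummies(dict(r)) for r in rows]
for r in data:
    print(r["exp"], "->", r["imp"], {k: r[k] for k in ("AB", "BA", "BB", "intAB", "intBA", "intBB")})
```

An A->A row gets zeros everywhere, because AA is the omitted baseline.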



    • Hi Jorrit,
      ppml does not appear to drop any variables because Stata often drops collinear variables from right to left (it does not know which variables you really care about). The variables that ppmlhdfe drops here are perfectly collinear with your fixed effects; there is actually no ambiguity about it. If you don't believe me, you can change your ppml syntax so that your FE dummies are to the right of the other variables.

      To see why, consider your country-pair (expf#impf) fixed effect. It absorbs all time- and industry-invariant sources of variation in trade that are specific to each pair. Your BA and BB variables are pair-specific by construction, and they do not appear to vary by either time or industry. Thus, they cannot be identified separately from the pair fixed effect in this case. That said, it seems like you are mostly interested in the interaction terms, yes? In that case, it does not seem like a problem that BA and BB are not identified.

      Regards,
      Tom
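Tom makes two points. First, pair-invariant dummies are spanned by the pair fixed effects. Second, which collinear column gets dropped depends on the order in which columns are scanned. A small sketch can illustrate both. The rule used below is an assumed stand-in for how a command might resolve collinearity, not the actual algorithm inside ppml or ppmlhdfe: keep a column only if it is not a linear combination of the columns already kept. The toy panel has four hypothetical pairs observed in two years, and it assumes group membership never changes over time. Under that assumption, all three pair-type dummies are exactly spanned by the pair dummies, while a year dummy is not.

```python
from fractions import Fraction

def rank(columns):
    """Exact rank of a set of equal-length columns, via Gaussian elimination on Fractions."""
    if not columns:
        return 0
    # Row-major matrix: one row per observation, one column per variable.
    m = [[Fraction(col[i]) for col in columns] for i in range(len(columns[0]))]
    r = 0
    for c in range(len(columns)):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue  # column c adds nothing new
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def drop_collinear(named_cols):
    """Scan left to right; drop any column that is spanned by the columns already kept."""
    kept, dropped = [], []
    for name, col in named_cols:
        if rank([c for _, c in kept] + [col]) > len(kept):
            kept.append((name, col))
        else:
            dropped.append(name)
    return [n for n, _ in kept], dropped

# Hypothetical panel: 4 country pairs (one of each pair type) observed in 2 years.
pair_types = ["AA", "AB", "BA", "BB"]
obs = [(p, year) for p in range(4) for year in (2011, 2012)]

pair_fe = [(f"pair{p + 1}", [int(o[0] == p) for o in obs]) for p in range(4)]
type_dummies = [(t, [int(pair_types[o[0]] == t) for o in obs]) for t in ("AB", "BA", "BB")]
year_dummy = [("y2012", [int(o[1] == 2012) for o in obs])]

fe_last = drop_collinear(type_dummies + pair_fe + year_dummy)   # regressors listed first
fe_first = drop_collinear(pair_fe + type_dummies + year_dummy)  # fixed effects listed first
print("FE listed last: ", fe_last)
print("FE listed first:", fe_first)
```

When the fixed effects are listed last, the scan discards pair dummies. When they are listed first, it discards AB, BA and BB instead. The time-varying year dummy survives in both cases. The kept columns span the same space either way, so fitted values do not change; only the label of what is "omitted" does.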



      • Ah, I see, thanks. This makes sense now.
        I wouldn't have guessed that the two commands pick the dummies to drop in a different order.
        I was also confused that one dummy was left in. For some reason I figured that, since I had already dropped one of the four combinations (AA, AB, BA, BB), none of the three remaining dummies should be reported if they were perfectly collinear with the country-pair FE. Maybe it was already too late on a Friday afternoon to understand that.

        But yes, I am entirely interested in the interaction, and I can also say that in this case the sign and significance level hardly change at all when including the full set of FE. Maybe this is because the variable of interest is not a country characteristic?

        Thanks a lot in any case, both for the paper/package, and the explanation here. Much appreciated.



        • Hi Jorrit,
          Glad to hear it.
          Tom



          • Dear Joao,

            Nice to meet you!

            Does this mean PPML could be used for almost any estimation? Could it be used in a model that treats FDI flows as the dependent variable, even if there are no zeros?





            • Dear Qiyangfan Feng,

              PPML can be used for any multiplicative model (e.g. the gravity equation or a Cobb-Douglas function). For FDI, the problem is that the data sometimes take negative values, and in those cases PPML is unlikely to be suitable.

              Best wishes,

              Joao
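The simplest PPML model, with only a constant, shows one reason negative values are a problem. Its first-order condition sets exp(b0) equal to the sample mean of y. That equation has no solution when the mean is not positive, and the pseudo-log-likelihood then keeps increasing as b0 goes to minus infinity. More generally, the fitted mean exp(x'b) is always positive, so it can never match a negative conditional mean. The sketch below is pure Python with made-up data and illustrates only this special case, not every model with some negative observations. Zeros, by contrast, cause no difficulty.

```python
import math

def pseudo_loglik_const(b0, y):
    """Poisson pseudo-log-likelihood for E[y] = exp(b0), dropping terms that do not involve b0."""
    return sum(yi * b0 - math.exp(b0) for yi in y)

def ppml_const(y):
    """Constant-only PPML: the first-order condition sum(y - exp(b0)) = 0 gives exp(b0) = mean(y)."""
    mean_y = sum(y) / len(y)
    if mean_y <= 0:
        raise ValueError("mean of y is not positive, so no finite PPML estimate exists")
    return math.log(mean_y)

y_zeros = [0.0, 0.0, 3.0, 5.0]       # zeros are fine
y_negative = [-4.0, 1.0, -2.0, 0.5]  # hypothetical net flows with large negative values

print(ppml_const(y_zeros))  # log(2)
for b0 in (-5, -10, -20):
    # With mean(y) < 0 the objective keeps increasing as b0 -> -infinity: no maximum.
    print(b0, pseudo_loglik_const(b0, y_negative))
```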



              • Dear Joao,

                Thanks for your reply!

                Please forgive my unclear description.
                I want to measure the effect of FDI on economic growth (E-G), treating E-G as the dependent variable. E-G has no negative values, but FDI does. Can I use PPML to do this? And is it necessary to log E-G (it has a large standard deviation)?



                • You should not log the dependent variable in a model estimated by PPML. I do not know the form of the model you want to estimate but PPML is suitable if you have a multiplicative model.

                  Joao
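Here is what estimating without logging the dependent variable looks like in practice. The sketch is minimal, pure-Python PPML: Newton-Raphson on the Poisson pseudo-likelihood, with y kept in levels. The data are hypothetical and built so that E[y|x] = exp(0.5 + 0.3x) holds exactly while half of the observations are zero. log(y) would not even be defined for those observations. In practice one would use ppml or ppmlhdfe. This solver is only an illustration and has no safeguards such as step-halving or checks that the estimates exist.

```python
import math

def solve(A, b):
    """Solve A x = b for a small dense system (Gaussian elimination, partial pivoting)."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda i: abs(M[i][c]))
        M[c], M[p] = M[p], M[c]
        for i in range(c + 1, n):
            f = M[i][c] / M[c][c]
            M[i] = [a - f * v for a, v in zip(M[i], M[c])]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def ppml(X, y, tol=1e-10, max_iter=100):
    """Poisson pseudo-ML for E[y|x] = exp(x'b) by Newton-Raphson, with y in levels.

    Assumes column 0 of X is the constant. No step-halving or existence checks.
    """
    k = len(X[0])
    b = [math.log(sum(y) / len(y))] + [0.0] * (k - 1)  # start at the constant-only fit
    for _ in range(max_iter):
        mu = [math.exp(sum(bj * xj for bj, xj in zip(b, x))) for x in X]
        # Score X'(y - mu) and information matrix X' diag(mu) X.
        score = [sum(x[r] * (yi - mi) for x, yi, mi in zip(X, y, mu)) for r in range(k)]
        info = [[sum(mi * x[r] * x[c] for x, mi in zip(X, mu)) for c in range(k)] for r in range(k)]
        step = solve(info, score)
        b = [bj + s for bj, s in zip(b, step)]
        if max(abs(s) for s in step) < tol:
            break
    return b

# Hypothetical data: at each x, one zero and one value at twice exp(0.5 + 0.3x),
# so E[y|x] = exp(0.5 + 0.3x) holds exactly although half of y is zero.
X, y = [], []
for xi in range(6):
    mu_true = math.exp(0.5 + 0.3 * xi)
    for yi in (0.0, 2.0 * mu_true):
        X.append([1.0, float(xi)])
        y.append(yi)

b = ppml(X, y)
print(b)  # approximately [0.5, 0.3]
```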



                  • Dear Joao,

                    Many thanks for your reply and patience.

                    I established a theoretical model as follows: national endowments (including elementary education, R&D level, business environment, etc.) help countries absorb FDI. So I take national economic growth (GDP per capita) as the dependent variable, and FDI and all the endowment variables as the independent variables. The main independent variables are the interaction terms of FDI and the endowments. I think a linear equation may not be suitable for this, so I decided to use ppmlhdfe. I do not even know whether this is correct. Any suggestions?

                    Thanks, Joao!



                    • Hello Joao



                      • Hello

                        I have some data on grants to geographic units: grants from about 125 entities to 3,134 counties over an 8-year period. About 90% of the observations are zero. Would PPML work in this instance?
                        My variables are as follows:

                        totalgrants - grants from entity X to county C in year t
                        distance - geographic distance from X to C
                        comm_normalized - normalized community score of entity X
                        population, MedianHouseholdIncome, gini, unemployment, socioecogrants - characteristics of county C at time t
                        size roa cashratio - characteristics of entity X at time t
                        year* - year fixed effects (not sure if I should have them)

                        pair = pair of entity X and county C

                        I am running the following: ppml totalgrants distance comm_normalized population MedianHouseholdIncome gini unemployment socioecogrants size roa cashratio year*, cluster(pair);

                        In particular, I have the following warnings when I run PPML:

                        WARNING: totalgrants has very large values, consider rescaling
                        WARNING: population has very large values, consider rescaling or recentering
                        WARNING: gini has very large values, consider rescaling or recentering
                        WARNING: unemployment has very large values, consider rescaling or recentering
                        WARNING: socioecogrants has very large values, consider rescaling or recentering
                        WARNING: roa has very large values, consider rescaling or recentering

                        Number of regressors excluded to ensure that the estimates exist: 0
                        Number of observations excluded: 0

                        The regression runs, but the estimates differ massively from OLS and Hausman-Taylor estimates that use log(1+totalgrants) as the dependent variable. It also comes with this warning:

                        WARNING: The model appears to overfit some observations with totalgrants=0

                        Any tips?
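Whatever the numerical motivation for the "consider rescaling" warnings, rescaling the dependent variable does not change the PPML slope estimates. If y is multiplied by a constant c, the first-order conditions X'(y - exp(Xb)) = 0 still hold once the constant term is shifted by log(c). The pure-Python check below uses a hypothetical 0/1 county characteristic and made-up grant amounts. With a constant and a single 0/1 regressor the model is saturated, so PPML's fitted mean in each group is simply that group's sample mean.

```python
import math

def ppml_binary(x, y):
    """PPML with a constant and one 0/1 regressor.

    The model is saturated, so the first-order conditions set each group's fitted mean
    equal to its sample mean: b0 = log(mean y | x=0) and b0 + b1 = log(mean y | x=1).
    """
    mean0 = sum(yi for xi, yi in zip(x, y) if xi == 0) / x.count(0)
    mean1 = sum(yi for xi, yi in zip(x, y) if xi == 1) / x.count(1)
    b0 = math.log(mean0)
    return b0, math.log(mean1) - b0

x = [0, 0, 0, 1, 1, 1]                         # hypothetical 0/1 county characteristic
grants = [0.0, 0.0, 2.4e7, 0.0, 1.2e6, 6.0e7]  # made-up grant amounts in dollars

b0_usd, b1_usd = ppml_binary(x, grants)
b0_mil, b1_mil = ppml_binary(x, [g / 1e6 for g in grants])  # same data in millions

print(b1_usd, b1_mil)   # equal up to floating-point rounding
print(b0_usd - b0_mil)  # log(1e6), about 13.8155
```

The same logic applies to regressors. Dividing a regressor such as population by 1,000 multiplies its coefficient by 1,000, and recentering a regressor changes only the constant. Neither changes the fitted values.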



                        • Dear Umar Boodoo,

                          Yes, PPML should work in this case and it is not surprising that the results are different because using log(1+totalgrants) as the dependent variable leads to meaningless results. I suggest you use the command ppmlhdfe rather than ppml.

                          Best wishes,

                          Joao
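A deliberately stylized example shows how log(1+totalgrants) can mislead. The data below are constructed so that E[y|x] = exp(1 + 0.5x) holds exactly. The multiplicative error equals 0 or 2 (mean 1), so half of the observations are zero. With a constant and one 0/1 regressor, both estimators have closed forms: PPML matches the group means of y, and OLS matches the group means of log(1+y). No solver is needed. The numbers are made up and show only that the two estimates can differ substantially in one constructed case.

```python
import math

b0_true, b1_true = 1.0, 0.5  # hypothetical parameters of E[y|x] = exp(b0 + b1*x)

# Multiplicative error eta in {0, 2} (mean 1): E[y|x] = exp(b0 + b1*x) holds exactly,
# yet half of the observations are zero.
x = [0, 0, 1, 1]
eta = [0.0, 2.0, 0.0, 2.0]
y = [math.exp(b0_true + b1_true * xi) * e for xi, e in zip(x, eta)]

def group_mean(values, x, g):
    """Sample mean of `values` over the observations with x == g."""
    sel = [v for v, xi in zip(values, x) if xi == g]
    return sum(sel) / len(sel)

# PPML, constant plus 0/1 regressor: fitted means equal the group means of y.
ppml_b1 = math.log(group_mean(y, x, 1)) - math.log(group_mean(y, x, 0))

# OLS of log(1+y) on the same regressor: slope = difference in group means of log(1+y).
log1p_y = [math.log1p(v) for v in y]
ols_b1 = group_mean(log1p_y, x, 1) - group_mean(log1p_y, x, 0)

print("PPML slope:", round(ppml_b1, 4))
print("OLS on log(1+y) slope:", round(ols_b1, 4))
```

PPML recovers the slope of 0.5, while the log(1+y) slope is well below it. The log(1+y) estimate would also change if y were rescaled, because log(1+cy) is not log(1+y) plus a constant.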



                          • Dear Joao Santos Silva

                            I would like to analyse the effect of education on innovation. I have a panel data set with over 100 countries for around 20 years. My dependent variable is proxied by adjusted patent applications per country. Since it is not normally distributed, but non-negative and right-skewed (like most trade data, if I am correct), I wanted to use the PPML estimator. I am unsure about the model specification, though. The gravity equation in trade is a multiplicative model, whereas I assume that my model is additive. My plan was to define it as follows: Patent = exp(b1*education + b2*controls + ... + e), i.e. without logging the independent variables. Is this possible? Or is PPML estimation only applicable to multiplicative models, and therefore to logged variables?

                            Best wishes,
                            Etienne



                            • Dear Etienne Jenni

                              You can estimate a model like that with PPML, but note that it is still a multiplicative model because you have the exponential function on the right-hand side.

                              Best wishes,

                              Joao
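Joao's point can be written out as a short derivation. Here y is the patent count, x is education and z is the vector of controls; this notation is added for illustration. The model is stated for the conditional mean, which is what PPML estimates. The additive index becomes a product of exponentials, so a one-unit increase in education multiplies expected patents by exp(b1), roughly a 100*b1 percent change when b1 is small. If a regressor enters in logs instead, the last line shows that its coefficient is an elasticity.

```latex
\begin{aligned}
E[y_{it} \mid x_{it}, z_{it}] &= \exp(b_0 + b_1 x_{it} + b_2^\top z_{it})
  = e^{b_0}\, e^{b_1 x_{it}}\, e^{b_2^\top z_{it}} \\
\frac{E[y_{it} \mid x_{it} + 1,\, z_{it}]}{E[y_{it} \mid x_{it},\, z_{it}]} &= e^{b_1} \\
\exp(b_1 \ln x_{it}) &= x_{it}^{\,b_1}
\end{aligned}
```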



                              • Dear Joao

                                Ah indeed, the overall specification is still multiplicative, just the term in the exponential function is additive I guess. Great, thank you for your help!

                                Best wishes,
                                Etienne
