  • predict command for poi2hdfe

    Dear all,

    I have a problem with a command that I have not been able to solve, since I am new to Stata. I originally intended to use the PPML command to run a regression with Iran's trade flows as the dependent variable and several other variables as regressors. I wanted to include three sets of fixed effects: time dummies (year_*) as well as importer and exporter dummies (exp, imp), which is why the PPML command did not work very well. Instead, I used the poi2hdfe command, and it worked perfectly:

    Code:
    . poi2hdfe import1 loggdpimp loggdpexp logpimp logpexp logdist Dsanc Dlan Dbor year_*, id1(exp) id2(imp)
    
    Dropping exp groups for which import1 is always zeros
    
    Total Number of observations used in the regression -> 9367
    
    Starting Estimation of coefficients
    
    1 dif is -> 263.87923
    2 dif is -> .88462613
    3 dif is -> .78570317
    4 dif is -> .47922415
    5 dif is -> .22449422
    6 dif is -> .09474546
    7 dif is -> .05612353
    8 dif is -> .03423796
    9 dif is -> .01788093
    10 dif is -> .00736692
    11 dif is -> .00281278
    12 dif is -> .0007891
    13 dif is -> .00012874
    14 dif is -> .00003796
    15 dif is -> .00001659
    16 dif is -> 4.337e-06
    17 dif is -> 3.080e-07
    18 dif is -> 1.542e-09
    
    Coefficients converged after 18 reghdfe calls 
    
    
     ******* Poisson Regression with Two High-Dimensional Fixed Effects ********** 
    
                                                    Number of obs     =      9,367
    ------------------------------------------------------------------------------
                 |               Robust
         import1 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
       loggdpimp |   1.132046   .0885372    12.79   0.000     .9585161    1.305576
       loggdpexp |   1.231915   .0705646    17.46   0.000     1.093611    1.370219
         logpimp |    .578808   .1490937     3.88   0.000     .2865898    .8710263
         logpexp |   .3734077   .2623947     1.42   0.155    -.1408765    .8876919
         logdist |          0  (omitted)
           Dsanc |  -.4244898   .0746442    -5.69   0.000    -.5707897     -.27819
            Dlan |          0  (omitted)
            Dbor |          0  (omitted)
          year_1 |    1.59724    .265999     6.00   0.000     1.075891    2.118588
          year_2 |   1.701557   .2586135     6.58   0.000     1.194684     2.20843
          year_3 |   2.152632   .2441069     8.82   0.000     1.674192    2.631073
          year_4 |   1.421192   .2058134     6.91   0.000     1.017805    1.824579
          year_5 |   3.419411   .2805368    12.19   0.000     2.869569    3.969253
          year_6 |   2.893661    .262118    11.04   0.000     2.379919    3.407403
          year_7 |   2.405252   .2452033     9.81   0.000     1.924662    2.885841
          year_8 |   1.846082   .2246121     8.22   0.000      1.40585    2.286313
          year_9 |   1.710722   .2149643     7.96   0.000     1.289399    2.132044
         year_10 |   1.730028   .2098566     8.24   0.000     1.318717     2.14134
         year_11 |    1.70484   .2138328     7.97   0.000     1.285735    2.123944
         year_12 |   1.434204   .2211158     6.49   0.000     1.000825    1.867583
         year_13 |   1.809136   .2048561     8.83   0.000     1.407626    2.210647
         year_14 |    1.73383   .1990534     8.71   0.000     1.343692    2.123967
         year_15 |   1.728628   .1991251     8.68   0.000      1.33835    2.118906
         year_16 |   1.608781   .1768353     9.10   0.000      1.26219    1.955371
         year_17 |   1.428643   .1627806     8.78   0.000     1.109599    1.747687
         year_18 |   1.261635    .149493     8.44   0.000     .9686343    1.554636
         year_19 |   1.122973   .1241564     9.04   0.000     .8796311    1.366315
         year_20 |   .8207016   .1059854     7.74   0.000     .6129739    1.028429
         year_21 |   .7202783   .1085713     6.63   0.000     .5074824    .9330742
         year_22 |   .6247063   .0959396     6.51   0.000     .4366681    .8127445
         year_23 |   .4285563   .0886198     4.84   0.000     .2548647    .6022478
         year_24 |   .1697519    .081832     2.07   0.038     .0093643    .3301396
         year_25 |  -.0448062   .0988734    -0.45   0.650    -.2385945     .148982
         year_26 |          0  (omitted)
         year_27 |   .1756412   .0947305     1.85   0.064    -.0100273    .3613097
         year_28 |    .062862   .1126554     0.56   0.577    -.1579385    .2836625
         year_29 |  -.0120616   .1212996    -0.10   0.921    -.2498043    .2256812
    ------------------------------------------------------------------------------
    Now I want to predict Iran's potential trade, grouped by trade partner. There may be a command or option for that, but I was not able to find it. So far I have this:

    Code:
    . predict importhat, xb
    
    . generate exp_importhat = exp(importhat)
    
    . summarize import1 importhat exp_importhat
    
        Variable |        Obs        Mean    Std. Dev.       Min        Max
    -------------+---------------------------------------------------------
         import1 |      9,512    179.5432    1120.309          0   30332.97
       importhat |      9,512    76.31283    3.932427   31.66138   86.08951
    exp_import~t |      9,512    1.64e+35    1.15e+36   5.63e+13   2.44e+37
    Obviously these values are not sensible, and I am still missing the potential trade for each of the countries trading with Iran. Does anybody have a suggestion for me? I would appreciate any comments or replies. Thank you.

    Homa

  • #2
    You didn't get a quick answer. Part of the problem is that you are using a user-written routine: unless someone has used that estimator and its prediction, they may not be able to help you easily. I also don't get a clear idea of your data structure. The FAQ on asking questions asks for a sample program in code delimiters, readable Stata output, and sample data using dataex. You may also be losing a pile of observations by using logs, which turn zeros into missing values. In some cases, predictions don't work as well when you have a bunch of unestimated parameters. I also don't understand your statement "I am missing the potential trade...". How did you arrive at this conclusion?



    • #3
      Dear Phil Bromiley,

      Thank you for your response. I will try to clarify the issue: my data set includes 165 exporters and 165 importers. Since the country of interest is Iran, the data set includes all possible trade partners of Iran, once with Iran as importer and once with Iran as exporter. So with 165 countries, each appearing twice in the data, over 29 years, I have 9512 observations. I do not have country pairs for all countries, only for Iran and its trade partners.
      I contacted the authors of the command and got helpful comments on my issue. They suggested a newer command, ppmlhdfe, which gave this result:

      Code:
      . ppmlhdfe import loggdpimp loggdpexp logpimp logpexp logdist Dsanc Dlan Dbor, absorb(exp imp year)
      (dropped 145 observations that are either singletons or separated by a fixed effect)
      note: 3 variables omitted because of collinearity: logdist Dlan Dbor
      Iteration 1:   [p ] deviance = 1.384e+12                  itol = 1.0e-04 subiters = 6   min(eta) =   6.73
      Iteration 2:   [  ] deviance = 7.149e+11  eps = 9.35e-01  itol = 1.0e-04 subiters = 6   min(eta) =   5.09
      Iteration 3:   [  ] deviance = 5.654e+11  eps = 2.64e-01  itol = 1.0e-04 subiters = 6   min(eta) =   3.40
      Iteration 4:   [  ] deviance = 5.360e+11  eps = 5.49e-02  itol = 1.0e-04 subiters = 6   min(eta) =   2.09
      Iteration 5:   [  ] deviance = 5.291e+11  eps = 1.31e-02  itol = 1.0e-04 subiters = 6   min(eta) =   1.07
      Iteration 6:   [p ] deviance = 5.272e+11  eps = 3.46e-03  itol = 1.0e-04 subiters = 6   min(eta) =   0.15
      Iteration 7:   [  ] deviance = 5.268e+11  eps = 8.81e-04  itol = 1.0e-04 subiters = 6   min(eta) =  -0.66
      Iteration 8:   [  ] deviance = 5.267e+11  eps = 2.15e-04  itol = 1.0e-04 subiters = 6   min(eta) =  -1.36
      Iteration 9:   [  ] deviance = 5.266e+11  eps = 4.85e-05  itol = 1.0e-04 subiters = 6   min(eta) =  -1.94
      Iteration 10:  [  ] deviance = 5.266e+11  eps = 9.76e-06  itol = 1.0e-05 subiters = 6   min(eta) =  -2.85
      Iteration 11:  [p ] deviance = 5.266e+11  eps = 1.57e-06  itol = 1.0e-06 subiters = 6   min(eta) =  -3.63
      Iteration 12:  [  ] deviance = 5.266e+11  eps = 2.08e-07  itol = 1.0e-06 subiters = 6   min(eta) =  -4.16
      Iteration 13:  [ s] deviance = 5.266e+11  eps = 3.78e-08  itol = 1.0e-07 subiters = 6   min(eta) =  -5.05
      Iteration 14:  [ps] deviance = 5.266e+11  eps = 9.22e-09  itol = 1.0e-08 subiters = 6   min(eta) =  -5.78
      Iteration 15:  [ps] deviance = 5.266e+11  eps = 1.53e-09  itol = 1.0e-09 subiters = 6   min(eta) =  -6.23
      (legend: p = exact partial-out    s = exact solver)
      
      Converged in 15 iterations and 90 HDFE sub-iterations (tol = 1.0e-08)
      
      HDFE PPML regression                              No. of obs      =      9,367
      Absorbing 3 HDFE groups                           Residual df     =      9,011
                                                        Wald chi2(5)    =     719.13
      Deviance             =  5.26631e+11               Prob > chi2     =     0.0000
      Log pseudolikelihood = -2.63316e+11               Pseudo R2       =     0.9426
      ------------------------------------------------------------------------------
                   |               Robust
            import |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
         loggdpimp |   1.132046   .0883858    12.81   0.000     .9588128    1.305279
         loggdpexp |   1.231915    .070444    17.49   0.000     1.093848    1.369983
           logpimp |    .578808   .1488388     3.89   0.000     .2870894    .8705267
           logpexp |   .3734077   .2619461     1.43   0.154    -.1399972    .8868126
           logdist |          0  (omitted)
             Dsanc |  -.4244898   .0745165    -5.70   0.000    -.5705396   -.2784401
              Dlan |          0  (omitted)
              Dbor |          0  (omitted)
             _cons |  -59.66482   5.695763   -10.48   0.000    -70.82831   -48.50133
      ------------------------------------------------------------------------------
      
      Absorbed degrees of freedom:
      -----------------------------------------------------+
       Absorbed FE | Categories  - Redundant  = Num. Coefs |
      -------------+---------------------------------------|
               exp |       160           0         160     |
               imp |       165           2         163     |
              year |        29           1          28     |
      -----------------------------------------------------+

      So I now have the coefficients of the gravity equation. What I want to do next is predict the potential trade between Iran and each of its partners, that is, potential trade broken down by the ISO codes of the partner countries.
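
One possible way to get fitted trade by partner after ppmlhdfe is sketched below. It is a minimal, untested sketch, not a definitive solution. It assumes the d option of ppmlhdfe (which stores the sum of the absorbed fixed effects and which predict relies on) and uses import_hat as a hypothetical new variable name. The collinear regressors logdist, Dlan and Dbor are left out, and exp and imp are the group identifiers from the regression above; an ISO-code variable could be used in by() instead if the data set has one.

```stata
* Sketch only: refit with the d option so that the sum of the absorbed
* fixed effects is stored and predict can use it.
ppmlhdfe import loggdpimp loggdpexp logpimp logpexp Dsanc, absorb(exp imp year) d

* Fitted trade flows in levels, fixed effects included
* (import_hat is a hypothetical new variable name).
predict import_hat, mu

* Observed and fitted trade for each exporter-importer pair, summed over
* the years in the estimation sample; preserve/restore keeps the
* original data in memory afterwards.
preserve
collapse (sum) import import_hat if e(sample), by(exp imp)
list exp imp import import_hat in 1/10
restore
```

Observations dropped by ppmlhdfe (singletons or separated observations) get no prediction, which is why the collapse is restricted to e(sample).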



      • #4
        Hi Homa,
        One thing that I have tried on a similar research design (using an alternative version of poi2hd) is the following:
        st1 Estimate the high-dimensional fixed-effects Poisson model (you have done this).
        st2 Obtain predictions for the fixed effects. (I'm not sure about the new command -ppmlhdfe-, but I'm willing to bet that it already includes options to obtain fixed-effect predictions.)
        st3 Re-estimate the model using the standard poisson command, adding the predicted fixed effects as regressors: -poisson y x1 x2 x3 fe1 fe2 fe3-
        st4 Obtain the predicted values for trade potential from that model.
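
The four steps might look roughly like this with ppmlhdfe. This is a hedged sketch under one key assumption: that ppmlhdfe's absorb() accepts the newvar=absvar form that reghdfe uses to save fixed-effect estimates. The names fe_exp, fe_imp, fe_year and import_pot are hypothetical, and the variables are those of the regression in #3 with the collinear regressors dropped.

```stata
* st1-st2: estimate and keep each set of fixed effects.
* Assumption: absorb() accepts newvar=absvar, as in reghdfe;
* fe_exp, fe_imp and fe_year are hypothetical variable names.
ppmlhdfe import loggdpimp loggdpexp logpimp logpexp Dsanc, absorb(fe_exp=exp fe_imp=imp fe_year=year)

* st3: refit with the standard poisson command, entering the saved
* fixed effects as ordinary regressors.
poisson import loggdpimp loggdpexp logpimp logpexp Dsanc fe_exp fe_imp fe_year, vce(robust)

* st4: predicted trade in levels (n is the default statistic for predict
* after poisson).
predict import_pot, n
```

If the fixed effects were recovered correctly, their coefficients in the poisson step should come out close to 1 and the other coefficients should match the first regression.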
        Hope this helps.
        Fernando



        • #5
          Dear FernandoRios,

          Thank you very much for your comment. Could you please advise me on how to do the second step? I'm not a Stata expert, so that would be a great help.



          • #6
            Hi Homa,
            As I mentioned, I'm not sure how to use the ppmlhdfe command; I have never used it. Instead, I decided to send you a modification I made to the former poi2hd command.
            For the following example, you can download the data from "https://www.dropbox.com/s/7nn1611kmn6j13t/Example.dta?dl=0"
            Code:
              
            * Note that the option fe(f) saves the fixed effects associated with ex, imp and year as new variables named f_ex, f_imp and f_year
            . poihdfe trade DIST CNTG LANG CLNY RTA, abs(ex imp year) fe(f)
            
            Total Number of observations used in the regression -> 99981
            
            Starting Estimation of coefficients
            
            1 dif is -> 11399.637
            2 dif is -> .85209645
            3 dif is -> 1.008951
            4 dif is -> .88633553
            5 dif is -> .49950671
            6 dif is -> .18512638
            7 dif is -> .04496204
            8 dif is -> .00745137
            9 dif is -> .00059014
            10 dif is -> 7.095e-06
            11 dif is -> 1.469e-09
            
            Coefficients converged after 11 reghdfe calls 
            
            
             ******* Poisson Regression with N High-Dimensional Fixed Effects ********** 
            
                                                            Number of obs     =     99,981
            ------------------------------------------------------------------------------
                         |               Robust
                   trade |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
            -------------+----------------------------------------------------------------
                    DIST |  -.0006063   .0000113   -53.61   0.000    -.0006285   -.0005841
                    CNTG |  -1.200061   .0610184   -19.67   0.000    -1.319654   -1.080467
                    LANG |   .0482914   .0756209     0.64   0.523    -.0999228    .1965057
                    CLNY |  -1.185326    .143854    -8.24   0.000    -1.467275   -.9033778
                     RTA |  -.9638096   .0693364   -13.90   0.000    -1.099706   -.8279128
            ------------------------------------------------------------------------------
            
            . poisson trade DIST CNTG LANG CLNY RTA f_ex f_imp f_year
            note: you are responsible for interpretation of noncount dep. variable
            
            Iteration 0:   log likelihood = -2.008e+08  
            Iteration 1:   log likelihood = -1.919e+08  
            Iteration 2:   log likelihood = -1.919e+08  
            Iteration 3:   log likelihood = -1.919e+08  
            Iteration 4:   log likelihood = -1.919e+08  
            
            Poisson regression                              Number of obs     =     99,981
                                                            LR chi2(8)        =   2.54e+09
                                                            Prob > chi2       =     0.0000
            Log likelihood = -1.919e+08                     Pseudo R2         =     0.8688
            
            ------------------------------------------------------------------------------
                   trade |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
            -------------+----------------------------------------------------------------
                    DIST |  -.0006063   3.15e-08 -1.9e+04   0.000    -.0006064   -.0006062
                    CNTG |  -1.200061   .0002624 -4572.59   0.000    -1.200575   -1.199546
                    LANG |   .0482914   .0002806   172.13   0.000     .0477415    .0488413
                    CLNY |  -1.185326   .0004368 -2713.43   0.000    -1.186183    -1.18447
                     RTA |  -.9638096   .0002976 -3238.29   0.000    -.9643929   -.9632262
                   f_exp |          1   .0000498  2.0e+04   0.000     .9999025    1.000098
                   f_imp |          1    .000042  2.4e+04   0.000     .9999176    1.000082
                  f_year |          1   .0001879  5321.51   0.000     .9996317    1.000368
                   _cons |   13.16547     .00008  1.6e+05   0.000     13.16532    13.16563
            ------------------------------------------------------------------------------
            
            . predict trade_hat,
            (option n assumed; predicted number of events)
            
            . sum trade trade_hat
            
                Variable |        Obs        Mean    Std. Dev.       Min        Max
            -------------+---------------------------------------------------------
                   trade |     99,981    3266.005    61861.06          0    4233436
               trade_hat |     99,981    3266.005    59510.32    .000039    5884268
            
            .
            I hope it helps. Fernando
            Last edited by FernandoRios; 05 Oct 2018, 09:01.



            • #7
              Dear FernandoRios

              I am trying to use the two-stage estimation from Freeman et al. (2021), "Unlocking new methods to estimate country-specific trade costs and trade elasticities", Drexel working paper series, WP 2021-17.

              X_{i,j,t} = \exp[ m_{i,t} + P_{i,t}^{1-\sigma} + P_{j,t}^{1-\sigma} ] \times e_{i,j,t}   (1)


              Since P_{i,t}^{1-\sigma} and P_{j,t}^{1-\sigma}, the importer and exporter multilateral resistance terms, are unobservable, Freeman et al. (2021) suggest a two-stage estimation.

              Stage 1. Run the standard version of the gravity model:

              X_{i,j,t} = \exp( D_{i,j} + c_{i,t} + k_{j,t} ) \times e_{i,j,t}   (2)


              where D_{i,j} = {contiguity, distance, common language, RTA, etc.} is a vector of dyadic variables, and c_{i,t} and k_{j,t} are importer-time and exporter-time fixed effects used to approximate P_{i,t}^{1-\sigma} and P_{j,t}^{1-\sigma}, respectively, in (1).

              Obtain \hat{c}_{i,t} and \hat{k}_{j,t} from (2). Then, by the additive property of PPML, the estimates of the IMR, P_{i,t}^{1-\sigma}, and the OMR, P_{j,t}^{1-\sigma}, are recovered as:


              \hat{P}_{i,t}^{1-\sigma} = \frac{Y_{i,t}}{\exp(\hat{c}_{i,t})} \times \frac{E_{0,t}}{Y_t}   (3)

              and

              \hat{P}_{j,t}^{1-\sigma} = \frac{E_{j,t}}{\exp(\hat{k}_{j,t})} \times \frac{1}{E_{0,t}}   (4)

              where E_{0,t} is the GDP of the country selected as numeraire (Freeman et al., 2021; Anderson et al., 2018).

              Stage 2. Substituting the estimated values of P_{i,t}^{1-\sigma} and P_{j,t}^{1-\sigma} from (3) and (4) into (1), we estimate the following reduced-form model:

              X_{i,j,t} = \exp[ m_{i,t} + \hat{P}_{i,t}^{1-\sigma} + \hat{P}_{j,t}^{1-\sigma} ] \times e_{i,j,t}   (5)

              I therefore ran the following code to estimate (2):


              Code:
              cap egen exp = group(iso_e)
              cap egen imp = group(iso_i)
               ppmlhdfe Trade_Value ln_distwces contig comlang_off colony comcol, absorb(exp#Year imp#Year) d cluster(distwces) nolog 
              From here on, I do not know how to obtain the estimated exp#Year (k_j,t) and imp#Year (c_i,t) fixed effects after running ppmlhdfe with the absorb() option. I need these estimates to compute expressions (3) and (4), that is, the IMR and OMR, so that I can then estimate model (5).
              I am not a Stata expert. I have tried a bit of coding, but I was not able to calculate the IMR and OMR from (3) and (4). Your help would be greatly appreciated.
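
A possible starting point, offered as a sketch under stated assumptions rather than a verified answer: if ppmlhdfe's absorb() accepts the newvar=absvar form used by reghdfe, the two interacted fixed effects can be saved under names of your choice and then exponentiated. The names fe_exp_t, fe_imp_t, exp_k, exp_c, imr and omr are hypothetical, and Y_it, E_jt, E_0t and Y_t stand for variables that would have to exist in the data already.

```stata
* Sketch: name the absorbed interactions so that their estimates are kept.
* Assumption: absorb() accepts newvar=absvar, as in reghdfe;
* fe_exp_t and fe_imp_t are hypothetical variable names.
ppmlhdfe Trade_Value ln_distwces contig comlang_off colony comcol, absorb(fe_exp_t=exp#Year fe_imp_t=imp#Year) d cluster(distwces) nolog

* exp(k_j,t) and exp(c_i,t), the denominators in (4) and (3)
generate double exp_k = exp(fe_exp_t)
generate double exp_c = exp(fe_imp_t)

* Expressions (3) and (4) as written in this post, assuming the
* hypothetical variables Y_it, E_jt, E_0t and Y_t are already in the data.
generate double imr = Y_it / exp_c * E_0t / Y_t
generate double omr = E_jt / exp_k / E_0t
```

Saved fixed effects are identified only relative to a normalization, so it is worth checking how the saved values line up with the numeraire country used in the paper before plugging them into (5).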

              regards,

              (Ridwan)
