  • ppmlhdfe in panels: predicted values and the adding up problem

    Dear everyone,

    I am trying to obtain predicted values for a trade gravity equation from a panel of bilateral trade relations. For my application it is important that the sum of overall trade in the predictions corresponds exactly to the overall sum of trade in the original data.
    I consulted Arvis, J-F. and Shepherd, B. (2013) The Poisson quasi-maximum likelihood estimator: a solution to the ‘adding up’ problem in gravity models. Applied Economics Letters. (link below) They find that PPML does in fact preserve the overall sum of trade and is furthermore the only estimator to do so.

    I am using ppmlhdfe to estimate the gravity equation. My code is:
    ppmlhdfe tradevalue ln_dist_air $dist_inds, abs(i.importercode#i.year i.exportercode#i.year) d vce(robust)
    where tradevalue is the level of bilateral trade, ln_dist_air is the log of great-circle distance, and $dist_inds holds a number of distance indicator variables. I include importer-year and exporter-year effects to account for multilateral resistance and country sizes in each period. I have several questions:
    1. Will the predicted values obtained from running predict, xb be predicted trade in logs or in levels? From the lin-log specification of PPML generally I would expect them to be in levels.
    2. The sums of my predicted results (both for logs and levels) overall, by year, and by year and exporter are far from the corresponding sums in the observed data. This is despite the fact that eq. (10) or eq. (11) in Arvis and Shepherd (2013) (i.e. the FOC for the log-likelihood in PPML) makes clear that including year-importer fixed effects should render these sums equal for each exporter in each year. The total sum of all predicted trade should also equal the total sum of all observed trade.

      Following my estimation I run:
    predict pr_grav, xb
    egen check = total(pr_grav)
    egen check2 = total(tradevalue)
    gen check3 = check / check2
    For the ratio of the sums, check3, I obtain a value of 0.039, where I would expect it to be precisely 1. That implies that predicted overall trade is smaller than observed trade by a factor of about 25.

    Similarly, for the sum of importer-year trade I run
    egen check_y1 = total(pr_grav), by(year importer)
    egen check_y2 = total(tradevalue), by(year importer)
    gen check_y3 = check_y1 / check_y2
    Again, I obtain arbitrary values, most of which range from 0.02 to 56 (implying that the sums of predicted values are off by up to a factor of 56).
    My core question now is: Does absorbing the fixed effects in ppmlhdfe render the result by Arvis and Shepherd invalid in some way or am I making a grave mistake?

    Thank you for your help
    Last edited by Daniel Prosi; 09 Jul 2020, 08:25.

  • #2
    Additional note: Running the simple ppml command for one year of the data and then computing the above ratios returns the expected ratios of 1:
    Namely running

    ppml tradevalue ln_dist_air $dist_inds _IM* _EX* if year == 2010

    yields the expected ratio of 1. _IM* and _EX* are importer and exporter dummies generated via the xi command.
    (I noted that predicted values from ppml return tradevalue in logs, so I had to take the exponential. It seemed to me that ppmlhdfe predicts level values, however. This doesn't make much sense to me but I accept it for the moment. I did all calculations with both the predicted values and their exponentials.)

    Running the same analysis with ppmlhdfe generated the puzzling results that are different from 1.


    ppmlhdfe tradevalue ln_dist_air $dist_inds if year == 2010, abs(importeriso exporteriso) d

    • #3
      Dear Daniel Prosi,

      Note that PPML does not return trade values in logs, but allows you to predict either the expectation of the dependent variable or the linear index (which you call trade in logs). However, I believe that by default it actually predicts the expected value of trade and the sum of that equals the sum of trade.

      For ppmlhdfe you probably need to save the fixed effects and incorporate them in the predictions; please check the help file.
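
      A minimal sketch of that idea, reusing the variable names from #1. The names fe_sum, xb_only and mu_hat are made up for illustration, and the d(newvar) form is taken from later posts in this thread rather than checked against the help file here:

      ```stata
      * Store the sum of the absorbed fixed effects in fe_sum (hypothetical name)
      ppmlhdfe tradevalue ln_dist_air $dist_inds, absorb(importercode#year exportercode#year) d(fe_sum) vce(robust)

      * Linear index from the covariates only; the fixed effects are not in here
      predict xb_only, xb

      * Add the fixed effects back and exponentiate to get fitted trade in levels
      gen double mu_hat = exp(xb_only + fe_sum) if e(sample)
      ```

      It is mu_hat, not xb_only, that should be compared with tradevalue when checking sums.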

      Best wishes,

      Joao

      • #4
        Thank you Joao Santos Silva , that actually makes a lot of sense. Sorry for confusing terminology about predictions. Adding the saved sum of fixed effects resolves the problem.

        I hope that I am right to assume that for the ppmlhdfe model above the correct interpretation of the predicted value + the sum of fixed effects variable would be the RHS of the gravity equation in logs. So taking the exponential of that should be the best model prediction for any bilateral trade flow (of course this is an expected value, as we obtain a fully connected network of trade where the real network is sparse).

        • #5
          Dear Daniel,

          I can clarify that ppmlhdfe is compatible with predict. If you use predict with the "mu" option you will get the expected trade flow value. Note you need to add a "d" in your options syntax when you estimate with ppmlhdfe to make this possible. ppmlhdfe will give you a reminder about this if you try to use predict without it.

          Another thing that concerns me though is that you are using factor variables to create the fixed effects. If you want exporter-time and importer-time fixed effects you need only put "abs(importercode#year exportercode#year)" not "abs(i.importercode#i.year i.exportercode#i.year)". The latter may be much slower.

          "xb" is what it sounds like: the b's are your estimated coefficients and the x's are your covariates. Hence, xb = x1 * b1 + x2 * b2 + (...) Note there is no such "adding up" property involving xb.
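
          To see why, here is a sketch of where the adding-up property comes from. The notation is assumed for illustration: y_{ijt} is observed trade from exporter i to importer j in year t, \hat{\mu}_{ijt} is the fitted mean, and \alpha_{it} and \gamma_{jt} are the exporter-year and importer-year effects.

          ```latex
          \hat{\mu}_{ijt} = \exp\!\left(x_{ijt}'\hat{\beta} + \hat{\alpha}_{it} + \hat{\gamma}_{jt}\right),
          \qquad
          \frac{\partial \ell}{\partial \alpha_{it}}
            = \sum_{j}\left(y_{ijt} - \hat{\mu}_{ijt}\right) = 0
          \;\Longrightarrow\;
          \sum_{j}\hat{\mu}_{ijt} = \sum_{j} y_{ijt}.
          ```

          The first-order condition for \gamma_{jt} gives the same equality summed over exporters, and summing either one over all groups gives the overall total. The equality holds for \hat{\mu}_{ijt} in levels; x_{ijt}'\hat{\beta} on its own is on the log scale and leaves out the fixed effects, so its sums have no reason to match.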

          Yes, you are correct that if you add the predicted xb and fixed effects values together and then take the exponential you should get the predicted trade value. But this is actually not necessary...
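
          As a sketch of the shorter route, assuming the variable names from the original post (mu_hat, sum_hat, sum_obs and ratio are hypothetical names):

          ```stata
          * Estimate with d so that predict can use the absorbed fixed effects
          ppmlhdfe tradevalue ln_dist_air $dist_inds, absorb(importercode#year exportercode#year) d vce(robust)

          * Fitted trade in levels, fixed effects included
          predict mu_hat, mu

          * Ratio of fitted to observed totals by importer-year, estimation sample only
          egen double sum_hat = total(mu_hat) if e(sample), by(importercode year)
          egen double sum_obs = total(tradevalue) if e(sample), by(importercode year)
          gen double ratio = sum_hat / sum_obs
          summarize ratio
          ```

          The ratio should be 1 up to numerical tolerance. The check is restricted to e(sample) because observations that ppmlhdfe drops during estimation are not covered by the first-order conditions.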

          Regards,
          Tom
          Last edited by Tom Zylkin; 09 Jul 2020, 17:43.

          • #6
            Thank you Tom Zylkin . Again, a very helpful remark.

            • #7
              Hi Tom Zylkin, thanks for your help here, much appreciated. Following your suggestion I was trying to edit my code for predicting expected trade value. Am I writing the command correctly?
              ppmlhdfe tradeflow_gdp lnpopr lnpopp border_lnpopr border_lnpopp , a(iso3r#year iso3p#year, save) standardize_data(0) d cluster(pair) nolog
              predict fittedxtreg4, xb
              predict stdpred_fitxtreg4, stdp
              gen ptrade4=exp(fittedxtreg4)

              • #8
                Tom Zylkin Please disregard the above post

                • #9
                  Hi Tom Zylkin, thanks for your help here, much appreciated. Following your suggestion, I was trying to edit my code for predicting expected trade value. Am I writing the command correctly?

                  ppmlhdfe tradeflow lnpopr lnpopp border_lnpopr border_lnpopp , a(iso3r#year iso3p#year, save) standardize_data(0) d cluster(pair) nolog

                  predict fitppmlhdfe, mu
                  predict stdpred_fitppmlhdfe, stdp
                  gen ptrade=exp(fitppmlhdfe)

                  • #10
                    Dear Farhad,
                    I think "predict fitppmlhdfe,mu" should give you predicted trade here. Is ptrade intended to give you the standard error of the prediction here? I don't think that part is right.
                    Regards,
                    Tom

                    • #11
                      Hi Tom, many thanks for your reply. In this regression I am trying to predict the trade share in GDP with "ptrade". What I understood is that I only need to use "predict fitppmlhdfe, mu" to find the predicted trade share, and can disregard the others. I hope the options I use after the regression command are right.
                      Thanks again and best regards,
                      Farhad.

                      • #12
                        Hi Farhad,
                        If ptrade is supposed to be the predicted trade share in GDP, then just take predicted trade ("fitppmlhdfe") and divide by GDP. Predict, stdp is usually for obtaining the standard error of the prediction. Though for ppmlhdfe I believe it instead gives you the standard error of the linear predictor (i.e., xb) rather than of the predicted mean (which would be e^(xb+fes)).
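
                        As a rough sketch, run after the ppmlhdfe call in #9 and assuming the dependent variable is trade in levels. Here gdp is a hypothetical variable name for the GDP figure that belongs to each observation:

                        ```stata
                        * Predicted trade in levels, fixed effects included
                        predict fitppmlhdfe, mu

                        * Predicted trade as a share of GDP (gdp is an assumed variable name)
                        gen double ptrade = fitppmlhdfe / gdp
                        ```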
                        Regards,
                        Tom

                        • #13
                          Good day all,
                          Thank you for this thread that helps me in predicting. However, I have some inquiries, please.

                          Following the thread and the advice that Joao Santos Silva and Tom Zylkin provided in #3 and #5, I included d in the options of the ppmlhdfe command so that the fixed effects are included in the predictions.
                          I run ppmlhdfe to estimate the attraction constrained gravity model as follows:
                          PHP Code:
                          ppmlhdfe Flow_ij lnTEU_i lnDistance_ij d_Rail_ij d_Redsea_i lnGasolinePrice_t lnBunkerRate_i lnFreightRate_t i.Province_j, vce(cluster ID) d(newvar1)
                          Where:
                          Flow_ij is freight flow between port i and province j.
                          lnDistance_ij is the log of distance between port i and province j.
                          d_Rail_ij d_Redsea_i are two dummies for Rail availability and port location.
                          I also included the fixed effect of i.Province_j to include the unobserved effect of province j.

                          These are the model estimates I got:
                          PHP Code:
                          ppmlhdfe Flow_ij lnTEU_i lnDistance_ij d_Rail_ij d_Redsea_i lnGasolinePrice_t lnBunkerRate_i lnFreightRate_t i.Province_j, vce(cluster ID) d(newvar1)
                          Iteration 1:   deviance = 1.8240e+07  eps = .         iters = 1    tol = 1.0e-04  min(eta) =  -3.62  P
                          Iteration 2:   deviance = 1.1142e+07  eps = 6.37e-01  iters = 1    tol = 1.0e-04  min(eta) =  -5.39
                          Iteration 3:   deviance = 1.0150e+07  eps = 9.78e-02  iters = 1    tol = 1.0e-04  min(eta) =  -6.67
                          Iteration 4:   deviance = 1.0099e+07  eps = 5.08e-03  iters = 1    tol = 1.0e-04  min(eta) =  -7.12
                          Iteration 5:   deviance = 1.0098e+07  eps = 3.11e-05  iters = 1    tol = 1.0e-04  min(eta) =  -7.16
                          Iteration 6:   deviance = 1.0098e+07  eps = 4.21e-09  iters = 1    tol = 1.0e-05  min(eta) =  -7.16   S O
                          ------------------------------------------------------------------------------------------------------------
                          (legend: p: exact partial-out   s: exact solver   h: step-halving   o: epsilon below tolerance)
                          Converged in 6 iterations and 6 HDFE sub-iterations (tol = 1.0e-08)

                          PPML regression                                   No. of obs      =        507
                                                                            Residual df     =         38
                          Statistics robust to heteroskedasticity           Wald chi2(19)   =  267609.61
                          Deviance             =  10098271.74               Prob > chi2     =     0.0000
                          Log pseudolikelihood = -5051290.332               Pseudo R2       =     0.9194

                          Number of clusters (ID)     =         39
                                                                   (Std. Err. adjusted for 39 clusters in ID)
                          -----------------------------------------------------------------------------------
                                            |               Robust
                                    Flow_ij |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
                          ------------------+----------------------------------------------------------------
                                    lnTEU_i |   .5091603   .2050472     2.48   0.013     .1072751    .9110455
                              lnDistance_ij |  -.5966364   .0645599    -9.24   0.000    -.7231714   -.4701014
                                  d_Rail_ij |   1.397236   .2703799     5.17   0.000     .8673009    1.927171
                                 d_Redsea_i |   .8889462   .4633548     1.92   0.055    -.0192124    1.797105
                          lnGasolinePrice_t |  -.2135134   .0688799    -3.10   0.002    -.3485155   -.0785112
                             lnBunkerRate_i |   .2990565   .1514294     1.97   0.048     .0022604    .5958527
                            lnFreightRate_t |  -.5772155   .1585844    -3.64   0.000    -.8880352   -.2663958
                                            |
                                 Province_j |
                                       bah  |  -2.607147   .2788131    -9.35   0.000    -3.153611   -2.060684
                                       epr  |    .187406   .3250745     0.58   0.564    -.4497284    .8245404
                                       jaz  |   -.711751   .3230449    -2.20   0.028    -1.344907   -.0785947
                                       jof  |  -1.590804   .3332172    -4.77   0.000    -2.243898   -.9377107
                                       mad  |   .0443517   .2926633     0.15   0.880    -.5292579    .6179613
                                       mkk  |    -.84163   .3586469    -2.35   0.019    -1.544565   -.1386951
                                       nai  |  -.9154942   .3373447    -2.71   0.007    -1.576678   -.2543107
                                       naj  |  -1.791735   .3549225    -5.05   0.000     -2.48737     -1.0961
                                       nbr  |  -1.853444    .797542    -2.32   0.020    -3.416597   -.2902903
                                       qas  |   .2175232   .5026922     0.43   0.665    -.7677355    1.202782
                                       riy  |   1.145969   .2279948     5.03   0.000     .6991075     1.59283
                                       tab  |   -.910621   .3442817    -2.64   0.008    -1.585401   -.2358412
                                            |
                                      _cons |   12.32115   1.553988     7.93   0.000      9.27539    15.36691
                          -----------------------------------------------------------------------------------

                          Thereafter, I predicted Flow_ij using predict as follows:
                          PHP Code:
                          predict pr_Flow_ij, mu
                          However,
                          1. The results I got in pr_Flow_ij are very different from the actual ones I have in Flow_ij.
                          2. The deviance in the estimated model is very high (as can be seen above in the output of ppmlhdfe). Is this the reason for the big difference?
                          3. What am I doing wrong that yields this large gap?
                          4. Please let me know if more info is needed to clarify the issue.

                          Thank you,
                          Hussain

                          • #14
                            Hi Hussain,
                            The syntax you are using for predict is correct. If you want to do a quick check, input

                            HTML Code:
                            sum Flow_ij pr_Flow_ij
                            Both variables should have the same mean value.
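
                            A slightly stronger check, sketched under the assumption that the model was estimated as in #13 (with the i.Province_j dummies) and that pr_Flow_ij was created by predict with the mu option: the fitted and observed totals should also agree within each province, because each province dummy has its own first-order condition.

                            ```stata
                            * Observed and fitted totals by province, estimation sample only
                            tabstat Flow_ij pr_Flow_ij if e(sample), by(Province_j) statistics(sum)
                            ```

                            If the two columns agree province by province, the predictions are being built correctly and the remaining gaps are ordinary observation-level errors.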

                            Regarding your other questions, it's not clear that you've done something "wrong" per se. In general, any model is estimated with some error, and the error is generally going to be large for at least a few observations. The only way you are going to reduce the amount of error is to improve the model fit. But you may not necessarily want to do that just for the sake of doing it. For example, you could include port (i) and time (t) fixed effects in addition to province fixed effects, which will absorb all i- and t-specific variation, but this would mean that you will not be able to identify the effects of gasoline price, freight rates, lnTEU, or bunker rates. If these estimates are important to your objective, this is not the direction you want to go in.

                            Regarding the deviance, there's not much to say here because typically we need a baseline for comparison. The deviance of a particular model in isolation is not that interesting to focus on. Another thing to be aware of is that the deviance is not invariant to the scale of the dependent variable. If you divide all your flow variables by 1000, you will get a different deviance.
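
                            To spell out the scale point, here is the standard Poisson deviance, written with assumed notation (y_n is the observed flow for observation n, \hat{\mu}_n its fitted mean, and the term y_n \ln(y_n/\hat{\mu}_n) is taken as 0 when y_n = 0):

                            ```latex
                            D(y,\hat{\mu}) = 2\sum_{n}\left[\,y_n \ln\frac{y_n}{\hat{\mu}_n} - \left(y_n - \hat{\mu}_n\right)\right],
                            \qquad
                            D(c\,y,\; c\,\hat{\mu}) = c\,D(y,\hat{\mu}) \quad \text{for } c > 0 .
                            ```

                            Rescaling the dependent variable by c only shifts the constant by \ln c, so the fitted means are rescaled by c as well and the deviance is multiplied by c. Dividing the flows by 1000 (c = 1/1000) would therefore shrink the deviance by a factor of 1000 without changing the slope estimates or the quality of the fit.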

                            Finally, there's another way to include the fixed effects in ppmlhdfe. This will be faster to estimate, especially when you have a lot of fixed effects:

                            HTML Code:
                            ppmlhdfe Flow_ij lnTEU_i lnDistance_ij d_Rail_ij d_Redsea_i lnGasolinePrice_t lnBunkerRate_i lnFreightRate_t, a(Province_j) vce(cluster ID) d(newvar1)
                            Hope this is helpful!

                            Regards,
                            Tom





                            • #15

                              Thank you Tom Zylkin for your reply. This is valuable advice. Since these variables are of importance, I will not use any additional FE. Thus, I will just stick to the FE of the province (i.Province_j).

                              By comparing the mean, I found that predicted and actual dependent variables have the same mean.

                              I will keep going on the analysis and let you know if I encounter any issues.

                              Best regards,
                              Hussain
