
  • #61
    Dear Joost,

    ppml will struggle with such a massive number of dummies; you will need a very fast processor and a lot of memory to be able to do it, assuming that you do not go beyond Stata's limits. For these cases I suggest you try ppml_panel_sg (available from SSC), which should be much faster and also checks for the existence of the estimates. I recommend that you start with a small data set to make sure you get the same results with both commands.

    About the problem with the OLS results, I prefer not to comment on that because the results are not reliable anyway.

    Best wishes,

    Joao



    • #62
      Dear Joost,

      I would just like to add one small thing to Joao's excellent advice.

      ppml_panel_sg does not allow a model with only importer and exporter fixed effects. The least demanding set of fixed effects it supports is importer-time and exporter-time. These will absorb all of your country-specific time-varying variables, including output.

      But you can use another command written for the same purpose, poi2hdfe, as mentioned in the Log of Gravity webpage. Type ssc install poi2hdfe.

      Best,
      Dias
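
      A minimal sketch of the comparison suggested in this exchange: fit the same model on a small subset with ppml plus explicit dummies and with poi2hdfe, then check that the slope coefficients agree. The variable names (trade, lndist, contig, exp_id, imp_id) are hypothetical, and the id1() and id2() options are an assumption about poi2hdfe's syntax, so check help poi2hdfe before relying on them.

      ```stata
      * Hypothetical variables: trade (flow in levels), lndist, contig,
      * exp_id and imp_id (numeric exporter and importer identifiers).
      ssc install poi2hdfe

      * ppml with explicit exporter and importer dummies (feasible on a small subset only).
      tabulate exp_id, generate(exp_d)
      tabulate imp_id, generate(imp_d)
      ppml trade lndist contig exp_d* imp_d*

      * Same model with the two sets of fixed effects absorbed (option names assumed).
      poi2hdfe trade lndist contig, id1(exp_id) id2(imp_id)
      ```

      ppml drops redundant dummies on its own, so only the coefficients on lndist and contig need to match across the two runs.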



      • #63
        Dear Joao and Dias,

        Thank you for your helpful and quick advice, I appreciate it.

        I estimated some small subsets of my dataset using both Dias's suggestion (poi2hdfe) and the ppml command. The ppml command was faster, so I am now running regressions with different subsets (the subsets start with only 1 industry, and the last subset contains all 14 industries for intermediate input trade). I did not run regressions using the ppml_panel_sg command because I believe that with cross-sectional data I only need to include exporter and importer fixed effects.

        Kind regards,


        Joost.



        • #64
          Hi Joao, I wonder if you could help with some doubts that I have about an intra-regional gravity model.

          I have a panel with 4 periods, and my dependent variable is total trade in kilograms.

          This is the Stata code and the results.

          ppml L_KL_TOTALES_deptos L_PIBtotal2016pr_origen L_PIBtotal2016pr_destino L_Distancia_geodésica L_remoteness_origen L_remoteness_destino frontera_pais_origen Zonas_francas_destino puert
          > o_marítimo_destino puerto_marítimo_origen d_frontera_depto Zonas_francas_origen

          note: checking the existence of the estimates
          WARNING: Zonas_francas_destino has very large values, consider rescaling or recentering
          WARNING: Zonas_francas_origen has very large values, consider rescaling or recentering

          Number of regressors excluded to ensure that the estimates exist: 0
          Number of observations excluded: 0

          note: starting ppml estimation
          note: L_KL_TOTALES_deptos has noninteger values

          Iteration 1: deviance = 400.8257
          Iteration 2: deviance = 400.2885
          Iteration 3: deviance = 400.2885
          Iteration 4: deviance = 400.2885

          Number of parameters: 12
          Number of observations: 2648
          Pseudo log-likelihood: -6217.08
          R-squared: .66772627
          Option strict is: off
          ------------------------------------------------------------------------------------------
                                   |               Robust
               L_KL_TOTALES_deptos |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
          -------------------------+----------------------------------------------------------------
           L_PIBtotal2016pr_origen |   .0666277   .0027448    24.27   0.000     .0612479    .0720074
          L_PIBtotal2016pr_destino |   .0717089   .0025906    27.68   0.000     .0666313    .0767865
             L_Distancia_geodésica |  -.0539089   .0044969   -11.99   0.000    -.0627226   -.0450952
               L_remoteness_origen |  -.0295255   .0104363    -2.83   0.005    -.0499803   -.0090707
              L_remoteness_destino |   .0401392   .0090682     4.43   0.000     .0223658    .0579126
              frontera_pais_origen |   .0257149   .0067418     3.81   0.000     .0125013    .0389285
             Zonas_francas_destino |   .0025479   .0004794     5.31   0.000     .0016083    .0034875
           puerto_marítimo_destino |   .0307702   .0074366     4.14   0.000     .0161947    .0453457
            puerto_marítimo_origen |   .0699515   .0074102     9.44   0.000     .0554277    .0844752
                  d_frontera_depto |   .0413737   .0064782     6.39   0.000     .0286767    .0540707
              Zonas_francas_origen |   .0055721   .0004222    13.20   0.000     .0047445    .0063997
                             _cons |   1.539765   .1010227    15.24   0.000     1.341764    1.737766
          ------------------------------------------------------------------------------------------


          RESET TEST



          . predict u, xb

          . gen u2 = u^2

          . ppml L_KL_TOTALES_deptos L_PIBtotal2016pr_origen L_PIBtotal2016pr_destino L_Distancia_geodésica L_remoteness_origen L_remoteness_destino frontera_pais_origen Zonas_francas_destino puert
          > o_marítimo_destino puerto_marítimo_origen d_frontera_depto Zonas_francas_origen u2

          note: checking the existence of the estimates
          WARNING: Zonas_francas_destino has very large values, consider rescaling or recentering
          WARNING: Zonas_francas_origen has very large values, consider rescaling or recentering

          Number of regressors excluded to ensure that the estimates exist: 0
          Number of observations excluded: 0

          note: starting ppml estimation
          note: L_KL_TOTALES_deptos has noninteger values

          Iteration 1: deviance = 392.748
          Iteration 2: deviance = 391.5719
          Iteration 3: deviance = 391.5718
          Iteration 4: deviance = 391.5718

          Number of parameters: 13
          Number of observations: 2648
          Pseudo log-likelihood: -6212.7217
          R-squared: .67972867
          Option strict is: off
          ------------------------------------------------------------------------------------------
                                   |               Robust
               L_KL_TOTALES_deptos |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
          -------------------------+----------------------------------------------------------------
           L_PIBtotal2016pr_origen |   .2947494   .0355748     8.29   0.000      .225024    .3644748
          L_PIBtotal2016pr_destino |   .3167968   .0382707     8.28   0.000     .2417876    .3918059
             L_Distancia_geodésica |  -.2412124   .0296801    -8.13   0.000    -.2993844   -.1830404
               L_remoteness_origen |  -.1296204   .0187523    -6.91   0.000    -.1663742   -.0928666
              L_remoteness_destino |   .1754634   .0233051     7.53   0.000     .1297862    .2211406
              frontera_pais_origen |   .1134619   .0151337     7.50   0.000     .0838005    .1431233
             Zonas_francas_destino |   .0113557   .0013858     8.19   0.000     .0086395    .0140718
           puerto_marítimo_destino |   .1392252   .0177827     7.83   0.000     .1043718    .1740786
            puerto_marítimo_origen |   .3095705   .0379202     8.16   0.000     .2352483    .3838927
                  d_frontera_depto |   .1859924   .0219579     8.47   0.000     .1429558     .229029
              Zonas_francas_origen |    .025091   .0029905     8.39   0.000     .0192297    .0309524
                                u2 |  -.6285101   .0954756    -6.58   0.000    -.8156388   -.4413813
                             _cons |   2.184017    .128346    17.02   0.000     1.932464    2.435571
          ------------------------------------------------------------------------------------------
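
          The RESET test above comes down to whether u2 is significant in the augmented regression. In the output its z statistic is -6.58 (p = 0.000), so the test rejects. A minimal sketch of the equivalent Wald test, assuming the augmented ppml run is still the active estimation:

          ```stata
          * u2 is the squared linear prediction created above with -predict u, xb- and -gen u2 = u^2-.
          * A small p-value rejects the null that the squared linear index is irrelevant.
          test u2 = 0
          ```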


          -----------------------------------------------------------------
          RESULTS OF FGLS ESTIMATOR


          Cross-sectional time-series FGLS regression

          Coefficients: generalized least squares
          Panels: heteroskedastic
          Correlation: no autocorrelation

          Estimated covariances      =       662          Number of obs      =     2,648
          Estimated autocorrelations =         0          Number of groups   =       662
          Estimated coefficients     =        12          Time periods       =         4
                                                          Wald chi2(11)      = 158041.37
          Log likelihood             =  -3046.99          Prob > chi2        =    0.0000

          ------------------------------------------------------------------------------------------
               L_KL_TOTALES_deptos |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
          -------------------------+----------------------------------------------------------------
           L_PIBtotal2016pr_origen |   .9365632   .0061036   153.45   0.000     .9246004    .9485259
          L_PIBtotal2016pr_destino |   1.144922    .006901   165.91   0.000     1.131396    1.158448
             L_Distancia_geodésica |  -.9195397   .0123681   -74.35   0.000    -.9437806   -.8952987
               L_remoteness_origen |  -.5537694   .0335578   -16.50   0.000    -.6195415   -.4879974
              L_remoteness_destino |   1.235415   .0236238    52.30   0.000     1.189113    1.281716
              frontera_pais_origen |   .4298494   .0238932    17.99   0.000     .3830196    .4766793
             Zonas_francas_destino |   .0321848   .0012136    26.52   0.000     .0298061    .0345635
           puerto_marítimo_destino |   .2281944   .0180926    12.61   0.000     .1927335    .2636553
            puerto_marítimo_origen |   1.168645   .0224582    52.04   0.000     1.124627    1.212662
                  d_frontera_depto |   .2722579   .0166337    16.37   0.000     .2396565    .3048593
              Zonas_francas_origen |   .0863585   .0006899   125.18   0.000     .0850063    .0877106
                             _cons |  -4.642567   .3106725   -14.94   0.000    -5.251474   -4.033661
          ------------------------------------------------------------------------------------------



          ---------------------------------------------------------------------------------

          I'm worried about the fact that the RESET test is being rejected. Should I use the FGLS estimator instead? What do you think about the performance of that estimator?

          Thank you very much.

          Felipe




          • #65
            Dear Felipe,

            First of all, forget the FGLS estimation because that is simply inadequate.

            About your model, I think you should use clustered standard errors. Also, your sample is rather small, but maybe you could try to include the usual "fixed effects".

            Best wishes,

            Joao
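
            A minimal sketch of the two suggestions above, written for a do-file. The identifiers origen and destino (origin and destination department codes) are hypothetical, clustering by pair is one common choice rather than the only one, and reading the "usual" fixed effects as origin and destination dummies is an assumption.

            ```stata
            * Hypothetical identifiers: origen and destino hold the department codes.
            egen pair_id = group(origen destino)

            * Regressors from the question (abbreviated), with standard errors clustered by pair.
            ppml L_KL_TOTALES_deptos L_PIBtotal2016pr_origen L_PIBtotal2016pr_destino ///
                L_Distancia_geodésica d_frontera_depto, cluster(pair_id)

            * One way to add origin and destination fixed effects: generate the dummies explicitly.
            * Regressors that do not vary within an origin or destination become collinear with them.
            tabulate origen, generate(fe_o_)
            tabulate destino, generate(fe_d_)
            ppml L_KL_TOTALES_deptos L_Distancia_geodésica d_frontera_depto fe_o_* fe_d_*, cluster(pair_id)
            ```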



            • #66
              Dear Joao,

              I am running a pooled OLS regression, and to check robustness I am also using the PPML method. Total import and export flows do not contain any zeros, but trade flows at the sector level do. When I use the total import flow, I get a WARNING to rescale lngdp. There are two such independent variables: lngdp_exporter (6 countries) and lngdp_importer (1 country). How can I rescale when the coefficients are already so small?
              I rescaled and ran PPML again. The results show that distance, one exporter dummy, one importer dummy, importer GDP, and two year dummies are dropped. Although it is a control variable, importer GDP is part of the research question, as is distance. Why are they dropped?
              Here, I copy a sample of the data,

              Code:
              * Example generated by -dataex-. To install: ssc install dataex
              clear
              input double lnimpo float(lnintus lngdp1 lngdp2 lndist) byte(exporter_1 exporter_2 exporter_3 exporter_4 exporter_5 exporter_6 importer_1)
               20.65069580078125  .5743148  26.37296 27.817717 9.857967 1 0 0 0 0 0 1
              20.970928192138672  .9706464  26.31685 27.917883 9.857967 1 0 0 0 0 0 1
              20.937944412231445  1.525122  25.34863 28.010767 9.857967 1 0 0 0 0 0 1
               21.72722816467285 1.8245493 25.587696  28.13175 9.857967 1 0 0 0 0 0 1
              21.903419494628906 1.9878744 25.934366  28.29461 9.857967 1 0 0 0 0 0 1
              end
              Any suggestions are very welcome. Thanks in advance. Kind regards



              • #67
                Dear Isabel,

                I am not sure to have understood all your questions, but here is my attempt to help:

                1 - The fact that you do not have zeros does not make it OK to use OLS in logs; indeed, the zeros are just a very minor problem. Therefore, I expect the OLS and PPML results to be very different and, of course, the PPML results are much more reliable.

                2 - You do not have to rescale the variables, but you can do it. For example, instead of using log of GDP in thousands of dollars, you can use log of GDP in millions of dollars. If the estimator converges, there is no need to worry about this.

                3 - If you only have one importer, distance and exporter GDP will be collinear with the exporter fixed effects and will need to be dropped; the same happens if you use OLS. There may be other variables being dropped for the same reason, again just like in OLS. You need to think carefully about what you are doing because you risk interpreting coefficients that are meaningless.

                Best wishes,

                Joao
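
                Points 2 and 3 can be illustrated with a short sketch. The variable gdp_thousands is hypothetical; lndist and exporter_1 to exporter_6 are taken from the data extract posted in the question and are assumed to be stored in that order.

                ```stata
                * Point 2: changing units only shifts a logged regressor by a constant,
                * so the slope is unchanged and only the intercept moves.
                generate lngdp_thousands = ln(gdp_thousands)
                generate lngdp_millions  = ln(gdp_thousands / 1000)   // = lngdp_thousands - ln(1000)

                * Point 3: with a single importer, distance is constant within each exporter,
                * so the exporter dummies explain it completely.
                regress lndist exporter_1-exporter_6
                display "R-squared: " e(r2)
                ```

                An R-squared of 1 in the second step would confirm that distance cannot be separated from the exporter fixed effects in this sample.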



                • #68
                  Dear Joao,

                  First of all, thank you for your help. Second, I apologize if I was not clear enough in my message.
                  Regarding your second comment, exporter and importer GDP were already rescaled; I did not understand why the WARNING message appears when the coefficients are already small. I was using the log of GDP in millions of dollars.
                  I understand why these variables are dropped. Perhaps in this case it happens because I am using a small database with few countries, but the model specification is correct for this research question. You made a good point, and thank you very much again for the reminder. I will try to add other control variables.



                  • #69
                    Hi Joao,

                    Just to confirm this specification is correct through ppml:

                    Code:
                    ppml Mig LOGMig(-1) DUM_COUNTRY*
                    Here Mig is the stock of migrants from different origins in a given country over different periods of time. It depends on its own lag, and fixed effects are included (DUM_COUNTRY*).
                    My only question is whether the lag of the dependent variable (LOGMig(-1)) should indeed be in LOGs or not.

                    Many thanks,

                    Ainhoa



                    • #70
                      Dear Ainhoa,

                      If you want to include the lag, it makes sense to log it. My question is whether it makes sense to include the lag; I guess the answer depends on the purpose of the model.

                      Best wishes,

                      Joao
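
                      A minimal sketch of how the logged lag could be built in Stata, since the LOGMig(-1) notation in the question is not Stata syntax. The identifiers origin_id and year are hypothetical, and the sketch assumes consecutive yearly periods.

                      ```stata
                      * Hypothetical identifiers: origin_id (numeric origin code) and year.
                      xtset origin_id year

                      * Log of the previous period's stock; missing wherever the lagged stock is zero.
                      generate lnMig_lag = ln(L.Mig)

                      * Dependent variable stays in levels; only the lag enters in logs.
                      ppml Mig lnMig_lag DUM_COUNTRY*
                      ```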



                      • #71
                        Hi Joao,

                        Many thanks for all your help. Using the model I stated above (stocks of migrants as a function of their lag, plus other demographic/economic variables that I also included, and fixed effects), the predictions generally become explosive. A small addition or removal of variables results in nonsensical (i.e., unrealistically high) predictions. Would you have any rationale for this? I noticed that both the constant and the country dummies get very large coefficients compared to those on the time-varying variables. I know the question is rather general, but there might be something obvious that I'm getting wrong.

                        Thanks again,

                        Ainhoa



                        • #72
                          Dear Ainhoa,

                          I am afraid I have no suggestions, but I still think that your model is very strange and so I am not surprised by the strange results.

                          Best wishes,

                          Joao



                          • #73
                            Dear Joao,

                            Could you give me some more insight on why you think the model is strange? If you could give me some advice on a specification that would make more sense, I would be really grateful.

                            Cheers,

                            Ainhoa



                            • #74
                              Dear Ainhoa,

                              If I understand it correctly, you are explaining the stock of migrants by the stock in the previous period. Because the stock of migrants is likely to vary slowly, you are essentially using something to explain itself. Also, I do not know what kind of fixed effects you are using, but these are likely to be very collinear with the lagged stock, and this may make the model very unstable.

                              Best wishes,

                              Joao
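
                              One rough way to gauge the collinearity described above is to regress the lagged stock on the fixed effects and look at the fit. This is only a sketch: lnMig_lag is a hypothetical variable holding the logged lagged stock, and DUM_COUNTRY* follows the naming in the question.

                              ```stata
                              * An R-squared close to 1 means the dummies explain almost all of the
                              * variation in the lagged stock, leaving little to identify its coefficient.
                              regress lnMig_lag DUM_COUNTRY*
                              display "R-squared of lagged stock on fixed effects: " e(r2)
                              ```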



                              • #75
                                Thanks a million for that, Joao. I've dropped the lag. Just an additional point related to the model becoming explosive. My X variables include country FE, GDP per capita of origin and destination, and population structures by age group at origin and destination. When I exclude the population structure variables, the model looks very reasonable, including the predictions. However, when I include them, the model becomes really unstable. Do you think dropping these population structures could be justified? In a way, GDP per capita already depends partly on population assumptions.

                                Best wishes,

                                Ainhoa

