

  • Deciding equation when analyzing by ppml

    My data are a strongly balanced panel. Because I want to know the effect of LPI on exports and imports, I have chosen export and import as the dependent variables and lpi, gdp, distance, and a dummy (landlocked) as the independent variables. A summary of my data follows:
        Variable |        Obs        Mean    Std. Dev.       Min        Max
          export |      1,328     833.446    3024.465          0   41549.71
          import |      1,328     837.313    3961.817          0   58532.57
        distance |      1,328    8860.618    4304.137    478.553   19228.99
             gdp |      1,328     4523.65    16524.65    1.97454   196236.7
      landlocked |      1,328    .2108434    .4080611          0          1
            lpi0 |      1,262    2.879012    .5787635   1.598322   4.225967
            lpi1 |      1,262    2.691696    .5987479   1.111111    4.20779
            lpi2 |      1,262    2.754088    .6829058   1.237654   4.439356
            lpi3 |      1,262    2.846396    .5248384   1.362654      4.235
            lpi4 |      1,262    2.828908     .608916   1.394253    4.31065
            lpi5 |      1,262    2.886015    .6297591   1.513605   4.377678
            lpi6 |      1,262    3.253649    .5854234   1.665079   4.795714
    Because observations in which the export or import value equals 0 account for about 18% of the total, I decided to use ppml for the analysis. My equation becomes:
    Ex = a ln(gdp) + b ln(dis) + c ln(lpi) + e landlocked
    However, some LPI data are missing, because LPI was not collected for some countries in some years:
    gen ll1=ln(lpi1)
    (66 missing values generated)
    So I wonder whether my equation is suitable. If not, which equation should I use?
    Please give me advice.
    Thanks so much
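
    For reference, a sketch of how the equation above reads when it is estimated by PPML, assuming the usual exponential conditional-mean form. The symbols a, b, c, and e are those of the post; the constant \alpha_0 is an addition here (the ppml output reports it as _cons), and E[Ex | x] denotes the conditional mean of exports given the regressors.

    ```latex
    \[
      E[\mathrm{Ex} \mid x]
        = \exp\bigl(\alpha_0 + a\,\ln(\mathrm{gdp}) + b\,\ln(\mathrm{dis})
                    + c\,\ln(\mathrm{lpi}) + e\,\mathrm{landlocked}\bigr)
    \]
    ```

    The dependent variable enters in levels rather than in logs, so the observations with zero exports or imports stay in the sample.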

  • #2
    Dear Nguyen Linh,

    This is just like any other missing data case: if it is reasonable to assume that the data is missing at random, then it is fine to drop those observations (I guess that is what most people do); otherwise you would have to model the sample selection but that will probably require some very strong assumptions.

    Best wishes,



    • #3
      Dear Mr Joao Santos Silva,

      Actually, I also want to run heckman to compare the results. As far as I know, heckman requires adding at least one variable that affects the probability that two countries engage in trade, but I don't know which variable to choose. Can you give me some ideas?
      Thanks so much


      • #4
        Dear Mr Joao Santos Silva,

        Sorry, but I have one more concern relating to ppml.
        When I added FTA (RTA) as an independent variable in my equation, the results changed (for example, the coefficient on lpi0 changed and became statistically significant at the 1% level). Actually, this is what I want, but I wonder about the accuracy of the result.
        Is there any method to help me know whether adding more independent variables is better or not?
        Thanks so much

        ppml import lgdp dis ll0 RTA landlocked, cluster (dis)
        note: checking the existence of the estimates
        note: starting ppml estimation
        note: import has noninteger values
        Iteration 1:   deviance =   1173658
        Iteration 2:   deviance =    676958
        Iteration 3:   deviance =  576213.6
        Iteration 4:   deviance =  568285.6
        Iteration 5:   deviance =  568188.3
        Iteration 6:   deviance =  568188.3
        Iteration 7:   deviance =  568188.3
        Number of parameters: 6
        Number of observations: 1262
        Number of observations dropped: 0
        Pseudo log-likelihood: -287123.12
        R-squared: .79774203
                                          (Std. Err. adjusted for 165 clusters in dis)
                     |               Robust
              import |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
                lgdp |   .7101879   .0628089    11.31   0.000     .5870848    .8332911
                 dis |   -1.10785   .2529942    -4.38   0.000    -1.603709   -.6119903
                 ll0 |   2.713882   .9082589     2.99   0.003     .9337276    4.494037
                 RTA |     .87008   .5022701     1.73   0.083    -.1143513    1.854511
          landlocked |  -.5435923   .3011126    -1.81   0.071    -1.133762    .0465775
               _cons |   6.657156   2.182711     3.05   0.002     2.379121    10.93519
        Number of regressors dropped to ensure that the estimates exist: 0
        Option strict is off

        ppml import lgdp dis ll0 landlocked, cluster (dis)
        note: checking the existence of the estimates
        note: starting ppml estimation
        note: import has noninteger values
        Iteration 1:   deviance =   1313738
        Iteration 2:   deviance =  778580.5
        Iteration 3:   deviance =  673140.7
        Iteration 4:   deviance =  664755.4
        Iteration 5:   deviance =  664644.3
        Iteration 6:   deviance =  664644.3
        Iteration 7:   deviance =  664644.3
        Number of parameters: 5
        Number of observations: 1262
        Number of observations dropped: 0
        Pseudo log-likelihood: -335351.13
        R-squared: .78083848
                                          (Std. Err. adjusted for 165 clusters in dis)
                     |               Robust
              import |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
                lgdp |   .8505762    .072593    11.72   0.000     .7082965    .9928559
                 dis |  -1.400866   .1378808   -10.16   0.000    -1.671108   -1.130625
                 ll0 |    1.83978   1.098713     1.67   0.094    -.3136585    3.993218
          landlocked |  -.5713438   .3106276    -1.84   0.066    -1.180163    .0374751
               _cons |   9.335693   .9011997    10.36   0.000     7.569374    11.10201
        Number of regressors dropped to ensure that the estimates exist: 0
        Option strict is off


        • #5
          It is natural that the estimates change when you add variables to the model. You can use the t-tests and economic theory to decide what variables to include.

          Best wishes,
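
          As a worked illustration of the t-test part of this advice, using only the numbers in the first output in #4: the z statistic reported for RTA is its coefficient divided by its robust standard error.

          ```latex
          \[
            z_{\mathrm{RTA}} = \frac{0.87008}{0.5022701} \approx 1.73,
            \qquad p = 0.083
          \]
          ```

          On this test RTA is significant at the 10% level but not at the 5% level, so the decision to keep it would rest mainly on the economic-theory side of the advice.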



          • #6
            Thanks so much for your reply

