Deciding on an equation when estimating with PPML

Nguyen Linh

Join Date: Aug 2020
Posts: 25

29 Sep 2020, 18:30

Hello,
My data is a strongly balanced panel. Because I want to know the effect of lpi on exports and imports, I have chosen export and import as dependent variables and lpi, gdp, distance, and a dummy (landlocked) as independent variables. A summary of my data follows.

Code:

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
      export |      1,328     833.446    3024.465          0   41549.71
      import |      1,328     837.313    3961.817          0   58532.57
    distance |      1,328    8860.618    4304.137    478.553   19228.99
         gdp |      1,328     4523.65    16524.65    1.97454   196236.7
  landlocked |      1,328    .2108434    .4080611          0          1
-------------+---------------------------------------------------------
        lpi0 |      1,262    2.879012    .5787635   1.598322   4.225967
        lpi1 |      1,262    2.691696    .5987479   1.111111    4.20779
        lpi2 |      1,262    2.754088    .6829058   1.237654   4.439356
        lpi3 |      1,262    2.846396    .5248384   1.362654      4.235
        lpi4 |      1,262    2.828908     .608916   1.394253    4.31065
-------------+---------------------------------------------------------
        lpi5 |      1,262    2.886015    .6297591   1.513605   4.377678
        lpi6 |      1,262    3.253649    .5854234   1.665079   4.795714

Because observations where the export or import value equals 0 account for about 18% of the total, I decided to use PPML for the analysis. My equation looks like this:
Ex = a*ln(gdp) + b*ln(dis) + c*ln(lpi) + e*landlocked (dummy)
However, some lpi data are missing because LPI was not collected for certain countries in certain years.

Code:

 
gen ll1=ln(lpi1)
(66 missing values generated)

So I wonder whether my equation is suitable. If not, which equation should I use?
Please give me some advice.
Thanks so much

Joao Santos Silva

Join Date: Apr 2014

Posts: 3015
#2

30 Sep 2020, 09:24

Dear Nguyen Linh,

This is just like any other missing data case: if it is reasonable to assume that the data is missing at random, then it is fine to drop those observations (I guess that is what most people do); otherwise you would have to model the sample selection but that will probably require some very strong assumptions.
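
A minimal Stata sketch of how one might look at the missingness before dropping those observations. It assumes the variable names used elsewhere in this thread (lgdp, dis, landlocked, lpi0 to lpi6); miss_lpi is a hypothetical helper variable. Relating missingness to observed covariates is only a rough diagnostic and cannot by itself establish that the data are missing at random.

```stata
* Sketch only: lgdp, dis, landlocked and lpi0-lpi6 follow the thread;
* miss_lpi is a hypothetical helper created for this check.

* Count missing values in each LPI component
misstable summarize lpi0 lpi1 lpi2 lpi3 lpi4 lpi5 lpi6

* Flag observations where the LPI measure is missing
generate byte miss_lpi = missing(lpi1)

* Compare missingness across landlocked and non-landlocked countries
tabulate landlocked miss_lpi, row

* Rough check: is missingness related to the observed covariates?
logit miss_lpi lgdp dis landlocked
```

If missingness looks unrelated to the covariates, dropping the incomplete observations is easier to defend. Stata estimation commands generally exclude observations with a missing value in any model variable, so no manual drop should be needed before estimation.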

Best wishes,

Joao
Nguyen Linh

Join Date: Aug 2020

Posts: 25
#3

30 Sep 2020, 16:18

Dear Mr Joao Santos Silva,

Actually, I also want to run heckman to compare the results, but as far as I know, in Heckman we have to add at least one variable that affects the probability that two countries engage in trade. I don't know which variable I could add. Can you give me some ideas?
Thanks so much

Nguyen Linh

Join Date: Aug 2020
Posts: 25

30 Sep 2020, 20:42

Dear Mr Joao Santos Silva,

Sorry, but I have one more concern relating to PPML.
When I added an independent variable for FTA (RTA) to my equation, the results changed (for example, the coefficient of lpi0 changed and became statistically significant at 1%). Actually, this is what I wanted, but I wonder about the accuracy of the result.
Is there any method to help me know whether adding more independent variables is better or not?
Thanks so much

Code:

  ppml import lgdp dis ll0 RTA landlocked, cluster (dis)
note: checking the existence of the estimates
note: starting ppml estimation
note: import has noninteger values

Iteration 1:   deviance =   1173658
Iteration 2:   deviance =    676958
Iteration 3:   deviance =  576213.6
Iteration 4:   deviance =  568285.6
Iteration 5:   deviance =  568188.3
Iteration 6:   deviance =  568188.3
Iteration 7:   deviance =  568188.3

Number of parameters: 6
Number of observations: 1262
Number of observations dropped: 0
Pseudo log-likelihood: -287123.12
R-squared: .79774203
                                  (Std. Err. adjusted for 165 clusters in dis)
------------------------------------------------------------------------------
             |               Robust
      import |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        lgdp |   .7101879   .0628089    11.31   0.000     .5870848    .8332911
         dis |   -1.10785   .2529942    -4.38   0.000    -1.603709   -.6119903
         ll0 |   2.713882   .9082589     2.99   0.003     .9337276    4.494037
         RTA |     .87008   .5022701     1.73   0.083    -.1143513    1.854511
  landlocked |  -.5435923   .3011126    -1.81   0.071    -1.133762    .0465775
       _cons |   6.657156   2.182711     3.05   0.002     2.379121    10.93519
------------------------------------------------------------------------------
Number of regressors dropped to ensure that the estimates exist: 0
Option strict is off

Code:

 ppml import lgdp dis ll0 landlocked, cluster (dis)
note: checking the existence of the estimates
note: starting ppml estimation
note: import has noninteger values

Iteration 1:   deviance =   1313738
Iteration 2:   deviance =  778580.5
Iteration 3:   deviance =  673140.7
Iteration 4:   deviance =  664755.4
Iteration 5:   deviance =  664644.3
Iteration 6:   deviance =  664644.3
Iteration 7:   deviance =  664644.3

Number of parameters: 5
Number of observations: 1262
Number of observations dropped: 0
Pseudo log-likelihood: -335351.13
R-squared: .78083848
                                  (Std. Err. adjusted for 165 clusters in dis)
------------------------------------------------------------------------------
             |               Robust
      import |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        lgdp |   .8505762    .072593    11.72   0.000     .7082965    .9928559
         dis |  -1.400866   .1378808   -10.16   0.000    -1.671108   -1.130625
         ll0 |    1.83978   1.098713     1.67   0.094    -.3136585    3.993218
  landlocked |  -.5713438   .3106276    -1.84   0.066    -1.180163    .0374751
       _cons |   9.335693   .9011997    10.36   0.000     7.569374    11.10201
------------------------------------------------------------------------------
Number of regressors dropped to ensure that the estimates exist: 0
Option strict is off

Joao Santos Silva

Join Date: Apr 2014

Posts: 3015
#5

02 Oct 2020, 03:30

It is natural that the estimates change when you add variables to the model. You can use the t-tests and economic theory to decide what variables to include.
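
As a rough illustration of the t-test part of this advice, the sketch below re-runs the larger model from post #4 and then applies Stata's test command to the added variable. It assumes the same variable names as in that post and that ppml is installed (for example via ssc install ppml). The second test is only an example of a joint hypothesis and is not something proposed in the thread.

```stata
* Sketch only: same variables and clustering as in the thread's output.
ppml import lgdp dis ll0 RTA landlocked, cluster(dis)

* Wald test that the RTA coefficient is zero; with a single
* restriction this mirrors the z-test shown in the output table
test RTA

* Illustrative joint test of two coefficients at once
test RTA landlocked
```

In the output posted above, RTA has z = 1.73 (p = 0.083), so it is significant at the 10% level but not at 5%. Whether to keep it is then a judgment that also rests on economic theory.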

Best wishes,

Joao
Nguyen Linh

Join Date: Aug 2020

Posts: 25
#6

03 Oct 2020, 01:01

Thanks so much for your reply