Estimating NLS

Zuhumnan Dapel

Join Date: Sep 2014

Posts: 392
#1

15 Nov 2014, 14:33

Dear All,
What could be the reason(s) for this undesired result? (Please see the code and results below.) In the model, y is the dependent variable, and the only regressor is its lag, L.y. The parameter g is the coefficient of this endogenous regressor, and I raise it to the power of a variable called gap. This variable is the number of years separating observed years in an irregularly spaced panel: e.g., between 1992 and 1996, gap=4; between 1996 and 2004, gap=8; and so on. m indexes the observed periods. For instance, if between 1970 and 2010 data were collected in only 19 years, then m=1, ..., 19 while the actual T=40. In this case, data are missing for all observations for 40-19 = 21 years. If the gaps are ignored, the panel will have T=19. Ignoring the gaps would bias the results, which is why I am trying to account for them with an NLS fixed-effects estimator.

Help will be much appreciated.

Thanks,
Dapel

Code:

. nl (y=l.y*{g}^gap) if m>1
(obs = 432)

Iteration 0:  residual SS =  1.64e+08

      Source |       SS       df       MS
-------------+------------------------------         Number of obs =       432
       Model | -141461844    -1   141461844          R-squared     =   -6.2283
    Residual |  164174641   432   380033.89          Adj R-squared =   -6.2116
-------------+------------------------------         Root MSE      =  616.4689
       Total | 22712796.1   431  52697.9028          Res. dev.     =  6776.306

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          /g |          0  (constrained)
------------------------------------------------------------------------------
  Parameter g taken as constant term in model & ANOVA table

Last edited by Zuhumnan Dapel; 15 Nov 2014, 14:40.
Clyde Schechter

Join Date: Apr 2014

Posts: 30118
#2

16 Nov 2014, 15:02

The initial value for any parameter in -nl- that is not user-specified is zero. I think that g=0 is a really bad starting point for this model. You'd probably do much better initializing g to 1. Just a guess.
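
A minimal sketch of that suggestion, assuming the variables y, gap, and m from #1 and data already declared with -xtset- (or -tsset-) so that L.y is defined. In -nl-, writing {g=1} inside the substitutable expression sets the starting value of g to 1. A plausible reason the default is a poor start here: at g=0 the fitted values g^gap*L.y are all zero, and the derivative gap*g^(gap-1) is also zero whenever gap>1, so the optimizer has no direction to move in.

```stata
* Same model as in #1, but starting the search at g = 1 instead of the default 0
nl (y = L.y*{g=1}^gap) if m > 1
```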

That said, what about a different approach? It looks like you are trying to fit what is, in effect, a compound interest model to the data. So why not log-transform everything? Then it becomes a linear model: log(y) = gap*log(g) + log(L.y)

Code:

gen log_y = log(y)
gen log_lag_y = log(L.y)
regress log_y gap log_lag_y
test log_lag_y = 1, coef

The -test- command will report the regression coefficients with log_lag_y's constrained to 1. Then you just have to exponentiate the coefficient of gap to get your estimate of g.
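
The transformation can be written out step by step, ignoring the error term for the moment and assuming y, L.y, and g are all positive. The symbols β and γ are labels introduced here for the two regression coefficients; they are not notation from the thread.

```latex
\begin{align*}
y_t &= g^{\,\mathrm{gap}_t}\, y_{t-1} \\
\log y_t &= \mathrm{gap}_t \cdot \log g \;+\; 1 \cdot \log y_{t-1} \\
\log y_t &= \beta\,\mathrm{gap}_t + \gamma \log y_{t-1},
  \qquad \beta = \log g, \quad \gamma = 1 \\
g &= \exp(\beta)
\end{align*}
```

Read this way, the constraint log_lag_y = 1 imposes γ = 1, and exponentiating the coefficient of gap recovers g.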
Zuhumnan Dapel

Join Date: Sep 2014

Posts: 392
#3

16 Nov 2014, 17:32

Dear Clyde, What a great relief this is to me, thanks a million! However, this is the equation I have been wanting to estimate: y_{i,m} = g^{gap} * y_{i,m-1} + e_{i,m}, i = 1, ..., N, m = 1, ..., M, where M = 19 is the total number of observed years between 1970 and 2010. The parameter of interest is the coefficient of the lagged variable, y_{i,m-1}, which is subject to a nonlinear restriction through the gap size. The reason is to account for the intervals between the surveys. In the log-transformed version you suggested, the lagged variable seems to be unassociated with a parameter.

Last edited by Zuhumnan Dapel; 16 Nov 2014, 17:40.
Clyde Schechter

Join Date: Apr 2014

Posts: 30118
#4

16 Nov 2014, 18:16

Yes, the lagged variable is unassociated with a parameter in the log-transformed version. Ignore the error terms and take logarithms of both sides. Then log(g) becomes the coefficient of gap, and L.y becomes an added term with (implicit) coefficient 1. That is precisely the model that will be estimated by the code I gave you. The difference from your original model is that the error terms are additive on the log scale, and multiplicative in the original metric. But given the nature of your model, I would think that would be a feature, not a problem.
Zuhumnan Dapel

Join Date: Sep 2014

Posts: 392
#5

17 Nov 2014, 03:32

Thanks! I'm trying to grasp what this means: "the error terms are additive on the log scale, and multiplicative in the original metric". Also, can we use the estimated log(g) as a speed-of-convergence parameter (or autoregressive parameter) in measuring the impact of previous log(y) on current log(y)?

Last edited by Zuhumnan Dapel; 17 Nov 2014, 03:48.
Lui Yiu Lim

Join Date: Jun 2014

Posts: 17
#6

17 Nov 2014, 05:14

"the error terms are additive on the log scale, and multiplicative in the original metric"

This only means that when you take logs, you are assuming y = x*eps => ln(y) = ln(x) + ln(eps),
instead of the original metric, y = x + eps => ln(y) = ln(x + eps).
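
Applied to the model in this thread, with x standing for g^gap times the lagged y, the two error structures can be set side by side. This is only a restatement under the assumption that all quantities inside the logarithms are positive; ε and e are the error terms of the respective specifications.

```latex
\begin{align*}
\text{multiplicative error (log-linear fit):}\quad
  y &= g^{\mathrm{gap}}\, y_{-1}\, \varepsilon
  &&\Longrightarrow\quad
  \log y = \mathrm{gap}\,\log g + \log y_{-1} + \log \varepsilon \\
\text{additive error (original NLS):}\quad
  y &= g^{\mathrm{gap}}\, y_{-1} + e
  &&\Longrightarrow\quad
  \log y = \log\!\left(g^{\mathrm{gap}}\, y_{-1} + e\right)
\end{align*}
```

Only the first form is linear in log(g), which is why the regression on logs corresponds to the multiplicative-error model rather than to the original -nl- specification.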

BTW, since you are doing autoregressive stuff, you should also check the stationarity condition using a PP, Dickey-Fuller, ADF, or DF-GLS test.
Zuhumnan Dapel

Join Date: Sep 2014

Posts: 392
#7

17 Nov 2014, 06:42

Thanks Lui

Zuhumnan Dapel

Join Date: Sep 2014
Posts: 392
#8

17 Nov 2014, 07:34

Dear Clyde,
This is what I got with some of the code you gave.

Code:

. gen log_y = log(y)

. gen log_lag_y = log(L.y)
(108 missing values generated)

. regress log_y gap log_lag_y

      Source |       SS       df       MS              Number of obs =     432
-------------+------------------------------           F(  2,   429) =  123.24
       Model |  22.9499511     2  11.4749756           Prob > F      =  0.0000
    Residual |  39.9449164   429  .093111693           R-squared     =  0.3649
-------------+------------------------------           Adj R-squared =  0.3619
       Total |  62.8948675   431  .145927767           Root MSE      =  .30514

------------------------------------------------------------------------------
       log_y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         gap |   .1381267   .0108975    12.68   0.000     .1167075     .159546
   log_lag_y |   .4509374   .0310385    14.53   0.000      .389931    .5119439
       _cons |   2.507871    .243472    10.30   0.000     2.029324    2.986417
------------------------------------------------------------------------------

Can we refer to the coefficient on gap (0.138) as the speed of convergence or the autoregressive parameter? Otherwise I may have to go back to the initial estimation, using 1 as the initial value. What do you think? What I observe is that the transformation dampens the value of the parameter.

That said, one may be working with the natural log of consumption. In that case, would one have to take the log of a log? How plausible is this? It is also the case that when the autoregressive parameter is estimated using non-logged values, it is <1, but when logged, it becomes >=1.


Clyde Schechter

Join Date: Apr 2014

Posts: 30118
#9

17 Nov 2014, 10:37

Those are not the results you want. You forgot to run the last, crucial, line of code:

Code:

test log_lag_y = 1, coef

That line will re-evaluate the regression constraining the coefficient of log_lag_y to 1. Without that constraint, the coefficient of gap is not what you are looking for. The output of the test command will include a new regression table, and the coefficient of gap in that table will be what you want.

Zuhumnan Dapel

Join Date: Sep 2014
Posts: 392

#10

17 Nov 2014, 15:59

Thanks. I did run it; I just did not paste it here because there was not much difference between the values.
However, here it is:

Code:

. test log_lag_y = 1, coef

 ( 1)  log_lag_y = 1

       F(  1,   429) =  312.93
            Prob > F =    0.0000


Constrained coefficients

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         gap |   .2390388   .0092852    25.74   0.000     .2208402    .2572374
   log_lag_y |          1          .        .       .            .           .
       _cons |  -1.676763    .057613   -29.10   0.000    -1.789682   -1.563843
------------------------------------------------------------------------------


Clyde Schechter

Join Date: Apr 2014

Posts: 30118
#11

17 Nov 2014, 16:05

So your estimate of the parameter you referred to as g in your original post is exp(0.2390388) = 1.27 (to 2 decimal places.) If you want a confidence interval around it, you can just exponentiate the confidence interval for the gap coefficient itself.
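
As a quick check of that arithmetic, the three numbers can be exponentiated directly with -display-. The values below are copied from the constrained-coefficients table in #10; nothing else is assumed.

```stata
* Values copied from the constrained-coefficients table in #10
display exp(.2390388)   // point estimate of g
display exp(.2208402)   // lower 95% bound for g
display exp(.2572374)   // upper 95% bound for g
```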
Zuhumnan Dapel

Join Date: Sep 2014

Posts: 392
#12

17 Nov 2014, 16:17

Thanks. You are making it clearer for me. In dynamic panel data analysis, the coefficient of the endogenous covariate is <1. How is it that the result we have just got is 1.27 > 1?
Clyde Schechter

Join Date: Apr 2014

Posts: 30118
#13

17 Nov 2014, 17:31

Hmm. I see I made an error in translating between the algebraic equation log(y) = gap*log(g) + log(L.y) and the regression code. I should have specified that there be no constant term in the model. So the code should have been

Code:

regress log_y gap log_lag_y, nocons
test log_lag_y = 1, coef

Sorry for the confusion.
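
One possible alternative for the single-series case, which is not what was run in this thread: -cnsreg- fits the constrained regression directly and leaves the results in e(), so -lincom- with the eform option can report the exponentiated coefficient together with a confidence interval. A sketch, assuming log_y, log_lag_y, and gap exist as generated earlier; the constraint number 1 is arbitrary.

```stata
* Impose a coefficient of 1 on log_lag_y and drop the intercept
constraint 1 log_lag_y = 1
cnsreg log_y gap log_lag_y, constraints(1) noconstant

* exp(coefficient of gap) is the implied estimate of g, with a 95% CI
lincom gap, eform
```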

There is another issue here, also not dealt with by this code. At the start of this thread, there was no mention of panel data. And I had in mind that this was just a single time series with gaps. If this is panel data, then we can't use -regress- here because it will fail to capture the within-panel-unit nature of what we are trying to do. Then we need -xtreg, fe-. It then also gets a little more complicated because -xtreg, fe- does not allow the -nocons- option. So if we are in a panel data situation it has to be:

Code:

xtreg log_y gap log_lag_y, fe
test (_cons = 0) (log_lag_y = 1), coef

Zuhumnan Dapel

Join Date: Sep 2014
Posts: 392

#14

17 Nov 2014, 17:44

Thanks for such a humble response. Here is what I got

Code:

. xtreg log_y gap log_lag_y, fe

Fixed-effects (within) regression               Number of obs      =       432
Group variable: id                              Number of groups   =       108

R-sq:  within  = 0.3431                         Obs per group: min =         4
       between = 0.4441                                        avg =       4.0
       overall = 0.3646                                        max =         4

                                                F(2,322)           =     84.09
corr(u_i, Xb)  = 0.0946                         Prob > F           =    0.0000

------------------------------------------------------------------------------
       log_y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         gap |   .1321363   .0114847    11.51   0.000     .1095418    .1547308
   log_lag_y |   .4183435    .036233    11.55   0.000     .3470602    .4896267
       _cons |   2.756283   .2821837     9.77   0.000     2.201126    3.311439
-------------+----------------------------------------------------------------
     sigma_u |  .14976866
     sigma_e |  .30750083
         rho |   .1917356   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0:     F(107, 322) =     0.94            Prob > F = 0.6449

. test (_cons = 0) (log_lag_y = 1), coef

 ( 1)  _cons = 0
 ( 2)  log_lag_y = 1

       F(  2,   322) =  545.90
            Prob > F =    0.0000
Warning:  variance matrix is nonsymmetric or highly singular


Constrained coefficients

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         gap |  -.0222749          .        .       .            .           .
   log_lag_y |          1          .        .       .            .           .
       _cons |   1.95e-14          .        .       .            .           .
------------------------------------------------------------------------------

.
end of do-file

Now we have a coefficient <1, i.e., exp(-.0222749) = 0.98 (to 2 decimal places), but no standard error or CI.
Is there no way to estimate NLS with panel fixed effects based on y_{i,m} = g^{gap} * y_{i,m-1} + e_{i,m}, i = 1, ..., N, m = 1, ..., M?

Last edited by Zuhumnan Dapel; 17 Nov 2014, 18:20.


Clyde Schechter

Join Date: Apr 2014

Posts: 30118
#15

17 Nov 2014, 19:01

Well, that didn't work very well! And as I think about it more deeply, simply setting the constant term in the fixed effects regression to zero does not also set the fixed effects themselves to zero, which is what is required here. So that model was also mis-specified. Second helping of humble pie.

I don't know of any way to do non-linear estimation with fixed effects in Stata.

I do have one more approach to doing it with the log-transformed data. If we de-mean all the variables involved and then use -regress-, then we are capturing the within-panel-unit variation. So give this a try:

Code:

// DE-MEAN THE VARIABLES
drop if missing(log_y, gap, log_lag_y) // RESTRICT TO ESTIMATION SAMPLE
foreach v of varlist log_y gap log_lag_y {
    egen mean_`v' = mean(`v'), by(panel_var)
    gen dm_`v' = `v' - mean_`v'
}

// NOW -regress, nocons- GIVES A WITHIN panel_var REGRESSION
regress dm_log_y dm_gap dm_log_lag_y, nocons
// THE CONSTRAINT MUST NAME THE DE-MEANED VARIABLE USED IN THE REGRESSION
test dm_log_lag_y = 1, coef

And, again, when you're done, the exponential of the coefficient of dm_gap shown in the -test- output is your estimate of {g} in the original problem.
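
Another way to impose the unit coefficient, offered only as a sketch: move log_lag_y to the left-hand side and let -xtreg, fe- do the de-meaning. This assumes the data are -xtset-, that gap varies within panel units, and that unit-specific intercepts on the log scale are acceptable; dlog_y is a hypothetical variable name. The point estimate should match the de-meaned constrained regression, while -xtreg, fe- also adjusts the degrees of freedom for the absorbed panel means.

```stata
* dlog_y is a hypothetical name; log_y, log_lag_y, and gap as generated earlier
gen dlog_y = log_y - log_lag_y   // puts log(L.y) on the left with coefficient 1
xtreg dlog_y gap, fe             // within estimator; _b[gap] estimates log(g)
lincom gap, eform                // exp(_b[gap]) = estimate of g, with a 95% CI
```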