How to correctly simulate an xtregar process?

Alexandre Cazenave-Lacroutz

Join Date: Aug 2019
Posts: 9

27 Aug 2019, 05:51

For a Monte Carlo study, we would like to simulate data that follow the process that the command xtregar, fe estimates:
\[y_{i,t} = x_{i,t}\beta + c_i + \epsilon_{i,t}\] where the c_i are individual fixed effects
and \[\epsilon_{i,t+1} = \rho \epsilon_{i,t} + \eta_{i,t}\] (where the η_i,t are i.i.d. and normally distributed).

This should be simple, but when we generate data from given parameters and then analyse the simulated data with xtregar, fe, we do not get these parameters back (more precisely, the estimates are significantly different from the initial parameters). What could be going wrong?

Here is the code that builds the data and runs the simulation. Thanks in advance for any hint!

Code:

clear all
* Build a balanced panel: 100 units (nr) observed over 19 years (year)
set obs 100
gen nr = _n
expand 19
gen year = 2000
sort nr
bysort nr: replace year = year[_n-1] + 1 if _n != 1
sort nr year
xtset nr year

* Arbitrary parameters of the process
scalar the_rho = 0.3
scalar the_sigma_epsilon = 2
scalar the_sigma_eta = the_sigma_epsilon * sqrt(1 - the_rho*the_rho)
scalar the_sigma_c_i = 0.9
matrix the_m = (0, 0, 0)
matrix the_sd = (the_sigma_eta, the_sigma_epsilon, the_sigma_c_i)

* Simulate the process 100 times; run i stores its estimates in observation i
set seed 89
gen rho_emp = 0
gen sigma_e_emp = 0
gen sigma_std_ci = 0
forvalues i = 1/100 {
    drawnorm eta epsilon0 c_i, means(the_m) sds(the_sd)
    * epsilon starts from a stationary draw, then follows the AR(1) recursion
    bysort nr (year): gen epsilon = epsilon0 if _n == 1
    bysort nr (year): replace epsilon = eta + the_rho * epsilon[_n-1] if _n > 1
    * the individual effect is constant within each unit
    bysort nr (year): replace c_i = c_i[1]
    gen y = 5 + c_i + epsilon
    xtregar y, fe
    replace rho_emp = e(rho_ar) if _n == `i'
    replace sigma_e_emp = e(sigma_e) if _n == `i'
    replace sigma_std_ci = e(sigma_u) if _n == `i'
    drop epsilon epsilon0 y eta c_i
}

* Compare the mean estimates with the true parameters
keep if _n <= 100
keep rho_emp sigma_e_emp sigma_std_ci
gen constant = 1
reg rho_emp constant
test (_cons=0.3)
reg sigma_e_emp constant
test (_cons=2)
reg sigma_std_ci constant
test (_cons=0.9)

Additional note: the line

Code:

 scalar the_sigma_eta = the_sigma_epsilon * sqrt(1-the_rho*the_rho)

comes from the stationarity of ε_i,t.
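
To spell out that step: take the variance of both sides of the AR(1) equation, with η_i,t independent of ε_i,t, and require the variance of ε to be the same in every period. Writing σ_ε and σ_η (shorthand introduced here for the_sigma_epsilon and the_sigma_eta), this gives

\[\sigma_\epsilon^2 = \rho^2 \sigma_\epsilon^2 + \sigma_\eta^2 \quad\Longrightarrow\quad \sigma_\eta = \sigma_\epsilon \sqrt{1-\rho^2}\]

With the parameters used in the code (ρ = 0.3, σ_ε = 2), σ_η = 2√0.91 ≈ 1.908.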

The reference for xtregar, fe can be found here: https://www.stata.com/manuals13/xtxtregar.pdf


Alexandre Cazenave-Lacroutz

Join Date: Aug 2019

Posts: 9
#2

27 Aug 2019, 10:39

Note: I can no longer edit the above post, but in the Stata documentation the c_i are in fact called ν_i. From now on, I follow this notation.

In this second post, I explain why we (really) hope to find mistakes in the above piece of code. If it were correct, we think it would have the following implications:

1/ Whereas the Stata documentation for xtregar, fe states that e(sigma_e) is the standard deviation of the perturbations ε, we find that it is rather the standard deviation of the innovations η_i,t.
Consider for instance the above piece of code with: the_rho = 0.9 ; the_sigma_epsilon = 100 ; the_sigma_c_i = 10
The mean of e(sigma_e) over the 100 runs is 42.8242 (standard error: .0686569). This is quite different from the_sigma_epsilon (= 100).
Conversely, e(sigma_e) is not far from the_sigma_eta (which equals 43.58899).

To be noted (and quite troubling): the Stata documentation for xtregar, re states that e(sigma_e) is the standard deviation of the innovations η_i,t.

2/ Point 1/ would have other implications. For instance, ρ_fov is defined as "the fraction of variance because of u_i" (the ν_i).
It is computed as e(sigma_u)²/[e(sigma_u)²+e(sigma_e)²]. If Point 1/ were true, it should instead be computed as e(sigma_u)²/[e(sigma_u)²+e(sigma_e)²/(1-e(rho_ar)²)], since under stationarity the variance of ε is the variance of η divided by 1-ρ².

3/ More intriguing (and more important for our purposes), our few simulations suggest a bias of e(rho_ar) toward 0. The mean of e(rho_ar) over the 100 runs is .7854179 (standard error: .0013959), which is well below 0.9 (in none of the 100 runs is e(rho_ar) equal to or higher than 0.9).
But the Stata documentation (in the Technical note below Example 1, p. 486 of the online documentation here: https://www.stata.com/manuals/xt.pdf) says: "dw is the default because it performs well in Monte Carlo simulations." It is compared with the tscorr estimator of ρ, which is "biased towards zero". This seems to imply that the default estimator of ρ is not biased toward zero in Monte Carlo simulations. But we find precisely the opposite with our code above.
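
One way to look at this is to compare the estimators of ρ that xtregar offers through its rhotype() option on the same simulated panel. The following is a minimal sketch, not the code used for the numbers above: it reuses the parameter values from point 1/, while the seed and the variable names eps and nu are illustrative assumptions. A single draw says nothing about bias by itself; the loop would have to be repeated over many draws.

```stata
clear all
set seed 12345                       // arbitrary seed (assumption)
set obs 100                          // 100 panels
gen nr = _n
expand 19                            // 19 periods per panel
bysort nr: gen year = 1999 + _n
xtset nr year

local rho    0.9
local sd_eps 100
local sd_eta = `sd_eps'*sqrt(1 - `rho'^2)

* stationary AR(1) disturbance: first period drawn from the stationary law
gen eps = rnormal(0, `sd_eps')
bysort nr (year): replace eps = `rho'*eps[_n-1] + rnormal(0, `sd_eta') if _n > 1

* individual effect, constant within panel
bysort nr (year): gen nu = rnormal(0, 10) if _n == 1
bysort nr (year): replace nu = nu[1]
gen y = 5 + nu + eps

* estimate rho with each method accepted by rhotype()
foreach m in dw regress freg tscorr theil nagar onestep {
    quietly xtregar y, fe rhotype(`m')
    display "`m'" _col(12) %8.4f e(rho_ar)
}
```

Each line of output shows the method name followed by the corresponding e(rho_ar), which can be compared with the true value 0.9.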

Thank you very much in advance. Every suggestion is welcome.
Alexandre Cazenave-Lacroutz

Join Date: Aug 2019

Posts: 9
#3

23 Sep 2019, 11:04

For those who encounter similar problems, here is some news.

1/ Private messages from StataCorp members acknowledge that the notation in the output table does not match the notation in the equations.

3/ Regarding the bias toward zero of the autocorrelation coefficient, we now understand the theory behind it: in short panels, even balanced ones, there is a strong bias toward zero (of order 1/T, with T the number of periods). Fortunately, we were able to define a consistent and asymptotically unbiased estimator, even for unbalanced panels. We wrote a Stata command implementing it (we recently sent it to SSC).

Please find the companion papers on my webpage: https://sites.google.com/view/acazenave-lacroutz/english

Alexandre
Enrique Pinzon (StataCorp)

StataCorp Employee

Join Date: Jan 2015

Posts: 216
#4

23 Sep 2019, 12:28

For those interested in the subject, a couple of clarifying comments:

1. I would like to emphasize that there is nothing wrong with the command.
2. For all interested, if in doubt about the meaning of a parameter, you can always look at the returned results documented for each of our commands.
3. We document what e(sigma_e) is in the returned results section.
4. We write the equations to present the models and the estimators and not to match the tables.

P.S.: Looking forward to reading the papers and playing with the commands contributed by Alexandre.
Alexandre Cazenave-Lacroutz

Join Date: Aug 2019

Posts: 9
#5

27 Sep 2019, 12:35

@Enrique: I did not want to disclose the exact content of private messages, or to misrepresent the position of StataCorp, hence I limited myself to the cryptic statement 1/ above. Thank you for your clarifying comments.

@all: Regarding the papers, we are making incremental changes every day, so thanks a lot in advance for your comments!
An example of an incremental change not yet in the online version: if you use our command and then plug the resulting estimate of rho into xtregar, re, both the theory and the Monte Carlo simulations show that you get an almost perfect estimate of all the parameters of the AR(1) process (this will be in version 2.0 of the paper). Hopefully there will be even more to say in the near future.
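
The plug-in step can be sketched with the rhof(#) option of xtregar, which uses the supplied value of ρ instead of estimating it. This is only an illustration under stated assumptions: rho_hat is a hypothetical placeholder set by hand (it does not come from rho_xtregar, whose syntax is not shown in this thread), the regressor x and its coefficient are added for the example, and the seed and variable names are arbitrary.

```stata
clear all
set seed 2019                        // arbitrary seed (assumption)
set obs 100
gen nr = _n
expand 19
bysort nr: gen year = 1999 + _n
xtset nr year

local rho    0.3
local sd_eps 2
local sd_eta = `sd_eps'*sqrt(1 - `rho'^2)

* stationary AR(1) disturbance and panel-level effect
gen eps = rnormal(0, `sd_eps')
bysort nr (year): replace eps = `rho'*eps[_n-1] + rnormal(0, `sd_eta') if _n > 1
bysort nr (year): gen nu = rnormal(0, 0.9) if _n == 1
bysort nr (year): replace nu = nu[1]

* illustrative regressor (not in the original simulation)
gen x = rnormal()
gen y = 5 + 1*x + nu + eps

* hypothetical externally obtained estimate of rho, plugged in via rhof()
local rho_hat 0.3
xtregar y x, re rhof(`rho_hat')
display "sigma_e = " e(sigma_e) "   sigma_u = " e(sigma_u)
```

In practice the value passed to rhof() would be replaced by whatever estimate of ρ one trusts, and the displayed e(sigma_e) and e(sigma_u) compared with the simulation parameters.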

@all: Regarding the command, it is now available by typing:
ssc install rho_xtregar, replace
(I will make a separate post next week.)
Alexandre Cazenave-Lacroutz

Join Date: Aug 2019

Posts: 9
#6

09 Oct 2019, 02:57

Reassuring note: most users of xtregar are probably not affected by what follows.
(Notably, the estimation of β is fortunately not affected, provided you are willing to assume that data are missing at random.)

Based on Monte Carlo simulations and on the formulas, and unless we are mistaken, we think that the current xtregar command potentially produces two incorrect estimates.

This concerns only xtregar, fe in an unbalanced setting. More precisely, within xtregar, fe in an unbalanced setting, it concerns only the estimate of the constant (but who cares?) and the estimate of the variance of the perturbation (we care, but who else?). The estimate of the β coefficient is correct, even in the unbalanced case (under a missing-at-random hypothesis).
See our note at the link below: https://github.com/ACL90/rho_xtregar...ons_v1_0_0.pdf

We can propose a consistent estimator of the variance of the perturbations in the unbalanced setting, so this can be corrected; see the link above. If, for your research, you cannot wait for a correction of the estimate of this very specific parameter, send us an e-mail to get our Stata code on the matter.

Note: As with our previous posts, we stress that this is work in progress, which we make available as early as possible because it may help other researchers (and to get useful feedback). All feedback is warmly welcome.
Alexandre Cazenave-Lacroutz

Join Date: Aug 2019

Posts: 9
#7

21 Oct 2019, 03:26

We should admit that we may have been too reassuring in our previous post. In fact, a "missing-at-random" hypothesis may not be enough to ensure that the β currently estimated by xtregar, fe is consistent.
Regarding β, let us only state that:
- in balanced panels, β is well estimated by xtregar, fe.
- in our Monte Carlo simulations of the unbalanced case, the β estimated by xtregar, fe appears to be consistent in some designs and not in others.
We are, however, able to propose a simple modification of the Baltagi-and-Wu transformation that yields a consistent estimate of β in all unbalanced cases. Note that a consistent estimate of β was and is already available from xtreg, fe, but the dedicated method may be more efficient in some cases (e.g. the balanced case).