Generate dataset for simulation (non-i.i.d panel data)

Fahad Mirza

#1

10 Apr 2021, 06:28

Hello everyone,

Hope you all are well and keeping safe.

I have a question for the community. I have looked online for a method but could not find a solution, so I am posting my request here.

I have to construct a Hausman test for non-i.i.d. panel data and check its efficacy against the traditional one. For that I need to generate a dataset with small T and large N.

At the moment I am thinking of 20 time periods and 50 observations per time period (1000 observations in total).

I am unable to figure out how to generate such a dataset. I looked at a presentation by Christopher F. Baum, which had useful information on simulations; however, I am still stuck at square one.

I would really appreciate any help with this.

Thank you!
Tags: Dataset, hausman, non iid, panel, simulation
Ken Chui

#2

10 Apr 2021, 07:10

Hello. If I understand correctly, you're not sure how to set up the scaffold? Try this:

Code:

clear
set obs 50
gen id = _n
expand 20
bysort id: gen timeunit = _n
Fahad Mirza

#3

10 Apr 2021, 07:17

Ken Chui, thank you for the reply. I think my post may have caused this confusion. I do not understand how to create a dataset that is non-identically distributed and non-independent. I am familiar with the scaffold, but I cannot figure out what makes a dataset non-i.i.d. I hope that conveys what I mean.
Joseph Coveney

#4

10 Apr 2021, 08:40

Originally posted by Fahad Mirza

I am unable to understand how I can create a non-identical and non-independent dataset. The scaffold is something I am familiar with, but I am unable to figure out what makes a dataset non-iid.

You could induce dependence with a shared random effect. You could render the distributions nonidentical by varying the residual variance parameter. Maybe consider something along the lines of the following.

Code:

version 16.1

clear *

set seed `=strreverse("1602717")'

quietly set obs 50
generate byte pid = _n
generate double pid_u = rnormal()
generate double var_e = runiform(1, 4)

quietly expand 20
bysort pid: generate byte tim = _n

generate double out = 0 + 0 * tim + ///
    pid_u + /// <- nonindependent
    rnormal(0, sqrt(var_e)) // <- not identically distributed residuals

exit

I think that most would try to induce some form of autocorrelation between the residuals, too.
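To illustrate the autocorrelation suggestion, the sketch below extends the code above with AR(1) residuals within each panel. It is a minimal sketch under stated assumptions, not code from the thread: the AR coefficient rho = 0.5, the seed, and the helper variables v and e are illustrative choices. The innovations are scaled by (1 - rho^2) so that each panel's residuals keep a stationary variance of var_e, which preserves the nonidentical variances across panels.

```stata
version 16.1
clear *
set seed 12345                            // arbitrary seed (assumption)

local rho = 0.5                           // assumed AR(1) coefficient, |rho| < 1

quietly set obs 50
generate int pid = _n
generate double pid_u = rnormal()         // panel effect shared within pid: dependence
generate double var_e = runiform(1, 4)    // panel-specific variance: nonidentical

quietly expand 20
bysort pid: generate byte tim = _n
xtset pid tim

* innovations scaled so that the stationary variance of e equals var_e
generate double v = rnormal(0, sqrt(var_e * (1 - (`rho')^2)))

* first period drawn from the stationary distribution of each panel
bysort pid (tim): generate double e = rnormal(0, sqrt(var_e)) if _n == 1

* e_t = rho * e_(t-1) + v_t; replace works through the sorted
* observations in order, so e[_n-1] already holds its updated value
bysort pid (tim): replace e = `rho' * e[_n-1] + v if _n > 1

generate double out = pid_u + e
```

The recursive replace under bysort is a common Stata idiom for building within-panel autoregressive series; setting rho to 0 recovers the independent-residual design above.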

I hesitate to consider N = 50 and T = 20 "a dataset having small T and large N".
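To check a test's efficacy, one common approach is to regenerate the data many times and record how often the test rejects. The sketch below does this for the conventional test (xtreg with fe and re, then hausman) with many panels and few periods. It is a hypothetical setup, not from the thread: the program name hsim, the regressor x, its coefficient 0.5, the correlation parameter gamma, N = 1000, T = 5, the seed, and the 200 replications are all illustrative assumptions. With gamma(0) the random-effects assumption holds, so the rejection rate estimates the test's size; with gamma > 0 it estimates power. Because var_e varies across panels, RE is not fully efficient under the null, which is one of the conventional Hausman test's assumptions that this design violates.

```stata
capture program drop hsim
program define hsim, rclass
    version 16.1
    syntax [, N(integer 1000) T(integer 5) GAMma(real 0)]

    drop _all
    quietly set obs `n'
    generate int pid = _n
    generate double pid_u = rnormal()         // shared panel effect
    generate double var_e = runiform(1, 4)    // panel-specific variance
    quietly expand `t'
    bysort pid: generate int tim = _n
    quietly xtset pid tim

    * hypothetical regressor; gamma > 0 correlates it with the panel effect
    generate double x = `gamma' * pid_u + rnormal()
    generate double out = 1 + 0.5 * x + pid_u + rnormal(0, sqrt(var_e))

    * conventional Hausman test: FE (consistent) vs. RE (efficient under H0)
    quietly xtreg out x, fe
    estimates store fe
    quietly xtreg out x, re
    estimates store re
    quietly hausman fe re
    return scalar p = r(p)
end

* rejection rate at the 5% level over 200 replications
simulate p = r(p), reps(200) seed(12345) nodots: hsim, n(1000) t(5) gamma(0)
generate byte reject = p < 0.05 if !missing(p)
summarize reject
```

The mean of reject is the estimated rejection rate. Replications where hausman cannot return a p-value (for example, a negative chi-squared statistic) are excluded rather than counted as non-rejections.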