  • Generate dataset for simulation (non-i.i.d. panel data)

    Hello everyone,

    Hope you all are well and keeping safe.

    I have a question for the community. I have looked online for a method but could not find a solution, so I am posting my request here.

    I have to construct a Hausman test for non-i.i.d. panel data and check its efficacy against the traditional one. For that I need to generate a dataset with small T and large N.

    At the moment I am thinking of 20 time periods and 50 observations per time period (1,000 observations in total).

    I am unable to figure out how to generate such a dataset. I looked at a presentation by Christopher F. Baum (which had useful information on simulations); however, I am still stuck at square one.

    I would really appreciate any help with this.

    Thank you!

  • #2
    Hello, if I understand correctly, you're not sure how to set up the panel scaffold? If so, try this:
    Code:
    clear
    set obs 50                    // 50 panels
    gen id = _n
    expand 20                     // 20 time periods per panel
    bysort id: gen timeunit = _n  // period index within each panel
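    As a possible follow-up (not part of the reply above), the scaffold can be declared as panel data so that the xt commands and time-series operators such as L. work, and its balance checked. This is a minimal sketch; the checks with assert are an illustrative addition:

    ```stata
    clear
    set obs 50
    generate id = _n
    expand 20
    bysort id: generate timeunit = _n

    * declare the panel structure: id identifies panels, timeunit is time
    xtset id timeunit

    * confirm a balanced panel: 50 ids, each observed in all 20 periods
    xtdescribe
    assert _N == 50 * 20
    by id: assert _N == 20
    ```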



    • #3
      Ken Chui, thank you for the reply. I think my post may have caused some confusion. I am unable to understand how I can create a non-identically distributed, non-independent dataset. I am familiar with the scaffold, but I cannot figure out what makes a dataset non-i.i.d. I hope this conveys what I mean.



      • #4
        Originally posted by Fahad Mirza
        I am unable to understand how I can create a non-identically distributed, non-independent dataset. I am familiar with the scaffold, but I cannot figure out what makes a dataset non-i.i.d.
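        You could induce dependence with a shared random effect: every observation in a panel gets the same draw of pid_u, so observations within a panel are correlated. You could make the distributions nonidentical by varying the residual variance parameter across panels (var_e below). Maybe consider something along the lines of the following.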
        Code:
        version 16.1
        
        clear *
        
        set seed `=strreverse("1602717")'
        
        quietly set obs 50
        generate byte pid = _n
        generate double pid_u = rnormal()
        generate double var_e = runiform(1, 4)
        
        quietly expand 20
        bysort pid: generate byte tim = _n
        
        generate double out = 0 + 0 * tim + ///
            pid_u + /// <- nonindependent
                rnormal(0, sqrt(var_e)) // <- not identically distributed residuals
        
        exit
        I think most would also try to induce some form of autocorrelation in the residuals within each panel.
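        A minimal sketch of that idea, assuming an AR(1) process in the residuals within each panel. The AR(1) form, rho = 0.5, the seed, and the variable name e are illustrative assumptions, not part of the code above; the rest rebuilds the same structure, replacing the i.i.d. residual draw with a recursion that runs panel by panel:

        ```stata
        version 16.1

        clear *
        set seed 12345

        quietly set obs 50
        generate byte pid = _n
        generate double pid_u = rnormal()        // shared panel effect (nonindependence)
        generate double var_e = runiform(1, 4)   // panel-specific variance (nonidentical)

        quietly expand 20
        bysort pid: generate byte tim = _n

        * AR(1) residuals within each panel; rho = 0.5 is an illustrative assumption
        local rho = 0.5
        generate double e = .
        * period 1: draw from the stationary variance var_e / (1 - rho^2)
        bysort pid (tim): replace e = rnormal(0, sqrt(var_e / (1 - `rho'^2))) if _n == 1
        * later periods: e_t = rho * e_(t-1) + innovation; replace works in sort order,
        * so e[_n-1] already holds the value computed for the previous period
        bysort pid (tim): replace e = `rho' * e[_n-1] + rnormal(0, sqrt(var_e)) if _n > 1

        generate double out = 0 + 0 * tim + pid_u + e

        xtset pid tim
        ```

        Drawing the first period from the stationary variance var_e / (1 - rho^2) means the series needs no burn-in periods, which matters when T is small.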

        I would hesitate to describe N = 50 and T = 20 as "a dataset having small T and large N".
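        If the aim is the Hausman comparison with many panels and few periods, a sketch along the same lines with N = 1,000 and T = 5 might look like the following. The regressor x, its dependence on pid_u, the slope of 1, the seed, and the choice of N and T are illustrative assumptions, not something from this thread; x is made to depend on pid_u so that random effects is inconsistent and the traditional test has something to detect.

        ```stata
        version 16.1

        clear *
        set seed 20240101

        local N = 1000   // many panels (large N)
        local T = 5      // few periods (small T)

        quietly set obs `N'
        generate int pid = _n
        generate double pid_u = rnormal()        // panel-level effect
        generate double var_e = runiform(1, 4)   // panel-specific residual variance

        quietly expand `T'
        bysort pid: generate byte tim = _n

        * assumed regressor correlated with pid_u, so random effects is inconsistent
        generate double x = 0.5 * pid_u + rnormal()
        generate double out = 1 * x + pid_u + rnormal(0, sqrt(var_e))

        xtset pid tim

        quietly xtreg out x, fe
        estimates store fixed
        quietly xtreg out x, re
        estimates store random

        * traditional Hausman test: consistent estimator first, efficient one second
        hausman fixed random
        ```

        Because var_e differs across panels, the random-effects estimator is not fully efficient here, and full efficiency under the null is one of the assumptions behind the traditional test. Wrapping a data-generating block like this in a program and repeating it with simulate is one way to compare rejection rates between the traditional test and a modified one.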

