
  • Simulating data with variables correlated within groups

    Hi all,

    First, I'm sorry if I have failed to find an existing thread discussing this issue.

    I am trying to run a simulation in which variables are correlated within groups. The example I repost below handles correlated variables, but I am still at a loss as to how to use this approach when observations are grouped non-randomly. For example, I want to simulate data on students in different classrooms, where student characteristics are correlated within classrooms.

    Thank you.

    * Set up the steps you want to repeat for the simulation in a program program define myprog2 * drop all variables to create an empty dataset, do not use clear drop _all * create a vector that contains the equivalent of a lower triangular correlation matrix matrix c = (1, 0.5968, 1, 0.6623, 0.6174, 1) * create a vector that contains the means of the variables matrix m = (52.23,52.775,52.645) * create a vector that contains the standard deviations matrix sd = (10.25,9.47,9.36) * draw a sample of 1000 cases from a normal distribution with specified correlation structure * and specified means and standard deviations drawnorm x1 x2 y, n(1000) corr(c) cstorage(lower) means(m) sds(sd) * run the desired command reg y x1 x2 end
    * use the simulate command to rerun myprog2 1000 times * collect the betas (_b) and standard errors (_se) from the regression each time * You'll probably want to set reps(10) for testing, then set it higher for the simulation. simulate _b _se, reps(1000): myprog2

  • #2
    That program is unreadable. You have to use [C O D E] [/ C O D E] (no spaces) to make it readable:

    Code:
    * Set up the steps you want to repeat for the simulation in a program
    program define myprog2
    * drop all variables to create an empty dataset, do not use clear
    drop _all
    * create a vector that contains the equivalent of a lower triangular correlation matrix
    matrix c = (1, 0.5968, 1, 0.6623, 0.6174, 1)
    * create a vector that contains the means of the variables
    matrix m = (52.23,52.775,52.645)
    * create a vector that contains the standard deviations
    matrix sd = (10.25,9.47,9.36)
    * draw a sample of 1000 cases from a normal distribution with specified correlation structure
    * and specified means and standard deviations
    drawnorm x1 x2 y, n(1000) corr(c) cstorage(lower) means(m) sds(sd)
    * run the desired command
    reg y x1 x2
    end
    * use the simulate command to rerun myprog2 1000 times
    * collect the betas (_b) and standard errors (_se) from the regression each time
    * You'll probably want to set reps(10) for testing, then set it higher for the simulation.
    simulate _b _se, reps(1000): myprog2
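
    As a side note on the cstorage(lower) vector used above: it lists the lower triangle of the correlation matrix row by row. The sketch below is not part of the original program; it draws a single sample from what should be the equivalent full matrix and checks the result. The matrix name C and the seed are illustrative assumptions.

    ```stata
    clear all
    set seed 2023
    // full symmetric matrix equivalent to the lower-triangle vector
    // (1, 0.5968, 1, 0.6623, 0.6174, 1), read row by row
    matrix C = (1, 0.5968, 0.6623 \ 0.5968, 1, 0.6174 \ 0.6623, 0.6174, 1)
    matrix m = (52.23, 52.775, 52.645)
    matrix sd = (10.25, 9.47, 9.36)
    // no cstorage() option is needed when the full matrix is supplied
    drawnorm x1 x2 y, n(1000) corr(C) means(m) sds(sd)
    // sample means, SDs and correlations should be close to the targets
    correlate x1 x2 y, means
    ```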
    ---------------------------------
    Maarten L. Buis
    University of Konstanz
    Department of history and sociology
    box 40
    78457 Konstanz
    Germany
    http://www.maartenbuis.nl
    ---------------------------------



    • #3
      Sorry, I was a bit quick there. Is this better?
      clear all //
      * Set up the steps you want to repeat for the simulation in a program //
      program define myprog2 //
      * drop all variables to create an empty dataset, do not use clear //
      drop _all //
      * create a vector that contains the equivalent of a lower triangular correlation matrix //
      matrix c = (1, 0.5968, 1, 0.6623, 0.6174, 1) //
      * create a vector that contains the means of the variables //
      matrix m = (52.23,52.775,52.645) //
      * create a vector that contains the standard deviations //
      matrix sd = (10.25,9.47,9.36) //
      * draw a sample of 1000 cases from a normal distribution with specified correlation structure //
      * and specified means and standard deviations //
      drawnorm x1 x2 y, n(1000) corr(c) cstorage(lower) means(m) sds(sd) //
      * run the desired command //
      reg y x1 x2 //
      end //
      * use the simulate command to rerun myprog2 1000 times //
      * collect the betas (_b) and standard errors (_se) from the regression each time //
      * You'll probably want to set reps(10) for testing, then set it higher for the simulation. //
      simulate _b _se, reps(1000): myprog2 //



      • #4
        Here is an example

        Code:
        clear all
        program define sim
            drop _all
            // create 100 classrooms
            set obs 100
            gen i = _n
            // classroom specific constants
            gen cons = rnormal(0,.25)
            // 20 students per classroom
            expand 20
            // create individual level explanatory variables
            matrix c = (1,.25 \ .25, 1)
            drawnorm x1 x2, corr(c)
            // create the dependent variable (with classroom specific constants)
            gen y = cons + x1 - .25*x2 + rnormal(0,.3)
            // estimate our model
            xtset i
            xtreg y x1 x2
        end
        simulate _b _se, reps(10): sim
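
        In the example above the classroom effect enters only through y; x1 and x2 are correlated with each other but not within classrooms. One possible way to make the student characteristics themselves correlated within classrooms is to build each one as a classroom-level component plus a student-level component. This is a minimal sketch, not part of the original answer: the names classroom, u1, u2, e1, e2, the seed, and the target intraclass correlation of .3 are all assumptions chosen for illustration.

        ```stata
        clear all
        set seed 12345
        // 100 classrooms
        set obs 100
        gen classroom = _n
        // classroom-level components, shared by every student in a classroom
        gen u1 = rnormal(0, sqrt(.3))
        gen u2 = rnormal(0, sqrt(.3))
        // 20 students per classroom
        expand 20
        // student-level components, correlated .25 with each other
        matrix c = (1, .25 \ .25, 1)
        drawnorm e1 e2, corr(c)
        // total variance of each x is .3 + .7 = 1, so the intraclass correlation is .3
        gen x1 = u1 + sqrt(.7)*e1
        gen x2 = u2 + sqrt(.7)*e2
        // check: the estimated intraclass correlation should be near .3
        loneway x1 classroom
        ```

        Changing the split between the two variances (here .3 and .7) changes how strongly students in the same classroom resemble each other. These steps could be placed inside a program and passed to simulate in the same way as sim.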



        • #5
          Thank you very much. This was very helpful
