  • Random dummy variable with restrictions

    I have a panel dataset and want to create a random dummy variable with mean 0.2 that marks a random subset of the observations. Without any further restrictions, code like this would do it:

    Code:
    set seed 2803
    gen id = _n
    gen random = runiform()
    sort random
    local cut = round((_N / 10) * 2)
    gen sample = _n <= `cut'
    drop random
    However, I want to impose the restriction that each individual i and each time period t is marked at least once by this dummy, i.e. is present in the sample. Any ideas on how to approach this?
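
    To make the restriction concrete, here is a rough sketch of how it could be checked once a dummy exists. It assumes the panel identifiers are called company and time (as in the Grunfeld example further down) and that the dummy is called sample; any_company and any_time are just made-up helper names.

    Code:
    * within each company / time period, is at least one observation marked?
    bysort company: egen byte any_company = max(sample)
    bysort time: egen byte any_time = max(sample)
    * each assert stops with an error if some company or period is never marked
    assert any_company == 1
    assert any_time == 1
    drop any_company any_time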

    Note: I tried the gsample command by B. Jann, once clustering and once stratifying on company and year, but it does not seem to work for this:
    Code:
    clear
    use http://www.stata-press.com/data/r16/grunfeld.dta
    distinct company
    distinct time
    xtset company year
    gsample 20, wor cluster(company year) percent alt
    distinct company
    distinct time
    Code:
    clear
    use http://www.stata-press.com/data/r16/grunfeld.dta
    distinct company
    distinct time
    xtset company year
    gsample 20, wor strat(company year) percent alt
    distinct company
    distinct time
    Last edited by Felix Stips; 07 Jan 2021, 03:34.

  • #2
    Okay, so first of all, the SSC command randomtag by Robert Picard makes this easier because the data are kept in place. Secondly, one approach is to pick one observation from each company and one from each time period, and then tag further observations at random until the target sample size is reached.

    Code:
    set seed 12345
    use http://www.stata-press.com/data/r16/grunfeld.dta
    
    local percent = 20
    local samplesize = round((_N / 100 ) * `percent')
    
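    * tag one observation per company and one per time period
    egen company1 = tag(company)
    egen time1 = tag(time)
    
    * count the forced observations once, even if one is tagged for both
    * its company and its time period (n1 + n2 would double-count it)
    qui count if company1 == 1 | time1 == 1
    local n3 = `samplesize' - r(N)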
    
    randomtag if company1 == 0 & time1 == 0, count(`n3') gen(sample)
    replace sample = 1 if time1 == 1 | company1 == 1
    drop time1 company1
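
    One caveat with this approach: as far as I can tell, egen's tag() only promises to mark one observation per group, not that the pick is a random draw. Below is a minimal sketch of a variant that draws the forced observation at random within each company and each time period, using only built-in commands. It assumes the same Grunfeld variables; u1, u2, u3, forced, target and extra are made-up names for illustration, and the result is of course not a simple random sample, since the forced observations are drawn differently from the rest.

    Code:
    set seed 12345
    use http://www.stata-press.com/data/r16/grunfeld.dta, clear
    
    local target = round(_N * 0.20)
    
    * random order within each group; the first observation in that order is the forced pick
    gen double u1 = runiform()
    bysort company (u1): gen byte forced = (_n == 1)
    gen double u2 = runiform()
    bysort time (u2): replace forced = 1 if _n == 1
    
    * fill up to the target with a random draw from the observations not yet forced
    qui count if forced == 1
    local extra = max(`target' - r(N), 0)
    gen double u3 = runiform()
    sort forced u3
    gen byte sample = (forced == 1) | (_n <= `extra')
    drop u1 u2 u3 forced
    
    * quick look at the share of marked observations
    tab sample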
    Last edited by Felix Stips; 07 Jan 2021, 05:35.
