
  • Using foreach loops to map classroom composition to individual student observations

    Dear Statalist users,

    We are using TIMSS panel data, where each student is one observation, with one StudentID, and each student is assigned to a classroom with a number of students and a common ClassID. Using foreach, we aim to compute the share of girls in the classroom for each student observation while excluding the student her-/himself. Hence, we have computed these shares separately for each class (ClassID) and for the two sexes (StudentSex == 1 for girls and StudentSex == 0 for boys). However, we are trying different configurations, we always end up with the same shares for all boys and all girls respectively, with no difference between classes. Either the first classroom composition or the last classroom composition is used by Stata throughout, depending on our configuration. How do we compute these values (shares) separately for each class?

    This is our code:


    Code:
    gen GirlShare = .
    
    levelsof ClassID
    
    foreach class in `r(levels)' {
    
    count if (StudentSex == 1 & ClassID == `class')
    gen femalesInClass = `r(N)'
    
    count if ClassID == `class'
    gen classSize = `r(N)'
    
    if ClassID == `class' {
    
    foreach student in StudentID {
    
    gen femalesInClassExcl = femalesInClass - StudentSex
    
    replace GirlShare = ( femalesInClassExcl / ( classSize - 1) )
    
    drop femalesInClassExcl
    
    }
    
    }
    
    drop femalesInClass
    drop classSize
    
    }

    Thank you in advance!
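Why the loop above produces identical shares: `if ClassID == `class' { ... }` is the if *command*, which evaluates its expression once, using the value in the first observation only, so the whole block runs either for every class or for none; `foreach student in StudentID` iterates exactly once, over the literal word StudentID, not over students; and `replace GirlShare = ...` has no if qualifier, so each pass overwrites every observation. A minimal sketch of the loop with those points fixed, assuming a numeric ClassID and StudentSex coded 1/0 with no missing values, run on the example data posted in #2 (the local macro names classes, girls and size are illustrative):

```stata
* Example data from #2 (StudentSex: 1 = girl, 0 = boy)
clear
input StudentID ClassID StudentSex
1 1 1
2 1 1
3 1 0
4 1 0
5 2 1
6 2 0
7 2 1
8 2 0
9 2 1
10 2 0
end

generate GirlShare = .

* Store the distinct class identifiers in a local macro
levelsof ClassID, local(classes)

foreach class of local classes {
    * Number of girls in this class
    quietly count if StudentSex == 1 & ClassID == `class'
    local girls = r(N)

    * Number of students in this class
    quietly count if ClassID == `class'
    local size = r(N)

    * Share of girls among the other students: subtracting StudentSex
    * removes the student herself (1) or leaves the count as is (0).
    * The if qualifier restricts the change to this class only.
    replace GirlShare = (`girls' - StudentSex) / (`size' - 1) if ClassID == `class'
}

list, sepby(ClassID)
```

The -bysort-/-egen- approach in #2 gives the same result without a loop and scales better when there are many classes.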

  • #2
    No need for looping here; a combination of -bysort- and -egen- can calculate directly what you want:
    Code:
    clear
    input StudentID ClassID StudentSex
    1 1 1
    2 1 1
    3 1 0
    4 1 0
    5 2 1
    6 2 0
    7 2 1
    8 2 0
    9 2 1
    10 2 0
    end
    label define gender 0 "male" 1 "female"
    label values StudentSex gender
    
    bysort ClassID (StudentID) : generate classSize=_N-1 // "-1" to exclude the student her-/himself from the calculation -- remove the "-1" to calculate the overall share
    bysort ClassID (StudentID) : egen femalesInClass=total(StudentSex)
    replace femalesInClass=femalesInClass-1 if StudentSex==1 // to exclude the student her-/himself from calculation -- remove this line to calculate the overall share
    
    generate GirlShare=(femalesInClass/classSize)
    Please note that posting a minimal data example makes it much easier to find an answer to your question. Please consider doing so in the future.

    Regards
    Bela
    Last edited by Daniel Bela; 28 Mar 2017, 09:11.



    • #3
      Hello Bela,
      Thank you so much; this looks a lot simpler.

      However, when running this code, the classSize variable is correctly generated while the femalesInClass variable displays a value more than twice the correct value for every observation.

      For instance, for a class of 22 students of whom 13 are female, the femalesInClass variable shows a value of 31 for boys and 30 for girls.

      We suspect something is wrong with the total() function, but we cannot figure out what.

      We would appreciate your help.

      Thanks in advance,
      Olivia
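The numbers reported here fit one specific coding problem: if StudentSex were coded 1 for girls and 2 for boys (rather than 1/0 as described in #1), then for 13 girls and 9 boys total() would return 13×1 + 9×2 = 31, and subtracting 1 for each girl gives 30, exactly the values observed. This is an inference from the arithmetic, not something confirmed in the thread. A minimal check on toy data, assuming that hypothetical 1/2 coding (the variable names rawTotal and female are illustrative):

```stata
* Toy data, hypothetically coded 1 = girl, 2 = boy (two of each)
clear
input ClassID StudentSex
1 1
1 1
1 2
1 2
end

* Summing the raw codes counts each boy as 2: 2*1 + 2*2 = 6
bysort ClassID: egen rawTotal = total(StudentSex)

* A 1/0 indicator (missing codes stay missing) gives the intended count: 2
generate byte female = (StudentSex == 1) if !missing(StudentSex)
bysort ClassID: egen femalesInClass = total(female)

list, sepby(ClassID)
```

In the real data, `tabulate StudentSex, missing` shows which codes are actually present; if they are 1/2, an indicator like female can be used in place of StudentSex in the code from #2.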



      • #4
        Stat Olivia: Please note our expressed preference for full real names.

        I am going to simplify Daniel's excellent example a little and show another way to do it using rangestat (SSC). Daniel's method exploits a useful identity

        mean for all others = sum of all others / (#sample - 1)
        = (sum of all MINUS this value) / (#sample - 1)

        but it does assume no missing values, perhaps unlikely in this example but common in others.

        The rangestat syntax would cope with missing values and can be applied to more challenging problems.
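As a worked check of the identity on the example data, write $x_i = 1$ for a girl and $0$ for a boy, $T_c$ for the number of girls in class $c$ and $n_c$ for its size. Class 1 has $n_1 = 4$ and $T_1 = 2$:

$$
p_i = \frac{T_c - x_i}{n_c - 1}, \qquad
p_{\text{girl}} = \frac{2 - 1}{4 - 1} = \frac{1}{3} \approx .333, \qquad
p_{\text{boy}} = \frac{2 - 0}{4 - 1} = \frac{2}{3} \approx .667
$$

For class 2 ($n_2 = 6$, $T_2 = 3$) the same formula gives $2/5 = .4$ for girls and $3/5 = .6$ for boys, matching the prfemales column in the listing.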

        Code:
        clear
        input student class female
        1 1 1
        2 1 1
        3 1 0
        4 1 0
        5 2 1
        6 2 0
        7 2 1
        8 2 0
        9 2 1
        10 2 0
        end
        label define female 0 "male" 1 "female"
        label values female female
        
        bysort class : egen totalfemales=total(female)
        bysort class : gen prfemales = (totalfemales - female) / (_N - 1) 
        
        rangestat prfemales2=female, interval(class 0 0) excludeself 
        
        list, sepby(class) 
        
        
             +------------------------------------------------------------+
             | student   class   female   totalf~s   prfema~s   prfemal~2 |
             |------------------------------------------------------------|
          1. |       1       1   female          2   .3333333   .33333333 |
          2. |       2       1   female          2   .3333333   .33333333 |
          3. |       3       1     male          2   .6666667   .66666667 |
          4. |       4       1     male          2   .6666667   .66666667 |
             |------------------------------------------------------------|
          5. |       5       2   female          3         .4          .4 |
          6. |       6       2     male          3         .6          .6 |
          7. |       7       2   female          3         .4          .4 |
          8. |       8       2     male          3         .6          .6 |
          9. |       9       2   female          3         .4          .4 |
         10. |      10       2     male          3         .6          .6 |
             +------------------------------------------------------------+
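Without rangestat, the identity itself can be adjusted for missing values using egen's count() function: divide by the number of non-missing values of female in the class, minus one only when the student's own value is non-missing. This is a minimal sketch of an alternative to rangestat, not its implementation; it reuses the example data above and hypothetically sets one value to missing (the name nvalid is illustrative).

```stata
clear
input student class female
1 1 1
2 1 1
3 1 0
4 1 0
5 2 1
6 2 0
7 2 1
8 2 0
9 2 1
10 2 0
end

* Hypothetically make one value missing to show the adjustment
replace female = . in 10

* Sum and number of non-missing values of female within each class
bysort class: egen totalfemales = total(female)
bysort class: egen nvalid = count(female)

* Exclude the student's own value only if it is non-missing;
* for a student with a missing value, all non-missing classmates count
generate prfemales = (totalfemales - cond(missing(female), 0, female)) / (nvalid - !missing(female))

list, sepby(class)
```

For student 10, whose value is missing, the result is the mean over the five non-missing classmates, 3/5 = .6; the other girls and boys in class 2 get 2/4 = .5 and 3/4 = .75. These should agree with what rangestat's excludeself option produces on the same data.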



        • #5
          Thank you so much, Nick! Very valuable input.
