  • Computing the share of a variable

    Dear Profs and Colleagues,

    I want to compute the share of workers with 4 years of schooling at the firm level (by firm and year), that is, the number of workers with 4 years of schooling (Edgroup==1) divided by the total number of workers.
    I don't have a total-workers variable. I do have a variable "nacio" that records each worker's nationality, so the number of "nacio" observations can serve as the total number of workers in the denominator.
    Firm ID: NPC_FIC
    Since I will also compute other shares (the share of workers with Edgroup==2, 3 or 4), collapse won't do.
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input double(year NPC_FIC) str6 nacio float Edgroup
    2010 500000001 "PT" 2
    2010 500000001 "PT" 2
    2010 500000001 "PT" 2
    2010 500000002 "IN" 2
    2010 500000002 "PT" 2
    2010 500000002 "IR" 2
    2010 500000002 "PT" 2
    2010 500000002 "GB" 2
    2011 500000002 "PT" 2
    2011 500000002 "PT" 2
    2011 500000002 "AO" 4
    2011 500000002 "PT" 2
    2012 500000002 "AO" 4
    2012 500000002 "PT" 2
    2012 500000002 "FR" 2
    2012 500000002 "PT" 2
    2013 500000002 "PT" 2
    2013 500000002 "PT" 2
    2014 500000002 "PT" 2
    2016 500000033 "PT" 2
    2017 500000033 "PT" 2
    2018 500000033 "PT" 2
    2019 500000033 "PT" 2
    2010 500000050 "PT" 2
    2010 500000050 "DE" 1
    2011 500000050 "PT" 2
    2012 500000050 "PT" 2
    2013 500000050 "PT" 2
    2014 500000050 "UA" 2
    2014 500000050 "PT" 2
    2015 500000050 "PT" 2
    2015 500000050 "PT" 2
    2016 500000069 "BR" 1
    2017 500000069 "PT" 1
    2019 500000073 "PT" 2
    2019 500000073 "BR" 2
    2010 500000083 "PT" 3
    2011 500000083 "PT" 3
    2012 500000083 "PT" 3
    2013 500000083 "PT" 3
    2014 500000083 "PT" 3
    2015 500000083 "PT" 3
    2016 500000083 "PT" 3
    2019 500000101 "PT" 2
    2010 500000104 "UA" 2
    2010 500000119 "PT" 2
    2010 500000119 "PT" 1
    2010 500000119 "SP" 2
    2010 500000119 "BR" 2
    2010 500000119 "PT" 2
    2010 500000119 "PT" 2
    2010 500000119 "PT" 2
    2010 500000119 "PT" 2
    2011 500000119 "PT" 2
    2011 500000119 "PT" 2
    2011 500000119 "PT" 2
    2011 500000119 "PT" 2
    2011 500000119 "PT" 2
    2011 500000119 "PT" 2
    2011 500000119 "PT" 1
    2011 500000119 "PT" 2
    2011 500000119 "PT" 2
    2012 500000119 "PT" 2
    2012 500000119 "BR" 2
    end
    
     tab Edgroup
    
        Edgroup |      Freq.     Percent        Cum.
    ------------+-----------------------------------
              1 |  3,582,424       12.93       12.93
              2 | 11,419,311       41.23       54.16
              3 |  7,473,149       26.98       81.14
              4 |  5,223,308       18.86      100.00
    ------------+-----------------------------------
          Total | 27,698,192      100.00
    
    Edgroup==1 denotes 4 years of schooling (the group whose share I need to compute).
    Any ideas are appreciated.

    Cheers,
    Paris

  • #2
    This could probably be simplified, but I think it works.

    Code:
    egen denom = count(nacio), by(NPC_FIC year)  // employee count per firm and year
    g educ4 = Edgroup == 1  // 1 if the worker has 4 years of schooling
    egen numer = sum(educ4), by(NPC_FIC year)  // workers with 4 years of schooling per firm and year
    g educ4shr = numer/denom


    • #3

      Code:
      egen numer = total(Edgroup == 1), by(NPC_FIC year)
      could replace the middle two commands. The egen function sum() still works but (as from Stata 9) is undocumented in favour of total(). More interesting is its scope to feed on expressions, not just variable names. But watch out if Edgroup is ever missing: a missing value makes Edgroup == 1 false, so that worker is still counted in the denominator but never in the numerator.
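
      A minimal sketch of why missing values matter when an expression is fed to total(), using made-up values of Edgroup (the names in_top, in_first and in_top_safe are illustrative, not from the thread). Stata treats a numeric missing value as larger than any number, so an inequality such as Edgroup >= 4 is true when Edgroup is missing, whereas Edgroup == 1 is false:

      ```stata
      clear
      input float Edgroup
      1
      4
      .
      end
      gen byte in_top      = Edgroup >= 4                        // 1 for the missing row too
      gen byte in_first    = Edgroup == 1                        // 0 for the missing row
      gen byte in_top_safe = Edgroup >= 4 if !missing(Edgroup)   // stays missing when Edgroup is missing
      list
      ```

      Either way, a worker with missing Edgroup would distort a share unless excluded from both numerator and denominator.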



      • #4
        Nice. I think this may be the third time Nick's corrected me on "sum". Old habits die hard.

        I had tried "cond" but it didn't work. I didn't realize you could include a condition in "total" without "cond". I'll add that to my bag of tricks.
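
        For comparison, a small sketch on made-up data of one way a cond() version could be written (n_expr and n_cond are hypothetical names; this is not necessarily what was tried above). Because a true/false expression already evaluates to 1 or 0, the two forms should give the same counts:

        ```stata
        clear
        input double NPC_FIC float Edgroup
        1 1
        1 2
        1 4
        2 2
        end
        egen n_expr = total(Edgroup == 1), by(NPC_FIC)               // expression evaluates to 1 or 0
        egen n_cond = total(cond(Edgroup == 1, 1, 0)), by(NPC_FIC)   // same logic spelled out with cond()
        list, sepby(NPC_FIC)
        ```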



        • #5
          I would call it a comment rather than a correction as a call to sum() still works. However, a user could easily be puzzled by not finding it mentioned as an egen function in the help for egen.

          As it happens the quite different Stata sum() function for cumulative or running sums is mentioned there.
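
          To illustrate the difference on made-up data (obs, running and firm_total are illustrative names): the sum() function used with generate returns a running sum within each firm, while the egen function total() puts the same firm total on every row.

          ```stata
          clear
          input double NPC_FIC byte educ4
          1 1
          1 0
          1 1
          2 1
          end
          gen long obs = _n                                  // fix the row order within firm
          bysort NPC_FIC (obs): gen running = sum(educ4)     // cumulative sum: 1, 1, 2 for firm 1
          egen firm_total = total(educ4), by(NPC_FIC)        // firm total repeated on every row
          list, sepby(NPC_FIC)
          ```

          The running sum only matches the total on the last row of each firm.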

          I guess more paranoid code would condition on !missing(nacio, Edgroup).
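
          A sketch of what that more paranoid version might look like, computed by firm and year for all four education groups. The four rows of data are made up for illustration (including one worker with missing Edgroup), and ok, denom, numer1-numer4 and share1-share4 are hypothetical names:

          ```stata
          clear
          input double(year NPC_FIC) str6 nacio float Edgroup
          2010 500000050 "PT" 2
          2010 500000050 "DE" 1
          2010 500000050 "PT" .
          2011 500000050 "PT" 2
          end

          * Use only workers with both nationality and education group recorded
          gen byte ok = !missing(nacio, Edgroup)
          egen denom = total(ok), by(NPC_FIC year)

          * One share per education group, by firm and year
          forvalues g = 1/4 {
              egen numer`g' = total(ok & (Edgroup == `g')), by(NPC_FIC year)
              gen share`g' = numer`g' / denom
          }
          list year NPC_FIC denom share?, sepby(NPC_FIC year)
          ```

          With this approach the shares within a firm-year add up to 1, and a firm-year with no usable workers gets missing shares because the denominator is 0.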

