Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Problem using bysort, egen, mean(), and weight together

    Hello,

    I am doing my best to follow all the statalist guidelines. My apologies if I am still lacking in etiquette.

    I am using Public Use Micro Data from IPUMS USA. These data come from the American Community Survey conducted by the US Census Bureau. IPUMS provides a level of geography called a PUMA (Public Use Microdata Area) that can contain one or more counties. The data contain observations at the individual level, and I want to create variables that summarize these individual characteristics into PUMA-level characteristics.


    Below is an example of the data I am using.
    Variables are:
    smigpuma0 (the level of geography for which I want to summarize individual-level characteristics),
    age (each individual's age), and
    perwt (the person weight variable).

    If I want to create a variable called pum_age which is the mean age of individuals for each smigpuma0 (or PUMA), I run the following code:
    Code:
    bysort smigpuma0: egen pum_age = mean(age)
    This works nicely, but this command will not accept a weight. I tried:

    Code:
    bysort smigpuma0: egen pum_age = mean(age) [fweight=perwt]
    But Stata returns the following message: "weights not allowed".

    Is there another way to accomplish what I am trying to do? I use Stata 13 SE.

    Example data:
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input long smigpuma0 byte age int perwt
    4702100 0  5
    4804600 0 19
    2901800 0 17
     101900 0 36
    3200400 0 17
    1303700 0 65
    3702690 0 16
    2900100 0  8
     101900 0 11
    5159300 0 13
    4806700 0 64
    3100200 0  8
    2601000 0 31
    4100300 0 15
    1000300 0  3
    3904000 0 31
     101100 0 11
    2601000 0 44
    1601190 0 12
    3604000 0 31
    end
    label values age age_lbl
    label def age_lbl 0 "Less than 1 year old", modify

  • #2
    You could do it with collapse (collapse allows weights). See here and here. Collapse allows for fweight, iweight, and pweight

    Code:
     
     collapse (count) pop = age (mean) avg_age=age (median) med_age=age (p25) p25_age=age [fweight=perwt], by(smigpuma0)

    Comment


    • #3
      Hi Marty
      As a matter of fact, what you are experiencing is not a problem perse, because the mean function within egen is not meant to allow for weights.
      There is however another command that you can install. If you search for "_gwtmean", you may find an egen program that will allow you to estimate the weighted means.
      Best Regards,
      Fernando

      Comment


      • #4
        You can also look at the asgen program that can be downloaded from SSC.
        Code:
        ssc install asgen
        help asgen
        Regards
        --------------------------------------------------
        Attaullah Shah, PhD.
        Professor of Finance, Institute of Management Sciences Peshawar, Pakistan
        FinTechProfessor.com
        https://asdocx.com
        Check out my asdoc program, which sends outputs to MS Word.
        For more flexibility, consider using asdocx which can send Stata outputs to MS Word, Excel, LaTeX, or HTML.

        Comment


        • #5
          This also yields to an application of basic principles:

          Code:
          bysort smigpuma0: egen double mean = total(age * perwt) 
          bysort smigpuma0: egen double den = total(perwt) 
          replace mean = mean/den 
          drop den

          Comment


          • #6
            Another application of basic principles is to implement it through -summ- and a loop:

            Code:
            . egen group = group( smigpuma0 )
            
            . gen mean = .
            (20 missing values generated)
            
            . summ group, meanonly
            
            . forvalues i= 1/`r(max)' {
              2. qui summ age if `i'==group [fweight= perwt ]
              3. replace mean = r(mean) if `i'==group
              4. }

            Comment


            • #7
              Also Marty, note that there is something dodgy in your #1 that you need to clarify for yourself.

              You want to calculate mean age by groups as if age is a numerical variable.

              However in the example data set you provide, age is not numerical, age is a categorical variable (at least so it seems to me)... It takes the value of 0 which is labelled as "Age less than 1 year".

              Comment


              • #8
                Buenos días ,


                Una consulta.¿ Cómo generar porcentajes filas con pesos, usando "
                asgen u otro .ado." . Este es mi programación:


                * Example generated by -dataex-. To install: ssc install dataex
                clear
                input str4 anio str2 mes str6 conglome str3 vivienda str2 hogar float area byte recpobreza float factornd07
                "2017" "01" "005500" "067" "11" 1 2 765
                "2010" "05" "001556" "005" "11" 1 2 1322
                "2012" "11" "001915" "078" "11" 1 2 4570
                "2011" "07" "001286" "080" "11" 1 2 2221
                "2016" "11" "006759" "013" "11" 1 2 420
                "2013" "09" "000602" "101" "11" 2 1 776
                "2007" "04" "002855" "030" "11" 1 2 1297
                "2011" "08" "000922" "110" "11" 2 2 967
                "2008" "05" "000328" "076" "11" 2 1 666
                "2015" "01" "005857" "054" "11" 1 2 780
                "2014" "10" "006023" "047" "11" 1 2 875
                "2016" "02" "006838" "054" "11" 1 2 341
                "2010" "07" "002203" "035" "11" 1 2 928
                "2016" "10" "006872" "167" "11" 2 2 166
                "2009" "06" "001581" "001" "12" 1 1 2020
                "2016" "05" "006472" "006" "11" 2 2 1094
                "2013" "01" "000145" "066" "11" 1 2 1249
                "2015" "09" "006984" "034" "11" 1 2 1130
                "2013" "03" "005287" "003" "11" 2 1 963
                "2008" "08" "002189" "008" "11" 2 2 1273
                "2017" "07" "007198" "095" "11" 1 1 2309
                "2015" "09" "005653" "263" "11" 2 1 117
                "2013" "11" "009670" "115" "11" 1 2 640
                "2008" "11" "003390" "089" "11" 2 2 549
                "2013" "01" "001471" "022" "11" 1 2 3654
                "2013" "05" "007859" "029" "11" 1 2 3494
                "2008" "06" "000611" "048" "11" 2 1 1114
                "2012" "11" "003755" "080" "11" 2 2 208
                "2014" "08" "007699" "052" "33" 1 2 4388
                "2012" "03" "000326" "045" "11" 2 1 425
                "2013" "05" "001627" "009" "11" 2 2 1071
                "2017" "09" "007199" "052" "11" 1 2 1851
                "2017" "04" "009087" "403" "11" 2 2 488
                "2010" "05" "001404" "043" "11" 1 2 928
                "2011" "02" "003000" "034" "11" 1 2 362
                "2017" "03" "005978" "156" "11" 2 2 622
                "2007" "06" "000410" "096" "11" 1 1 2090
                "2010" "09" "001537" "145" "11" 1 2 678
                "2012" "09" "003477" "069" "11" 1 2 762
                "2017" "10" "010093" "008" "11" 2 2 482
                "2009" "09" "002396" "058" "11" 1 2 242
                "2014" "04" "005680" "073" "11" 2 2 473
                "2017" "05" "007499" "017" "11" 1 2 2451
                "2016" "01" "009359" "236" "11" 1 1 348
                "2016" "07" "008612" "061" "11" 1 2 131
                "2017" "03" "007577" "024" "11" 1 2 1829
                "2015" "01" "006032" "124" "11" 1 2 501
                "2007" "03" "001899" "027" "11" 1 1 3578
                "2012" "07" "000256" "030" "11" 2 2 1099
                "2011" "04" "002144" "100" "11" 1 2 1212
                "2007" "10" "000178" "040" "11" 1 2 1525
                "2013" "04" "007449" "041" "11" 2 2 872
                "2009" "01" "002142" "085" "11" 1 2 1790
                "2008" "05" "000207" "145" "11" 2 1 1333
                "2007" "08" "000680" "031" "11" 2 1 2604
                "2015" "11" "008181" "033" "22" 1 2 1188
                "2013" "08" "000692" "095" "11" 2 1 854
                "2012" "05" "002958" "029" "11" 1 2 312
                "2013" "05" "007086" "121" "11" 1 2 1995
                "2017" "06" "005439" "217" "11" 2 2 458
                "2008" "08" "000558" "033" "11" 1 2 718
                "2011" "01" "003067" "013" "11" 2 2 306
                "2010" "01" "000987" "126" "11" 2 1 163
                "2014" "04" "006864" "081" "11" 2 2 375
                "2016" "06" "007772" "073" "11" 1 2 2435
                "2010" "10" "003040" "084" "11" 2 2 586
                "2015" "04" "008944" "009" "11" 2 1 512
                "2008" "09" "002292" "037" "12" 1 1 1568
                "2009" "11" "003274" "083" "11" 1 2 59
                "2011" "05" "003533" "043" "11" 1 2 490
                "2012" "08" "003764" "032" "12" 1 2 3250
                "2014" "08" "005733" "038" "11" 1 1 1606
                "2014" "01" "000251" "057" "11" 2 2 1097
                "2015" "04" "009442" "122" "11" 2 2 221
                "2014" "04" "008536" "087" "11" 1 2 178
                "2007" "08" "001760" "067" "11" 1 2 5466
                "2013" "01" "003090" "014" "11" 1 2 532
                "2013" "11" "009501" "041" "11" 1 1 400
                "2008" "08" "002651" "013" "11" 2 1 296
                "2014" "11" "007771" "002" "11" 1 2 1585
                "2015" "04" "006072" "080" "11" 1 1 1265
                "2014" "12" "008173" "114" "11" 2 2 441
                "2009" "04" "001380" "048" "11" 2 2 1947
                "2014" "05" "009161" "018" "11" 1 2 1614
                "2015" "06" "006386" "032" "11" 1 2 609
                "2013" "04" "003700" "091" "11" 1 1 3055
                "2012" "08" "001630" "032" "11" 2 2 820
                "2012" "02" "000513" "033" "11" 1 2 638
                "2010" "09" "002340" "012" "11" 2 1 2089
                "2013" "06" "002822" "070" "11" 1 2 3440
                "2014" "05" "002628" "054" "11" 2 2 154
                "2008" "08" "000080" "001" "11" 2 1 103
                "2008" "03" "000493" "072" "11" 2 1 380
                "2016" "09" "010527" "043" "11" 1 2 578
                "2013" "11" "009085" "185" "11" 2 1 315
                "2010" "12" "002533" "079" "11" 2 2 274
                "2014" "11" "003100" "025" "11" 1 2 574
                "2011" "11" "001316" "035" "11" 1 2 4049
                "2007" "08" "000946" "007" "11" 2 1 932
                "2017" "11" "009922" "095" "11" 2 2 183
                end
                label values area Area
                label def Area 1 "Urbana", modify
                label def Area 2 "Rural", modify
                label values recpobreza recpobreza
                label def recpobreza 1 "pobre", modify
                label def recpobreza 2 "no pobre", modify



                *La idea es generar una nueva base de datos con estos porcentajes filas.
                tab anio recpobreza [iw=factornd07], row



                *Porfavor, una pista. Muchas gracias.
                *Atte

                *Saludos, desde Lima, Perú, Sudamérica.

                Comment


                • #9
                  I do not know which language is this but google translate converts it to the following, still I am not sure what is required here.
                  A query. How to generate percentages of rows with weights, using " asgen or another .ado. "This is my programming
                  The idea is to generate a new database with these percentages rows. tab anio recpobreza [iw = factornd07], row
                  Regards
                  --------------------------------------------------
                  Attaullah Shah, PhD.
                  Professor of Finance, Institute of Management Sciences Peshawar, Pakistan
                  FinTechProfessor.com
                  https://asdocx.com
                  Check out my asdoc program, which sends outputs to MS Word.
                  For more flexibility, consider using asdocx which can send Stata outputs to MS Word, Excel, LaTeX, or HTML.

                  Comment


                  • #10
                    Hi Andres,
                    So first of all you may need to try to translate your request into English for future interactions, as the majority of people here use English as their primary Stata Language.
                    Now for what you ask, you can obtain that using a few steps:

                    Code:
                    sysuse auto, clear
                    ** assume weights are given by mpg
                    
                    bysort rep78:egen tot=sum(mpg)
                    bysort rep78:egen dome=sum(mpg*(foreign==0))
                    bysort rep78:egen for=sum(mpg*(foreign==1))
                    replace dome=dome/tot*100
                    replace for=for/tot*100
                    tabstat dome for, by(rep78)
                    tab rep78 foreign [iw=mpg], row nofreq
                    Best,
                    Fernando

                    Comment


                    • #11
                      Originally posted by Nick Cox View Post
                      This also yields to an application of basic principles:

                      Code:
                      bysort smigpuma0: egen double mean = total(age * perwt)
                      bysort smigpuma0: egen double den = total(perwt)
                      replace mean = mean/den
                      drop den
                      Thanks Nick, this was quite helpful!

                      Comment

                      Working...
                      X