Problem using bysort, egen, mean(), and weight together

Marty Masek

Join Date: Jun 2018

Posts: 2
#1

Problem using bysort, egen, mean(), and weight together

18 Jan 2019, 16:28

Hello,

I am doing my best to follow all the statalist guidelines. My apologies if I am still lacking in etiquette.

I am using Public Use Micro Data from IPUMS USA. These data come from the American Community Survey conducted by the US Census Bureau. IPUMS provides a level of geography called a PUMA (Public Use Microdata Area) that can contain one or more counties. The data contain observations at the individual level, and I want to create variables that summarize these individual characteristics into PUMA-level characteristics.

Below is an example of the data I am using.
Variables are:
smigpuma0 (the level of geography for which I want to summarize individual-level characteristics),
age (each individual's age), and
perwt (the person weight variable).

If I want to create a variable called pum_age which is the mean age of individuals for each smigpuma0 (or PUMA), I run the following code:

Code:

bysort smigpuma0: egen pum_age = mean(age)

This works nicely, but this command will not accept a weight. I tried:

Code:

bysort smigpuma0: egen pum_age = mean(age) [fweight=perwt]

But Stata returns the following message: "weights not allowed".

Is there another way to accomplish what I am trying to do? I use Stata 13 SE.

Example data:

Code:

* Example generated by -dataex-. To install: ssc install dataex clear input long smigpuma0 byte age int perwt 4702100 0 5 4804600 0 19 2901800 0 17 101900 0 36 3200400 0 17 1303700 0 65 3702690 0 16 2900100 0 8 101900 0 11 5159300 0 13 4806700 0 64 3100200 0 8 2601000 0 31 4100300 0 15 1000300 0 3 3904000 0 31 101100 0 11 2601000 0 44 1601190 0 12 3604000 0 31 end label values age age_lbl label def age_lbl 0 "Less than 1 year old", modify
Tags: None
David Benson

Join Date: Oct 2018

Posts: 489
#2

18 Jan 2019, 16:49

You could do it with collapse (collapse allows weights). See here and here. Collapse allows for fweight, iweight, and pweight

Code:

collapse (count) pop = age (mean) avg_age=age (median) med_age=age (p25) p25_age=age [fweight=perwt], by(smigpuma0)
2 likes
Comment
FernandoRios

Join Date: Apr 2014

Posts: 2471
#3

18 Jan 2019, 19:43

Hi Marty
As a matter of fact, what you are experiencing is not a problem perse, because the mean function within egen is not meant to allow for weights.
There is however another command that you can install. If you search for "_gwtmean", you may find an egen program that will allow you to estimate the weighted means.
Best Regards,
Fernando
Comment
Attaullah Shah

Join Date: Aug 2014

Posts: 1669
#4

18 Jan 2019, 19:44

You can also look at the asgen program that can be downloaded from SSC.

Code:

ssc install asgen help asgen

Regards
--------------------------------------------------
Attaullah Shah, PhD.
Professor of Finance, Institute of Management Sciences Peshawar, Pakistan
FinTechProfessor.com
https://asdocx.com
Check out my asdoc program, which sends outputs to MS Word.
For more flexibility, consider using asdocx which can send Stata outputs to MS Word, Excel, LaTeX, or HTML.
2 likes
Comment

Nick Cox

Join Date: Mar 2014
Posts: 35726

19 Jan 2019, 02:23

This also yields to an application of basic principles:

Code:

bysort smigpuma0: egen double mean = total(age * perwt) 
bysort smigpuma0: egen double den = total(perwt) 
replace mean = mean/den 
drop den

Comment

Joro Kolev

Join Date: Aug 2018
Posts: 3050

19 Jan 2019, 06:26

Another application of basic principles is to implement it through -summ- and a loop:

Code:

. egen group = group( smigpuma0 )

. gen mean = .
(20 missing values generated)

. summ group, meanonly

. forvalues i= 1/`r(max)' {
  2. qui summ age if `i'==group [fweight= perwt ]
  3. replace mean = r(mean) if `i'==group
  4. }

Comment

Joro Kolev

Join Date: Aug 2018

Posts: 3050
#7

19 Jan 2019, 06:32

Also Marty, note that there is something dodgy in your #1 that you need to clarify for yourself.

You want to calculate mean age by groups as if age is a numerical variable.

However in the example data set you provide, age is not numerical, age is a categorical variable (at least so it seems to me)... It takes the value of 0 which is labelled as "Age less than 1 year".
Comment
Andrés Lahur Talavera Cuya

Join Date: Dec 2018

Posts: 54
#8

10 Apr 2019, 08:28

Buenos días ,

Una consulta.¿ Cómo generar porcentajes filas con pesos, usando "
asgen u otro .ado." . Este es mi programación:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str4 anio str2 mes str6 conglome str3 vivienda str2 hogar float area byte recpobreza float factornd07
"2017" "01" "005500" "067" "11" 1 2 765
"2010" "05" "001556" "005" "11" 1 2 1322
"2012" "11" "001915" "078" "11" 1 2 4570
"2011" "07" "001286" "080" "11" 1 2 2221
"2016" "11" "006759" "013" "11" 1 2 420
"2013" "09" "000602" "101" "11" 2 1 776
"2007" "04" "002855" "030" "11" 1 2 1297
"2011" "08" "000922" "110" "11" 2 2 967
"2008" "05" "000328" "076" "11" 2 1 666
"2015" "01" "005857" "054" "11" 1 2 780
"2014" "10" "006023" "047" "11" 1 2 875
"2016" "02" "006838" "054" "11" 1 2 341
"2010" "07" "002203" "035" "11" 1 2 928
"2016" "10" "006872" "167" "11" 2 2 166
"2009" "06" "001581" "001" "12" 1 1 2020
"2016" "05" "006472" "006" "11" 2 2 1094
"2013" "01" "000145" "066" "11" 1 2 1249
"2015" "09" "006984" "034" "11" 1 2 1130
"2013" "03" "005287" "003" "11" 2 1 963
"2008" "08" "002189" "008" "11" 2 2 1273
"2017" "07" "007198" "095" "11" 1 1 2309
"2015" "09" "005653" "263" "11" 2 1 117
"2013" "11" "009670" "115" "11" 1 2 640
"2008" "11" "003390" "089" "11" 2 2 549
"2013" "01" "001471" "022" "11" 1 2 3654
"2013" "05" "007859" "029" "11" 1 2 3494
"2008" "06" "000611" "048" "11" 2 1 1114
"2012" "11" "003755" "080" "11" 2 2 208
"2014" "08" "007699" "052" "33" 1 2 4388
"2012" "03" "000326" "045" "11" 2 1 425
"2013" "05" "001627" "009" "11" 2 2 1071
"2017" "09" "007199" "052" "11" 1 2 1851
"2017" "04" "009087" "403" "11" 2 2 488
"2010" "05" "001404" "043" "11" 1 2 928
"2011" "02" "003000" "034" "11" 1 2 362
"2017" "03" "005978" "156" "11" 2 2 622
"2007" "06" "000410" "096" "11" 1 1 2090
"2010" "09" "001537" "145" "11" 1 2 678
"2012" "09" "003477" "069" "11" 1 2 762
"2017" "10" "010093" "008" "11" 2 2 482
"2009" "09" "002396" "058" "11" 1 2 242
"2014" "04" "005680" "073" "11" 2 2 473
"2017" "05" "007499" "017" "11" 1 2 2451
"2016" "01" "009359" "236" "11" 1 1 348
"2016" "07" "008612" "061" "11" 1 2 131
"2017" "03" "007577" "024" "11" 1 2 1829
"2015" "01" "006032" "124" "11" 1 2 501
"2007" "03" "001899" "027" "11" 1 1 3578
"2012" "07" "000256" "030" "11" 2 2 1099
"2011" "04" "002144" "100" "11" 1 2 1212
"2007" "10" "000178" "040" "11" 1 2 1525
"2013" "04" "007449" "041" "11" 2 2 872
"2009" "01" "002142" "085" "11" 1 2 1790
"2008" "05" "000207" "145" "11" 2 1 1333
"2007" "08" "000680" "031" "11" 2 1 2604
"2015" "11" "008181" "033" "22" 1 2 1188
"2013" "08" "000692" "095" "11" 2 1 854
"2012" "05" "002958" "029" "11" 1 2 312
"2013" "05" "007086" "121" "11" 1 2 1995
"2017" "06" "005439" "217" "11" 2 2 458
"2008" "08" "000558" "033" "11" 1 2 718
"2011" "01" "003067" "013" "11" 2 2 306
"2010" "01" "000987" "126" "11" 2 1 163
"2014" "04" "006864" "081" "11" 2 2 375
"2016" "06" "007772" "073" "11" 1 2 2435
"2010" "10" "003040" "084" "11" 2 2 586
"2015" "04" "008944" "009" "11" 2 1 512
"2008" "09" "002292" "037" "12" 1 1 1568
"2009" "11" "003274" "083" "11" 1 2 59
"2011" "05" "003533" "043" "11" 1 2 490
"2012" "08" "003764" "032" "12" 1 2 3250
"2014" "08" "005733" "038" "11" 1 1 1606
"2014" "01" "000251" "057" "11" 2 2 1097
"2015" "04" "009442" "122" "11" 2 2 221
"2014" "04" "008536" "087" "11" 1 2 178
"2007" "08" "001760" "067" "11" 1 2 5466
"2013" "01" "003090" "014" "11" 1 2 532
"2013" "11" "009501" "041" "11" 1 1 400
"2008" "08" "002651" "013" "11" 2 1 296
"2014" "11" "007771" "002" "11" 1 2 1585
"2015" "04" "006072" "080" "11" 1 1 1265
"2014" "12" "008173" "114" "11" 2 2 441
"2009" "04" "001380" "048" "11" 2 2 1947
"2014" "05" "009161" "018" "11" 1 2 1614
"2015" "06" "006386" "032" "11" 1 2 609
"2013" "04" "003700" "091" "11" 1 1 3055
"2012" "08" "001630" "032" "11" 2 2 820
"2012" "02" "000513" "033" "11" 1 2 638
"2010" "09" "002340" "012" "11" 2 1 2089
"2013" "06" "002822" "070" "11" 1 2 3440
"2014" "05" "002628" "054" "11" 2 2 154
"2008" "08" "000080" "001" "11" 2 1 103
"2008" "03" "000493" "072" "11" 2 1 380
"2016" "09" "010527" "043" "11" 1 2 578
"2013" "11" "009085" "185" "11" 2 1 315
"2010" "12" "002533" "079" "11" 2 2 274
"2014" "11" "003100" "025" "11" 1 2 574
"2011" "11" "001316" "035" "11" 1 2 4049
"2007" "08" "000946" "007" "11" 2 1 932
"2017" "11" "009922" "095" "11" 2 2 183
end
label values area Area
label def Area 1 "Urbana", modify
label def Area 2 "Rural", modify
label values recpobreza recpobreza
label def recpobreza 1 "pobre", modify
label def recpobreza 2 "no pobre", modify

*La idea es generar una nueva base de datos con estos porcentajes filas.
tab anio recpobreza [iw=factornd07], row

*Porfavor, una pista. Muchas gracias.
*Atte

*Saludos, desde Lima, Perú, Sudamérica.
Comment
Attaullah Shah

Join Date: Aug 2014

Posts: 1669
#9

10 Apr 2019, 09:20

I do not know which language is this but google translate converts it to the following, still I am not sure what is required here.
A query. How to generate percentages of rows with weights, using " asgen or another .ado. "This is my programming
The idea is to generate a new database with these percentages rows. tab anio recpobreza [iw = factornd07], row

Regards
--------------------------------------------------
Attaullah Shah, PhD.
Professor of Finance, Institute of Management Sciences Peshawar, Pakistan
FinTechProfessor.com
https://asdocx.com
Check out my asdoc program, which sends outputs to MS Word.
For more flexibility, consider using asdocx which can send Stata outputs to MS Word, Excel, LaTeX, or HTML.
Comment
FernandoRios

Join Date: Apr 2014

Posts: 2471
#10

10 Apr 2019, 10:11

Hi Andres,
So first of all you may need to try to translate your request into English for future interactions, as the majority of people here use English as their primary Stata Language.
Now for what you ask, you can obtain that using a few steps:

Code:

sysuse auto, clear ** assume weights are given by mpg bysort rep78:egen tot=sum(mpg) bysort rep78:egen dome=sum(mpg*(foreign==0)) bysort rep78:egen for=sum(mpg*(foreign==1)) replace dome=dome/tot*100 replace for=for/tot*100 tabstat dome for, by(rep78) tab rep78 foreign [iw=mpg], row nofreq

Best,
Fernando
Comment
Abhishek Maheshwari

Join Date: Jun 2022

Posts: 12
#11

29 Jun 2022, 11:28

Originally posted by Nick Cox View Post

This also yields to an application of basic principles:

Code:

bysort smigpuma0: egen double mean = total(age * perwt) bysort smigpuma0: egen double den = total(perwt) replace mean = mean/den drop den

Thanks Nick, this was quite helpful!
Comment

Announcement

Problem using bysort, egen, mean(), and weight together

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment