Count total observations

Elisa Castagno

Join Date: Dec 2018

Posts: 3
#1


17 Apr 2019, 08:46

Hello! I am having some trouble doing an analysis with Stata.
I have a dataset in which each row represents a household that can have at most 6 members. The variables conprof* record the occupational status of each household member; in particular, 2 means unemployed. The variable coeff is the weight for each household.
What follows is an example of how my dataset is built.

Code:

clear
input byte(conprof1 conprof2 conprof3 conprof4 conprof5 conprof6) double coeff
2 1 2 . . . 590.27028
2 2 2 . . . 1339.21721
2 2 . . . . 1196.43504
2 1 1 2 . . 168.75
2 2 4 . . . 373.10946
2 2 1 . . . 1528.00751
2 2 . . . . 1638.18854
2 2 . . . . 1142.63984
2 2 . . . . 764.71468
2 2 . . . . 751.01047
2 1 2 4 . . 753.32879
2 2 . . . . 1323.0436
2 1 2 2 . . 718.52977
2 2 . . . . 1227.99028
2 1 2 2 . . 1178.57795
2 2 . . . . 2148.28157
2 1 2 . . . 876.54834
2 2 4 4 . . 1178.836
2 2 2 1 . 2 875.58562
2 4 4 2 . . 879.83198
2 2 . . . . 351.51266
2 2 1 . . . 632.80426
2 1 2 2 2 . 259.68247
2 2 . . . . 347.96527
2 2 1 4 . . 464.83877
2 2 4 . . . 229.23751
2 1 2 . . . 1218.22996
2 4 2 8 . . 1336.40534
2 2 . . . . 1200.79717
2 2 . . . . 1373.21638
end

I am interested in knowing the total number and the share of unemployed people in my dataset, and then, by using the household weight, I want to obtain an estimate of the number and of the share of unemployed people in the entire population.

Do you have any suggestions on how to do this?
Thank you for your help.
Clyde Schechter

Join Date: Apr 2014

Posts: 30173
#2

17 Apr 2019, 09:16

The obstacle here is the wide layout of the data. As with most aspects of data management and analysis, this is very simple if you go to long layout.

Code:

// CREATE A HOUSEHOLD IDENTIFIER VARIABLE
// SKIP THIS IF YOUR REAL DATA ALREADY HAS SUCH A VARIABLE
gen long hhid = _n

// GO TO LONG LAYOUT
reshape long conprof, i(hhid) j(person_num)
drop if missing(conprof)

// CREATE AN INDICATOR FOR UNEMPLOYED STATUS
gen byte unemployed = (conprof == 2)

// GET COUNT & PERCENT UNEMPLOYED IN UNWEIGHTED DATA
tab unemployed

// AND FOR A WEIGHTED ANALYSIS
svyset [pweight = coeff]
svy: total unemployed
svy: proportion unemployed

Notes:

1. Your example data does not have a household identifier variable, which is needed for the -reshape-, so I created one. But if you already have one, skip that step, and use the name of that variable in place of hhid in the -i()- option of the -reshape- command.

2. I have assumed that the variable coeff in your data represents an inverse probability of sampling weight. If it is some other kind of weight, then you will have to figure out how to use it, as the -svy- commands will not be appropriate.

3. If the survey design also includes stratification and primary or higher order sampling units, then to get proper standard errors for the weighted results you need to reflect those aspects of the design in the -svyset- command as well.

4. If there is some compelling reason to return the data to the wide layout you started with once this is done, you can do so by just issuing the command -reshape wide conprof unemployed, i(hhid) j(person_num)-. But I encourage you not to do that. It is highly likely that whatever else you plan to do with this data will be most easily done (or perhaps only possible) in the long layout. There isn't very much that Stata does best with wide data.
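To illustrate note 3, here is a minimal sketch of a -svyset- that also declares clusters and strata. It uses the first four households of the example data. The variable stratum_id and its values are made up purely for illustration (the example data has no design variables), and treating the household as the sampling unit is an assumption about the design, not something stated in the question.

```stata
clear
// FIRST FOUR HOUSEHOLDS OF THE EXAMPLE DATA
// stratum_id IS HYPOTHETICAL: IT IS NOT IN THE ORIGINAL DATA
input byte(conprof1 conprof2 conprof3 conprof4 conprof5 conprof6) double coeff byte stratum_id
2 1 2 . . . 590.27028 1
2 2 2 . . . 1339.21721 1
2 2 . . . . 1196.43504 2
2 1 1 2 . . 168.75 2
end

gen long hhid = _n
reshape long conprof, i(hhid) j(person_num)
drop if missing(conprof)
gen byte unemployed = (conprof == 2)

// DECLARE HOUSEHOLDS AS CLUSTERS (ASSUMED) AND THE HYPOTHETICAL STRATA
svyset hhid [pweight = coeff], strata(stratum_id)
svy: total unemployed
svy: proportion unemployed
```

With real data, replace hhid and stratum_id with whatever sampling unit and stratum variables the survey documentation specifies.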
Scott Merryman

Join Date: Mar 2014

Posts: 896
#3

17 Apr 2019, 09:43

I agree with Clyde's point that it would be easier to work in the long format, but here is another way (it uses -egen rcount- from Nick Cox's -egenmore-):

Code:

egen hh_size = rownonmiss(con*)
egen unemp = rcount(con*), cond(@ == 2)
total hh_size unemp [pw = coeff]
disp "Unemployed share = " _b[unemp] / _b[hh_size]
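If a standard error for the share is also wanted, one option is -ratio-, which estimates the ratio of two totals directly. What follows is a minimal sketch in wide layout on the first four households of the example data, assuming coeff is a sampling weight. The loop is just one way to build the count using only built-in commands, and the label share is an arbitrary name chosen here.

```stata
clear
// FIRST FOUR HOUSEHOLDS OF THE EXAMPLE DATA
input byte(conprof1 conprof2 conprof3 conprof4 conprof5 conprof6) double coeff
2 1 2 . . . 590.27028
2 2 2 . . . 1339.21721
2 2 . . . . 1196.43504
2 1 1 2 . . 168.75
end

// NUMBER OF PEOPLE IN EACH HOUSEHOLD
egen hh_size = rownonmiss(conprof*)

// NUMBER OF UNEMPLOYED PEOPLE IN EACH HOUSEHOLD
// (A MISSING VALUE NEVER EQUALS 2, SO IT ADDS 0)
gen byte unemp = 0
foreach v of varlist conprof* {
    replace unemp = unemp + (`v' == 2)
}

// WEIGHTED SHARE = WEIGHTED TOTAL UNEMPLOYED / WEIGHTED TOTAL PEOPLE
ratio (share: unemp/hh_size) [pweight = coeff]
```

The point estimate is the same quotient of weighted totals as in the -disp- line above; -ratio- adds a standard error and confidence interval.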
Nick Cox

Join Date: Mar 2014

Posts: 35783
#4

17 Apr 2019, 10:11

Also -- much though download statistics are often bemusing and even satisfying --

Code:

egen unemp = anycount(con*), values(2)
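Since -anycount()- is an official -egen- function, the wide-layout approach needs no extra download. Here is a sketch of the unweighted part of the question (number and share of unemployed people in the sample), using the first four households of the example data. The scalar names are arbitrary.

```stata
clear
// FIRST FOUR HOUSEHOLDS OF THE EXAMPLE DATA
input byte(conprof1 conprof2 conprof3 conprof4 conprof5 conprof6) double coeff
2 1 2 . . . 590.27028
2 2 2 . . . 1339.21721
2 2 . . . . 1196.43504
2 1 1 2 . . 168.75
end

// PEOPLE PER HOUSEHOLD AND UNEMPLOYED PEOPLE PER HOUSEHOLD
egen hh_size = rownonmiss(conprof*)
egen unemp = anycount(conprof*), values(2)

// UNWEIGHTED TOTALS OVER ALL HOUSEHOLDS
quietly summarize hh_size
scalar n_people = r(sum)
quietly summarize unemp
scalar n_unemp = r(sum)

display "Unemployed people = " n_unemp
display "Unemployed share  = " n_unemp / n_people
```

For these four households that is 9 unemployed out of 12 people.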
Scott Merryman

Join Date: Mar 2014

Posts: 896
#5

17 Apr 2019, 11:29

Thanks for the reminder of anycount(). I have forgotten more than I know.
Nick Cox

Join Date: Mar 2014

Posts: 35783
#6

17 Apr 2019, 11:31

I can't even remember what I've forgotten.