  • Count total observations

    Hello! I am having some trouble doing an analysis in Stata.
    I have a dataset in which each row represents a household of up to 6 people. The variables conprof* record the occupational status of each household member; in particular, 2 means unemployed. The variable coeff is the weight for each household.
    The following example shows how my dataset is structured.

    Code:
    clear
    input byte(conprof1 conprof2 conprof3 conprof4 conprof5 conprof6) double coeff
    2 1 2 . . .  590.27028
    2 2 2 . . . 1339.21721
    2 2 . . . . 1196.43504
    2 1 1 2 . .     168.75
    2 2 4 . . .  373.10946
    2 2 1 . . . 1528.00751
    2 2 . . . . 1638.18854
    2 2 . . . . 1142.63984
    2 2 . . . .  764.71468
    2 2 . . . .  751.01047
    2 1 2 4 . .  753.32879
    2 2 . . . .  1323.0436
    2 1 2 2 . .  718.52977
    2 2 . . . . 1227.99028
    2 1 2 2 . . 1178.57795
    2 2 . . . . 2148.28157
    2 1 2 . . .  876.54834
    2 2 4 4 . .   1178.836
    2 2 2 1 . 2  875.58562
    2 4 4 2 . .  879.83198
    2 2 . . . .  351.51266
    2 2 1 . . .  632.80426
    2 1 2 2 2 .  259.68247
    2 2 . . . .  347.96527
    2 2 1 4 . .  464.83877
    2 2 4 . . .  229.23751
    2 1 2 . . . 1218.22996
    2 4 2 8 . . 1336.40534
    2 2 . . . . 1200.79717
    2 2 . . . . 1373.21638
    end
    I am interested in knowing the total number and the share of unemployed people in my dataset, and then, by using the household weight, I want to obtain an estimate of the number and of the share of unemployed people in the entire population.

    Do you have any suggestions on how to do this?
    Thank you for your help.

  • #2
    The obstacle here is the wide layout of the data. As with most aspects of data management and analysis, this is very simple once you go to long layout.

    Code:
    //    CREATE A HOUSEHOLD IDENTIFIER VARIABLE
    //    SKIP THIS IF YOUR REAL DATA ALREADY HAS SUCH A VARIABLE
    gen long hhid = _n
    
    //    GO TO LONG LAYOUT
    reshape long conprof, i(hhid) j(person_num)
    drop if missing(conprof)
    
    //    CREATE AN INDICATOR FOR UNEMPLOYED STATUS
    gen byte unemployed = (conprof == 2)
    
    //    GET COUNT & PERCENT UNEMPLOYED IN UNWEIGHTED DATA
    tab unemployed
    
    //    AND FOR A WEIGHTED ANALYSIS
    svyset [pweight = coeff]
    svy: total unemployed
    svy: proportion unemployed
    Notes:

    1. Your example data does not have a household identifier variable, which is needed for the -reshape-, so I created one. But if you already have one, skip that step, and use the name of that variable in place of hhid in the -i()- option of the -reshape- command.

    2. I have assumed that the variable coeff in your data represents an inverse probability of sampling weight. If it is some other kind of weight, then you will have to figure out how to use it, as the -svy- commands will not be appropriate.

    3. If the survey design also includes stratification and primary or higher order sampling units, then to get proper standard errors for the weighted results you need to reflect those aspects of the design in the -svyset- command as well.

    4. If there is some compelling reason to return the data to the wide layout you started with once this is done, you can do so by just issuing the command -reshape wide conprof unemployed, i(hhid) j(person_num)-. But I encourage you not to do that. It is highly likely that whatever else you plan to do with this data will be most easily done (or perhaps only possible) in the long layout. There isn't very much that Stata does best with wide data.
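
    As a minimal sketch of note 3, assuming hypothetical design variables stratum_id and psu_id (neither is in the posted data) and a made-up six-row dataset already in long layout, the -svyset- command might look like this:

    ```stata
    // Toy long-layout data: every value below is invented for illustration
    clear
    input long hhid byte(stratum_id psu_id person_num unemployed) double coeff
    1 1 1 1 1 100
    1 1 1 2 0 100
    2 1 2 1 0 150
    3 2 3 1 1 200
    3 2 3 2 1 200
    4 2 4 1 0 250
    end

    // Declare PSUs, sampling weights, and strata (hypothetical variable names)
    svyset psu_id [pweight = coeff], strata(stratum_id)

    // Weighted total and share of unemployed, with design-based standard errors
    svy: total unemployed
    svy: proportion unemployed
    ```

    The point estimates are the same as with the weights-only -svyset-; only the standard errors change once strata and PSUs are declared.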



    • #3
      I agree with Clyde's point that it would be easier to work in the long format, but here is another way (it uses -egen rcount- from Nick Cox's -egenmore-).

      Code:
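      // Household size and number of unemployed members per household
      egen hh_size = rownonmiss(con*)
      egen unemp = rcount(con*), cond(@ == 2)
      // -total- accepts pweights, fweights, and iweights, but not aweights
      total hh_size unemp [pw = coeff]
      disp "Unemployed share = " _b[unemp] / _b[hh_size]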



      • #4

        Also -- much though download statistics are often bemusing and even satisfying --

        Code:
        egen unemp = anycount(con*), values(2)
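
        For completeness, a minimal sketch of the whole wide-layout calculation using only official commands, run here on the first four households of the example data. It assumes coeff is a sampling weight, so pweights are used:

        ```stata
        // First four rows of the example data from #1
        clear
        input byte(conprof1 conprof2 conprof3 conprof4 conprof5 conprof6) double coeff
        2 1 2 . . .  590.27028
        2 2 2 . . . 1339.21721
        2 2 . . . . 1196.43504
        2 1 1 2 . .     168.75
        end

        // Members per household and unemployed members per household
        egen hh_size = rownonmiss(conprof*)
        egen unemp = anycount(conprof*), values(2)

        // Weighted totals of people and of unemployed people
        total hh_size unemp [pw = coeff]

        // Share of unemployed = weighted unemployed / weighted people
        display "Unemployed share = " _b[unemp] / _b[hh_size]
        ```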



        • #5
          Thanks for the reminder of anycount(). I have forgotten more than I know.



          • #6
            I can't even remember what I've forgotten.
