weighting variables before collapsing survey data in STATA

Mona Elsayed

Join Date: May 2020

Posts: 26
#1

04 Jul 2022, 15:40

Hello everyone,

I am using 4 rounds of a labor force panel survey (1998, 2006, 2012 and 2018). My sample is the working-age population of wage workers. I intend to aggregate the data to the 4-digit occupation level to see how employment growth between each pair of successive survey rounds varies with occupations' skill level. For instance, for 1998 and 2006, I will rank occupations by their mean wage in 1998 (as a measure of skill), group them into quantiles, and look at the change in employment share over the period for each skill quantile. The same analysis will then be repeated using a measure of each occupation's task content instead of its skill level.

So, in this case the list of variables will be: occupation, wage98, task content (constant in all years), employment98, employment06. The data files I have include expansion weights for cross-section analysis for each wave and panel weights for individuals observed in 98-06, 98-12, 06-12, 12-18, 06-18 and 98-18.
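To make the plan concrete, here is a minimal sketch of the ranking and share-change step on made-up numbers. The variable names (occ, wage98, emp98, emp06) and all values are hypothetical, the data are assumed to be already at the occupation level, and the quantiles are unweighted (two groups only, to keep the toy data small).

```stata
clear
input occ wage98 emp98 emp06
1  5 100  90
2  8 200 260
3 12 150 150
4 20  50 100
end

* rank occupations into skill groups by their 1998 mean wage (2 groups here)
xtile skill_q = wage98, nq(2)

* total employment in each skill group and year
collapse (sum) emp98 emp06, by(skill_q)

* employment share of each group in each year, and its change
egen tot98 = total(emp98)
egen tot06 = total(emp06)
gen d_share = emp06/tot06 - emp98/tot98
list skill_q emp98 emp06 d_share
```

In the real analysis emp98 and emp06 would themselves be weighted totals from the individual-level files, which is the step I am asking about.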

I am unsure how to use the weights that are already available to adjust variables from survey data in Stata before collapsing (as in the example I've just mentioned). I would appreciate any help with this, or any useful materials for understanding this step of my analysis.

Thanks

Last edited by Mona Elsayed; 04 Jul 2022, 15:43.
Fei Wang

Join Date: Oct 2021

Posts: 726
#2

04 Jul 2022, 18:00

Mona, to my understanding, there are two issues: which type of weight should be used, and how to use it.

The first issue depends on your research question. Are you investigating employment transitions for the entire labor market or for the same set of individuals? Both are valid questions. The former focuses on labor market transitions over time, and the latter on the career transitions of a specific group of people.

For the former, you may treat the data in 1998 and 2006 as repeated cross-sections of individuals. Some older workers appear only in 1998 and some younger people show up only in 2006, and all such individuals should be included. For the latter, you need to restrict the sample to those appearing in both waves. As a result, the former analysis needs the cross-section weights while the latter needs the 98-06 panel weight.

For the second issue, you may apply the weight while collapsing the original data. -collapse- accepts weights in its syntax; please see -help collapse-.
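A minimal sketch of what that could look like, on toy data. The variable names (occ, wage, wt) and values are made up, and I assume here that wt is an expansion weight, i.e. the number of people in the population that each respondent represents.

```stata
clear
input occ wage wt
1 10 1.5
1 20 0.5
2 30 2.0
2 50 2.0
end

* a constant to be summed: with pweights, (sum) of 1 is the sum of the weights
gen byte one = 1

* weighted mean wage and weighted employment by occupation
collapse (mean) mean_wage = wage (sum) employment = one [pw = wt], by(occ)
list
```

With these numbers, occupation 1 has a weighted mean wage of (10*1.5 + 20*0.5)/2 = 12.5 and a weighted employment of 2.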
Mona Elsayed

Join Date: May 2020

Posts: 26
#3

05 Jul 2022, 09:24

Dear Fei Wang,
Thank you for your reply. I understood the first point very well. Concerning the second one, is -collapse- with weights an alternative to the survey commands (-svyset- and the -svy- prefix)? That is, when should we use it instead of -svyset-?
Thanks a lot

Fei Wang

Join Date: Oct 2021

Posts: 726
#4

05 Jul 2022, 18:55

Mona, -collapse- is not supported by the -svy- prefix, so you may specify the weight directly in the -collapse- command. For commands that accept both the -svy- prefix and a directly specified weight, the coefficient estimates are identical and the standard errors differ only slightly, as the following toy example shows. One advantage of -svy- is that it can describe a complex survey design (strata, sampling units). But in your dataset the weight variables are already provided, so you may use them directly.

Code:

sysuse auto, clear

* Option 1: specify the probability weight directly in the command
reg price weight [pw=mpg]

Linear regression                               Number of obs     =         74
                                                F(1, 72)          =      22.44
                                                Prob > F          =     0.0000
                                                R-squared         =     0.2706
                                                Root MSE          =     2298.8

------------------------------------------------------------------------------
             |               Robust
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      weight |   1.825052   .3852929     4.74   0.000     1.056985    2.593119
       _cons |   591.0175   977.3297     0.60   0.547    -1357.254    2539.289
------------------------------------------------------------------------------

* Option 2: declare the weight with -svyset-, then use the -svy- prefix
svyset [pw=mpg]
svy: reg price weight

Survey: Linear regression

Number of strata   =         1                  Number of obs     =         74
Number of PSUs     =        74                  Population size   =      1,576
                                                Design df         =         73
                                                F(   1,     73)   =      22.75
                                                Prob > F          =     0.0000
                                                R-squared         =     0.2706

------------------------------------------------------------------------------
             |             Linearized
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      weight |   1.825052   .3826448     4.77   0.000     1.062442    2.587662
       _cons |   591.0175   970.6126     0.61   0.544    -1343.411    2525.446
------------------------------------------------------------------------------


Mona Elsayed

Join Date: May 2020

Posts: 26
#5

05 Jul 2022, 19:11

Many thanks Fei Wang for your quick and very helpful replies
Mona Elsayed

Join Date: May 2020

Posts: 26
#6

23 Jul 2022, 05:51

Hi Fei Wang
I just have a follow-up question.
If I use the -collapse- command, which weight type should I specify?
I will collapse the data by occupation to get the total number of individuals working in each occupation (total employment) and the average wage of each occupation. My weighting variable is not an integer, so I can rule out fw.

Last edited by Mona Elsayed; 23 Jul 2022, 06:03.
Fei Wang

Join Date: Oct 2021

Posts: 726
#7

23 Jul 2022, 18:25


Mona, the other three types of weights (aw, pw and iw) will give you identical average wages by occupation. But be cautious when computing a weighted total: you have to know exactly what the weighting variable means. If the weighting variable records the number of duplicated observations (fw), or is the inverse of the probability that an observation is included in the sample (pw), you can compute a meaningful weighted total. If you lack information about the weighting variable, then I would suggest specifying -aw- and not computing a weighted total (aw will still calculate the weighted average).
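A small sketch of the difference, on made-up data (occ, wage and wt are hypothetical names and values). As I read the -collapse- documentation, (count) under aw returns the unweighted number of observations, while under pw it returns the sum of the weights; the means are the same either way.

```stata
clear
input occ wage wt
1 10 1.5
1 20 0.5
2 30 2.0
2 50 2.0
end

* collapse with analytic weights and store the result
tempfile aw
preserve
collapse (mean) mean_aw = wage (count) n_aw = wage [aw = wt], by(occ)
save `aw'
restore

* collapse with probability weights, then put both results side by side
collapse (mean) mean_pw = wage (count) n_pw = wage [pw = wt], by(occ)
merge 1:1 occ using `aw', nogenerate
list occ mean_aw mean_pw n_aw n_pw
```

Here n_pw is only an estimate of employment in each occupation if wt really is an expansion (inverse-probability) weight.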

Last edited by Fei Wang; 23 Jul 2022, 18:35.