Hi Statalisters,
I am looking at the American Time Use dataset, where the variable for household income (hefaminc) is categorical and has 16 categories (as shown in the tables below). I would like to convert this to a continuous variable, similar to what Zilanawala (2014, pp. 10-11) does. Here is what she says she has done:
“Income is converted from these categorical responses to dollar amounts by assigning the midpoint of each category and representing income in thousands of dollars. The last category is topcoded to $200,000”
. ta hefaminc

      Edited: Family |
              Income |      Freq.     Percent        Cum.
---------------------+-----------------------------------
    Less than $5,000 |        303        3.36        3.36
    $5,000 to $7,499 |        173        1.92        5.28
    $7,500 to $9,999 |        245        2.72        8.00
  $10,000 to $12,499 |        289        3.21       11.21
  $12,500 to $14,999 |        258        2.86       14.07
  $15,000 to $19,999 |        445        4.94       19.01
  $20,000 to $24,999 |        472        5.24       24.25
  $25,000 to $29,999 |        485        5.38       29.64
  $30,000 to $34,999 |        531        5.89       35.53
  $35,000 to $39,999 |        475        5.27       40.80
  $40,000 to $49,999 |        779        8.65       49.45
  $50,000 to $59,999 |        717        7.96       57.41
  $60,000 to $74,999 |        920       10.21       67.62
  $75,000 to $99,999 |      1,113       12.35       79.98
$100,000 to $149,999 |      1,039       11.53       91.51
   $150,000 and over |        765        8.49      100.00
---------------------+-----------------------------------
               Total |      9,009      100.00
. ta hefaminc, nol

    Edited: |
     Family |
     Income |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |        303        3.36        3.36
          2 |        173        1.92        5.28
          3 |        245        2.72        8.00
          4 |        289        3.21       11.21
          5 |        258        2.86       14.07
          6 |        445        4.94       19.01
          7 |        472        5.24       24.25
          8 |        485        5.38       29.64
          9 |        531        5.89       35.53
         10 |        475        5.27       40.80
         11 |        779        8.65       49.45
         12 |        717        7.96       57.41
         13 |        920       10.21       67.62
         14 |      1,113       12.35       79.98
         15 |      1,039       11.53       91.51
         16 |        765        8.49      100.00
------------+-----------------------------------
The only way I can think of doing this right now is something like the following:
recode hefaminc (1=2500) (2=6250) and so on.
1. However, I am not sure how to compute the midpoint of a category such as $5,000 to $7,499. Should I add the two endpoints, 5000 and 7499, and divide by 2, or is there some other formula?
2. Secondly, I was wondering whether there is a more elegant way of doing this than typing in each of the midpoints individually.
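To make question 2 concrete, here is a rough sketch of the kind of thing I have in mind. It is not necessarily what Zilanawala did. It assumes that hefaminc is coded 1 to 16 exactly as in the second table, that the first bracket starts at 0, and that each bracket runs up to (but excludes) the next cut point, so that "$5,000 to $7,499" has midpoint 6250 rather than 6249.5. The local macro cuts and the variable name faminc_k are names I made up for illustration; the value 200 for the last category follows the topcode quoted above.

```stata
* Lower bounds of the 16 brackets, typed in from the value labels
* (the 0 for the first bracket is an assumption)
local cuts 0 5000 7500 10000 12500 15000 20000 25000 30000 35000 40000 50000 60000 75000 100000 150000

* New variable: bracket midpoint in thousands of dollars
generate double faminc_k = .

forvalues k = 1/15 {
    local k1 = `k' + 1
    local lo : word `k' of `cuts'
    local hi : word `k1' of `cuts'
    * midpoint of the bracket [lo, hi), divided by 1000
    replace faminc_k = (`lo' + `hi') / 2 / 1000 if hefaminc == `k'
}

* Open-ended top bracket: topcoded at $200,000, as in the quoted passage
replace faminc_k = 200 if hefaminc == 16

* Quick check that each category maps to exactly one value
tabulate hefaminc faminc_k, missing
```

With these cut points the first two categories get 2.5 and 6.25, which matches the 2500 and 6250 in my recode attempt once divided by 1000.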
Thanks in advance!
Monzur
Reference:
Zilanawala, A. (2014). Women’s Time Poverty and Family Structure Differences by Parenthood and Employment. Journal of Family Issues, 0192513X14542432.