  • Creating a variable of geometric average

    Hi,

    I have data in which household income is recorded as a categorical variable. I am trying to generate a new variable holding the geometric average of this categorical variable. Could you please advise how to proceed?

    Here is the household income variable details:

    Household income      |      Freq.     Percent        Cum.
    ----------------------+-----------------------------------
     1 Less than $5,000   |      1,785        4.36        4.36
     2 5,000 to 7,499     |        667        1.63        6.00
     3 7,500 to 9,999     |        704        1.72        7.72
     4 10,000 to 12,499   |      1,095        2.68       10.39
     5 12,500 to 14,999   |        985        2.41       12.80
     6 15,000 to 19,999   |      1,450        3.55       16.35
     7 20,000 to 24,999   |      1,881        4.60       20.95
     8 25,000 to 29,999   |      1,933        4.73       25.67
     9 30,000 to 34,999   |      1,983        4.85       30.52
    10 35,000 to 39,999   |      1,869        4.57       35.09
    11 40,000 to 49,999   |      3,152        7.71       42.80
    12 50,000 to 59,999   |      3,292        8.05       50.85
    13 60,000 to 74,999   |      4,285       10.48       61.33
    14 75,000 to 99,999   |      5,360       13.11       74.43
    15 100,000 to 149,999 |      5,742       14.04       88.47
    16 150,000 or more    |      4,715       11.53      100.00
    ----------------------+-----------------------------------
    Total                 |     40,898      100.00

  • #2
    The geometric mean of a categorical variable is almost worthless. It is easy to calculate as exp(mean(log(x))), but the calculation isn't the main difficulty.

    If the data in #1 are what you have, then the best you can do is impute an income value to each category -- not necessarily the midpoint of its two limits, and you need to do something different for the top category anyway, since it has no upper limit. Then take the geometric mean of the imputed values.
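As a rough illustration of that recipe, here is a minimal Python sketch that assigns one income value to each category and then takes the frequency-weighted geometric mean as exp(mean(log(x))). The band limits and frequencies come from the table in #1, with each band's upper limit taken as the next band's lower limit. Everything else is an assumption made only for illustration: the arithmetic midpoint rule, the placeholder values `BOTTOM_VALUE` and `TOP_VALUE` for the lowest and open-ended top categories, and the helper names. None of it is a recommendation from the thread, and the result depends heavily on what is chosen for the top category.

```python
import math

# (category code, lower limit, upper limit, frequency) from the table in #1.
# Each upper limit is the next band's lower limit; None marks the
# open-ended top category.
BANDS = [
    (1, 0, 5000, 1785),
    (2, 5000, 7500, 667),
    (3, 7500, 10000, 704),
    (4, 10000, 12500, 1095),
    (5, 12500, 15000, 985),
    (6, 15000, 20000, 1450),
    (7, 20000, 25000, 1881),
    (8, 25000, 30000, 1933),
    (9, 30000, 35000, 1983),
    (10, 35000, 40000, 1869),
    (11, 40000, 50000, 3152),
    (12, 50000, 60000, 3292),
    (13, 60000, 75000, 4285),
    (14, 75000, 100000, 5360),
    (15, 100000, 150000, 5742),
    (16, 150000, None, 4715),
]

# ASSUMPTIONS, not from the thread: placeholder values for the two
# categories where a midpoint is not defined or not sensible.
BOTTOM_VALUE = 2500.0    # hypothetical value for "Less than $5,000"
TOP_VALUE = 200000.0     # hypothetical value for "150,000 or more"


def representative_value(lower, upper):
    """Return one imputed income for a band (hypothetical helper)."""
    if upper is None:
        return TOP_VALUE
    if lower == 0:
        return BOTTOM_VALUE
    # Arithmetic midpoint: one simple choice among several possible ones.
    return (lower + upper) / 2


def geometric_mean(values, weights):
    """Weighted geometric mean: exp of the weighted mean of logs."""
    total = sum(weights)
    mean_log = sum(w * math.log(v) for v, w in zip(values, weights)) / total
    return math.exp(mean_log)


values = [representative_value(lo, hi) for _, lo, hi, _ in BANDS]
weights = [n for *_, n in BANDS]
gm = geometric_mean(values, weights)
print(f"Geometric mean of imputed incomes: {gm:,.0f}")
```

Rerunning with a few different choices for `TOP_VALUE` shows how sensitive the answer is to that single assumption.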

    That said, this is surely a very common problem with income data, so what do other people do in the literature?
