  • Creating Index out of Scaled variables (0-10)

    I have to create an aggregate variable/index (say new1, indicating the policy stringency of a particular sector) out of 5 variables (say x1-x5) that measure policy stringency. I have panel data for 40 countries over 20 years. The 5 variables are already normalized to a 0-10 scale, where 0 means no policy in place and 10 means high stringency. When I try to combine x1-x5 into new1, I run into several issues:
    1. Some variables have missing values. For example, x1 and x2 are missing for one country in 2007.
    2. Some variables are 0. For example, x3 and x4 are 0 for the same country in 2007.
    3. x5 is positive for that country in 2007. Let's say its value is 1.25.
    So when I take the mean to create new1, I compute (0+0+1.25)/3 = 0.42 (rounded), excluding the missing values and using only the values between 0 and 10. My question is: is simply averaging these 5 variables the right way to create this aggregate variable/index? Or should I consider an index criteria such as a principle component system? Or some weighting scheme that puts more weight on values > 0 and less weight on values = 0? How best to deal with this situation? Is there any literature backing it up?

    NB: Interpolation/extrapolation is not an option here, since I have too many missing values.
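    The averaging rule described in the question, a mean taken only over the indicators that are not missing, can be written as below. The symbol S_it is notation introduced here for illustration: the set of indicators observed for country i in year t.

$$
\text{new1}_{it} = \frac{1}{|S_{it}|} \sum_{k \in S_{it}} x_{k,it},
\qquad
S_{it} = \{\, k \in \{1,\dots,5\} : x_{k,it} \text{ is nonmissing} \,\}
$$

    For the 2007 example, S = {3, 4, 5}, so new1 = (0 + 0 + 1.25)/3 = 1.25/3, which is about 0.417. Note that a 0 counts as an observed value and pulls the mean down, while a missing value is simply left out of both the sum and the count.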

  • #2
    If 0 to 10 is a pre-defined range, your simplest options seem to be to start with

    Code:
    egen xmean = rowmean(x1-x5)
    
    egen xcount = rownonmiss(x1-x5)
    and then it's your decision (not ours) how to take the results further. What is key here is that rowmean() ignores missings to the extent possible and returns missing if and only if all arguments are missing. That's why the second variable, xcount, keeps track of how many nonmissing values each mean is based on.
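    To see that behaviour concretely, here is a small sketch on two made-up observations (hypothetical toy data, not the poster's panel). The first mimics the 2007 example (x1 and x2 missing, x3 = x4 = 0, x5 = 1.25); the second has all five variables missing. The variable xmean3 and the cutoff of 3 nonmissing indicators are purely illustrative assumptions, showing one way xcount might be used afterwards.

```stata
clear
* hypothetical toy data; "." marks a missing value
input x1 x2 x3 x4 x5
. . 0 0 1.25
. . . . .
end

* mean of the nonmissing values in each row (missing only if all are missing)
egen xmean = rowmean(x1-x5)
* how many nonmissing values each row mean is based on
egen xcount = rownonmiss(x1-x5)

* illustrative follow-up only: the cutoff of 3 is an arbitrary assumption
* keep a mean only if it rests on at least 3 nonmissing indicators
generate xmean3 = xmean if xcount >= 3

list
```

    In the first row xmean is 1.25/3 (about .4167) with xcount = 3; in the second row xmean is missing and xcount is 0. Whether to require a minimum count, and what that minimum should be, remains a substantive choice.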


    Points of English: one criterion, principal component



    • #3
      Thank you, Nick!

