  • Changing Variables to Same Units

    Hi everyone,

    I have two variables measured in units of 'days', i.e., for how many days an individual did the thing measured by the variable. For a third variable, I only have the number of hours that an individual spent on another activity. I need these to be comparable, and thus in the same units. Is it possible to convert days to hours, or hours to days, so that all three are in the same units?

    I asked a professor and they told me that it can be done either through a principal component analysis or by converting each variable to a score. I do not know how either of these can be done in Stata. Can someone help?

    Thank you!

  • #2
    Welcome to the forum. If you simply want them in the same units, only arithmetic is needed. For example, to convert days to hours:

    Code:
    gen want_hours = have_days * 24
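
    Going the other way is the same arithmetic in reverse. A minimal sketch, assuming hypothetical variable names (have_days, have_hours) and a few made-up rows that exist only to make the example runnable:

    ```stata
    clear
    input have_days have_hours
    1 12
    2 36
    7 84
    end

    // days -> hours: multiply by 24
    gen want_hours = have_days * 24

    // hours -> days: divide by 24
    gen want_days = have_hours / 24

    list
    ```

    This only changes the unit of the numbers; it assumes that one recorded day stands for 24 hours of the activity.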


    • #3
      It all depends on the finer details (as always). The simplest idea would be to say that there are 24 hours in a day, so if you divided your variable measured in hours by 24, then you get it in days, as Leonardo Guizzetti suggested. That is true, but it is probably not what your variables measured in days mean; if you say that you did something for a day, it typically does not mean you did it for 24 hours. At a minimum you need time to sleep, eat, and relieve yourself. Moreover, if it is measured in days, then that could mean that at one point in that day the respondent did something, maybe for a minute or less. For example, a fairly common survey question asks on how many days per week you used Facebook to inform yourself about subject X. It all really depends on the exact formulation of the question.

      Moreover, even if you had all the variables measured in days, you may still have a problem. Consider an example where you want to measure cultural consumption, and you have two questions: 1) How many days a year do you attend the opera? 2) How many days a year do you read a book? Reading every day (maybe just before going to bed to relax) is not uncommon; notice that the question did not ask for the kind of book. Going to the opera every day is just plain ridiculous. The same numeric value has a completely different meaning, even though both variables have the same unit. So now we have established that you are in a lot of trouble, but we have not said anything about what to do about it.

      What you are looking for is sometimes called standardization. Some people like to subtract the mean and divide by the standard deviation, so that all variables are measured in standard deviations. This is a linear transformation, so the shape of the variable's distribution remains unchanged. That can be helpful. However, it is not an easy unit to communicate. Alternatively, you could look at percentile scores: the proportion of respondents who have a lower value than you. I find that this often works very well when constructing an index (but that also depends on the kind of problems I often deal with...). I find that the unit is often easier to explain (not easy, but easier than standard deviations), and it often fits better with what I want to measure (someone's relative position compared to others is often more relevant in my research than the absolute amount). However, this is a non-linear transformation, so you do lose some information. In my case that is often desirable, as the new measure actually moves me closer to what I really want to measure, but that is certainly not always the case. So you have options, and which one is best depends on the exact situation. Below is some example code for both types of standardization:

      Code:
      sysuse auto, clear
      
      // percentiles
      egen pweight = rank(weight)
      qui count if !missing(weight)
      qui replace pweight = (pweight-.5)/r(N)
      
      egen plength = rank(length)
      qui count if !missing(length)
      qui replace plength = (plength-.5)/r(N)
      
      egen pdisplacement = rank(displacement)
      qui count if !missing(displacement)
      qui replace pdisplacement = (pdisplacement-.5)/r(N)
      
      // z-standardization
      sum weight
      gen zweight = (weight - r(mean)) / r(sd)
      
      sum length
      gen zlength = (length - r(mean)) / r(sd)
      
      sum displacement
      gen zdisplacement = (displacement - r(mean)) / r(sd)
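
      The two standardizations above can also be written as a loop. This is only a sketch of the same idea, not a different method: it reloads the auto data and recreates the same six variables, and it swaps the manual z-score calculation for the std() function of egen, which subtracts the mean and divides by the standard deviation by default.

      ```stata
      sysuse auto, clear

      foreach v of varlist weight length displacement {
          // percentile-type score: (rank - 0.5) / number of nonmissing values
          egen p`v' = rank(`v')
          quietly count if !missing(`v')
          quietly replace p`v' = (p`v' - 0.5) / r(N)

          // z-score: mean 0, standard deviation 1
          egen z`v' = std(`v')
      }

      // quick check of the resulting scales
      summarize pweight plength pdisplacement zweight zlength zdisplacement
      ```

      Run this instead of the block above rather than after it, because sysuse with the clear option drops any variables already created.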

      That leaves the comment you received from your professor. (S)he did not answer the question you asked, but the question (s)he thought you should have asked. You asked: how do I get those variables on the same scale? (S)he thought that you should have asked: how do I meaningfully combine these variables into a single variable? Principal component analysis is one way of doing that. If I were to do something like that, then I would probably prefer factor analysis, as there is a clearer substantive story behind how your observed variables relate to the index. However, since you seem to be out of your depth with all this, I would just recommend standardizing your variables in some way, and then computing the mean of your standardized variables for each observation.

      Code:
      egen psize = rowmean(pweight plength pdisplacement)
      egen zsize = rowmean(zweight zlength zdisplacement)
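
      For comparison, here is a rough sketch of the two alternatives mentioned above, principal component analysis and factor analysis, next to the simple mean of z-scores. It assumes a single component or factor is enough for these three variables, and the names pc1 and f1 are chosen here for illustration. The block reloads the data and rebuilds zsize so that it runs on its own.

      ```stata
      sysuse auto, clear

      // simple index: mean of the z-standardized variables
      foreach v of varlist weight length displacement {
          egen z`v' = std(`v')
      }
      egen zsize = rowmean(zweight zlength zdisplacement)

      // principal component analysis: keep one component, store its scores
      pca weight length displacement, components(1)
      predict pc1, score

      // factor analysis (Stata's default principal-factor method), one factor
      factor weight length displacement, factors(1)
      predict f1

      // see how closely the three candidate indices agree
      correlate zsize pc1 f1
      ```

      The sign of a component or factor score is arbitrary, so look at the absolute size of the correlations rather than their sign.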
      Last edited by Maarten Buis; 03 Aug 2021, 06:27.
      ---------------------------------
      Maarten L. Buis
      University of Konstanz
      Department of history and sociology
      box 40
      78457 Konstanz
      Germany
      http://www.maartenbuis.nl
      ---------------------------------
