
  • Centering an Indicator Variable

    Hello, I am looking to center a covariate, per the reference below. This covariate is Hospital Referral Region (HRR), which has 306 values and enters my model as 305 indicator variables. I also use this variable in interaction terms in the model.

    Per Kraemer and Blasey's (and Cronbach's) recommendations, the indicator variables should be recoded from 1 and 0 to 1-1/m and -1/m, with one (arbitrary) HRR eliminated. I'd welcome any suggestions on how to easily code this in Stata.

    Kraemer HC & Blasey CM. Centring in regression analyses: a strategy to prevent errors in statistical inference. International Journal of Methods in Psychiatric Research, 13: 3.

  • #2
    You didn't get a quick answer. You'll increase your chances of a useful answer by following the FAQ on asking questions - provide Stata code in code delimiters, readable Stata output, and sample data using dataex. Also remember we are not from your area.

    In my literature, indicators are almost always coded 0/1 (which Stata does automatically) so I don't understand why you want to change this. It also seems odd to me to have 305 dummies and then interact them with other variables - you'll have many hundreds of parameters. I'm also not sure what m is. I suppose you can set up a loop from 2 to 306:

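    forvalues j=2/306 {
        * i is assumed to hold the HRR code; ind`j' is 1 for observations in region `j'
        generate ind`j'=1 if i==`j'
        * r(sum) is then the number of observations in region `j'
        summarize ind`j'
        replace ind`j'=-1/r(sum) if i!=`j'
    }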

    Without your data, I can't be sure this works.




    • #3
      I have never seen this advice before. I am fairly sure that thousands of researchers ignore it and aren't bitten by ignoring it. There are lots of really good reasons why 0 and 1 codes are congenial and practical.



      • #4
        thank you for the citation; I don't always agree with Helena Kraemer but I always learn something from her

        whether this is easy or hard depends on the data setup (e.g., are the 306 variables contiguous? if not, do they share a common piece of their name, e.g., hrr1-hrr306?); here I assume they are contiguous no matter how named (though I assume hrr1-hrr306):
        Code:
        foreach var of varlist hrr1-hrr306 {
            * recode the 1s first, so the new values are not matched by the next line
            replace `var'=1-(1/306) if `var'==1
            replace `var'=-1/306 if `var'==0
        }
        I recommend saving this as a new dataset to make it easy to undo

        note that shorter solutions are possible
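
Recoding 1 and 0 to 1-1/m and -1/m amounts to subtracting 1/m from each 0/1 indicator, so one shorter version might look like the sketch below. It uses toy data with m = 3 regions so that it runs on its own; the variable name hrr and the prefixes d and c are assumptions, and with the real data m would be 306.

```stata
* Minimal sketch on toy data; hrr, d* and c* are hypothetical names
clear
set obs 12
generate hrr = mod(_n, 3) + 1      // three toy regions coded 1, 2, 3
local m = 3                        // number of regions (306 in the thread)

* one 0/1 indicator per region: d1 d2 d3
tabulate hrr, generate(d)

* centered versions, omitting region 1 (an arbitrary choice)
forvalues j = 2/`m' {
    generate double c`j' = d`j' - 1/`m'
}
```

Building new variables rather than overwriting the indicators keeps the original 0/1 coding available, so there is nothing to undo.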

        I'm not sure I agree with the authors that this is in general a good idea; in particular, I doubt that having 305 indicator variables in one model is a good idea (but I don't know what your project is about either), especially given the interactions you refer to



        • #5
          Thanks very much for the sample code. I've thought through the need for the indicator variables quite a bit (this is linked to another post of mine on Statalist entitled "Three Level Models with FE", if folks would like the background). I need the recoding so that I can interpret results relative to a "mean" geographic region rather than the "reference" geographic region that I would get from 0/1 coded variables.



          • #6
            That sounds to me like the "mean" you are comparing with assumes that each Hospital Referral Region is of equal size. That is probably not true. You can include your categorical variable as a regular factor variable and specify the interaction with that factor variable. Afterwards you can use contrast with the gw. prefix for your categorical variable to get a more reasonable mean with which to compare.
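
A rough sketch of that suggestion, on simulated data so that it runs on its own. The variable names cost, hrr, and x and the model itself are assumptions for illustration, not the original poster's specification.

```stata
* Minimal sketch on simulated data; cost, hrr and x are hypothetical names
clear
set seed 12345
set obs 300
generate hrr  = cond(_n <= 50, 1, cond(_n <= 150, 2, 3))   // unequal region sizes
generate x    = rnormal()
generate cost = 100 + 10*hrr + 2*x + rnormal()

* region enters as a factor variable, interacted with the covariate
regress cost i.hrr##c.x

* each region compared with the observation-weighted grand mean
contrast gw.hrr
```

Replacing gw. with g. would instead compare each region with the unweighted (balanced) grand mean, which treats every region as equally large.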
            ---------------------------------
            Maarten L. Buis
            University of Konstanz
            Department of history and sociology
            box 40
            78457 Konstanz
            Germany
            http://www.maartenbuis.nl
            ---------------------------------



            • #7
              Hi Maarten, I'm not looking to evaluate one HRR versus a mean HRR -- the HRR is really a nuisance parameter that I have to adjust for, and I need to produce predicted costs for a mean HRR. In that case I'm not sure the -contrast- command gets me what I need.

              I'm not sure why you infer that the mean approach I describe above assumes HRRs of equal size -- each HRR is going to contribute a different number of observations to the model, so Stata will already be taking that into account. Can you let me know more about what you think the problem is?



              • #8
                You compare to the mean assuming each HRR is of equal size, because that is how you created that variable.
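
A small toy illustration of this point, assuming three regions with 10, 30, and 60 observations (all names and sizes are made up): subtracting 1/m leaves an indicator with a nonzero sample mean unless every region has the same share of observations, whereas subtracting the observed share centers it.

```stata
* Minimal sketch on toy data; all names and sizes are hypothetical
clear
set obs 100
generate hrr = cond(_n <= 10, 1, cond(_n <= 40, 2, 3))   // sizes 10, 30, 60
generate byte d2 = (hrr == 2)                            // 0/1 indicator for region 2

* equal-size centering: subtract 1/m with m = 3
generate double c_equal = d2 - 1/3

* size-weighted centering: subtract the observed share of region 2
summarize d2, meanonly
generate double c_weighted = d2 - r(mean)

summarize c_equal, meanonly
display "mean of c_equal:    " r(mean)    // 0.30 - 1/3, about -0.033
summarize c_weighted, meanonly
display "mean of c_weighted: " r(mean)    // zero up to rounding
```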
                ---------------------------------
                Maarten L. Buis
                University of Konstanz
                Department of history and sociology
                box 40
                78457 Konstanz
                Germany
                http://www.maartenbuis.nl
                ---------------------------------
