
  • Standardized coefficients from xtreg?

    Hello,

    My research team is running some multi-level models to examine the associations between measures of child care center quality and child outcomes.

    The generic form of the model is shown below. Spring_score is the child's assessment score in the spring, e.g. Woodcock Johnson Tests of Achievement. We control for the fall score, child covariates, and site-level covariates in the model, and treat the measure of site quality as our variable of interest.

    xtset siteid
    xtreg spring_score site_quality fall_score student_covariates site_covariates, re vce(robust)

    Is there any way that I can generate standardized regression coefficients from the model? That would be our preference, but we haven't been able to figure out how to do that. If generating standardized coefficients is not an option, what other choices do we have for examining results for multiple outcomes that have different scales?

    For now we are generating predicted values for the spring test score and calculating the means for groups defined by the values on the site quality variable (which is a dummy). Our code for that is as follows.

    predict yhat_spring_score
    putexcel A3=("Adjusted Means") B3=("N") C3=("Mean") D3=("SD") using "directory and file name", sheet(sheet name) modify
    sum yhat_spring_score if site_quality==1
    putexcel A4=("site_quality = 1") B4=(r(N)) C4=(r(mean)) D4=(r(sd)) using "directory and file name", sheet(sheet name) modify
    sum yhat_spring_score if site_quality==0
    putexcel A5=("site_quality = 0") B5=(r(N)) C5=(r(mean)) D5=(r(sd)) using "directory and file name", sheet(sheet name) modify

    I understand that the margins command might be an alternative way of getting these regression-adjusted means? Is that correct? If so, could somebody write a snippet of sample code and help me understand how the results from margins would differ from our current approach? In the past, I have only calculated marginal effects for logit models, so I'm having trouble understanding what a marginal effect would mean in the context of a continuous outcome. I suspect the problem might be a semantic issue between disciplines.

    Another option recommended to us was to calculate effect sizes by hand using the coefficient for the variable of interest and the standard deviation of the "control group", i.e. children in low quality group. This approach was recommended by an experimental researcher, but our data are from a correlational study.

    Thank you,
    Aleksandra

  • #2
    Let me guess that your student_covariates and site_covariates are ordinary variables like gender, nationality, age, etc. These have natural scales (or are categorical) and it would make no sense to standardize them. I'm guessing that the issue arises with the spring_score and other outcomes that have arbitrary scales. You can create standardized versions of those particular variables using the std() [NOTE: not sd()] function of -egen- and then use those in the regressions if you like.
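
    A minimal sketch of that (the student and site covariates from #1 are omitted here for brevity; -egen, std()- rescales to mean 0 and standard deviation 1 in the sample):

    Code:
    egen std_spring_score = std(spring_score)
    xtreg std_spring_score site_quality fall_score, re vce(robust)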

    The means you are calculating in the middle of your post can be calculated with -margins, over(site_quality)-. To then apply -putexcel-, you can access them in matrix r(table) immediately after running the -margins- command. Note, however, that you will not get sample standard deviations for those means out of -margins-; instead you will get standard errors. Generally speaking the standard error is more meaningful for something like this than a sample standard deviation, but you need to decide that based on your situation.
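
    For example, something along these lines (the file and sheet names are placeholders, and I'm using the newer -putexcel set- syntax rather than the -using- syntax shown in #1):

    Code:
    margins, over(site_quality)
    matrix M = r(table)              // row 1 holds the means, row 2 the standard errors
    putexcel set "results.xlsx", sheet("adjusted_means") modify
    putexcel A4 = ("site_quality = 0") B4 = (M[1,1]) C4 = (M[2,1])
    putexcel A5 = ("site_quality = 1") B5 = (M[1,2]) C5 = (M[2,2])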

    There is another way in which -margins- can be used here, that will get you something different. First you need to use factor-variable notation in your -xtreg- command (-help fvvarlist-) so that -margins- will know that site_quality is a categorical variable. If you then run -margins site_quality- following the regression, Stata will calculate for you the mean predicted value if all of the observations had site_quality == 1, and then the mean predicted value if all of the observations had site_quality == 0. These are what we usually mean by the adjusted means. Again, for applying -putexcel-, you will find the statistics in matrix r(table) after -margins-.
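
    A sketch of that, again with most covariates omitted for brevity:

    Code:
    xtreg spring_score i.site_quality fall_score, re vce(robust)
    margins site_quality             // adjusted means at site_quality = 0 and 1
    matrix M = r(table)              // estimates in row 1, standard errors in row 2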

    The marginal effect of a dichotomous predictor in a model with a continuous outcome is just the difference between the expected value of the outcome when the predictor is zero and the expected value of the outcome when the predictor is one. (If the model is linear, it is the same as the regression coefficient.) As such, I think it is even simpler to understand than the marginal effect in a -logit- model.
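
    Following the factor-variable regression sketched above, -margins, dydx()- gives you that difference directly; in this linear model it will match the coefficient on 1.site_quality:

    Code:
    margins, dydx(site_quality)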

    Calculating effect sizes using the standard deviation in the control group is another reasonable way of doing this (a sketch of the hand calculation follows below). It really differs from standardizing only in whether you take the standard deviation in the whole sample or just in the controls. There may be traditions in your discipline that prefer one approach over the other (or that favor abstaining from any transformation, or some other kind of transformation). The main thing is to think about who the audience for your research is: you should present it in a way that will seem sensible and familiar (if possible) to them. That is, it's not a statistical issue.
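
    If you go that route, a rough sketch of the hand calculation, assuming the factor-variable regression above has just been run (-summarize- does not disturb the stored estimates):

    Code:
    summarize spring_score if site_quality == 0       // control-group SD
    display _b[1.site_quality] / r(sd)                // effect size in control-SD units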

    As for the appropriateness of standardizing outcomes versus calculating effect sizes (or other approaches) in experimental versus observational designs: I don't think it matters at all. Both are equally appropriate in either design; really, it's just about making the results understandable. The audience, rather than the study design, is the determining factor.

    • #3
      Hi Clyde,

      Thanks for this. I understand much better. Yes, those are exactly the types of covariates we are using. And yes, the challenge primarily relates to the outcomes. We'd like to compare results across our outcomes, which are all in different metrics. The downside of the approaches we've been discussing - regression-adjusted means for subgroups, margins, or effect sizes - is that they only allow us to compare across models for the specific variable of interest. We were hoping to compare the coefficients on covariates as well.

      Alex

      • #4
        Quoting #3: "The downside of the approaches we've been discussing - regression-adjusted means for subgroups, margins, or effect sizes - is that they only allow us to compare across models for the specific variable of interest. We were hoping to compare the coefficients on covariates as well."
        Maybe I'm missing something, but if the outcome variables are the only ones without a natural metric, it seems to me that using standardized outcomes (or effect sizes) and ordinary covariates in the different models will lead to results that will support comparing covariate effects across different outcomes. And comparing adjusted means of different outcomes, if the outcomes have been standardized or transformed to effect sizes before regression, also should be quite sensible.
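
        In other words, something like this sketch, where the variable names are hypothetical stand-ins for your standardized outcomes and natural-metric covariates:

        Code:
        foreach y of varlist std_wj_score std_ppvt_score {
            xtreg `y' i.site_quality child_age, re vce(robust)
        }

        The coefficient on child_age is then expressed in standard-deviation units of each outcome, which is what makes the comparison across models meaningful.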

        • #5
          Well, what I'm struggling with is that I have a lagged dependent variable. Perhaps this is a stats 101 question, but... what do I do with the lagged fall score if I z-score the spring score?

          • #6
            I don't think it changes anything. I would standardize the outcome variable (or transform it to effect size) and then use L1.standardized_outcome as my dependent variable.

            That said, I'm a bit surprised to see a lagged score as the dependent variable. Usually one would have the current value of the outcome as the dependent variable and use lagged versions of some or all of the predictors. It's hard for me to think of a scenario where it makes sense to do it the other way.

            • #7
              The fall score is a control variable. The spring score is the outcome. Should I standardize the spring score but leave the fall score in the original metric? Or should I standardize them both? If I standardize both, that could conceivably lead to a situation in which some children have negative learning over the course of the year, right?

              • #8
                Assuming that fall score and spring score refer to scores on the same test or measurement instrument, it would be rather complicated to explain a regression coefficient from a model that used a standardized score in one place and a raw score in another. But if, as it seems, you are interested in within-unit (person? school? class?) relationships in your modeling, it probably makes more sense to put the spring and fall scores on a common scale, rather than standardizing each separately.

                I gather your data are in wide layout, with spring_score and fall_score as separate variables. I assume there is an identifier variable as well--call it id. If not, create one. If that's so, I might do something like this:

                Code:
                preserve
                keep id *_score
                // stack the fall and spring scores to get a pooled mean and SD
                reshape long @_score, i(id) j(season) string
                summ _score
                local score_mean = r(mean)
                local score_sd = r(sd)
                restore
                // standardize both scores against the pooled mean and SD
                foreach v of varlist *_score {
                    gen std_`v' = (`v'-`score_mean')/(`score_sd')
                }

                // ... OTHER STUFF YOU MIGHT NEED

                xtreg std_spring_score std_fall_score // AND OTHER VARIABLES IN THE MODEL

                That way, if the raw scores increase, so do the standardized versions. Again, this approach only makes sense if spring_score, fall_score, etc. are measured the same way.

                Another thought occurs to me. I don't know what instrument measures these scores, but if it was developed independently of your research program there may exist population norms for them that are adjusted, for example, for age and grade level etc. Those norms might be published, or available from the test developers. It might make sense to instead transform all the scores to percentiles or z-scores in the reference population or something like that.
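
                A hypothetical sketch of that, supposing the test publisher reports a population norm mean and standard deviation (the values below are placeholders, not real norms):

                Code:
                scalar norm_mean = 100
                scalar norm_sd   = 15
                foreach v of varlist spring_score fall_score {
                    gen z_`v' = (`v' - norm_mean)/norm_sd
                }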

                • #9
                  Ah, great idea to pool the fall and spring data before standardizing. Thank you for this suggestion!

                  • #10
                    I spoke with a senior quantitative analyst in my office today, and he suggested we standardize all predictor and outcome variables and not worry about the fact that some children will have negative gains, given that we are only interested in the overall estimate. The resulting coefficients will be standardized. The other option he offered was to calculate them after running the model, as described here: http://www.ssicentral.com/hlm/help7/...efficients.pdf

                    • #11
                      Yes, you can calculate standardized coefficients in the way pointed out in that document. To my mind, just standardizing the variables first is easier, but you will get the same result either way.
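
                      For what it's worth, the usual after-the-fact calculation just multiplies the raw coefficient by sd(x)/sd(y). A sketch for a single predictor:

                      Code:
                      quietly xtreg spring_score fall_score site_quality, re vce(robust)
                      quietly summarize fall_score
                      scalar sd_x = r(sd)
                      quietly summarize spring_score
                      scalar sd_y = r(sd)
                      display "standardized coefficient on fall_score = " _b[fall_score]*sd_x/sd_y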

                      If standardizing all predictors is what your employer wants you to do, then go ahead. But I think it's a very bad idea to standardize variables like age or other covariates that have natural units of measure. Because then you are left trying to explain to somebody that the coefficient of age represents the expected mean difference in outcome associated with a 1 sd difference in age. But what is a 1 sd difference in age? Well, it'll depend on your population. It could be a few years, it could be many. It's likely to be some oddball number like 9.36 years. So then your audience has to deal with a coefficient that corresponds to an expected mean difference in outcome when the difference in age is 9.36 years. Doesn't that seem a bizarre thing to do? And does anybody have trouble understanding an expected mean difference associated with an age difference of 1 year?

                      I could go on with other objections to standardizing variables that have a natural metric, but I think that's the most important one. As I say, if that's what your employer wants, it isn't as if it's morally evil and you should resign in protest. But just tuck away in your mind for later that it usually isn't a good idea to do that.

                      Good luck!
