  • Seeking command to generate a new dependent variable that is adjusted for the effect of the IVs and covariates after regression

    Hi all,
    This may be a fairly simple question but I don't have anyone else I can get guidance from, so I am left to search the internet with no one to confirm my approach with.

    I am looking at the brains of children with ADHD and language disorder. I have a bunch of variables which tell me about the white matter in particular tracts of these children's brains. The raw values for measures like these have means like .00000124 because we are working with very tiny white matter structures.

    I ran a regression analysis to determine if there was a significant main effect of having ADHD, a language disorder and/or an interaction between the two. To make the results more interpretable, my supervisors made me standardise the white matter variables so that my beta coefficients would be on the same scale and not a long string of exponents.
    e.g., a simple Stata command for regression:


    regress LUF_FA i.group_w3##i.c3_c4_prob i.ChildGender c3_childage WB_FA_std

    Where LUF_FA refers to a measure of the white matter in a certain part of the brain (my dependent variable), group_w3 is just my ADHD vs controls, c3_c4_prob is the measure of whether the kids have a language disorder (yes/no), and the rest of the variables are factors that I want to control for, like age, sex and the size of the brain.

    I am now at the interpretation stage. I just want to be able to make comments about the difference in the white matter between groups.

    Is there a function in Stata that will produce a new dependent variable which has been adjusted for the effect of the independent variables and for the covariates? For example, suppose the mean LUF_FA was 50, but once presence or absence of ADHD and language disorders, age, sex etc. had been adjusted for via regression, the resulting mean is 40. I wanted to use this new variable to give me the adjusted mean for the ADHD and control groups, for example, so that I could comment on the difference in the white matter, e.g.,

    'Children with language problems were -.297 sd below the mean FA, while children without language problems were .119 sd above the mean (F: 4.501; r2: .173; adjusted r2: .135; β: -.541; p: .045; ±CI: .528).'

    I've looked at the resid function but I'm not certain that's actually what I'm after (my understanding is that it refers to how far each data point was from the line of best fit, rather than a new DV value with the effect of the IVs and covariates added or removed). I've also been mucked around by the margins command, thinking it was giving me the new means and sd for each group but later realised this was completely not the case.

    In short, I just want to know if it's possible to get the new mean/sd for each group after adjusting for the effect of the IVs and covariates on the DV so I can compare my groups post regression.

    I realise this is a dumb question and I am currently bogged down in the minutiae of my analysis. But it is extremely difficult to interpret neuroimaging findings across a large number of networks and I am trying to do whatever I can to help understand what is happening with these children's brains.

    I would be extremely grateful for any help you could provide.

    Many thanks,
    Hannah




  • #2
    If I understand your question correctly, you want the adjusted means in the original scale. This is one way of doing that:

    Code:
    // open example data
    sysuse auto, clear
    
    // standardize
    sum price
    gen double z_price = (price-r(mean))/r(sd)
    
    // store the mean and standard deviation
    tempname m sd
    scalar `m' = r(mean)
    scalar `sd' = r(sd)
    
    // estimate regression
    reg z_price mpg i.foreign weight i.rep78
    
    // look at the mean price (original scale) of foreign and domestic cars when
    // rep78=3, mpg=20, weight=3000
    margins , at(foreign=(0 1) rep78=3 mpg=20 weight=3000) ///
              expression(predict(xb)*`sd' + `m')
    ---------------------------------
    Maarten L. Buis
    University of Konstanz
    Department of history and sociology
    box 40
    78457 Konstanz
    Germany
    http://www.maartenbuis.nl
    ---------------------------------



    • #3
      Hi Maarten,

      I follow this as a way to get a standardised variable. Which is fine because I've already done that. But I don't see how this is producing a new variable post regression that's adjusted for the effect of the IVs and covariates?

      At any rate, a simpler question, how do I get a plot of the interaction if my regression command is:

      regress LUF_FA i.group_w3##i.c3_c4_prob i.ChildGender c3_childage WB_FA_std

      I had previously used the margins command but I'm not sure this is actually correct, e.g.:

      margins i.group_w3#i.c3_c4_prob
      marginsplot

      [Image: interaction.png]


      I just want to check this is ACTUALLY the interaction plot because the margins command can often be something else. SPSS easily provides an interaction plot after the regression analysis; how do I do that in Stata?

      Many thanks,
      Hannah



      • #4
        Let's for the moment ignore the issues raised by standardization and just take your dependent variable LUF_FA in whatever scale you are using it (standardized or not). If you want to have group means in each of the four groups defined by (ADHD or Control) X (Language Problem or No), adjusted for gender, age, and WB_FA, the command is just:

        Code:
        margins i.group_w3#i.c3_c4_prob
        You will get a table with four rows of output, corresponding to the four combinations of these two dichotomous variables. The columns of the table will designate the (adjusted) mean, standard error and 95% CI in each group. (There will also be a z-statistic and p-value, which you should ignore here.)

        You expressed some bad experience with using the -margins- command in #1, but as you don't show what actual command you tried, nor showed what Stata gave you and why it didn't meet your expectations, it's hard to advise. But if you run the command shown above, you will get the adjusted means in each of those four groups.

        As for the interaction plot, the term is used to mean different things by different people and in different contexts. If what you are looking for is a graph with four data points (and error bars around them), two on each of two lines, where the horizontal axis shows ADHD at one end and Control at the other, and where one of the lines is for language disorder and the other not, and the vertical axis represents the value of the adjusted mean, then the command to run immediately following the -margins- command shown above is -marginsplot-. It will produce a plot very much like the one you show in #3 (perhaps exactly that plot). Note that -marginsplot- accepts most of the options available with -graph twoway-, so you can customize the appearance any way you like. -marginsplot- also has many options of its own, including one to exchange the variables represented on the x-axis and by the separate lines.
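
        To make the sequence concrete, here is a minimal sketch on Stata's bundled auto data rather than the data in this thread. The variable heavy is a hypothetical binary factor created only for illustration, and -xdimension()- is the -marginsplot- option that chooses which variable goes on the x-axis.

        ```stata
        // illustration only: auto data stand in for the poster's data
        sysuse auto, clear

        // hypothetical second binary factor (heavy vs light cars)
        generate byte heavy = weight > 3000

        // two-way factorial model with one covariate
        regress price i.foreign##i.heavy mpg

        // adjusted means in the four cells, then the default plot
        margins foreign#heavy
        marginsplot

        // same margins, with the roles of the two factors exchanged
        marginsplot, xdimension(heavy) ytitle("Adjusted mean price")
        ```

        With the thread's variables, the analogous calls would be margins i.group_w3#i.c3_c4_prob followed by marginsplot, xdimension(c3_c4_prob).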

        If by interaction plot you mean something else, please give an explicit description of what you need: what is on the y axis, what is on the x-axis, and what variable is represented by the separate curves, and I'll try to help you get that. It might require a different -margins- command or tweaking the options on -marginsplot-.

        The -margins- command is both very powerful and rather complicated. While the manual chapter on it is quite complete and includes many worked examples, I think an easier way to get acquainted with its most basic use for the most common applications is by reading https://www3.nd.edu/~rwilliam/stats/Margins01.pdf.

        Additional responses to other issues raised in #1:

        "Ive looked at the resid function but Im not certain thats actually what Im after (my understanding is that it refers to how far each data point was from the line of best fit, rather than an new DV value with the effect of the IVs and covariates added or removed)."
        -resid- is not a function in Stata; it is an option to the -predict- command. Your understanding of what it calculates is correct.

        I'm not sure what you mean by a new DV value with the effect of the IVs and covariates added or removed. Perhaps you are referring to what the -xb- option of -predict- will give you, which is simply b0 + b1*x1 + b2*x2 + etc., where the b's are the regression coefficients (b0 being the constant term) and the x's are the variables in your regression model. If what you are looking for is a variable that includes some of those terms but excludes others, you will need to create it with a -generate- command. For that purpose you can pull the coefficients from the virtual matrix _b[] that Stata provides after estimation commands. Thus, for example: gen new_DV = _b[_cons] + _b[x1]*x1 would calculate just those two terms of the regression model.
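
        The distinction between these quantities can be sketched as follows, again on the bundled auto data and not on the data in this thread. The variable names yhat, res and part are made up for the illustration.

        ```stata
        // illustration only: auto data stand in for the poster's data
        sysuse auto, clear
        regress price i.foreign mpg weight

        // full linear prediction: b0 + b1*x1 + b2*x2 + ...
        predict double yhat, xb

        // residual: distance of each observation from its fitted value
        predict double res, residuals

        // partial prediction: the constant and the foreign term only
        generate double part = _b[_cons] + _b[1.foreign]*foreign

        // sanity check: fitted value plus residual reproduces the outcome
        assert reldif(yhat + res, price) < 1e-6
        ```

        Note that part takes only two distinct values here, one per level of foreign, because every other term of the model has been left out.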

        Hope this helps.



        • #5
          Hi Clyde, thank you for your detailed answer! Yes, I just wanted to be sure that the margins command was actually giving me the adjusted means after the regression. I know it's a tricky command and I wasn't sure if I was interpreting it correctly.

          If I run my regression i.e.,

          [Image: Screen Shot 2017-01-27 at 9.46.37 am.png]


          And follow up with the margins command:

          [Image: Screen Shot 2017-01-27 at 9.47.20 am.png]


          My understanding is that this gives me the new adjusted means for each of my four groups in the sample post regression.

          My question now is, how do I know what the overall adjusted mean and sd are? I ask because I wanted to determine how many sd each of these groups was away from the mean (e.g., the controls without language problems had an adjusted mean of .0006, but this doesn't indicate the magnitude of the difference between each group). I'd like to use the more interpretable measure, and I believe I can use the simple equation to get a z score, z = (X - μ) / σ, i.e., Group X's z score = (Group X mean - overall mean) / overall SD, but only if I know the overall adjusted mean and sd for the group as a whole. There doesn't seem to be a line for the whole group in the margins output. Is it possible to get the margins command to include an overall adjusted group mean and sd?

          Lastly, regarding the interaction plot, yes I am seeking what you described. I have previously used the margins plot command to achieve this e.g.,
          marginsplot:
          [Image: Screen Shot 2017-01-27 at 9.52.04 am.png]


          But because I was uncertain about what the margins command was giving me, I wasn't absolutely sure I was interpreting this correctly.

          Thank you so much for your help.
          Hannah



          • #6
            So if you want the adjusted mean overall you can get it separately, but not, as far as I know, as an add-on to the group-specific outputs.
            Code:
            margins
            will provide an overall adjusted mean.

            Now, a couple of other points. It isn't meaningful to talk about the standard deviations of these adjusted means. These adjusted means are not attributes of individual observations; they are attributes of the sample as a whole (or subsets of the sample defined by certain conditions.) They do not have a distribution on the individual observation level, so they have no standard deviation. What they do have are sampling distributions, which, in turn give standard errors. And those standard errors are reported as part of the -margins- output. I don't know how meaningful using a standard error instead of a standard deviation for sigma in the formula z = (x-mu)/(sigma) really is. You seem to feel it's more intuitive than the raw figures, but to me it seems quite opaque, even bordering on the incomprehensible as a description (as opposed to its role in hypothesis testing where it is a t-statistic.)

            If you want to compare the four adjusted group means with each other, you can do that with:
            Code:
            margins i.group_w3#i.c3_c4_prob, pwcompare(effects)
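
            For orientation, the three -margins- calls discussed in this thread can be run side by side. This is a sketch on the bundled auto data, not on the data in this thread, and heavy is a hypothetical binary factor created for the illustration.

            ```stata
            // illustration only: auto data stand in for the poster's data
            sysuse auto, clear
            generate byte heavy = weight > 3000
            regress price i.foreign##i.heavy mpg

            // overall adjusted mean (a single row of output)
            margins

            // adjusted means in each of the four cells
            margins foreign#heavy

            // all pairwise differences between the four adjusted means,
            // with standard errors, test statistics and confidence intervals
            margins foreign#heavy, pwcompare(effects)
            ```

            After a linear regression that includes a constant, the overall margin is the average of the fitted values, so it equals the plain sample mean of the outcome in the estimation sample.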
