  • Generate dataset w/ averages, upper, and lower variation

    Say I want to estimate the prevalence of something across multiple weeks, and then also create an "upper" and "lower" bound where I add and subtract the SE to that average. I have the starts of two solutions but neither one is complete.

    Option 1: collapse

    Code:
    collapse (mean) mean_score = score, by(week)
    This gets me the mean test score, but not an SE.

    Option 2: margins
    Code:
    eststo margin: margins, over(week) post
    And then I'd somehow extract the -Margin- and SE. Rather than output them using -estout- (or your package of choice), I'd want to keep them in Stata, add and subtract the SE to create new upper and lower bound variables, and then export the margin, lower, and upper variables.

    Or perhaps there's some entirely easier way to do all this.

    Does this make any sense? Any thoughts on the most efficient way to do this?

  • #2
    You might take advantage of data frames for this. Check out example 4 here: https://www.stata.com/features/overv...ets-in-memory/
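
    As a rough sketch of the frames idea (not necessarily what the linked example does), you could copy the data into a second frame and collapse there, so the original observations stay in memory. This assumes Stata 16 or later, that the current frame is named default, and that the variables are called score and week as in the question; the frame name summary is just a placeholder.

    ```stata
    * Copy the data in memory to a second frame ("summary" is a made-up name)
    frame copy default summary, replace

    * Collapse inside that frame only; the default frame is left untouched
    frame summary {
        collapse (mean) mean_score = score (semean) se_score = score, by(week)
        list week mean_score se_score
    }
    ```

    The collapsed results can then be exported from the summary frame while the full data remain available in the default frame.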


    • #3
      collapse calculates various standard errors (choose which suits your case). So collapse mean and se at once, and then calculate mean + se or whatever else.

      A related approach is to use -statsby- and -ci- to generate a dataset of confidence intervals.
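
      A minimal sketch of the -statsby- route, assuming Stata 14 or later (where the syntax is -ci means-) and the variable names score and week from the question. The new variable names are placeholders. Note that lb and ub here are the limits of a 95% confidence interval, which is not the same thing as mean plus or minus one SE.

      ```stata
      * Replace the data in memory with one observation per week,
      * holding the saved results that -ci means- leaves in r()
      statsby mean_score=r(mean) se_score=r(se) lb=r(lb) ub=r(ub), ///
          by(week) clear: ci means score

      * If mean +/- one SE is what you want instead of the CI limits
      generate upper = mean_score + se_score
      generate lower = mean_score - se_score
      ```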


      • #4
        Thanks both. For anyone checking this in the future, the appropriate code is just

        Code:
        collapse (mean) mean_score = score (semean) se_score = score, by(week)
        Last edited by Dakota McAvoy; 11 Aug 2022, 15:36.
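
        To finish the job described in the original question, the bounds can be generated from the collapsed variables and then exported. This is only a sketch: the names upper and lower and the file name score_bounds.csv are placeholders, not anything from the thread.

        ```stata
        * One observation per week with the mean and its standard error
        collapse (mean) mean_score = score (semean) se_score = score, by(week)

        * Bounds at one standard error either side of the mean
        generate upper = mean_score + se_score
        generate lower = mean_score - se_score

        * Write the result out ("score_bounds.csv" is a made-up file name)
        export delimited week mean_score lower upper using "score_bounds.csv", replace
        ```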


        • #5
          Though I'll add, if anyone has a lead on how to essentially extract the exact output that -margins- provides and create a new dataframe with that, I'd be very curious to learn.


          • #6
            If you run -margins- (with or without the -post- option), it will leave behind a matrix, -r(table)-, in r(). You can pick that matrix up as a regular Stata matrix and then use -svmat- to turn that into a Stata data set.

            The main difficulty is that if your regression uses any factor variable notation (and if you are using -margins- it usually will), the column names of r(table) are not legal variable names. So the variable names you get from -svmat- are not very informative: M1, M2, etc. You will then need to think about how you want to encode the information of the matrix column names in legal Stata variable names and write some code to do the needed renaming. Sometimes that gets ugly.
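
            One way this might be sketched, assuming Stata 16 or later for frames and that -margins- has just been run so that r(table) is still available. Transposing the matrix first puts the statistics (b, se, ll, ul, and so on) in the columns, whose names are legal variable names, and moves the awkward margin labels into the row names, which can be stored as a string variable. The frame name margins_out and the variable names margin, upper, and lower are placeholders.

            ```stata
            * Assumes something like this has just been run:
            *   regress score i.week
            *   margins week
            matrix T = r(table)          // rows are statistics, columns are margins
            matrix T = T'                // transpose: one row per margin
            local labels : rownames T    // margin labels, e.g. 1.week 2.week
            local n = rowsof(T)

            frame create margins_out     // errors if this frame already exists
            frame margins_out {
                svmat double T, names(col)       // variables named b, se, ll, ul, ...
                generate str32 margin = ""
                forvalues i = 1/`n' {
                    replace margin = "`: word `i' of `labels''" in `i'
                }
                generate upper = b + se          // margin plus one SE
                generate lower = b - se          // margin minus one SE
            }
            ```

            Keeping the labels as a string sidesteps the renaming problem, at the cost of having to parse that string later if you need week as a numeric variable.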


            • #7
              Thanks Clyde.
