
  • How can I create a variable that stores the sample means of multiple variables?

    Greetings,

    I'm running Stata 15.1 on OSX. I'd like to create a variable that stores the (separate) means of 4 different ordinal variables. If still unclear: once the variable is created and tabulated, I expect to see something like:
    4.57
    3.29
    2.22
    5.43

    In attempting to create this variable, I tried the following:

    Code:
    egen sample_means=mean(intlwhts intlblks intlasns intlhsps)
    Stata returns the following error message:
    intlwhtsintlblksintlasnsintlhsps not found
    The rowmean function would simply create an average of the 4 variables, which is not what I want. Any help here would be much appreciated. Thanks!

    Example data:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float(intlwhts intlblks intlasns intlhsps)
    4 4 4 4
    6 2 7 3
    6 5 6 5
    4 4 4 4
    5 4 4 3
    5 4 7 5
    2 3 1 4
    6 6 6 6
    4 3 4 3
    4 4 4 4
    5 3 3 2
    6 6 6 6
    4 4 4 4
    4 4 4 4
    5 4 6 4
    4 4 2 6
    4 4 4 4
    4 4 4 4
    5 5 5 5
    7 7 7 7
    7 3 4 4
    5 4 5 4
    7 3 7 3
    3 2 3 3
    4 4 4 4
    5 4 6 4
    4 4 5 4
    4 5 4 4
    4 4 4 4
    5 5 4 5
    4 4 4 4
    5 6 7 5
    5 3 5 2
    6 5 7 6
    7 7 7 7
    6 4 5 4
    3 3 2 2
    6 5 7 5
    6 5 6 6
    6 6 6 6
    5 5 5 5
    5 3 3 4
    7 3 6 6
    6 4 6 5
    4 4 4 4
    5 4 6 4
    4 4 4 4
    4 4 4 4
    4 4 5 4
    6 6 6 6
    4 4 6 4
    5 3 6 4
    4 4 7 4
    4 4 4 4
    5 5 5 5
    5 5 7 4
    6 2 7 4
    5 5 2 5
    5 5 6 5
    4 3 6 4
    5 5 5 5
    5 4 6 4
    7 4 4 4
    4 4 4 4
    6 5 7 6
    6 6 4 4
    5 3 5 4
    5 6 7 6
    4 4 4 4
    6 5 6 5
    6 4 6 4
    4 4 4 4
    5 5 5 5
    6 5 6 6
    7 3 4 5
    4 4 3 4
    5 4 6 5
    5 4 5 4
    3 4 4 4
    5 5 5 5
    5 2 7 2
    6 4 7 4
    5 5 5 5
    5 4 4 5
    4 4 4 4
    2 4 2 3
    6 4 6 5
    7 5 5 7
    4 4 4 4
    6 4 7 2
    5 3 3 3
    4 4 4 4
    4 3 5 4
    5 4 5 4
    7 6 7 6
    5 7 7 5
    5 4 4 3
    7 4 6 5
    7 7 7 7
    6 6 6 6
    end

  • #2
    Zach:
    you may want to try:
    Code:
    foreach var of varlist intl* {
        egen sample_means`var' = mean(`var')
    }
    Kind regards,
    Carlo
    (Stata 19.0)



    • #3
      Hey Carlo,

      Maybe I didn't follow you correctly. But what you're suggesting simply creates a variable with a single mean of the corresponding variable. I'd like to get all 4 variable means into a single variable. Is there a way to do this?



      • #4
        It's not clear what you want here. Do you want a single variable that, in each observation, contains the mean of the values of the four variables intlwhts intlblks intlasns intlhsps in that observation? If so, that is
        Code:
        egen wanted = rowmean(intlwhts intlblks intlasns intlhsps)
        If that's not it, please post back with a new -dataex- example including the hand-calculated results you are looking for in a small number of observations so we can figure out what you mean.



        • #5
          Hey Clyde,

          As I mentioned in my initial post, I don't want a variable that averages the scores of the 4 variables. Rather, I want a variable that stores the individual means (thus 4 in total) of each variable. When tabulating this variable, I should get a 4-row table that lists the means of each of these variables:

          [Image: sample means.jpeg]


          In other words, I want the information in the first column (Mean) in the table above stored in a single variable. Is that possible?



          • #6
            I'm not sure I understand why you would want to do that, but here is a way.

            Code:
            tabstat intlwhts intlblks intlasns intlhsps, s(mean) save
            mat a=r(StatTotal)'
            svmat a, n("means")
            ren means1 means
            If the goal is simply to reuse the means in further computations, notice that once they are stored in a matrix you can do whatever you want with the values, no need to store them in a dataset variable.
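
            To illustrate working from the matrix directly, here is a minimal sketch. To stay short it uses only the first five observations of the -dataex- example in post #1, so the means differ from those of the full data. The names m_whts and dev_whts are made up for illustration.

```stata
* First five observations of the -dataex- example in post #1
clear
input float(intlwhts intlblks intlasns intlhsps)
4 4 4 4
6 2 7 3
6 5 6 5
4 4 4 4
5 4 4 3
end

* r(StatTotal) is a 1 x 4 matrix: one row (mean), one column per variable
tabstat intlwhts intlblks intlasns intlhsps, statistics(mean) save
matrix a = r(StatTotal)
matrix list a

* Pull out one mean by column name (m_whts is a hypothetical name)
scalar m_whts = a[1, colnumb(a, "intlwhts")]
display "mean of intlwhts = " m_whts

* Use it in a computation without adding a variable of means
* (dev_whts is a hypothetical name)
generate dev_whts = intlwhts - m_whts
```

            Here the matrix is left untransposed, so the means are addressed by column; after the transpose used above they would be addressed by row instead.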

            Also, there will be many missing values in this variable. To list only the means you need:

            Code:
            list means in f/4
            HTH

            Jean-Claude Arbaut
            Last edited by Jean-Claude Arbaut; 27 Oct 2018, 17:20.



            • #7
              Originally posted by Jean-Claude Arbaut:
              Also, there will be many missing values in this variable. To list only the means you need:
              Not only will there be many missing values; the values that are not missing will have no meaningful relation to any other variable in the dataset, whatsoever. This approach seems like spreadsheet-like thinking and it is likely to cause trouble in Stata. Zach should reconsider why he wants to do this. Perhaps this is some sort of XY Problem?
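
              If what is wanted is just a small table of means, one possible alternative (a sketch, not necessarily what Zach needs) is to build a separate dataset in which each mean sits next to the name of the variable it belongs to. It is shown on the first five observations of the -dataex- example in post #1.

```stata
* First five observations of the -dataex- example in post #1
clear
input float(intlwhts intlblks intlasns intlhsps)
4 4 4 4
6 2 7 3
6 5 6 5
4 4 4 4
5 4 4 3
end

* Reduce to one observation holding the four means
* (this replaces the data in memory)
collapse (mean) intlwhts intlblks intlasns intlhsps

* Transpose to one observation per original variable;
* the varname option adds _varname with the original variable names
xpose, clear varname
rename v1 means
list _varname means, noobs
```

              Because -collapse- and -xpose- replace the data in memory, wrapping these steps in -preserve- and -restore- would keep the original dataset intact.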

              Best
              Daniel



              • #8
                daniel klein

                Probably not the problem here, but there is one case that would likely require this hack: plots.
                I don't think it's easy to plot data from different datasets in Stata, and putting data in a single dataset is (I believe) a common way to achieve this.
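
                A rough sketch of that plotting case, only to show the mechanics: the means are added to the dataset with -svmat- and drawn against an index. It uses the first five observations of the -dataex- example in post #1; the variable name group and the axis labels are my own choices for illustration.

```stata
* First five observations of the -dataex- example in post #1
clear
input float(intlwhts intlblks intlasns intlhsps)
4 4 4 4
6 2 7 3
6 5 6 5
4 4 4 4
5 4 4 3
end

* Store the four means in the first four observations of a new variable
tabstat intlwhts intlblks intlasns intlhsps, statistics(mean) save
matrix a = r(StatTotal)'
svmat a, names(means)
rename means1 means

* Index the means (group is a hypothetical name); observation 5 stays missing
generate byte group = _n in 1/4

* Plot the means stored alongside the raw data
twoway bar means group, ///
    xlabel(1 "intlwhts" 2 "intlblks" 3 "intlasns" 4 "intlhsps")
```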

                Jean-Claude Arbaut

