  • How to create a new variable containing the mean from other variables

    I have panel data (extract shown below). For each subject, I would like to compute a single mean of bp_count over their visits. The number of study visits varies by subject.
    Then, I would like to create a new variable containing the mean of bp_count for each subject. Suggestions would be appreciated.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float(id visit bp_count)
     7 27 3
     7 18 2
     7 36 2
     7 30 2
     7  2 2
     7 12 3
     7 33 2
     7  9 2
     7 24 3
     7 39 2
     7  1 3
     7 15 3
     7  3 2
     7 42 2
     7  6 2
     7 51 2
     7 21 2
     7 48 2
     7  0 3
     7 45 2
    10 33 1
    10 24 2
    10  3 1
    10  6 2
    10 21 1
    10 27 2
    10 36 2
    10  2 1
    10  1 1
    10  0 1
    end
    Happy holidays,
    Al Bothwell

  • #2
    Hi Al,

    Not sure what you mean by "For each subject, I would like to compute a single mean of bp_count over their visits." But creating the mean of bp_count per subject is easy enough.

    Also note that, from visit 3 onward, the visit numbers seem to be multiples of 3.

    Code:
    sort id visit
    by id: gen obs_no = _n  // creating a counter for each id (must exist before the -list- below uses it)
    . list id visit bp_count if obs_no <=10, noobs
    
      +-----------------------+
      | id   visit   bp_count |
      |-----------------------|
      |  7       0          3 |
      |  7       1          3 |
      |  7       2          2 |
      |  7       3          2 |
      |  7       6          2 |
      |-----------------------|
      |  7       9          2 |
      |  7      12          3 |
      |  7      15          3 |
      |  7      18          2 |
      |  7      21          2 |
      |-----------------------|
      | 10       0          1 |
      | 10       1          1 |
      | 10       2          1 |
      | 10       3          1 |
      | 10       6          2 |
      |-----------------------|
      | 10      21          1 |
      | 10      24          2 |
      | 10      27          2 |
      | 10      33          1 |
      | 10      36          2 |
      +-----------------------+
    
    egen mean_bp_count = mean(bp_count), by(id)  // creating mean of bp_count by id
    tabstat bp_count, by(id) stats(n mean median min max)  // just checking the answer
    
    Summary for variables: bp_count
         by categories of: id
    
          id |         N      mean       p50       min       max
    ---------+--------------------------------------------------
           7 |        20       2.3         2         2         3
          10 |        10       1.4         1         1         2
    ---------+--------------------------------------------------
       Total |        30         2         2         1         3
    ------------------------------------------------------------
    
    
    . list if obs_no <=10, noobs abbrev(14)
    
      +------------------------------------------------+
      | id   visit   bp_count   mean_bp_count   obs_no |
      |------------------------------------------------|
      |  7       0          3             2.3        1 |
      |  7       1          3             2.3        2 |
      |  7       2          2             2.3        3 |
      |  7       3          2             2.3        4 |
      |  7       6          2             2.3        5 |
      |------------------------------------------------|
      |  7       9          2             2.3        6 |
      |  7      12          3             2.3        7 |
      |  7      15          3             2.3        8 |
      |  7      18          2             2.3        9 |
      |  7      21          2             2.3       10 |
      |------------------------------------------------|
      | 10       0          1             1.4        1 |
      | 10       1          1             1.4        2 |
      | 10       2          1             1.4        3 |
      | 10       3          1             1.4        4 |
      | 10       6          2             1.4        5 |
      |------------------------------------------------|
      | 10      21          1             1.4        6 |
      | 10      24          2             1.4        7 |
      | 10      27          2             1.4        8 |
      | 10      33          1             1.4        9 |
      | 10      36          2             1.4       10 |
      +------------------------------------------------+


    • #3
      Thank you David, your solution is exactly what I needed.

      Happy Holidays,
      Al Bothwell


      • #4
        It seems to me that David's code can be simplified as
        Code:
        bys id (visit): egen mean_bp_count1 = mean(bp_count)
        Ho-Chuan (River) Huang
        Stata 19.0, MP(4)


        • #5
          River Huang Your syntax is equivalent to David's. The by() option is no longer documented, but many users learned about it when it was documented and they've passed that knowledge on through postings here.

          In fact specifying sorting within id according to visit is unnecessary but harmless. The mean is the same regardless of the sort order of the values.
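A minimal sketch of the equivalence described in #5, assuming a six-row subset of Al's example data. The variable names mean_opt, mean_pre and mean_alt are made up for illustration and appear nowhere in the thread.

```stata
* Six rows taken from the example in #1 (a subset, for brevity)
clear
input float(id visit bp_count)
 7 27 3
 7 18 2
 7  0 3
10 33 1
10 24 2
10  0 1
end

* by() option, as in #2
egen mean_opt = mean(bp_count), by(id)

* by prefix with visit as the within-id sort key, as in #4
bysort id (visit): egen mean_pre = mean(bp_count)

* by prefix with a different within-id sort key (illustrative only)
bysort id (bp_count): egen mean_alt = mean(bp_count)

* All three variables should agree in every observation;
* -assert- stops with an error if they do not
assert mean_opt == mean_pre
assert mean_pre == mean_alt
```

If both -assert- commands pass silently, the three approaches agree on this extract, whatever order the observations are in within each id.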


          • #6
            Dear Nick, Thanks for the message.

            Ho-Chuan (River) Huang
            Stata 19.0, MP(4)


            • #7
              Originally posted by Nick Cox
              River Huang Your syntax is equivalent to David's. The by() option is no longer documented, but many users learned about it when it was documented and they've passed that knowledge on through postings here.

              In fact specifying sorting within id according to visit is unnecessary but harmless. The mean is the same regardless of the sort order of the values.
              Nick, thank you for explaining that -egen, by()- is no longer documented. This issue struck me once when somebody told me, "I have to sort anyway." Then I read the manual, and I noticed that the difference between -by varname: egen- and -egen, by(varname)- is not explained. I assumed that the manual's authors did not make the distinction because they, like you, think that the two are equivalent.

              I think that the two are not equivalent in a pretty major (to me at least) way:

              1. -bysort varname: egen- is not sort preserving: it leaves your data in the order determined by the sort.

              2. -egen, by(varname)- on the other hand is sort preserving. It restores your data to whatever sort order it had before you executed the command.
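The difference in the two points above can be checked with a short sketch, assuming a six-row subset of Al's example data. The names orig_order, mean_opt and mean_pre are hypothetical and used only for illustration.

```stata
* Six rows taken from the example in #1, deliberately left unsorted
clear
input float(id visit bp_count)
 7 27 3
 7 18 2
 7  0 3
10 33 1
10 24 2
10  0 1
end

* Record the order in which the observations arrived
gen long orig_order = _n

* by() option: the observations should still be in their original order
egen mean_opt = mean(bp_count), by(id)
assert orig_order == _n

* by prefix: the data are now sorted by id and visit, so the first
* observation is id 7, visit 0, which was third in the original order
bysort id (visit): egen mean_pre = mean(bp_count)
assert id[1] == 7 & visit[1] == 0 & orig_order[1] == 3

* The original order can be recovered from the saved counter
sort orig_order
assert orig_order == _n
```

Saving a counter such as orig_order before any -bysort- is one way to get back to the original order when it matters.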


              • #8
                Joel: You are correct. There is a difference in that using the by() option won’t change the sort order of the data.
                Last edited by Nick Cox; 22 Dec 2018, 07:15.


                • #9
                  Originally posted by Nick Cox
                  Joel: You are correct. There is a difference in that using the by() option won’t change the sort order of the data.
                  Joel is a different name, Nick :P. Joro is short for Gueorgui (= Георги).

                  More importantly: Is there any fascinating history behind StataCorp's decision to relegate the -egen, by(varname)- syntax to undocumented status?

                  Why do they no longer want us to use the ancient syntax, and presumably want us all to migrate to the new one? (If they are not documenting the ancient one, presumably they found something they don't like about it.)

                  In my personal history of using Stata, I learned the -egen, by(varname)- syntax from my first teacher of panel data econometrics, Stepan Jurajda, around 2001.

                  In the next few years until 2004, I was reasonably successful in manipulating panel data without ever re-sorting the data. I wrote a pretty decent paper using the British Household Panel Study, again, without ever re-sorting the data and just using -egen, by(varname)- as a tool.


                  • #10
                    Sorry; “Joel” was an iPhone autocorrect I failed to spot.

                    I don’t think there is any hidden issue beyond StataCorp trying to standardize syntax across commands.
