  • Mean values in new variable

    I want to generate a new variable with mean values.

    How do I generate a new variable with the "mean" or "median" values for the same subgroup of people? Later on I want to use the mean values for each subgroup as the y in an x-y graph. I KNOW how to generate a new variable based on the "sum" function for a subgroup of people.
    Last edited by Birgit Agerholm Larsen; 29 Sep 2014, 07:08.

  • #2
    start with -h egen- and read about the mean and median functions; "if" and "in" are allowed, which should allow you to get your subgroups - though note that depending on your setup (about which you told us nothing), use of "by" may be easier

    • #3
      start with -h egen- and read about the mean and median functions; "if" and "in" are allowed, which should allow you to get your subgroups - though note that depending on your setup (about which you told us nothing), use of "by" may be easier; note that some graphs use the mean anyway, sometimes via an option - but you haven't even told us what kind of graph you want, so we can't help you with that either

      • #4
        Okay - see your point.

        I have blood pressure data on thousands of people, obtained at different timepoints during a year. I have chosen the occurrence of a specific event as a reference, i.e. the first time the patient was administered a specific drug. All patient data have been allocated to timeslots of 5 minutes each on both sides (before and after) of that event (the relative reference timepoint); i.e. each patient's unique timestamps for a blood pressure observation were replaced with 5-minute timeslots relative to when they first received the drug. All patients now have comparable timeslots. Now I would like to show the mean blood pressure for all the patients within the same relative timeslot as a function of time, to see if the mean blood pressure has changed from before to after they received the drug. A simple x-y graph with mean blood pressure as a function of time.
        How do I generate the blood pressure mean value to be used in the graph? I am looking for a function in the same category as the one calculating the "sum", whose result can be stored in a new variable.

        I have tried the following, but I'm getting a notice of "unknown function mean ()"

        "by patientid (timeslots), sort: gen MEAN1 = mean(blood pressure value)"

        If I replace "mean" with "sum" in the above command it works fine - I get a value which is the sum for each group of patients. But I need the mean value, not the sum, for each timeslot!

        • #5
          The mean() function that Rich Goldstein referred to is used with -egen-, not -gen-.

          Code:
          by patientid (timeslots), sort: egen MEAN1 = mean(blood pressure value)
          will work, except that "blood pressure value" is not a valid Stata expression due to the embedded blanks. So the actual name of the variable that holds the blood pressure measurements belongs there.

          Also, there really is no need to include (timeslots) in the by-prefix here: it doesn't do anything since the mean() function is invariant with respect to the order of the observations within the by-group.

          One other important thing to note: the sum() function that you mention, which works with -generate-, produces a running sum within by-groups. The -egen- function mean() produces a constant mean value within the by-group and it appears in all observations within the by-group.
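
          To make that difference concrete, here is a minimal sketch on made-up data. The variable names patientid, timeslot and sbp are hypothetical and only stand in for whatever the real dataset uses.

          ```stata
          * Toy data (hypothetical names): one patient, three readings
          clear
          input patientid timeslot sbp
          1 1 120
          1 2 130
          1 3 140
          end

          * generate + sum(): a running total within the by-group
          by patientid (timeslot), sort: gen running = sum(sbp)

          * egen + mean(): one constant value, repeated in every observation of the group
          by patientid, sort: egen avg = mean(sbp)

          list, sepby(patientid)
          ```

          In the listing, running grows from row to row within the patient, while avg holds the same value in all three rows.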

          All of that said, I'm not sure I understand what you want to do, because my understanding does not correspond to the syntax above. The syntax above will produce, for each patient, the mean value of their blood pressure, averaged over all the different timeslots in which it was measured. In particular, it does not vary over time, so a plot of this as a function of time will be very uninteresting indeed. If you are interested in the average blood pressure (averaged over all patients) at each time slot, the syntax goes differently:

          Code:
          by timeslots, sort: egen MEAN1 = mean(insert blood pressure variable name here)
          Hope this helps.
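
          For the graph itself, one possible route is sketched below on made-up data; it is only one way to do it, and patientid, timeslot, sbp and pickone are hypothetical names. The idea is to compute the per-timeslot mean with -egen-, then flag a single observation per timeslot so that each mean is plotted once.

          ```stata
          * Toy data (hypothetical names): two patients, three relative timeslots
          clear
          input patientid timeslot sbp
          1 -1 120
          2 -1 130
          1  0 125
          2  0 135
          1  1 110
          2  1 120
          end

          * Mean across patients within each relative timeslot
          by timeslot, sort: egen mean_sbp = mean(sbp)

          * tag() marks exactly one observation per timeslot with 1, the rest with 0
          egen byte pickone = tag(timeslot)

          * Plot each timeslot's mean once, in time order
          twoway connected mean_sbp timeslot if pickone, sort ytitle("Mean blood pressure")
          ```

          Plotting without the tag would also work, but every mean would then be drawn once per patient in that timeslot.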

          • #6
            In addition to much excellent advice from Richard and Clyde, note that the collapse command is a way of reducing your dataset to a dataset of means. Then you can just plot those means directly.
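
            A minimal sketch of that approach, again on made-up data with hypothetical variable names (patientid, timeslot, sbp). Note that collapse replaces the dataset in memory, so save the data or use -preserve- first if the original observations are still needed.

            ```stata
            * Toy data (hypothetical names): two patients, three relative timeslots
            clear
            input patientid timeslot sbp
            1 -1 120
            2 -1 130
            1  0 125
            2  0 135
            1  1 110
            2  1 120
            end

            * Reduce to one observation per timeslot, holding the mean and the median
            collapse (mean) mean_sbp = sbp (median) median_sbp = sbp, by(timeslot)

            list

            * The collapsed data can be plotted directly
            twoway connected mean_sbp timeslot, sort
            ```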

            • #7
              Thanks to all of you, it has been a pleasure to receive your help.

              • #8
                Hello, this is interesting as I too want to create mean values for a variable split by group and by step (a time/phase variable).

                While I found your commands worked well with one by var, with two by variables, the following occurs:

                Command | Error code | Error note
                by step by group, sort: egen QALY_Mean_bystep2 = mean(QALYz) | r(111) | variable by not found
                by step, by group, sort: egen QALY_Mean_bystep2 = mean(QALYz) | r(198) | invalid 'sort'
                by step + by group, sort: egen QALY_Mean_bystep2 = mean(QALYz) | | invalid name
                by step & by group, sort: egen QALY_Mean_bystep2 = mean(QALYz) | | invalid name
                by step+group, sort: egen QALY_Mean_bystep2 = mean(QALYz) | |
                by step &group, sort: egen QALY_Mean_bystep2 = mean(QALYz) | |
                by step + group, sort: egen QALY_Mean_bystep2 = mean(QALYz) | |
                by (step+group), sort: egen QALY_Mean_bystep2 = mean(QALYz) | |
                by (step&group), sort: egen QALY_Mean_bystep2 = mean(QALYz) | |
                by (step, group), sort: egen QALY_Mean_bystep2 = mean(QALYz) | | options not allowed in bound factor variable lists
                by (step group), sort: egen QALY_Mean_bystep2 = mean(QALYz) | | invalid syntax
                Any syntax or workaround suggestions welcome. Thank you.

                • #9
                  With syntax it's simpler than you fear. Stata explains its syntax in the help. Guessing what it might be is massively inefficient by comparison.

                  The help for by: and the help for varlist include no examples in which any extra punctuation is inserted beyond parentheses.

                  Code:
                  by step group, sort: egen QALY_Mean_bystep2 = mean(QALYz)
                  is legal and may even be what you want.
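
                  A small sketch on invented data, in case it helps to see it run. The variable names step, group and QALYz come from the question; the values and the names QALY_mean and QALY_mean2 are made up for illustration.

                  ```stata
                  * Toy data: values are invented; variable names follow the question
                  clear
                  input step group QALYz
                  1 1 0.5
                  1 1 0.75
                  1 2 1
                  2 1 0.25
                  2 2 0.5
                  2 2 1
                  end

                  * The by-prefix takes a plain varlist: names separated by spaces only
                  by step group, sort: egen QALY_mean = mean(QALYz)

                  * bysort is an equivalent shorthand for by ..., sort
                  bysort step group: egen QALY_mean2 = mean(QALYz)

                  list, sepby(step group)
                  ```

                  Each combination of step and group forms its own by-group, so the mean is constant within each combination.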


                  • #10
                    I have a similar problem. I have panel data with yearly observations for e.g. temperature, precipitation, nightlight intensity etc. The original data were provided for each municipality. However, I generated grid cells, and some of those grid cells contain information from more than one municipality. My problem is that those grid cells are listed more than once in my panel, for example:
                    year  Munucipality  grid_id  temperature  precipiation  nightlight
                    2000  1             100      20           86            0,00073
                    2000  2             100      24           108           0,00085
                    2000  3             100      23           90            0,00065
                    2001  1             100      21           89            0,00063
                    2001  2             100      23           105           0,00095
                    2001  3             100      24           88            0,00075
                    the command

                    by grid_id (year), sort: egen precipitation1=mean(precipitation)

                    works; however, instead of taking the mean for each grid_id in one specific year, it averages all observations of the same grid_id over the whole panel.
                    What do I need to include in this command so that I get mean values for each year separately?
                    Last edited by Nicole Bursztyn; 28 May 2019, 11:05.

                    • #11

                      Code:
                      by grid_id year, sort: egen precipitation1=mean(precipitation)
                      or something similar, depending on your spelling.
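
                      As a check, here is a sketch using the six rows posted above. The variable names municipality and precipitation are assumed spellings, and nightlight is left out because of the decimal commas.

                      ```stata
                      * Rows copied from the example in the question (nightlight omitted)
                      clear
                      input year municipality grid_id temperature precipitation
                      2000 1 100 20 86
                      2000 2 100 24 108
                      2000 3 100 23 90
                      2001 1 100 21 89
                      2001 2 100 23 105
                      2001 3 100 24 88
                      end

                      * Putting year in the by-varlist makes each grid cell and year its own group
                      by grid_id year, sort: egen precipitation1 = mean(precipitation)

                      list, sepby(grid_id year)
                      ```

                      With year inside parentheses, as in the original command, it only controls the sort order within grid_id and does not define groups.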
