  • Calculating media of several variables with missing data

    Good afternoon, I would like to calculate the media of several variables with missing data.

    Code:
    var1    var2    var3    var4
     40      83       9        1
     43     .02      98       22
     33.5    2       78       3
      1      99       5       .
      .       6      7        .
     56.4    .        1       8
    When I generate a new variable, it does not count with the variable with missing values.

    Thank you!

    Gabriel Ferreira
    (Stata 10.1 SE)

  • #2
    Very unclear post. You do not show how you generated the new variable, nor do you show (or even try to explain) what you mean when you say it does not "count with the variable with missing values." Please post back with better information. Use the -dataex- command to show your data, instead of the tableau you showed. Show the exact code you used to generate your new variable and the results you got. Then, unless it is very obvious, explain what you wanted to get so people can see how they differ.

    In the future, when showing data examples, please use the -dataex- command to do so. If you are running version 15.1 or a fully updated version 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.



    When asking for help with code, always show example data, as well as any code you have tried and the results it gave you. When showing example data, always use -dataex-.



    • #3
      I also fail to understand. What value should we assume a missing observation has?

      That said, hazarding 3 guesses, please check whether there is one option that suits your needs:

      Code:
      . egen float mymedian = median( var1 )
      
      . egen float myrowmedian = rowmedian(var1 var2 var3 var4)
      
      . list
      
           +-------------------------------------------------+
           | var1   var2   var3   var4   mymedian   myrowm~n |
           |-------------------------------------------------|
        1. |   40     83      9      1         40       24.5 |
        2. |   43    .02     98     22         40       32.5 |
        3. | 33.5      2     78      3         40      18.25 |
        4. |    1     99      5      .         40          5 |
        5. |    .      6      7      .         40        6.5 |
           |-------------------------------------------------|
        6. | 56.4      .      1      8         40          8 |
           +-------------------------------------------------+
      
      . tabstat var1 var2 var3 var4, statistics( median )
      
         stats |      var1      var2      var3      var4
      ---------+----------------------------------------
           p50 |        40         6         8       5.5
      --------------------------------------------------
      
      .
      Best regards,

      Marcos



      • #4

        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input float(var1 var2) byte(var3 var4)
          40  83  9 1
          43 .02 98 2
        33.5   2 78 3
           .   1 99 5
           .   .  6 7
        56.4   .  1 8
        end
        I have 4 different variables and I need to calculate the mean of the means; however, some variables have missing data. I can manually generate the mean of var1, the mean of var2, the mean of var3 and the mean of var4, and then take the mean of those means.

        I have also tried the egen command.


        Code:
         egen average = rmean ( var1 var2 var3 var4)
        
        . sum average, det
        
                                   average
        -------------------------------------------------------------
              Percentiles      Smallest
         1%          6.5            6.5
         5%          6.5           21.8
        10%          6.5         29.125       Obs                   6
        25%         21.8          33.25       Sum of Wgt.           6
        
        50%      31.1875                      Mean           30.73833
                                Largest       Std. Dev.      15.95128
        75%       40.755         29.125
        90%           53          33.25       Variance       254.4435
        95%           53         40.755       Skewness      -.1568084
        99%           53             53       Kurtosis       2.251108

        But it gives me 6 observations, not 4.
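
        A note on the count: egen's row-mean function returns one value per observation, using whatever non-missing values that row has, so every one of the 6 rows gets a result. If "mean of the means" is read as the average of the four variable means, a minimal sketch could look like the following. That reading is an assumption about the intent, and the scalar name msum is made up for illustration.

        ```stata
        * Sketch: average of the four column means, each taken over its non-missing values
        * (msum is a hypothetical scalar name, not part of the original post)
        scalar msum = 0
        foreach v of varlist var1 var2 var3 var4 {
            quietly summarize `v', meanonly
            scalar msum = msum + r(mean)
        }
        display "mean of the variable means = " msum/4
        ```

        Each call to summarize drops the missing values of that one variable only, so no observation is discarded just because another variable is missing in the same row.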



        • #5
          Thank you for your kind reply, Marcos Almeida.



          • #6
            I'm still not entirely clear on what you want but I'm guessing from your last post that you want to compute the mean of the four variables only when all four variables are non-missing.
            If that is the case, try

            Code:
            egen average = rmean (var1 var2 var3 var4) if missing(var1, var2, var3, var4)==0
            If that isn't what you're looking for, you'll have to try explaining how you would like the observations with missing values to be treated.



            • #7
              The row mean of 4 variables -- calculated if and only if all are non-missing -- is just

              Code:
              gen wanted = (var1 + var2 + var3 + var4) / 4
              as any missing value on the right-hand side will imply a result of missing.
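
              To see the difference between the two approaches side by side, here is a small sketch on three rows taken from the first post. The variable names strict and lenient are hypothetical, chosen only for this illustration.

              ```stata
              * Sketch: arithmetic row mean versus egen's rowmean() on rows with missings
              clear
              input float(var1 var2 var3 var4)
              40 83 9 1
              1 99 5 .
              . 6 7 .
              end
              * missing if any of the four variables is missing
              gen strict = (var1 + var2 + var3 + var4) / 4
              * mean of whatever non-missing values each row has
              egen lenient = rowmean(var1 var2 var3 var4)
              list
              ```

              Here strict is 33.25 in the first row and missing in the other two, while lenient is 33.25, 35 and 6.5.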



              • #8
                Nick Cox, exactly. In that case, the mean should be computed from just var2 and var3, since var1 and var4 have missing values. This is my problem.

                thanks for your answer.

                Gabriel Ferreira
                (Stata 10.1 SE)



                • #9
                  If you want to ignore missing values, then use

                  Code:
                  egen wanted = rowmean(var1 var2 var3 var4)
                  If you want means only if all variables are non-missing, the code in #7 stands.

                  If you want a rule that at least 2 or at least 3 variables be non-missing, tell us what your rule is.
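
                  As an illustration of such a rule, the sketch below keeps the row mean only when at least 3 of the 4 variables are non-missing. The threshold of 3 and the names nvalid and wanted3 are assumptions made for the example, not something stated in this thread.

                  ```stata
                  * Sketch: row mean only for rows with at least 3 non-missing values
                  * (the cutoff 3 and the names nvalid, wanted3 are hypothetical)
                  egen nvalid = rownonmiss(var1 var2 var3 var4)
                  egen wanted3 = rowmean(var1 var2 var3 var4) if nvalid >= 3
                  ```

                  Changing the cutoff in the if condition gives the "at least 2" variant; rows that fail the condition are left missing.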

                  Otherwise, see good advice in #2 #4 #6.
                  Last edited by Nick Cox; 24 Mar 2019, 01:50.
