  • calculating mean including missing values

    What is the command for calculating the mean while taking missing values into consideration?

    Thanks

  • #2
    And how do you want to take the missing values into account? E.g.,
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float var1
    1
    2
    .
    end
    How are you going to calculate the mean for these three values, "taking missing values in consideration"?


    • #3
      Suppose this is my data

      Code:
      input int x
      670
      520
      570
        .
      600
      590
      640
      570
        .
      630
      630
      670
      650
      660
      520
        .
        .
      335
      459
      554
      end
      Now if I calculate the mean:

      Code:
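      . mean x

      Mean estimation                   Number of obs   =         16

      --------------------------------------------------------------
                   |       Mean   Std. Err.     [95% Conf. Interval]
      -------------+------------------------------------------------
                 x |     579.25   22.29845       531.722     626.778
      --------------------------------------------------------------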
      You can see that it shows Number of obs = 16.

      However, I want it to count the missing values as observations too. In that case the number of observations would be 20 and the mean would change.
      I know that I can do that by converting missing values into zeros, but for my research I cannot do that. (This data is just an example; I am working with different primary data.)
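
      For reference, a minimal sketch (meant to be run as a do-file) of where the 16 and the 20 come from. It re-enters the example data and uses only count and summarize; it does not change how the missing values are treated.

      ```stata
      clear
      input int x
      670
      520
      570
      .
      600
      590
      640
      570
      .
      630
      630
      670
      650
      660
      520
      .
      .
      335
      459
      554
      end

      * All rows, whether x is missing or not: 20
      count
      * Rows where x is missing: 4
      count if missing(x)
      * summarize, like mean, uses only the nonmissing values (16 of them)
      summarize x
      display "N = " r(N) "  mean = " r(mean)
      ```

      So the 4 missing rows are in the dataset, but they contribute nothing to the sum or to the divisor.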

      Thanks


      • #4
        You cannot calculate the mean with missing values. If one of the values is not known, then the mean is also not known. You can estimate the (population) mean, making assumptions about the missing values. Joro is essentially asking what assumptions you are willing to make.
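
        To make "assumptions about the missing values" concrete, here is a rough sketch using the data from #3. The three assumptions shown are illustrations only, not recommendations, and the variable names x_zero and x_fill are made up for this example.

        ```stata
        clear
        input int x
        670
        520
        570
        .
        600
        590
        640
        570
        .
        630
        630
        670
        650
        660
        520
        .
        .
        335
        459
        554
        end

        * Assumption 1: the missing values behave like the observed ones.
        * This is what -mean x- already does: it uses the 16 known values.
        mean x

        * Assumption 2: every missing value is truly 0 (illustration only).
        generate x_zero = cond(missing(x), 0, x)
        mean x_zero

        * Assumption 3: every missing value equals the observed mean.
        * Same point estimate as assumption 1, but N = 20 and a smaller
        * standard error, because 4 guesses are treated as known data.
        summarize x, meanonly
        generate x_fill = cond(missing(x), r(mean), x)
        mean x_fill
        ```

        With these data the zero-fill version gives 9268/20 = 463.4 instead of 579.25. The number reported depends entirely on the assumption, which is why the assumption has to be chosen and defended first.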


        • #5
          Ohhkk
          I got it
          Thank You


          • #6
            I know that i can do that by converting missing values into zeros
            Really?? Why would you convert the missing values into zeroes when all the non-missing values of the variable are in the mid-hundreds? That makes no sense to me at all.

            Added: Crossed with #5.


            • #7
              There are many different strategies for handling missing values, but missing values are unknown by definition. You can make assumptions, but no matter what you do, never pretend that you actually know the values. As Clyde pointed out, replacing all missing values with zero seems like a very bad assumption to me.


              • #8
                Akif:
                as others wisely commented, replacing missing values with any arbitrary deterministic value is definitely not a good idea: see, among tons of literature, https://www.lshtm.ac.uk/research/cen...a#introduction on how to deal with missing values in "scientific" (and documented) ways.
                All in all, replacing missing values with zero is as reasonable as replacing them with 1, 2, 3 or whatever.
                Kind regards,
                Carlo
                (Stata 19.0)


                • #9
                  I feel I should mention a situation where 0 is a valid option. Suppose the numbers in x represent cigarettes smoked per month. Suppose further that non-smokers skip the respective question in a questionnaire and, thus, are coded with a missing value. In such a scenario, replacing missing with 0 is perfectly fine.

                  The takeaway message is this: To make an educated guess, you need to have an idea about the mechanism that causes the values to be missing.
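
                  A small sketch of that scenario, assuming (hypothetically) that the data also carry an indicator smoker that records who was never asked. The variable names and values are invented for illustration.

                  ```stata
                  clear
                  input byte smoker int cigs
                  1 300
                  1 150
                  0 .
                  1 .
                  0 .
                  1 600
                  end

                  * Structural zero: a non-smoker skipped the question by design,
                  * so 0 is the known answer rather than a guess.
                  replace cigs = 0 if missing(cigs) & smoker == 0

                  * A smoker who did not answer stays missing: that value is unknown.
                  list
                  count if missing(cigs)
                  ```

                  Only the two non-smoker rows are recoded; the one smoker with no answer remains missing, because the mechanism behind that gap is different.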


                  • #10
                    I agree with daniel klein but want to add that that is still poor survey design.

                    1. People in that category should be expected to affirm that they are non-smokers -- and then be expected to skip any other questions about smoking. Taking no answer to mean zero consumption would be a dubious imputation for those who ignored the question for other reasons.

                    2. Wishful thinking or other kinds of image manipulation imply that some people want to report "Non-smoker" on the grounds that they only smoke very occasionally, have almost given up, and so forth.

                    I imagine these are standard points but they are perhaps worth making briefly.


                    • #11
                      Originally posted by Nick Cox
                      1. People in that category should be expected to affirm that they are non-smokers -- and then be expected to skip any other questions about smoking. Taking no answer to mean zero consumption would be a dubious imputation for those who ignored the question for other reasons.
                      I should have been more specific. This is what I had in mind: a telephone (or another kind of) interview where non-smokers are "filtered" so they do not get to report the number of cigarettes smoked. An even more intuitive example would be (not) asking (biological) men pregnancy-related questions.

                      Another point should be made. Using our non-smokers (or non-pregnant men) in any analyses/model along with the smokers (or pregnant women) might or might not be what we want. We might be interested in the mean of cigarettes smoked per month in the population; we might also be interested in the mean of cigarettes smoked per month among those who smoke.
                      Last edited by daniel klein; 30 Sep 2020, 04:06.
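
                      The two targets in the last paragraph can be sketched as follows, with invented data in which the filtered non-smokers have already been set to 0. The names smoker and cigs are hypothetical.

                      ```stata
                      clear
                      input byte smoker int cigs
                      1 300
                      1 150
                      0 0
                      0 0
                      1 600
                      0 0
                      end

                      * Mean cigarettes per month in the whole population,
                      * non-smokers counted as 0: 1050/6 = 175
                      mean cigs

                      * Mean cigarettes per month among those who smoke: 1050/3 = 350
                      mean cigs if smoker == 1
                      ```

                      Both numbers are legitimate; they answer different questions, so the choice between them has to come from the research question, not from the software.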
