  • percentage of missing data for each group?

    Dear All, suppose that the data set is
    Code:
    webuse grunfeld, clear
             
    set seed 1234
    replace mvalue = . if uniform() < .1
    drop if uniform() < .1
    How can I obtain the frequency/percentage of missing values of the variable `mvalue' for each `company'? Thanks.
    Ho-Chuan (River) Huang
    Stata 19.0, MP(4)

  • #2
    Code:
    webuse grunfeld, clear
             
    set seed 1234
    replace mvalue = . if uniform() < .1
    drop if uniform() < .1
    
    // create a variable containing that proportion
    bys company : egen prmiss = mean(missing(mvalue))
    list company mvalue prmiss, sepby(company)
    
    // make a table showing that proportion
    gen miss = missing(mvalue)
    table company, c(mean miss)
    ---------------------------------
    Maarten L. Buis
    University of Konstanz
    Department of history and sociology
    box 40
    78457 Konstanz
    Germany
    http://www.maartenbuis.nl
    ---------------------------------
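    Since #1 asks for frequencies as well as percentages, here is a minimal sketch that also counts the missing values per company and turns the proportion into a percentage. It assumes the data set up in #1; the variable names nmiss, nobs, pcmiss and first are hypothetical choices for illustration. nobs counts the observations each company has left after the random drop, so the percentage is relative to those, not to the original 20 years.

    ```stata
    webuse grunfeld, clear
    set seed 1234
    replace mvalue = . if uniform() < .1
    drop if uniform() < .1

    * hypothetical name nmiss: number of missing mvalue within each company
    bysort company : egen nmiss = total(missing(mvalue))
    * hypothetical name nobs: observations each company has after the drop
    by company : gen nobs = _N
    * hypothetical name pcmiss: percentage of those observations with mvalue missing
    gen pcmiss = 100 * nmiss / nobs
    * tag one observation per company so the listing has one row per company
    egen first = tag(company)
    list company nmiss nobs pcmiss if first, noobs
    ```
    
    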



    • #3
      Dear Maarten, thanks a lot. In case we have more than one variable, as in
      Code:
      webuse grunfeld, clear
               
      set seed 1234
      replace invest = . if uniform() < .1
      replace mvalue = . if uniform() < .15
      replace kstock = . if uniform() < .2
      drop if uniform() < .1
      Do I need a loop to do the same thing?
      Ho-Chuan (River) Huang



      • #4
        You could loop:

        Code:
        webuse grunfeld, clear
                 
        set seed 1234
        replace invest = . if uniform() < .1
        replace mvalue = . if uniform() < .15
        replace kstock = . if uniform() < .2
        drop if uniform() < .1
        
        // create a variable containing that proportion
        local vars invest mvalue kstock
        sort company year
        foreach var of local vars {
            by company : egen pr`var' = mean(missing(`var'))
        }
        list company year `vars' pr*, sepby(company)
        
        // make a table
        foreach var of local vars {
            gen m`var' = missing(`var')
        }
        table company, c(mean minvest mean mmvalue mean mkstock)
        However, with only three variables I would usually not bother and would just create the three variables one after another.
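        For comparison, a sketch of what "one after another" could look like for the three variables in #3, with no loop. It assumes the data from #3; tabstat is swapped in for table here only to show a second way of getting the per-company means of the 0/1 indicators.

        ```stata
        webuse grunfeld, clear
        set seed 1234
        replace invest = . if uniform() < .1
        replace mvalue = . if uniform() < .15
        replace kstock = . if uniform() < .2
        drop if uniform() < .1

        * proportion missing per company, one variable at a time
        sort company year
        by company : egen prinvest = mean(missing(invest))
        by company : egen prmvalue = mean(missing(mvalue))
        by company : egen prkstock = mean(missing(kstock))

        * 0/1 missingness indicators, then their means by company
        gen minvest = missing(invest)
        gen mmvalue = missing(mvalue)
        gen mkstock = missing(kstock)
        tabstat minvest mmvalue mkstock, by(company) statistics(mean)
        ```
        
        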

        I loop either when there is a very large number of things I have to go through (if you ask me how large is large, the answer is: I am inconsistent), or when I don't know in advance how many things I have to go through. The latter happens a lot in programming, where the number of times you need to do something depends on what the user specifies. It also happens when you are dealing with an ongoing data collection project and want to make it easy to accommodate future waves or sites that have yet to come in.
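        To illustrate the case where the number of iterations depends on what the user specifies, here is a sketch of a small wrapper program. The name pmiss and its by() option are hypothetical, made up for this example, and not an existing command; the loop simply runs once for each variable in whatever varlist is typed.

        ```stata
        * hypothetical helper: pmiss varlist, by(varname)
        * creates pr<var> = proportion of missing values of <var> within each by() group
        capture program drop pmiss
        program define pmiss, sortpreserve
            version 16
            syntax varlist, by(varname)
            * number of iterations = number of variables the user passes
            foreach var of local varlist {
                bysort `by' : egen pr`var' = mean(missing(`var'))
            }
        end

        webuse grunfeld, clear
        set seed 1234
        replace invest = . if uniform() < .1
        replace mvalue = . if uniform() < .15
        replace kstock = . if uniform() < .2
        drop if uniform() < .1

        pmiss invest mvalue kstock, by(company)
        list company year prinvest prmvalue prkstock in 1/5
        ```

        The sortpreserve option restores the user's sort order after the bysort inside the program.
        
        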
        Maarten L. Buis
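        If what is wanted is one row per company rather than new variables attached to every observation, collapse can produce the proportions, counts and number of observations in one go. A sketch, assuming the data from #3; the new variable names are hypothetical. collapse replaces the data in memory, so wrap it in preserve/restore if the original observations are still needed, and run it from a do-file because of the /// line continuations.

        ```stata
        webuse grunfeld, clear
        set seed 1234
        replace invest = . if uniform() < .1
        replace mvalue = . if uniform() < .15
        replace kstock = . if uniform() < .2
        drop if uniform() < .1

        * 0/1 missingness indicator for each variable
        foreach var in invest mvalue kstock {
            gen byte m`var' = missing(`var')
        }
        * constant used to count observations per company
        gen byte one = 1
        * mean of a 0/1 indicator = proportion missing, sum = number missing
        collapse (mean) pinvest=minvest pmvalue=mmvalue pkstock=mkstock ///
                 (sum) ninvest=minvest nmvalue=mmvalue nkstock=mkstock  ///
                 (count) nobs=one, by(company)
        list, noobs
        ```
        
        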



        • #5
          May I draw attention to the missings command from the Stata Journal?

          Using "missings" as a search keyword has its downsides, so I will add the otherwise unpredictable detail from Hogwarts teaching that dm0085 is the incantation that yields success:

          Code:
          . search dm0085, entry
          
          Search of official help files, FAQs, Examples, SJs, and STBs
          
          SJ-17-3 dm0085_1  . . . . . . . . . . . . . . . . Software update for missings
                  (help missings if installed)  . . . . . . . . . . . . . . .  N. J. Cox
                  Q3/17   SJ 17(3):779
                  identify() and sort options have been added
          
          SJ-15-4 dm0085  Speaking Stata: A set of utilities for managing missing values
                  (help missings if installed)  . . . . . . . . . . . . . . .  N. J. Cox
                  Q4/15   SJ 15(4):1174--1185
                  provides command, missings, as a replacement for, and extension
                  of, previous commands nmissing and dropmiss
          So anyone interested should download the code by clicking on dm0085_1 (the update) first, and may find the paper at
          https://www.stata-journal.com/articl...article=dm0085 (a PDF version is freely available).

          Stealing some of Maarten's code, here's how it works:

          Code:
          . webuse grunfeld, clear
          
          . set seed 1234
          
          . replace invest = . if uniform() < .1
          (24 real changes made, 24 to missing)
          
          . replace mvalue = . if uniform() < .15
          (40 real changes made, 40 to missing)
          
          . replace kstock = . if uniform() < .2
          (39 real changes made, 39 to missing)
          
          . drop if uniform() < .1
          (25 observations deleted)
          
          . missings table
          
          Checking missings in all variables:
          76 observations with missing values
          
                 # of |
              missing |
               values |      Freq.     Percent        Cum.
          ------------+-----------------------------------
                    0 |         99       56.57       56.57
                    1 |         62       35.43       92.00
                    2 |         14        8.00      100.00
          ------------+-----------------------------------
                Total |        175      100.00
          
          . missings report
          
          Checking missings in all variables:
          76 observations with missing values
          
          --------------------
                  | # missing
          --------+-----------
           invest |        22
           mvalue |        33
           kstock |        35
          --------------------
          
          . missings report, percent
          
          Checking missings in all variables:
          76 observations with missing values
          
          -------------------------------
                  | # missing  % missing
          --------+----------------------
           invest |        22      12.57
           mvalue |        33      18.86
           kstock |        35      20.00
          -------------------------------
          
          . missings report, percent sort
          
          Checking missings in all variables:
          76 observations with missing values
          
          -------------------------------
                  | # missing  % missing
          --------+----------------------
           kstock |        35      20.00
           mvalue |        33      18.86
           invest |        22      12.57
          -------------------------------
          Naturally, if you need the variables created in #2 or #4 for other purposes, that determines the path to take. missings has 6 subcommands, and the examples here are directed only at the question in #1.
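          As for what the % missing column is computed against: judging from the output above, the denominator is all 175 observations in memory (200 minus the 25 dropped), e.g. 100 × 22/175 ≈ 12.57 for invest. A minimal sketch that reproduces the figures of missings report, percent by hand with count, assuming the same setup as above (the column positions in display are arbitrary):

          ```stata
          webuse grunfeld, clear
          set seed 1234
          replace invest = . if uniform() < .1
          replace mvalue = . if uniform() < .15
          replace kstock = . if uniform() < .2
          drop if uniform() < .1

          * number and percentage missing, relative to all observations in memory (_N)
          foreach var in invest mvalue kstock {
              quietly count if missing(`var')
              display "`var'" _col(10) r(N) _col(16) %6.2f (100 * r(N) / _N)
          }
          ```
          
          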



          • #6
            Dear Maarten, thanks a lot for the helpful suggestions.
            Ho-Chuan (River) Huang



            • #7
              Dear Nick, thanks a lot. It is helpful.
              Ho-Chuan (River) Huang

