Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • sum with missing values

    How can I have a summary of some factor variables with the proportions, std. dev. and total number of observation for each category as well as the total number and proportion of missing values for this variable? When I use as below, I get the total number of nonmising values, proportion and std. dev. for each category but not the total number of observations for these variables as well as missing values for the variable.

    Code:
    . summarize bn.loc_obit
    
        Variable |       Obs        Mean    Std. Dev.       Min        Max
    -------------+--------------------------------------------------------
        loc_obit |
       Hospital  |  18416474    .6655359    .4718028          0          1
     Outro cnes  |  18416474    .0224223    .1480525          0          1
      Domicilio  |  18416474    .2250346    .4176051          0          1
    Via publica  |  18416474    .0530886    .2242101          0          1
         Outros  |  18416474    .0339187      .18102          0          1

  • #2
    Paula:
    you may want to replace missing values with (let's say) 9999 and then obtain te desired statistics via -sum-, -tab- and -total-.
    Kind regards,
    Carlo
    (Stata 19.0)

    Comment


    • #3
      Isn't there any other way of incorporating the number of missings to the table? I would like to keep it as "."

      Comment


      • #4
        Paula:
        any missing value is listwise deleted by Stata.
        Hence, as far as I know, there' s no way to include the number of missing values in a table without coding them with a given number (e.g. 9999).
        Kind regards,
        Carlo
        (Stata 19.0)

        Comment


        • #5
          If precisely what you want is not easily available, write some code... Here's a start. Half the code is setting up a sandbox to play.

          Code:
          clear
          set obs 100
          set seed 2803 
          
          qui forval j = 1/10 { 
              gen x`j' = runiform() < (`j' + 0.5) / 10 
              replace x`j' = . if runiform() < (10 - `j') / 20 
          } 
          
          local start = 1 
          foreach v of varlist x* { 
              if `start' { 
                  di _n _n _n 
                  local start = 0 
              }
          
              qui count if missing(`v') 
              local Nmissing = r(N) 
              qui su `v' 
              di "`v'{col 5}" %5.0f r(N) %5.0f `Nmissing' %8.3f r(mean) %8.3f r(sd) %5.0f r(min) %5.0f r(max)
          } 
          
          x1     56   44   0.196   0.401    0    1
          x2     56   44   0.143   0.353    0    1
          x3     62   38   0.516   0.504    0    1
          x4     70   30   0.486   0.503    0    1
          x5     75   25   0.453   0.501    0    1
          x6     78   22   0.615   0.490    0    1
          x7     86   14   0.733   0.445    0    1
          x8     89   11   0.910   0.288    0    1
          x9     95    5   0.947   0.224    0    1
          x10   100    0   1.000   0.000    1    1
          For a more systematic introduction to some basic tricks, see http://www.stata-journal.com/article...article=pr0053
          Last edited by Nick Cox; 27 Jun 2015, 02:29.

          Comment


          • #6
            Thank you so much, Nick! Actually I need to put the missing percentage for each numeric variable. I did as below, but it doen't recognize the &"%". How can I add the percentage?
            Another question: how can I put the number of unique values. I would need it as well in the table but I didn't find a answer for this.


            Code:
            local dum d_mat d_fet d_inf
            
            local start = 1 
            foreach v of local dum { 
                if `start' { 
                    di _n _n _n 
                    local start = 0 
                }
                qui count if missing(`v') 
                local Nmis = r(N) 
                qui su `v' 
                local Pmis = `Nmis'/ r(N)
                di "`v'{col 10}" %5.0f r(N) %5.0f `Pmis'&"%" %8.2f r(mean) %8.2f r(sd) %5.0f r(min) %5.0f r(max)
            }

            Comment


            • #7
              Delete the ampersand character. In addition you need to multiply by 100 to get a percent.

              Comment


              • #8
                Ok, perfect. I have some other questions. It would be great if you could help me out.

                1) how can I have more space between from the second column on? I tried to put {col 10} for r(N) on but it did not work. I need some for space specially between r(N) and Pmis (as below).
                2) how can I include the number of unique values? I would like this for the identification variables (ex: number of hospital registrations) and would like to include in the same table.
                3) for variables like type of hospital (which has values 1 type 1, 2 for type 2, 3 for type 3), I would like to have a line for each type (with mean, sd, ...). how can I have that?



                Code:
                . foreach v of local dum { 
                  2.     if `start' { 
                  3.         di _n _n _n 
                  4.         local start = 0 
                  5.     }
                  6.     qui count if missing(`v') 
                  7.     local Nmis = r(N) 
                  8.     qui su `v' 
                  9.         local Pmis = `Nmis'/r(N)*100
                 10.     di "`v'{col 10}" %9.0f r(N) %5.2f `Pmis' "%" %8.2f r(mean) %8.2f r(sd) %5.0
                > f r(min) %5.0f r(max)
                 11. } 
                
                
                
                
                d_mat     19208779 0.00%    0.00    0.04    0    1
                d_fet     19208779 0.00%    0.03    0.18    0    1
                d_inf     19208779 0.00%    0.05    0.22    0    1

                Comment


                • #9
                  1) I typically add space by changing the format, e.g. from %5.2f to %10.2f

                  2) By "unique" values I take it you mean "distinct". See e.g. http://www.stata-journal.com/article...article=dm0042 for a review and a command.

                  3) You need a loop.

                  Comment


                  • #10
                    Nick, for the 3rd point I am having some trouble, once what I really need is the proportion of each type. I found the command below. But it stores r() only for the last factor. I would need all the three lines in the same table as above (last post).


                    Code:
                    . sum bn.loc_obit
                    
                        Variable |       Obs        Mean    Std. Dev.       Min        Max
                    -------------+--------------------------------------------------------
                        loc_obit |
                       hospital  |  19045393    .6745042    .4685598          0          1
                    other hea..  |  19045393    .0220366    .1468024          0          1
                         others  |  19045393    .3034592    .4597518          0          1
                    
                    . display r(mean)
                    .30345921

                    Comment

                    Working...
                    X