Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Counting non-missing observations by variable

    I have three id variables in string format with missing observations. How can I count the number of observations by id type? In other words, I want to count the number of non-missing observations by SEDOL, ISIN, and WSID.
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str40(SEDOL ISIN WSID)
    "2107727" "ARBRIO010194" "C025A750A"
    "2107727" "ARBRIO010194" "C025A750A"
    "2383486" "ARBRIO010186" "C025A750B"
    "2383486" "ARBRIO010186" "C025A750B"
    "2107727" "ARBRIO010194" "C025A7500"
    "2107727" "ARBRIO010194" "C025A7500"
    "2264194" "ARP3577N1097" "C025BE180"
    "2613310" "ARP495251018" "C025BT200"
    "2613310" "ARP495251018" "C025BT200"
    "2592743" "ARP6823S1030" "C025C103A"
    "2592754" "ARP6823S1113" "C025C103B"
    "2592765" "ARP6823S1295" "C025C103C"
    "2592765" "ARP6823S1295" "C025C1030"
    "2268679" "ARP3594C1024" "C025D374A"
    "2268679" "ARP3594C1024" "C025D374A"
    "2276155" "ARP3594C1107" "C025D374B"
    "2276155" "ARP3594C1107" "C025D374B"
    "2276155" "ARP3594C1107" "C025D3740"
    "2276155" "ARP3594C1107" "C025D3740"
    "2158565" "ARGASB010027" "C025F1920"
    ""        "ARP6558L1095" "C025K020A"
    "2573492" "ARP6558L1178" "C025K020B"
    ""        "ARP6558L1251" "C025K020C"
    "2573492" "ARP6558L1178" "C025K0200"
    "2378783" "ARBHIP010161" "C025K5790"
    "2378783" "ARBHIP010161" "C025K5790"
    "2038980" "ARSAMA010012" "C025L1610"
    "2010159" "ARAGRO010015" "C025L1660"
    "2092254" "ARBOLD010054" "C025L1820"
    "2179061" "ARCADO010023" "C025L1910"
    "2192626" "ARP2071R1377" "C025L1930"
    "2194309" "ARP2885N1161" "C025L2020"
    ""        ""             "C025L214A"
    "2485296" "ARP3568E1180" "C025L214B"
    ""        ""             "C025L214C"
    end

  • #2
    Budu:
    I would start off with:
    Code:
    foreach var of varlist SEDOL-WSID  {
      2. gen miss_`var'=1 if `var'==""
      3. replace miss_`var'=0 if miss_`var'==.
      4.   }
    Then you can -label- missing/non-missing and eventually -table- each -miss_`var'-
    Kind regards,
    Carlo
    (Stata 19.0)

    Comment


    • #3
      Is this what you want? (It's values that are missing or not, not observations. Similarly, being string is a matter of storage type, not display format as such.)

      Code:
      . egen nmissing = rowmiss(SEDOL ISIN WSID) 
      
      . tab nmissing 
      
         nmissing |      Freq.     Percent        Cum.
      ------------+-----------------------------------
                0 |         31       88.57       88.57
                1 |          2        5.71       94.29
                2 |          2        5.71      100.00
      ------------+-----------------------------------
            Total |         35      100.00

      Comment


      • #4
        Carlo Lazzaro Thanks for the solution. After applying your code, I tried
        Code:
         tabstat miss_SEDOL miss_ISIN miss_WSID, stat(sum)
        and got the number of missing values by id, which serves my purpose.

        Nick Cox sorry for the wrong wording. Although your code does not apply in this case, I have learned something new. Thank you.

        Comment


        • #5
          Thanks for the report. I took your data example from #1, simplified Carlo's code from #2 and applied the code in #4. This shows everyone else what you got.

          Code:
          * Example generated by -dataex-. To install: ssc install dataex
          clear
          input str40(SEDOL ISIN WSID)
          "2107727" "ARBRIO010194" "C025A750A"
          "2107727" "ARBRIO010194" "C025A750A"
          "2383486" "ARBRIO010186" "C025A750B"
          "2383486" "ARBRIO010186" "C025A750B"
          "2107727" "ARBRIO010194" "C025A7500"
          "2107727" "ARBRIO010194" "C025A7500"
          "2264194" "ARP3577N1097" "C025BE180"
          "2613310" "ARP495251018" "C025BT200"
          "2613310" "ARP495251018" "C025BT200"
          "2592743" "ARP6823S1030" "C025C103A"
          "2592754" "ARP6823S1113" "C025C103B"
          "2592765" "ARP6823S1295" "C025C103C"
          "2592765" "ARP6823S1295" "C025C1030"
          "2268679" "ARP3594C1024" "C025D374A"
          "2268679" "ARP3594C1024" "C025D374A"
          "2276155" "ARP3594C1107" "C025D374B"
          "2276155" "ARP3594C1107" "C025D374B"
          "2276155" "ARP3594C1107" "C025D3740"
          "2276155" "ARP3594C1107" "C025D3740"
          "2158565" "ARGASB010027" "C025F1920"
          ""        "ARP6558L1095" "C025K020A"
          "2573492" "ARP6558L1178" "C025K020B"
          ""        "ARP6558L1251" "C025K020C"
          "2573492" "ARP6558L1178" "C025K0200"
          "2378783" "ARBHIP010161" "C025K5790"
          "2378783" "ARBHIP010161" "C025K5790"
          "2038980" "ARSAMA010012" "C025L1610"
          "2010159" "ARAGRO010015" "C025L1660"
          "2092254" "ARBOLD010054" "C025L1820"
          "2179061" "ARCADO010023" "C025L1910"
          "2192626" "ARP2071R1377" "C025L1930"
          "2194309" "ARP2885N1161" "C025L2020"
          ""        ""             "C025L214A"
          "2485296" "ARP3568E1180" "C025L214B"
          ""        ""             "C025L214C"
          end
          
          foreach var of varlist SEDOL-WSID  {
             gen miss_`var' = missing(`var') 
          }
          
          tabstat miss_SEDOL miss_ISIN miss_WSID, stat(sum) 
          
             stats |  miss_S~L  miss_I~N  miss_W~D
          ---------+------------------------------
               sum |         4         2         0
          ----------------------------------------
          Now that I know clearly what you want, I can suggest another way to do it without needing to create any new variables. With the data example, you can use missings (Stata Journal)

          Code:
           
          . missings report
          
          Checking missings in all variables:
          4 observations with missing values
          
          -------------------
                 | # missing
          -------+-----------
           SEDOL |         4
            ISIN |         2
          -------------------
          
          . missings report, min(0)
          
          Checking missings in all variables 
          4 observations with missing values
          
          -------------------
                 | # missing
          -------+-----------
           SEDOL |         4
            ISIN |         2
            WSID |         0
          -------------------
          
          
          dm0085_1 from http://www.stata-journal.com/software/sj17-3
              SJ17-3 dm0085_1. Update: A set of utilities for managing missing values /
              Update: A set of utilities for managing missing / values / by Nicholas J.
              Cox, Department of Geography, / Durham University, Durham City, UK /
              Support:  [email protected] / After installation, type help missings
          
          dm0085 from http://www.stata-journal.com/software/sj15-4
              SJ15-4 dm0085. A set of utilities for... / A set of utilities for managing
              missing values / by Nicholas J. Cox, Department of Geography, / Durham
              University, Durham City, UK / Support:  [email protected] / After
              installation, type help missings

          Comment


          • #6
            Like in the fairy tales, the illusion of having an edge on Nick once in a while fades away well before midnight!!
            Kind regards,
            Carlo
            (Stata 19.0)

            Comment

            Working...
            X