  • Number of distinct values of a variable into summary statistics

    Hi everybody.

    My dataset has panel data at the subnational level of regions. In my summary statistics I do not only want to include the number of observations for every variable, but also the corresponding number of regions and countries where the variables have nonmissing values.

    I can display the number of regions for a variable by
    distinct REGION if VARIABLE !=.
    But I want to include this number into the summary statistics for this variable.

    My idea was to create the summary statistics "by hand" using reshape and egen rowmean/rowmedian/etc. commands. But I don't know how to include the number of distinct values of the variables REGION and COUNTRY for nonmissing values of each variable.


    Thanks for your help!

    Aiko Schmeisser

  • #2
    distinct is a user-written program from the Stata Journal, as you are asked to explain (FAQ Advice #12).

    search distinct in Stata brings up clickable links. I've edited the output below, but anyone interested should

    1. use the latest link to download program and help file

    2. use the link to the 2008 paper to get a discussion of the problem.

    Code:
    . search distinct, sj
    
    Search of official help files, FAQs, Examples, SJs, and STBs
    
    SJ-15-3 dm0042_2  . . . . . . . . . . . . . . . . Software update for distinct
            (help distinct if installed)  . . . . . .  N. J. Cox and G. M. Longton
            Q3/15   SJ 15(3):899
            improved table format and display of large numbers of
            observations
    
    SJ-12-2 dm0042_1  . . . . . . . . . . . . . . . . Software update for distinct
            (help distinct if installed)  . . . . . .  N. J. Cox and G. M. Longton
            Q2/12   SJ 12(2):352
            options added to restrict output to variables with a minimum
            or maximum of distinct values
    
    SJ-8-4  dm0042  . . . . . . . . . . . .  Speaking Stata: Distinct observations
            (help distinct if installed)  . . . . . .  N. J. Cox and G. M. Longton
            Q4/08   SJ 8(4):557--568
            shows how to answer questions about distinct observations
            from first principles; provides a convenience command

    distinct doesn't include a generate() option. That's partly because the 2008 paper shows that egen functions (your instinct was right) were already available to do this.

    You don't give example data, but this silly example shows some technique.

    Code:
    . sysuse auto, clear
    (1978 Automobile Data)
    
    . bysort foreign : distinct rep78
    
    ------------------------------------------------------------------------------
    -> foreign = Domestic
    
    ------------------------------
           |     total   distinct
    -------+----------------------
     rep78 |        48          5
    ------------------------------
    
    ------------------------------------------------------------------------------
    -> foreign = Foreign
    
    ------------------------------
           |     total   distinct
    -------+----------------------
     rep78 |        21          3
    ------------------------------
    
    . egen tag = tag(rep78 foreign)
    
    . egen ndistinct = total(tag), by(foreign)
    
    . tabdisp foreign, c(ndistinct)
    
    ----------------------
     Car type |  ndistinct
    ----------+-----------
     Domestic |          5
      Foreign |          3
    ----------------------
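
The same egen tag() technique could be carried over to the original question along the following lines. This is only a minimal sketch on made-up data: the names country, region, x1 and x2 are hypothetical stand-ins for the real variables, and the display line is just one possible way to lay out the results.

```stata
* Toy panel with hypothetical names: country, region, and two analysis variables
clear
input str2 country str3 region year x1 x2
"A" "A1" 2000 1 .
"A" "A1" 2001 2 .
"A" "A2" 2000 . 5
"A" "A2" 2001 3 6
"B" "B1" 2000 . 7
"B" "B1" 2001 . 8
end

foreach v of varlist x1 x2 {
    * tag one observation per region (country) among those where `v' is nonmissing
    egen byte tagr = tag(region) if !missing(`v')
    egen byte tagc = tag(country) if !missing(`v')

    * number of nonmissing observations
    quietly count if !missing(`v')
    local n = r(N)

    * number of distinct regions and countries with nonmissing values
    quietly count if tagr == 1
    local nr = r(N)
    quietly count if tagc == 1
    local nc = r(N)

    display as text "`v'" _col(12) as result %6.0f `n' %10.0f `nr' %10.0f `nc'
    drop tagr tagc
}
```

On this toy data x1 has 3 nonmissing observations from 2 regions in 1 country, and x2 has 4 nonmissing observations from 2 regions in 2 countries. The counts are held in local macros here, so they could be posted to a results file or a matrix alongside means and medians instead of being displayed.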

