  • Save results of distinct command to dataset

    I'm running the distinct command repeatedly on different subsets of my data. I'd like to create a second dataset that stacks the results of the distinct command. Here's a very simple example, which counts the makes and models of foreign and domestic cars.
    Code:
    sysuse auto, clear
    split make, limit(2)
    drop make
    rename make1 make
    rename make2 model
    
    display "All cars"
    distinct make
    display "Foreign cars"
    distinct make if foreign
    display "Domestic cars"
    distinct make if !foreign
    Instead of having the results in the output window (or in addition), though, I'd like to create a new dataset that looks something like this:
    Subset Makes Models
    All cars 23 74
    Foreign cars 12 22
    Domestic cars 11 52
    What's the most efficient way to do this? Thanks!

  • #2
    P.S. The real problem is more complex, with more subsets and more than two levels (make, model).


    • #3
      -distinct- (by Gary Longton; SSC) saves its results in r(). I'd use -postfile- and its siblings to loop around the subsets you're looking at, and accumulate the results for each subset in a new dataset. You can then read that in and do whatever you want with it.
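
      To see what is available for posting, here is a minimal sketch, assuming -distinct- is already installed. It runs the command on one subset and lists the saved results; as far as I know these are r(N), the number of observations used, and r(ndistinct), the number of distinct values.

      ```stata
      sysuse auto, clear

      * Run distinct on one subset, then inspect what it left behind in r()
      distinct make if foreign
      return list

      * Copy the result into a local macro before running another r-class
      * command, because the next such command overwrites r()
      local nforeign = r(ndistinct)
      display "Distinct values of make among foreign cars: `nforeign'"
      ```

      Anything shown by return list can be passed to -post- as an expression, which is what makes the -postfile- approach work.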


      • #4
        The source for distinct is better given as the Stata Journal. At the time of writing, the latest version of the code should be downloaded from SJ 15-3, but the longest write-up was in SJ 8-4.


        SJ-15-3 dm0042_2 . . . . . . . . . . . . . . . . Software update for distinct
        (help distinct if installed) . . . . . . N. J. Cox and G. M. Longton
        Q3/15 SJ 15(3):899
        improved table format and display of large numbers of
        observations

        SJ-12-2 dm0042_1 . . . . . . . . . . . . . . . . Software update for distinct
        (help distinct if installed) . . . . . . N. J. Cox and G. M. Longton
        Q2/12 SJ 12(2):352
        options added to restrict output to variables with a minimum
        or maximum of distinct values

        SJ-8-4 dm0042 . . . . . . . . . . . . Speaking Stata: Distinct observations
        (help distinct if installed) . . . . . . N. J. Cox and G. M. Longton
        Q4/08 SJ 8(4):557--568
        shows how to answer questions about distinct observations



        • #5
          Thanks to Stephen Jenkins for suggesting the post commands. Here is the code I wrote for the toy job:
          Code:
          sysuse auto, clear
          split make, limit(2)
          drop make
          rename make1 make
          rename make2 model
          
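          * counts.dta will receive one row per subset
          postfile counts str20 subset int makes int models using counts, replace
          
          * After distinct make, r(ndistinct) is the number of distinct makes.
          * r(N) is the number of observations used; it stands in for the number
          * of models here only because each observation in auto.dta is one car model.
          distinct make
          post counts ("All cars") (`r(ndistinct)') (`r(N)')
          distinct make if foreign
          post counts ("Foreign cars") (`r(ndistinct)') (`r(N)')
          distinct make if !foreign
          post counts ("Domestic cars") (`r(ndistinct)') (`r(N)')
          postclose counts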
          
          use counts, clear
          list
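
          For the real problem, with more subsets and more levels, the same idea can be put in a loop. This is only a sketch under some assumptions: the locals labels and conds are hypothetical parallel lists that I made up for illustration (one label and one if-condition per subset), and the model count comes from distinct with the joint option on make and model rather than from r(N).

          ```stata
          sysuse auto, clear
          split make, limit(2)
          drop make
          rename make1 make
          rename make2 model

          * A temporary handle avoids clashing with other open postfiles
          tempname counts
          postfile `counts' str20 subset long makes long models using counts, replace

          * Hypothetical parallel lists: a label and an if-condition per subset
          local labels `" "All cars" "Foreign cars" "Domestic cars" "'
          local conds  `" "1" "foreign" "!foreign" "'

          local n : word count `labels'
          forvalues i = 1/`n' {
              local label : word `i' of `labels'
              local cond  : word `i' of `conds'

              * Level 1: distinct makes in this subset
              quietly distinct make if `cond'
              local nmakes = r(ndistinct)

              * Level 2: distinct make-model combinations in this subset
              quietly distinct make model if `cond', joint
              local nmodels = r(ndistinct)

              post `counts' ("`label'") (`nmakes') (`nmodels')
          }
          postclose `counts'

          use counts, clear
          list
          ```

          Adding a subset then means adding one entry to each list, and adding a level means one more distinct call and one more column in the postfile declaration. Note that distinct ignores observations with missing values unless the missing option is specified, so a car whose make is a single word (and whose model is therefore empty after split) is not counted in the joint tally as written.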


          • #6
            Apologies (especially to Nick) for not getting the provenance of -distinct- correct.


            • #7
              Stephen Jenkins: Thanks, and fine by me. In fact, Gary did more of the coding and I did more on the paper, and that is reflected in the order of the authors in the help file and in the paper.
