Save results of distinct command to dataset

paulvonhippel

Join Date: Apr 2014

Posts: 496
#1


24 Jun 2020, 13:18

I'm running the distinct command repeatedly on different subsets of my data. I'd like to create a second dataset that stacks the results of the distinct command. Here's a very simple example, which counts the makes and models of foreign and domestic cars.

Code:

sysuse auto, clear
split make, limit(2)
drop make
rename make1 make
rename make2 model
display "All cars"
distinct make
display "Foreign cars"
distinct make if foreign
display "Domestic cars"
distinct make if !foreign

Instead of having the results in the output window (or in addition), though, I'd like to create a new dataset that looks something like this:

Subset         Makes  Models
All cars          23      74
Foreign cars      12      22
Domestic cars     11      52

What's the most efficient way to do this? Thanks!
paulvonhippel

Join Date: Apr 2014

Posts: 496
#2

24 Jun 2020, 13:26

P.S. The real problem is more complex, with more subsets and more than two levels (make, model).
Stephen Jenkins

Join Date: Apr 2014

Posts: 1427
#3

24 Jun 2020, 14:32

-distinct- (by Gary Longton; SSC) saves its results in r(). I'd use -postfile- and its siblings to loop around the subsets you're looking at, and accumulate the results for each subset in a new dataset. You can then read that in and do whatever you want with it.
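
To see what is available for posting, it can help to inspect the returned results directly. The lines below are a minimal sketch, assuming distinct is already installed; the variable foreign is used only as an illustration.

```stata
* Sketch only: assumes the community-contributed distinct command is installed.
sysuse auto, clear
distinct foreign

* show everything distinct left behind in r()
return list

* the two scalars used later in this thread
display "distinct values: " r(ndistinct) "   observations: " r(N)
```

r(ndistinct) holds the number of distinct values and r(N) the number of observations used, so either can be passed to post as an expression.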
Nick Cox

Join Date: Mar 2014

Posts: 35467
#4

24 Jun 2020, 16:21

The source for distinct is better given as the Stata Journal. At the time of writing, the latest version of the code should be downloaded from SJ 15-3, but the longest write-up was in SJ 8-4.

SJ-15-3 dm0042_2  . . . . . . . . . . . . . . . . Software update for distinct
        (help distinct if installed)  . . . . . .  N. J. Cox and G. M. Longton
        Q3/15   SJ 15(3):899
        improved table format and display of large numbers of
        observations

SJ-12-2 dm0042_1  . . . . . . . . . . . . . . . . Software update for distinct
        (help distinct if installed)  . . . . . .  N. J. Cox and G. M. Longton
        Q2/12   SJ 12(2):352
        options added to restrict output to variables with a minimum
        or maximum of distinct values

SJ-8-4  dm0042  . . . . . . . . . . . . Speaking Stata: Distinct observations
        (help distinct if installed)  . . . . . .  N. J. Cox and G. M. Longton
        Q4/08   SJ 8(4):557--568
        shows how to answer questions about distinct observations

paulvonhippel

Join Date: Apr 2014
Posts: 496
#5

24 Jun 2020, 20:39

Thanks to Stephen Jenkins for suggesting the post commands. Here is the code I wrote for the toy job:

Code:

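sysuse auto, clear
* split "make" into its first word (make) and second word (model)
split make, limit(2)
drop make
rename make1 make
rename make2 model

* open a results file that will hold one row per subset
postfile counts str20 subset int makes int models using counts, replace
* after distinct, r(ndistinct) is the number of distinct makes and r(N) is the
* number of observations (cars), which is used here as the count of models
distinct make
post counts ("All cars") (`r(ndistinct)') (`r(N)')
distinct make if foreign
post counts ("Foreign cars") (`r(ndistinct)') (`r(N)')
distinct make if !foreign
post counts ("Domestic cars") (`r(ndistinct)') (`r(N)')
postclose counts

* load and show the stacked results
use counts, clear
list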
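
For the real problem, with more subsets, the same idea can be put in a loop. What follows is only a sketch under assumptions not stated in the thread: the lists labels and conds are hypothetical, and "models" is taken to mean distinct make-model pairs (using the joint and missing options of distinct) rather than the number of observations.

```stata
* Sketch only: assumes the community-contributed distinct command is installed.
sysuse auto, clear
split make, limit(2)
drop make
rename make1 make
rename make2 model

* Hypothetical subset definitions: labels and matching if-conditions, same order.
local labels `" "All cars" "Foreign cars" "Domestic cars" "'
local conds  `" "1" "foreign" "!foreign" "'

tempname counts
postfile `counts' str20 subset long makes long models using counts, replace

local n : word count `labels'
forvalues i = 1/`n' {
    local label : word `i' of `labels'
    local cond  : word `i' of `conds'

    * distinct makes in this subset
    quietly distinct make if `cond'
    local nmakes = r(ndistinct)

    * distinct make-model pairs; missing keeps cars whose name has no second word
    quietly distinct make model if `cond', joint missing
    local nmodels = r(ndistinct)

    post `counts' ("`label'") (`nmakes') (`nmodels')
}
postclose `counts'

use counts, clear
list, noobs
```

Adding a subset then means adding one entry to each list, and adding a level means one more distinct call and one more variable in the postfile declaration.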


Stephen Jenkins

Join Date: Apr 2014

Posts: 1427
#6

25 Jun 2020, 00:49

Apologies (especially to Nick) for not getting the provenance of -distinct- correct.
Nick Cox

Join Date: Mar 2014

Posts: 35467
#7

25 Jun 2020, 04:58

Stephen Jenkins: Thanks, and fine by me. In fact, Gary did more of the coding and I did more on the paper, and that is reflected in the order of the authors in the help file and in the paper.
