Combining the e(sample) from two different regression

Dono Iskandar

Join Date: Aug 2015

Posts: 13
#1

12 Aug 2015, 08:30

I have the same model that I run separately for two groups (poor and less poor), and I want to get the descriptive statistics for each subsample and also for the two subsamples combined.

I am aware that I could use sum var_list if e(sample) after each regression, but I have no idea how to combine those two e(sample) results and get the descriptive statistics.

Simply using sum var_list does not work either, perhaps because I also include age and district fixed effects, so some observations with a particular district or age may be omitted.

I know this should be easy, but I just don't know how to do it.

Thank you
Tags: descriptive statistics, e(sample)
Richard Williams

Join Date: Apr 2014

Posts: 5008
#2

12 Aug 2015, 08:44

After running a model you could do something like

gen sample1 = e(sample)

Having done that, I suppose you could create a variable coded sample 1 only, sample 2 only, and (if not mutually exclusive) sample 1 and 2, whatever.

-------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
StataNow Version: 19.5 MP (2 processor)
WWW: https://www3.nd.edu/~rwilliam
daniel klein

Join Date: Mar 2014

Posts: 3859
#3

12 Aug 2015, 08:49

I doubt the usefulness of descriptive statistics for a combined sample if there is no multivariate analysis for this combined sample, but you can just create permanent variables to mark the two subsamples, then add them. Here is an example

Code:

sysuse nlsw88 , clear
regress wage hours i.union if collgrad
generate byte subsample1 = e(sample)
regress wage hours i.union if !collgrad
generate byte subsample2 = e(sample)
generate mysample = subsample1 + subsample2
ta mysample
summarize hours
summarize hours if mysample

Best
Daniel
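
For the per-group and combined statistics asked about in #1, the same markers can be reused directly. The following is only a sketch built on the example above; the choice of wage and hours as the variables to describe and the statistics requested are assumptions made for illustration.

```stata
* Recreate the two subsample markers from the example above
sysuse nlsw88, clear
regress wage hours i.union if collgrad
generate byte subsample1 = e(sample)
regress wage hours i.union if !collgrad
generate byte subsample2 = e(sample)

* The two estimation samples are mutually exclusive here,
* so their sum is a 0/1 marker for the combined sample
generate byte mysample = subsample1 + subsample2

* Descriptive statistics for each subsample and for the combined sample
summarize wage hours if subsample1
summarize wage hours if subsample2
summarize wage hours if mysample

* Alternatively, one table: by() adds a Total row for the combined sample
tabstat wage hours if mysample, by(collgrad) statistics(n mean sd min max)
```

If the two estimation samples could overlap, the sum would take the value 2 for observations in both, so a condition such as (subsample1 | subsample2) would be a safer marker for the combined sample.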
Dono Iskandar

Join Date: Aug 2015

Posts: 13
#4

12 Aug 2015, 08:56

Thank you very much, Richard and Daniel. I had no idea that I could combine e(sample) with gen. I really appreciate your help :D

Last edited by Dono Iskandar; 12 Aug 2015, 09:03.
mohina saxena

Join Date: Mar 2016

Posts: 61
#5

18 Apr 2020, 15:20

Hello All,

Hope everyone is safe.

I intend to get descriptive statistics from my regression sample, separated by the industries contained in the estimation. I have an unbalanced panel data set from various industries spanning several years. I am aware of how to use e(sample) and tabstat in isolation, but I am unable to combine them to get industry-wise summary statistics from only the regression sample used. Can someone please suggest how to proceed?

Thanks and regards,
Mohina
Andrew Musau

Join Date: Oct 2014

Posts: 10213
#6

19 Apr 2020, 01:50

Code:

preserve
keep if e(sample)
tabstat ...
restore
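
To make the pattern concrete, here is a minimal sketch using the nlsw88 data shipped with Stata as a stand-in for the panel data described in #5. The model, the variables summarized, and the statistics requested are assumptions chosen only for illustration; industry is the grouping variable in that dataset.

```stata
sysuse nlsw88, clear

* Some model; e(sample) marks the observations actually used
regress wage hours ttl_exp i.union

* Option 1: restrict with an if qualifier, leaving the data untouched
tabstat wage hours ttl_exp if e(sample), by(industry) statistics(n mean sd)

* Option 2: temporarily drop everything outside the estimation sample
preserve
keep if e(sample)
tabstat wage hours ttl_exp, by(industry) statistics(n mean sd)
restore
```

Both options should produce the same table. The if qualifier is shorter, while preserve and restore can be convenient when several commands need to run on the estimation sample only.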
Nick Cox

Join Date: Mar 2014

Posts: 35711
#7

19 Apr 2020, 02:07

Let's imagine that we have two samples, which might overlap. So separately after each model fit we go

Code:

* after fitting the first model
gen byte sample1 = e(sample)
* after fitting the second model
gen byte sample2 = e(sample)

Now we could go

Code:

gen byte sample = sample1 + 2 * sample2

so that sample is 0 if in neither sample, 1 if in sample 1 only, 2 if in sample 2 only, 3 if in both.

With three such samples,

Code:

gen byte sample = sample1 + 2 * sample2 + 4 * sample3

which gives 0 if in no samples all the way up to 7 for if in all samples.

But hang on: these are just binary numbers in decimal. It's more direct to go

Code:

egen sample = concat(sample1 sample2)

or

Code:

egen sample = concat(sample1 sample2 sample3)

and so on, so that (in the last example) string values could be

Code:

000 001 010 011 100 101 110 111

and easy extensions give us the 2^k distinct subsets for k samples. It's not at all necessary that all the subsets occur in practice.
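
As a worked sketch of the concat() approach, the following fits three models to the nlsw88 data shipped with Stata. The three models are arbitrary assumptions picked so that missing values make the estimation samples differ; only the marker logic matters.

```stata
sysuse nlsw88, clear

* Three model fits whose estimation samples may overlap
regress wage hours i.union
gen byte sample1 = e(sample)

regress wage hours tenure
gen byte sample2 = e(sample)

regress wage ttl_exp i.industry
gen byte sample3 = e(sample)

* One string marker per observation, e.g. "101" = in samples 1 and 3 only
egen sample = concat(sample1 sample2 sample3)

* Which subsets occur, and descriptive statistics within each
tabulate sample
tabstat wage, by(sample) statistics(n mean sd)
```

Because the marker is a string, conditions need quotes, for example summarize wage if sample == "111" for observations used in all three fits.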
mohina saxena

Join Date: Mar 2016

Posts: 61
#8

20 Apr 2020, 11:39

Many thanks, Andrew. It got resolved.

Stay safe and regards,
Mohina
