Proportion

Jocelyn Cherry

Join Date: Jun 2015

Posts: 47
#1


10 Mar 2016, 00:33

I have a dataset, N around 15,000, with some hearing tests from which I am using egen to generate a persistence of hearing deficit score.
There are 8 timepoint variables, so my varlist is (tmpnt20 tmpnt21 tmpnt22 tmpnt23 tmpnt24 tmpnt25 tmpnt26 tmpnt27).
The data at each timepoint are coded 1 = hearing deficit, 0 = none, with some missing values (.).
I am trying to identify the proportion of timepoints at which an individual had a hearing deficit.
I have created a variable rowtotscore20_27 with egen, using
egen rowtotscore20_27 = rowtotal (tmpnt20 tmpnt21 tmpnt22 tmpnt23 tmpnt24 tmpnt25 tmpnt26 tmpnt27)
I have then counted the nonmissing tests for each individual using rownonmiss:
egen rownonmissscore20_27 = rownonmiss (tmpnt20 tmpnt21 tmpnt22 tmpnt23 tmpnt24 tmpnt25 tmpnt26 tmpnt27)

What I now need is a proportion: someone with 8 tests and no hearing deficit scores would get 0/8, someone with 4 tests and 2 hearing deficit scores would get 2/4, and someone with 5 tests and 5 hearing deficit scores would get 5/5 (100%). How can I do this?
Michael Jankowski

Join Date: Oct 2014

Posts: 13
#2

10 Mar 2016, 02:24

I do not think that you need egen for this.

Wouldn't this be sufficient?

Code:

gen proportion = rowtotscore20_27/rownonmissscore20_27
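
A minimal sketch of how this behaves, on four made-up rows that mirror the examples in #1 plus one individual with no tests at all. The toy data and the variable proportion_check are assumptions added for illustration; the other names are those used in #1. The sketch also cross-checks the result against egen's rowmean(), which averages over nonmissing values only and so should give the same proportion directly from the 0/1 variables.

```stata
clear
input tmpnt20 tmpnt21 tmpnt22 tmpnt23 tmpnt24 tmpnt25 tmpnt26 tmpnt27
0 0 0 0 0 0 0 0
1 1 0 0 . . . .
1 1 1 1 1 . . .
. . . . . . . .
end

* Number of deficits and number of nonmissing tests, as in #1
egen rowtotscore20_27 = rowtotal(tmpnt20 tmpnt21 tmpnt22 tmpnt23 tmpnt24 tmpnt25 tmpnt26 tmpnt27)
egen rownonmissscore20_27 = rownonmiss(tmpnt20 tmpnt21 tmpnt22 tmpnt23 tmpnt24 tmpnt25 tmpnt26 tmpnt27)

* Proportion of tests showing a deficit; 0/0 evaluates to missing in Stata
gen proportion = rowtotscore20_27/rownonmissscore20_27

* Cross-check (illustrative): the row mean of the 0/1 variables, ignoring missings
egen proportion_check = rowmean(tmpnt20 tmpnt21 tmpnt22 tmpnt23 tmpnt24 tmpnt25 tmpnt26 tmpnt27)
assert proportion == proportion_check | (missing(proportion) & missing(proportion_check))

list rowtotscore20_27 rownonmissscore20_27 proportion proportion_check
```

An individual with no nonmissing tests ends up with a missing proportion rather than 0, which is usually what is wanted here.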
Clyde Schechter

Join Date: Apr 2014

Posts: 30069
#3

10 Mar 2016, 09:00

What Michael Jankowski proposes in #2 looks right to me, and given the layout of your data it is the simplest way forward at this point.

But if there are other analyses you are planning to do, sooner or later you will probably find yourself disadvantaged, if not entirely hamstrung, by the use of the wide layout. Most analyses in Stata are easier when the data are in long layout. So, let's assume you have a variable that identifies the different patients, call it patient_id.

Code:

reshape long tmpnt2, i(patient_id) j(time)
rename tmpnt2 hearing_deficit

will get you to a long layout in which there is a separate observation for each timepoint of each patient, and the variable hearing_deficit will tell you whether the patient has a deficit at that time or not. Had your data already been laid out this way, you could get your proportion by running:

Code:

by patient_id, sort: egen proportion = mean(hearing_deficit)

a one-liner which would automatically handle the problem of missing values.
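
A minimal sketch of the whole long-layout route on made-up data, assuming an identifier called patient_id as above. The three toy patients are illustrative only. Note that with the stub tmpnt2, reshape stores the trailing digit in time (0 to 7), so the sketch adds 20 to recover the original timepoint labels; that step is an addition for readability, not something the approach requires.

```stata
clear
input patient_id tmpnt20 tmpnt21 tmpnt22 tmpnt23 tmpnt24 tmpnt25 tmpnt26 tmpnt27
1 0 0 0 0 0 0 0 0
2 1 1 0 0 . . . .
3 1 1 1 1 1 . . .
end

* Wide to long: one observation per patient per timepoint
reshape long tmpnt2, i(patient_id) j(time)
rename tmpnt2 hearing_deficit

* The stub tmpnt2 leaves time running 0-7; shift it back to 20-27
replace time = time + 20

* Mean of a 0/1 variable within patient = proportion of nonmissing tests with a deficit
by patient_id, sort: egen proportion = mean(hearing_deficit)

list patient_id time hearing_deficit proportion, sepby(patient_id)
```

Because egen's mean() skips missing values, each patient's denominator is the number of tests actually recorded for that patient.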