  • Proportion

    I have a dataset (N around 15,000) of hearing tests, from which I am using egen to generate a score for persistence of hearing deficit.
    There are 8 timepoint variables, so my varlist is: tmpnt20 tmpnt21 tmpnt22 tmpnt23 tmpnt24 tmpnt25 tmpnt26 tmpnt27
    At each timepoint, hearing deficit = 1 and no deficit = 0, but some data are missing (= .).
    I am trying to identify the proportion of timepoints at which an individual had a hearing deficit.
    I have used egen to create a variable rowtotscore20_27:
    egen rowtotscore20_27 = rowtotal(tmpnt20 tmpnt21 tmpnt22 tmpnt23 tmpnt24 tmpnt25 tmpnt26 tmpnt27)
    I then tried to count the non-missing tests using rownonmiss:
    egen rownonmissscore20_27 = rownonmiss(tmpnt20 tmpnt21 tmpnt22 tmpnt23 tmpnt24 tmpnt25 tmpnt26 tmpnt27)

    What I now need to do is generate a proportion. Someone with 8 tests, all scored zero, would get 0/8; someone with 4 tests and 2 hearing deficit scores would get 2/4; and someone with 5 tests and 5 hearing deficit scores would get 5/5 (100%). How can I do this?

  • #2
    I do not think that you need egen for this.

    Wouldn't this be sufficient?

    Code:
    gen proportion = rowtotscore20_27/rownonmissscore20_27
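    A quick way to check this on a few rows: the sketch below uses made-up toy values for four hypothetical patients and only three timepoints (tmpnt20 to tmpnt22), with the shortened names rowtotscore and rownonmissscore standing in for the variables above. It also shows the edge case of a patient with no non-missing tests: rowtotal() returns 0 and rownonmiss() returns 0 for that row, so the division is 0/0, which Stata sets to missing rather than 0.

```stata
* Hypothetical toy data: 4 patients, 3 timepoints (1 = deficit, 0 = none, . = missing)
clear
input patient_id tmpnt20 tmpnt21 tmpnt22
1 0 0 0
2 1 . 0
3 1 1 1
4 . . .
end

* Number of deficits (rowtotal treats missing as 0) and number of non-missing tests
egen rowtotscore = rowtotal(tmpnt20 tmpnt21 tmpnt22)
egen rownonmissscore = rownonmiss(tmpnt20 tmpnt21 tmpnt22)

* Proportion of tests showing a deficit; patient 4 gets 0/0, i.e. missing
gen proportion = rowtotscore/rownonmissscore
list, noobs
```

    Leaving the proportion missing for patients with no usable tests keeps them from being counted as never having a deficit.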

    • #3
      What Michael Jankowski proposes in #2 looks right to me, and given the layout of your data it is the simplest way forward at this point.

      But if there are other analyses you are planning to do, sooner or later you will probably find yourself disadvantaged, if not entirely hamstrung, by the use of the wide layout. Most analyses in Stata are easier when the data are in long layout. So, let's assume you have a variable that identifies the different patients, call it patient_id.

      Code:
      reshape long tmpnt2, i(patient_id) j(time)
      rename tmpnt2 hearing_deficit
      This will get you to a long layout in which there is a separate observation for each timepoint of each patient (time will run from 0 to 7, the digit that follows the stub tmpnt2 in each variable name), and the variable hearing_deficit will tell you whether they had a deficit at that time or not. Had your data already been laid out this way, you could get your proportion by running:

      Code:
      by patient_id, sort: egen proportion = mean(hearing_deficit)
      a one-liner that automatically handles missing values, because egen's mean() function ignores them.
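      A minimal end-to-end sketch of the long-layout route, again on hypothetical toy data (four made-up patients, three timepoints, and a patient_id variable assumed to exist). It first computes the wide-layout ratio from #2 (under the shortened stand-in names rowtotscore, rownonmissscore and prop_wide), then reshapes and checks with assert that the egen mean() result agrees, including for the all-missing patient, where both routes give missing.

```stata
* Hypothetical toy data in the original wide layout (3 timepoints only)
clear
input patient_id tmpnt20 tmpnt21 tmpnt22
1 0 0 0
2 1 . 0
3 1 1 1
4 . . .
end

* Wide-layout answer from #2, kept for comparison
egen rowtotscore = rowtotal(tmpnt20 tmpnt21 tmpnt22)
egen rownonmissscore = rownonmiss(tmpnt20 tmpnt21 tmpnt22)
gen prop_wide = rowtotscore/rownonmissscore

* Long layout: one observation per patient per timepoint.
* With stub tmpnt2, the j() values are the trailing digits 0, 1, 2.
reshape long tmpnt2, i(patient_id) j(time)
rename tmpnt2 hearing_deficit

* egen mean() skips missing values, so no separate count is needed
by patient_id, sort: egen proportion = mean(hearing_deficit)

* Both routes should agree (in Stata, missing == missing is true)
assert proportion == prop_wide
```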
