  • Calculating survey coverage and assigning calculated coverage value to all records

    Greetings. I have an appended dataset that consolidates many coverage surveys. Each source data file has an identifier, survid. The dichotomous variable "gotit" describes whether people got treatment.

    For each survey I would like to calculate the coverage for the survey as a whole, which is defined as (gotit=1) / (gotit=1 + gotit=0). Then I can contextualize each survey's responses with information about the survey's reach.

    What commands would allow me to calculate coverage and apply the calculation to all records? And if the calculation happens at the level of the overall survey (rather than the records comprising each survey), would I need to do this before appending all datasets?

    I will appreciate guidance. Thank you.

  • #2
    I'm not sure I understand your description of what you want. But I think it's:
    Code:
    by survid, sort: egen coverage = mean(gotit)
    Note: This assumes that gotit never takes on any values other than 0 and 1 (apart from missing values).
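
    As a minimal sketch, the assumption could be checked before relying on the mean, and the same coverage figure could be written out as an explicit ratio for comparison. The variable names n_yes, n_resp, and coverage_check are made up for illustration. Because everything is done by survid, this should work on the already-appended dataset, so calculating coverage separately before appending should not be necessary.

    ```stata
    * Stop with an error if gotit holds anything other than 0, 1, or missing
    assert inlist(gotit, 0, 1) | missing(gotit)

    * Numerator: number of records with gotit == 1 in each survey
    by survid, sort: egen n_yes = total(gotit == 1)

    * Denominator: number of records with a nonmissing gotit in each survey
    by survid: egen n_resp = count(gotit)

    * Explicit ratio; every record in a survey gets that survey's value
    generate coverage_check = n_yes / n_resp
    ```

    When gotit is strictly 0/1, coverage_check should agree with the coverage variable created by mean(gotit), up to floating-point precision.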



    • #3
      I like this recommendation. Would this approach effectively ignore the missing data for gotit? (I think it would - I'll appreciate your confirmation.)

      Thank you.



      • #4
        One more follow-up question: Would there be an easy way to keep coverage to only three decimal places? The recommended approach generates decimals of differing lengths up to seven digits. Thanks.



        • #5
          Yes, the method proposed excludes missing values from the calculation altogether: they appear in neither the numerator nor the denominator.

          If you only want to see three decimal places, I suggest
          Code:
          format coverage %4.3f
          This will not change the actual values of coverage, and if you use it in calculating other things, the decimal places beyond 3 will still participate in the calculations. But you will not have to look at those decimal places when you display the data on screen or in print. This is what I recommend. Bear in mind that if you are going to do a series of calculations, it is best to work with unrounded numbers and round only the final result. This reduces error propagation in the calculation, which can be surprisingly large when you calculate with rounded numbers.
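
          To see for yourself that the format affects only the display, something like the following sketch could be used. The variable name onerec is hypothetical; it just marks one record per survey so that each coverage value is listed once.

          ```stata
          * Flag exactly one record in each survey
          egen byte onerec = tag(survid)

          * Show coverage with three decimal places, one line per survey
          format coverage %4.3f
          list survid coverage if onerec

          * The stored value is untouched: ask for more digits explicitly
          display %12.10f coverage[1]
          ```

          The last line prints the first record's coverage with ten decimal places, regardless of the format attached to the variable.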

          If you really want to change the values to round them to 3 decimal places you can
          Code:
          replace coverage = round(coverage, 0.001)
          But I think it's a bad idea to do this. First, there is the issue, elaborated above, of whether rounded numbers are suitable for subsequent calculations. But another problem is that Stata represents its numbers internally as binary floating point. Consequently, numbers that are "round" in decimal are mostly not round in binary. In particular, 0.001 decimal corresponds to an infinite repeating binary number. But memory is finite: a single number is allocated at most 8 bytes (for a double) and only 4 by default (for a float). So that number gets cut off fairly early in its sequence, and what is carried internally is the best approximation to 0.001 that can be managed within the allocated storage. (Remember, 1/3 has no exact finite decimal representation. This is the analogous problem for binary.) The point is that rounding in this way is not going to be completely accurate, so you are actually adding more error to the results.
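
          A quick way to look at this, as a sketch that does not touch your data: display the same decimal number at double and at float precision, with many digits. The value 0.4567 is just an arbitrary example.

          ```stata
          * 0.001 as a double: close to, but not exactly, one thousandth
          display %21.18f 0.001

          * 0.001 cut down to float precision: a coarser approximation
          display %21.18f float(0.001)

          * A "rounded" result is itself only an approximation of 0.457
          display %21.18f round(0.4567, 0.001)

          * Comparing the float and double versions of the same decimal number
          display float(0.457) == 0.457
          ```

          The last comparison is the kind of thing that can trip you up later if you test a rounded float variable for equality against a decimal literal.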
