
  • Checking the Consistency of Insurance Records for the Same Person

    Hello,

    I have an insurance claims database with about 2,000,000 claims records. I'm trying to check the rate of inconsistency for various variables (gender, DOB, and race) among multiple records for the same person. I've only been using Stata for about a year and this is my first time working with insurance claims data, so I'm not certain how to go about doing this. Any suggestions?

    Thanks for your help!
    Christina Martone

  • #2
    Well, I'm not sure what you mean by the "rate of inconsistency," as I can think of several ways to define this. But the first step in any of them would be to identify, for each person, whether or not his/her data are consistent. That part I can help you with:

    Code:
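    * Within each person, sort on the variable; if the first and last sorted
    * values match, every record for that person agrees on that variable.
    foreach v of varlist gender dob race {
        by person_id (`v'), sort: gen byte consistent_`v' = (`v'[1] == `v'[_N])
    }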
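
    To see what this produces, here is a small sketch on made-up data (the five records and their values are purely illustrative, not from your database):

    Code:
    * Run this in a fresh session: -clear- drops the data in memory.
    clear
    input person_id str1 gender
    1 "F"
    1 "F"
    2 "M"
    2 "F"
    3 "M"
    end
    * Same comparison as above, applied to the toy gender variable only.
    by person_id (gender), sort: gen byte consistent_gender = (gender[1] == gender[_N])
    list, sepby(person_id)

    Persons 1 and 3 get consistent_gender = 1 on all of their records, and person 2 gets 0. Note that a value that is missing on some of a person's records but not others will also be flagged as inconsistent under this comparison, so you may want to decide how missing values should count before interpreting the results.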

    • #3
      Thank you for your help and for that code! When I run the code, and look at the resulting frequencies, I get this (using gender as an example):

      Code:
      . tab consistent_idgendr
      
      consistent_ |
          idgendr |      Freq.     Percent        Cum.
      ------------+-----------------------------------
                0 |     41,467        2.07        2.07
                1 |  1,960,512       97.93      100.00
      ------------+-----------------------------------
            Total |  2,001,979      100.00

      Do the zeros indicate the number of claims with no inconsistencies, while the ones represent the number of claims with an inconsistency in gender? Or vice versa? Also, does this code include people with only one claims record? And if so, do you know how to check whether his/her data are consistent among only those with more than one record?

      By "rate of inconsistency," I mean: among those individuals with more than one claims record, what is the rate of inconsistency for DOB, gender, and race? I realize there are multiple ways to approach this, but I am unfamiliar with this code, so any help or guidance on how to approach this would be greatly appreciated!
      Thank you very much for your help!

      • #4
        If you study the code in #2 and understand it, as well as just thinking about the choice of the variable name, you will realize that for this variable, 1 means consistent, and 0 means there is an inconsistency. Again, if you study the code, you will realize that it will give a 1 to any person with only one record, because in that case the data are, in fact, consistent.

        You still haven't explained what you mean by rate of consistency. Do you mean the number of people with entirely consistent records divided by the total number of people? That would be, I think, the simplest version, though by no means the only statistic that could be called a consistency rate. If that's what you want:

        Code:
        egen flag = tag(person_id)
        summ consistent_* if flag

        The numbers in the "mean" column of the output of -summ- will be the rates of consistency, so defined. If you want inconsistency rates, just subtract those from 1. If you want it in percentage terms, multiply by 100.
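
        As a sketch of that arithmetic, assuming flag and the consistent_* variables have been created as above, something like this would print the inconsistency rate for each variable as a percentage:

        Code:
        foreach v of varlist consistent_* {
            * r(mean) is the person-level share with consistent records.
            quietly summarize `v' if flag
            display "`v': " %5.2f (100 * (1 - r(mean))) "% of persons inconsistent"
        }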

        • #5
          This makes a lot of sense, thank you!

          To further clarify, I would like to know the number of people with entirely consistent records divided by the total number of people with more than one claims record. Do you know how to go about doing that?

          I apologize for all of these tedious questions, and for my lack of clarity earlier. As I said, I'm quite new to Stata!

          • #6
            Code:
            * Drop flag first in case it already exists from the code in #4.
            capture drop flag
            egen flag = tag(person_id)
            by person_id, sort: gen byte n_gt_1 = (_N > 1)
            summ consistent_* if flag & n_gt_1
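
            If you also want the underlying counts of people rather than claims, a sketch along these lines (assuming flag and n_gt_1 as created above) tabulates each indicator using one record per person:

            Code:
            foreach v of varlist consistent_* {
                * One row per person, restricted to people with 2+ records.
                tabulate `v' if flag & n_gt_1
            }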
