Checking the Consistency of Insurance Records for the Same Person

Christina Martone

Join Date: Sep 2017

Posts: 8
#1

Checking the Consistency of Insurance Records for the Same Person

24 Sep 2017, 14:37

Hello,

I have an insurance claims database with about 2,000,000 claims records. I'm trying to check the rate of inconsistency for various variables (gender, DOB and race), among multiple records for the same person. I've only been using STATA for about a year and this is my first time working with insurance claims data, so I'm not certain how to go about doing this. Any suggestions?

Thanks for your help!
Christina Martone
Tags: None
Clyde Schechter

Join Date: Apr 2014

Posts: 30065
#2

24 Sep 2017, 16:12

Well, I'm not sure what you mean by the "rate of inconsistency," as I can think of several ways to define this. But, the first step in any of them would be to identify, for each person, whether or not his/her data are consistent. That part I can help you with:

Code:

foreach v of varlist gender dob race {
    by person_id (`v'), sort: gen byte consistent_`v' = (`v'[1] == `v'[_N])
}
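To see what the flag does, here is a minimal sketch on made-up toy data (the values, and the use of a string gender variable, are assumptions for illustration only, not the real claims file). Sorting by the variable within person puts the smallest value first and the largest last, so the first and last values are equal exactly when every record for that person agrees.

```stata
* Toy data (hypothetical): person 1 is consistent, person 2 has
* conflicting gender codes, person 3 has only one record.
clear
input person_id str1 gender
1 "F"
1 "F"
2 "M"
2 "F"
2 "M"
3 "M"
end

* Within each person, sort on gender and compare first with last value.
by person_id (gender), sort: gen byte consistent_gender = (gender[1] == gender[_N])

list, sepby(person_id)
```

Every record of a person carries the same flag value, and a person with a single record is flagged 1. Note also that a missing value is treated as just another value here, so a person with one missing and one non-missing entry is flagged 0.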
Comment
Christina Martone

Join Date: Sep 2017

Posts: 8
#3

24 Sep 2017, 18:33

Thank you for your help and for that code! When I run the code, and look at the resulting frequencies, I get this (using gender as an example):

Code:

. tab consistent_idgendr

consistent_ |
    idgendr |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |     41,467        2.07        2.07
          1 |  1,960,512       97.93      100.00
------------+-----------------------------------
      Total |  2,001,979      100.00

Do the zeros indicate the number of claims with no inconsistencies, while the ones represent the number of claims with an inconsistency in gender? Or vice versa? Also, does this code include people with only one record? And if so, do you know how to check whether his/her data are consistent among only those with more than one record?

By "rate of inconsistency" I mean: among those individuals with more than one claims record, what is the rate of inconsistency for DOB, gender, and race? I realize there are multiple ways to approach this, but I am unfamiliar with this kind of code, so any help or guidance with how to approach this would be greatly appreciated!
Thank you very much for your help!
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30065
#4

24 Sep 2017, 18:42

If you study the code in #2 and understand it, as well as just thinking about the choice of the variable name, you will realize that for this variable, 1 means consistent, and 0 means there is an inconsistency. Again, if you study the code, you will realize that it will give a 1 to any person with only one record, because in that case the data are, in fact, consistent.

You still haven't explained what you mean by rate of consistency. Do you mean the number of people with entirely consistent records divided by the total number of people? That would be, I think, the simplest version, though by no means the only statistic that could be called a consistency rate. If that's what you want:

Code:

egen flag = tag(person_id)
summ consistent_* if flag

The numbers in the "mean" column of the output of -summ- will be the rates of consistency, so defined. If you want inconsistency rates, just subtract those from 1. If you want it in percentage terms, multiply by 100.
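As a rough sketch of that arithmetic, assuming a toy data set in which consistent_gender has already been created (the values are invented for illustration), the rates can be pulled from the results that -summarize- leaves behind in r(mean):

```stata
* Toy data (hypothetical): persons 1 and 3 are consistent, person 2 is not.
clear
input person_id byte consistent_gender
1 1
1 1
2 0
2 0
2 0
3 1
end

* flag is 1 on exactly one record per person, 0 on the rest.
egen flag = tag(person_id)

* Restricting to flag counts each person once, not once per claim.
summarize consistent_gender if flag

* Convert the mean into an inconsistency rate in percent.
scalar cons_rate  = r(mean)
scalar incons_pct = 100 * (1 - r(mean))
display "Consistency rate:   " %6.4f cons_rate
display "Inconsistency rate: " %6.2f incons_pct "%"
```

The restriction to flag matters: a plain -tab- of the consistency variable, as in #3, counts claims, so people with many claims weigh more heavily there than in a per-person rate.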
Comment
Christina Martone

Join Date: Sep 2017

Posts: 8
#5

24 Sep 2017, 20:02

This makes a lot of sense, thank you!

To further clarify, I would like to know the number of people with entirely consistent records divided by the total number of people with more than one claims record. Do you know how to go about doing that?

I apologize for all of these tedious questions, and for my lack of clarification earlier. As I said, I'm quite new to STATA!
Comment

Clyde Schechter

Join Date: Apr 2014
Posts: 30065

25 Sep 2017, 08:03

Code:

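* flag marks one record per person (omit this line if flag already exists from #4)
egen flag = tag(person_id)
* n_gt_1 is 1 for every record of a person with more than one claims record
by person_id, sort: gen byte n_gt_1 = (_N > 1)
summ consistent_* if flag & n_gt_1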
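Putting the pieces together, here is a hedged end-to-end sketch on invented toy data (the values and the numeric dob coding are assumptions). It reports, for each variable, the inconsistency rate among people with more than one record. The variable all_consistent is a hypothetical addition of mine, for the case where "entirely consistent" is meant to cover all variables at once.

```stata
* Toy data (hypothetical): person 2 conflicts on gender, person 4 on dob,
* person 3 has a single record and drops out of the denominator.
clear
input person_id str1 gender dob
1 "F" 100
1 "F" 100
2 "M" 200
2 "F" 200
3 "M" 300
4 "F" 400
4 "F" 401
4 "F" 400
end

* Per-person consistency flags, as in #2.
foreach v of varlist gender dob {
    by person_id (`v'), sort: gen byte consistent_`v' = (`v'[1] == `v'[_N])
}

* One tagged record per person, and an indicator for multi-record people.
egen flag = tag(person_id)
by person_id, sort: gen byte n_gt_1 = (_N > 1)

* Inconsistency rate per variable among multi-record people.
foreach v of varlist gender dob {
    quietly summarize consistent_`v' if flag & n_gt_1
    local pct = 100 * (1 - r(mean))
    display "`v': " %5.1f `pct' "% of " r(N) " multi-record people inconsistent"
}

* Hypothetical extra: consistent on every variable simultaneously.
gen byte all_consistent = consistent_gender & consistent_dob
summarize all_consistent if flag & n_gt_1
```

In real data, race (and any other variable) would simply be added to both varlists and to the expression defining all_consistent.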
