I have a sample of 44k observations with a special characteristic: they are not eligible for a specific insurance under a particular requirement, but they still have it thanks to other criteria.
Now I want to know why these 44k have the insurance even though they are not eligible under the requirement I am focusing on.
One reason may be that some individuals have a disability, which automatically grants them the insurance. For this purpose I have 9 dummies indicating whether an individual has a specific disability or not.
Since the data come from a survey, the question was not asked directly ("Do you have disability X?"), but I want to reconstruct it somehow.
That is why I need to count the observations for each combination of these 9 dummies.
Here is an example to make this clearer.
I want to know how many of the 44k who have the insurance have it because they answered "Yes" to all 9 questions.
To see this I just type "count if A==1 & B==1 & C==1 ... I==1", and the result is, say, 0.
However, they could have answered "Yes" to every question except I (I==0), so I type "count if A==1 & B==1 & C==1 ... H==1 & I==0".
Then I need to repeat this command, setting each of the other dummies (A, B, C, ...) to 0 in turn instead of I, since some individuals could have A, B, C, ..., G and I but not H, and so on.
And so on, each time reducing the number of dummies equal to 1, since an individual could have answered "Yes" to only 8, 7, 6, 5, etc. questions.
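With 9 binary dummies there are 2^9 = 512 possible combinations, so typing one count command per combination quickly becomes impractical. A single pass that groups observations by their full pattern of answers gives every count at once, and lists only the patterns that actually occur. In Stata this is commonly done with something like `egen pattern = concat(A-I)` followed by `tab pattern`, or with `contract A B C D E F G H I`. The sketch below only illustrates the logic in Python; it assumes each observation is a dict with the dummies named A to I and coded 0/1 with no missing values, and the function names are hypothetical.

```python
from collections import Counter

# Assumed names: the nine disability dummies are called A..I as in the post.
DUMMIES = "ABCDEFGHI"


def pattern_counts(rows):
    """Count observations for every observed combination of the nine dummies.

    Each row is a dict mapping a dummy name to 0 or 1. The key is a string such
    as "110000001", one character per dummy in A..I order.
    """
    return Counter("".join(str(row[d]) for d in DUMMIES) for row in rows)


def yes_counts(rows):
    """Count observations by how many of the nine questions were answered 'Yes'."""
    return Counter(sum(row[d] for d in DUMMIES) for row in rows)


if __name__ == "__main__":
    # Toy data made up for illustration only (not the 44k survey sample).
    rows = [
        dict(zip(DUMMIES, [1, 0, 0, 0, 0, 0, 0, 0, 0])),
        dict(zip(DUMMIES, [1, 1, 0, 0, 0, 0, 0, 0, 0])),
        dict(zip(DUMMIES, [1, 1, 0, 0, 0, 0, 0, 0, 0])),
        dict(zip(DUMMIES, [0, 0, 0, 0, 0, 0, 0, 0, 0])),
    ]
    for pattern, n in sorted(pattern_counts(rows).items()):
        print(pattern, n)
    print(dict(sorted(yes_counts(rows).items())))
```

The second function answers the "8, 7, 6, 5, etc. questions" part directly; in Stata, `egen nyes = rowtotal(A-I)` followed by `tab nyes` plays the same role.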
I would also like to know whether you think this approach makes sense.
The question arose as soon as I typed these commands:
count if A==1
count if B==1
...
count if I==1
and then summed the results.
But then I realized that some units in the summed number could have more than one dummy equal to 1, and would therefore be counted more than once. That is why I started planning to count the observations for each combination.
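The concern about double counting is correct: adding up the single "count if X==1" results counts a person once for every "Yes", so someone with both A and B appears twice. For the question of who could have the insurance through a disability, what matters is the number of people with at least one dummy equal to 1, each counted once. In Stata that is a single "count if A==1 | B==1 | ... | I==1", or `egen nyes = rowtotal(A-I)` followed by `count if nyes > 0`. A minimal Python sketch of the difference, under the same assumptions as before (dummies named A to I, coded 0/1, no missing values; function names are hypothetical):

```python
# Assumed names: the nine disability dummies are called A..I as in the post.
DUMMIES = "ABCDEFGHI"


def sum_of_single_counts(rows):
    """Equivalent of running 'count if X==1' for each dummy and adding the results."""
    return sum(row[d] for row in rows for d in DUMMIES)


def count_any_disability(rows):
    """Observations with at least one dummy equal to 1, each counted once."""
    return sum(1 for row in rows if any(row[d] == 1 for d in DUMMIES))


if __name__ == "__main__":
    # Toy data: one person with A only, one with A and B, one with none.
    rows = [
        dict(zip(DUMMIES, [1, 0, 0, 0, 0, 0, 0, 0, 0])),
        dict(zip(DUMMIES, [1, 1, 0, 0, 0, 0, 0, 0, 0])),
        dict(zip(DUMMIES, [0, 0, 0, 0, 0, 0, 0, 0, 0])),
    ]
    print(sum_of_single_counts(rows), count_any_disability(rows))
```

On the toy data the summed figure is 3 while only 2 people have any disability, because the person with A and B is counted twice.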
In the end, once I have all the counts, their total tells me how many of the original 44k could have the insurance because of a disability, and subtracting it from 44k gives how many remain unexplained by disability.
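Because each observation falls into exactly one combination, the combination counts do not overlap, so adding every pattern except the all-"No" one counts each person exactly once; the all-"No" pattern is the remainder. A minimal sketch of that final step, assuming again that every observation has all nine dummies recorded as 0 or 1 (in Stata, missing values would need separate handling); the function name is hypothetical. Note that it counts people who *could* have the insurance through a disability, not necessarily those who obtained it that way.

```python
from collections import Counter

# Assumed names: the nine disability dummies are called A..I as in the post.
DUMMIES = "ABCDEFGHI"
ALL_NO = "0" * len(DUMMIES)


def split_by_disability(rows):
    """Return (with_any_disability, without_disability) for the sample.

    Combination counts are disjoint, so summing every pattern except the
    all-zero one counts each individual exactly once.
    """
    patterns = Counter("".join(str(row[d]) for d in DUMMIES) for row in rows)
    with_disability = sum(n for p, n in patterns.items() if p != ALL_NO)
    without_disability = len(rows) - with_disability
    return with_disability, without_disability


if __name__ == "__main__":
    # Toy data made up for illustration only.
    rows = [
        dict(zip(DUMMIES, [1, 0, 0, 0, 0, 0, 0, 0, 0])),
        dict(zip(DUMMIES, [1, 1, 0, 0, 0, 0, 0, 0, 0])),
        dict(zip(DUMMIES, [0, 0, 0, 0, 0, 0, 0, 0, 0])),
    ]
    print(split_by_disability(rows))
```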