How to create a loop that counts the observations for each combination of variables?

Mike McDonald

Join Date: Sep 2023

Posts: 12
#1


16 Oct 2023, 04:44

I have a sample of 44k observations that have a special characteristic: they are not eligible for a specific insurance (based on a particular requirement), but still they have it thanks to other criteria.
Now, I want to know why these 44k have the insurance even if they are not eligible based on the requirement I am focusing on.
A reason for that may be that individuals could suffer from a disability, which automatically grants them the insurance. For this purpose I have 9 dummies indicating whether an individual has a specific disability or not.
Actually, since data come from a survey, the question was not directly "do you have disability X?", but somehow I want to reconstruct it.
That is why I need to count the observations for each combination of these 9 dummies.

I'll make an example to be clearer:

I want to know how many of the 44k that have the insurance have it because they answered "Yes" to all the 9 questions.
To see this I just type: "count if A==1 & B==1 & C==1 .... I==1" and the number is, say, 0.
However, they could have answered "Yes" to each of the 9 except for I (=0), hence I type: "count if A==1 & B==1 & C==1 .... H==1"
Again, I need to repeat this count command leaving out each dummy (A, B, C...) in turn, as some individuals could have A, B, C, ... G and I but not H, and so on.
And so on, always reducing the number of variables equal to 1, since the individual could have answered "Yes" to only 8, 7, 6, 5, etc. questions.

Also, I would like to know your thoughts on whether this mechanism makes sense or not.

The question arose as soon as I typed these commands:
count if A==1
count if B==1
.
.
.
count if I==1
and then I summed the figures.
But then I thought that the units in the summed number could have more than just one dummy equal to 1, which is why I started planning to count the observations for each combination.

In the end, once I have all the counts I can subtract them from 44k to see how many of the original sample have the insurance due to disability.
Tags: count, loop
daniel klein

Join Date: Mar 2014

Posts: 3850
#2

16 Oct 2023, 05:22

Originally posted by Mike McDonald

That is why I need to count the observations for each combination of these 9 dummies.

Are you sure you need to distinguish between all 2^9 = 512* combinations of disabilities?

Originally posted by Mike McDonald

In the end, once I have all the counts I can subtract them from 44k to see how many of the original sample have the insurance due to disability.

sounds more like you just want to know how many respondents have any of the 9 disabilities. And, that is simply

Code:

count if inlist(1, A, B, ..., I)

* Edit 2: Actually, there are only 511 combinations because having none of the 9 disabilities would not qualify as a disability.
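
If the count for each observed combination is wanted after all, a loop is probably not needed. The following is only a minimal sketch on made-up data, assuming three hypothetical 0/1 indicators A, B, and C in place of the nine. It uses contract, which reduces the data to one row per observed combination, with the frequency stored in a new variable (named n here by assumption).

```stata
* Toy data (made up): three hypothetical 0/1 indicators stand in for the nine
clear
input A B C
1 0 0
1 1 0
0 0 0
1 1 0
0 1 1
end

* Respondents with at least one indicator equal to 1
count if inlist(1, A, B, C)

* One row per observed combination; n holds the number of respondents
contract A B C, freq(n)
list, noobs
```

Note that contract replaces the data in memory, so it may be wise to wrap it in preserve and restore. Combinations that never occur are not listed unless the zero option is added.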

Last edited by daniel klein; 16 Oct 2023, 05:34. Reason: anyof() is Mata, sorry, the function of interest here is inlist()
daniel klein

Join Date: Mar 2014

Posts: 3850
#3

16 Oct 2023, 05:32

Originally posted by Mike McDonald

Also, I would like to know your thoughts on whether this mechanism makes sense or not.

You do not provide enough information to answer that question. It should be obvious that if there are reasons other than disability to have insurance, you cannot conclude that someone has insurance because of their disability; they might fulfill other criteria, too.
Mike McDonald

Join Date: Sep 2023

Posts: 12
#4

16 Oct 2023, 07:40

Originally posted by daniel klein

Code:

count if inlist(1, A, B, ..., I)

Is this command counting all the observations that have answered 1 to at least one of the 9 questions?
Mike McDonald

Join Date: Sep 2023

Posts: 12
#5

16 Oct 2023, 07:41

Originally posted by daniel klein

You do not provide enough information to answer that question. It should be obvious that if there are reasons other than disability to have insurance, you cannot conclude that someone has insurance because of their disability; they might fulfill other criteria, too.

No, exactly, I cannot conclude that. I have only 3 other variables that could help me understand why they have the insurance: younger than 18, older than 65, and blind yes/no. I do not have anything else, unfortunately. I know that this is not perfect, as there are many more criteria to take into account for being granted the insurance; still, I don't have enough data and don't need to be super precise. Some observations will still leave me with the question "why do they have insurance?", but my aim is to (hopefully) explain most of those 44k with the variables I have.
Going back to my question, hoping to be clearer: e.g. one of the 9 dummies, say A, is the question "did you receive social security income because of disability in the previous year?" and another one, say B, is "have you ever been told to have autism?".
If I just count units with A==1 I might be counting some of those with B==1 and vice versa, which is why I was thinking about the combinations.
daniel klein

Join Date: Mar 2014

Posts: 3850
#6

16 Oct 2023, 08:26

Originally posted by Mike McDonald

is this command counting all the observations that have answered 1 to at least one of the 9 questions?

Yes.

Originally posted by Mike McDonald

[...]
If I just count units with A==1 I might be counting some of those with B==1 and vice versa, which is why I was thinking about the combinations.

Well, for all respondents for whom more than one indicator is true, you cannot really be certain as to why they have insurance. Thus, you could

Code:

generate byte sum_of_9 = (A+B+...+I)

Then

Code:

sum_of_9 == 0

have insurance because of unobserved criteria, and

Code:

sum_of_9 == 1

have insurance because of 1 observed, identifiable criterion. You could identify the respective criterion with something like

Code:

generate byte why = .
local i 0
foreach var in A B ... I {
    local ++i
    replace why = `i' if (`var' == 1) & (sum_of_9 == 1)
}
label define why 1 "A" 2 "B" ... 9 "I"
label values why why

Then,

Code:

tabulate why

would show the frequencies for all 9 indicators.

There is probably an egen function for the second part but I do not recall it.

Last edited by daniel klein; 16 Oct 2023, 08:29.
Mike McDonald

Join Date: Sep 2023

Posts: 12
#7

16 Oct 2023, 09:30

Thanks a lot! I get the rationale behind your code, but when I generate sum_of_9 Stata keeps creating a variable full of missing values.
daniel klein

Join Date: Mar 2014

Posts: 3850
#8

16 Oct 2023, 09:38

Originally posted by Mike McDonald

when I generate sum_of_9 Stata keeps creating a variable full of missing values

Then you probably have missing values in the indicator variables. You do not show any data, so I cannot know that.

Substantively, missing values make conclusions even more complicated if not impossible. Say, a respondent has "receive[d] social security income because of disability in the previous year" (A==1) but has a missing value on whether they "have ever been told to have autism" (B==.). What do you make of this? Do they have insurance because of the disability? Maybe. We cannot know because the respondent might have autism or they might not have autism. Basically, you cannot say anything about respondents with missing values.
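
The all-missing result fits how Stata arithmetic treats missing values: a sum built with + is missing as soon as one term is missing. Below is a minimal sketch on made-up data, again assuming three hypothetical indicators A, B, and C in place of the nine. It contrasts the + sum with the egen functions rowtotal(), which treats missing values as 0, and rowmiss(), which counts the missing indicators per respondent. The new variable names are illustrative only.

```stata
* Toy data (made up): a dot marks a missing answer
clear
input A B C
1 0 0
1 . 0
. . .
0 1 1
end

* Sum with +: missing whenever any indicator is missing
generate byte sum_plus = A + B + C

* Row total that treats missing indicators as 0
egen byte sum_row = rowtotal(A B C)

* Number of missing indicators per respondent
egen byte n_miss = rowmiss(A B C)

list, noobs
```

A row total of 1 together with n_miss > 0 still does not establish that exactly one criterion applies, for the reason given above: the unanswered indicators might be 0 or 1.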
Mike McDonald

Join Date: Sep 2023

Posts: 12
#9

17 Oct 2023, 03:55

Yes, unfortunately I do have plenty of missing values. Thanks a lot for the clarification! I guess I'll go with the inlist() approach.