Coarsened Exact Matching (CEM): assigning to treated and control groups

Sofiya Volvakova

Join Date: Jan 2023
Posts: 50

28 Oct 2023, 04:13

Hey Stata List,

I am exploring differences in how school discipline is administered across ethnic groups. I have a dichotomous DV (1 = yes, 0 = no) and a categorical IV with eight categories. I would like to create seven treated groups and one control group (White) to see whether disproportionality in school suspension exists between White and non-White school children.

How do I create treated and control groups from the data snippet below?

Code:

. ssc install dataex
checking dataex consistency and verifying not already installed...
all files already exist and are up to date.

. dataex W1ExcludeYP W1ethgrpYP

----------------------- copy starting from the next line -----------------------
* Example generated by -dataex-. To install: ssc install dataex
clear
input float W1ExcludeYP int W1ethgrpYP
0 1
0 4
1 2
. 1
0 1
0 1
0 3
0 1
1 1
0 1
0 1
0 1
0 6
0 1
0 1
0 1
0 1
0 3
0 3
0 1
0 1
. 4
0 5
0 1
0 1
0 1
0 3
0 1
0 5
0 1
0 1
0 1
0 6
1 1
1 4
0 1
0 1
1 1
0 1
0 1
0 1
0 1
0 1
0 1
0 1
0 1
0 1
0 1
0 8
0 1
0 1
0 1
0 1
0 1
. 1
0 4
0 6
0 1
0 1
0 1
. 4
0 5
0 1
1 6
0 1
0 1
. 7
. 1
0 4
0 5
0 1
0 1
0 7
0 1
0 1
0 1
. 8
1 1
0 1
0 2
0 1
0 1
0 1
0 4
0 1
0 1
0 1
0 1
1 3
1 1
0 1
0 1
0 1
1 4
0 1
0 1
0 4
0 1
. 5
0 3
end
label values W1ethgrpYP W1ethgrpYP
label def W1ethgrpYP 1 "White", modify
label def W1ethgrpYP 2 "Mixed", modify
label def W1ethgrpYP 3 "Indian", modify
label def W1ethgrpYP 4 "Pakistani", modify
label def W1ethgrpYP 5 "Bangladeshi", modify
label def W1ethgrpYP 6 "Black Caribbean", modify
label def W1ethgrpYP 7 "Black African", modify
label def W1ethgrpYP 8 "Other", modify
------------------ copy up to and including the previous line ------------------

Listed 100 out of 15770 observations
Use the count() option to list more

.

Thank you!

Last edited by Sofiya Volvakova; 28 Oct 2023, 04:17.


Jared Greathouse

Join Date: Sep 2021

Posts: 2172
#2

28 Oct 2023, 05:28

Code:

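* treat = 1 for groups 2-8 (non-White), 0 for White (1); missing ethnicity stays missing
g byte treat = inrange(W1ethgrpYP,2,8) if !missing(W1ethgrpYP)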
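The indicator above pools all seven non-White groups into a single treated group. If the goal is seven separate treated-vs-White comparisons, one hypothetical sketch is to build one indicator per group, coded 1 for that group, 0 for White, and missing for everyone else. The names treat2 through treat8 are illustrative, not from the original post.

```stata
* Hypothetical sketch: one 0/1 indicator per non-White group (codes 2-8) vs White (code 1).
* Pupils in neither group of a pair are left missing, so each treatk defines its own sample.
forvalues k = 2/8 {
    generate byte treat`k' = (W1ethgrpYP == `k') if inlist(W1ethgrpYP, 1, `k')
    * Pull the group's value label (e.g. "Black Caribbean") into the variable label
    local grp : label (W1ethgrpYP) `k'
    label variable treat`k' "`grp' (1) vs White (0)"
}
* Check the mapping for one indicator, including the missing category
tabulate W1ethgrpYP treat6, missing
```

Because estimation commands drop observations with missing values, -logistic W1ExcludeYP treat6- would use only Black Caribbean and White pupils.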
Sofiya Volvakova

Join Date: Jan 2023

Posts: 50
#3

29 Oct 2023, 14:25

Hi Jared,

I think I explained it poorly the first time around. I have a dichotomous DV: students who reported experiencing school discipline are coded 1, and those who did not are coded 0. Youths' race/ethnicity should be the treatment variable in the analysis, measured with eight mutually exclusive categories: White, Black African, Black Caribbean, etc.

Do you know how to go about this?
Clyde Schechter

Join Date: Apr 2014

Posts: 30117
#4

29 Oct 2023, 14:38

I would like to create seven treated groups and one control group (pertaining to White race) to see if disproportionality in school suspension exists between White and non-White school children.

What do you mean by this? Do you want to partition your data set into 8 subsets and save each one separately? What would you do with those data sets?

When you say you want "to see if disproportionality in school suspension exists between White and non-White school children," it sounds like you are not actually interested in 8 groups, but just White vs. non-White, which is two groups. In any case, if you have no other variables to adjust for, this kind of disproportionality can be examined quite simply with -tab W1ExcludeYP W1ethgrpYP, col-, and if you want a test statistic, just add the -chi2- option. If you are only interested in White vs. non-White, you can create a variable that distinguishes those two groups with -gen White = 1.W1ethgrpYP- and then -tab W1ExcludeYP White, col-.
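The tabulations described above can be written out as follows. This is a sketch that assumes the -dataex- data are in memory; the indicator name nonwhite is hypothetical and is built so that pupils with missing ethnicity stay missing instead of being counted as non-White.

```stata
* Share excluded within each of the eight groups, with a Pearson chi-squared test
tabulate W1ExcludeYP W1ethgrpYP, column chi2

* Collapsed White vs non-White comparison; missing ethnicity stays missing
generate byte nonwhite = (W1ethgrpYP != 1) if !missing(W1ethgrpYP)
tabulate W1ExcludeYP nonwhite, column chi2
```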

If you do plan to introduce covariates to adjust for, then you will probably want to do a logistic or probit regression. There is no additional setup needed for this. If you want to use the 8-level ethnicity variable it's just:

Code:

logistic W1ExcludeYP i.W1ethgrpYP list_covariates_here

If you specifically want to just compare White and non-White, you can either use the White variable I referred to above, or you can

Code:

logistic W1ExcludeYP 1.W1ethgrpYP list_covariates_here

Note: In all of the above, references to 1. are used because your -dataex- output shows that your W1ethgrpYP variable is coded with White = 1.
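A sketch of the regression route, left unadjusted here so it runs on the two variables in the -dataex- extract; covariates would go after ib1.W1ethgrpYP. The ib1. prefix makes White (code 1) the explicit base level, so each odds ratio compares one group with White, and -margins- with the r. contrast operator reports each group's predicted probability of exclusion minus White's. On the 100-observation extract some groups (e.g. codes 5, 7 and 8) have no exclusions, so -logistic- will drop them; the full data may not have that problem.

```stata
* Unadjusted odds ratios for each group relative to White (ib1. sets level 1 as base)
logistic W1ExcludeYP ib1.W1ethgrpYP

* Predicted probability of exclusion in each group
margins W1ethgrpYP
* Each group's predicted probability minus White's (r. = contrast with the reference level)
margins r.W1ethgrpYP
```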
Sofiya Volvakova

Join Date: Jan 2023

Posts: 50
#5

29 Oct 2023, 14:54

Hi Clyde, I want to partition the data into 8 subsets because not every race/ethnicity is affected by school discipline disproportionality equally. Hence, the comparisons should be Black Caribbean vs. White, Black African vs. White, Indian vs. White, etc., which would provide a more nuanced understanding of the overall problem. I have a number of controls to include, such as misbehaviour, SES, school type, and region.

Initially, I ran all my models using logistic regression. However, I read the paper by Lehmann, "Race and Ethnicity Effects in School Discipline: A Coarsened Exact Matching Analysis", and want to attempt the same approach.

Last edited by Sofiya Volvakova; 29 Oct 2023, 15:05.
Clyde Schechter

Join Date: Apr 2014

Posts: 30117
#6

30 Oct 2023, 09:29

I want to partition the data into 8 subsets

I still don't know what you mean by this. The information about the subsets is already carried by the variable W1ethgrpYP. Stata data sets do not have partitions. To carry out analyses with the kind of comparisons/contrasts you mention, you need only use i.W1ethgrpYP in your analysis.

The reference you give to the Lehmann paper is incomplete. It may be a paper well known to everyone in your field, but for others, the reference does not give enough information to even search for the paper. Even with a complete reference, there is a good chance that once found, the paper would be behind a paywall. It is better to provide a link to an open copy (if one is legally available) or to excerpt or paraphrase carefully the relevant section(s) describing the methods you wish to implement.
Sofiya Volvakova

Join Date: Jan 2023

Posts: 50
#7

30 Oct 2023, 11:02

Hi Clyde, thank you for staying on this issue! The idea of that paper is that a simple regression does not always ensure adequate between-group balance on the measured covariates. The author therefore uses CEM to compensate for non-equivalence in confounders, pruning observations from the treatment and control groups so that the remaining cases are balanced on the measured covariates. In that design, youths' race/ethnicity is the treatment variable, measured trichotomously with the mutually exclusive categories Black, Hispanic, and White (in my case there will, of course, be more categories).

The analysis has three steps. First, they present unadjusted average treatment effects on the treated (ATTs) of race/ethnicity on the outcome (discipline), using binary logistic regression with no covariates. Second, they present regression-adjusted estimates of the ATTs, using binary logistic regression with all of the control variables included. Finally, they use CEM to create approximately identical treatment and control groups, and binary logistic regression is used to estimate the ATTs among the matched samples.

Last edited by Sofiya Volvakova; 30 Oct 2023, 11:19.
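A minimal sketch of the three-step design described above, for a single contrast: Black Caribbean (code 6) vs. White (code 1). It assumes the community-contributed -cem- package is installed (ssc install cem), and it uses the hypothetical covariate names misbehave, ses, schtype and region in place of the actual controls. It is an illustration under those assumptions, not the paper's code.

```stata
* Requires -cem- (ssc install cem).
* misbehave, ses, schtype and region are HYPOTHETICAL names for your controls.
preserve
keep if inlist(W1ethgrpYP, 1, 6)
generate byte treated = (W1ethgrpYP == 6)

* Step 1: unadjusted estimate (no covariates)
logit W1ExcludeYP treated, or

* Step 2: regression-adjusted estimate
logit W1ExcludeYP treated misbehave ses i.schtype i.region, or

* Step 3: coarsened exact matching on the covariates (default automatic coarsening;
* categorical covariates may need explicit cutpoints, see -help cem-),
* then a logit weighted by the cem_weights variable that -cem- creates
cem misbehave ses schtype region, treatment(treated)
logit W1ExcludeYP treated [iweight = cem_weights], or
restore
```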
Clyde Schechter

Join Date: Apr 2014

Posts: 30117
#8

30 Oct 2023, 12:14

Thank you for clarifying that. But I still don't understand what you mean when you refer to "partition"ing your data set into 8 racial groups. Nothing in the -cem- command requires that. I am not a regular user of -cem-, but I think I understand it well enough to say that all you will have to do is specify W1ethgrpYP in the -treatment()- option of your -cem- command, and it will (attempt to) balance those groups on whatever variables you list in the variable list of the -cem- command.
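The following hypothetical sketch shows that nothing has to be saved as separate data sets even if the pairwise comparisons are run one at a time: it loops over the seven non-White codes, temporarily keeps one group plus White, and runs -cem- with a 0/1 treatment. This mirrors the pairwise comparisons described earlier in the thread rather than passing the eight-level W1ethgrpYP to treatment() at once; whether -cem- handles a multi-valued treatment directly is worth confirming in -help cem-. It assumes -cem- is installed (ssc install cem), and misbehave, ses, schtype and region are placeholder names.

```stata
* Hypothetical sketch: matched estimate for every non-White group (2-8) vs White (1).
* preserve/restore brings back the full sample after each contrast.
forvalues k = 2/8 {
    preserve
    keep if inlist(W1ethgrpYP, 1, `k')
    generate byte treated = (W1ethgrpYP == `k')
    cem misbehave ses schtype region, treatment(treated)
    logit W1ExcludeYP treated [iweight = cem_weights]
    estimates store cem_`k'
    restore
}
* Collect the treated coefficient from every contrast, shown as odds ratios (eform)
estimates table cem_*, keep(treated) eform b se
```

If a group has too few matched cases, -cem- or -logit- may fail for that contrast; wrapping those lines in -capture noisily- is one way to keep the loop going.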