Trying to do psmatch with non-integer treatment variable

Pedro Ornelas

Join Date: Feb 2018

Posts: 3
#1


09 Feb 2018, 13:28

Hello, and apologies in advance in case I leave out something essential: it's my first post!

I'm trying to run teffects psmatch to get a sense of endogeneity in my model, using:

teffects psmatch (outcomevar) (treatmentvar varlist), gen(match)

However, when I run it, Stata reports 'treatment variable must contain nonnegative integers'. My variable takes only non-integer values between 0 and 1.

I was wondering whether there is any other way of running this, either with another command or by modifying the variable. Thank you so much!
William Lisowski

Join Date: Dec 2014

Posts: 10150
#2

10 Feb 2018, 19:10

Welcome to Statalist.

Your treatment variable takes values between 0 and 1, but is there a limited number of values it takes? That is, are they something like .25, .50, and .75? Because what psmatch wants is an indicator of treatment levels. So with my example you would need a new variable with, for example, 1 when treatmentvar is .25, 2 when it is .50, and 3 when it is .75.

An easy way to get this new variable is using the group function of the egen command. Here's an example on some made-up data.

Code:

. egen t_level = group(t)

. list, clean noobs

           t   t_level
         .75         4
          .5         3
         .75         4
         .25         1
    .3333333         2
          .5         3
         .25         1
         .75         4
         .75         4
          .5         3

See help egen for further details.

If your treatment variable has a wide range of values - if for example it could be .01, .02, ..., .99 - then it seems unlikely to me that propensity score matching will work without a large amount of data.
Pedro Ornelas

Join Date: Feb 2018

Posts: 3
#3

12 Feb 2018, 14:59

Originally posted by William Lisowski

If your treatment variable has a wide range of values - if for example it could be .01, .02, ..., .99 - then it seems unlikely to me that propensity score matching will work without a large amount of data.

Hello, thanks so much for that! In fact, my variable is very finely divided (e.g. 0.001, 0.002, etc.), but I have a large dataset (250,000 observations).

Nonetheless, I just did what you suggested and it came up with an error message saying "treatment variable must have 2 levels, but 6000 were found."
William Lisowski

Join Date: Dec 2014

Posts: 10150
#4

12 Feb 2018, 15:30

I see I misinterpreted the output of help teffects psmatch - it appears it can take any integers, but only two of them. (I was overestimating the capability of propensity score matching.)

So where I wrote earlier

If your treatment variable has a wide range of values - if for example it could be .01, .02, ..., .99 - then it seems unlikely to me that propensity score matching will work without a large amount of data.

it now appears that the operational definition of "wide range" is "more than 2".
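One way to check the "more than 2" limit before calling teffects psmatch is to count the distinct levels that egen's group() function produces. The sketch below is a hypothetical illustration, not part of the original reply: it re-enters the made-up values of t from the listing in post #2, and the local macro name nlev is chosen here for the example.

```stata
* Made-up data: the values of t from the listing in post #2.
clear
input t
.75
.5
.75
.25
.3333333
.5
.25
.75
.75
.5
end

* group() numbers the distinct values of t as 1, 2, 3, ... in sorted order.
egen t_level = group(t)

* After summarize, meanonly, r(max) is the highest group number,
* i.e. the number of distinct treatment levels.
summarize t_level, meanonly
local nlev = r(max)
display "treatment levels found: `nlev'"

* teffects psmatch accepts only two treatment levels.
if `nlev' != 2 {
    display as error "teffects psmatch accepts exactly 2 treatment levels"
}
```

On these data the check finds 4 levels, so t would have to be collapsed to two levels before it could serve as the treatment in teffects psmatch.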

I'm not familiar with using propensity score matching to test for endogeneity, so perhaps some other Statalist member will have some ideas for your setup. But it looks to me that propensity score matching will not yield the test you seek.
Rich Goldstein

Join Date: Mar 2014

Posts: 4493
#5

12 Feb 2018, 19:18

Given that the help file says:

teffects psmatch (ovar) (tvar tmvarlist [, tmodel]) [if] [in]
[weight] [, stat options]

ovar is a binary, count, continuous, fractional, or nonnegative outcome
of interest.

I don't think that the OP is giving enough information.
William Lisowski

Join Date: Dec 2014

Posts: 10150
#6

12 Feb 2018, 19:51

Rich, I don't quite catch the relevance of the material you quote. My understanding is that the problem lies with what Pedro calls treatmentvar which corresponds to what the help file calls tvar and describes thusly:

tvar must contain integer values representing the treatment levels.

tmvarlist specifies the variables that predict treatment assignment in the treatment
model. Only two treatment levels are allowed.

which sort of hides the restriction of tvar to two levels in the description of tmvarlist.
Pedro Ornelas

Join Date: Feb 2018

Posts: 3
#7

13 Feb 2018, 04:27

Originally posted by William Lisowski

Rich, I don't quite catch the relevance of the material you quote. My understanding is that the problem lies with what Pedro calls treatmentvar which corresponds to what the help file calls tvar and describes thusly:

which sort of hides the restriction of tvar to two levels in the description of tmvarlist.

Hi William,

Thanks for the reply. I've decided to generate a new variable that is 1 if treatmentvar is above its mean (and 0 otherwise), and with that I've been able to run it!
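Pedro's workaround can be sketched as follows. This is a minimal illustration on simulated data, assuming a single covariate and a continuous treatment strictly between 0 and 1; the names x1 and treat_hi and the seed are hypothetical, chosen only for this example. Note that splitting at the mean changes the question being asked: the estimate is the effect of being above versus below the mean, not the effect of each additional unit of treatmentvar.

```stata
* Simulated stand-in data (hypothetical): one covariate x1 and a
* continuous treatment variable strictly between 0 and 1.
clear
set seed 12345
set obs 1000
generate x1 = rnormal()
generate treatmentvar = invlogit(x1 + rnormal())
generate outcomevar = 2*treatmentvar + x1 + rnormal()

* Split treatmentvar at its sample mean: 1 if above the mean, 0 otherwise,
* and missing wherever treatmentvar itself is missing.
summarize treatmentvar, meanonly
generate byte treat_hi = (treatmentvar > r(mean)) if !missing(treatmentvar)
tabulate treat_hi

* The two-level indicator meets teffects psmatch's requirement;
* the treatment model is the default logit of treat_hi on x1.
teffects psmatch (outcomevar) (treat_hi x1), generate(match)
```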
Constantin Alba

Join Date: Sep 2014

Posts: 80
#8

11 May 2018, 20:24

I have the same problem as Pedro here... does anyone know what might be the problem?

I am also using

Code:

teffects psmatch (outcomevar) (treatmentvar varlist), gen(match)

with two different outcome variables. With one of them it works, even though that outcome variable has NEGATIVE values; with the other it does not work, although it has NO NEGATIVE values.
Both also have no missing values (but having missing values didn't matter either).

If I simply use

Code:

psmatch (treatmentvar varlist), out(outcomevar) logit ate

it does work, but since it does not give me the SE for the ATE (and other statistics), I wanted to use teffects.

The problematic outcome variable has the following descriptives:

type: numeric (float)
range: [3.1780539,18.064938]
units: 1.000e-07
unique values: 609
missing .: 7879/51712
mean: 5.23944
std. dev: 1.54297

P.S. I use Stata 13 MP.

Hope someone can help, thank you!
