
  • Trying to do psmatch with non-integer treatment variable

    Hello, and apologies in advance if I leave out something essential; this is my first post!

    I'm trying to run propensity score matching to get a sense of endogeneity in my model, using:

    teffects psmatch (outcomevar) (treatmentvar varlist), gen(match)

    However, when I run it, Stata reports 'treatment variable must contain nonnegative integers'. My treatment variable takes only non-integer values between 0 and 1.

    Is there another way to run this, either with a different command or by modifying the variable? Thank you so much!


  • #2
    Welcome to Statalist.

    Your treatment variable takes values between 0 and 1, but does it take only a limited number of distinct values? That is, are they something like .25, .50, and .75? What psmatch wants is an indicator of treatment levels. So with my example you would need a new variable that is, for example, 1 when treatmentvar is .25, 2 when it is .50, and 3 when it is .75.

    An easy way to create this new variable is the group() function of the egen command. Here's an example on some made-up data.
    Code:
    . egen t_level = group(t)
    
    . list, clean noobs
    
               t   t_level  
             .75         4  
              .5         3  
             .75         4  
             .25         1  
        .3333333         2  
              .5         3  
             .25         1  
             .75         4  
             .75         4  
              .5         3
    See help egen for further details.

    If your treatment variable has a wide range of values - if for example it could be .01, .02, ..., .99 - then it seems unlikely to me that propensity score matching will work without a large amount of data.
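To see how many distinct values a treatment variable actually takes before deciding whether grouping makes sense, one option is to tag the first observation of each value and count the tags. This is a minimal sketch on the same made-up data as above; the variable name first_t is hypothetical.

```stata
* Rebuild the made-up data from the example above
clear
input t
.75
.5
.75
.25
.3333333
.5
.25
.75
.75
.5
end

* Tag the first observation of each distinct nonmissing value of t
bysort t: generate byte first_t = (_n == 1) if !missing(t)

* The number of tags is the number of distinct treatment values
count if first_t == 1
display "distinct values of t: " r(N)
```

A count in the hundreds or thousands here is the "wide range" situation described in the previous paragraph.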



    • #3
      Originally posted by William Lisowski
      If your treatment variable has a wide range of values - if for example it could be .01, .02, ..., .99 - then it seems unlikely to me that propensity score matching will work without a large amount of data.
      Hello, thanks so much for that! In fact, my treatment variable is very finely divided (e.g. 0.001, 0.002, etc.), but I have a large dataset (250,000 observations).

      Nonetheless, I did what you suggested, and Stata returned the error "treatment variable must have 2 levels, but 6000 were found."



      • #4
        I see I misread help teffects psmatch: the treatment variable can take any nonnegative integer values, but only two distinct ones. (I was overestimating the capability of propensity score matching.)

        So where I wrote earlier

        If your treatment variable has a wide range of values - if for example it could be .01, .02, ..., .99 - then it seems unlikely to me that propensity score matching will work without a large amount of data.
        it now appears that the operational definition of "wide range" is "more than 2".

        I'm not familiar with using propensity score matching to test for endogeneity, so perhaps another Statalist member will have an idea for your setup. But it looks to me as though propensity score matching will not yield the test you seek.



        • #5
          Given that the help file says:
          teffects psmatch (ovar) (tvar tmvarlist [, tmodel]) [if] [in] [weight] [, stat options]

          ovar is a binary, count, continuous, fractional, or nonnegative outcome of interest.
          I don't think the OP is giving enough information.
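To illustrate the distinction in that syntax between the outcome (which may be continuous) and the treatment (which may not), here is a sketch along the lines of the examples in the teffects documentation. It assumes the cattaneo2 example dataset can be fetched with webuse, so an internet connection is needed.

```stata
* Example data from Stata's web site (requires internet access)
webuse cattaneo2, clear

* Outcome bweight (birthweight) is continuous; treatment mbsmoke is a 0/1 indicator
teffects psmatch (bweight) (mbsmoke mmarried c.mage##c.mage fbaby medu)
```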



          • #6
            Rich, I don't quite catch the relevance of the material you quote. My understanding is that the problem lies with what Pedro calls treatmentvar which corresponds to what the help file calls tvar and describes thusly:

            tvar must contain integer values representing the treatment levels.

            tmvarlist specifies the variables that predict treatment assignment in the treatment
            model. Only two treatment levels are allowed.
            which sort of hides the restriction of tvar to two levels in the description of tmvarlist.
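Given those restrictions (nonnegative integer values, exactly two levels), a quick pre-check before calling teffects psmatch could look like the sketch below. It uses the shipped auto dataset with foreign standing in for the treatment variable; the local macro tvar is a hypothetical name, so swap in your own variable.

```stata
* Shipped example data; foreign (0/1) stands in for the treatment variable
sysuse auto, clear
local tvar foreign

* 1) Every nonmissing value must be a nonnegative integer
assert `tvar' >= 0 & `tvar' == int(`tvar') if !missing(`tvar')

* 2) There must be exactly two distinct levels
quietly tabulate `tvar'
assert r(r) == 2
display "`tvar' passes both checks"
```

If either assert fails, Stata stops with an error, which is the same outcome teffects would give but with a clearer pointer to which condition is violated.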



            • #7
              Originally posted by William Lisowski
              Hi William,

              Thanks for the reply. I've decided to generate a new variable that equals 1 if treatmentvar is above its mean (and 0 otherwise), and with that I've been able to run it!
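For anyone landing here with the same issue, the approach Pedro describes can be sketched as follows. The data are simulated (x1, t and t_high are hypothetical names). Note that splitting at the mean discards the variation within each group, so the estimated effect is that of being above versus below the mean, not of the original continuous treatment.

```stata
* Simulated data: x1 is a covariate, t a fractional treatment in (0,1)
clear
set obs 1000
set seed 12345
generate double x1 = rnormal()
generate double t  = invlogit(x1 + rnormal())

* Binary treatment: 1 if t is above its sample mean, 0 otherwise
summarize t, meanonly
generate byte t_high = (t > r(mean)) if !missing(t)
tabulate t_high

* With real data, t_high would then go into the treatment model, e.g.
* teffects psmatch (outcomevar) (t_high x1), gen(match)
```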



              • #8
                I have the same problem as Pedro. Does anyone know what might be causing it?

                I am also using
                Code:
                teffects psmatch (outcomevar) (treatmentvar varlist), gen(match)
                with two different outcome variables. With one of them it works, even though that outcome variable has negative values; with the other it does not, although it has no negative values.
                Neither has missing values either (though whether missing values were present made no difference).

                If I simply use
                Code:
                psmatch2 treatmentvar varlist, outcome(outcomevar) logit ate
                it works, but because this approach does not give me the standard error of the ATE (or other statistics), I wanted to use teffects instead.
                The problematic outcome variable has the following descriptives:

                type: numeric (float)
                range: [3.1780539,18.064938]
                units: 1.000e-07
                unique values: 609
                missing .: 7879/51712
                mean: 5.23944
                std. dev: 1.54297

                P.S. I use Stata 13 MP.

                I hope someone can help. Thank you!
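Since the two outcomes differ in their missing values (7,879 of 51,712 observations are missing for the problematic one), and teffects only uses observations with nonmissing outcome, treatment and covariates, one thing worth checking (assuming the error is the same one Pedro got) is which treatment values remain in each outcome's estimation sample. A minimal sketch using the shipped auto data as a stand-in: rep78 (which has missing values) plays the outcome, foreign the treatment, and mpg and weight the covariates; touse is a hypothetical name.

```stata
* Shipped example data as a stand-in for the real variables
sysuse auto, clear

* Mark observations with nonmissing outcome (rep78) and treatment (foreign)
generate byte touse = !missing(rep78, foreign)

* Also exclude observations with missing covariates (mpg, weight)
markout touse mpg weight

* Which treatment levels are present in the usable sample?
tabulate foreign if touse
```

Running the same lines once per outcome variable and comparing the tables would show whether the treatment variable looks different in the two samples.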
