  • teffects psmatch - how much of the data were used?

    All --

    Please forgive me if this is an elementary question. I have searched the treatment effects manual and the internet in general for an answer to this question and have come up short. I suspect that the solution is obvious.

    I am conducting a propensity score matching analysis in Stata 13 using teffects psmatch. My total N is 2123, with 672 "treated" and 1451 "control" individuals. When I run the teffects psmatch command, the output states that the "number of obs = 2123." I can't tell if that is before or after matching. I am currently running everything with the default settings, so nearest neighbor = 1. If each treated individual gets exactly one match (and I keep all my treated individuals), then I should have a matched sample of 2*672.

    Another way to ask my question is -- does teffects psmatch generate a variable (or can I make Stata generate a variable) that indicates the "matched" sample?

    Thanks in advance for your help!

    Jenni

  • #2
    Jenni,

    Check out the generate() option which can be used to specify the names for generated variables that will contain the observation numbers of the matching observations.

    Also note that, by default, teffects psmatch computes the ATE (average treatment effect), which has implications for how Stata performs the matching. You may, in fact, want to specify atet (average treatment effect on the treated) instead.

    Regards,
    Joe
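A minimal sketch of the two suggestions above, assuming the cattaneo2 example dataset and the model that appears later in this thread (neither comes from Jenni's data). The stub name match is arbitrary.

```stata
* Example data shipped with Stata (the dataset used later in this thread)
webuse cattaneo2, clear

* ATET with one nearest neighbor; generate() stores the matches' observation numbers
teffects psmatch (bweight) (mbsmoke mmarried c.mage##c.mage fbaby medu), ///
    atet nneighbor(1) generate(match)

* One match# variable per possible match; ties can create more than one
describe match*
list mbsmoke match1 in 1/5
```

The values in match1, match2, ... are row numbers in the data as sorted at estimation time, so they stop being meaningful if the data are re-sorted afterwards.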


    • #3
      Also note that the algorithm uses matching with replacement, so the same controls could be used repeatedly


      • #4
        Thanks Joe! I added the option "generate(match)" and generated two variables (to my surprise) - "treat1" and "match1". Their values are identical. I'm not sure why that happened or what the utility is of having both variables, but now I can figure out how many of my controls were used in the matching.



        • #5
          I'm having a similar problem. I used "generate(match)" as an option, but it generated four variables: "match1", "match2", "match3", and "match4". This may be an elementary question, but I am curious what this means. I'm trying to find the number of cases it kept (or the number it kicked out). Using attnd in Stata 12, it provided the number in the "treated" and "control" groups based on the matches it made -- but I am not able to find that using teffects psmatch.

          Any help would be appreciated!
          Thanks!
          Cindy


          • #6
            Cindy,

            Stata 13 generates match variables according to the number of neighbors you requested in nneighbors(#). The values of these variables are the observation numbers of the matches. You will have to do a little work to reconstruct the total number of cases with matches and the number of controls used as matches.

            Regards,
            Joe
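The "little work" could look something like the sketch below. It assumes the cattaneo2 example data and an ATET specification; the names obs, touse, k, and in_matched are made up for illustration. It stacks the match variables of the treated observations, reduces them to the distinct controls they point to, and flags those controls in the full data.

```stata
webuse cattaneo2, clear
teffects psmatch (bweight) (mbsmoke mmarried c.mage##c.mage fbaby medu), ///
    atet generate(match)

* match# values refer to the current sort order, so record it before changing the data
generate long obs = _n
generate byte touse = e(sample)

* Stack the matches of the treated observations: one row per (treated, match) pair
tempfile used
preserve
keep if touse & mbsmoke == 1
keep obs match*
reshape long match, i(obs) j(k)
drop if missing(match)

* Reduce to one row per distinct control that served as a match
keep match
duplicates drop
rename match obs
save `used'
restore

* Flag treated observations plus every control used at least once
merge 1:1 obs using `used', keep(master match)
generate byte in_matched = touse & (mbsmoke == 1 | _merge == 3)
drop _merge
tabulate mbsmoke in_matched
```

Because matching is with replacement and ties are kept, the number of flagged controls will generally differ from the number of treated observations.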


            • #7
              Originally posted by Cindy Ann Kilgo
              I'm having a similar problem. I used "generate(match)" as an option, but it generated four variables "match1" "match2" "match3" and "match4." This may be an elementary question, but I am curious what this means?
              If 4 variables were created then there is at least one observation that was matched with four other observations (the output in the Stata window should show it as "max = 4"). Joe Canner has already mentioned that the new variables take on the value of observation numbers. nneighbor(#) is the minimum number of observations that will be matched to any one observation and it defaults to 1.
              You should:

              1. Read the FAQ carefully.

              2. "Say exactly what you typed and exactly what Stata typed (or did) in response. N.B. exactly!"

              3. Describe your dataset. Use list to list data when you are doing so. Use input to type in your own dataset fragment that others can experiment with.

              4. Use the advanced editing options to appropriately format quotes, data, code and Stata output. The advanced options can be toggled on/off using the A button in the top right corner of the text editor.


              • #8
                Originally posted by Cindy Ann Kilgo
                I'm trying to find the number of cases it kept (or the number it kicked out).
                Maybe you mean something like:

                Code:
                clear all
                set more off
                
                webuse cattaneo2
                keep bweight mbsmoke mmarried mage fbaby medu
                
                teffects psmatch (bweight) (mbsmoke mmarried c.mage##c.mage fbaby medu), gen(match) nneighbor(4)
                
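                * number of matches found for each observation
                egen cou = rownonmiss(match*)
                
                * just one example: r(sum) totals the matches over the selected observations,
                * so an observation reused as a match is counted once per use
                quietly summarize cou if mbsmoke, meanonly
                display "number of controls (non smokers) matched with treated: " r(sum)
                
                quietly summarize cou if !mbsmoke, meanonly
                display "number of treated (smokers) matched with controls: " r(sum)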
                but I'm not sure.

                • #9
                  In Stata 12 I use psmatch2, which creates variables indicating whether each observation is on common support and whether it was matched. For nearest-neighbor matching, matching with replacement is usually a good choice (Angrist and Pischke 2008).

                  You can also try exact covariate matching with "ccmatch".

                  Reference: Angrist, J. D., & Pischke, J.-S. (2008). Mostly Harmless Econometrics: An Empiricist's Companion. Princeton University Press.
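For readers who want to see the psmatch2 indicators mentioned above, here is a rough sketch. It assumes the user-written psmatch2 package is installed from SSC and uses the cattaneo2 example data rather than anyone's real data; the covariate list is only illustrative.

```stata
* psmatch2 is user-written (Leuven and Sianesi); install once from SSC
* ssc install psmatch2
webuse cattaneo2, clear

* One nearest neighbor, with replacement (psmatch2's default), common support imposed
psmatch2 mbsmoke mmarried mage fbaby medu, outcome(bweight) neighbor(1) common

* _support flags observations on common support; _weight is nonmissing for
* matched treated observations and for controls that were used as matches
tabulate _treated _support
count if _treated == 0 & _weight < .
```

The final count is the number of distinct controls used at least once, which is the kind of figure earlier posters were trying to recover from teffects psmatch.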


                  • #10
                    Also, be sure to produce box plots and kernel density plots before and after matching, and run a balance check, as suggested in Guo and Fraser's book on PSM. Here are some sections (with graphs and procedures) that you can follow when writing up a report: http://papers.ssrn.com/sol3/papers.c...act_id=2335669
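One way to produce such checks directly after teffects psmatch is the tebalance suite. Note the assumption: tebalance was added in Stata 14, so it is not available in the Stata 13 installation the original poster mentions. The example data and model are again cattaneo2 from elsewhere in this thread.

```stata
webuse cattaneo2, clear
teffects psmatch (bweight) (mbsmoke mmarried c.mage##c.mage fbaby medu), atet

* Standardized differences and variance ratios, raw versus matched
tebalance summarize

* Kernel density and box plots of one covariate, raw versus matched
tebalance density mage
tebalance box mage
```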


                    • #11
                      Please, I need help. I am running teffects psmatch with two nearest neighbors, and I am using the code below to get the number of matched units. However, I do not know how to adjust the code below when there are both match1 and match2.


                      gen ob=_n //store the observation numbers for future use
                      save fulldata,replace // save the complete data set
                      keep if t // keep just the treated group
                      keep match1 // keep just the match1 variable (the observation numbers of their matches)
                      bysort match1: gen weight=_N // count how many times each control observation is a match
                      by match1: keep if _n==1 // keep just one row per control observation
                      ren match1 ob //rename for merging purposes
                      merge 1:m ob using fulldata // merge back into the full data
                      replace weight=1 if t // set weight to 1 for treated observations
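One possible adaptation to several match variables is sketched below. It keeps the poster's names (t, ob, weight, fulldata) and assumes that teffects psmatch with generate(match) has just been run and that the data have not been re-sorted since. The idea is to stack match1, match2, ... into one long variable before counting.

```stata
gen long ob = _n                     // observation numbers that match# refers to
save fulldata, replace               // save the complete data set
keep if t                            // keep just the treated group
keep ob match*                       // keep all match variables, not only match1
reshape long match, i(ob) j(k)       // one row per (treated observation, match) pair
drop if missing(match)               // ties leave some match# slots empty
contract match, freq(weight)         // times each control was used as a match
rename match ob                      // rename for merging purposes
merge 1:1 ob using fulldata          // merge back into the full data
replace weight = 1 if t              // set weight to 1 for treated observations
```

Here weight is only a reuse count for controls. It is not necessarily the weight teffects psmatch applies internally, since the estimator averages over however many matches each treated observation has.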


                      • #12
                        Hi All,

                        I’m very sorry, but I’m still confused about this issue. I’m using Stata 15 to estimate the following model. The output from this model indicates that it is based on 1,879 observations (people). Some variables have missing data, but the results of a logistic regression with the treatment variable and covariates below also indicate a sample of 1,879:

                        teffects psmatch (PostMisdemeanorA_sum) (person_type sex race marital_status cAge PreFelSUM PreMisSUM), generate(match)

                        This generates two new variables, match1 and match2. The frequency distribution for match1 lists a sample size of 1,879 and I gather that this frequency distribution lists the number of matches for each case (this ranges from 1 to 10)? The frequency distribution for match2 lists one case id with a frequency of 1. Does this mean that all but one case had a match and, if so, why does the output from the teffects model list 1,879 observations?

                        Thanks for any help you can provide with this.

                        Warmest Regards,

                        John


                        • #13
                          Hi John,

                          The variables created by teffects psmatch, generate() do not contain any information about frequency of matches. Rather, they store the observation numbers from the matched neighbors. If in your case teffects psmatch, generate() created two new variables, this means that you have at least one case for which two matches were found (which in this case is due to a tie in the propensity score given that you used the default of matching one neighbor). teffects psmatch is not arbitrarily dropping or trimming any observations and uses all observations with valid data across the used variables, in your case N=1,879.

                          Joerg
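The two points above (all valid observations are used, and extra match variables come from ties) can be checked with something like the following. This is a sketch on the cattaneo2 example data, not John's data, and nmatch is a made-up variable name.

```stata
webuse cattaneo2, clear
teffects psmatch (bweight) (mbsmoke mmarried c.mage##c.mage fbaby medu), generate(match)

* Observations used in estimation: should equal "Number of obs" in the output
count if e(sample)

* Number of matches found for each observation; values above 1 indicate ties
egen nmatch = rownonmiss(match*)
tabulate nmatch if e(sample)
```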


                          • #14
                            Many thanks, Joerg! John


                            • #15
                              I've read this post many times and am still uncertain about the proper n to report in a table with ATET results. The full sample N is listed in the output, but I specified a one-to-one match and think the N I should report is (2 x treatment group N). Why doesn't Stata provide the number of observations used for the ATET in the output directly? Am I missing something?
                              Thanks in advance.
                              Cathy
