How to do pscore (propensity score matching) by industry with a loop

  • #2
    Dear all,

    I'm trying to do propensity score matching with pscore. My data cover about 1 million firms in 36 industries, and I want to match the control group within each industry. My code is as follows:

    Code:
    use firms.dta
    global vars lnl lntfp export valueadd

    forvalues i=1(1)36{
    qui pscore treat $vars if industry==`i', pscore(mypscore) blockid(myblock) logit
    }

    This stops with the error message "mypscore already defined". I know that is because by the time the loop reaches industry 2, mypscore already exists in the dataset, but I have no idea how to write the loop.

    Can anyone give me suggestions? Thanks a lot!

    Cheers

    Owen


    • #3
      Sorry, I posted a second time because the first post was not clear.


      • #4
        Try this, which gives each industry's estimated propensity score its own variable name with the corresponding industry number:
        Code:
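        forvalues i=1(1)36{
        * macro names are case-sensitive: the global was defined as vars, not Vars
        * blockid() also creates a new variable, so it gets the same suffix
        qui pscore treat $vars if industry==`i', pscore(mypscore`i') blockid(myblock`i') logit
        }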
        David Radwin
        Senior Researcher, California Competes
        californiacompetes.org
        Pronouns: He/Him
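
An alternative that avoids creating 36 separate score variables is to fit the model yourself and collect the predictions in a single variable. The following is only a sketch: it assumes Stata's built-in logit and predict as stand-ins for pscore (so none of pscore's blocking or balancing checks are run), and the toy data and the names x1, x2 and ps are hypothetical.

```stata
* Hypothetical toy data: 3 industries, a binary treatment, two covariates
clear
set seed 2024
set obs 600
gen industry = mod(_n, 3) + 1
gen x1 = rnormal()
gen x2 = rnormal()
gen treat = runiform() < invlogit(0.5*x1 - 0.3*x2)

* One variable holds every industry's score, so no variable name is reused
gen double ps = .
forvalues i = 1/3 {
    * Fit the propensity model within this industry only
    quietly logit treat x1 x2 if industry == `i'
    * Predict into a temporary variable, copy it over, then discard it
    tempvar p
    quietly predict double `p' if e(sample), pr
    quietly replace ps = `p' if industry == `i'
    drop `p'
}
summarize ps
```

Because the temporary variable is dropped at the end of each pass, the "already defined" error cannot occur, and the scores for all industries end up side by side in ps.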


        • #5
          Thanks David, that's the solution!


          • #6
            Originally posted by David Radwin:
            Try this, which gives each industry's estimated propensity score its own variable name with the corresponding industry number:
            Code:
            forvalues i=1(1)36{
            qui pscore treat $vars if industry==`i', pscore(mypscore`i') blockid(myblock`i') logit
            }

            Hi Statalist members,

            How can I match firms on year as well as industry? In my case the year varies across firms. Can someone help with this?


            • #7
              Priyesh,

              We would need a lot more information to even attempt to give an informed response. Please read https://www.statalist.org/forums/help#stata on how to formulate a question and provide example data.

              As a guess, you might combine the firm ID, industry ID, and year to create a new unique ID for each combination, then reconfigure your data somehow (reshape?) to have one observation per new unique ID.
              David Radwin
              Senior Researcher, California Competes
              californiacompetes.org
              Pronouns: He/Him
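
One way to build such a combined ID is sketched below. It assumes the built-in egen group() function rather than any particular arithmetic on the codes, uses a handful of rows taken from the example data later in this thread, and introduces the hypothetical names cell, n_treated, n_control and matchable. It also flags which industry-year cells contain both treated and control firms, since matching is only possible in those.

```stata
* A few rows from the thread's example data (firm code, year, industry, treatment)
clear
input str7 code int year byte industry byte treatment
"48995A"  2007 62 1
"48995B"  2007 62 0
"83013A"  2006 20 1
"83013B"  2006 20 0
"126872A" 2011 42 1
"206660A" 2006 42 0
end

* One integer ID per industry-year combination
egen cell = group(industry year)

* Count treated and control firms in each cell
bysort cell: egen n_treated = total(treatment == 1)
bysort cell: egen n_control = total(treatment == 0)

* A cell can only be matched if it has at least one of each
gen byte matchable = n_treated > 0 & n_control > 0
list code industry year cell matchable, sepby(cell)
```

In this small extract only the industry 62 / year 2007 and industry 20 / year 2006 cells contain both a treated and a control firm.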


              • #8
                Thanks Radwin,

                Below is my sample data. I need to find control firms for the treated firms (the firm ID is code) that belong to the same industry and year. Treatment events occur in different years. How do I match in this case?

                Thanks in advance.


                Code:
                * Example generated by -dataex-. To install: ssc install dataex
                clear
                input str7 code int year byte industry double outcome byte treatment int iv1 double(iv2 iv3 iv4)
                "126872A" 2007 42  -.796494 0 14  8.84    75   6.71
                "213607A" 2015 62 -.5222495 . 26  4.84 36.84      .
                "176452A" 2004 21 -.4661654 0 13     .     .  18.76
                "48995A"  2007 62 -.4656251 1 18  5.07 55.34    .05
                "48995B"  2007 62 -.4656251 0 18  5.07 55.34    .05
                "213607A" 2016 62 -.4619563 . 27  18.1 40.59      .
                "206660A" 2006 42 -.4457149 0 18 10.99  54.8   14.5
                "81023A"  2005 62 -.3895238 0 23  37.8  29.7    .31
                "81023B"  2005 62 -.3895238 0 23  37.8  29.7    .31
                "126872A" 2011 42 -.3888639 1 18 18.84 67.98  24.03
                "141201A" 2001 62  -.387402 0 19 10.13 44.14      .
                "82197A"  2010 21 -.3587462 0 33 26.84 48.37     27
                "375271A" 2015 24 -.3582614 .  9   .15 36.49  55.06
                "324131A" 2012 62 -.3386264 0 12  2.15 58.34  11.69
                "223422A" 2004 10 -.3259988 0  9     .     .  44.76
                "65575A"  2010 26  -.321221 0 36   .05 27.29     18
                "244056A" 2000 28 -.3116581 0  5     .     .   8.53
                "126872A" 2008 42 -.3035966 0 15 13.23 73.68  12.24
                "170552A" 2002 27 -.3028515 0 24     . 65.56  18.15
                "140097A" 2007 10 -.2975993 0 19 17.08 63.45  24.55
                "140097B" 2007 10 -.2975993 0 19 17.08 63.45  24.55
                "93997A"  2001 42 -.2930033 0 75   .03 59.27  38.89
                "36073A"  2008 21 -.2915277 1 30  7.16  60.9   8.33
                "223422A" 2010 10 -.2901605 1 15 21.72 37.98  29.99
                "159120A" 2002 21 -.2892224 0 55  4.13 50.25  40.03
                "232983A" 2010 45 -.2886884 0 24   .82 63.25      .
                "178651A" 2006 28 -.2859501 0 21 10.14 31.22      .
                "81023A"  2002 62 -.2825716 0 20     . 55.54   4.79
                "81023B"  2002 62 -.2825716 0 20     . 55.54   4.79
                "352048A" 2005 62 -.2804814 0  8     .     .    .24
                "352048B" 2005 62 -.2804814 0  8     .     .    .24
                "212387A" 2004 61 -.2789802 0 28  5.05 68.46  15.21
                "83013A"  2006 20 -.2737612 1  6 16.94 68.19   1.93
                "83013B"  2006 20 -.2737612 0  6 16.94 68.19   1.93
                "164646A" 2007 62 -.2730317 1 12  7.06 81.65    .47
                "164646B" 2007 62 -.2730317 0 12  7.06 81.65    .47
                "76897A"  2012 27 -.2708862 0 30     . 62.11  58.38
                "100499A" 2000 62 -.2692363 0 12     .     .   9.92
                "170552A" 2000 27 -.2684754 0 22     .     .  33.84
                "42521A"  2017 45 -.2649833 .  .     .     .      .
                "76897A"  1999 27 -.2537713 .  .     .     .      .
                "91935A"  2005 62 -.2523612 0 14     . 46.75   39.5
                "178651A" 2007 28 -.2503318 1 22  9.53 30.13      .
                "223422A" 2003 10  -.246387 0  8     .     .  72.51
                "130382A" 2001 21 -.2457361 0 18     . 66.34  54.03
                "130382B" 2001 21 -.2457361 0 18     . 66.34  54.03
                "159120A" 1999 21  -.245101 .  .     .     .      .
                "73119A"  2006 45 -.2445612 0 18 25.56 47.66      .
                "223422A" 2013 10 -.2398888 0 18 16.74 38.34  32.13
                "81023A"  2011 62 -.2374351 0 29 41.05 23.23  32.71
                "81023B"  2011 62 -.2374351 0 29 41.05 23.23  32.71
                "67417A"  2009 21 -.2372583 0 26   .53 87.84   48.5
                "126872A" 2006 42 -.2364331 0 13     .     .   15.9
                "232983A" 1998 45 -.2356953 .  .     .     .      .
                "232983A" 2008 45 -.2349846 0 22  4.41 63.23   1.31
                "149899A" 1997 62 -.2338929 .  .     .     .      .
                "149899B" 1997 62 -.2338929 .  .     .     .      .
                "170552A" 2016 27 -.2338352 . 38     0 51.64 136.96
                "126872A" 2012 42  -.233217 0 19   7.5 72.22  21.44
                "196667A" 2005 19 -.2328193 0 32 21.55 46.76  24.01
                "196667B" 2005 19 -.2328193 0 32 21.55 46.76  24.01
                "100632A" 2007 62 -.2324546 0 26 32.55 16.54      .
                "100632B" 2007 62 -.2324546 0 26 32.55 16.54      .
                "400962A" 2013 45 -.2320503 0  7   .94 60.17  20.66
                "67417A"  2015 21 -.2281266 . 32 16.67 72.74   1.53
                "271515A" 1997 13 -.2281045 .  .     .     .      .
                "215829A" 2002 20 -.2265215 0 17     . 69.65  48.62
                "67417A"  2007 21 -.2264896 0 24  1.76 88.22   6.83
                "272724A" 2000 62 -.2259873 0 55     .     .   4.82
                "272724B" 2000 62 -.2259873 0 55     .     .   4.82
                "260321A" 2000 62   -.22448 0  7     .     .   5.28
                "203579A" 2015 13 -.2241423 . 25   .44  3.59 133.31
                "44042A"  2016 62 -.2239326 . 18     0 51.47   5.31
                "141201A" 2002 62 -.2198902 0 20 10.56  45.3    .38
                "118322A" 2009 26 -.2180476 1 13  2.23 14.96  44.97
                "260321A" 2007 62 -.2154226 0 14     .  8.27      .
                "127106A" 2007 42 -.2135036 0 61 18.26     .  11.81
                "83783A"  1999 32 -.2125095 .  .     .     .      .
                "85647A"  2005 28 -.2119042 0 83   .77 43.56  16.69
                "119829A" 2002 45 -.2116146 0 17     . 33.27   1.09
                "146456A" 1999 62 -.2101008 .  .     .     .      .
                "152087A" 1999 62 -.2096115 .  .     .     .      .
                "141201A" 2000 62 -.2084073 0 18     .     .   4.63
                "80522A"  2013 61 -.2074149 0 18 22.32 37.19  33.52
                "356060A" 2016 30 -.2046839 . 25 14.27 44.41 118.37
                "18495A"  2013 52 -.2044601 0 14 12.26  6.61  14.04
                "85379A"  2000 21 -.2039803 0  9     .     .  41.32
                "5747A"   2008 45  -.203584 0 15 13.93    75  44.83
                "5747A"   2009 45 -.2032193 0 16 13.23 74.91  47.82
                "248093B" 2008 28 -.2026184 1 63 16.96 33.42  24.04
                "248093A" 2008 28 -.2026184 0 63 16.96 33.42  24.04
                "81023A"  2007 62 -.2019407 0 25  54.4  24.9    .06
                "81023B"  2007 62 -.2019407 0 25  54.4  24.9    .06
                "152087A" 1997 62 -.2019394 .  .     .     .      .
                "44042A"  2010 62  -.199097 0 12     . 40.09  13.73
                "119083A" 2013 22 -.1989847 0 22   .23  8.92  21.93
                "122948A" 2001 62 -.1984061 0 11     . 52.79  18.76
                "122948B" 2001 62 -.1984061 0 11     . 52.79  18.76
                "348418A" 2007 62 -.1977073 0 15 14.45 39.67   6.52
                "149899A" 2000 62  -.197614 0  8     .     .    .43
                end


                • #9
                  Here's a solution based on the "Matching within strata" section of the psmatch2 (Leuven and Sianesi, SSC) help file. The first part manipulates the example data and is just for demonstration purposes.

                  Code:
                  * Increase size, add variation, replace missings
                  expand 100
                  set seed 12345
                  replace year = round(year - 3*runiform())
                  replace iv2 = iv2 + 3*runiform()
                  replace iv3 = iv3 + 7*runiform()
                  replace iv4 = iv4 + 5*runiform()
                  replace treatment = 1 if treatment==.
                  replace treatment = 1-treatment if runiform() < .2
                  
                  ***************************************************************
                  
                  * Make unique ID
                  gen industryyear = industry*10000 + year
                  
                  * From "Matching within strata" section of -psmatch2- help file
                  gen att = .
                  levelsof industryyear, local(gr)
                  foreach j of local gr {
                      capture noisily psmatch2 treatment iv1 iv2 iv3 iv4 if industryyear == `j', outcome(outcome) logit
                      replace att = r(att) if industryyear == `j'
                      }
                      
                  summ att
                  David Radwin
                  Senior Researcher, California Competes
                  californiacompetes.org
                  Pronouns: He/Him
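
When a command is wrapped in capture inside a loop like this, it may be worth checking _rc before storing results, so that a stratum where estimation fails keeps a missing value. The sketch below illustrates only that guard: it assumes built-in logit as a stand-in for psmatch2 so that it runs without the SSC package, and the toy data and the names stratum, x and b_x are hypothetical.

```stata
* Hypothetical toy data: 3 strata, the third with no treated units
clear
set seed 12345
set obs 300
gen stratum = mod(_n, 3) + 1
gen x = rnormal()
gen treatment = runiform() < invlogit(x)
replace treatment = 0 if stratum == 3

gen double b_x = .
forvalues j = 1/3 {
    * logit fails in stratum 3 because the outcome does not vary
    capture noisily logit treatment x if stratum == `j'
    * Store the estimate only when the command succeeded
    if _rc == 0 {
        replace b_x = _b[x] if stratum == `j'
    }
}
tabstat b_x, by(stratum) statistics(mean n)
```

The same if _rc == 0 block could be placed around the replace att = r(att) line in a psmatch2 loop.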


                  • #10
                    @David Radwin Hi David, can I ask you a simple question based on your post #9?

                    Once you have obtained the variable att for the different industry-year groups, how do you compute t-statistics to see whether the outcomes differ significantly with and without the treatment?

                    I would very much appreciate your help!


                    • #11
                      There is more than one answer to this question. If you don't care about the Abadie and Imbens (2006) correction for variance, and you want to treat all the industry-years as a single sample, it's as simple as
                      Code:
                      ttest att, by(treatment)
                      I'm not sure how to implement that variance correction in psmatch2 when you are combining multiple groups or strata as in the example above, but you can search this Stata forum or start a new post asking for help.

                      Another possibility is to use logit or another method on the matched sample, as described in https://www.statalist.org/forums/for...86#post1366286.


                      Abadie, A. and Imbens, G. (2006), "Large sample properties of matching estimators for average treatment effects", Econometrica 74(1), 235-267.
                      David Radwin
                      Senior Researcher, California Competes
                      californiacompetes.org
                      Pronouns: He/Him


                      • #12
                        @David Radwin

                        Hi David,

                        Thanks for sharing the code in post #9. I have a related question.

                        Based on your code, I am trying to retrieve the 1:1 match results, but the last pass of the loop overwrites all prior passes. I created a new column to store the control group ID, but it is hard to trace it back to the observations because the IDs change on every pass. Can you give me some hints?

                        My code is below:

                        Code:
                        gen controlid=.

                        gen indyear = industry*10000 + year
                        egen group = group(indyear)

                        su group, meanonly

                        forvalues i = 1/`r(max)' {
                        capture noisily psmatch2 treatment $sample if group == `i', out(outcome) neighbor(1) noreplacement
                        capture noisily replace controlid=_n1 if group == `i'
                        }


                        Thanks,

                        Yanru


                        • #13
                          Yanru Yang, I'm not sure what you are trying to do, but maybe you can change the last line of code to
                          Code:
                          capture noisily replace controlid=_n1 if group == `i' & missing(controlid)
                          so that it only replaces missing values.
                          David Radwin
                          Senior Researcher, California Competes
                          californiacompetes.org
                          Pronouns: He/Him
