
  • CALIPMATCH: caliper matching without replacement. Available in SSC.

    calipmatch matches case observations to control observations using "caliper" matching and (optionally) exact matching. Control observations matched to a case observation will have values within +/- the caliper width for every caliper matching variable. Matched observations will also have identical values for every specified exact matching variable, if any are specified. calipmatch supports 1:1 or 1:m matching of cases to controls, without replacement.

    calipmatch was written in collaboration with Allan Garland, of the University of Manitoba Faculty of Medicine.

    It is now available in the SSC, with thanks to Kit Baum. It can also be viewed on GitHub.


    Details

    This program allows you to perform fuzzy case-control matching, matching cases to controls that have close-but-not-identical values for caliper matching variables. You specify a "caliper width" for each caliper matching variable, and all controls matched to a case will have values within +/- that width for the corresponding variable.

    Controls are randomly matched to cases without replacement. For each case, calipmatch searches for matching controls until it either finds the pre-specified maximum number of matches or runs out of controls. The search is performed greedily: it is possible that some cases end up unmatched because all possible matching controls have already been matched with another case.

    calipmatch is optimized to run extremely efficiently. Exact matching is performed before caliper matching, using a sort. Caliper matching is implemented in Mata, and searches only within exact match groups. This program was created because our original caliper matching Stata code ran on our problem for 10 days without completion. The version of calipmatch now available to you on SSC completed our matching problem in under 5 minutes.
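
    To make the description above concrete, here is a minimal sketch on simulated data. The option names are the ones used in the commands posted later in this thread; the dataset and the variable names (case, age, sex, matchgroup, case_age, case_sex) are invented for illustration, and the sketch assumes calipmatch has already been installed from SSC.

```stata
* Minimal sketch on simulated data. All variable names below are made up.
* One-time install from SSC if needed:  ssc install calipmatch
clear
set seed 12345
set obs 550
generate byte case = _n <= 50              // 50 cases, 500 controls
generate age = floor(20 + 60*runiform())   // caliper matching variable
generate byte sex = runiform() < 0.5       // exact matching variable

* Up to 2 controls per case, same sex, age within +/- 3 of the case
calipmatch, generate(matchgroup) casevar(case) maxmatches(2) ///
    calipermatch(age) caliperwidth(3) exactmatch(sex)

* Look up each group's case values (case == 1 sorts last within a group)
bysort matchgroup (case): generate case_age = age[_N] if !missing(matchgroup)
bysort matchgroup (case): generate byte case_sex = sex[_N] if !missing(matchgroup)

* Every matched observation should satisfy both constraints
assert abs(age - case_age) <= 3 if !missing(matchgroup)
assert sex == case_sex          if !missing(matchgroup)
```

    Note that with 1:m matching each control is within the caliper of its case, so two controls in the same group can differ from each other by up to twice the caliper width.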

  • #2
    Hi Michael,
    I used calipmatch in Stata, but the result variable (newvar) does not look right: it seems to have matched case observations with other case observations as well as with the control group.
    Could you please tell me what the problem is?

    Below is the code I ran.

    calipmatch, generate(newvar) casevar(cccc) maxmatches(1) calipermatch(Size ROA) caliperwidth(3 3)

    Best regards,
    Mahmoud



    • #3
      Hi Seyed Mahmoud Hosseinniakani, unfortunately your post doesn't give me enough information to tell what's going on when you run -calipmatch- on your data. Could you post:

      1. The output calipmatch prints after you run it
      2. The output of -codebook newvar-



      • #4
        Michael:


        Below is the code I ran and its output.

        Note: I have a total of 282 observations in the case group (case=1, in the range 1 to 282) and 2073 observations in the control group (case=0, in the range 289 to 2073).


        Code:

        . calipmatch in 1/2355, generate(newvar) casevar(case) maxmatches(1) calipermatch
        > (Size ROA) caliperwidth(3 3)
        100.0% match rate.
        282 out of 282 cases matched.

        Successful matches for each case
        --------------------------------
        0 matched control obs: 0 (0.0%)
        1 matched control obs: 282 (100.0%)

        .
        end of do-file



        Here is the example:



        ----------------------- copy starting from the next line -----------------------
        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input str11 CIQID byte case float(Size ROA) int newvar
        "IQ875262"    1    5.9422     .3076084 171
        "IQ878077"    1  5.658115   -.01313736  40
        "IQ875143"    1  9.056618   .018215531 113
        "IQ526391"    1  9.068716   .006176507  70
        "IQ9220428"   1  5.887492    .04521498 149
        "IQ875306"    1  5.839315   .016425956 246
        "IQ4469438"   1   8.04372   .006810798 179
        "IQ770402"    1  9.279925   .012669099 212
        "IQ2907922"   1   7.94206    .02033617 190
        "IQ873951"    1 10.611293   .010577736  72
        "IQ9829913"   1  5.109539    .01938894 159
        "IQ248270"    1  9.035193   .019926276 162
        "IQ269706"    1 10.141032  -.003210821  84
        "IQ876681"    1  9.468121   .010503365  92
        "IQ874013"    1  8.149743   .014246375 262
        "IQ879697"    1  6.216342    .04001553 278
        "IQ739769"    1  7.841354  -.005699203 137
        "IQ879090"    1  5.320148    .02712993 176
        "IQ874497"    1   8.42903   .005143028 282
        "IQ420144"    1  8.651145   .011896455 251
        "IQ9238097"   1  6.712256   .013841235 231
        "IQ874186"    1  9.282967   .001945279 273
        "IQ874520"    1  8.518131   .017690856   3
        "IQ879374"    1  5.942805   .016215526 258
        "IQ312375"    1  10.67422   .012725808 237
        "IQ4462979"   1  6.358113   -.03160769 256
        "IQ874893"    1  7.902878  -.010574399 146
        "IQ879851"    1   5.23058    .02238418 238
        "IQ879924"    1  6.616875  .0042946353 264
        "IQ880162"    1 4.6847935    .01168016 130
        "IQ32543"     1 1.1957208   -.11112724 235
        "IQ7907057"   1  4.963677  .0008837398 117
        "IQ882949"    1 3.4379156  -.008256212  11
        "IQ784187"    1 4.5088663    .01261875 263
        "IQ4481455"   1  5.639683   -.02939144 223
        "IQ621976"    1   8.52564   .013294307 224
        "IQ879839"    1  7.630569    .01012259 240
        "IQ875516"    1  9.145005   .028042916 269
        "IQ875467"    1  8.299857   .016724665  23
        "IQ631781"    1  9.074807   .007409141 268
        "IQ139511"    1  6.372157   .003656912 181
        "IQ142742"    1  9.261429   .013931903 218
        "IQ4481256"   1  5.846624    .03269702 106
        "IQ5455504"   1  8.335067    .01442779  26
        "IQ139499"    1  6.313629 -.0001762358 209
        "IQ125899"    1  6.713025    .05845499 178
        "IQ134377"    1  8.316665  -.007300312 133
        "IQ139517"    1  5.628258  -.015989594 134
        "IQ6478806"   1  4.670587   .016950669 167
        "IQ9220802"   1  7.642087   .016031297 174
        "IQ883105"    1 4.5355573   .006049822 142
        "IQ5450008"   1  5.319767    .00628513 158
        "IQ9206345"   1  6.782927   .010482802 194
        "IQ5774947"   1  6.178321   -.03593611  30
        "IQ2480879"   1 4.1842475     -.108048  87
        "IQ408768"    1  9.814219    .01258989  38
        "IQ408408"    1  7.294678    .05061451 243
        "IQ5462496"   1 3.1589954  -.014664332 219
        "IQ4493214"   1   3.65869   .035096467 279
        "IQ652327"    1  8.464849    .01199508 164
        "IQ292856243" 1  4.853376  .0045333453  25
        "IQ142418"    1  2.482635 -.0013981727  85
        "IQ2420858"   1  3.532274    .01521359 242
        "IQ4509100"   1 4.2172866  .0042445646 211
        "IQ5645441"   1  8.993887  .0003660429 214
        "IQ418425"    1  5.801833    .04151598 166
        "IQ36827615"  1  6.745465   .006224071 229
        "IQ883159"    1  9.015266  .0090461895 183
        "IQ94947"     1  4.528439   .032041203 128
        "IQ12703856"  1 3.4554675    .02828205  88
        "IQ1233090"   1 4.4155684   .006350748 184
        "IQ4493208"   1  7.979531   .011193286  82
        "IQ587323"    1  5.604098   .012970718  24
        "IQ2481318"   1 4.3425183   .030997016 100
        "IQ9284137"   1  2.889475   .013833508  59
        "IQ2480533"   1  5.104532 -.0030705805 221
        "IQ712065"    1  3.549865   .027464453  16
        "IQ9844379"   1  3.927072   .028943935 161
        "IQ408906"    1  7.567901   .005480414 101
        "IQ1063939"   1  5.269331   .012719714 188
        "IQ8178522"   1  3.504629   .027267644 132
        "IQ419993"    1  3.299756    .03625282  98
        "IQ882078"    1  5.191676   .034096424  93
        "IQ784159"    1  5.765669    -.0451788 102
        "IQ4493212"   1  6.378404   .001090245 220
        "IQ4493203"   1  3.428802   .023616984 109
        "IQ27695913"  1  3.713324   .035366815 236
        "IQ1268289"   1 4.5177464    .04762603 180
        "IQ23675944"  1  9.300563    .01791334 182
        "IQ3733605"   1  4.107929    .06903595  33
        "IQ1027520"   1  5.652737    .00989762 175
        "IQ7867052"   1  2.856247   .018456414  43
        "IQ3449723"   1  7.310217     .0193221 195
        "IQ9391758"   1  3.687562     .0333842 210
        "IQ9329390"   1  4.781628   .016151177  36
        "IQ9142038"   1  2.973284    .02139288  80
        "IQ2917892"   1 3.6479836   .015350354  73
        "IQ5688970"   1  8.115936    .01851475 116
        "IQ1873559"   1  5.668314   .013104857 267
        "IQ4493313"   1  3.262085   .035780862  32
        end
        ------------------ copy up to and including the previous line ------------------

        Listed 100 out of 2355 observations
        Use the count() option to list more

        .



        • #5
          Hi Seyed Mahmoud Hosseinniakani, I still need to see the output of running -codebook newvar-.

          I'd also like to see the output of:

          egen casecount=total(case), by(newvar)
          tab casecount if !mi(newvar)



          • #6
            Michael:
            Here is the output.

            codebook of newvar:




            . codebook newvar

            -------------------------------------------------------------------------------------
            newvar (unlabeled)
            -------------------------------------------------------------------------------------

                              type:  numeric (int)

                             range:  [1,282]                      units:  1
                     unique values:  282                      missing .:  1,791/2,355

                              mean:     141.5
                          std. dev:   81.4781

                       percentiles:        10%       25%       50%       75%       90%
                                            29        71     141.5       212       254

            .


            Here is the output of:

            . egen casecount=total(case), by(newvar)

            . tab casecount if !mi(newvar)

              casecount |      Freq.     Percent        Cum.
            ------------+-----------------------------------
                      1 |        564      100.00      100.00
            ------------+-----------------------------------
                  Total |        564      100.00

            .
            end of do-file

            .



            Thanks!
            Mahmoud



            • #7
              Hi Seyed Mahmoud Hosseinniakani, based on those outputs it looks to me like it matched one control to each case, as expected based on the command you ran:

              Code:
              . calipmatch in 1/2355, generate(newvar) casevar(case) maxmatches(1) calipermatch
              > (Size ROA) caliperwidth(3 3)
              100.0% match rate.
              282 out of 282 cases matched.
              
              Successful matches for each case
              --------------------------------
              0 matched control obs: 0 (0.0%)
              1 matched control obs: 282 (100.0%)

              The variable newvar has values {1,...,282} which define your matched groups. For each non-missing value of newvar, there is one case and one control.



              • #8
                Hi Michael
                I agree with you, but I find -newvar- confusing to read. For instance, in the example the first observation has the value 171, which I read as meaning that the first observation was matched with observation number 171, even though both observations are in the case group. Please correct me if I am misreading this or missing something.

                Kind regards,
                Mahmoud



                • #9
                  Hi Seyed Mahmoud Hosseinniakani, the observations that are matched to each other have the same value of newvar. So, the observation with newvar==171 is matched with the other observation that has newvar==171. I see this is a misunderstanding about how newvar indicates groupings, which means that the documentation should be clearer. Sorry about the confusion, I hope the matched groups are clear now!
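
                  To illustrate how the group variable is read, here is a small sketch on invented data that mimics the structure of newvar. The ids, values, and the helper variable control_id are hypothetical and are not taken from the dataset posted above.

```stata
* Toy data mimicking the structure of newvar (all values are invented)
clear
input str4 id byte case float Size int newvar
"A1" 1 5.9 171
"A2" 1 5.7  40
"B1" 0 6.3 171
"B2" 0 4.9  40
"B3" 0 9.1   .
end

* newvar is a group label, not an observation number:
* rows that share a value of newvar form one matched set
sort newvar case
list id case Size newvar if !missing(newvar), sepby(newvar) noobs

* Optional: attach the matched control's id to each case row
* (within a group, the control sorts first because case == 0)
bysort newvar (case): generate str4 control_id = id[1] if case == 1 & !missing(newvar)
list id control_id newvar if case == 1, noobs
```

                  In this toy example, case A1 and control B1 both carry newvar == 171, so they form one matched pair; B3 has a missing newvar because it was not matched to anything.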



                  • #10
                    Hi Michael,

                    Thank you very much for the explanation. Now I fully understand -newvar-.

                    I appreciate your contribution of the calipmatch code for Stata.

                    Kindly,
                    Mahmoud



                    • #11
                      Hi Michael,
                      I am trying to match treated observations with controls on avpatent (within +/-10%), with an exact match on patentclass. The code I am trying is:

                      calipmatch 1/667, generate(pair) casevar(treatment) maxmatches(1) calipermatch(avpatent) caliperwidth(1) exactmatch(patentclass)

                      but I'm getting the error message:

                      varlist not allowed
                      r(101);



                      • #12
                        Originally posted by Syeda Haider:
                        Hi Michael,
                        I am trying to match treated observations with controls on avpatent (within +/-10%), with an exact match on patentclass. The code I am trying is:

                        calipmatch 1/667, generate(pair) casevar(treatment) maxmatches(1) calipermatch(avpatent) caliperwidth(1) exactmatch(patentclass)

                        but I'm getting the error message:

                        varlist not allowed
                        r(101);

                        Hi Syeda,

                        It looks like you're missing the word "in" before the range of observations to include in the match:

                        Code:
                        calipmatch in 1/667, generate(pair) casevar(treatment) maxmatches(1) calipermatch(avpatent) caliperwidth(1) exactmatch(patentclass)



                        • #13
                          Thanks, Michael. I noticed that myself later on, and it worked.



                          • #14
                            Hi Michael,

                            Thanks for your previous answers; they were already very useful in addressing my issue. I now have a newvar column with matched control and treated observations. Below I include the output you asked for in your previous answers.

                            However, I am still struggling with the very last step of the process, namely estimating the effect of the treatment on a specific dependent variable y.
                            I cannot figure out how to run a regression that captures the difference within each matched pair, rather than the average difference between the whole treated and control groups (that is, a regression that takes the matching I did into account).

                            code:
                            calipmatch in 1/437653, generate (psm1) casevar(TREATMENT) maxmatches(1) calipermatch(TOTAL_ASSETS DATEINC) caliperwidth(3 0.1) exactmatch(Industry REGION)
                            codebook psm1
                            egen casecount=total(TREATMENT), by (psm1)
                            tab casecount if !mi(psm1)

                            . do "C:\Users\i6203978\AppData\Local\Temp\STD00000000.tmp"

                            . calipmatch in 1/437653, generate (psm1) casevar(TREATMENT) maxmatches(1
                            > ) calipermatch(TOTAL_ASSETS DATEINC) caliperwidth(3 0.1) exactmatch(Ind
                            > ustry REGION)
                            85.1% match rate.
                            229 out of 269 cases matched.

                            Successful matches for each case
                            --------------------------------
                            0 matched control obs: 40 (14.9%)
                            1 matched control obs: 229 (85.1%)

                            . codebook psm1

                            -------------------------------------------------------------------------
                            psm1 (unlabeled)
                            -------------------------------------------------------------------------

                                              type:  numeric (int)

                                             range:  [1,229]                      units:  1
                                     unique values:  229                      missing .:  437,195/437,653

                                              mean:       115
                                          std. dev:   66.1783

                                       percentiles:        10%       25%       50%       75%       90%
                                                            23        58       115       172       207

                            . egen casecount=total(TREATMENT), by (psm1)

                            . tab casecount if !mi(psm1)

                              casecount |      Freq.     Percent        Cum.
                            ------------+-----------------------------------
                                      1 |        458      100.00      100.00
                            ------------+-----------------------------------
                                  Total |        458      100.00

                            .
                            end of do-file

                            Thanks for your help.
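
                            The thread does not answer this question, so what follows is only a hedged sketch of one common approach, not the calipmatch author's recommendation: include a fixed effect for each matched group, so that the treatment coefficient is identified from within-pair differences. The simulated data, the outcome y, and the helper variables pair_shock and diff are hypothetical; psm1 and TREATMENT are named after the variables in the post above.

```stata
* Toy stand-in for a 1:1 matched sample: 100 pairs, invented outcome y
clear
set seed 42
set obs 200
generate int psm1 = ceil(_n/2)            // matched-pair identifier
generate byte TREATMENT = mod(_n, 2)      // one treated, one control per pair

* A shock shared by both members of a pair, plus individual noise
bysort psm1: generate double pair_shock = rnormal() if _n == 1
by psm1: replace pair_shock = pair_shock[1]
generate double y = 2*TREATMENT + pair_shock + rnormal()

* Option A: regression with a fixed effect for each matched pair
areg y TREATMENT if !missing(psm1), absorb(psm1)

* Option B: within-pair difference (treated minus control), then a t test
bysort psm1 (TREATMENT): generate double diff = y[2] - y[1] if _n == 1 & !missing(psm1)
ttest diff == 0
```

                            With exactly one case and one control per group, the coefficient on TREATMENT in option A equals the mean within-pair difference tested in option B. With 1:m matching, or if covariates are added, the two no longer coincide and the choice of estimator needs more thought.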




                            • #15
                              Hello Stata users,
                              Is there code that allows matching within a percentage range of a variable, as opposed to a numeric range? For example, what if I would like to match on variable x within a +/- 30% range?
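
                              The thread gives no answer to this. One possible workaround, offered as an assumption rather than a documented calipmatch feature, is to caliper-match on the logarithm of x: an absolute caliper of ln(1.3) on ln(x) keeps the control-to-case ratio between 1/1.3 (about 0.77) and 1.3, which is close to, but not exactly, a +/- 30% band. The sketch assumes x is strictly positive; the data and the names ln_x, grp, and ratio are invented.

```stata
* Simulated data: 40 cases, 360 controls, strictly positive x
clear
set seed 2024
set obs 400
generate byte case = _n <= 40
generate double x = exp(rnormal(5, 1))
generate double ln_x = ln(x)

* A caliper of ln(1.3) on the log scale bounds the ratio x_control/x_case
local w = ln(1.3)
calipmatch, generate(grp) casevar(case) maxmatches(1) ///
    calipermatch(ln_x) caliperwidth(`w')

* Ratio of each matched observation's x to its case's x (case sorts last)
bysort grp (case): generate double ratio = x / x[_N] if !missing(grp)
summarize ratio if case == 0
```

                              If an exact, symmetric +/- 30% band around the case value is required, this log-scale approximation is not sufficient and the bounds would have to be checked or enforced separately.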

