  • Sampling and matching on the basis of specific characteristics

    Dear Stata Community

    I am trying to implement a specific sampling procedure, but I am stuck on how to do it.

    I have two data sets. File A contains my sample firms, including, among other things, the completion date, returns, book-to-market ratio and size. File B contains the potential matching firms, with the same kind of information for each point in time. My goal is to "randomly" sample 1,000 matching firms from File B that are similar to my firms in File A, where similarity is determined by size and book-to-market ratio: the smaller the differences in both characteristics, the higher the similarity. One condition is that a matching firm can be used only once. Thus, the matching procedure should look as follows:
    1. Pick the firm in File B that has the highest similarity to a sample firm in File A at the completion date.
    2. Save the information on both firms (from File A and File B) in a new file.
    3. Delete the matched firm from File B.
    4. Repeat steps 1 to 3 until 1,000 matches have been made.
    Right now, I have implemented the following procedure (see my code below):
    1. Sample 1,000 firms from File A with replacement (i.e. by using -bsample-)
    2. For each of these 1,000 firms, attach the information from File B using -joinby-.
    3. Compute the difference between the characteristics of the File A firm and each File B firm.
    4. Keep the File B firm for which the difference is smallest and drop all others.
    Unfortunately, the drawback of this procedure is that a firm that is drawn several times (because sampling is with replacement, which is necessary) always gets the same matched firm. It would be better if each repeated draw received the next-closest firm in terms of similarity. I know that the first procedure is the right approach to this matching, but I have no clue how to implement it.

    Does anybody have any suggestions?

    Kind regards
    Andreas

    Code:
    use File_A.dta, clear
    keep if diff==0 // only data from the completion date is needed
    gen match=string(month(date), "%02.0f")+string(year(date), "%02.0f")
    sort match
    rename permno permnoa
    rename size sizea
    rename bmratio bmratioa
    rename prc prca
    rename shrout shrouta
    rename return returna
    sort permnoa event date
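    joinby match using "File_B.dta" // file name assumed by analogy with File_A.dta; an unquoted name containing a space is a syntax error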
    drop if permnoc==permnoa // drops pairs in which a sample firm is matched to itself
    drop if missing(bmratioc) // drops candidate firms with no book-to-market ratio
    gen constrainedsize=sizea*0.9 // the size of the matched firm should be not smaller than 90% of the sample firm
    gen deltasize=abs(sizea-sizec) // absolute difference between size of sample and matched firm
    gen deltabmratio=abs(bmratioa-bmratioc) // absolute difference between book-to-market ratio of sample and matched firm
    gen pdiffsize=abs(deltasize/sizea) // calculates the percentage difference
    gen pdiffbmratio=abs(deltabmratio/bmratioa) // calculates the percentage difference
    sort permnoa event diff
    gen x=1 if sizec<constrainedsize // if the size of the matched firm is below the constrained size, mark it with x=1
    egen minsize=min(deltasize) if x==1, by(permnoa event diff) // among firms below the constrained size, find the smallest size difference
    gen y=1 if minsize==deltasize & x==1 // mark the firm with the smallest size difference among those below the constrained size
    gen z=1 if x==1 & y==. // mark all other firms that are below the constrained size but do not have the smallest difference
    drop if z==1 // drop those firms
    gen soapd=(pdiffsize+pdiffbmratio) // generate the sum of the absolute percentage differences in size and book-to-market ratio
    egen mindiff = min(soapd), by(permnoa event diff) // generate the minimum of the sum outlined above
    keep if soapd==mindiff // keep those firms where the sum is minimized
    save matching, replace
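
    A minimal sketch of how the first procedure (each matching firm used only once) might be implemented; it is not a tested solution for the real files. It runs on toy data standing in for the -joinby- result. Assumptions: drawid is a hypothetical identifier for each sampled firm (it could be created with gen long drawid = _n right after -bsample- and before -joinby-), soapd is the distance computed as in the code above, and "used only once" means once across all draws. The loop goes through the draws in order, gives each draw its closest still-unused candidate, and then marks that candidate as used.

```stata
* Toy data standing in for the joinby result: one row per (draw, candidate) pair.
* drawid is a hypothetical id for each sampled firm; draws 1 and 2 are the same firm.
clear
input drawid permnoa permnoc soapd
1 10 101 0.05
1 10 102 0.20
1 10 103 0.40
2 10 101 0.05
2 10 102 0.20
2 10 103 0.40
3 20 101 0.10
3 20 103 0.15
end

* Within each draw, order candidates from closest to farthest (permnoc breaks ties).
sort drawid soapd permnoc
gen long obsno  = _n   // row number, used to locate the best remaining candidate
gen byte used   = 0    // 1 once this candidate firm has been given to some draw
gen byte chosen = 0    // 1 for the pair that is finally kept

quietly summarize drawid, meanonly
local ndraws = r(max)

forvalues d = 1/`ndraws' {
    * first (= closest) row of this draw whose candidate is still unused
    quietly summarize obsno if drawid == `d' & !used, meanonly
    if r(N) == 0 {
        continue       // no unused candidate left for this draw
    }
    local pick = r(min)
    local cand = permnoc[`pick']
    quietly replace chosen = 1 in `pick'
    quietly replace used   = 1 if permnoc == `cand'   // remove the candidate from the pool
}

keep if chosen
list drawid permnoa permnoc soapd
```

    Because the assignment is greedy, the result depends on the order in which the draws are processed; it is not guaranteed to minimize the total distance over all matches.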