Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Mahapick with foreach loop

    Dear Statalists,

    I want to find several (5) company matches based on financial data, always for a given year and a given industry. The industry is expressed as a code ranging from 100 to 9997 in one-unit steps. When there are less than 5 matching observations for a company, I would like to run mahapick again, but then searching for companies in the next industry (industry code + 1). If there are still less than 5 matches, the search shall be repeated with industry code + 2, and so on...

    Moreover, the "treated companies" are listed twice in my dataset, both as treated and not treated, because for other treated companies they might serve as a good match. Therefore, running mahapick leads to the result that many of the first matches are companies with themselves.

    If someone has a good idea, please help me out!

    Below you see an dataex example and the coding so far. I´m working with MP 15.0.


    Code:
    clear
    input float id byte pair_id int year byte(treat industry) int assets float(bm_ratio match_id_1 match_id_2 match_id_3 match_id_4 match_id_5 match_id_6)
     1  1 2015 1 10 100       .1 1 9 5 7  3 .
     1  2 2016 1 10 110      .11 1 9 5 3  7 .
     1  3 2017 1 10 120      .12 1 9 3 5  7 .
     1  4 2018 1 10 130      .13 1 9 3 7  5 .
     1  5 2019 1 10 140      .14 1 9 3 5  7 .
     2  6 2015 1 20 200       .2 2 4 8 6 10 .
     2  7 2016 1 20 210      .21 2 4 6 8 10 .
     2  8 2017 1 20 220      .22 2 4 8 6 10 .
     2  9 2018 1 20 230      .23 2 6 8 4 10 .
     2 10 2019 1 20 240      .24 2 6 4 8 10 .
     1  . 2015 0 10 100       .1 . . . .  . .
     1  . 2016 0 10 110      .11 . . . .  . .
     1  . 2017 0 10 120      .12 . . . .  . .
     1  . 2018 0 10 130      .13 . . . .  . .
     1  . 2019 0 10 140      .14 . . . .  . .
     2  . 2015 0 20 200       .2 . . . .  . .
     2  . 2016 0 20 210      .21 . . . .  . .
     2  . 2017 0 20 220      .22 . . . .  . .
     2  . 2018 0 20 230      .23 . . . .  . .
     2  . 2019 0 20 240      .24 . . . .  . .
     3  . 2015 0 10 105      .98 . . . .  . .
     3  . 2016 0 10 115     .101 . . . .  . .
     3  . 2017 0 10 125      .12 . . . .  . .
     3  . 2018 0 10 135      .15 . . . .  . .
     3  . 2019 0 10 145      .12 . . . .  . .
     4  . 2015 0 20 205     .202 . . . .  . .
     4  . 2016 0 20 215     .203 . . . .  . .
     4  . 2017 0 20 225      .23 . . . .  . .
     4  . 2018 0 20 235      .21 . . . .  . .
     4  . 2019 0 20 245      .23 . . . .  . .
     5  . 2015 0 10 130  .111111 . . . .  . .
     5  . 2016 0 10 120 .1099999 . . . .  . .
     5  . 2017 0 10 140  .132222 . . . .  . .
     5  . 2018 0 10 180  .102222 . . . .  . .
     5  . 2019 0 10 150      .12 . . . .  . .
     6  . 2015 0 20 290      .23 . . . .  . .
     6  . 2016 0 20 250      .23 . . . .  . .
     6  . 2017 0 20 340      .24 . . . .  . .
     6  . 2018 0 20 240      .25 . . . .  . .
     6  . 2019 0 20 250      .24 . . . .  . .
     7  . 2015 0 10  90      .13 . . . .  . .
     7  . 2016 0 10 105      .14 . . . .  . .
     7  . 2017 0 10  80      .14 . . . .  . .
     7  . 2018 0 10  90      .15 . . . .  . .
     7  . 2019 0 10 110      .15 . . . .  . .
     8  . 2015 0 20 190      .24 . . . .  . .
     8  . 2016 0 20 170      .24 . . . .  . .
     8  . 2017 0 20 200  .245555 . . . .  . .
     8  . 2018 0 20 205      .23 . . . .  . .
     8  . 2019 0 20 206  .243333 . . . .  . .
     9  . 2015 0 10  98      .09 . . . .  . .
     9  . 2016 0 10 101     .093 . . . .  . .
     9  . 2017 0 10 100     .096 . . . .  . .
     9  . 2018 0 10 104     .098 . . . .  . .
     9  . 2019 0 10 105     .104 . . . .  . .
    10  . 2015 0 20 197      .08 . . . .  . .
    10  . 2016 0 20 199     .089 . . . .  . .
    10  . 2017 0 20 201     .092 . . . .  . .
    10  . 2018 0 20 200     .095 . . . .  . .
    10  . 2019 0 20 208     .099 . . . .  . .
    11  . 2015 0 11 101      .11 . . . .  . .
    11  . 2016 0 11 102      .12 . . . .  . .
    11  . 2017 0 11 103   .13444 . . . .  . .
    11  . 2018 0 11 102  .135666 . . . .  . .
    11  . 2019 0 11 104     .147 . . . .  . .
    12  . 2015 0 21 206      .23 . . . .  . .
    12  . 2016 0 21 218      .23 . . . .  . .
    12  . 2017 0 21 198     .235 . . . .  . .
    12  . 2018 0 21 212     .237 . . . .  . .
    12  . 2019 0 21 220     .239 . . . .  . .
    13  . 2015 0 12 130    .1134 . . . .  . .
    13  . 2016 0 12 128    .1139 . . . .  . .
    13  . 2017 0 12 134    .1142 . . . .  . .
    13  . 2018 0 12 138    .1134 . . . .  . .
    13  . 2019 0 12 140    .1156 . . . .  . .
    14  . 2015 0 22 230     .212 . . . .  . .
    14  . 2016 0 22 228     .219 . . . .  . .
    14  . 2017 0 22 220     .221 . . . .  . .
    14  . 2018 0 22 231     .229 . . . .  . .
    14  . 2019 0 22 235      .24 . . . .  . .
    end
    
    
    gen match_id_1 = .
    gen match_id_2 = .
    gen match_id_3 = .
    gen match_id_4 = .
    gen match_id_5 = .
    gen match_id_6 = .
    recast float id
    mahapick assets bm_ratio, idvar(id) pickids(match_id_1 match_id_2 match_id_3 match_id_4 match_id_5 match_id_6) treated(treat) matchon(year industry)


  • #2
    Hey Statalists,

    i made some progress on my issue, but it´s still not solved. I added the variable "match_industry" that increases by one every time mahapick hasn´t found enough matches and the variable "start_missing" that displays the digit of the first match_id thatl is still missing.
    Below in the last line of the foreach loop, how can I ensure, that Stata for each observation starts with mahapick from the match_id that was the first missing in the last mahapick match?


    Code:
    clear
    input byte(id pair_id) int year byte(treat industry) int assets float bm_ratio
     1  1 2015 1 10 100       .1
     1  2 2016 1 10 110      .11
     1  3 2017 1 10 120      .12
     1  4 2018 1 10 130      .13
     1  5 2019 1 10 140      .14
     2  6 2015 1 20 200       .2
     2  7 2016 1 20 210      .21
     2  8 2017 1 20 220      .22
     2  9 2018 1 20 230      .23
     2 10 2019 1 20 240      .24
     1  . 2015 0 10 100       .1
     1  . 2016 0 10 110      .11
     1  . 2017 0 10 120      .12
     1  . 2018 0 10 130      .13
     1  . 2019 0 10 140      .14
     2  . 2015 0 20 200       .2
     2  . 2016 0 20 210      .21
     2  . 2017 0 20 220      .22
     2  . 2018 0 20 230      .23
     2  . 2019 0 20 240      .24
     3  . 2015 0 10 105      .98
     3  . 2016 0 10 115     .101
     3  . 2017 0 10 125      .12
     3  . 2018 0 10 135      .15
     3  . 2019 0 10 145      .12
     4  . 2015 0 20 205     .202
     4  . 2016 0 20 215     .203
     4  . 2017 0 20 225      .23
     4  . 2018 0 20 235      .21
     4  . 2019 0 20 245      .23
     5  . 2015 0 10 130  .111111
     5  . 2016 0 10 120 .1099999
     5  . 2017 0 10 140  .132222
     5  . 2018 0 10 180  .102222
     5  . 2019 0 10 150      .12
     6  . 2015 0 20 290      .23
     6  . 2016 0 20 250      .23
     6  . 2017 0 20 340      .24
     6  . 2018 0 20 240      .25
     6  . 2019 0 20 250      .24
     7  . 2015 0 10  90      .13
     7  . 2016 0 10 105      .14
     7  . 2017 0 10  80      .14
     7  . 2018 0 10  90      .15
     7  . 2019 0 10 110      .15
     8  . 2015 0 20 190      .24
     8  . 2016 0 20 170      .24
     8  . 2017 0 20 200  .245555
     8  . 2018 0 20 205      .23
     8  . 2019 0 20 206  .243333
     9  . 2015 0 10  98      .09
     9  . 2016 0 10 101     .093
     9  . 2017 0 10 100     .096
     9  . 2018 0 10 104     .098
     9  . 2019 0 10 105     .104
    10  . 2015 0 20 197      .08
    10  . 2016 0 20 199     .089
    10  . 2017 0 20 201     .092
    10  . 2018 0 20 200     .095
    10  . 2019 0 20 208     .099
    11  . 2015 0 11 101      .11
    11  . 2016 0 11 102      .12
    11  . 2017 0 11 103   .13444
    11  . 2018 0 11 102  .135666
    11  . 2019 0 11 104     .147
    12  . 2015 0 21 206      .23
    12  . 2016 0 21 218      .23
    12  . 2017 0 21 198     .235
    12  . 2018 0 21 212     .237
    12  . 2019 0 21 220     .239
    13  . 2015 0 12 130    .1134
    13  . 2016 0 12 128    .1139
    13  . 2017 0 12 134    .1142
    13  . 2018 0 12 138    .1134
    13  . 2019 0 12 140    .1156
    14  . 2015 0 22 230     .212
    14  . 2016 0 22 228     .219
    14  . 2017 0 22 220     .221
    14  . 2018 0 22 231     .229
    14  . 2019 0 22 235      .24
    end
    
    
    
    gen match_id_1 = .
    gen match_id_2 = .
    gen match_id_3 = .
    gen match_id_4 = .
    gen match_id_5 = .
    gen match_id_6 = .
    gen match_industry = industry
    gen start_missing = 0
    recast float id
    mahapick assets bm_ratio, idvar(id) pickids(match_id_1 match_id_2 match_id_3 match_id_4 match_id_5 match_id_6) treated(treat) matchon(year industry)
    foreach x in start_missing {
    replace start_missing = cond(match_id_1==.,1,cond(match_id_2==.,2,cond(match_id_3==.,3,cond(match_id_4==.,4,cond(match_id_5==.,5,cond(match_id_6==.,6,0)))))) if treat == 1
    replace match_industry = match_industry+1 if `x' >0 & treat == 1
    mahapick assets bm_ratio, idvar(id) pickids(match_id_`x' UNTIL match_id_6) treated(treat) matchon(year match_industry)
    }
    Last edited by Edgar Goldemer; 03 Nov 2019, 18:50.

    Comment


    • #3
      If your industry is an SIC code, then your method is flawed. Industries break at the 1000's so 1999 is not at all like 2000. What folks would normally do is move to a higher level (e.g., 3 digit sic code) level. I'm not sure how other industry codes work.

      Comment

      Working...
      X