Hello,
I need guidance to select observations according to different criteria by date (tid) and industry (gics). The process involves choosing ranked observations (sorted on score, and me = market-cap) based on the following steps, until a target coverage of 50% by cumulative me (for each date tid and gics) is achieved:
I would appreciate any insights or recommendations on how to implement this selection process effectively. Thank you!
Not sure if dataex will be useful given the large cross-section of the sample. So, here is a data generating code.
I need guidance to select observations according to different criteria by date (tid) and industry (gics). The process involves choosing ranked observations (sorted on score, and me = market-cap) based on the following steps, until a target coverage of 50% by cumulative me (for each date tid and gics) is achieved:
- Select all the observations within the top 35% of cumulative me.
- Select observations with a score higher than 4.5 ratings within the top 50% of cumulative me within the ranked universe. This includes the first observation that increases the cumulative coverage above 50%.
- Select observations in the top 65% cumulative me in the ranked universe.
I would appreciate any insights or recommendations on how to implement this selection process effectively. Thank you!
Code:
// Sort the data to get the ranked universe sort tid gics score me //Get cumsum by tid and gics bys tid gics (score): gen cum_me = sum(me) // Calculate the total market value for each 'tid' and 'gics' group bys tid gics: egen total_me = total(me) // Calculate the percentage of cumulative market value for each 'tid' and 'gics' group gen cum_market_value_pct = cum_me / total_me // Identify securities within the top cumulative market value coverage gen top_35_percent = cum_market_value_pct <= 0.35 gen top_50_percent = cum_market_value_pct <= 0.50 gen top_65_percent = cum_market_value_pct <= 0.50 // Identify securities with a score greater than 4.5 gen top_aa = score > 4.5 // Identify securities that are both 'AA' rated and within the top 50% cumulative market value coverage gen top_aa_50 = top_aa & top_50_percent // Create a binary variable to indicate the selection of securities based on the defined criteria gen select1 = top_35_percent gen select2 = select1 | top_aa_50 gen select3 = select2 | top_65_percent if cum_market_value_pct<0.50 // I am not sure how to code this line
Code:
set obs 1000 g tid = floor((100 - 0 + 1) * runiform()) g id = floor((50 - 0 + 1) * runiform()) g score = -runiform() * 10 g me = runiform() * 100000 g gics = floor((10 - 0 + 1) * runiform()) duplicates drop id tid , force xtset id tid