Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Select observations on different criteria based on cumulative sum.

    Hello,

    I need guidance to select observations according to different criteria by date (tid) and industry (gics). The process involves choosing ranked observations (sorted on score, and me = market-cap) based on the following steps, until a target coverage of 50% by cumulative me (for each date tid and gics) is achieved:
    1. Select all the observations within the top 35% of cumulative me.
    2. Select observations with a score higher than 4.5 ratings within the top 50% of cumulative me within the ranked universe. This includes the first observation that increases the cumulative coverage above 50%.
    3. Select observations in the top 65% cumulative me in the ranked universe.
    I'm unsure how to proceed with the selection process based on the second and third condition while still respecting the target coverage.
    I would appreciate any insights or recommendations on how to implement this selection process effectively. Thank you!

    Code:
    // Sort the data to get the ranked universe
    sort tid gics score me
    
    //Get cumsum by tid and gics
    bys tid gics (score): gen cum_me = sum(me)
    
    // Calculate the total market value for each 'tid' and 'gics' group
    bys tid gics: egen total_me = total(me)
    
    // Calculate the percentage of cumulative market value for each 'tid' and 'gics' group
    gen cum_market_value_pct = cum_me / total_me
    
    // Identify securities within the top cumulative market value coverage
    gen top_35_percent = cum_market_value_pct <= 0.35
    gen top_50_percent = cum_market_value_pct <= 0.50
    gen top_65_percent = cum_market_value_pct <= 0.50
    
    // Identify securities with a score greater than 4.5
    gen top_aa = score > 4.5
    
    // Identify securities that are both 'AA' rated and within the top 50% cumulative market value coverage
    gen top_aa_50 = top_aa & top_50_percent
    
    // Create a binary variable to indicate the selection of securities based on the defined criteria
    gen select1 =  top_35_percent 
    
    gen select2 = select1 | top_aa_50    
    
    gen select3 = select2 | top_65_percent if cum_market_value_pct<0.50 // I am not sure how to code this line
    Not sure if dataex will be useful given the large cross-section of the sample. So, here is a data generating code.

    Code:
    set obs 1000
    g tid = floor((100 - 0 + 1) * runiform())
    g id = floor((50 - 0 + 1) * runiform())
    g score = -runiform() * 10
    g me = runiform() * 100000
    g gics = floor((10 - 0 + 1) * runiform())
    duplicates drop id tid , force
    xtset id tid

Working...
X