  • Calculating averages (industry+year) based on sub sample and then applying them to all firms

    Hi, I am still quite new to Stata (using Stata 16) and hope to find help here as I do not know how to solve this problem.

    I have panel data (S&P 500) for roughly 20 years and a plethora of sic2 industries.

    What I want to do is, for a certain variable (e.g., experience), find the top performers (top decentile) in each sic2 industry and each year, and then create a variable that records this "benchmark". In the end each firm (even if not a top performer) has the top-performer average for its sic2 and year to compare against in this benchmark variable. I am using an average weighted by sales.

    So my benchmark variable is price-cost margin:

    gen pg_margin = (sale-(cogs+xsga))/sale
    egen pg_deciles = xtile(pg_margin), by(fyear sic2) nq(10)

    // now I can claim that every firm with a "10" in pg_deciles is a top performer
    // I further generate a dummy variable for industry leader to mark them

    gen industrylead_sic2 = 0 // set up variable
    replace industrylead_sic2 = 1 if pg_deciles==10

    gen industrylead_experience = 0 // I create a new variable for experience in industry leaders (the benchmark variable)
    replace industrylead_experience = experience if industrylead_sic2 == 1 // I copy over the experience of only the industry leaders in the new variable
    bysort fyear sic2: egen wtavgindustrylead_experience = wtmean(industrylead_experience), weight(sale) // create an avg industryleader group per year per SIC2 weighted by sales

    I have two questions:
    1. Is this the correct way to do this or am I missing something?
    2. How do I exclude the focal firm itself from the industry leader group if it is a top performer itself? And do I even need to if I want to avoid endogeneity?

    Thank you!

  • #2
    decentile as a variation on decile is new to me. I will add it to my collection over at https://stats.stackexchange.com/ques...half-a-percent

    The egen function xtile() is from egenmore on SSC, as you are asked to explain (FAQ Advice #12). The function wtmean() is from its own package, ditto.

    Did you know this way to get an indicator variable (you say "dummy variable")? Instead of

    Code:
    gen industrylead_sic2 = 0 // set up variable
    
    replace industrylead_sic2 = 1 if pg_deciles==10
    you go

    Code:
    gen industrylead_sic2 = pg_deciles==10
    See for more discussion

    https://www.stata.com/support/faqs/d...rue-and-false/

    https://www.stata-journal.com/articl...article=dm0099
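
    One detail worth checking with the one-line form is how missing values are treated. A minimal sketch on made-up data (the variable names lead_a and lead_b are hypothetical): a true-or-false expression returns 0 when pg_deciles is missing, so a firm with no decile is recorded as "not a leader" unless you add a qualifier.

```stata
clear
input pg_deciles
10
3
.
end

// missing == 10 is false, so the third observation gets 0
gen lead_a = pg_deciles == 10

// with the qualifier, the third observation stays missing instead
gen lead_b = pg_deciles == 10 if !missing(pg_deciles)

list
```

    Which of the two you want depends on whether firms without a margin should count as known non-leaders or as unknown.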

    Now to your question. I would approach it this way


    Code:
    egen numer = total(cond(industrylead_sic2, experience * sale, .)), by(fyear sic2)
    egen denom = total(cond(industrylead_sic2, sale, .)), by(fyear sic2)
    gen wanted = numer / denom
    To get Stata to ignore some values, you assign missing, not zero.

    Your approach is like spreading the heights of basketball players to everyone by taking an average over

    height if basketball player
    0 if not

    which is not what you want. Note that for example

    Code:
    cond(industrylead_sic2, experience * sale, .)
    is exactly equivalent to

    Code:
    cond(industrylead_sic2 == 1, experience * sale, .)
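
    The difference between assigning zero and assigning missing can be seen on a toy industry-year. This is only a sketch with made-up numbers: the indicator is called leader here for brevity, and the zero-filled calculation is written out by hand with total() rather than with wtmean(), on the assumption that a weighted mean over a zero-filled variable amounts to the same ratio.

```stata
clear
input fyear sic2 experience sale leader
2020 10 10 100 1
2020 10 20 300 1
2020 10  5 600 0
end

// zero-filled: the non-leader adds 0 to the numerator but its sale to the weights
gen exp_zero = cond(leader, experience, 0)
egen num_zero = total(exp_zero * sale), by(fyear sic2)
egen den_all = total(sale), by(fyear sic2)
gen mean_zero = num_zero / den_all

// missing-filled: the non-leader drops out of numerator and denominator alike
egen numer = total(cond(leader, experience * sale, .)), by(fyear sic2)
egen denom = total(cond(leader, sale, .)), by(fyear sic2)
gen wanted = numer / denom

list fyear sic2 experience sale leader mean_zero wanted
```

    Here the leaders' weighted mean is (10 * 100 + 20 * 300) / (100 + 300) = 17.5, while the zero-filled version gives 7000 / 1000 = 7, dragged down by the non-leader's weight.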

    I don't follow your other questions. Anyone asking about endogeneity is assuming that everyone else reading is an economist, and I am innocent.

    Excluding a firm itself can make sense, but is a different question. rangestat from SSC and also asrol from SSC offer some functionality there.
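
    If you do want to exclude the firm itself, one way that needs nothing beyond the totals is to subtract each leader's own contribution before dividing. This is a sketch under assumptions, not the rangestat or asrol route: the data are made up, leader stands in for industrylead_sic2, and experience and sale are taken to be nonmissing for every leader.

```stata
clear
input firm fyear sic2 experience sale leader
1 2020 10 10 100 1
2 2020 10 20 300 1
3 2020 10  5 600 0
end

egen numer = total(cond(leader, experience * sale, .)), by(fyear sic2)
egen denom = total(cond(leader, sale, .)), by(fyear sic2)

// take out the firm's own term only when it is itself a leader
gen numer_excl = numer - cond(leader, experience * sale, 0)
gen denom_excl = denom - cond(leader, sale, 0)

// division by zero yields missing, so a lone leader gets no benchmark
gen wanted_excl = numer_excl / denom_excl

list firm leader wanted_excl
```

    Non-leaders keep the full leader average (17.5 here), while each leader is compared with the other leaders only (20 for firm 1, 10 for firm 2).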
