Calculating averages (industry+year) based on sub sample and then applying them to all firms

Brian Statham

Join Date: Sep 2021

Posts: 1
#1

25 Sep 2021, 01:46

Hi, I am still quite new to Stata (using Stata 16) and hope to find help here as I do not know how to solve this problem.

I have panel data (S&P 500) for roughly 20 years and a plethora of sic2 industries.

What I want to do is, for a certain variable (e.g., experience), find the top performers (top decentile) within each sic2 industry and each year, and then create a variable that records this "benchmark". So in the end each firm (even if it is not a top performer) has the top-performer average for its sic2 and year to compare against in this benchmark variable. I am using an average weighted by sales.

So my benchmark variable is price-cost margin:

gen pg_margin = (sale-(cogs+xsga))/sale
egen pg_deciles = xtile(pg_margin), by(fyear sic2) nq(10)

// now I can claim that every firm with a "10" in pg_deciles is a top performer
// I further generate a dummy variable to mark the industry leaders

gen industrylead_sic2 = 0 // set up variable
replace industrylead_sic2 = 1 if pg_deciles==10

gen industrylead_experience = 0 // I create a new variable for experience in industry leaders (the benchmark variable)
replace industrylead_experience = experience if industrylead_sic2 == 1 // I copy over the experience of only the industry leaders in the new variable
bysort fyear sic2: egen wtavgindustrylead_experience = wtmean(industrylead_experience), weight(sale) // create an avg industryleader group per year per SIC2 weighted by sales

I have two questions:
1. Is this the correct way to do this or am I missing something?
2. How do I exclude the focal firm itself from the industry leader group if it is a top performer itself? And do I even need to if I want to avoid endogeneity?

Thank you!
Nick Cox

Join Date: Mar 2014

Posts: 35724
#2

25 Sep 2021, 03:10

decentile as a variation on decile is new to me. I will add it to my collection over at https://stats.stackexchange.com/ques...half-a-percent

The egen function xtile() is from egenmore on SSC, as you are asked to explain (FAQ Advice #12). The function wtmean() is from its own package, ditto.

Did you know this way to get an indicator variable (you say "dummy variable")? Instead of

Code:

gen industrylead_sic2 = 0 // set up variable
replace industrylead_sic2 = 1 if pg_deciles==10

you go

Code:

gen industrylead_sic2 = pg_deciles==10

For more discussion, see

https://www.stata.com/support/faqs/d...rue-and-false/

https://www.stata-journal.com/articl...article=dm0099

Now to your question. I would approach it this way

Code:

egen numer = total(cond(industrylead_sic2, experience * sale, .)), by(fyear sic2)
egen denom = total(cond(industrylead_sic2, sale, .)), by(fyear sic2)
gen wanted = numer / denom

To get Stata to ignore some values, you assign missing, not zero.

Your approach is like spreading the heights of basketball players to everyone by taking an average over

height if basketball player
0 if not

which is not what you want. Note that for example

Code:

cond(industrylead_sic2, experience * sale, .)

is exactly equivalent to

Code:

cond(industrylead_sic2 == 1, experience * sale, .)
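
A small made-up example may make the basketball point concrete. This is only a sketch: the three observations and the variable names (leader, exp_zero, n0, d0, n1, d1) are invented for illustration and are not from the original data. It compares the weighted average when non-leaders are set to zero with the weighted average when they are set to missing.

```stata
* Toy data, invented for illustration: two leaders and one non-leader
clear
input leader experience sale
1 10 100
1 20 300
0  5 600
end

* Zero approach: the non-leader enters with experience 0 but its full sales weight
gen exp_zero = cond(leader, experience, 0)
egen n0 = total(exp_zero * sale)
egen d0 = total(sale)
display n0[1] / d0[1]    // 7

* Missing approach: the non-leader drops out of numerator and denominator alike
egen n1 = total(cond(leader, experience * sale, .))
egen d1 = total(cond(leader, sale, .))
display n1[1] / d1[1]    // 17.5
```

With zeros the large non-leader drags the "leader average" down to 7, although both leaders have experience of at least 10. With missings the result, 17.5, lies between the two leaders' values, as a weighted mean of leaders should.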

I don't follow your other questions. Anyone asking about endogeneity is assuming that everyone else reading is an economist, and I am innocent.

Excluding a firm itself can make sense, but is a different question. rangestat from SSC and also asrol from SSC offer some functionality there.
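
If excluding the firm itself is wanted, one possible sketch that needs no extra packages is to subtract each leader's own contribution from the group totals. It assumes the variables industrylead_sic2, experience, sale, fyear and sic2 from this thread; the names lead_ok, numer_ok, denom_ok and wanted_excl are hypothetical. Whether this is the right thing to do for the research question is a separate matter.

```stata
* Leaders with usable data (hypothetical helper variable)
gen byte lead_ok = industrylead_sic2 == 1 & !missing(experience, sale)

* Group totals over those leaders only
egen numer_ok = total(cond(lead_ok, experience * sale, .)), by(fyear sic2)
egen denom_ok = total(cond(lead_ok, sale, .)), by(fyear sic2)

* Leaders: average over the other leaders; non-leaders: average over all leaders
gen wanted_excl = cond(lead_ok, (numer_ok - experience * sale) / (denom_ok - sale), numer_ok / denom_ok)
```

Where a firm is the only leader in its industry and year, the denominator is zero and Stata returns missing, which seems the honest answer: there are no other leaders to average over.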