I am working with yearly panel data on firms in several different countries. These firms include a mix of financial sector firms and many non-financial firms. Currently, I am trying to aggregate this firm-level data to the country-level to examine country-level trends. My ultimate goal is to create a ratio of the financial sector's profits as a percentage of all of the profits for a country year so I can compare how profitable countries' financial sectors are.
This has proven a bit more difficult than I thought. I first created a dummy variable for financial sector firms (called "fin_sec"). I then used egen to sum all of the pre-tax income (pi) by year and country, used the same command to sum just financial sector income, and then divided the two. My code looked like this:
egen all_profits = total(pi), missing by (year country)
egen fin_profits = total(pi) if fin_sec==1, missing by (year country)
gen fin_share_profits = fin_profits/all_profits
This mostly seemed to work, but it created some strange outliers that point to problems. Basically, in years where the financial sector does uniformly bad (or good I suppose) things can get out of wack. Moreover, if net profits and financial profits are both negative, they create a positive ratio. Consider the case of the Netherlands in 2008 (the same year of the financial crisis). Net income for all firms summed (in millions) to -276 dollars. The financial sector, which suffered badly that year, summed -39586 making the financial share of profits 143 (!). This gives the impression the financial sector did extremely (i.e impossibly) well, but clearly this is not correct.
So, my question is, with panel data, how do correctly sum up a variable with positive and negative values (income) and create a proportion?
I've considered moving averages, but it doesn't address this more core problem as shown in the Dutch case. My only other thought was to take absolute values to sum everything up, but then I would lose the critical distinction of whether profits are positive or negative.
This has proven a bit more difficult than I thought. I first created a dummy variable for financial sector firms (called "fin_sec"). I then used egen to sum all of the pre-tax income (pi) by year and country, used the same command to sum just financial sector income, and then divided the two. My code looked like this:
egen all_profits = total(pi), missing by (year country)
egen fin_profits = total(pi) if fin_sec==1, missing by (year country)
gen fin_share_profits = fin_profits/all_profits
This mostly seemed to work, but it created some strange outliers that point to problems. Basically, in years where the financial sector does uniformly bad (or good I suppose) things can get out of wack. Moreover, if net profits and financial profits are both negative, they create a positive ratio. Consider the case of the Netherlands in 2008 (the same year of the financial crisis). Net income for all firms summed (in millions) to -276 dollars. The financial sector, which suffered badly that year, summed -39586 making the financial share of profits 143 (!). This gives the impression the financial sector did extremely (i.e impossibly) well, but clearly this is not correct.
So, my question is, with panel data, how do correctly sum up a variable with positive and negative values (income) and create a proportion?
I've considered moving averages, but it doesn't address this more core problem as shown in the Dutch case. My only other thought was to take absolute values to sum everything up, but then I would lose the critical distinction of whether profits are positive or negative.
Comment