Hey everyone, I am a 3rd year Business Economics student working on my thesis regarding firms' CO2 emissions, and I could really use your help. I am using Stata 16.1 on MacOS 10.14.
I need to merge 3 datasets together: 1 dataset with firm financials, 1 dataset with CEO characteristics, and 1 dataset with sustainability scores of different firms. The sustainability dataset is taken from the Sustainalytics database, and it includes 59 environmental categories (variables starting with the letter "E") alongside other governmental and social categories, that together make up the "ESG" score for each firm. My sample contains data between 2009-2017, and the raw data contains approximately 150,000 firm-year observations.
Here's the problem: Two other two datasets have annual observations, but the Sustainalytics dataset contains multiple observations per year for each firm. My goal is to take the averages of these observations and turn them into one annual average, so that I can merge the dataset with the other two.
In the dataex example below, I have removed most of the environmental and governance score variables for simplicity, but my goal is to eventually calculate the overall scores by taking the mean of these variables.
I have tried to use the collapse command as suggested in other posts about panel data, but every time I've tried, I've ended up dropping most of the important data.
Thank you in advance!
I need to merge 3 datasets together: 1 dataset with firm financials, 1 dataset with CEO characteristics, and 1 dataset with sustainability scores of different firms. The sustainability dataset is taken from the Sustainalytics database, and it includes 59 environmental categories (variables starting with the letter "E") alongside other governmental and social categories, that together make up the "ESG" score for each firm. My sample contains data between 2009-2017, and the raw data contains approximately 150,000 firm-year observations.
Here's the problem: Two other two datasets have annual observations, but the Sustainalytics dataset contains multiple observations per year for each firm. My goal is to take the averages of these observations and turn them into one annual average, so that I can merge the dataset with the other two.
In the dataex example below, I have removed most of the environmental and governance score variables for simplicity, but my goal is to eventually calculate the overall scores by taking the mean of these variables.
Code:
* Example generated by -dataex-. For more info, type help dataex clear input long(Date firmid) str26 Ticker double(E_1_1 E_1_2 G_1_1 G_1_3) 21094 1 "SLB" 50 60 100 0 21124 1 "SLB" 50 60 100 0 21159 1 "SLB" 50 60 100 0 21193 1 "SLB" 50 60 100 0 21216 1 "SLB" 50 60 100 0 21244 1 "SLB" 50 60 100 0 21276 1 "SLB" 100 80 100 0 21306 1 "SLB" 100 80 100 0 21339 1 "SLB" 100 80 100 0 21368 1 "SLB" 100 80 100 0 end format %td Date label values firmid firmid label def firmid 1 "AN8068571086", modify
I have tried to use the collapse command as suggested in other posts about panel data, but every time I've tried, I've ended up dropping most of the important data.
Thank you in advance!

) reply!
Comment