Hello all,
I am working with panel data and have a problem with one of my databases, namely the CDP database.
For example, a section of my database looks like this:
id fyear purpose
87 2009 0
138 2009 1
138 2009 1
138 2009 1
138 2009 1
252 2009 1
333 2009 1
333 2009 1
345 2009 0
462 2009 0
593 2009 0
629 2008 0
674 2009 0
689 2009 0
921 2009 1
921 2009 1
921 2009 1
921 2009 1
921 2009 1
921 2009 1
921 2009 1
There are several observations per year for a company (identified by id). For me it is only relevant if a company has purpose=1 at least once in a year. This means that they have voluntarily invested in carbon offsetting, which later becomes my independent variable.
I need to merge this database in a further step. For this I have an excel file that assigns a gvkey to the identification in the CDP database, i.e. id.
Is there a way that in this merge I not only assign the respective gvkey to the id, but also keep only one observation per company per year? However, it would still be important here that if company xy has, for example, in year 2009 once purpose=1 and once purpose=0, only the observation with purpose=1 remains. Otherwise, I would find out for the year that the company in question did not invest, even though it did so once.
I have already tried to merge, this is my code (where "CDP_gvkey.xlsx" is the database where the mapping of id and gvkey can be found, and "CDP.dta" is my database with id, fyear and purpose)
import excel "CDP_gvkey.xlsx", sheet("Matching North America") firstrow
keep GVKEY Account_number
duplicates drop
rename GVKEY gvkey
rename Account_number id
destring gvkey id, replace
merge 1:m id using "CDP.dta", nogenerate
drop if gvkey==.
drop if fyear==.
duplicates drop fyear gvkey, force
save "CDP_final.dta", replace
clear
Here it might be interesting to mention that after "drop if fyear==." I looked into the data and saw that a large number of the companies, if they had multiple observations in a specific year, then always had the same ones (i.e. always 0 or always 1). However, there are still a few that have purpose=1 several times in one year and then purpose=0 once in the same year. If I then execute the command "duplicates drop fyear gvkey, force", sometimes exactly the one observation with purpose=0 remains for the company in the year. Is there a way around this?
I appreciate any help very much!
Many greetings,
Jana
I am working with panel data and have a problem with one of my databases, namely the CDP database.
For example, a section of my database looks like this:
id fyear purpose
87 2009 0
138 2009 1
138 2009 1
138 2009 1
138 2009 1
252 2009 1
333 2009 1
333 2009 1
345 2009 0
462 2009 0
593 2009 0
629 2008 0
674 2009 0
689 2009 0
921 2009 1
921 2009 1
921 2009 1
921 2009 1
921 2009 1
921 2009 1
921 2009 1
There are several observations per year for a company (identified by id). For me it is only relevant if a company has purpose=1 at least once in a year. This means that they have voluntarily invested in carbon offsetting, which later becomes my independent variable.
I need to merge this database in a further step. For this I have an excel file that assigns a gvkey to the identification in the CDP database, i.e. id.
Is there a way that in this merge I not only assign the respective gvkey to the id, but also keep only one observation per company per year? However, it would still be important here that if company xy has, for example, in year 2009 once purpose=1 and once purpose=0, only the observation with purpose=1 remains. Otherwise, I would find out for the year that the company in question did not invest, even though it did so once.
I have already tried to merge, this is my code (where "CDP_gvkey.xlsx" is the database where the mapping of id and gvkey can be found, and "CDP.dta" is my database with id, fyear and purpose)
import excel "CDP_gvkey.xlsx", sheet("Matching North America") firstrow
keep GVKEY Account_number
duplicates drop
rename GVKEY gvkey
rename Account_number id
destring gvkey id, replace
merge 1:m id using "CDP.dta", nogenerate
drop if gvkey==.
drop if fyear==.
duplicates drop fyear gvkey, force
save "CDP_final.dta", replace
clear
Here it might be interesting to mention that after "drop if fyear==." I looked into the data and saw that a large number of the companies, if they had multiple observations in a specific year, then always had the same ones (i.e. always 0 or always 1). However, there are still a few that have purpose=1 several times in one year and then purpose=0 once in the same year. If I then execute the command "duplicates drop fyear gvkey, force", sometimes exactly the one observation with purpose=0 remains for the company in the year. Is there a way around this?
I appreciate any help very much!
Many greetings,
Jana

Comment