Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Problems with merging data and multiple observations per firm per year

    Hello all,

    I am working with panel data and have a problem with one of my databases, namely the CDP database.
    For example, a section of my database looks like this:

    id fyear purpose
    87 2009 0
    138 2009 1
    138 2009 1
    138 2009 1
    138 2009 1
    252 2009 1
    333 2009 1
    333 2009 1
    345 2009 0
    462 2009 0
    593 2009 0
    629 2008 0
    674 2009 0
    689 2009 0
    921 2009 1
    921 2009 1
    921 2009 1
    921 2009 1
    921 2009 1
    921 2009 1
    921 2009 1

    There are several observations per year for a company (identified by id). For me it is only relevant if a company has purpose=1 at least once in a year. This means that they have voluntarily invested in carbon offsetting, which later becomes my independent variable.

    I need to merge this database in a further step. For this I have an excel file that assigns a gvkey to the identification in the CDP database, i.e. id.
    Is there a way that in this merge I not only assign the respective gvkey to the id, but also keep only one observation per company per year? However, it would still be important here that if company xy has, for example, in year 2009 once purpose=1 and once purpose=0, only the observation with purpose=1 remains. Otherwise, I would find out for the year that the company in question did not invest, even though it did so once.

    I have already tried to merge, this is my code (where "CDP_gvkey.xlsx" is the database where the mapping of id and gvkey can be found, and "CDP.dta" is my database with id, fyear and purpose)

    import excel "CDP_gvkey.xlsx", sheet("Matching North America") firstrow
    keep GVKEY Account_number
    duplicates drop
    rename GVKEY gvkey
    rename Account_number id
    destring gvkey id, replace
    merge 1:m id using "CDP.dta", nogenerate
    drop if gvkey==.
    drop if fyear==.
    duplicates drop fyear gvkey, force
    save "CDP_final.dta", replace
    clear

    Here it might be interesting to mention that after "drop if fyear==." I looked into the data and saw that a large number of the companies, if they had multiple observations in a specific year, then always had the same ones (i.e. always 0 or always 1). However, there are still a few that have purpose=1 several times in one year and then purpose=0 once in the same year. If I then execute the command "duplicates drop fyear gvkey, force", sometimes exactly the one observation with purpose=0 remains for the company in the year. Is there a way around this?

    I appreciate any help very much!

    Many greetings,
    Jana
    Last edited by Jana Schue; 01 Feb 2023, 08:54. Reason: I read through my question again and was able to formulate a few places more clearly

  • #2
    Perhaps this example will demonstrate useful technique to reduce the data you show to one observation for each id and fyear.
    Code:
    . collapse (max) purpose, by(id fyear)
    
    . list, clean noobs
    
         id   fyear   purpose  
         87    2009         0  
        138    2009         1  
        252    2009         1  
        333    2009         1  
        345    2009         0  
        462    2009         0  
        593    2009         0  
        629    2008         0  
        674    2009         0  
        689    2009         0  
        921    2009         1  
    
    .

    Comment


    • #3
      Thank you very much William! It worked!

      Comment

      Working...
      X