  • Dropping unmatched data

    Hi everyone, I need some help processing data for my PhD research. I started with a dataset of firm-year observations from 2014-2019, but after the baseline regression I realized that I also need observations for 2013 in order to implement the cross-sectional analysis. I downloaded the data for 2013 and now need to append it to the existing 2014-2019 dataset. The problem is that the new download contains firms that were not present in the old file, because Orbis has updated its data in the meantime. How can I delete the firms that have only a single-year observation (2013) and keep only the firms that appear in both files?

    So far I have run the append command and then sorted by country, company name and year. How do I now drop the unmatched firms?

    Thanks a lot for the help!

  • #2
    Code:
    bys country company: drop if _N == 1
    This will drop all companies with only one observation.
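Before dropping anything, it may help to see which firms would be affected. The following is a minimal sketch on toy data; the firm names and the helper variable n_obs are hypothetical and not from the original dataset.

```stata
* Toy data: hypothetical firms, not from the original dataset
clear
input str2 country str10 company int year
"IT" "Alpha" 2013
"IT" "Alpha" 2014
"IT" "Alpha" 2015
"IT" "Beta"  2013
end

* Store the group size (_N within each country-company group) in a variable
bysort country company: generate n_obs = _N

* Inspect the singleton firms that the drop would remove
list country company year if n_obs == 1

* Apply the drop, then remove the helper variable
drop if n_obs == 1
drop n_obs
```

Listing the singletons first makes it easier to spot firms that are dropped for an unexpected reason, for example because of a spelling difference in the company name between the two downloads.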



    • #3
      Ali Atia's advice is correct. However, it also means that any firm with only one observation in the 2014-2019 period will be dropped as well. If that is not what is intended, the following will drop only firms whose only observation is from 2013:

      Code:
      assert year >= 2013
      by firm_id (year), sort: drop if year[_N] == 2013
      This drops any firm whose only observation is in 2013, but keeps firms whose single observation falls between 2014 and 2019.
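The difference between the two approaches can be seen on a small example. This is only a sketch: the firm_id values below are made up, and it assumes at most one observation per firm and year.

```stata
* Toy data: hypothetical firm_id values, not from the original dataset
clear
input int firm_id int year
1 2013
1 2014
1 2015
2 2013
3 2016
end

* Approach from #2: drops every singleton, so firms 2 and 3 both go
preserve
bysort firm_id: drop if _N == 1
tabulate firm_id
restore

* Approach from #3: within each firm, sort by year and look at the last
* (largest) year; a firm is dropped only if that last year is 2013
assert year >= 2013
by firm_id (year), sort: drop if year[_N] == 2013
tabulate firm_id
```

The assert guards the logic: if any year before 2013 were present, a firm whose last year is 2013 could have several observations, and the condition would no longer mean "only a 2013 observation".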



      • #4
        Dear Sirs, I think I was able to solve the problem with a small modification to Ali's code. After running:

        bys country company: drop if _N == 1

        I realized that companies with the same name but different BvD numbers were being grouped together.

        So I did the following: bys country companyNAME NumberBvDID: drop if _N == 1, followed by sort country companyNAME Year to restore the order the dataset had after the download.

        Is my process right? Thanks a lot!

        FOR CLYDE: Thanks for the explanation. As my search strategy, I asked Orbis to download only firms that had data for every year in the 2014-2019 interval. Thus, no firm in that interval can have just one observation.
        Last edited by Giovanni Coppola; 13 Jan 2021, 04:52.



        • #5
          "I realized that there were Companies with the same name but with a different BVD number that were grouped together.

          Then, I did the following: bys country companyNAME NumberBvDID: drop if _N == 1 and then sort country companyNAME Year to order the dataset as it was after the download.

          Is my process right? Thanks a lot!"

          It is unclear from what you write whether you want companies with the same name but a different BVD number to be treated as a single company or as different companies. Your code here treats them as different companies. If that's what you intended, then your code is correct.
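To illustrate how the choice of grouping variables changes the result, here is a small sketch. The company name, the BvD ID values and the helper variables n_by_name and n_by_id are hypothetical; only the variable names country, companyNAME, NumberBvDID and Year come from the posts above.

```stata
* Toy data: two distinct firms sharing a name (hypothetical values)
clear
input str2 country str10 companyNAME str8 NumberBvDID int Year
"IT" "Rossi" "IT001" 2013
"IT" "Rossi" "IT001" 2014
"IT" "Rossi" "IT002" 2013
end

* Grouped by name only: all three rows form one group of size 3
bysort country companyNAME: generate n_by_name = _N

* Grouped by name and BvD ID: IT002 is a group of size 1
bysort country companyNAME NumberBvDID: generate n_by_id = _N

list, sepby(NumberBvDID)

* Dropping on the ID-based count removes the unmatched 2013-only firm
drop if n_by_id == 1
drop n_by_name n_by_id

* Restore the original ordering
sort country companyNAME Year
```

With grouping by name only, the 2013-only firm IT002 would survive because it is counted together with its namesake, which is the problem described in #4.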



          • #6
            Hi Prof. Clyde, yes, I wanted them to be treated as different companies: although they share the same name, they are different companies with different BvD numbers and different activities. So I guess my process is right.

            Thanks a lot for the help!
            Last edited by Giovanni Coppola; 14 Jan 2021, 07:37.
