  • Drop Specific Data

    I am working with CRSP/WRDS data. I have company names in one column, dates/months (1990-2010) in another column, and returns in a third column. The data is arranged in WRDS format. I want to drop a company if the number of observations for that company is, let's say, less than 20. I would appreciate it if someone could help me with the code.

  • #2
    Hi Zain,

    See if the code below is able to help you. Until the ******, I just created a mock dataset.

    The bysort command creates a variable (called N) that holds the number of observations for each company. This count ignores year; to count per company and year, add year after company in the bysort command. I then keep only the companies with enough observations.

    Code:
    clear
    input str1 company float n
    "a" 22
    "b" 25
    "c" 18
    "d" 15
    "e" 23
    end
    
    expand n
    drop n
    
    *******
    
    bysort company: gen N = _N
    keep if N >= 20   // drops companies with fewer than 20 observations
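
    If the count should be per company and year rather than per company, a sketch could look like this. The variable year is an assumption (the mock dataset above has none), and N_year is just an illustrative name.

    Code:
    clear
    set obs 6
    gen str1 company = cond(_n <= 4, "a", "b")
    gen year = cond(inlist(_n, 1, 2, 3, 5), 2000, 2001)   // hypothetical year variable
    
    * number of observations within each company-year cell
    bysort company year: gen N_year = _N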


    • #3
      Welcome to the Stata Forum/Statalist,

      Please read the FAQ. There you'll find how to share data/command/output.

      That being said, you may try something like:

      Code:
      * count(1) counts every observation within the company
      by company, sort : egen float todrop = count(1)
      drop if todrop < 20
      Hopefully that helps.
      Best regards,

      Marcos


      • #4
        Zain:
        welcome to this forum.
        Maybe the second to last step in Igor's helpful code can be skipped:
        Code:
        . set obs 20
        number of observations (_N) was 0, now 20
        
        . g id=1 in 1/10
        (10 missing values generated)
        
        . replace id=2 if id==.
        (10 real changes made)
        
        . g A=runiform()*1000
        
        . drop in 20
        (1 observation deleted)
        
        . bysort id: drop if _N<10
        (9 observations deleted)
        
        . list
        
             +---------------+
             | id          A |
             |---------------|
          1. |  1   329.5547 |
          2. |  1   414.4089 |
          3. |  1   36.08474 |
          4. |  1   84.38109 |
          5. |  1   9.876246 |
             |---------------|
          6. |  1   320.0437 |
          7. |  1   5.196966 |
          8. |  1   227.5435 |
          9. |  1    851.468 |
         10. |  1   982.0067 |
             +---------------+
        
        .
        PS: I've joined the party just after Marcos!
        Kind regards,
        Carlo
        (Stata 19.0)


        • #5
          Code:
          bysort id: drop if _N<10
          Ahhh, I hadn't considered this. Such a simple step that really helps make code cleaner and simpler in general. Thanks!


          • #6
            Thank you all for the help. I have another related question and would appreciate some additional help. I am working with the same data, a sample of which is posted below:
            datadate conm
            31may2010 AAR CORP
            31may2011 AAR CORP
            31may2012 AAR CORP
            31may2013 AAR CORP
            31may2014 AAR CORP
            31may2015 AAR CORP
            31may2016 AAR CORP
            31may2017 AAR CORP
            31may2018 AAR CORP
            30nov2010 ASA GOLD AND PRECIOUS METALS
            30nov2011 ASA GOLD AND PRECIOUS METALS
            30nov2012 ASA GOLD AND PRECIOUS METALS
            30nov2013 ASA GOLD AND PRECIOUS METALS
            30nov2014 ASA GOLD AND PRECIOUS METALS
            30nov2015 ASA GOLD AND PRECIOUS METALS
            30nov2016 ASA GOLD AND PRECIOUS METALS
            30nov2017 ASA GOLD AND PRECIOUS METALS
            30nov2018 ASA GOLD AND PRECIOUS METALS
            .
            .
            .
            I want to drop a company if its return (in another column, not shown here) is missing in the last period (e.g., 30nov2018 here) and the returns in 3 of the previous 5 months/years are also missing. I would appreciate it if someone could help with the code for doing so.


            • #7
              Please read the FAQ. There you'll find how to share data/command/output in the forum.

              That being said, you may start with the count() function of - egen -. Then, you may drop the observations by using two conditions in the if qualifier.
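
              A minimal sketch of one way to do this, not a definitive solution. It assumes the return variable is called ret (the thread does not name it), reads "3 of the previous 5" as "at least 3", and treats the previous 5 observations as the previous 5 periods (no gaps in the panel). Everything above the asterisks only builds a mock dataset, with an integer standing in for datadate.

              Code:
              clear
              set obs 14
              gen str1 conm = cond(_n <= 7, "a", "b")
              bysort conm: gen datadate = _n            // stand-in for a real date
              gen ret = 0.01 * datadate                 // hypothetical return variable
              replace ret = . if conm == "b" & inlist(datadate, 3, 4, 6, 7)
              
              *******
              
              * flag the five observations before each company's last one
              bysort conm (datadate): gen byte prev5 = inrange(_N - _n, 1, 5)
              * is the last return of the company missing?
              by conm: gen byte lastmiss = missing(ret[_N])
              * number of missing returns among those five observations
              by conm: egen nmiss5 = total(prev5 & missing(ret))
              drop if lastmiss & nmiss5 >= 3
              drop prev5 lastmiss nmiss5

              In the mock data, company b is dropped (its last return is missing and 3 of the previous 5 are missing), while company a is kept.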
              Best regards,

              Marcos
