Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • S&P 500 - how to remove observations for companies that were not in the sample the whole time?

    Dear fellow Stata users,

    I have downloaded data from IBES for S&P 500 firms from 2000 to 2014. Of course not all firms were part of S&P 500 for the entire 15 years. Now I need to eliminate the observations of those companies that were not part of the S&P 500 index for a given time frame. How can I do this?

    Thanks in advance.

    Best,

    Viktoria

  • #2
    Assuming the variable company is your company identifier, something like the following should do.
    Code:
    bysort company: generate nyears = _N
    drop if nyears!=15

    Comment


    • #3
      Thank you so much, William. But will it then drop all observations for that company? Because I still want to include those observations for the time when the company was part of S&P 500?

      Comment


      • #4
        The way the question is written makes it seem you have data only for S&P companies, so your follow up its a bit confusing.
        Most often, a data example helps clear these things up most quickly.

        Get dataex and use it to post and example.
        Code:
        ssc install dataex

        Comment


        • #5
          I installed it but I don't know how to post the example. Sorry, I really don't know anything about stata... but I'm trying.

          Comment


          • #6
            I'm afraid I was mislead by the subject of your original post, which asks how to "remove observations for companies that were not in the sample the whole time" which is exactly what the code I suggested does.

            Please review the Statalist FAQ linked to from the top of the page, as well as from the Advice on Posting link on the page you used to create your post, looking especially at sections 9-12 on how to best pose your question. It would be particularly helpful to post a small hand-made example, perhaps with just a few variables and observations, showing the data before the process and how you expect it to look after the process.

            Added in response to post #5 after this post crossed #5 in cyberspace:

            When I began using Stata in a serious way, I started by reading my way through the Getting Started with Stata manual relevant to my setup. Chapter 18 then gives suggested further reading, much of which is in the Stata User's Guide, and I worked my way through much of that reading as well. All of these manuals are included as PDFs in the Stata installation (since version 11) and are accessible from within Stata - for example, through Stata's Help menu. The objective in doing this was not so much to master Stata as to be sure I'd become familiar with a wide variety of important basic techniques, so that when the time came that I needed them, I might recall their existence, if not the full syntax.
            Last edited by William Lisowski; 12 May 2016, 07:38.

            Comment


            • #7
              William gives good advice. Specifically on dataex, type:
              Code:
              help dataex
              Or, to extract an example, giving the first 20 lines of observations, do:
              Code:
              dataex in 1/20
              And when posting, put it in code blocks, as the FAQ no. 12.3 and the dataex help file explain.

              Comment


              • #8
                ----------------------- copy starting from the next line -----------------------
                Code:
                * Example generated by -dataex-. To install: ssc install dataex
                clear
                input str8 cusip str6(ticker oftic) str16 cname long statpers double(stdev numrec sellpct)
                "00846U10" "AT1" "A" "AGILENT TECH" 14594 .52  8 0
                "00846U10" "AT1" "A" "AGILENT TECH" 14629 .53  9 0
                "00846U10" "AT1" "A" "AGILENT TECH" 14657  .5  9 0
                "00846U10" "AT1" "A" "AGILENT TECH" 14685 .52 11 0
                "00846U10" "AT1" "A" "AGILENT TECH" 14720 .52 11 0
                "00846U10" "AT1" "A" "AGILENT TECH" 14748 .52 11 0
                "00846U10" "AT1" "A" "AGILENT TECH" 14776 .52 11 0
                "00846U10" "AT1" "A" "AGILENT TECH" 14811 .52 11 0
                "00846U10" "AT1" "A" "AGILENT TECH" 14839  .7 11 0
                "00846U10" "AT1" "A" "AGILENT TECH" 14867 .79 11 0
                "00846U10" "AT1" "A" "AGILENT TECH" 14902 .72 12 0
                "00846U10" "AT1" "A" "AGILENT TECH" 14930 .73 13 0
                "00846U10" "AT1" "A" "AGILENT TECH" 14958  .6 13 0
                "00846U10" "AT1" "A" "AGILENT TECH" 14993 .58 14 0
                "00846U10" "AT1" "A" "AGILENT TECH" 15021  .6 13 0
                "00846U10" "AT1" "A" "AGILENT TECH" 15049  .8 15 0
                "00846U10" "AT1" "A" "AGILENT TECH" 15084 .77 15 0
                "00846U10" "AT1" "A" "AGILENT TECH" 15112 .73 14 0
                "00846U10" "AT1" "A" "AGILENT TECH" 15140 .83 14 0
                "00846U10" "AT1" "A" "AGILENT TECH" 15175 .86 15 0
                end
                format %d statpers
                ------------------ copy up to and including the previous line ------------------

                Comment


                • #9
                  I don't know if I did this the right way...

                  Let me ask my question again more clearly:

                  I have downloaded data from the IBES database in the time of 2000 to 2014. Let's say there is a firm X and firm X was part of S&P 500 for 3/15 years. As I am only looking at companies from S&P 500, I only want to use the values of these three years of company X in my analysis and consequently I would like to drop all of the other values out of my dataset.

                  Thanks a lot for your insightful help. I will consult the getting started with Stata manual.

                  Comment


                  • #10
                    The data sample comes through okay.
                    However, the question now is: how do you identify which years a company was a S&P 500 company? Can we know that from one of the variables in this example?

                    Comment


                    • #11
                      I don't know that. My supervisor send me her old do-file and she used this for it:

                      format from %td
                      gen yfrom = year(from)
                      format yfrom %ty

                      gen insamplefrom = .
                      replace insamplefrom = 1 if yfrom < cyear
                      replace insamplefrom = 1 if yfrom == cyear
                      replace insamplefrom = 0 if yfrom > cyear

                      But I don't really know what to do with it?

                      Comment


                      • #12
                        That's a good start, thank you!

                        Remember, many of us are not familiar with your data source, but your problem only requires that we understand which variables identify the firm (is this cusip?), the date (which is statpers), and the variable that indicates whether the firm is or is not part of the S&P 500 on that date (which variable is that, and what values does it take?)

                        Also, from your sample data it is appears that you have monthly observations (1 day per month, to be precise) rather than annual. So suppose a firm became part of the S&P in July 2001 and leaves the S&P 500 after June 2004. You state your question as if you are dealing with whole years, so do you want to use just the values between July 2001 and June 2004, or do you want to use all the values in 2001, 2002, 2003, and 2004?

                        Added in edit: this post crossed #10 and #11 in cyberspace. The assumption here is that you understand your data. If you cannot identify which observations you want to keep when you look at the data, as if you were doing it by hand, we cannot assist you in implementing that using Stata.
                        Last edited by William Lisowski; 12 May 2016, 09:30.

                        Comment


                        • #13
                          Well, that's make it very hard to answer your question. There's many people on here that can help write a clearly defined problem into Stata syntax, but it is not often possible to guess the interpretation of variable names etc. Your supervisors example has variable names in them that dont appear in the data example, so you're missing something in your data. Your supervisor is probably the only one that knows how she created those.

                          Comment


                          • #14
                            Thanks for you help so far.

                            The identifier is cusip. I have no variable that indicates until what time a company was part of S&P 500. Apparently, it works with these codes only.

                            And ideed, I only want the values from July 2001 and June 2004.

                            Comment


                            • #15
                              As Jorrit says in #13, your supervisor's code will not work with the data that you have. That code require the existence of two variables that you do not have: "from" & "cyear".
                              Stata/MP 14.1 (64-bit x86-64)
                              Revision 19 May 2016
                              Win 8.1

                              Comment

                              Working...
                              X