Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Panel data with more than one observation per year

    Hi everyone. This is my first post here and English is not my first language, so I apologize for any mistyping/misspelling.

    I have a dataset of around 8.000 firm-year observations for a 2011-2021 period with companies’ identifiers (ISIN) and the name of their CEO. This means there is more than one identifier value for each year when a CEO changes, although I have not been able to “eye-identify” the exact observations where this happens.

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input long iden int Year long iden2 str50 FullName
    177 2011  1 "A. Jayson Adair"                  
    177 2012  1 "A. Jayson Adair"                  
    177 2013  1 "A. Jayson Adair"                  
    177 2014  1 "A. Jayson Adair"                  
    177 2015  1 "A. Jayson Adair"                  
    177 2016  1 "A. Jayson Adair"                  
    177 2017  1 "A. Jayson Adair"                  
    177 2018  1 "A. Jayson Adair"                  
    177 2019  1 "A. Jayson Adair"                  
    177 2020  1 "A. Jayson Adair"                  
    177 2021  1 "A. Jayson Adair"                  
    615 2011  2 "A. Lorne Weil"                    
    615 2012  2 "A. Lorne Weil"                    
    615 2013  2 "A. Lorne Weil"                    
    700 2019  3 "A. Patrick Beharelle"            
    700 2020  3 "A. Patrick Beharelle"            
    700 2021  3 "A. Patrick Beharelle"            
    271 2019  4 "A. Ryals McMullian, Jr."          
    271 2020  4 "A. Ryals McMullian, Jr."          
    271 2021  4 "A. Ryals McMullian, Jr."          
    448 2021  5 "Aaron H. Ravenscroft"            
    294 2011  6 "Aaron P. Jagdfeld"                
    294 2012  6 "Aaron P. Jagdfeld"                
    294 2013  6 "Aaron P. Jagdfeld"                
    294 2014  6 "Aaron P. Jagdfeld"                
    294 2015  6 "Aaron P. Jagdfeld"                
    294 2016  6 "Aaron P. Jagdfeld"                
    294 2017  6 "Aaron P. Jagdfeld"                
    294 2018  6 "Aaron P. Jagdfeld"                
    294 2019  6 "Aaron P. Jagdfeld"                
    294 2020  6 "Aaron P. Jagdfeld"                
    294 2021  6 "Aaron P. Jagdfeld"                
    617 2018  7 "Adam P. Symson"                  
    617 2019  7 "Adam P. Symson"                  
    617 2020  7 "Adam P. Symson"                  
    617 2021  7 "Adam P. Symson"                  
    116 2011  8 "Adolphus B. Baker"                
    116 2012  8 "Adolphus B. Baker"                
    116 2013  8 "Adolphus B. Baker"                
    116 2014  8 "Adolphus B. Baker"                
    116 2015  8 "Adolphus B. Baker"                
    116 2016  8 "Adolphus B. Baker"                
    116 2017  8 "Adolphus B. Baker"                
    116 2018  8 "Adolphus B. Baker"                
    116 2019  8 "Adolphus B. Baker"                
    116 2020  8 "Adolphus B. Baker"                
    116 2021  8 "Adolphus B. Baker"                
     14 2021  9 "Ahmad Mohammad Ahmad Abu-Ghazaleh"
    635 2012 10 "Ajita G. Rajendra"                
    635 2013 10 "Ajita G. Rajendra"                
    635 2014 10 "Ajita G. Rajendra"                
    635 2015 10 "Ajita G. Rajendra"                
    635 2016 10 "Ajita G. Rajendra"                
    635 2017 10 "Ajita G. Rajendra"                
    468 2011 11 "Alan D. Wilson"                  
    468 2012 11 "Alan D. Wilson"                  
    468 2013 11 "Alan D. Wilson"                  
    468 2014 11 "Alan D. Wilson"                  
    468 2015 11 "Alan D. Wilson"                  
    569 2014 12 "Alan George Lafley"              
    569 2015 12 "Alan George Lafley"              
    240 2014 13 "Alan R. Hoskins"                  
    240 2015 13 "Alan R. Hoskins"                  
    240 2016 13 "Alan R. Hoskins"                  
    240 2017 13 "Alan R. Hoskins"                  
    240 2018 13 "Alan R. Hoskins"                  
    240 2019 13 "Alan R. Hoskins"                  
    240 2020 13 "Alan R. Hoskins"                  
    274 2011 14 "Alan R. Mulally"                  
    274 2012 14 "Alan R. Mulally"                  
    274 2013 14 "Alan R. Mulally"                  
    755 2011 15 "Alan S. Armstrong"                
    755 2012 15 "Alan S. Armstrong"                
    755 2013 15 "Alan S. Armstrong"                
    755 2014 15 "Alan S. Armstrong"                
    755 2015 15 "Alan S. Armstrong"                
    755 2016 15 "Alan S. Armstrong"                
    755 2017 15 "Alan S. Armstrong"                
    755 2018 15 "Alan S. Armstrong"                
    755 2019 15 "Alan S. Armstrong"                
    755 2020 15 "Alan S. Armstrong"                
    755 2021 15 "Alan S. Armstrong"                
    156 2011 16 "Alan S. McKim"                    
    156 2012 16 "Alan S. McKim"                    
    156 2013 16 "Alan S. McKim"                    
    156 2014 16 "Alan S. McKim"                    
    156 2015 16 "Alan S. McKim"                    
    156 2016 16 "Alan S. McKim"                    
    156 2017 16 "Alan S. McKim"                    
    156 2018 16 "Alan S. McKim"                    
    156 2019 16 "Alan S. McKim"                    
    156 2020 16 "Alan S. McKim"                    
    156 2021 16 "Alan S. McKim"                    
    152 2011 17 "Alan W. Stock"                    
    744 2011 18 "Albert H. Nahmad"                
    744 2012 18 "Albert H. Nahmad"                
    744 2013 18 "Albert H. Nahmad"                
    744 2014 18 "Albert H. Nahmad"                
    744 2015 18 "Albert H. Nahmad"                
    744 2016 18 "Albert H. Nahmad"                
    end
    label values iden iden
    label def iden 14 "KYG367381053", modify
    label def iden 116 "US1280302027", modify
    label def iden 152 "US17243V1026", modify
    label def iden 156 "US1844961078", modify
    label def iden 177 "US2172041061", modify
    label def iden 240 "US29272W1099", modify
    label def iden 271 "US3434981011", modify
    label def iden 274 "US3453708600", modify
    label def iden 294 "US3687361044", modify
    label def iden 448 "US5635714059", modify
    label def iden 468 "US5797802064", modify
    label def iden 569 "US7427181091", modify
    label def iden 615 "US80874P1093", modify
    label def iden 617 "US8110544025", modify
    label def iden 635 "US8318652091", modify
    label def iden 700 "US89785X1019", modify
    label def iden 744 "US9426222009", modify
    label def iden 755 "US9694571004", modify
    label values iden2 iden2
    label def iden2 1 "A. Jayson Adair", modify
    label def iden2 2 "A. Lorne Weil", modify
    label def iden2 3 "A. Patrick Beharelle", modify
    label def iden2 4 "A. Ryals McMullian, Jr.", modify
    label def iden2 5 "Aaron H. Ravenscroft", modify
    label def iden2 6 "Aaron P. Jagdfeld", modify
    label def iden2 7 "Adam P. Symson", modify
    label def iden2 8 "Adolphus B. Baker", modify
    label def iden2 9 "Ahmad Mohammad Ahmad Abu-Ghazaleh", modify
    label def iden2 10 "Ajita G. Rajendra", modify
    label def iden2 11 "Alan D. Wilson", modify
    label def iden2 12 "Alan George Lafley", modify
    label def iden2 13 "Alan R. Hoskins", modify
    label def iden2 14 "Alan R. Mulally", modify
    label def iden2 15 "Alan S. Armstrong", modify
    label def iden2 16 "Alan S. McKim", modify
    label def iden2 17 "Alan W. Stock", modify
    label def iden2 18 "Albert H. Nahmad", modify
    When I ran xtset trying to build the panel, Stata issued the following error: “repeated time values within panel r(451)”. I wonder how can I code the variable “FullName” or solve this error so I get to build the panel data.

    Kind regards and thank you in advance.


    Added: I'm using Stata 17.0
    Kind regards,
    David.
    (Stata 17.0 SE)

  • #2
    David:
    welcome to this forum.
    You can simply -xtset- yur dataset with -panelid- only.
    This approach comes at the cost of making time-series related operators, such as lags and leads, unavailable, while you still can plug in -i.timevar- in the right-hand side of your regression equation and obtain the related coefficients.
    Kind regards,
    Carlo
    (Stata 19.0)

    Comment


    • #3
      Dear Carlo,

      Thank you for your response! The fact is I may need lagged variables in some of the models, so another solution would suit me better. I don’t know whether is possible to encode the variable “FullName” in some way that transforms it in multiple variables when repeated. That would solve my issue I think, but again, I have no idea if that can be done (and how). Of course I'm open to other solutions as well.

      Thank you again for your response.

      Kind regards,
      David.
      (Stata 17.0 SE)
      Last edited by David Cabreros; 16 Mar 2023, 13:06.
      Kind regards,
      David.
      (Stata 17.0 SE)

      Comment


      • #4
        David:
        a more sophisticated option, if feasible, rests on including the month in your -timevar-.
        Kind regards,
        Carlo
        (Stata 19.0)

        Comment


        • #5
          The fact is I may need lagged variables in some of the models,
          Well, then you are in trouble with this data. It is not just some peculiarity of Stata that it requires the panel variable and the time variable to uniquely identify observations in order to -xtset- the data in a way that supports lag operators. It's because, mathematically, if they do not uniquely identify observations, then lags are inherently undefinable.

          Suppose we have two observations with the the same iden in year 2013. Now consider an observation with that same iden in 2014. Which of that iden's two observations in 2013 is "the lag" for that 2014 observation. You see the problem?

          One possible resolution to this problem for you was suggested by Carlo in #4: fine-grain the time to months so that multiple observations of the same iden in the same year now become observations of that iden in different months (and then the lag means a 1-month lag, or for a 1-year lag you can then use L12).

          Another possible resolution is to make the combination of firm and CEO as the panel variable. You can do that using -egen, group()-. Now, this will not enable you to take the lag of anything within firm if the CEO has changed from the preceding year, so this might not be workable for you.

          A third possibility is that you don't actually want your lags to reflect a specific unit of time. Perhaps if there are two observations of the same firm in the same year, but different CEO's, the one with the predecessor CEO is the lag of the one with the successor. In that case, once you have the data sorted in correct chronological, just create a sequential variable that counts up from 1 within firms, and use that as the time variable when you -xtset-.

          But if you can't do any of the above, then you need an entirely new plan, because lags are not definable for the kind of data you have. Not in Stata, not in any software.

          Added: You said you haven't been able to visually locate these duplicate observations. Here's how to find them:
          Code:
          duplicates tag iden Year, gen(flag)
          browse if flag
          Crossed with #6.

          Comment


          • #6
            Dear Carlo,

            Thank you for answering again. I'm afraid I don't have months in my data, as it is all yearly data.
            Kind regards,
            David.
            (Stata 17.0 SE)

            Comment


            • #7
              Originally posted by Clyde Schechter View Post

              Added: You said you haven't been able to visually locate these duplicate observations. Here's how to find them:
              Code:
              duplicates tag iden Year, gen(flag)
              browse if flag
              Crossed with #6.
              Thank you, Clyde!
              I've run that code and it seems there are only 20 duplicated observations, so I'm able to evaluate each one and decide what to do with them. I'll also consider Carlo's solution about building the panel with only "iden" if I end up not using lagged variables, but I'm not sure it will be the case.
              Your answers have been (quick and) really useful for me, I appreciate it. Thank you very much to both Carlo Lazzaro and Clyde Schechter.
              Last edited by David Cabreros; 16 Mar 2023, 14:07. Reason: Typo
              Kind regards,
              David.
              (Stata 17.0 SE)

              Comment

              Working...
              X