Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Checking if companies in a panel are missing any years

    Hi all,

    Does anyone have a good way to check a if companies in panel data are missing any years? The problem I'm trying to solve: I am calculating annual revenue growth and if a year is missing (ex. 2015) I want to be sure I'm not calculating the revenue growth from 2014 to 2016 as though it's annual revenue. My problem is that I have companies entering and exiting the data at different times, so I can't just see if each company is reporting for the full 35 years of my period. What I'm hoping to do: Find a way to confirm that if an ID enters in 1992 and exits in 1999, then every year between those two years is present. Any ideas?

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float ID int FiscalYear float adjustedRevenue
     1 1989          .
     1 1990          .
     1 1991          .
     1 1992          .
     1 1993          .
     1 1994          .
     1 1995          .
     1 1996          .
     1 1997          .
     2 1999  10.790997
     2 2000  17.897474
     2 2001   10.27825
     2 2002  1.2548368
     2 2003   .4863041
     2 2004   .7213615
     2 2005    .880989
     2 2006    .670846
     2 2007   .6115783
     2 2008   .4915955
     2 2009   .4933508
     2 2010   .5064929
     2 2011   .5410033
     2 2012  1.0400256
     2 2013   1.380583
     2 2014  .28833982
     2 2015  .09384204
     2 2016  .06923842
     2 2017   .5256655
     2 2018   .2066783
     2 2019       .283
     3 2006 .013949539
     3 2007  .57088864
     3 2008   19.61751
     3 2009   5.395875
     3 2010   79.07621
     4 1980          .
     4 1981          .
     4 1982   13.23058
     4 1983   23.79458
     4 1984   37.98192
     4 1985   25.28771
     4 1986  16.759996
     5 1996   5.911559
     5 1997   33.63363
     5 1998   93.91238
     5 1999  151.19211
     5 2000   215.2314
     5 2001    244.016
     5 2002   239.5701
     5 2003   260.2463
     5 2004   286.4847
     5 2005   311.4879
     5 2006    315.356
     6 1995   3.269524
     6 1996   5.430878
     6 1997  15.321898
     6 1998   9.992581
     6 1999   8.463076
     6 2000   7.424743
     6 2001   9.947671
     7 1998    345.987
     7 1999   454.0336
     7 2000  571.96204
     7 2001   638.4048
     7 2002   706.5811
     7 2003   785.8924
     7 2004    817.423
     7 2005   877.9507
     7 2006   991.3569
     7 2007  1125.2523
     7 2008  1091.7125
     7 2009   850.7917
     7 2010   782.8481
     7 2011   783.9854
     7 2012   797.5649
     7 2013    807.166
     7 2014   816.7955
     7 2015  1209.7059
     7 2016  1249.5127
     7 2017  1244.9355
     7 2018  1172.7936
     7 2019   1248.623
     8 2017   74.14074
     8 2018  148.96416
     8 2019    245.893
     9 2016  134.06581
     9 2017   153.8093
     9 2018  264.37518
     9 2019    567.579
    10 1975   79.97598
    10 1976  75.070595
    10 1977   65.71144
    11 2018   45.99356
    11 2019     58.343
    12 2003          .
    12 2004  288.90866
    12 2005   367.4065
    12 2006   425.3937
    12 2007   468.2618
    13 1995  1.8620687
    end
    label values ID ID
    label def ID 1 "007 PRECIOUS METALS INC", modify
    label def ID 2 "01 COMMUNIQUE LABORATORY INC", modify
    label def ID 3 "0373849 B C LTD", modify
    label def ID 4 "1 POTATO 2 INC", modify
    label def ID 5 "1-800 CONTACTS INC", modify
    label def ID 6 "1-800-ATTORNEY INC", modify
    label def ID 7 "1-800-FLOWERS.COM", modify
    label def ID 8 "10X GENOMICS INC", modify
    label def ID 9 "111 INC -ADR", modify
    label def ID 10 "1225 MAPLE CORP", modify
    label def ID 11 "17 EDUCATION & TE -ADR", modify
    label def ID 12 "180 CONNECT INC", modify
    label def ID 13 "180 DEGREE CPTL CORP", modify

  • #2
    Code:
    isid ID FiscalYear, sort
    
    
    by ID (FiscalYear): gen byte problem = ///
        FiscalYear[_N] - FiscalYear[1] != _N-1
        
    browse if problem
    will show you any ID's for which there is a gap or anything other than consecutive values of FiscalYear from the first to the last.

    This approach is good while you are working on data management. In production mode, after you believe you have cleaned up any such problems, change the -browse- command to -assert !problem- so it'll just go on to what ever is next unless you missed something.

    Comment


    • #3
      I want to be sure I'm not calculating the revenue growth from 2014 to 2016 as though it's annual revenue
      This is absolutely the right attitude and it is precisely what using tsset or xtset will ensure when you use lag operators. If there is a gap, so that an observation for a previous year for the same panel that you need for a calculation is absent from the data (a term I recommend given other meanings of missing) calculations will do what you want, namely return missing as a result.

      Other way round, using _n - 1 is inevitably a source of error if there are gaps in the data, even if you have the correct sort order.

      Comment


      • #4
        Nick and I have offered two perspectives on this problem. I suspect that each of us agrees entirely with the other--the difference arises from when and in what context this question arises.

        My response in #2 is oriented to the phase of a project in which we are preparing the data for analysis. I infer from the language used by O.P. that there is an expectation that the data set should have no gaps, because the data should be available. In that context, one doesn't want to be running the analysis and then getting surprised by missing results. One would rather know ahead of time if the data has gaps, with a view to going back and filling them in if any are found. Nick's response in #3 is oriented to the phase of a project in which we are running the analysis on what we believe is a cleaned-up data set. Even though precautions have been taken to check the data for gaps, the safe way to calculate revenue growth is, indeed, with the lag/lead operators, not -[_n-1]-. Even if this seems like safety "overkill," it makes the code more transparent: If somebody wants to review it, the use of lag/lead will assure the reviewer that the growth rates are being correctly calculated even if the reviewer has not seen the data management that created the data set.

        So I see #2 and #3 complementary, not exclusive. Best practice is to do both.
        Last edited by Clyde Schechter; 03 Oct 2023, 12:33.

        Comment


        • #5
          Yes to #4. I will just add that plotting numeric identifier against fiscal year is something that you can do, so long as you don't have too many firms. The eye is very good at spotting holes. This should never be your main or only check, however.

          Comment


          • #6
            Note that #5 is dangerous if there is a possibility of more than one observation for each identifier and time!

            Comment


            • #7
              Thank you both so much! It turns out there were an alarming number of firms that had absent data for a year or so in my panel. So, I'm very glad to have a method to go through and see who they are (from Clyde's #2) and be able to still create annual growth rate values that take into account the absences (from Nick's #3).

              As q quick follow up, does anyone have any advice for creating a growth rate when you want to divide by the absolute value of the previous entry for a situation like the above? For instance, if I were calculating the annual growth rate like so: (Revenue 2015 - Revenue 2014)/ abs(Revenue 2014). Is there a way to do that while taking into account these absences?

              Comment


              • #8
                #7 Year to year changes might be fudged using some kind of interpolation, but that sounds like an upward struggle in terms of both it working well and it being easy to defend or even to explain to whoever is evaluating your work.

                Comment


                • #9
                  It's because I'm also running the same annual growth rate code on operating income data, but the operating income data is often negative, so I think the way such calculations are usually done is: (Current Period Operating Income - Previous period operating income)/ abs(previous period operating income), otherwise the direction of the growth will be the opposite of what it should be (ex. 2015 OI: -300, 2016 OI: 30, without using the absolute value: (30- (-300)/-300 = -1.1 despite the company's operating income growing).

                  Comment

                  Working...
                  X