  • Panel data

    I hope all is well and that everyone is staying safe at home.

    I am a complete beginner with Stata. I am currently cleaning administrative firm-level data and would like to build a panel data set covering a certain period of years.

    Each firm reported its information in several fiscal years. Each firm has an individual “ID”, and “FY” is the fiscal year in which it declared the information.

    For instance, some firms reported their income every year from 2006 to 2018, some did not report every year, and others reported in only a few years.
    This means that different firms are observed over different sets of years.

    Hence, I would like to know which period of years has the largest number of observations, so that I can use that period for further analysis. I need to include FY 2018 (the latest year), so I would like to know the number of observations for each combination of years in the panel.
    In total, there are currently around 100,000 observations. Below is an extract.

    Would you mind sharing how I can figure this out? I sincerely appreciate your support.


    ID Sector FY
    1733 Servicios 2006
    1733 Servicios 2007
    1733 Servicios 2008
    1733 Servicios 2009
    1733 Servicios 2010
    1733 Servicios 2011
    1808 Agropecuaria 2006
    1808 Agropecuaria 2007
    1808 Agropecuaria 2008
    1808 Agropecuaria 2009
    1808 Agropecuaria 2010
    1808 Agropecuaria 2011
    1808 Agropecuaria 2012
    1808 Agropecuaria 2013
    1808 Agropecuaria 2014
    1808 Agropecuaria 2015
    1808 Agropecuaria 2016
    1808 Agropecuaria 2017
    1808 Agropecuaria 2018

    2541 Servicios 2014
    2541 Servicios 2015
    2541 Servicios 2016
    2541 Servicios 2017
    2541 Servicios 2018

    3301 Servicios 2018

    3480 Industrias 2007
    3480 Industrias 2008
    3480 Industrias 2009
    3480 Industrias 2010
    3480 Industrias 2011
    3480 Industrias 2012
    3480 Industrias 2013
    3480 Industrias 2014
    3480 Industrias 2015
    3480 Industrias 2016
    3480 Industrias 2017
    3480 Industrias 2018

    3594 Servicios 2014
    3594 Servicios 2015
    3594 Servicios 2016
    3594 Servicios 2017
    3594 Servicios 2018

    4854 Servicios 2006

    5401 Servicios 2015
    5401 Servicios 2016
    5401 Servicios 2017
    5401 Servicios 2018

    6586 Servicios 2006
    6586 Servicios 2007
    6586 Servicios 2008
    6586 Servicios 2009
    6586 Servicios 2010
    6586 Servicios 2011
    6586 Servicios 2012
    6586 Servicios 2013
    6586 Servicios 2014
    6586 Servicios 2015
    6586 Servicios 2016
    6586 Servicios 2017
    6586 Servicios 2018

    6885 Servicios 2012
    6885 Servicios 2013
    6885 Servicios 2014
    6885 Servicios 2015
    6885 Servicios 2016
    6885 Servicios 2017
    6885 Servicios 2018

    7385 Servicios 2006
    7385 Servicios 2007
    7385 Servicios 2008
    7385 Servicios 2010

    7496 Industrias 2016
    7496 Industrias 2017

    7839 Servicios 2010

    7951 Servicios 2006
    7951 Servicios 2007
    7951 Servicios 2008
    7951 Servicios 2009
    7951 Servicios 2010
    7951 Servicios 2011
    7951 Servicios 2012
    7951 Servicios 2013
    7951 Servicios 2014
    7951 Servicios 2015
    7951 Servicios 2016
    7951 Servicios 2017
    7951 Servicios 2018

    8063 Industrias 2006
    8063 Industrias 2007
    8063 Industrias 2008
    8063 Industrias 2009
    8063 Industrias 2010
    8063 Industrias 2011
    8063 Industrias 2012
    8063 Industrias 2013

    8333 Servicios 2006
    8333 Servicios 2007
    8333 Servicios 2008
    8333 Servicios 2009
    8333 Servicios 2010
    8333 Servicios 2011
    8333 Servicios 2012
    8333 Servicios 2013
    8333 Servicios 2014
    8333 Servicios 2015
    8333 Servicios 2016
    8333 Servicios 2017
    8333 Servicios 2018

    8923 Servicios 2011
    8923 Servicios 2012
    8923 Servicios 2013
    8923 Servicios 2014
    8923 Servicios 2015


  • #2
    What you have is an unbalanced panel, and most panel estimators handle unbalanced panels. Dropping observations just to obtain a balanced panel is therefore not a sound reason and may ultimately bias your analysis. You should make use of all observations whenever you can. In any case, here is a way of finding the most frequent time range.

    Code:
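    * For each firm, build a label "firstFY_lastFY" from its earliest and latest fiscal year
    bys ID (FY): gen range = string(FY[1]) + "_" + string(FY[_N])
    * Tabulate the labels, most frequent first (counts are observations, not firms)
    tab range, sort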
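
    Note that the first-to-last label above ignores gaps inside a firm's range: in the extract, firm 7385 reports 2006, 2007, 2008 and 2010, so it is labelled 2006_2010 even though 2009 is missing. As a possible complement, and only as a sketch, the code below first lists the actual reporting patterns and then finds, for each firm, the start of its unbroken run of years ending in 2018. It assumes ID and FY are numeric and that each ID-FY pair appears only once; negFY, inrun, run_start and firm are illustrative names that do not come from the original post.

    ```stata
    * Declare the panel structure; this stops with an error if an ID-FY pair is repeated
    xtset ID FY
    * List the most common reporting patterns (which years each firm is present in)
    xtdescribe, patterns(20)

    * Sort each firm's rows from latest to earliest year
    gen negFY = -FY
    * With unique years, row n lies in the unbroken run ending in 2018 only if FY == 2018 - (n - 1)
    bysort ID (negFY): gen byte inrun = (FY == 2018 - _n + 1)
    * Earliest year of that run; missing for firms that did not report in 2018
    egen run_start = min(cond(inrun, FY, .)), by(ID)
    * Count each firm once, by the start of its gap-free spell ending in 2018
    egen byte firm = tag(ID)
    tabulate run_start if firm
    ```

    A balanced panel covering years s to 2018 can use every firm whose run_start is s or earlier, so the number of usable observations is that number of firms times (2018 - s + 1). Firms that did not report in 2018 have a missing run_start and are left out of the table by default.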
