Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Limiting panel data regression to groups with at least 10 consecutive observation

    Dear all,

    The following is my sample data. Could you please help how I can run panel data regression by only including firms that has more than 10 consecutive observation in variable "assets"? Secondly, more generally, the observation across firms in my data range from 1 to 30. How can I run regression with qualifier such as: xtreg ... if "observations>10"? Thank you for your help.


    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input long firm float year double(assets roe)
    1 1990           .      .
    1 1991           .      .
    1 1992           .      .
    1 1993           .      .
    1 1994           .      .
    1 1995           .      .
    1 1996  1486808000      .
    1 1997  1912129000  12.39
    1 1998  2079410000  24.48
    1 1999  2137642000  16.97
    1 2000  2331655000   6.56
    1 2001  2418927000   8.05
    1 2002  2502544000  18.71
    1 2003  2748874000  19.89
    1 2004  3323099000  44.72
    1 2005  3130635000  33.72
    1 2006  3.4303e+09  29.32
    1 2007  5287194000  57.96
    1 2008  6458290000  57.09
    1 2009  7478906000  29.18
    1 2010  8733508000  30.02
    1 2011 10104882000  31.34
    1 2012 12310784000  28.08
    1 2013 14789299000  19.04
    1 2014 18354372000  23.49
    1 2015 21207642000   5.45
    1 2016 23204507000  14.12
    1 2017 23941380000  11.42
    1 2018 26018027000   7.76
    1 2019 26149093000   1.13
    1 2020 27058211000   4.47
    2 1990           .      .
    2 1991           .      .
    2 1992           .      .
    2 1993           .      .
    2 1994           .      .
    2 1995           .      .
    2 1996           .      .
    2 1997           .      .
    2 1998           .      .
    2 1999   250455000      .
    2 2000   219566000 245.85
    2 2001   207358000 -12.05
    2 2002   206917000   8.89
    2 2003   191785000   3.97
    2 2004   102719000 -275.4
    2 2005   210052000      .
    2 2006   233253000      .
    2 2007   178761000      .
    2 2008   185015000 -25.56
    2 2009   178287000  27.18
    2 2010   324493000  37.67
    2 2011   316048000  22.93
    2 2012   389094000   49.8
    2 2013   441064000  23.49
    2 2014   504865000  11.07
    2 2015   653224000  10.58
    2 2016   767479000   15.7
    2 2017   840236000   9.47
    2 2018   881275000   11.7
    2 2019   822375000  15.98
    2 2020   958791000  21.41
    3 1990           .      .
    3 1991           .      .
    3 1992           .      .
    3 1993           .      .
    3 1994           .      .
    3 1995           .      .
    3 1996           .      .
    3 1997           .      .
    3 1998           .      .
    3 1999           .      .
    3 2000           .      .
    3 2001           .      .
    3 2002           .      .
    3 2003           .      .
    3 2004           .      .
    3 2005           .      .
    3 2006           .      .
    3 2007           .      .
    3 2008  3355771000      .
    3 2009  4060574000  -7.86
    3 2010  4755801000  11.44
    3 2011  9727098000  20.04
    3 2012 11967997200      4
    3 2013 14471507176   1.33
    3 2014 13724788860 -41.47
    3 2015 16151390747 -21.29
    3 2016 14138487831   7.32
    3 2017 13834308696   3.12
    3 2018 12094994145  29.84
    3 2019 11711013834   2.84
    3 2020 11504997907 -16.49
    end
    label values firm firm
    label def firm 1 "ID:AAL", modify
    label def firm 2 "ID:AAP", modify
    label def firm 3 "ID:ABM", modify



    Best regards,

  • #2
    Code:
    bysort firm (year) : gen cons = 0 if assets==.
    by firm: replace cons = cons[_n-1] + 1 if cons==.
    by firm: egen max = max(cons)
    xtreg ... if max>=10
    Also see: https://www.stata.com/support/faqs/d...-observations/

    Comment


    • #3
      Hi Ali,

      Thank you very much for your help. Currently there are some groups/firms in my dataset that only have two observations (picture below). Is there any way to limit the regression to groups with minimum 10 observations? Thank you.



      Best regards,

      Comment


      • #4
        Your picture seems to have uploaded incorrectly because I can't see it. In general, showing pictures of data is less useful than the kind of data example you used in your first post (see the FAQ of this forum: https://www.statalist.org/forums/help, especially section 12.5)

        In any case, here is an easy way to limit the regression to groups with at least ten observations:

        Code:
        bys firm: gen obs = _N
        xtreg ... if obs>=10

        Comment


        • #5
          Hi Ali and all,

          When I run
          Code:
           
           bys firm: gen obs = _N
          ,

          The value of variable obs is the same for all of my panel id (i.e. 31 because the data is 31 years), eventhough some of the observations are missing across panel.

          When I run xtreg, the result shows:
          Number of observations: 4,796
          Number of groups: 308
          Observations per group: min: 2; average: 15; and max: 29.

          What I want is that to only include panel id with minimum of 10 observations, so that I will have smaller number of groups/panel id, but each has more observations included. Is there any way to do that? Thank you.

          Best,

          Comment


          • #6
            Your example isn't complete (i.e., 4,796 observations doesn't refer to the dataset you listed above), and unless I missed it you didn't actually show your estimation command so we don't know what variables you're using. Regardless, you probably want some variant of the following:
            Code:
            gen byte useme = !mi(firm,year,assets,roe)
            bys firm useme: gen obs = _N
            xtreg ... if useme and obs>10

            Comment

            Working...
            X