Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Unbalanced Panel to Balanced Panel

    Hi All,

    I compiled an unbalanced panel dataset, which includes six years from 2010 to 2015.
    After getting the balanced panel, I want to do a panel regression at the firm level. Firm_id is a firm’s identification number, branch_id is each branch’s identification number. ROA is return on asset, which was measured at firm level. Year and branch_id can uniquely identify each observation. Year and firm_id cannot. In 2010: 1104 observations; 2011: 754 observations; 2012: 582 observations; 2013: 444 observations; 2014: 302 observations; 2015: 208 observations.
    I want to convert this unbalanced panel data to balanced panel data. That is, I want to keep the firms who consistently appear from year 2010 to 2015 and drop those firms who do not consistently appear throughout the six years. One issue is year and firm_id cannot uniquely identify each observation. Only year and branch_id can do so, which makes this task even more complex. I am not sure how to do it using Stata. Please advise.

    Below is the sample data.
    Code:
     * Example generated by -dataex-. To install: ssc install dataex clear input int year str9 firm_id int branch_id byte roa   
    2010 "E00000019" 111003308 0.053
    2010 "E00000038" 131052542 0.464
    2010 "E00000038" 139019291 0.464
    2010 "E00000039" 131000521 0.464
    2010 "E00008577" 135001204 0.055
    2010 "E00008578" 101000449 0.055
    2011 "E00000019" 139003503 0.075
    2011 "E00000019" 122007047 0.075
    2011 "E00000038" 131052542 -0.014
    2011 "E00000038" 131000521 -0.014
    2011 "E00000038" 139019291 -0.014
    2011 "E00008577" 124006320 0.035
    2012 "E00000019" 119001952 0.088
    2012 "E00000019" 139003503 0.088
    2012 "E00000038" 140008420 -0.002
    2012 "E00008577" 124006320 0.038
    2013 "E00000019" 139003503 0.036
    2013 "E00008577" 124006320 0.067
    2014 "E00000019" 139003503 0.049
    2014 "E00000019" 111003308 0.049
    2014 "E00008577" 124006320 0.049
    2015 "E00000019" 111003308 -0.109
    2015 "E00000019" 139003503 -0.109
    end
    Many thanks,
    David

  • #2
    Firm / year obviously cannot identify uniquely observations, because Branches are nested (and hence repeated) within Firms.

    You do not need to generate balanced sample from this. You can run any regression analysis you have in mind on your unbalanced sample.

    If you insist on a balanced panel, there are at least two ways how to arrive at it:

    a) drop years until all branches have the same number of years

    b) drop branches that do not have the maximum number of years observed in your sample.

    Comment


    • #3
      Thank you for the speedy response, Joro! For the second option, it is a great idea. I have one follow-up question. First, can you please talk a little bit more about the first option? year and firm_id are the key variables that uniquely identify each observation. I am not quite sure how to drop the years using Stata? Second, I intend to do panel regression at firm level. Is there a way to keep the same firms that consistently appear across 6 years using Stata?

      Thank you in advance.

      Best,
      ​​​​​​​David

      Comment


      • #4
        You need to think what to do about the following issue: branch/year uniquely identify observation (not firm/year), but your main variable of interest (ROA) varies only at firm level.

        Otherwise keeping balanced panel as in b) above is easy.

        bysort branch: gen maxyears = _N
        keep if maxyears==6

        Comment


        • #5
          Hi Joro, thank you very much for the help! It worked very well. Now I understand how to solve similar problems in the future.

          Best,
          David

          Comment

          Working...
          X