Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Calculating across observations or across variables in panel data

    Dear Statalist,

    I am quiet new to Stata (Version 14 SE on Windows 10) and I am facing the following Problem: I have unbalanced quarterly panel data on 250 corporates (identified by idno) where the variables from the income statement (57 of 406 total variables) are reported in an accumulated manner, e.g. net income in the 3rd quarter contains the respective numbers from the 1st and 2nd quarter as well. In order to retrieve the correct income statement figures for the 2nd, 3rd and 4th quarter, I need to recalculate them (e.g. Q4 obs = Financial Year End obs - Q3 obs). In my understanding, howewer, there are two ways to do this: the first is to reshape the data from long to wide to calculate the Q2 to Q4 results across variables (afterwards reshape to long again); the second is to keep the data in long format and to create new observations for the respective quaters and then to calculate the results across observations.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float idno str8 periodq double(FC_CASH_DUE_BANKS_BNK FC_NET_LOANS_BNK FC_EQUITY_BNK FC_NET_INCOME_BNK FC_PERSONNEL_EXP_BNK)
    1 "2007_3" 3.1836e+10 4.42465e+11 1.17704e+11 4.787e+09 6.234e+09
    1 "2007_6" 3.5449e+10 4.57404e+11 1.19211e+11 4.234e+09 6.309e+09
    1 "2007_9" 3.2766e+10 4.78207e+11 1.19978e+11 3.373e+09 4.677e+09
    1 "2007_12" 4.0144e+10 5.100e+11 1.230e+11 1.5365e+10 2.2689e+10
    1 "2008_3" 4.6888e+10 5.2531e+11 1.25627e+11 2.373e+09 4.951e+09
    1 "2008_6" 3.2255e+10 5.24783e+11 1.27176e+11 2.003e+09 6.913e+09
    1 "2008_9" 5.435e+10 7.42329e+11 1.37691e+11 5.270e+08 5.858e+09
    1 "2008_12" 2.6895e+10 7.220e+11 1.350e+11 5.605e+09 2.2746e+10
    1 "2009_3" 2.6681e+10 6.80862e+11 1.38201e+11 2.141e+09 7.588e+09
    1 "2009_6" 2.5133e+10 6.51529e+11 1.46614e+11 2.721e+09 6.917e+09
    1 "2009_9" 2.1068e+10 6.22511e+11 1.54101e+11 3.588e+09 7.311e+09
    1 "2009_12" 2.6206e+10 6.020e+11 1.570e+11 1.1728e+10 2.6928e+10
    1 "2010_3" 3.1422e+10 6.75613e+11 1.56569e+11 3.326e+09 7.276e+09
    1 "2010_6" 3.2806e+10 6.63647e+11 1.62968e+11 4.795e+09 7.616e+09
    1 "2010_9" 2.396e+10 6.5637e+11 1.6603e+11 4.418e+09 6.661e+09
    1 "2010_12" 2.7567e+10 6.610e+11 1.680e+11 1.737e+10 2.8124e+10
    1 "2011_3" 2.3469e+10 6.56246e+11 1.72798e+11 5.555e+09 8.263e+09
    1 "2011_6" 3.0466e+10 6.61216e+11 1.75079e+11 5.431e+09 7.569e+09
    1 "2011_9" 5.6766e+10 6.68503e+11 1.74487e+11 4.262e+09 6.908e+09
    1 "2011_12" 5.9602e+10 6.960e+11 1.760e+11 1.8976e+10 2.9037e+10
    end

    I' ve already begun with the fist approach and figured that this involves a lot of typing, as I have 57 variables from the income statement and 36 quaterly observations. After reshaping from long to wide I would type something like:

    Code:
    gen incloan2007_q4= incloan2007_12 - incloan2007_9
    gen incloan2007_q3= incloan2007_9 - incloan2007_6
    gen incloan2007_q2= incloan2007_6 - incloan2007_3

    and so on for the following years (up to 2018) in order to calculate the quarterly results for each idno.

    My questions now are: (a) would the second approach with calculations across observations (or another one) be more suitable / time-saving compared to the first approach and (b) if yes, how could that be done with proper stata commands / functions (maybe a loop with foreach)? And (c) is there a simple way to shorten the calculations for the first reshapig-approach, maybe with foreach?

    Any help / advise would be appriciated.
    Kind regards,
    Julian



  • #2
    (a) Yes, this is much more easily done in long layout calculating over observations. In fact, almost everything in Stata is more easily done in long layout calculating over observations. Whenever you find yourself tempted to reshape your data to wide, stop and ask yourself whether you really should be doing it. The answer is usually no. Wide layout is useful for certain types of graphs, for making the data easier for human eyes to read (typically before exporting it to a spreadsheet or other type of document), or for a handful of commands such as paired ttests. But nearly everything else works best in long layout.

    (b) The only real obstacle here is that you don't have a Stata internal format date variable--you have this string variable that human eyes can read easily but makes no sense to Stata. Once you convert it, it's very simple to do what you ask:

    Code:
    //    GENERATE A REAL STATA INTERNAL FORMAT QUARTERLY DATE
    gen qdate = qofd(dofm(monthly(periodq, "YM")))
    assert !missing(qdate)
    format qdate %tq
    
    xtset idno qdate
    by idno: gen quarterly_fc_net_income_bank = FC_NET_INCOME_BNK
    by idno: replace  quarterly_fc_net_income_bank = D1.quarterly_fc_net_income_bank if _n > 1
    Note: I hope I have correctly guessed the variable that you want to "de-accumulate." Your example code refers to variables called incloan*, but there is no comparably named variable in your data example. If I picked the wrong one, clearly all you need to do is change the name of the variable in the first -by idno:...- command.

    (c) Yes, you could shorten the code by writing a loop. But it would still be very clunky compared to doing it in long layout. So yes, you can improve the wide-layout version of the code. But if it's not worth doing, it's not worth doing well.

    Comment


    • #3
      Dear Clyde, thanks for your insightful advise and the coding!
      The approach you outlined really looks a lot easier although I have not been able to calculate the correct quaterly numbers with the supplied codes. First, excuse my data example, I have already got a (hopefully correct) quarterly date variable, the misleading variabe periodq has been created for the reshape approach and contains the period length from 12 to 3 months.

      As the new timevar qdate seems ok to me, the challenge is to calculate the correct quaterly numbers, what I may have mistakenly described in #1. So the accounting numbers are mainly accumulated but only for one fiscal year, say 2007 or 2008. So q1 always displays the correct quarterly income figures whereas q2 contains the income figures of q1 as well. By subtracting the latter report from the actual report for q4, q3 and q2 per idno for each fiscal year would give the correct quarterly results.

      I tried the following codes with the displayed example data:

      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input float(idno qdate ydate) str22 FC_PERIOD_TYPE byte FC_PERIOD_LENGTH double(FC_NET_INCOME_BNK FC_NET_INT_INC_BNK FC_PERSONNEL_EXP_BNK)
      7 188 2007 "3 Months - 1st Quarter"  3  2.132e+09  2.053e+09  4.329e+09
      7 189 2007 "6 Months - Interim"      6  3.909e+09  4.024e+09  8.203e+09
      7 190 2007 "9 Months - 3rd Quarter"  9  5.540e+09  6.157e+09  9.899e+09
      7 191 2007 "Year End"               12  6.973e+09  9.188e+09 1.3122e+10
      7 192 2008 "3 Months - 1st Quarter"  3 -1.410e+08  2.676e+09  2.934e+09
      7 193 2008 "6 Months - Interim"      6  5.040e+08  5.627e+09  5.613e+09
      7 194 2008 "9 Months - 3rd Quarter"  9  9.180e+08  8.689e+09  7.541e+09
      7 195 2008 "Year End"               12 -3.896e+09 1.2453e+10  9.606e+09
      7 196 2009 "3 Months - 1st Quarter"  3  1.182e+09  3.843e+09  2.976e+09
      7 197 2009 "6 Months - Interim"      6  2.255e+09  6.607e+09  6.115e+09
      7 198 2009 "9 Months - 3rd Quarter"  9  3.649e+09  9.740e+09  8.956e+09
      7 199 2009 "Year End"               12  4.958e+09 1.2459e+10  1.131e+10
      7 200 2010 "3 Months - 1st Quarter"  3  1.777e+09  3.671e+09  3.575e+09
      7 201 2010 "6 Months - Interim"      6  2.943e+09  7.646e+09  6.612e+09
      7 202 2010 "9 Months - 3rd Quarter"  9  1.724e+09 1.1061e+10  9.594e+09
      7 203 2010 "Year End"               12  2.330e+09 1.5583e+10 1.2671e+10
      7 204 2011 "3 Months - 1st Quarter"  3  2.130e+09  4.167e+09  4.278e+09
      7 205 2011 "6 Months - Interim"      6  3.363e+09  8.659e+09  7.643e+09
      7 206 2011 "9 Months - 3rd Quarter"  9  4.140e+09 1.2934e+10 1.0337e+10
      7 207 2011 "Year End"               12  4.326e+09 1.7445e+10 1.3135e+10
      7 208 2012 "3 Months - 1st Quarter"  3  1.401e+09  4.193e+09  3.656e+09
      7 209 2012 "6 Months - Interim"      6  2.063e+09  8.087e+09  7.048e+09
      7 210 2012 "9 Months - 3rd Quarter"  9  2.818e+09 1.1804e+10 1.0349e+10
      7 211 2012 "Year End"               12  3.160e+08 1.5975e+10  1.349e+10
      7 212 2013 "3 Months - 1st Quarter"  3  1.661e+09  3.650e+09  3.548e+09
      7 213 2013 "6 Months - Interim"      6  1.995e+09  7.301e+09  6.752e+09
      7 214 2013 "9 Months - 3rd Quarter"  9  2.047e+09 1.0939e+10  9.657e+09
      7 215 2013 "Year End"               12  6.810e+08 1.4834e+10 1.2329e+10
      7 216 2014 "3 Months - 1st Quarter"  3  1.103e+09  3.375e+09  3.349e+09
      7 217 2014 "6 Months - Interim"      6  1.341e+09  7.041e+09  6.339e+09
      7 218 2014 "9 Months - 3rd Quarter"  9  1.250e+09 1.0454e+10  9.530e+09
      7 219 2014 "Year End"               12  1.691e+09 1.4272e+10 1.2512e+10
      7 220 2015 "3 Months - 1st Quarter"  3  5.590e+08  4.210e+09  3.433e+09
      7 221 2015 "6 Months - Interim"      6  1.377e+09  8.332e+09  6.880e+09
      7 222 2015 "9 Months - 3rd Quarter"  9 -4.647e+09 1.2025e+10 1.0189e+10
      7 223 2015 "Year End"               12 -6.772e+09 1.5881e+10 1.3293e+10
      7 224 2016 "3 Months - 1st Quarter"  3  2.360e+08  3.924e+09  3.194e+09
      7 225 2016 "6 Months - Interim"      6  2.560e+08  7.617e+09  6.153e+09
      7 226 2016 "9 Months - 3rd Quarter"  9  5.340e+08 1.1142e+10  9.047e+09
      7 227 2016 "Year End"               12 -1.356e+09 1.4707e+10 1.1874e+10
      7 228 2017 "3 Months - 1st Quarter"  3  5.750e+08  3.067e+09  3.147e+09
      7 229 2017 "6 Months - Interim"      6  1.041e+09  6.175e+09  6.068e+09
      7 230 2017 "9 Months - 3rd Quarter"  9  1.689e+09  9.561e+09  8.874e+09
      7 231 2017 "Year End"               12 -7.350e+08 1.2378e+10 1.2253e+10
      7 232 2018 "3 Months - 1st Quarter"  3  1.200e+08  2.913e+09  3.002e+09
      7 233 2018 "6 Months - Interim"      6  5.210e+08  6.342e+09  6.052e+09
      7 234 2018 "9 Months - 3rd Quarter"  9  7.500e+08  9.698e+09  8.911e+09
      end
      format %tq qdate
      Both codes seem to disregard the fiscal year structure for the accumulated figures, so only the figures for obs 1 to 3 are correct in the second code. I would appreciate further guidance

      Code:
      //Clydes codes
      xtset idno qdate
      by idno: gen quarterly_fc_net_income_bank = FC_NET_INCOME_BNK
      by idno: replace  quarterly_fc_net_income_bank = D1.quarterly_fc_net_income_bank if _n > 1
      //other code
      bysort idno: gen quarter_netinc2 = FC_NET_INCOME_BNK
      by idno (qdate): replace quarter_netinc2 = (FC_NET_INCOME_BNK - FC_NET_INCOME_BNK[_n-1])
      Best regards,
      Julian
      Last edited by Julian Scholz; 05 Jun 2019, 02:27.

      Comment


      • #4
        OK. That's not hard to fix.

        Code:
        egen panel = group(idno ydate)
        xtset panel qdate
        
        by panel: gen quarterly_fc_net_income_bank = FC_NET_INCOME_BNK
        by panel: replace  quarterly_fc_net_income_bank = D1.quarterly_fc_net_income_bank if _n > 1

        Comment


        • #5
          Thanks again, Clyde. The D-Operator did its job, although I had to change the code a little

          Code:
          by panel: gen quarterly_fc_net_income_bank = FC_NET_INCOME_BNK
          by panel: replace quarterly_fc_net_income_bank = D1.FC_NET_INCOME_BNK if _n > 1
          I replaced quarterly_fc_net_income_bank with FC_NET_INCOME_BNK in the 2nd line and now the quaterly results are correct.

          Comment

          Working...
          X