Weakly balanced Panel Data Set

Katharina Maier

Join Date: May 2019
Posts: 29

Weakly balanced Panel Data Set

01 Jun 2019, 09:30

Hi Statalisters,

I am a novice user in Stata and it's my first post. I'm working with Stata.14 and Windows 7.

I'm working on a Panel Data Set for all commerical banks (ID Variable) in the U.S. for the period Q4-1995 - Q4-2018 (time variable). So I have data on a bank-year level (I only use the Q4 data of each year). My goals are

1) to calculate four bank risk proxies
2) to show the correlations between all four risk proxies
3) and do finally two regeressions with the risk proxies (a binary probability model).

I have converted the string variables name and date to numeric variables name2 date2. I have replaced missing variables with 0 and have checked for other missing variables.
There are duplicates within my bank names, (because banks with the same name have several bank branches ) which I fixed with generating the ID variable id.
I tryed to use the dataex command to provide you a data sample, but I'm not sure if I used the command correctly, so here is a data example:

name2

date2

asset

lnatres

ore

1st American State Bank of Minnesota

Q4 1995

16050

1st Bank

Q4 1995

15908

16050

1st Bank & Trust

Q4 1995

12888

16050

1st Bank of Troy

Q4 1995

16050

1st Business Bank

Q4 1995

16050

1st Constitution Bank

Q4 1995

16050

1st Financial Bank South Dakota

Q4 1995

16050

1st Floyd Bank

Q4 1995

16050

1st National Bank

Q4 1995

16050

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input float id long name2 float date2 double asset long(lnatres ore)
 1  6 143   16050   103   37
 2  7 143   99230  1419    0
 3  8 143   54413   450    0
 4 11 143   14627   132    0
 5 13 143  922734  8995  692
 6 18 143  130484  1077    0
 7 23 143   77569   393    0
 8 28 143   40343   921    5
 9 30 143   26353   245    0
10 34 143   62315   207    0
11 37 143   38636   266  111
12 42 143   20637   114    0
13 43 143   20313   184    0
14 44 143 1736391 27470 1452
15 47 143   54356   257  270
end
format %tq!Qq-CCYY date2
label values name2 name2
label def name2 6 "1st American State Bank of Minnesota", modify
label def name2 7 "1st Bank", modify
label def name2 8 "1st Bank & Trust", modify
label def name2 11 "1st Bank of Troy", modify
label def name2 13 "1st Business Bank", modify
label def name2 18 "1st Choice Bank", modify
label def name2 23 "1st Constitution Bank", modify
label def name2 28 "1st Financial Bank South Dakota", modify
label def name2 30 "1st Floyd Bank", modify
label def name2 34 "1st National Bank", modify
label def name2 37 "1st National Community Bank", modify
label def name2 42 "1st Security Bank of Laurel", modify
label def name2 43 "1st Security Bank of West Yellowstone, Montana", modify
label def name2 44 "1st Source Bank", modify
label def name2 47 "1st State Bank and Trust Company of Palos Hills", modify

Code:

replace p3asset = 0 if (p3asset >= .)

Code:

mdesc

Code:

gen id = _n

Code:

sort id date2

Code:

xtset id date2
       panel variable:  id (weakly balanced)
        time variable:  date2, Q4-1995 to Q4-2018
                delta:  1 quarter

Code:

xtdescribe

      id:  1, 2, ..., 172431                                 n =     172431
   date2:  Q4-1995, Q4-1996, ..., Q4-2018                    T =         24
           Delta(date2) = 1 quarter
           Span(date2)  = 93 periods
           (id*date2 uniquely identifies each observation)

Distribution of T_i:   min      5%     25%       50%       75%     95%     max
                         1       1       1         1         1       1       1

     Freq.  Percent    Cum. |  Pattern*
 ---------------------------+--------------------------
     9940      5.76    5.76 |  1.......................
     9527      5.53   11.29 |  .1......................
     9142      5.30   16.59 |  ..1.....................
     8773      5.09   21.68 |  ...1....................
     8579      4.98   26.65 |  ....1...................
     8314      4.82   31.48 |  .....1..................
     8079      4.69   36.16 |  ......1.................
     7887      4.57   40.74 |  .......1................
     7769      4.51   45.24 |  ........1...............
    94421     54.76  100.00 | (other patterns)
 ---------------------------+--------------------------
   172431    100.00         |  XXXXXXXXXXXXXXXXXXXXXXXX
 ------------------------------------------------------
 *Each column represents 4 periods.

A weakly balanced dataset arise, if each panel contains the same number of observations, but NOT the same time points. I do have this case, because I have a lot more observations in one time period.
Can you please help me, how to have a balanced panel? Does a weakly balanced panel has an influence on my correlation and regression results?

Thank you very much!

Katharina

Tags: None

William Lisowski

Join Date: Dec 2014

Posts: 10150
#2

01 Jun 2019, 10:44

Welcome to Statalist.

Thank you for the example with dataex - it is correct, and it makes it clear that you correctly converted your dates from string to Stata Internal Format dates. That's often a stumbling block, so it's good that you avoided that.

Your problems stem from the incorrect creation of your id variable.

Code:

gen id = _n

What you have done is given a separate id to each observation - in effect, to each combination of bank, branch, and date. What you perhaps wanted, if indeed your unit of analysis is to be bank branches, is an id that is distinct for each combination of bank and branch. That might be something like

Code:

egen id = group(name branch), missing

where I assume your data includes some identifier of branches that is distinct within a given bank, and I allow for the branch identifier to be missing, which perhaps is the case for single-branch banks.

Or perhaps your unit of analysis is to be banks rather than bank branche, so you will want to use the collapse command to reduce your data into a single observation per bank, combining the values from the individual observations. That might be something like

Code:

collapse (sum) asset inatres ore, by(name2 date2)

where I'm just adding up the values from the separate branches, because I have no clue what asset and inatres and ore mean.

Or perhaps while you have observations for multiple branches for the same bank, it is in fact the case that the variables asset and inatres and ore are measured at the bank level rather than the branch level and are the same for every branch of any bank, in which case you want to drop all but one branch for each bank. That would probably best be done with collapse again using something like

Code:

collapse (firstnm) asset inatres ore, by(name2 date2)

The failure of your id variable was apparent from the output of the xtdescribe command, which suggests you need to review the documentation for xtdescribe in the Stata Longitudinal Data/Panel Data Reference Manual PDF included with your Stata installation and accessible from Stata's Help menu. In particular xtdescribe tells us that you have 172,431 distinct values for id and each id appears in exactly one observation.

Code:

id: 1, 2, ..., 172431 n = 172431 ... Distribution of T_i: min 5% 25% 50% 75% 95% max 1 1 1 1 1 1
Comment
Katharina Maier

Join Date: May 2019

Posts: 29
#3

01 Jun 2019, 11:56

Dear William,

thank you for your quick and detailed response and I am sorry for my unclear presentation of the ID Variable. The variables like asset (total assets) lnatres(banks' loan loss reserves) ore (other real estate owned loans) are quarterly provided balance sheet and profit and loss data provided by the Federal Deposit Insurance Corporation (FDIC). In the data set used to be the column for the information in which city the bank is located. For example there is one "Commercial Bank A" in London and one "Commercial Bank A" in Oxford. So the banks just have the same names, but different balance sheet values. Unfortunatly I already have deleted the column City, before I have known that I would need it I have also tryed to rename the banks just in numbers with

Code:

egen name_new = fill(1 2)

What I think should be the same result as

Code:

gen id = _n

Code:

duplicates list name2 date2

Thank you very much for your advice within xtdescribe . I have to study it first. But it is not good if the number of id is the same like n (172431)?
Comment
William Lisowski

Join Date: Dec 2014

Posts: 10150
#4

01 Jun 2019, 13:42

So each branch has a separate balance sheet, and your unit of analysis is going to be at the branch level rather than the bank level. Thus your id should be something like

Code:

egen id = group(name city)

it is not good if the number of id is the same like n (172431)?

It is not good.

Unfortunatly I already have deleted the column City, before I have known that I would need it

Then you must go back to the data that includes it and work your way forward again because without city there is no reliable way for you construct a identifier for each branch of each bank, and with a distinct identifier for each branch of each bank, you cannot create the panel information with xtset.

I thought there might be a way around the problem if your data had been organized by bank, branch, and year at the time you constructed the id variable, but your sample data in post #1 makes it clear that it was not.
Comment

Katharina Maier

Join Date: May 2019
Posts: 29

02 Jun 2019, 05:41

Thank you very much for your input. I still have the column bankingclass "bkclass2" in my data set. I tryed to built an ID id2 with name2 bkclass2 date2 with the command

Code:

egen id2 = group(name2 bkclass2 date2)

But there are still duplicates, because of the missing City as a distinctive feature. The first duplicate variable starts at id2 number 714 ( I am sorry, I didn't know how to show you just the duplicated ones).

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input float id2 long(name2 bkclass2) float date2 double asset
  1 1 2 191 512346
  2 2 3 219 191843
  3 2 3 220 194321
  4 2 3 221 192823
  5 2 3 222 318444
  6 2 3 223 316293
  7 2 3 224 314601
  8 2 3 225 305914
  9 2 3 226 314711
 10 2 3 227 324550
 11 2 3 228 325441
 12 2 3 229 323306
 13 2 3 230 327837
 14 2 3 231 354590
 15 2 3 232 349891
 16 2 3 233 343830
 17 2 3 234 354182
 18 2 3 235 377395
 19 3 1 219  66310
 20 3 1 220  66027
 21 3 1 221  72358
 22 3 1 222  76377
 23 3 1 223  62999
 24 4 2 205 754473
 25 4 2 206 768136
 26 4 2 207 785634
 27 4 2 208 768016
 28 4 3 185 294667
 29 4 3 186 293519
 30 4 3 187 324868
 31 4 3 188 313619
 32 4 3 189 470509
 33 4 3 190 438547
 34 4 3 191 456267
 35 4 3 192 451490
 36 4 3 193 431129
 37 4 3 194 481242
 38 4 3 195 467198
 39 4 3 196 582559
 40 4 3 197 592266
 41 4 3 198 598751
 42 4 3 199 651257
 43 4 3 200 632297
 44 4 3 201 643899
 45 4 3 202 653837
 46 4 3 203 716070
 47 4 3 204 745201
 48 5 2 181  21595
 49 5 2 182  34110
 50 5 2 183  39840
 51 5 2 184  42138
 52 5 2 185  47078
 53 5 2 186  60600
 54 5 2 187  68406
 55 5 2 188  65534
 56 5 2 189  68749
 57 5 2 190  78635
 58 5 2 191  90143
 59 5 2 192  91747
end
format %tq!Qq-CCYY date2
label values name2 name2
label def name2 1 "09Republic Bank, Inc.", modify
label def name2 2 "1880 Bank", modify
label def name2 3 "1919 Investment Counsel & Trust Company,  National Association", modify
label def name2 4 "1ST BANK", modify
label def name2 5 "1st Advantage Bank", modify
label values bkclass2 bkclass2
label def bkclass2 1 "N", modify
label def bkclass2 2 "NM", modify
label def bkclass2 3 "SM", modify

Is there any command to numbering the duplicate variables (bank names), to make a difference between them just in the name, like "Access Bank" and "Access Bank 1".

An command like "If the name is duplicated, then name it name_1"

Otherwise I am learning my lesson with rushed clearing of dates and I have to do my dataset again with the columns city and/or zip - but my time to the deadline of my thesis is running.

Thank you

Comment

Katharina Maier

Join Date: May 2019
Posts: 29

02 Jun 2019, 06:26

I numbered the bank names with

Code:

egen name3 = fill(1 2)

Then I generatet a new variable name4 with the bank name2 and name3 to make a difference.

Code:

egen name4 = group(name2 name3)

Generated a new ID variable id3 with the new name4, bankingclass and the date

Code:

egen id3 = group(name4 bkclass2 date2)

Code:

xtset id3 date2
       panel variable:  id3 (weakly balanced)
        time variable:  date2, Q1-1995 to Q4-2018
                delta:  1 quarter

. xtset id3
       panel variable:  id3 (balanced)

. xtset date2
       panel variable:  date2 (unbalanced)

. xtdescribe
must specify timevar; use xtset
r(459);

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input long name2 float(name3 name4 id3) long bkclass2 float(date2 date) double asset
1 428586   1   1 2 191 17531 512346
2 610282   2   2 3 219 20088 191843
2 615923   3   3 3 220 20178 194321
2 621491   4   4 3 221 20269 192823
2 627000   5   5 3 222 20361 318444
2 632445   6   6 3 223 20453 316293
2 637818   7   7 3 224 20544 314601
2 643142   8   8 3 225 20635 305914
2 648413   9   9 3 226 20727 314711
2 653617  10  10 3 227 20819 324550
2 658764  11  11 3 228 20909 325441
2 663862  12  12 3 229 21000 323306
2 668908  13  13 3 230 21092 327837
2 673914  14  14 3 231 21184 354590
2 678868  15  15 3 232 21274 349891
2 683785  16  16 3 233 21365 343830
2 688655  17  17 3 234 21457 354182
2 693465  18  18 3 235 21549 377395
3 610283  19  19 1 219 20088  66310
3 615924  20  20 1 220 20178  66027
3 621492  21  21 1 221 20269  72358
3 627001  22  22 1 222 20361  76377
3 632446  23  23 1 223 20453  62999
4 384227  24  24 3 185 16982 294667
4 391705  25  25 3 186 17074 293519
4 399154  26  26 3 187 17166 324868
end
format %tq!Qq-CCYY date2
format %tdNN/DD/CCYY date
label values name2 name2
label def name2 1 "09Republic Bank, Inc.", modify
label def name2 2 "1880 Bank", modify
label def name2 3 "1919 Investment Counsel & Trust Company,  National Association", modify
label def name2 4 "1ST BANK", modify
label values bkclass2 bkclass2
label def bkclass2 1 "N", modify
label def bkclass2 2 "NM", modify
label def bkclass2 3 "SM", modify

The panel is still weakly balanced, and date2 is unbalanced not id3. How can I fix this? Thank you very much for your patience.

Comment

Katharina Maier

Join Date: May 2019
Posts: 29

02 Jun 2019, 07:27

The number of id3 and n are still the same ..

Code:

xtdescribe

     id3:  1, 2, ..., 698215                                 n =     698215
   date2:  Q1-1995, Q2-1995, ..., Q4-2018                    T =         96
           Delta(date2) = 1 quarter
           Span(date2)  = 96 periods
           (id3*date2 uniquely identifies each observation)

Distribution of T_i:   min      5%     25%       50%       75%     95%     max
                         1       1       1         1         1       1       1

     Freq.  Percent    Cum. |  Pattern
 ---------------------------+--------------------------------------------------------------------------------------------------
    10242      1.47    1.47 |  1...............................................................................................
    10169      1.46    2.92 |  .1..............................................................................................
    10052      1.44    4.36 |  ..1.............................................................................................
     9940      1.42    5.79 |  ...1............................................................................................
     9838      1.41    7.20 |  ....1...........................................................................................
     9690      1.39    8.58 |  .....1..........................................................................................
     9586      1.37    9.96 |  ......1.........................................................................................
     9527      1.36   11.32 |  .......1........................................................................................
     9450      1.35   12.67 |  ........1.......................................................................................
   609721     87.33  100.00 | (other patterns)
 ---------------------------+--------------------------------------------------------------------------------------------------
   698215    100.00         |  XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Comment

William Lisowski

Join Date: Dec 2014

Posts: 10150
#8

02 Jun 2019, 08:00

In post #2 I wrote

What you perhaps wanted, if indeed your unit of analysis is to be bank branches, is an id that is distinct for each combination of bank and branch. That might be something like

Code:

egen id = group(name branch), missing

You persistently include the date in your id. That is wrong. You need the id to be distinct for each combination of bank and branch. There will be duplicates of the id - one for each quarter of data. It is the combination of id and date2 that must be distinct.

You wrote in post #5

I still have the column bankingclass "bkclass2" in my data set

What is bankingclass? For each bank, does each branch of that bank have a different bankingclass? And is bankingclass the same in every year for a given branch?

If you run

Code:

isid name2 bankingclass2 date2

and it reports duplicates, then bankingclass2 is of no use to you, and you must face reality and go back to data that includes something that distinguishes branches of the same bank.

If it does not report any duplicates, then the following code might accomplish what you need.

Code:

// eliminate mistaken ids drop id id2 id3 // create id to identify bank and branch egen id = group(name2 bankingclass2), missing // define the panel structure xtset id date2 // report on the results xtdescribe

I am not certain that city and zip are adequate - what if a bank has two branches in the same city? What if it has two branches in the same zip? You need

Code:

isid name city date2

or

Code:

isid name zip date2

to not report any duplicates.
1 like
Comment

Katharina Maier

Join Date: May 2019
Posts: 29

02 Jun 2019, 10:04

You persistently include the date in your id. That is wrong. You need the id to be distinct for each combination of bank and branch. There will be duplicates of the id - one for each quarter of data. It is the combination of id and date2 that must be distinct.

Thank you very much! Now I (hopefully) understand it.

I went back in my data organisation and here isp, a test data sample for the period 1999 - 2000 with quarterly data. With the variables cert (certification number), city, zip, asset, date2, bkclass2 (banking class) and the new id variable.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input long(cert name2) str16 city long(zip asset bkclass2) float(date2 id)
15448  1 "Hancock"      56244  19854 2 156  1
15448  1 "Hancock"      56244  19996 2 157  1
15448  1 "Hancock"      56244  19491 2 158  1
15448  1 "Hancock"      56244  20328 2 159  1
15448  1 "Hancock"      56244  19239 2 160  1
15448  1 "Hancock"      56244  19896 2 161  1
15448  1 "Hancock"      56244  19415 2 162  1
15448  1 "Hancock"      56244  20418 2 163  1
22039  2 "Sidney"       59270  42944 3 156  2
22039  2 "Sidney"       59270  42813 3 157  2
22039  2 "Sidney"       59270  42932 3 158  2
22039  2 "Sidney"       59270  43778 3 159  2
22039  2 "Sidney"       59270  42086 3 160  2
22039  2 "Sidney"       59270  39771 3 161  2
22039  2 "Sidney"       59270  40115 3 162  2
22039  2 "Sidney"       59270  43333 3 163  2
23309  2 "West Branch"  48661 154272 2 156  3
23309  2 "West Branch"  48661 158911 2 157  3
23309  2 "West Branch"  48661 166351 2 158  3
23309  2 "West Branch"  48661 177982 2 159  3
23309  2 "West Branch"  48661 182182 2 160  3
23309  2 "West Branch"  48661 187822 2 161  3
23309  2 "West Branch"  48661 187754 2 162  3
16419  3 "Broken Bow"   74728  68587 2 156  4
16419  3 "Broken Bow"   74728  68613 2 157  4
16419  3 "Broken Bow"   74728  68949 2 158  4
end
format %tq!Qq-CCYY date2
label values name2 name2
label def name2 1 "1st American State Bank of Minnesota", modify
label def name2 2 "1st Bank", modify
label def name2 3 "1st Bank & Trust", modify
label values bkclass2 bkclass2
label def bkclass2 2 "NM", modify
label def bkclass2 3 "SM", modify

I was doing the following codes on your advice:

Code:

isid name2 bkclass2 date2variables name2 bkclass2 date2
do not uniquely identify the observations r(459);

So I checked for the variable cert:

Code:

 isid name2 cert date2

egen id = group(name2 cert), missing

xtset id date2
       panel variable:  id (unbalanced)
        time variable:  date2, Q1-1999 to Q4-2000, but with gaps
                delta:  1 quarter

Why does time variable report "..., but with gaps" ?

Code:

xtdescribe

      id:  1, 2, ..., 9651                                   n =       9651
   date2:  Q1-1999, Q2-1999, ..., Q4-2000                    T =          8
           Delta(date2) = 1 quarter
           Span(date2)  = 8 periods
           (id*date2 uniquely identifies each observation)

Distribution of T_i:   min      5%     25%       50%       75%     95%     max
                         1       2       8         8         8       8       8

     Freq.  Percent    Cum. |  Pattern
 ---------------------------+----------
     7441     77.10   77.10 |  11111111
      194      2.01   79.11 |  1111....
      184      1.91   81.02 |  111111..
      181      1.88   82.89 |  11111...
      180      1.87   84.76 |  1.......
      176      1.82   86.58 |  11......
      172      1.78   88.36 |  1111111.
      164      1.70   90.06 |  111.....
      141      1.46   91.52 |  .....111
      818      8.48  100.00 | (other patterns)
 ---------------------------+----------
     9651    100.00         |  XXXXXXXX

Code:

xtset id
       panel variable:  id (unbalanced)

. xtset date2
       panel variable:  date2 (unbalanced)

And now the panel is unbalanced.

Comment

William Lisowski

Join Date: Dec 2014
Posts: 10150

#10

02 Jun 2019, 12:07

Why does time variable report "..., but with gaps" ?

Presumably for at least one of your branches, there is a quarter missing between the first and last quarter of that branch's data. That is also why the panel is unbalanced rather than weakly balanced. Do not worry about balance at this point - the documentation tells us that the xt commands are robust to this problem, and Stata simply excludes from the analysis any observation for which any of the variables are missing values.

Here's an example using yearly dates. Noe that 2004 is missing for id 2.

Code:

. * Example generated by -dataex-. To install: ssc install dataex
. clear

. input float(id year)

            id       year
  1. 1 2001
  2. 1 2002
  3. 1 2003
  4. 1 2004
  5. 1 2005
  6. 2 2001
  7. 2 2002
  8. 2 2003
  9. 2 2005
 10. end

. xtset id year
       panel variable:  id (unbalanced)
        time variable:  year, 2001 to 2005, but with a gap
                delta:  1 unit

. xtdescribe

      id:  1, 2, ..., 2                                      n =          2
    year:  2001, 2002, ..., 2005                             T =          5
           Delta(year) = 1 unit
           Span(year)  = 5 periods
           (id*year uniquely identifies each observation)

Distribution of T_i:   min      5%     25%       50%       75%     95%     max
                         4       4       4         4         5       5       5

     Freq.  Percent    Cum. |  Pattern
 ---------------------------+---------
        1     50.00   50.00 |  111.1
        1     50.00  100.00 |  11111
 ---------------------------+---------
        2    100.00         |  XXXXX

Comment

Katharina Maier

Join Date: May 2019

Posts: 29
#11

03 Jun 2019, 05:37

Presumably for at least one of your branches, there is a quarter missing between the first and last quarter of that branch's data.

I checked my data for a missing quarter and it's complete. I might found reseon of the gaps in another Statalist Post. So I think the gaps are not due to a forgotten quarter, rather my data set is unbalanced due to bank branchesmentering and dropping out of the data set I am analyzing. Here ist the link to the post https://www.statalist.org/forums/forum/general-stata-discussion/general/1452302-dealing-with-unbalanced-panel-data

If I just run the command tsset with date2 the follwoing error message shows up:

Code:

tsset date2 repeated time values in sample r(451);

If I tsset the id and date2 variable it works.

Code:

tsset id date2 panel variable: id (unbalanced) time variable: date2, Q4-1999 to Q4-2000, but with gaps delta: 1 quarter

Do not worry about balance at this point - the documentation tells us that the xt commands are robust to this problem, and Stata simply excludes from the analysis any observation for which any of the variables are missing values.

Thank you very much for this advice. Where in the documentation (xtdescribe) can I see that the xt commands are robust to this problem?

Thank you again for guiding me through. So I'm working now on my data set with the full period of 1995-2018 to generate the correct ID Variable.

Dealing with unbalanced panel data - Statalist

https://www.statalist.org

Hi, I am intending to analyze company payout policy determinants and have a set of panel data. My data set is unbalanced due to companies entering and
Comment

Katharina Maier

Join Date: May 2019
Posts: 29

#12

03 Jun 2019, 09:33

This is hopefully my final dataset for the period Q4 1995 - Q4 2018. I generated the ID with the variable cert (certification number).

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input long(cert name2) float date2 double asset float id
35260 1 191 512346  1
 4829 2 219 191843  2
 4829 2 223 316293  2
 4829 2 227 324550  2
 4829 2 231 354590  2
 4829 2 235 377395  2
33785 3 219  66310  3
33785 3 223  62999  3
16522 4 187 324868  4
16522 4 191 456267  4
16522 4 195 467198  4
16522 4 199 651257  4
16522 4 203 716070  4
16522 4 207 785634  4
57899 5 183  39840  5
57899 5 187  68406  5
57899 5 191  90143  5
57899 5 195 126262  5
57899 5 199 111591  5
57899 5 203  92716  5
57899 5 207  78804  5
57899 5 211  74764  5
57899 5 215  76660  5
57899 5 219  76151  5
57899 5 223  84397  5
57899 5 227  95198  5
end
format %tq!Qq-CCYY date2
label values name2 name2
label def name2 1 "09Republic Bank, Inc.", modify
label def name2 2 "1880 Bank", modify
label def name2 3 "1919 Investment Counsel & Trust Company,  National Association", modify
label def name2 4 "1ST BANK", modify
label def name2 5 "1st Advantage Bank", modify

How I generated the id variable and declared the panel variable

Code:

isid name2 cert date2

. egen id = group(name2 cert), missing

. sort id date2

. xtset id date2
       panel variable:  id (unbalanced)
        time variable:  date2, Q4-1995 to Q4-2018, but with gaps
                delta:  1 quarter

The documentation:

Code:

xtdescribe

      id:  1, 2, ..., 16208                                  n =      16208
   date2:  Q4-1995, Q4-1996, ..., Q4-2018                    T =         24
           Delta(date2) = 1 quarter
           Span(date2)  = 93 periods
           (id*date2 uniquely identifies each observation)

Distribution of T_i:   min      5%     25%       50%       75%     95%     max
                         1       1       3         8        18      24      24

     Freq.  Percent    Cum. |  Pattern*
 ---------------------------+--------------------------
     2501     15.43   15.43 |  111111111111111111111111
      879      5.42   20.85 |  1.......................
      869      5.36   26.22 |  11......................
      748      4.62   30.83 |  111.....................
      585      3.61   34.44 |  1111....................
      559      3.45   37.89 |  11111...................
      392      2.42   40.31 |  111111..................
      302      1.86   42.17 |  1111111.................
      292      1.80   43.97 |  1111111111..............
     9081     56.03  100.00 | (other patterns)
 ---------------------------+--------------------------
    16208    100.00         |  XXXXXXXXXXXXXXXXXXXXXXXX
 ------------------------------------------------------
 *Each column represents 4 periods.

Is my id variable now fine and can I start based on this panel data set my regressions? Thank you so much!

Comment

William Lisowski

Join Date: Dec 2014

Posts: 10150
#13

03 Jun 2019, 12:17

Because your time variable is a quarterly date, and because you have dropped quarters 1, 2, and 3, you now have gaps in your data because the observation following Q4-1995 is not Q1-1996 but rather Q4-1996. You should change your xtset command to

Code:

xtset id date2, delta(4 quarters)

Where in the documentation (xtdescribe) can I see that the xt commands are robust to this problem?

See example 2 of the xtset documentation - in the output of help xtset click on "Remarks and examples" in the section Links to PDF documentation.

I checked my data for a missing quarter and it's complete

You do not tell us which version of your data you checked or how you did your check.

In post #9 your data that reported gaps was quarterly data for all 4 quarters in each of 2 years. If that data is what you checked, then I am confident that in that dataset, for at least one of your branches, there is a quarter missing between the first and last quarter of that branch's data.

In post #9 your data that reported gaps was data for just the 4th quarter in each of 14 years. If that data is what you checked, then the gaps are perhaps just due to having deleted 3 out of every 4 quarters, and if so, no gaps will be reported when you use the appropriate xtset command. IF it does then report gaps, I am confident they exist. With this dataset,

Code:

bysort id (date2): assert (date2[_N]-date2[1]) == (_N-1)*4

would check for the existence of gaps. But again, you need not necessarily be concerned about gaps - it depends on the analysis you are planning on doing.
Comment
Katharina Maier

Join Date: May 2019

Posts: 29
#14

04 Jun 2019, 08:03

Dear William,

thank you very much for your support! I had a meeting with my Professor and there are gaps in the panel data, due to bank branches entering and dropping out of the data set. And I can continue with my unbalanced data set
Comment

Announcement

Weakly balanced Panel Data Set

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment