Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Weakly balanced Panel Data Set

    Hi Statalisters,

    I am a novice user in Stata and it's my first post. I'm working with Stata.14 and Windows 7.

    I'm working on a Panel Data Set for all commerical banks (ID Variable) in the U.S. for the period Q4-1995 - Q4-2018 (time variable). So I have data on a bank-year level (I only use the Q4 data of each year). My goals are

    1) to calculate four bank risk proxies
    2) to show the correlations between all four risk proxies
    3) and do finally two regeressions with the risk proxies (a binary probability model).

    I have converted the string variables name and date to numeric variables name2 date2. I have replaced missing variables with 0 and have checked for other missing variables.
    There are duplicates within my bank names, (because banks with the same name have several bank branches ) which I fixed with generating the ID variable id.
    I tryed to use the dataex command to provide you a data sample, but I'm not sure if I used the command correctly, so here is a data example:
    id name2 date2 asset lnatres ore
    1 1st American State Bank of Minnesota Q4 1995 16050 16050 16050
    2 1st Bank
    Q4 1995
    15908 16050 16050
    3 1st Bank & Trust Q4 1995 12888 16050 16050
    4 1st Bank of Troy Q4 1995 16050 16050 16050
    5 1st Business Bank Q4 1995 16050 16050 16050
    6 1st Constitution Bank Q4 1995 16050 16050 16050
    7 1st Financial Bank South Dakota Q4 1995 16050 16050 16050
    8 1st Floyd Bank Q4 1995 16050 16050 16050
    9 1st National Bank Q4 1995
    16050 16050 16050

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float id long name2 float date2 double asset long(lnatres ore)
     1  6 143   16050   103   37
     2  7 143   99230  1419    0
     3  8 143   54413   450    0
     4 11 143   14627   132    0
     5 13 143  922734  8995  692
     6 18 143  130484  1077    0
     7 23 143   77569   393    0
     8 28 143   40343   921    5
     9 30 143   26353   245    0
    10 34 143   62315   207    0
    11 37 143   38636   266  111
    12 42 143   20637   114    0
    13 43 143   20313   184    0
    14 44 143 1736391 27470 1452
    15 47 143   54356   257  270
    end
    format %tq!Qq-CCYY date2
    label values name2 name2
    label def name2 6 "1st American State Bank of Minnesota", modify
    label def name2 7 "1st Bank", modify
    label def name2 8 "1st Bank & Trust", modify
    label def name2 11 "1st Bank of Troy", modify
    label def name2 13 "1st Business Bank", modify
    label def name2 18 "1st Choice Bank", modify
    label def name2 23 "1st Constitution Bank", modify
    label def name2 28 "1st Financial Bank South Dakota", modify
    label def name2 30 "1st Floyd Bank", modify
    label def name2 34 "1st National Bank", modify
    label def name2 37 "1st National Community Bank", modify
    label def name2 42 "1st Security Bank of Laurel", modify
    label def name2 43 "1st Security Bank of West Yellowstone, Montana", modify
    label def name2 44 "1st Source Bank", modify
    label def name2 47 "1st State Bank and Trust Company of Palos Hills", modify
    Code:
    replace p3asset = 0 if (p3asset >= .)
    Code:
    mdesc
    Code:
    gen id = _n
    Code:
    sort id date2
    Code:
    xtset id date2
           panel variable:  id (weakly balanced)
            time variable:  date2, Q4-1995 to Q4-2018
                    delta:  1 quarter
    Code:
    xtdescribe
    
          id:  1, 2, ..., 172431                                 n =     172431
       date2:  Q4-1995, Q4-1996, ..., Q4-2018                    T =         24
               Delta(date2) = 1 quarter
               Span(date2)  = 93 periods
               (id*date2 uniquely identifies each observation)
    
    Distribution of T_i:   min      5%     25%       50%       75%     95%     max
                             1       1       1         1         1       1       1
    
         Freq.  Percent    Cum. |  Pattern*
     ---------------------------+--------------------------
         9940      5.76    5.76 |  1.......................
         9527      5.53   11.29 |  .1......................
         9142      5.30   16.59 |  ..1.....................
         8773      5.09   21.68 |  ...1....................
         8579      4.98   26.65 |  ....1...................
         8314      4.82   31.48 |  .....1..................
         8079      4.69   36.16 |  ......1.................
         7887      4.57   40.74 |  .......1................
         7769      4.51   45.24 |  ........1...............
        94421     54.76  100.00 | (other patterns)
     ---------------------------+--------------------------
       172431    100.00         |  XXXXXXXXXXXXXXXXXXXXXXXX
     ------------------------------------------------------
     *Each column represents 4 periods.
    A weakly balanced dataset arise, if each panel contains the same number of observations, but NOT the same time points. I do have this case, because I have a lot more observations in one time period.
    Can you please help me, how to have a balanced panel? Does a weakly balanced panel has an influence on my correlation and regression results?

    Thank you very much!

    Katharina



  • #2
    Welcome to Statalist.

    Thank you for the example with dataex - it is correct, and it makes it clear that you correctly converted your dates from string to Stata Internal Format dates. That's often a stumbling block, so it's good that you avoided that.

    Your problems stem from the incorrect creation of your id variable.
    Code:
    gen id = _n
    What you have done is given a separate id to each observation - in effect, to each combination of bank, branch, and date. What you perhaps wanted, if indeed your unit of analysis is to be bank branches, is an id that is distinct for each combination of bank and branch. That might be something like
    Code:
    egen id = group(name branch), missing
    where I assume your data includes some identifier of branches that is distinct within a given bank, and I allow for the branch identifier to be missing, which perhaps is the case for single-branch banks.

    Or perhaps your unit of analysis is to be banks rather than bank branche, so you will want to use the collapse command to reduce your data into a single observation per bank, combining the values from the individual observations. That might be something like
    Code:
    collapse (sum) asset inatres ore, by(name2 date2)
    where I'm just adding up the values from the separate branches, because I have no clue what asset and inatres and ore mean.

    Or perhaps while you have observations for multiple branches for the same bank, it is in fact the case that the variables asset and inatres and ore are measured at the bank level rather than the branch level and are the same for every branch of any bank, in which case you want to drop all but one branch for each bank. That would probably best be done with collapse again using something like
    Code:
    collapse (firstnm) asset inatres ore, by(name2 date2)
    The failure of your id variable was apparent from the output of the xtdescribe command, which suggests you need to review the documentation for xtdescribe in the Stata Longitudinal Data/Panel Data Reference Manual PDF included with your Stata installation and accessible from Stata's Help menu. In particular xtdescribe tells us that you have 172,431 distinct values for id and each id appears in exactly one observation.
    Code:
          id:  1, 2, ..., 172431                                 n =     172431
    ...
    Distribution of T_i:   min      5%     25%       50%       75%     95%     max
                             1       1       1         1         1       1

    Comment


    • #3
      Dear William,

      thank you for your quick and detailed response and I am sorry for my unclear presentation of the ID Variable. The variables like asset (total assets) lnatres(banks' loan loss reserves) ore (other real estate owned loans) are quarterly provided balance sheet and profit and loss data provided by the Federal Deposit Insurance Corporation (FDIC). In the data set used to be the column for the information in which city the bank is located. For example there is one "Commercial Bank A" in London and one "Commercial Bank A" in Oxford. So the banks just have the same names, but different balance sheet values. Unfortunatly I already have deleted the column City, before I have known that I would need it I have also tryed to rename the banks just in numbers with

      Code:
      egen name_new = fill(1 2)
      What I think should be the same result as

      Code:
      gen id = _n
      Code:
      duplicates list name2 date2
      Thank you very much for your advice within xtdescribe . I have to study it first. But it is not good if the number of id is the same like n (172431)?

      Comment


      • #4
        So each branch has a separate balance sheet, and your unit of analysis is going to be at the branch level rather than the bank level. Thus your id should be something like
        Code:
        egen id = group(name city)
        it is not good if the number of id is the same like n (172431)?
        It is not good.

        Unfortunatly I already have deleted the column City, before I have known that I would need it
        Then you must go back to the data that includes it and work your way forward again because without city there is no reliable way for you construct a identifier for each branch of each bank, and with a distinct identifier for each branch of each bank, you cannot create the panel information with xtset.

        I thought there might be a way around the problem if your data had been organized by bank, branch, and year at the time you constructed the id variable, but your sample data in post #1 makes it clear that it was not.

        Comment


        • #5
          Thank you very much for your input. I still have the column bankingclass "bkclass2" in my data set. I tryed to built an ID id2 with name2 bkclass2 date2 with the command

          Code:
          egen id2 = group(name2 bkclass2 date2)
          But there are still duplicates, because of the missing City as a distinctive feature. The first duplicate variable starts at id2 number 714 ( I am sorry, I didn't know how to show you just the duplicated ones).

          Code:
          * Example generated by -dataex-. To install: ssc install dataex
          clear
          input float id2 long(name2 bkclass2) float date2 double asset
            1 1 2 191 512346
            2 2 3 219 191843
            3 2 3 220 194321
            4 2 3 221 192823
            5 2 3 222 318444
            6 2 3 223 316293
            7 2 3 224 314601
            8 2 3 225 305914
            9 2 3 226 314711
           10 2 3 227 324550
           11 2 3 228 325441
           12 2 3 229 323306
           13 2 3 230 327837
           14 2 3 231 354590
           15 2 3 232 349891
           16 2 3 233 343830
           17 2 3 234 354182
           18 2 3 235 377395
           19 3 1 219  66310
           20 3 1 220  66027
           21 3 1 221  72358
           22 3 1 222  76377
           23 3 1 223  62999
           24 4 2 205 754473
           25 4 2 206 768136
           26 4 2 207 785634
           27 4 2 208 768016
           28 4 3 185 294667
           29 4 3 186 293519
           30 4 3 187 324868
           31 4 3 188 313619
           32 4 3 189 470509
           33 4 3 190 438547
           34 4 3 191 456267
           35 4 3 192 451490
           36 4 3 193 431129
           37 4 3 194 481242
           38 4 3 195 467198
           39 4 3 196 582559
           40 4 3 197 592266
           41 4 3 198 598751
           42 4 3 199 651257
           43 4 3 200 632297
           44 4 3 201 643899
           45 4 3 202 653837
           46 4 3 203 716070
           47 4 3 204 745201
           48 5 2 181  21595
           49 5 2 182  34110
           50 5 2 183  39840
           51 5 2 184  42138
           52 5 2 185  47078
           53 5 2 186  60600
           54 5 2 187  68406
           55 5 2 188  65534
           56 5 2 189  68749
           57 5 2 190  78635
           58 5 2 191  90143
           59 5 2 192  91747
          end
          format %tq!Qq-CCYY date2
          label values name2 name2
          label def name2 1 "09Republic Bank, Inc.", modify
          label def name2 2 "1880 Bank", modify
          label def name2 3 "1919 Investment Counsel & Trust Company,  National Association", modify
          label def name2 4 "1ST BANK", modify
          label def name2 5 "1st Advantage Bank", modify
          label values bkclass2 bkclass2
          label def bkclass2 1 "N", modify
          label def bkclass2 2 "NM", modify
          label def bkclass2 3 "SM", modify
          Is there any command to numbering the duplicate variables (bank names), to make a difference between them just in the name, like "Access Bank" and "Access Bank 1".

          An command like "If the name is duplicated, then name it name_1"

          Otherwise I am learning my lesson with rushed clearing of dates and I have to do my dataset again with the columns city and/or zip - but my time to the deadline of my thesis is running.

          Thank you

          Comment


          • #6
            I numbered the bank names with

            Code:
            egen name3 = fill(1 2)
            Then I generatet a new variable name4 with the bank name2 and name3 to make a difference.
            Code:
            egen name4 = group(name2 name3)
            Generated a new ID variable id3 with the new name4, bankingclass and the date
            Code:
            egen id3 = group(name4 bkclass2 date2)
            Code:
            xtset id3 date2
                   panel variable:  id3 (weakly balanced)
                    time variable:  date2, Q1-1995 to Q4-2018
                            delta:  1 quarter
            
            . xtset id3
                   panel variable:  id3 (balanced)
            
            . xtset date2
                   panel variable:  date2 (unbalanced)
            
            . xtdescribe
            must specify timevar; use xtset
            r(459);
            Code:
            * Example generated by -dataex-. To install: ssc install dataex
            clear
            input long name2 float(name3 name4 id3) long bkclass2 float(date2 date) double asset
            1 428586   1   1 2 191 17531 512346
            2 610282   2   2 3 219 20088 191843
            2 615923   3   3 3 220 20178 194321
            2 621491   4   4 3 221 20269 192823
            2 627000   5   5 3 222 20361 318444
            2 632445   6   6 3 223 20453 316293
            2 637818   7   7 3 224 20544 314601
            2 643142   8   8 3 225 20635 305914
            2 648413   9   9 3 226 20727 314711
            2 653617  10  10 3 227 20819 324550
            2 658764  11  11 3 228 20909 325441
            2 663862  12  12 3 229 21000 323306
            2 668908  13  13 3 230 21092 327837
            2 673914  14  14 3 231 21184 354590
            2 678868  15  15 3 232 21274 349891
            2 683785  16  16 3 233 21365 343830
            2 688655  17  17 3 234 21457 354182
            2 693465  18  18 3 235 21549 377395
            3 610283  19  19 1 219 20088  66310
            3 615924  20  20 1 220 20178  66027
            3 621492  21  21 1 221 20269  72358
            3 627001  22  22 1 222 20361  76377
            3 632446  23  23 1 223 20453  62999
            4 384227  24  24 3 185 16982 294667
            4 391705  25  25 3 186 17074 293519
            4 399154  26  26 3 187 17166 324868
            end
            format %tq!Qq-CCYY date2
            format %tdNN/DD/CCYY date
            label values name2 name2
            label def name2 1 "09Republic Bank, Inc.", modify
            label def name2 2 "1880 Bank", modify
            label def name2 3 "1919 Investment Counsel & Trust Company,  National Association", modify
            label def name2 4 "1ST BANK", modify
            label values bkclass2 bkclass2
            label def bkclass2 1 "N", modify
            label def bkclass2 2 "NM", modify
            label def bkclass2 3 "SM", modify

            The panel is still weakly balanced, and date2 is unbalanced not id3. How can I fix this? Thank you very much for your patience.

            Comment


            • #7
              The number of id3 and n are still the same ..
              Code:
              xtdescribe
              
                   id3:  1, 2, ..., 698215                                 n =     698215
                 date2:  Q1-1995, Q2-1995, ..., Q4-2018                    T =         96
                         Delta(date2) = 1 quarter
                         Span(date2)  = 96 periods
                         (id3*date2 uniquely identifies each observation)
              
              Distribution of T_i:   min      5%     25%       50%       75%     95%     max
                                       1       1       1         1         1       1       1
              
                   Freq.  Percent    Cum. |  Pattern
               ---------------------------+--------------------------------------------------------------------------------------------------
                  10242      1.47    1.47 |  1...............................................................................................
                  10169      1.46    2.92 |  .1..............................................................................................
                  10052      1.44    4.36 |  ..1.............................................................................................
                   9940      1.42    5.79 |  ...1............................................................................................
                   9838      1.41    7.20 |  ....1...........................................................................................
                   9690      1.39    8.58 |  .....1..........................................................................................
                   9586      1.37    9.96 |  ......1.........................................................................................
                   9527      1.36   11.32 |  .......1........................................................................................
                   9450      1.35   12.67 |  ........1.......................................................................................
                 609721     87.33  100.00 | (other patterns)
               ---------------------------+--------------------------------------------------------------------------------------------------
                 698215    100.00         |  XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

              Comment


              • #8
                In post #2 I wrote

                What you perhaps wanted, if indeed your unit of analysis is to be bank branches, is an id that is distinct for each combination of bank and branch. That might be something like

                Code:
                egen id = group(name branch), missing
                You persistently include the date in your id. That is wrong. You need the id to be distinct for each combination of bank and branch. There will be duplicates of the id - one for each quarter of data. It is the combination of id and date2 that must be distinct.

                You wrote in post #5

                I still have the column bankingclass "bkclass2" in my data set
                What is bankingclass? For each bank, does each branch of that bank have a different bankingclass? And is bankingclass the same in every year for a given branch?

                If you run
                Code:
                isid name2 bankingclass2 date2
                and it reports duplicates, then bankingclass2 is of no use to you, and you must face reality and go back to data that includes something that distinguishes branches of the same bank.

                If it does not report any duplicates, then the following code might accomplish what you need.
                Code:
                // eliminate mistaken ids
                drop id id2 id3
                // create id to identify bank and branch
                egen id = group(name2 bankingclass2), missing 
                // define the panel structure
                xtset id date2
                // report on the results
                xtdescribe
                I am not certain that city and zip are adequate - what if a bank has two branches in the same city? What if it has two branches in the same zip? You need
                Code:
                isid name city date2
                or
                Code:
                isid name zip date2
                to not report any duplicates.

                Comment


                • #9
                  You persistently include the date in your id. That is wrong. You need the id to be distinct for each combination of bank and branch. There will be duplicates of the id - one for each quarter of data. It is the combination of id and date2 that must be distinct.
                  Thank you very much! Now I (hopefully) understand it.

                  I went back in my data organisation and here isp, a test data sample for the period 1999 - 2000 with quarterly data. With the variables cert (certification number), city, zip, asset, date2, bkclass2 (banking class) and the new id variable.

                  Code:
                  * Example generated by -dataex-. To install: ssc install dataex
                  clear
                  input long(cert name2) str16 city long(zip asset bkclass2) float(date2 id)
                  15448  1 "Hancock"      56244  19854 2 156  1
                  15448  1 "Hancock"      56244  19996 2 157  1
                  15448  1 "Hancock"      56244  19491 2 158  1
                  15448  1 "Hancock"      56244  20328 2 159  1
                  15448  1 "Hancock"      56244  19239 2 160  1
                  15448  1 "Hancock"      56244  19896 2 161  1
                  15448  1 "Hancock"      56244  19415 2 162  1
                  15448  1 "Hancock"      56244  20418 2 163  1
                  22039  2 "Sidney"       59270  42944 3 156  2
                  22039  2 "Sidney"       59270  42813 3 157  2
                  22039  2 "Sidney"       59270  42932 3 158  2
                  22039  2 "Sidney"       59270  43778 3 159  2
                  22039  2 "Sidney"       59270  42086 3 160  2
                  22039  2 "Sidney"       59270  39771 3 161  2
                  22039  2 "Sidney"       59270  40115 3 162  2
                  22039  2 "Sidney"       59270  43333 3 163  2
                  23309  2 "West Branch"  48661 154272 2 156  3
                  23309  2 "West Branch"  48661 158911 2 157  3
                  23309  2 "West Branch"  48661 166351 2 158  3
                  23309  2 "West Branch"  48661 177982 2 159  3
                  23309  2 "West Branch"  48661 182182 2 160  3
                  23309  2 "West Branch"  48661 187822 2 161  3
                  23309  2 "West Branch"  48661 187754 2 162  3
                  16419  3 "Broken Bow"   74728  68587 2 156  4
                  16419  3 "Broken Bow"   74728  68613 2 157  4
                  16419  3 "Broken Bow"   74728  68949 2 158  4
                  end
                  format %tq!Qq-CCYY date2
                  label values name2 name2
                  label def name2 1 "1st American State Bank of Minnesota", modify
                  label def name2 2 "1st Bank", modify
                  label def name2 3 "1st Bank & Trust", modify
                  label values bkclass2 bkclass2
                  label def bkclass2 2 "NM", modify
                  label def bkclass2 3 "SM", modify
                  I was doing the following codes on your advice:

                  Code:
                  isid name2 bkclass2 date2variables name2 bkclass2 date2
                  do not uniquely identify the observations r(459);
                  So I checked for the variable cert:

                  Code:
                   isid name2 cert date2
                  
                  egen id = group(name2 cert), missing
                  
                  xtset id date2
                         panel variable:  id (unbalanced)
                          time variable:  date2, Q1-1999 to Q4-2000, but with gaps
                                  delta:  1 quarter
                  Why does time variable report "..., but with gaps" ?

                  Code:
                  xtdescribe
                  
                        id:  1, 2, ..., 9651                                   n =       9651
                     date2:  Q1-1999, Q2-1999, ..., Q4-2000                    T =          8
                             Delta(date2) = 1 quarter
                             Span(date2)  = 8 periods
                             (id*date2 uniquely identifies each observation)
                  
                  Distribution of T_i:   min      5%     25%       50%       75%     95%     max
                                           1       2       8         8         8       8       8
                  
                       Freq.  Percent    Cum. |  Pattern
                   ---------------------------+----------
                       7441     77.10   77.10 |  11111111
                        194      2.01   79.11 |  1111....
                        184      1.91   81.02 |  111111..
                        181      1.88   82.89 |  11111...
                        180      1.87   84.76 |  1.......
                        176      1.82   86.58 |  11......
                        172      1.78   88.36 |  1111111.
                        164      1.70   90.06 |  111.....
                        141      1.46   91.52 |  .....111
                        818      8.48  100.00 | (other patterns)
                   ---------------------------+----------
                       9651    100.00         |  XXXXXXXX
                  Code:
                  xtset id
                         panel variable:  id (unbalanced)
                  
                  . xtset date2
                         panel variable:  date2 (unbalanced)
                  And now the panel is unbalanced.

                  Comment


                  • #10
                    Why does time variable report "..., but with gaps" ?
                    Presumably for at least one of your branches, there is a quarter missing between the first and last quarter of that branch's data. That is also why the panel is unbalanced rather than weakly balanced. Do not worry about balance at this point - the documentation tells us that the xt commands are robust to this problem, and Stata simply excludes from the analysis any observation for which any of the variables are missing values.

                    Here's an example using yearly dates. Noe that 2004 is missing for id 2.
                    Code:
                    . * Example generated by -dataex-. To install: ssc install dataex
                    . clear
                    
                    . input float(id year)
                    
                                id       year
                      1. 1 2001
                      2. 1 2002
                      3. 1 2003
                      4. 1 2004
                      5. 1 2005
                      6. 2 2001
                      7. 2 2002
                      8. 2 2003
                      9. 2 2005
                     10. end
                    
                    . xtset id year
                           panel variable:  id (unbalanced)
                            time variable:  year, 2001 to 2005, but with a gap
                                    delta:  1 unit
                    
                    . xtdescribe
                    
                          id:  1, 2, ..., 2                                      n =          2
                        year:  2001, 2002, ..., 2005                             T =          5
                               Delta(year) = 1 unit
                               Span(year)  = 5 periods
                               (id*year uniquely identifies each observation)
                    
                    Distribution of T_i:   min      5%     25%       50%       75%     95%     max
                                             4       4       4         4         5       5       5
                    
                         Freq.  Percent    Cum. |  Pattern
                     ---------------------------+---------
                            1     50.00   50.00 |  111.1
                            1     50.00  100.00 |  11111
                     ---------------------------+---------
                            2    100.00         |  XXXXX

                    Comment


                    • #11
                      Presumably for at least one of your branches, there is a quarter missing between the first and last quarter of that branch's data.
                      I checked my data for a missing quarter and it's complete. I might found reseon of the gaps in another Statalist Post. So I think the gaps are not due to a forgotten quarter, rather my data set is unbalanced due to bank branchesmentering and dropping out of the data set I am analyzing. Here ist the link to the post https://www.statalist.org/forums/forum/general-stata-discussion/general/1452302-dealing-with-unbalanced-panel-data

                      If I just run the command tsset with date2 the follwoing error message shows up:
                      Code:
                      tsset date2
                      repeated time values in sample
                      r(451);
                      If I tsset the id and date2 variable it works.
                      Code:
                      tsset id date2
                             panel variable:  id (unbalanced)
                              time variable:  date2, Q4-1999 to Q4-2000, but with gaps
                                      delta:  1 quarter
                      Do not worry about balance at this point - the documentation tells us that the xt commands are robust to this problem, and Stata simply excludes from the analysis any observation for which any of the variables are missing values.
                      Thank you very much for this advice. Where in the documentation (xtdescribe) can I see that the xt commands are robust to this problem?

                      Thank you again for guiding me through. So I'm working now on my data set with the full period of 1995-2018 to generate the correct ID Variable.
                      Hi, I am intending to analyze company payout policy determinants and have a set of panel data. My data set is unbalanced due to companies entering and

                      Comment


                      • #12
                        This is hopefully my final dataset for the period Q4 1995 - Q4 2018. I generated the ID with the variable cert (certification number).
                        Code:
                        * Example generated by -dataex-. To install: ssc install dataex
                        clear
                        input long(cert name2) float date2 double asset float id
                        35260 1 191 512346  1
                         4829 2 219 191843  2
                         4829 2 223 316293  2
                         4829 2 227 324550  2
                         4829 2 231 354590  2
                         4829 2 235 377395  2
                        33785 3 219  66310  3
                        33785 3 223  62999  3
                        16522 4 187 324868  4
                        16522 4 191 456267  4
                        16522 4 195 467198  4
                        16522 4 199 651257  4
                        16522 4 203 716070  4
                        16522 4 207 785634  4
                        57899 5 183  39840  5
                        57899 5 187  68406  5
                        57899 5 191  90143  5
                        57899 5 195 126262  5
                        57899 5 199 111591  5
                        57899 5 203  92716  5
                        57899 5 207  78804  5
                        57899 5 211  74764  5
                        57899 5 215  76660  5
                        57899 5 219  76151  5
                        57899 5 223  84397  5
                        57899 5 227  95198  5
                        end
                        format %tq!Qq-CCYY date2
                        label values name2 name2
                        label def name2 1 "09Republic Bank, Inc.", modify
                        label def name2 2 "1880 Bank", modify
                        label def name2 3 "1919 Investment Counsel & Trust Company,  National Association", modify
                        label def name2 4 "1ST BANK", modify
                        label def name2 5 "1st Advantage Bank", modify

                        How I generated the id variable and declared the panel variable
                        Code:
                        isid name2 cert date2
                        
                        . egen id = group(name2 cert), missing
                        
                        . sort id date2
                        
                        . xtset id date2
                               panel variable:  id (unbalanced)
                                time variable:  date2, Q4-1995 to Q4-2018, but with gaps
                                        delta:  1 quarter
                        The documentation:
                        Code:
                        xtdescribe
                        
                              id:  1, 2, ..., 16208                                  n =      16208
                           date2:  Q4-1995, Q4-1996, ..., Q4-2018                    T =         24
                                   Delta(date2) = 1 quarter
                                   Span(date2)  = 93 periods
                                   (id*date2 uniquely identifies each observation)
                        
                        Distribution of T_i:   min      5%     25%       50%       75%     95%     max
                                                 1       1       3         8        18      24      24
                        
                             Freq.  Percent    Cum. |  Pattern*
                         ---------------------------+--------------------------
                             2501     15.43   15.43 |  111111111111111111111111
                              879      5.42   20.85 |  1.......................
                              869      5.36   26.22 |  11......................
                              748      4.62   30.83 |  111.....................
                              585      3.61   34.44 |  1111....................
                              559      3.45   37.89 |  11111...................
                              392      2.42   40.31 |  111111..................
                              302      1.86   42.17 |  1111111.................
                              292      1.80   43.97 |  1111111111..............
                             9081     56.03  100.00 | (other patterns)
                         ---------------------------+--------------------------
                            16208    100.00         |  XXXXXXXXXXXXXXXXXXXXXXXX
                         ------------------------------------------------------
                         *Each column represents 4 periods.
                        Is my id variable now fine and can I start based on this panel data set my regressions? Thank you so much!

                        Comment


                        • #13
                          Because your time variable is a quarterly date, and because you have dropped quarters 1, 2, and 3, you now have gaps in your data because the observation following Q4-1995 is not Q1-1996 but rather Q4-1996. You should change your xtset command to
                          Code:
                          xtset id date2, delta(4 quarters)
                          Where in the documentation (xtdescribe) can I see that the xt commands are robust to this problem?
                          See example 2 of the xtset documentation - in the output of help xtset click on "Remarks and examples" in the section Links to PDF documentation.

                          I checked my data for a missing quarter and it's complete
                          You do not tell us which version of your data you checked or how you did your check.

                          In post #9 your data that reported gaps was quarterly data for all 4 quarters in each of 2 years. If that data is what you checked, then I am confident that in that dataset, for at least one of your branches, there is a quarter missing between the first and last quarter of that branch's data.

                          In post #9 your data that reported gaps was data for just the 4th quarter in each of 14 years. If that data is what you checked, then the gaps are perhaps just due to having deleted 3 out of every 4 quarters, and if so, no gaps will be reported when you use the appropriate xtset command. IF it does then report gaps, I am confident they exist. With this dataset,
                          Code:
                          bysort id (date2): assert (date2[_N]-date2[1]) == (_N-1)*4
                          would check for the existence of gaps. But again, you need not necessarily be concerned about gaps - it depends on the analysis you are planning on doing.

                          Comment


                          • #14
                            Dear William,

                            thank you very much for your support! I had a meeting with my Professor and there are gaps in the panel data, due to bank branches entering and dropping out of the data set. And I can continue with my unbalanced data set

                            Comment

                            Working...
                            X