
  • Panel data analysis

    Hello everyone, I am working on monthly panel data of 36 banks from 2011-2019. My data look something like this:
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input byte bankid float months long(month total_assets) float(NIRP assets_lagged Loan_ratio RD_ratio EL_ratio)
    6 21762 201908 18967 1 18751 .7217748 .8713028  .002530711
    6 21793 201909 19006 1 18967 .7266304 .8660949  .002420288
    6 21823 201910 19241 1 19006 .7492897 .8650798 .0022348112
    6 21854 201911 19404 1 19241 .7601995 .8693053 .0023191094
    6 21884 201912 19254 1 19404 .7626263 .8672484  .003272047
    7 18628 201101 12242 0 19254 .4247429 .9005882           0
    7 18659 201102 12145 0 12242 .6738278 .9258131           0
    7 18687 201103 12206 0 12145 .6738576 .9454367           0
    7 18718 201104 12323 0 12206   .67262 .9424653           0
    7 18748 201105 12341 0 12323 .6674511 .9431164           0
    7 18779 201106 12409 0 12341 .6614537 .9466516           0
    end
    format %d months

    1. I want to draw a line graph of some of my variables over time (e.g. Loan_ratio): one line showing the average value of Loan_ratio for all the banks that have RD_ratio (retail deposits to assets) above the median, and another line for banks with RD_ratio below the median. I assume I have to use the -preserve- and -collapse- commands, but I don't know how to restrict this to those specific bank subgroups.

    2. I want to run a regression only on the subgroup of banks with RD_ratio > median, and perhaps on a subperiod from 2011-2017. Is this the correct way to do that:
    Code:
    egen median = median(RD_ratio)
    xtreg Loan_ratio Loan_ratio_lagged c.EL_ratio_lagged#NIRP i.month if RD_ratio > median & month > (date), fe robust
    Here Loan_ratio_lagged is the lagged value of the variable, and NIRP is a dummy equal to 0 for the period before 2015 and 1 afterwards. Thank you in advance.

  • #2
    for all the banks that have RD_ratio (retail deposits to assets) above the median and another line graph for banks with RD_ratio below the median
    What does this mean? A bank could have above median RD_ratio one year and below median RD_ratio another year. And that is true whether you use a single median for all banks in all years, or whether you do a separate median for each year. Please clarify both of these aspects of your question.



    • #3
      Originally posted by Clyde Schechter
      What does this mean? A bank could have above median RD_ratio one year and below median RD_ratio another year. And that is true whether you use a single median for all banks in all years, or whether you do a separate median for each year. Please clarify both of these aspects of your question.
      I would prefer the criterion to be that a bank belongs to the high RD_ratio group if it has RD_ratio > median in that specific month. And as for the second point you raised, I would prefer the median to be the median across all banks in that specific month.

      However, could you also show me how to do this if I want the criterion to be RD_ratio > median, where the median is calculated over all banks in all months?



      • #4
        Code:
        by months, sort: egen median_rd_ratio_this_month = median(RD_ratio)
        gen byte high_RD_ratio = RD_ratio > median_rd_ratio_this_month if !missing(RD_ratio)
        will give you the median RD ratio for all banks separately in each month and an indicator variable distinguishing those above and below the median.

        Code:
        preserve
        collapse (mean) Loan_ratio, by(high_RD_ratio months)
        xtset high_RD_ratio months
        xtline Loan_ratio
        restore
        will give you graphs of Loan_ratio over time in the two groups. If you want the two curves on the same panel, rather than one panel for each group, then add the -overlay- option to the -xtline- command.
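
        That is, the -xtline- line in the block above becomes (a sketch):
        Code:
        xtline Loan_ratio, overlay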

        show me how to do this if I want the criterion to be RD_ratio > median, where the median is calculated over all banks in all months?
        Code:
        //  OTHER CRITERION: MEDIAN FOR ALL BANKS IN ALL MONTHS
        summ RD_ratio, detail
        gen byte high_RD_ratio_all_months = RD_ratio > `r(p50)' if !missing(RD_ratio)
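
        A slightly safer variant (a sketch) captures the percentile in a local right away, since r(p50) would be overwritten by the next r-class command you run:
        Code:
        summ RD_ratio, detail
        local p50 = r(p50)
        gen byte high_RD_ratio_all_months = RD_ratio > `p50' if !missing(RD_ratio)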



        • #5
          Dear Clyde Schechter thank you very much for your help! Regarding the second part of my question, in order to run the regression, do I have to write the code like this?
          For the first case and for a subperiod from 2011-2017:
          Code:
           
           xtreg Loan_ratio Loan_ratio_lagged c.EL_ratio_lagged#NIRP i.month if RD_ratio > high_RD_ratio & months > (e.g 21762), fe robust
          and for the second case:

          Code:
           
           xtreg Loan_ratio Loan_ratio_lagged c.EL_ratio_lagged#NIRP i.month if RD_ratio > high_RD_ratio_all_months & months > (e.g 21762), fe robust



          • #6
            No, that's not quite right. high_RD_ratio (and the all-months version) are not the median values: they are 0/1 indicator variables for whether the bank is above the median value or not.

            Code:
            xtreg Loan_ratio Loan_ratio_lagged c.EL_ratio_lagged#NIRP i.months if high_RD_ratio == 1  & months > td(01aug2019), fe robust
            This is the correct use of the variable high_RD_ratio. Also, while it is legal to use -months > 21762-, it is poor programming style to have numbers with no obvious meaning flying around in code. When you come to look at this code in the future, you won't know where that number came from and whether you had it correct or not. Since that number is intended to be the numeric date value for 1 Aug 2019, use the -td()- notation instead.
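
            You can verify what such a bare number encodes directly from the command line; for example:
            Code:
            display td(01aug2019)   // 21762, the daily date value behind the magic number
            display %td 21762       // 01aug2019, decoding in the other direction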

            There are some other potential improvements to this code.
            1. If, as the name implies, EL_ratio_lagged is the lagged value of EL_ratio, then eliminate that variable and just use Stata's lag operator with the original.
            2. The variable month in your data is both useless and dangerous. It is useless because it cannot be used for any calculations--it is for human eyes only. And it is dangerous because it is too easy to make an error and refer to the variable month where the usable variable months is intended. So drop the variable -month- before you make that mistake.
            3. You are using a daily date variable to encode a date that is actually just a pointer to a month. So better to convert it to a monthly date variable:
            Code:
            gen mdate = mofd(months)
            format mdate %tm
            drop month months
            xtset bankid mdate
            xtreg Loan_ratio L1.EL_ratio  c.L1.EL_ratio#NIRP i.mdate if high_RD_ratio == 1  & mdate> tm(2019m8), fe robust
            Finally, if the variable NIRP is a time-invariant property of bankid, then this code is fine. But if NIRP can vary over time within a bankid, then it is an ill-formed model and needs a separate term for i.NIRP by itself. Actually, without having to worry about that, you can do this in either case:
            Code:
            xtreg Loan_ratio  c.L1.EL_ratio##NIRP i.mdate if high_RD_ratio == 1  & mdate> tm(2019m8), fe robust
            This version will automatically generate a separate term for NIRP if it does vary within bankids, and will omit it if it is time-invariant--so you can't go wrong. (It also automatically generates the separate term for L1.EL_ratio.)
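
            To make the expansion concrete: the ## term is shorthand for both main effects plus the interaction, so (as a sketch) these two specifications fit the same model.
            Code:
            * equivalent: ## expands to the main effects plus the interaction
            xtreg Loan_ratio c.L1.EL_ratio##NIRP i.mdate if high_RD_ratio == 1 & mdate > tm(2019m8), fe robust
            xtreg Loan_ratio c.L1.EL_ratio i.NIRP c.L1.EL_ratio#i.NIRP i.mdate if high_RD_ratio == 1 & mdate > tm(2019m8), fe robust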





            • #7
              Thank you for the things you have pointed out. However, I have some questions about two of the comments. Regarding the lagged variables, I have used
              Code:
              gen Loan_ratio_lagged = Loan_ratio[_n-1]
              gen RD_ratio_lagged = RD_ratio[_n-1]
              gen EL_ratio_lagged = EL_ratio[_n-1]
              in order to generate them. Is this not the best way to do it? Is the lag operator this one:
              Code:
              generate lag_EL_ratio = L.EL_ratio
              Should I use this code for all lagged variables that I want to create? And when there are missing values, is the lag operator preferable to the other method that I have used?

              Also, regarding the NIRP variable: it is a dummy marking the introduction of negative rates after 02/2015, so it is 0 before 02/2015 and 1 afterwards for all banks. In this first baseline model I want to estimate how banks with excess liquidity have managed their lending to consumers after the implementation of negative rates. The model that I am trying to follow is this:
              [Attached image: reg_model.PNG, showing the regression model equation]



              • #8
                Also, sorry for not stating this earlier when you answered all my previous questions, but if I want to adapt this part

                Code:
                // OTHER CRITERION: MEDIAN FOR ALL BANKS IN ALL MONTHS
                summ RD_ratio, detail
                gen byte high_RD_ratio_all_months = RD_ratio > `r(p50)' if !missing(RD_ratio)
                to 3 groups, one < p33, one between p33 and p66, and one > p66, what would the code look like?



                • #9
                  Code:
                  gen Loan_ratio_lagged = Loan_ratio[_n-1]
                  gen RD_ratio_lagged = RD_ratio[_n-1]
                  gen EL_ratio_lagged = EL_ratio[_n-1]
                   in order to generate them. Is this not the best way to do it? Is the lag operator this one:
                  It's a terrible way to do it, for two reasons. First, if your data contain gaps, X[_n-1] will refer to the previous value that happens to appear in the data set, even though it may be many periods earlier. Second, even if your data is perfectly balanced with no gaps, in every case this will calculate a "lagged variable" for the first observation of each bankid that is, in reality, the final observation of the preceding bank!
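
                   A small demonstration of both failure modes, as a sketch using the variable names from this thread:
                   Code:
                   sort bankid mdate
                   gen wrong_lag = EL_ratio[_n-1]   // after the first bank, row 1 of each bank silently inherits the previous bank's last value
                   by bankid (mdate), sort: gen still_wrong = EL_ratio[_n-1]   // stays within bank, but treats the previous row as the previous month even across calendar gaps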

                  Code:
                  generate lag_EL_ratio = L.EL_ratio

                   Should I use this code for all lagged variables that I want to create? And when there are missing values, is the lag operator preferable to the other method that I have used?
                   Yes. Whenever you want a lagged variable, use Stata's lag operator: it always gives you correct results. I should also point out that for most purposes there is no need to create a separate variable for the lag anyway. For example, when you are using lagged variables in a regression, just use L.EL_ratio directly in the regression command. You don't need a separate variable that contains the lagged values for this purpose. You might need a separate variable for some other purposes, but I would always try using the lag operator first, and only resort to creating a separate variable if the command rejects the use of time-series operators when you try it.
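
                   For instance, assuming the panel has already been -xtset- (a sketch):
                   Code:
                   xtset bankid mdate
                   gen EL_ratio_lag = L1.EL_ratio   // missing at each bank's first month and across gaps, as it should be
                   * or skip the new variable entirely and use the operator in the estimation command:
                   xtreg Loan_ratio L1.Loan_ratio c.L1.EL_ratio##NIRP i.mdate, fe robust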

                   Also, regarding the NIRP variable: it is a dummy marking the introduction of negative rates after 02/2015, so it is 0 before 02/2015 and 1 afterwards for all banks. In this first baseline model I want to estimate how banks with excess liquidity have managed their lending to consumers after the implementation of negative rates. The model that I am trying to follow is this:...
                   Since this variable does change over time within banks (specifically, it changes after 2/2015), it needs to be included in the model in its own right, not just as a constituent of an interaction. So you need to use the ## notation I showed in order to have a properly specified model. As for the model equation you show, I don't follow it at all. On the one hand, NIR appears as an exponent, so this looks like you need a non-linear model. Moreover, parameterizing the model with separate terms involving 1-DNIR and DNIR does not make any sense to me. It also involves a number of variables whose names I do not recognize from your code and data. So I'm not sure what to tell you about this.

                   ...if I want to adapt this part

                  Code:
                   // OTHER CRITERION: MEDIAN FOR ALL BANKS IN ALL MONTHS
                   summ RD_ratio, detail
                   gen byte high_RD_ratio_all_months = RD_ratio > `r(p50)' if !missing(RD_ratio)

                   to 3 groups, one < p33, one between p33 and p66, and one > p66, what would the code look like?
                  Well, -summ- does not compute 33rd and 66th percentiles. But the -centile- command does. So:
                  Code:
                  centile RD_ratio, centiles(33 66)
                  local p33 `r(c_1)'
                  local p66 `r(c_2)'
                  gen byte RD_ratio_group_all_months = 0 if RD_ratio < `p33'
                  replace RD_ratio_group_all_months = 1 if inrange(RD_ratio, `p33', `p66')
                  replace RD_ratio_group_all_months = 2 if RD_ratio > `p66' & !missing(RD_ratio)
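
                   A quick sanity check that the three resulting groups are roughly equal-sized (a sketch):
                   Code:
                   tabulate RD_ratio_group_all_months, missing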



                  • #10
                    Dear Clyde Schechter thank you very much for your time and help, and for the extra things you have pointed out to me! They have all been really helpful.

                    As for the model, DNIR is just the indicator dummy. Of the other variables that they use, I don't have data for two of them. However, 1-DNIR and DNIR also troubled me, as I believe one of the terms will be removed automatically due to multicollinearity.



                    • #11
                      Dear Clyde Schechter I have some more questions that came up. I have gotten these results from one of my regressions:

                      Code:
                      . xtreg total_lending c.L1.EL##NIRP##c.L1.Households_dep c.L1.EL#NIRP#c.L1.Households_dep L1.Deposits_Riksbank L1.Debt_sec_ra
                      > tio i.mdate if high_HH_dep_all_months == 1 & mdate< tm(2018m12), fe vce(robust) /* full regression for subgroup 1, and peri
                      > od 2011-2018 */
                      note: 706.mdate omitted because of collinearity
                      
                      Fixed-effects (within) regression               Number of obs     =      1,340
                      Group variable: bankid                          Number of groups  =         20
                      
                      R-sq:                                           Obs per group:
                           within  = 0.3352                                         min =          1
                           between = 0.2820                                         avg =       67.0
                           overall = 0.2604                                         max =         94
                      
                                                                      F(18,19)          =          .
                      corr(u_i, Xb)  = 0.2317                         Prob > F          =          .
                      
                                                                      (Std. Err. adjusted for 20 clusters in bankid)
                      ----------------------------------------------------------------------------------------------
                                                   |               Robust
                                     total_lending |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                      -----------------------------+----------------------------------------------------------------
                                                EL |
                                               L1. |  -2.350226   1.352385    -1.74   0.098    -5.180802    .4803489
                                                   |
                                            1.NIRP |  -3083.538   29063.59    -0.11   0.917    -63914.32    57747.25
                                                   |
                                        NIRP#cL.EL |
                                                1  |   2.022824   1.549003     1.31   0.207    -1.219276    5.264924
                                                   |
                                    Households_dep |
                                               L1. |   .3024698   .8856131     0.34   0.736     -1.55114    2.156079
                                                   |
                           cL.EL#cL.Households_dep |   .0000178   8.43e-06     2.11   0.048     1.63e-07    .0000355
                                                   |
                            NIRP#cL.Households_dep |
                                                1  |   .3856121   .2049208     1.88   0.075    -.0432922    .8145164
                                                   |
                      NIRP#cL.EL#cL.Households_dep |
                                                1  |  -.0000182   .0000104    -1.75   0.096    -.0000398    3.53e-06
                                                   |
                                 Deposits_Riksbank |
                                               L1. |  -8.233764    8.90753    -0.92   0.367    -26.87744    10.40991
                                                   |
                                    Debt_sec_ratio |
                                               L1. |    74953.3   71783.13     1.04   0.310    -75290.52    225197.1
                      Firstly, if I use the lagged value of the dependent variable as a regressor, all my R^2 values reach 0.9+. Should I still use it?
                      Also, my F-test has no values. I read that if all my standard errors appear normal then there may be nothing to worry about, but I wanted to double-check.
                      Also, in some other regressions that I run, whenever I have an interaction term with my dummy NIRP, coefficients appear for both values 0 and 1 of NIRP; however, in these results I don't get two coefficients for the interaction terms with NIRP. Here is an example:

                      Code:
                      . xtreg Loan_ratio Loan_ratio_lagged c.EL_ratio_lagged#NIRP c.EL_ratio_lagged#NIRP#c.RD_ratio_lagged NIRP#c.RD_ratio_lagged i
                      > .bankid i.mdate, fe robust
                      
                      Fixed-effects (within) regression               Number of obs     =      2,770
                      Group variable: bankid                          Number of groups  =         33
                      
                      R-sq:                                           Obs per group:
                           within  = 0.5201                                         min =         11
                           between = 0.9951                                         avg =       83.9
                           overall = 0.9557                                         max =        106
                      
                                                                      F(32,32)          =          .
                      corr(u_i, Xb)  = 0.9421                         Prob > F          =          .
                      
                                                                                  (Std. Err. adjusted for 33 clusters in bankid)
                      ----------------------------------------------------------------------------------------------------------
                                                               |               Robust
                                                    Loan_ratio |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                      -----------------------------------------+----------------------------------------------------------------
                                             Loan_ratio_lagged |    .704402   .0888594     7.93   0.000     .5234013    .8854027
                                                               |
                                        NIRP#c.EL_ratio_lagged |
                                                            0  |  -.1582787   .2575225    -0.61   0.543    -.6828349    .3662774
                                                            1  |   -.038262   .2221044    -0.17   0.864    -.4906739    .4141499
                                                               |
                      NIRP#c.EL_ratio_lagged#c.RD_ratio_lagged |
                                                            0  |   .3007768   .4791904     0.63   0.535    -.6753021    1.276856
                                                            1  |   .0668993   .5672858     0.12   0.907    -1.088624    1.222423
                                                               |
                                        NIRP#c.RD_ratio_lagged |
                                                            0  |  -.0273265   .0360759    -0.76   0.454    -.1008108    .0461577
                                                            1  |  -.0449217    .034182    -1.31   0.198    -.1145482    .0247049
                      Finally, in the first results that I posted I get some strange values for the coefficients and standard errors, like 8.43e-06. Can this be interpreted, or have I done something completely wrong that I need to change?



                      • #12
                        Firstly, if I use the lagged value of the dependent variable as a regressor, all my R^2 values reach 0.9+. Should I still use it?
                        So, the dependent variable is strongly predicted by its lagged variable--that's not unusual or surprising. However, it raises questions such as whether -xtreg- is a suitable estimator for this model. You might need a dynamic panel-data model for this. This kind of situation is not common in my discipline (epidemiology) and I don't know much about it.
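
                        If you do explore that route, one built-in starting point is -xtabond- (Arellano-Bond). A minimal sketch only, with no claim that it is the right specification for your setting:
                        Code:
                        * a sketch -- assumes NIRP and L1.EL are appropriate regressors here
                        xtabond total_lending L1.EL NIRP, lags(1) vce(robust)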

                        Also, my F-test has no values. I read that if all my standard errors appear normal then there may be nothing to worry about, but I wanted to double-check.
                        That is correct--nothing to worry about here.

                        Also, in some other regressions that I run, whenever I have an interaction term with my dummy NIRP, coefficients appear for both values 0 and 1 of NIRP; however, in these results I don't get two coefficients for the interaction terms with NIRP.
                        The model you show that gives you two results for interaction terms looks that way because your model fails to include i.NIRP by itself in the list of predictor variables. This model is ill-formed. If you make a habit of using the ## notation instead of # when you put interaction terms into regressions you won't make that mistake.

                        Finally, in the first results that I posted I get some strange values for the coefficients and standard errors, like 8.43e-06. Can this be interpreted, or have I done something completely wrong that I need to change?
                        Well, the coefficient for that term is also very small. All that tells you is that this interaction is very small, but your data have a lot of information about it, so your estimate of it is rather precise. I wouldn't worry about it.



                        • #13
                          Clyde Schechter However, if I get a number like -9.53e-07 for my coefficients, can it be interpreted? For example, in this regression:

                          Code:
                          . xtreg total_lending L1.total_lending c.L1.EL##NIRP c.L1.EL#NIRP#c.L1.Households_dep NIRP##c.L1.Households_dep i.bankid i.md
                          > ate if high_RD_ratio_all_months == 1 & mdate< tm(2018m12),fe  robust /* full regression for subgroup 1, and period 2011-201
                          > 8 */
                          
                          
                          Fixed-effects (within) regression               Number of obs     =      1,097
                          Group variable: bankid                          Number of groups  =         18
                          
                          R-sq:                                           Obs per group:
                               within  = 0.9671                                         min =          1
                               between = 0.9997                                         avg =       60.9
                               overall = 0.9989                                         max =         94
                          
                                                                          F(16,17)          =          .
                          corr(u_i, Xb)  = 0.9801                         Prob > F          =          .
                          
                                                                          (Std. Err. adjusted for 18 clusters in bankid)
                          ----------------------------------------------------------------------------------------------
                                                       |               Robust
                                         total_lending |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                          -----------------------------+----------------------------------------------------------------
                                         total_lending |
                                                   L1. |   .9445454   .0229484    41.16   0.000     .8961284    .9929623
                                                       |
                                                    EL |
                                                   L1. |   .2332589   .1137809     2.05   0.056    -.0067979    .4733156
                                                       |
                                                1.NIRP |    2310.25   1308.861     1.77   0.096    -451.2046    5071.705
                                                       |
                                            NIRP#cL.EL |
                                                    1  |  -.3934421   .1437637    -2.74   0.014    -.6967571   -.0901272
                                                       |
                          NIRP#cL.EL#cL.Households_dep |
                                                    0  |  -9.53e-07   5.17e-07    -1.84   0.083    -2.04e-06    1.38e-07
                                                    1  |   4.46e-07   2.64e-07     1.69   0.109    -1.11e-07    1.00e-06
                                                       |
                                        Households_dep |
                                                   L1. |  -.0103126   .0181993    -0.57   0.578    -.0487097    .0280846
                                                       |
                                NIRP#cL.Households_dep |
                                                    1  |   .0116478   .0054442     2.14   0.047     .0001617     .023134



                          • #14
                            The two coefficients for NIRP#cL.EL#cL.Households_dep are extremely small--and they are literally 7 orders of magnitude smaller than the rest of the coefficients in your model. So unless the variables EL or Households_dep range up as high as 10^7, the contribution of this three-way interaction is negligible. So you can interpret it as "three way interaction too small to bother with." In fact, if I were you, I would re-estimate the model omitting it, and just having the two-way interactions:

                            Code:
                            xtreg total_lending L1.total_lending i.NIRP##(c.L1.Households_dep c.L1.EL) i.bankid i.mdate if high_RD_ratio_all_months == 1 & mdate< tm(2018m12),fe robust /* full regression for subgroup 1, and period 2011-2018 */



                            • #15
                              I see, that makes sense. Dear Clyde Schechter thank you again very much for all your help!

