  • Using coefficients from regress command to generate a new variable

    Good evening everyone,

    I am using Stata 16.1. I have to run a pooled OLS on panel data and then use the estimated coefficients to generate a new variable:

    1. The model I'm trying to implement is: $\Delta k_{i,t} = (\lambda_0 + \Lambda Z_{i,t-1})\,Gap_{i,t-1} + \eta_{i,t}$
    2. I use the following regress command after setting the dataset as panel using xtset:


    regress actual_tier1_gap l.tier1_gap l.((c.tier1_gap)#(i.state1nonstate0 c.size c.return_on_equity_w ///
    i.below_tier1 c.provforNPA_to_net_advances_w i.Listeddummy1iflisted c.GdpGrowthRate c.inflation)), noconst

    (I apologise for the variable names being unwieldy.)

    3. I have run into a few issues:

    a. My first question is whether the command is appropriate for the model I am trying to implement.

    b. The coefficient $\lambda_0$ on $Gap_{i,t-1}$ stays low if I add only a few controls, but increases considerably when I add all of the controls above.

    c. Finally, in the second stage I have to construct a variable $\Lambda Z_{i,t-1}$ from the estimated coefficients of the model in point 1. I'm unable to grasp how to do it.

    Any help would be much appreciated.

    regards,
    Gagan

  • #2
    Looks right, I think.

    The effect of Gap is not independent of the Z. When you change Z, you'll change $\lambda_0$.

    $\Lambda$ is a vector, so you have multiple $\Lambda Z_{i,t-1}$ terms.

    Wouldn't you just take each coefficient in $\Lambda$ and multiply it by the corresponding Z?
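
Expanding the model in point 1 may make this suggestion concrete. The index j over the elements of Z and the count J are notation introduced here for illustration only; they do not appear in the original post.

```latex
\begin{align*}
\Delta k_{i,t} &= (\lambda_0 + \Lambda Z_{i,t-1})\,Gap_{i,t-1} + \eta_{i,t} \\
 &= \lambda_0\,Gap_{i,t-1} + \sum_{j=1}^{J} \lambda_j \left( Z_{j,i,t-1} \times Gap_{i,t-1} \right) + \eta_{i,t}, \\
\hat{\Lambda} Z_{i,t-1} &= \sum_{j=1}^{J} \hat{\lambda}_j\, Z_{j,i,t-1}.
\end{align*}
```

Read this way, the coefficient reported on each Z-by-Gap interaction is already the corresponding $\lambda_j$, so under this specification no division by Gap or by $\lambda_0$ would be needed.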



    • #3
      Gagandeep:
      1) Why did you not cluster the standard errors in your pooled OLS? The observations within a panel (at least) are not independent, and Stata ignores the panel structure of your data since you did not run an -xt- command. Obviously, clustering has no bearing on the coefficient point estimates.
      2) Are you sure that the -noconst- option is what you need?
      Kind regards,
      Carlo
      (Stata 19.0)
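
A minimal sketch of the clustering point, using Stata's shipped Grunfeld panel as a stand-in (the variables invest, mvalue, kstock and company are from that example dataset, not from the poster's data). It illustrates that clustering changes the standard errors while leaving the point estimates alone.

```stata
* Stand-in panel data (assumption: not the poster's dataset)
webuse grunfeld, clear
xtset company year

* Pooled OLS with default standard errors
quietly regress invest L.mvalue L.kstock
scalar b_default  = _b[L.mvalue]
scalar se_default = _se[L.mvalue]

* Same model, standard errors clustered on the panel identifier
quietly regress invest L.mvalue L.kstock, vce(cluster company)
scalar b_cluster  = _b[L.mvalue]
scalar se_cluster = _se[L.mvalue]

* Coefficients coincide; only the standard errors change
display "coef (default) = " b_default "   coef (cluster) = " b_cluster
display "se   (default) = " se_default "   se   (cluster) = " se_cluster
```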



      • #4
        In reply to Carlo Lazzaro (#3):

        Thanks Carlos,

        1) I'm setting the dataset as panel using xtset. I am running two versions of the command in my point 2. When I use the cl(bankcode) option (bankcode is my panel identifier), I do not get the adjusted R-squared, which I need to report to my supervisor.


        2) The model is specified without a constant, and the first interaction term of Gap is treated as the constant in subsequent analysis.

        3) Do you have any pointers on my points 3b) and 3c)?
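
On the missing adjusted R-squared: a sketch, assuming (as is the case in recent Stata versions, to the best of my knowledge) that -regress- still stores the statistic in e(r2_a) even when the clustered output header does not display it. The Grunfeld data and its variable names are stand-ins, not the poster's.

```stata
* Stand-in panel data (assumption: not the poster's dataset)
webuse grunfeld, clear
xtset company year

* The clustered output header shows R-squared but not the adjusted R-squared
regress invest L.mvalue L.kstock, vce(cluster company)

* The stored result can still be displayed and reported
display "R-squared          = " %6.4f e(r2)
display "Adjusted R-squared = " %6.4f e(r2_a)
```

With -noconstant-, Stata's R-squared is the uncentered version, which may be worth noting when the figure is reported.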



        • #5
          In reply to George Ford (#2):

          Thanks for your reply.

          So should the coefficient not go down as I add more regressors as controls?

          Also, my coefficients are all on interaction terms, e.g. cL.tier1_gap#cL.size.

          If I need to use just $\Lambda Z_{i,t-1}$, does it mean I need to divide all coefficients by the standalone coefficient of $Gap_{i,t-1}$ and then multiply by the observed values of $Z_{i,t-1}$? What should I do with the predicted error term of the regression model in that case?



          • #6
            Gagandeep:
            1) You do not need to -xtset- your data first if you go pooled OLS (BTW: pooled OLS would not be my first choice for panel data regression). With a bit of guess-work, your supervisor may be interested in the within R-sq (if -fe-) or the between R-sq (if -re-); both are produced by -xtreg- (with a bit of guess-work again, I assume that your regressand is continuous);
            2) OK. I assume that the literature in your research field supports your approach;
            3) About your question 3b (and with no other information from your side): since it is not clear what you are controlling for, if the coefficient you're concerned about varies widely, I would check your model specification, just to be sure that you're on the right track.
            Kind regards,
            Carlo
            (Stata 19.0)
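
For reference, a short sketch of where -xtreg- keeps the within and between R-sq mentioned above. The Grunfeld example data stands in for the poster's panel, so the variable names are assumptions.

```stata
* Stand-in panel data (assumption: not the poster's dataset)
webuse grunfeld, clear
xtset company year

* Fixed effects: the within R-sq is usually the one reported
quietly xtreg invest mvalue kstock, fe
scalar r2_within = e(r2_w)
display "within R-sq (fe)  = " %6.4f r2_within

* Random effects: the between R-sq is stored alongside the others
quietly xtreg invest mvalue kstock, re
scalar r2_between = e(r2_b)
display "between R-sq (re) = " %6.4f r2_between
```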



            • #7
              dY/dGap depends on all the coefficients (it is often calculated at the means of the Z, but that is not required). As you interact Gap with more Z, I'm not surprised $\lambda_0$ changes. It may be, however, that dY/dGap doesn't change all that much once you account for the interactions. Use margins to calculate it and see what's up.
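
A sketch of that margins check on Stata's auto data, where mpg plays the role of Gap and weight and foreign play the role of the Z (stand-in names, not the poster's variables). The interactions must be written in factor-variable notation for margins to pick them up.

```stata
sysuse auto, clear

* mpg stands in for Gap; weight and foreign stand in for the Z (assumption)
regress price c.mpg c.mpg#c.weight 1.foreign#c.mpg

* Average marginal effect of mpg, accounting for every interaction
margins, dydx(mpg)
matrix ame = r(b)

* The same derivative evaluated at the means of the covariates
margins, dydx(mpg) atmeans
matrix mem = r(b)
```

Comparing these derivatives across specifications, instead of the standalone coefficient, may show whether the overall effect of Gap really moves when controls are added.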

              As for passing those results on to a second stage, I'm not sure what you're up to. Are you passing through one variable or many? There are many $\Lambda Z_{i,t-1}$ terms. Do you want to pass through the prediction? Consider whether you have a generated regressor problem, meaning you'll need to bootstrap both stages to do hypothesis tests.

              This problem reminds me a bit of the Bresnahan/Reiss market power model, where there's a pass through of a coefficient from one stage to the next. Might look at that literature.

              And, as Carlo suggests, you may want to keep the constant, or at least determine it is in fact zero (even if it should be, theoretically).
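
One simple way to check the constant, sketched on the auto data with stand-in variables: fit the model with the constant included and test whether it differs from zero.

```stata
sysuse auto, clear

* Fit WITH a constant (mpg and weight are stand-ins, not the poster's variables)
regress price c.mpg c.mpg#c.weight

* Wald test of H0: _cons = 0; a small p-value argues against -noconstant-
test _cons
scalar p_cons = r(p)
display "p-value for _cons = 0: " %6.4f p_cons
```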

              It might help us if you point to the literature you are basing your model on.



              • #8
                Hi Carlo,

                First of all, apologies for addressing you as Carlos.

                I have a panel dataset, and the variable Gap in my equation above is actually a predicted variable from a system GMM equation, so the data are already set as a panel. Nevertheless, I take your point about pooled OLS not requiring it.

                The model that I am using is pretty standard and comes from Jiang, C., Liu, H., & Molyneux, P. (2019). Do different forms of government ownership matter for bank capital behavior? Evidence from China. Journal of Financial Stability, 40, 38–49. https://doi.org/10.1016/j.jfs.2018.11.005


                I need to construct a new variable VarX = $\hat{\Lambda} Z_{i,t-1}$, where the vector Z is (i.state1nonstate0 c.size c.return_on_equity_w i.below_tier1 c.provforNPA_to_net_advances_w i.Listeddummy1iflisted c.GdpGrowthRate c.inflation).
                But, as I mentioned, all these variables are interacted with $Gap_{i,t-1}$, so how can I use the estimated coefficients?

                Thanks and regards




                • #9
                  In reply to George Ford (#7):

                  Thanks George,

                  I'm not sure how to use margins, but I'm looking into it and how it may help me. It's just that the coefficient of Gap is theoretically bounded between 0 and 1, and my first-stage GMM results give me an idea of the value (the upper bound at least) it should ideally take. As a matter of fact, the addition of one of the controls accounts for the majority of the increase in the coefficient of $Gap_{i,t-1}$.

                  I have linked the paper explaining the model that I am using. Perhaps you can be kind enough to look at the specification therein. Since this is a second stage in a series of regressions, the literature is quite clear on using a pooled OLS with no constant term and no firm fixed effects, which have been accounted for in the first-stage system GMM specification.

                  I reiterate:
                  I need to construct a new variable VarX = $\hat{\Lambda} Z_{i,t-1}$, where the vector Z is (i.state1nonstate0 c.size c.return_on_equity_w i.below_tier1 c.provforNPA_to_net_advances_w i.Listeddummy1iflisted c.GdpGrowthRate c.inflation).
                  But, as I mentioned, all these variables are interacted with $Gap_{i,t-1}$, so how can I use the estimated coefficients? For example, my results show a coefficient for L.state1nonstate0#cL.tier1_gap1. Should I divide this coefficient by gapl1 and then multiply by the dummy values of state1nonstate0 to get the estimates (and so on for all the variables in Z)?

                  thanks and regards.




                  • #10
                    no link



                    • #11
                      In reply to George Ford (#10):

                      Here's the link to the paper.

                      Jiang, C., Liu, H., & Molyneux, P. (2019). Do different forms of government ownership matter for bank capital behavior? Evidence from China. Journal of Financial Stability, 40, 38–49. https://doi.org/10.1016/j.jfs.2018.11.005

                      Thanks and regards.



                      • #12
                        Looks like voodoo, but what do I know?

                        Here's what a few minutes of review does for you (by someone unfamiliar with this literature).

                        This is a three-stage model with 2 generated regressors: S1 to S2, and S2 to S3. They are unaccounted for, so the hypothesis tests are invalid (an error of unknown magnitude, though bootstrapping usually increases SEs).

                        In any case, in Step 2 they take a portion of the prediction from Step 1 to craft a new variable (eq 3). The adjustment $\lambda$ is assumed constant. This can be done by multiplying Beta*Z's. The goal is to get a mean prediction and then create a new variable equal to the difference between the actual value and that mean prediction (the gap, i.e. true value minus predicted value). Not sure taking a portion of the regression makes sense due to scaling (the model includes year dummies and a lagged DV). Insert that generated regressor into Step 2 (which kinda looks like your equation), then take the prediction of S2 and insert it into Step 3--another generated regressor.

                        Your model does not match the ones in that paper. There is no $\lambda_0$, just $\Lambda Z_{i,t-1}$, and it appears $\lambda$ is a predetermined constant in this paper. (That being so, the generated regressor is just the prediction from S2.) Otherwise, you can just multiply coefficients*Z's to get the prediction. Another generated regressor.

                        You'll need to bootstrap all 3 stages simultaneously for hypothesis testing. You'll have to code it using bsample.
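
A rough sketch of what coding it with bsample could look like, shown for two stages on the auto data. The program name twostage, the stage specifications and the 200 replications are all hypothetical choices for illustration; a third stage would be added inside the same program so that every stage is re-estimated on each resample.

```stata
sysuse auto, clear
set seed 12345

* Hypothetical program that runs every stage on whatever data are in memory
capture program drop twostage
program define twostage, rclass
    tempvar ghat
    * Stage 1: fit a model and keep its prediction as a generated regressor
    quietly regress mpg weight length
    quietly predict double `ghat', xb
    * Stage 2: use the generated regressor
    quietly regress price `ghat' foreign
    return scalar b2 = _b[`ghat']
end

* Resample observations and rerun ALL stages on each bootstrap sample
tempname sims
tempfile results
postfile `sims' b2 using `results'
forvalues r = 1/200 {
    preserve
    bsample                 // for panel data, resample whole panels instead,
                            // e.g. bsample, cluster(id) idcluster(newid)
    twostage
    post `sims' (r(b2))
    restore
}
postclose `sims'

* The standard deviation of b2 is the bootstrap standard error
use `results', clear
summarize b2
```

Resampling whole panels matters with panel data, because lags have to be rebuilt within each resampled unit using the new identifier.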

                        I wonder if they are using predictions of the DV for all the generated regressors? That seems to be what they are after, but it is unclear.

                        I'd ask the authors for their code, or find a better approach.



                        • #13
                          In reply to George Ford (#12):

                          George, thanks a ton for going through the model. My supervisor is also of the view that the process is not very sound econometrically, which I guess is your point too.

                          "(That being so, the generated regressor is just the prediction from S2). If otherwise, then you can just multiply coefficients*Z's to get the prediction. Another generated regressor."

                          Could you be a bit clearer and comment on the following:

                          I need to construct a new variable VarX = $\hat{\Lambda} Z_{i,t-1}$, where the vector Z is (i.state1nonstate0 c.size c.return_on_equity_w i.below_tier1 c.provforNPA_to_net_advances_w i.Listeddummy1iflisted c.GdpGrowthRate c.inflation).
                          But, as I mentioned, all these variables are interacted with $Gap_{i,t-1}$, so how can I use the estimated coefficients? For example, my results show a coefficient for L.state1nonstate0#cL.tier1_gap1. Should I divide this coefficient by gapl1 and then multiply by the dummy values of state1nonstate0 to get the estimates (and so on for all the variables in Z)?

                          I will email the authors, but honestly I have had very little success in getting researchers to part with their code.

                          Thanks and regards.




                          • #14
                            In reply to Carlo Lazzaro (#6):

                            Hi Carlo,

                            First of all, apologies for addressing you as Carlos.

                            I have a panel dataset, and the variable Gap in my equation above is actually a predicted variable from a system GMM equation, so the data are already set as a panel. Nevertheless, I take your point about pooled OLS not requiring it.

                            The model that I am using is pretty standard and comes from Jiang, C., Liu, H., & Molyneux, P. (2019). Do different forms of government ownership matter for bank capital behavior? Evidence from China. Journal of Financial Stability, 40, 38–49. https://doi.org/10.1016/j.jfs.2018.11.005

                            P.S. Sorry for spamming. I thought maybe you missed my post. I'm at my wits' end and would appreciate any help possible.

                            Thanks and regards.


                            I need to construct a new variable VarX = $\hat{\Lambda} Z_{i,t-1}$, where the vector Z is (i.state1nonstate0 c.size c.return_on_equity_w i.below_tier1 c.provforNPA_to_net_advances_w i.Listeddummy1iflisted c.GdpGrowthRate c.inflation).
                            But, as I mentioned, all these variables are interacted with $Gap_{i,t-1}$, so how can I use the estimated coefficients?

                            Thanks and regards



                            • #15
                              Code:
                              sysuse auto, clear
                              reg price mpg weight length foreign
                              * Use this if you want the full prediction (fitted values) of the regression
                              predict pfit, xb
                              * Use this if you want only part of the regression (two ways to get the same result)
                              gen newvar = _b[_cons] + _b[mpg]*mpg + _b[weight]*weight + _b[length]*length
                              gen newvaralt = pfit - _b[foreign]*foreign
                              * newvar and newvaralt agree up to float rounding
                              You've got ugly variable names. If you have trouble matching up the variable names with the coefficients (_b[x]), then

                              Code:
                              matrix list e(b)
                              to see what Stata sees.
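
Applied to a model in which every Z enters only through an interaction with Gap, the same _b[] approach might look like the sketch below. It uses the auto data with mpg standing in for Gap and weight and foreign standing in for the Z; the name varx and these stand-ins are assumptions for illustration, not the poster's variables.

```stata
sysuse auto, clear

* mpg plays the role of Gap; weight and foreign play the role of the Z
regress price c.mpg c.mpg#c.weight 1.foreign#c.mpg, noconstant

* Replay the results with a legend showing the exact _b[] name of each term
regress, coeflegend

* Lambda-hat times Z: each interaction coefficient times its own Z only.
* No division by Gap is involved, because Gap is simply left out of the product.
generate double varx = _b[c.mpg#c.weight]*weight + _b[1.foreign#c.mpg]*foreign

* Check: the fitted value equals (lambda0-hat + Lambda-hat*Z) times Gap
predict double xb, xb
assert reldif(xb, (_b[mpg] + varx)*mpg) < 1e-8
```

With lagged terms, the names in the legend would carry the lag operator, as in the cL.tier1_gap#cL.size reported earlier in the thread, and the multiplying variable would then be the lagged Z (for instance L.size).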



