  • limiting regression for observations that only have a specific value

    Dear Statalist,

    I am analyzing the effect of ESG ratings on stock returns. I have a variable called Ratdum which equals 1 if a company is rated, and 0 if it is not rated. My data are a panel dataset.

    What I want to know is: how can I limit my regression to only include observations for which Ratdum !=0 for all t. That is, I don't want to include companies that never got rated.

    My current code is:
    Code:
    reghdfe $ylist $h1, absorb(n_ID i.Quarter) vce(cluster n_ID)
    where $h1 contains the variable i.Ratdum.

    Kind regards.

  • #2
    Maarten:
    the -if- clause seems to be what you're looking for:
    Code:
    . use "https://www.stata-press.com/data/r17/nlswork.dta"
    (National Longitudinal Survey of Young Women, 14-24 years old in 1968)
    
    . reghdfe ln_wage tenure if idcode<=3, abs(idcode year)
    (dropped 2 singleton observations)
    (MWFE estimator converged in 4 iterations)
    
    HDFE Linear regression                            Number of obs   =         37
    Absorbing 2 HDFE groups                           F(   1,     21) =       0.10
                                                      Prob > F        =     0.7537
                                                      R-squared       =     0.6704
                                                      Adj R-squared   =     0.4350
                                                      Within R-sq.    =     0.0048
                                                      Root MSE        =     0.2830
    
    ------------------------------------------------------------------------------
         ln_wage | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
    -------------+----------------------------------------------------------------
          tenure |  -.0085828   .0269944    -0.32   0.754    -.0647207    .0475551
           _cons |   1.789625    .089467    20.00   0.000     1.603569    1.975682
    ------------------------------------------------------------------------------
    
    Absorbed degrees of freedom:
    -----------------------------------------------------+
     Absorbed FE | Categories  - Redundant  = Num. Coefs |
    -------------+---------------------------------------|
          idcode |         3           0           3     |
            year |        13           1          12     |
    -----------------------------------------------------+
    
    .
    Kind regards,
    Carlo
    (Stata 19.0)



    • #3
      Originally posted by Carlo Lazzaro View Post
      Maarten:
      the -if- clause seems to be what you're looking for: [...]
      Carlo:
      I do not think that the -if- qualifier is sufficient. If I were to add an -if- to my command, it would drop all the observations for which Ratdum = 0. However, this is not my goal. I only want to do the regression for firms that see a change in Ratdum --> from 0 to 1.



      • #4
        Maarten:
        sorry, but without an example, I find your query unclear.
        Kind regards,
        Carlo
        (Stata 19.0)



        • #5
          In post #1 you wrote
          What I want to know is: how can I limit my regression to only include observations for which Ratdum !=0 for all t.
          In post #3 you wrote
          I only want to do the regression for firms that see a change in Ratdum --> from 0 to 1.
          These are not consistent. How can Ratdum change from 0 to 1 if Ratdum != 0 for all t?

          A discussion including an example of your data would go a long way here. The answer is that you do want an if-clause, but you will need to construct a variable that is 1 for every observation of a firm that you want to include and 0 for every observation of a firm you do not want to include. With data we can suggest code to do that.



          • #6
            Hi all, I am sorry for the unclear query. I made a typo. I will first explain the potentially useful variables.
            ESG_Q: the specific ESG rating of a company, ranging from 0 to 100. It equals '.' if a company is not rated.
            Ratdum: equals 1 if a company is rated, and 0 if it is not rated (i.e. when ESG_Q is '.'). Ratdum can change over time: for example, a firm might be unrated in Q2 but rated by Q15.
            Quarter: indicates the quarter of the observation, from 1 to 64.
            ID: the firm ID.

            I want to create a dummy variable which indicates whether a company ever got a rating, i.e.:
            ESG_Q !=. for at least one quarter.

            Thus: if a company got a rating in Q5, then for all the observations of that firm 'ID', from Quarter 1 to Quarter 64, the indicator variable Change = 1.

            I have tried it in many ways.
            I currently have the following:
            Code:
            bys ID(Quarter): gen Change = 0
            bys ID(Quarter): replace Change = 1 if ESG_Q != ESG_Q[_n-1]
            The problem is that a company that first got rated in e.g. Q10 would still have
            Change = 0 for the quarters before Q10, whereas I want Change = 1 in every quarter for that firm.
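
            To make the problem concrete, here is a minimal sketch on made-up data. The two firms and their ESG_Q values are hypothetical and only serve to show what the two lines above produce:

            ```stata
            clear
            input ID Quarter ESG_Q
            1 1 .
            1 2 .
            1 3 55
            1 4 55
            2 1 .
            2 2 .
            2 3 .
            2 4 .
            end

            * the attempt from this post, applied to the toy data
            bys ID (Quarter): gen Change = 0
            bys ID (Quarter): replace Change = 1 if ESG_Q != ESG_Q[_n-1]

            * Change flags only the quarter in which ESG_Q differs from the previous quarter
            list, sepby(ID)
            ```

            In this toy example firm 1 is rated from Quarter 3 onwards, but Change is 1 only in Quarter 3, so the variable marks the moment of change rather than the firm as a whole.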





            • #7
              Maarten:
              what about:
              Code:
              gen Change=1 if ESG_Q!=.
              replace Change=0 if ESG_Q==.
              bysort ID (Quarter): egen wanted=max(Change)
              Caveat emptor: code not tested (I'm away from my PC).
              Kind regards,
              Carlo
              (Stata 19.0)



              • #8
                Perhaps (untested in the absence of the recommended example data)
                Code:
                bysort ID (Quarter): egen wanted = max(ESG_Q!=.)
                which collapses the logic of Carlo Lazzaro into a single command. The sort by Quarter is not needed, but that's the order you're eventually going to want your data to be in.
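
                A minimal sketch of how this behaves, assuming a made-up dataset with two firms (the values are hypothetical, chosen only so that one firm is eventually rated and the other never is):

                ```stata
                clear
                input ID Quarter ESG_Q
                1 1 .
                1 2 .
                1 3 55
                1 4 60
                2 1 .
                2 2 .
                2 3 .
                2 4 .
                end

                * (ESG_Q != .) is 1 in rated quarters and 0 otherwise;
                * max() within ID copies a 1 to every row of a firm that was ever rated
                bysort ID (Quarter): egen wanted = max(ESG_Q != .)

                list, sepby(ID)
                ```

                Here wanted should be 1 on all four rows of firm 1 and 0 on all rows of firm 2, so adding -if wanted == 1- to a regression keeps every quarter of the ever-rated firms. If extended missing values (.a, .b, ...) could occur in ESG_Q, !missing(ESG_Q) would be the safer expression, since ESG_Q != . treats them as nonmissing.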



                • #9
                  Dear Carlo and William, I want to thank you both! It works!



                  • #10
                    I have another question related to this. I now want to test whether the coefficient on my variable of interest in the regression with the addition of
                    Code:
                    if wanted == 1
                    is significantly different from the coefficient in the regression without that restriction.

                    My two regressions are:
                    Code:
                    reghdfe $ylist $h1, absorb(i.Quarter n_ID) vce(cluster n_ID)
                    reghdfe $ylist $h1 if wanted==1, absorb(i.Quarter n_ID) vce(cluster n_ID)
                    In $h1 I have my variable of interest, called Ratdum.
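
                    For reference, a sketch of how two such fits could be stored and viewed side by side before any formal test. It uses the nlswork data from #2 as a stand-in, with union playing the role of Ratdum and a hypothetical ever_union playing the role of wanted. It assumes -reghdfe- is installed and that the dataset can be downloaded:

                    ```stata
                    webuse nlswork, clear

                    * ever_union (hypothetical name): 1 for every row of a person ever observed in a union
                    bysort idcode (year): egen ever_union = max(union == 1)

                    * full sample
                    reghdfe ln_wage union, absorb(idcode year) vce(cluster idcode)
                    estimates store full

                    * restricted to persons who were ever in a union
                    reghdfe ln_wage union if ever_union == 1, absorb(idcode year) vce(cluster idcode)
                    estimates store restricted

                    * coefficients and standard errors next to each other
                    estimates table full restricted, b se
                    ```

                    This only displays the two sets of estimates; it is not itself a test of whether they differ.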

                    I hope this is clear enough that someone is able to point me in the right direction.
                    Kind regards,
                    Maarten Loomans.



                    • #11
                      Maarten:
                      if your idea rests on comparing two linear regression models (rather than their coefficients, when common), you should take a look at their -e(r2_a)-: the lower, the better.
                      Kind regards,
                      Carlo
                      (Stata 19.0)



                      • #12
                        Originally posted by Carlo Lazzaro View Post
                        Maarten:
                        if your idea rests on comparing two linear regression models (rather than their coefficients, when common), you should take a look at their -e(r2_a)-: the lower, the better.
                        Carlo, I am interested only in the coefficient of Ratdum. Is -e(r2_a)- then still applicable?



                        • #13
                          Maarten:
                          1) in my previous reply I should have written "you should take a look at their -e(r2_a)-: the higher, the better.";
                          2) as the community-contributed module -reghdfe- does not support -suest-, your best bet is to include -wanted- (which should be a two-level 0/1 categorical variable) and test it against zero via -test-.
                          Kind regards,
                          Carlo
                          (Stata 19.0)
