limiting regression for observations that only have a specific value

Maarten Loomans

Join Date: Jun 2022

Posts: 46
#1

23 Jun 2022, 02:28

Dear Statalist,

I am doing an analysis of ESG ratings on stock returns. I have a variable called Ratdum which equals 1 if a company is rated, and 0 if it is not rated. My dataset is a panel dataset.

What I want to know is: how can I limit my regression to only include observations for which Ratdum !=0 for all t. Such that: I don't want to include companies that never got rated.

my current code is:

Code:

reghdfe $ylist $h1, absorb(n_ID i.Quarter) vce(cluster n_ID)

$h1 contains the variable i.Ratdum.

Kind regards.

Carlo Lazzaro

Join Date: Apr 2014
Posts: 17712
#2

23 Jun 2022, 02:40

Maarten:
the -if- clause seems to be what you're looking for:

Code:

. use "https://www.stata-press.com/data/r17/nlswork.dta"
(National Longitudinal Survey of Young Women, 14-24 years old in 1968)

. reghdfe ln_wage tenure if idcode<=3, abs(idcode year)
(dropped 2 singleton observations)
(MWFE estimator converged in 4 iterations)

HDFE Linear regression                            Number of obs   =         37
Absorbing 2 HDFE groups                           F(   1,     21) =       0.10
                                                  Prob > F        =     0.7537
                                                  R-squared       =     0.6704
                                                  Adj R-squared   =     0.4350
                                                  Within R-sq.    =     0.0048
                                                  Root MSE        =     0.2830

------------------------------------------------------------------------------
     ln_wage | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
      tenure |  -.0085828   .0269944    -0.32   0.754    -.0647207    .0475551
       _cons |   1.789625    .089467    20.00   0.000     1.603569    1.975682
------------------------------------------------------------------------------

Absorbed degrees of freedom:
-----------------------------------------------------+
 Absorbed FE | Categories  - Redundant  = Num. Coefs |
-------------+---------------------------------------|
      idcode |         3           0           3     |
        year |        13           1          12     |
-----------------------------------------------------+

.

Kind regards,
Carlo
(Stata 19.0)


Maarten Loomans

Join Date: Jun 2022
Posts: 46
#3

23 Jun 2022, 04:24

Carlo:
I do not think that the -if- qualifier is sufficient. If I were to add an -if- to my command, it would drop all observations for which Ratdum = 0. However, this is not my goal. I only want to do the regression for firms that see a change in Ratdum --> from 0 to 1.


Carlo Lazzaro

Join Date: Apr 2014

Posts: 17712
#4

23 Jun 2022, 08:44

Maarten:
sorry, but without an example, I find your query unclear.

Kind regards,
Carlo
(Stata 19.0)
William Lisowski

Join Date: Dec 2014

Posts: 10150
#5

23 Jun 2022, 08:55

In post #1 you wrote

What I want to know is: how can I limit my regression to only include observations for which Ratdum !=0 for all t.

In post #3 you wrote

I only want to do the regression for firms that see a change in Ratdum --> from 0 to 1.

These are not consistent. How can Ratdum change from 0 to 1 if Ratdum != 0 for all t?

A discussion including an example of your data would go a long way here. The answer is that you do want an if-clause, but you will need to construct a variable that is 1 for every observation of a firm that you want to include and 0 for every observation of a firm you do not want to include. With data we can suggest code to do that.
Maarten Loomans

Join Date: Jun 2022

Posts: 46
#6

25 Jun 2022, 09:45

Hi all, I am sorry for the unclear query; I made a typo. I will first explain the relevant variables.
ESG_Q: the ESG rating of a company, ranging from 0 to 100. It equals '.' if a company is not rated.
Ratdum: equals 1 if a company is rated and 0 if it is not rated (i.e. ESG_Q is '.'). Ratdum can change over time: a firm may be unrated in, say, Q2 but receive a rating in Q15.
Quarter: the quarter of the observation, running from 1 to 64.
ID: the firm ID.

I want to create a dummy variable that indicates whether a company ever got a rating, i.e.:
ESG_Q != . for at least one quarter.

Thus: if a company got a rating in Q5, then for all observations of firm 'ID', from Q=1 to Q=64, the indicator variable Change = 1.

I have tried it in many ways.
I currently have the following:

Code:

bys ID (Quarter): gen Change = 0
bys ID (Quarter): replace Change = 1 if ESG_Q != ESG_Q[_n-1]

The problem is that a company that got rated in e.g. Q10 would have
Change = 0 for Q [ 1 ; 10 ]
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17712
#7

25 Jun 2022, 11:18

Maarten:
what about:

Code:

gen Change=1 if ESG_Q!=.
replace Change=0 if ESG_Q==.
bysort ID (Quarter): egen wanted=max(Change)

Caveat emptor: code not tested (I'm away from my PC).

Kind regards,
Carlo
(Stata 19.0)
William Lisowski

Join Date: Dec 2014

Posts: 10150
#8

25 Jun 2022, 13:13

Perhaps (untested in the absence of the recommended example data)

Code:

bysort ID (Quarter): egen wanted = max(ESG_Q!=.)

which collapses the logic of Carlo Lazzaro into a single command. The sort by Quarter is not needed, but that's the order you're eventually going to want your data to be in.
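
To see what this one-liner does, here is a minimal sketch on made-up data (the three firms and their ESG_Q values are purely illustrative and are not taken from this thread). The expression ESG_Q != . evaluates to 1 or 0 in each row, and max() takes the largest of those values within each ID, so wanted should be 1 for every observation of a firm that was rated in at least one quarter and 0 otherwise:

Code:

* toy panel (hypothetical values): firm 1 is rated late, firm 2 never, firm 3 early
clear
input ID Quarter ESG_Q
1 1 .
1 2 .
1 3 55
2 1 .
2 2 .
2 3 .
3 1 40
3 2 42
3 3 .
end

* 1 for all rows of a firm with at least one non-missing ESG_Q, else 0
bysort ID (Quarter): egen wanted = max(ESG_Q != .)

* check the result firm by firm
assert wanted == 1 if ID == 1
assert wanted == 0 if ID == 2
assert wanted == 1 if ID == 3
list, sepby(ID)

The resulting variable can then be used in an -if- qualifier (if wanted == 1) to restrict an estimation command to firms that were rated at some point.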
Maarten Loomans

Join Date: Jun 2022

Posts: 46
#9

26 Jun 2022, 03:15

Dear Carlo and William, I want to thank you both! It works!
Maarten Loomans

Join Date: Jun 2022

Posts: 46
#10

26 Jun 2022, 04:00

I have another question related to this. I now want to test whether the coefficient on my variable of interest, estimated with the addition of

Code:

if wanted == 1

is significantly different from the one in the regression without that restriction.

My two regressions are:

Code:

reghdfe $ylist $h1, absorb(i.Quarter n_ID) vce(cluster n_ID)
reghdfe $ylist $h1 if wanted==1, absorb(i.Quarter n_ID) vce(cluster n_ID)

In $h1 I have my variable of interest, called Ratdum.

I hope this is clear enough that someone is able to point me in the right direction.
Kind regards,
Maarten Loomans.
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17712
#11

26 Jun 2022, 04:25

Maarten:
if your idea rests on comparing two linear regression models (rather than their coefficients, when common), you should take a look at their -e(r2_a)-: the lower, the better.

Kind regards,
Carlo
(Stata 19.0)
Maarten Loomans

Join Date: Jun 2022

Posts: 46
#12

26 Jun 2022, 15:09

Originally posted by Carlo Lazzaro View Post

Maarten:
if your idea rests on comparing two linear regression models (rather than their coefficients, when common), you should take a look at their -e(r2_a)-: the lower, the better.

Carlo, I am interested only in the coefficient of Ratdum. Is -e(r2_a)- then still applicable?
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17712
#13

27 Jun 2022, 02:17

Maarten:
1) in my previous reply I should have written "you should take a look at their -e(r2_a)-: the higher, the better.";
2) as the community-contributed module -reghdfe- does not support -suest-, your best bet is to include -wanted- (which should be a two-level 0/1 categorical variable) and test it against zero via -test-.

Kind regards,
Carlo
(Stata 19.0)