Analyzing Share Turnover with Moderating Variables in STATA

Hirindu Kawshala

Join Date: Nov 2022

Posts: 37
#1


18 Nov 2023, 02:29

I am examining the impact of six winsorised and standardised independent variables (LM_QA_w, LM_ALL_w, ML_B_QA_w, ML_B_ALL_w, ML_U_QA_w, ML_U_ALL_w) on SHARE_TURNOVER_Le (a lead variable), with eight control variables: SHARE_TURNOVER, SIZE, LEVERAGE_w, AGE, RD_w, BM, DIV_w, FILE_SIZE (I didn't winsorize the log variables: SHARE_TURNOVER (current year), SIZE, AGE, and BM). I have also created two moderating variables: "continuous position" and "position dummies". I would appreciate your insights on the correctness of my Stata code.

Constructing Continuous Position:
egen quarter_id = group(fyearq fqtr)
sort quarter_id tic
by quarter_id: gen alph_rank = _n
by quarter_id: egen total_firms = count(tic)
gen continuous_position = (alph_rank - 1) / (total_firms - 1)

Constructing Position Dummies:
gen first_5 = alph_rank <= total_firms*0.05
gen first_10 = alph_rank <= total_firms*0.10
gen first_5_10 = alph_rank > total_firms*0.05 & alph_rank <= total_firms*0.10
gen first_10_25 = alph_rank > total_firms*0.10 & alph_rank <= total_firms*0.25

Standardizing Independent Variables:
egen LM_QA_w = std(LM_QA1_w)
egen LM_ALL_w = std(LM_ALL1_w)
egen ML_B_QA_w = std(ML_B_QA1_w)
egen ML_B_ALL_w = std(ML_B_ALL1_w)
egen ML_U_QA_w = std(ML_U_QA1_w)
egen ML_U_ALL_w = std(ML_U_ALL1_w)

Constructing Interaction Terms for Continuous Positions:
gen interaction1_cont = LM_QA_w * continuous_position
gen interaction2_cont = LM_ALL_w * continuous_position
gen interaction3_cont = ML_B_QA_w * continuous_position
gen interaction4_cont = ML_B_ALL_w * continuous_position
gen interaction5_cont = ML_U_QA_w * continuous_position
gen interaction6_cont = ML_U_ALL_w * continuous_position

Constructing Interaction Terms for Position Dummies:
foreach var in LM_QA_w LM_ALL_w ML_B_QA_w ML_B_ALL_w ML_U_QA_w ML_U_ALL_w {
    gen interaction_`var'_5 = `var' * first_5
    gen interaction_`var'_5_10 = `var' * first_5_10
    gen interaction_`var'_10_25 = `var' * first_10_25
}

Regression for LM_QA_w:
ssc install estout, replace
eststo clear
eststo: reghdfe SHARE_TURNOVER_Le LM_QA_w continuous_position first_5 first_5_10 first_10_25 interaction1_cont interaction_LM_QA_w_5 interaction_LM_QA_w_5_10 interaction_LM_QA_w_10_25 SHARE_TURNOVER SIZE LEVERAGE_w AGE RD_w BM DIV_w FILE_SIZE_LM_QA_w, absorb(quarter_id fama_french_49) vce(cluster gvkey_numeric)

I'm only posting the regression code for my first independent variable (LM_QA_w) here. Please review my approach, suggest any potential improvements, or confirm whether the methodology is appropriate.

Thank you for your guidance!

Last edited by Hirindu Kawshala; 18 Nov 2023, 03:01.

Clyde Schechter

Join Date: Apr 2014
Posts: 30150

18 Nov 2023, 10:30

Your code can be made cleaner, more transparent, and more compact if you avail yourself of factor-variable notation, which is available in all but ancient versions of Stata. (See -help fvvarlist- for details).

So, first, instead of creating a series of first_* variables, create a single variable:

Code:

gen group = 1 if  alph_rank <= total_firms*0.05
replace group = 2 if alph_rank <= total_firms*0.10
replace group = 3 if alph_rank > total_firms*0.05 & alph_rank <= total_firms*0.10
replace group = 4 if alph_rank > total_firms*0.10 & alph_rank <= total_firms*0.25

Then eliminate all of the code creating interaction variables.

Now you can write a better -reghdfe- command as follows:

Code:

eststo: reghdfe SHARE_TURNOVER_Le c.(LM_QA_w LM_ALL_w ML_B_QA_w ML_B_ALL_w ML_U_QA_w ML_U_ALL_w)##(i.group c.continuous_position) ///
SHARE_TURNOVER SIZE LEVERAGE_w AGE RD_w BM DIV_w FILE_SIZE_LM_QA_w, absorb(quarter_id fama_french_49) vce(cluster gvkey_numeric)
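
As a rough illustration of what the `##` operator does, the sketch below uses Stata's shipped auto dataset rather than the poster's data, so the variables (price, mpg, foreign) are stand-ins and not part of the original question. It shows that `c.x##i.g` is shorthand for both main effects plus their interaction, and that -margins- can then report the slope of the continuous variable within each category.

```stata
* Sketch only: example data shipped with Stata, not the poster's dataset.
sysuse auto, clear

* c.mpg##i.foreign expands to: mpg, i.foreign, and c.mpg#i.foreign
regress price c.mpg##i.foreign

* The same model with every term written out
regress price mpg i.foreign c.mpg#i.foreign

* Slope of mpg within each level of foreign
margins foreign, dydx(mpg)
```

Because Stata builds the interaction terms itself, no hand-made product variables are needed, and post-estimation commands know which terms belong together.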


Hirindu Kawshala

Join Date: Nov 2022

Posts: 37
#3

18 Nov 2023, 16:03

Clyde Schechter, thank you so much for your code and the fvvarlist reference. Is my understanding below correct?

My Revised Codes:
egen quarter_id = group(fyearq fqtr)
sort quarter_id tic
by quarter_id: gen alph_rank = _n
by quarter_id: egen total_firms = count(tic)
gen continuous_position = (alph_rank - 1) / (total_firms - 1)

gen group = 1 if alph_rank <= total_firms*0.05
replace group = 2 if alph_rank <= total_firms*0.10
replace group = 3 if alph_rank > total_firms*0.05 & alph_rank <= total_firms*0.10
replace group = 4 if alph_rank > total_firms*0.10 & alph_rank <= total_firms*0.25

gen interaction1_cont = LM_QA_w * continuous_position
gen interaction2_cont = LM_ALL_w * continuous_position
gen interaction3_cont = ML_B_QA_w * continuous_position
gen interaction4_cont = ML_B_ALL_w * continuous_position
gen interaction5_cont = ML_U_QA_w * continuous_position
gen interaction6_cont = ML_U_ALL_w * continuous_position

gen interaction1_group = LM_QA_w * group
gen interaction2_group = LM_ALL_w * group
gen interaction3_group = ML_B_QA_w * group
gen interaction4_group = ML_B_ALL_w * group
gen interaction5_group = ML_U_QA_w * group
gen interaction6_group = ML_U_ALL_w * group

ssc install estout, replace
eststo clear
eststo: reghdfe SHARE_TURNOVER_Le LM_QA_w interaction1_cont interaction1_group group continuous_position SHARE_TURNOVER SIZE LEVERAGE_w AGE RD_w BM DIV_w FILE_SIZE_LM_QA_w, absorb(quarter_id fama_french_49) vce(cluster gvkey_numeric)
eststo: reghdfe SHARE_TURNOVER_Le LM_ALL_w interaction2_cont interaction2_group group continuous_position SHARE_TURNOVER SIZE LEVERAGE_w AGE RD_w BM DIV_w FILE_SIZE_LM_QA_w, absorb(quarter_id fama_french_49) vce(cluster gvkey_numeric)
eststo: reghdfe SHARE_TURNOVER_Le ML_B_QA_w interaction3_cont interaction3_group group continuous_position SHARE_TURNOVER SIZE LEVERAGE_w AGE RD_w BM DIV_w FILE_SIZE_LM_QA_w, absorb(quarter_id fama_french_49) vce(cluster gvkey_numeric)
eststo: reghdfe SHARE_TURNOVER_Le ML_B_ALL_w interaction4_cont interaction4_group group continuous_position SHARE_TURNOVER SIZE LEVERAGE_w AGE RD_w BM DIV_w FILE_SIZE_LM_QA_w, absorb(quarter_id fama_french_49) vce(cluster gvkey_numeric)
eststo: reghdfe SHARE_TURNOVER_Le ML_U_QA_w interaction5_cont interaction5_group group continuous_position SHARE_TURNOVER SIZE LEVERAGE_w AGE RD_w BM DIV_w FILE_SIZE_LM_QA_w, absorb(quarter_id fama_french_49) vce(cluster gvkey_numeric)
eststo: reghdfe SHARE_TURNOVER_Le ML_U_ALL_w interaction6_cont interaction6_group group continuous_position SHARE_TURNOVER SIZE LEVERAGE_w AGE RD_w BM DIV_w FILE_SIZE_LM_QA_w, absorb(quarter_id fama_french_49) vce(cluster gvkey_numeric)

Could you please comment on this? Thank you.
Hirindu Kawshala

Join Date: Nov 2022

Posts: 37
#4

18 Nov 2023, 16:10

My results are as follows:

Last edited by Hirindu Kawshala; 18 Nov 2023, 16:14.

Clyde Schechter

Join Date: Apr 2014
Posts: 30150

18 Nov 2023, 16:39

This is wrong. The interaction terms with *group in their definition are all incorrect and will lead to incorrect results. But what you show as your -reghdfe- output is clearly not preceded by the code you show in #3. I suspect your actual code leading to the -reghdfe- command looked more or less the way I recommended you do it, but you intend to use each of the standardized variables in a separate regression, i.e., like this:

Code:

egen quarter_id = group(fyearq fqtr)
sort quarter_id tic
by quarter_id: gen alph_rank = _n
by quarter_id: egen total_firms = count(tic)
gen continuous_position = (alph_rank - 1) / (total_firms - 1)

gen group = 1 if alph_rank <= total_firms*0.05
replace group = 2 if alph_rank <= total_firms*0.10
replace group = 3 if alph_rank > total_firms*0.05 & alph_rank <= total_firms*0.10
replace group = 4 if alph_rank > total_firms*0.10 & alph_rank <= total_firms*0.25

egen LM_QA_w = std(LM_QA1_w)
egen LM_ALL_w = std(LM_ALL1_w)
egen ML_B_QA_w = std(ML_B_QA1_w)
egen ML_B_ALL_w = std(ML_B_ALL1_w)
egen ML_U_QA_w = std(ML_U_QA1_w)
egen ML_U_ALL_w = std(ML_U_ALL1_w)

foreach v of varlist LM_QA_w LM_ALL_w ML_B_QA_w ML_B_ALL_w ML_U_QA_w ML_U_ALL_w {
    eststo: reghdfe SHARE_TURNOVER_Le c.`v'##(i.group c.continuous_position) ///
    SHARE_TURNOVER SIZE LEVERAGE_w AGE RD_w BM DIV_w FILE_SIZE_LM_QA_w, ///
    absorb(quarter_id fama_french_49) vce(cluster gvkey_numeric)
}
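
To make the problem with the hand-built *_group products from #3 concrete, here is a minimal sketch on Stata's shipped auto dataset (price, mpg and rep78 are stand-ins, and mpg_x_rep is a hypothetical name). Multiplying a continuous variable by the numeric code of a categorical variable treats the category numbers as a measured quantity, whereas `i.` gives each category its own term.

```stata
* Sketch only: example data shipped with Stata, not the poster's dataset.
sysuse auto, clear

* Product with the numeric code: a single interaction coefficient, so the
* mpg slope is forced to shift by the same amount for each one-unit step
* in rep78.
gen mpg_x_rep = mpg * rep78
regress price mpg rep78 mpg_x_rep

* Factor-variable version: a separate slope shift for each category.
regress price c.mpg##i.rep78
```

The first regression estimates one interaction coefficient, while the second estimates one per non-base category, which is what a set of position dummies is meant to capture.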


Hirindu Kawshala

Join Date: Nov 2022

Posts: 37
#6

18 Nov 2023, 17:16

Clyde Schechter, thank you for your assistance. I've applied your code, but without standardizing. I wonder whether using standardized independent variables changes the interpretation of the coefficients (to how many standard deviations a value is from the mean).
I have a few unclear points based on the results:
1. In my results, only groups 3 and 4 appear in all six tables.
2. My dataset has a total of 160,000 observations, but only 40,608 to 41,064 are reflected in all tables.
3. The coefficients and R-squared values in the models are very high.

Thank you so much!
Clyde Schechter

Join Date: Apr 2014

Posts: 30150
#7

18 Nov 2023, 17:45

1. In my results, only groups 3 and 4 appear in all six tables.

Yes, there is an error in the creation of variable group. I modeled it on your code and copied your mistake. It should be:

Code:

gen group = 1 if alph_rank <= total_firms*0.05
replace group = 2 if alph_rank > total_firms*0.05 & alph_rank <= total_firms*0.10
replace group = 3 if alph_rank > total_firms*0.10 & alph_rank <= total_firms*0.25

There should be only three groups. Your original code had a separate group for alph_rank <= total_firms*0.10, but you don't use it in your regression, and it makes no sense to have because it overlaps with group 1. Your groups should be mutually exclusive and exhaustive.
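
One way to make the grouping exhaustive as well as mutually exclusive is sketched below. It is an assumption, not something stated in the thread, that firms beyond the first 25% should form a reference category; the code 0 and the label name grouplbl are likewise hypothetical. With `gen group = 1 if ...` followed only by `replace` statements, any observation that matches none of the conditions is left with a missing value of group.

```stata
* Sketch only: assumes alph_rank and total_firms exist as built earlier
* in the thread, and that group has not been created yet.
* Start every firm in a reference category (0 = beyond the first 25%).
gen byte group = 0 if !missing(alph_rank, total_firms)
replace group = 1 if alph_rank <= total_firms*0.05
replace group = 2 if alph_rank > total_firms*0.05 & alph_rank <= total_firms*0.10
replace group = 3 if alph_rank > total_firms*0.10 & alph_rank <= total_firms*0.25

* Label the categories and confirm that none are missing
label define grouplbl 0 "rest" 1 "first 5%" 2 "5-10%" 3 "10-25%"
label values group grouplbl
tabulate group, missing
```

With this coding, `i.group` uses 0 as the base level by default (the lowest value), so the three position coefficients are read relative to the remaining firms.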

2. My dataset has a total of 160,000 observations, but only 40,608 to 41,064 are reflected in all tables.

Well, having not seen your data, it is not possible for me to give you specific advice about this. The most common cause is missing values in the variables. Bear in mind that an observation can only appear in the regression's estimation sample if it has non-missing values for every variable mentioned in the regression command. Since your regressions have large numbers of variables, even a relatively small amount of missing values sporadically sprinkled in the data can result in a large number of observations being omitted. Another frequent cause of this is that -reghdfe- omits all singleton groups: you might have a lot of those.
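
A few checks along these lines can help narrow down where the observations go. The sketch assumes the variable names used earlier in the thread and that a -reghdfe- model has just been run; in_sample is a hypothetical name. One candidate worth checking is group itself, since a variable created with `gen group = 1 if ...` and later `replace` statements is missing wherever no condition applies.

```stata
* Sketch only: variable names are taken from the thread.
* How many observations have no group assigned?
count if missing(group)

* Missing-value counts across the variables of one model
misstable summarize SHARE_TURNOVER_Le LM_QA_w group continuous_position ///
    SHARE_TURNOVER SIZE LEVERAGE_w AGE RD_w BM DIV_w FILE_SIZE_LM_QA_w

* Right after a reghdfe run, flag the estimation sample and compare
gen byte in_sample = e(sample)
tabulate group in_sample, missing
```

The -reghdfe- output itself also reports how many singleton observations were dropped, which separates that cause from missing values.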

3. The coefficients and R-squared values in the models are very high.

I can't really help you with this. As I don't know what your variables are, and even if I did, I probably wouldn't understand what the relationships among them are as I do not work in finance, I can't assess your expectations about this. I will only say that it does not surprise me when a regression with a large number of variables has a high R². And as for the coefficients, I can only say that you need to assess that in relationship to the scales of the variables themselves--there is no absolute standard of large or small for coefficients in linear regression.
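
As a small sketch of what judging a coefficient against the scale of its variable can look like, the lines below compare the raw measure with a z-scored copy. LM_QA1_w is the unstandardized variable named earlier in the thread; LM_QA_z is a hypothetical name chosen so as not to overwrite anything.

```stata
* Sketch only: variable names are taken from the thread.
* Scale of the outcome and of the raw regressor
summarize SHARE_TURNOVER_Le LM_QA1_w

* z-score version: mean 0 and standard deviation 1 over the observations used
egen LM_QA_z = std(LM_QA1_w)
summarize LM_QA_z
```

A coefficient on the z-scored variable is the change in the outcome per one standard deviation of the regressor, that is, the raw coefficient multiplied by the standard deviation that -egen- used. Standardizing rescales the coefficient but does not change the fit of the model.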
Hirindu Kawshala

Join Date: Nov 2022

Posts: 37
#8

18 Nov 2023, 17:54

Thank you so much for all of your guidance Clyde Schechter. I'll work on these.