Use of continuous and continuous derived dummy variable in regression

Zeenat Murtaza

Join Date: Aug 2021

Posts: 44
#1

Use of continuous and continuous derived dummy variable in regression

17 Sep 2023, 16:57

Hi,

I hope you are doing well. I apologise for this silly question and for posting it in the Stata forum, as it is more research related. Can you please advise whether a continuous variable (e.g. firm size) can be used in a regression model alongside a dummy that is derived from the same variable (e.g. young firms)? The dummy is used because the output would otherwise not differentiate the results for the two categories of firms. Firm size (measured by assets) is the main independent variable, so it is used in continuous form to see how firms with relatively larger assets may influence the dependent variable; using a dummy as the main independent variable may have its own disadvantages. Please advise.

Carlo Lazzaro

Join Date: Apr 2014
Posts: 17730

18 Sep 2023, 08:43

Zeenat:
provided that I got you correctly, the answer is: yes, you can, as in the following toy-example:

Code:

. set obs 100
Number of observations (_N) was 0, now 100.

. g id=_n

. g turnover=1000000*runiform()

. g assets=100000*runiform()

. g company_size=0 if assets<=50000

. replace company_size=1 if company_size==.

. label define company_size 0 "Small firm" 1 "Big firm", add

. label val company_size company_size

. regress turnover c.assets i.company_size

      Source |       SS           df       MS      Number of obs   =       100
-------------+----------------------------------   F(2, 97)        =      0.24
       Model |  4.7361e+10         2  2.3680e+10   Prob > F        =    0.7834
    Residual |  9.3863e+12        97  9.6766e+10   R-squared       =    0.0050
-------------+----------------------------------   Adj R-squared   =   -0.0155
       Total |  9.4337e+12        99  9.5290e+10   Root MSE        =    3.1e+05

------------------------------------------------------------------------------
    turnover | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
      assets |   1.625111   2.322953     0.70   0.486    -2.985308    6.235529
             |
company_size |
   Big firm  |  -81018.14   131890.3    -0.61   0.540    -342783.8    180747.5
       _cons |   458984.4   71672.33     6.40   0.000     316734.7    601234.2
------------------------------------------------------------------------------

. estat vce, corr

Correlation matrix of coefficients of regress model

             |                  1.          
        e(V) |   assets  compan~e     _cons 
-------------+------------------------------
      assets |   1.0000                     
1.company_~e |  -0.8805    1.0000           
       _cons |  -0.7625    0.4439    1.0000

The correlation between the coefficient estimates of the two variables is, as expected, really high (-0.88) and may raise some concerns about quasi-extreme multicollinearity (or call for a different specification of the right-hand side of your regression equation).

Kind regards,
Carlo
(Stata 19.0)
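
Editor's note: `estat vce, corr` in the log above reports the correlation between the coefficient estimates. The sketch below, which assumes a re-created version of the same toy data (the seed and the one-line construction of `company_size` are assumptions, not part of the original post), shows one way to also look at the raw correlation between the continuous variable and its derived dummy, and at the variance inflation factors after the same regression.

```stata
* Hypothetical re-creation of the toy data; the seed is an arbitrary assumption
clear
set seed 12345
set obs 100
generate turnover = 1000000*runiform()
generate assets   = 100000*runiform()
generate byte company_size = assets > 50000   // 1 = big firm, 0 = small firm

* Raw correlation between the continuous variable and the dummy derived from it
correlate assets company_size

* Variance inflation factors after the same regression as in the post
regress turnover c.assets i.company_size
estat vif
```

Because the data are random draws, the numbers will differ from those in the log above; only the pattern (a high correlation between `assets` and `company_size`) should be expected to carry over.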


Zeenat Murtaza

Join Date: Aug 2021

Posts: 44
#3

18 Sep 2023, 08:57

Dear Carlo,

I appreciate your detailed response. Thank you; you covered everything in it. Carlo, after an ordinary regression, multicollinearity can be checked with the vif test. In my data, because a lagged dependent variable is used as an independent variable, the ARDL model is estimated using 2-step system GMM, given the endogeneity and autocorrelation issues. In this scenario, how can I detect multicollinearity? Shall I run vif? And if multicollinearity is detected, how shall I resolve it?

Thank you once again.
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17730
#4

18 Sep 2023, 09:25

Zeenat:
quasi-extreme multicollinearity is simply a matter of fact when a variable is derived from another one and both are used as predictors.
Other things being equal, and provided you cannot go for a different specification, I would plug both into the right-hand side of my regression equation and highlight the unavoidable high collinearity in my research report/paper.
As a (hopefully not that) pedantic aside, it is better to declare from the very start the inferential tools you're using/going to use, so that interested readers can reply more positively. Thanks.

Kind regards,
Carlo
(Stata 19.0)
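
Editor's note: on the question of how to look at collinearity when the model itself is estimated by GMM, one possible approach is sketched below. It rests on the fact that a VIF is computed from the regressors alone (each regressor is regressed on the others), so a pooled `regress` on the same right-hand side can be used purely as a descriptive device before `estat vif`. The simulated panel and every variable name in it (`firm`, `year`, `rd`, `ownership`, `control`, `liquidity`) are hypothetical placeholders, and this is not a GMM-specific diagnostic.

```stata
* Simulated panel; all names and values are hypothetical placeholders
clear
set seed 2023
set obs 50
generate firm = _n
expand 8                                   // 8 years per firm
bysort firm: generate year = 2010 + _n
xtset firm year

generate ownership = runiform()
generate byte control = ownership > 0.5    // dummy derived from ownership
generate liquidity = rnormal()
generate rd = 0.5*ownership + 0.3*liquidity + rnormal()
generate rd_l1 = L.rd                      // lagged dependent variable

* Pooled OLS is used only to obtain the VIFs, not for inference
regress rd rd_l1 ownership i.control liquidity
estat vif
```

In this sketch the VIFs for `ownership` and `control` should come out clearly higher than those of the other regressors, which is the unavoidable collinearity described in the reply above.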
Zeenat Murtaza

Join Date: Aug 2021

Posts: 44
#5

18 Sep 2023, 09:44

Thank you so much, Carlo. Just to clarify whether I understood it right: in my model, I used ownership concentration as a continuous independent variable and Firm-control as a factor independent variable. Ownership is determined by the control rights of strategic investors, while Firm-control is derived from the same control rights using a threshold criterion. That is, if strategic investors cross this threshold, the firm is labelled x, and if they fall below it, the firm is labelled y. The purpose of the factor variable is to see how firm x and firm y use liquid reserves (independent variable) to fund R&D (dependent variable). Can I use 2 model specifications for that, as 2 hypotheses are to be tested? The first is the impact of ownership on R&D funding. The second is the interaction of the firm's control type with liquid assets, to see how each category uses its assets to fund R&D. Can you please advise whether it is okay to use two separate specifications to avoid collinearity issues?

R&D = R&D_L1 + Ownership + Liquidity + Control variables + firm-specific effects + time-specific effects + error term. (1)
Specification (1) is for predicting the effect of ownership.

R&D = R&D_L1 + (Control dummy * Liquidity) + Liquidity + Control variables + firm-specific effects + time-specific effects + error term. (2)
Specification (2) is for testing the use of liquidity by each control type for funding R&D.

Many Thanks & Regards,

Zeenat
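
Editor's note: the thread does not show the estimation commands, so the following is only a sketch of how specifications (1) and (2) might be written as two-step system GMM models. It assumes the user-written `xtabond2` command (available from SSC), simulated data, and hypothetical variable names (`rd`, `ownership`, `control`, `liquidity`, `size` as a stand-in control variable). Treating all regressors other than the lagged dependent variable as exogenous in `iv()` is likewise an assumption made for brevity.

```stata
* ssc install xtabond2      // user-written command; install once if needed
* Simulated panel; all names and values are hypothetical placeholders
clear
set seed 2023
set obs 50
generate firm = _n
expand 8
bysort firm: generate year = 2010 + _n
xtset firm year

generate ownership = runiform()
generate byte control = ownership > 0.5    // dummy derived from ownership
generate liquidity = rnormal()
generate size = rnormal()                  // stand-in for the control variables
generate rd = 0.5*ownership + 0.3*liquidity + rnormal()
generate ctrl_liq = control*liquidity      // Control dummy * Liquidity
tabulate year, generate(yr)                // time-specific effects

* Specification (1): effect of ownership
xtabond2 rd L.rd ownership liquidity size yr*, ///
    gmm(L.rd) iv(ownership liquidity size yr*) twostep robust small

* Specification (2): use of liquidity by control type
xtabond2 rd L.rd ctrl_liq liquidity size yr*, ///
    gmm(L.rd) iv(ctrl_liq liquidity size yr*) twostep robust small
```

The `///` line continuations work in a do-file rather than at the interactive prompt. As in equation (2), the control dummy enters only through its interaction with liquidity, so `ownership` and the dummy derived from it never appear in the same model.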
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17730
#6

18 Sep 2023, 10:04

Zeenat:
provided that my crush on corporate finance dates back 35 years (EBITDA was an unknown acronym in those days!), your explanation sounds good.
Just in case, I would recommend discussing your model specifications with your colleagues/teacher/supervisor.

Kind regards,
Carlo
(Stata 19.0)
Zeenat Murtaza

Join Date: Aug 2021

Posts: 44
#7

18 Sep 2023, 10:11

Carlo,

I am truly humbled, and I extend my sincere gratitude for your valuable time and your expert feedback. Good to hear that you have a crush on corporate finance. Your advice is such a relief. God bless you.