
  • Use of a continuous variable and a dummy derived from it in regression

    Hi,

    I hope you are doing well. I apologise for this silly question, and for posting it on the Stata forum, as it is more research-related. Can you please advise whether a continuous variable (e.g., firm size) can be used in a regression model alongside a dummy derived from the same variable (e.g., young firms)? The dummy is used because otherwise the output won't differentiate the results for the two categories of firms. Firm size (measured by assets), on the other hand, is the main independent variable, so it enters as a continuous variable to capture how firms with relatively larger assets may influence the dependent variable; using a dummy as the main independent variable may have its own disadvantages. Please advise.

  • #2
    Zeenat:
    provided that I got you correctly, the answer is: yes, you can, as in the following toy example:
    Code:
    . set obs 100
    Number of observations (_N) was 0, now 100.
    
    . g id=_n
    
    . g turnover=1000000*runiform()
    
    . g assets=100000*runiform()
    
    . g company_size=0 if assets<=50000
    
    . replace company_size=1 if company_size==.
    
    . label define company_size 0 "Small firm" 1 "Big firm", add
    
    . label val company_size company_size
    
    . regress turnover c.assets i.company_size
    
          Source |       SS           df       MS      Number of obs   =       100
    -------------+----------------------------------   F(2, 97)        =      0.24
           Model |  4.7361e+10         2  2.3680e+10   Prob > F        =    0.7834
        Residual |  9.3863e+12        97  9.6766e+10   R-squared       =    0.0050
    -------------+----------------------------------   Adj R-squared   =   -0.0155
           Total |  9.4337e+12        99  9.5290e+10   Root MSE        =    3.1e+05
    
    ------------------------------------------------------------------------------
        turnover | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
    -------------+----------------------------------------------------------------
          assets |   1.625111   2.322953     0.70   0.486    -2.985308    6.235529
                 |
    company_size |
       Big firm  |  -81018.14   131890.3    -0.61   0.540    -342783.8    180747.5
           _cons |   458984.4   71672.33     6.40   0.000     316734.7    601234.2
    ------------------------------------------------------------------------------
    
    . estat vce, corr
    
    Correlation matrix of coefficients of regress model
    
                 |                  1.          
            e(V) |   assets  compan~e     _cons 
    -------------+------------------------------
          assets |   1.0000                     
    1.company_~e |  -0.8805    1.0000           
           _cons |  -0.7625    0.4439    1.0000
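    The correlation between the coefficients of the two predictors (-0.8805), which with only two predictors mirrors, with the opposite sign, the correlation between assets and company_size themselves, is, as expected, really high and may raise some concerns about quasi-extreme multicollinearity (or call for a different specification of the right-hand side of your regression equation).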
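One way to put a number on that concern is sketched below. It assumes simulated data built like the toy example above (the seed and the byte storage type are illustrative choices, not part of the original): correlate reports the correlation between the continuous predictor and the dummy derived from it, and estat vif after regress reports the variance inflation factors.

```stata
* Simulated toy data, built like the example above (seed is arbitrary)
clear
set seed 2024
set obs 100
generate assets   = 100000*runiform()
generate turnover = 1000000*runiform()
* Dummy derived from the continuous variable: 1 = "Big firm", 0 = "Small firm"
generate byte company_size = (assets > 50000)

* Correlation between the continuous predictor and the dummy derived from it
correlate assets company_size
display "corr(assets, company_size) = " %6.4f r(rho)

* OLS with both predictors, then variance inflation factors
regress turnover assets company_size
estat vif
```

With only two predictors, both VIFs equal 1/(1 - r^2), where r is the correlation between them, so the two diagnostics tell the same story.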
    Kind regards,
    Carlo
    (Stata 19.0)



    • #3
      Dear Carlo,

      I appreciate your detailed response. Thank you; you covered everything. Carlo, when running a regression, multicollinearity can be checked with the VIF test. In my data, because a lagged dependent variable is used as an independent variable, the ARDL model is estimated with two-step system GMM, given the endogeneity and autocorrelation issues. In this scenario, how can I detect multicollinearity? Should I run VIF? And, if multicollinearity is detected, how should I resolve it?

      Thank you once again.



      • #4
        Zeenat:
        quasi-extreme multicollinearity is simply a matter of fact when a variable is derived from another one and both are used as predictors.
        Other things being equal, and provided you cannot go for a different specification, I would plug both into the right-hand side of my regression equation and highlight the unavoidable high collinearity in my research report/paper.
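On the question in #3: a VIF is computed from the regressors alone (VIF_j = 1/(1 - R_j^2), with R_j^2 from regressing predictor j on the other predictors), so its value does not depend on whether the outcome equation is then fitted by OLS or by system GMM. The minimal sketch below computes it by hand with auxiliary regressions, assuming simulated data and hypothetical variable names (ownership, control_dummy, liquidity). Treat it as a rough diagnostic only, since identification under GMM also depends on the instruments.

```stata
* Hypothetical, simulated regressors (names are illustrative only)
clear
set seed 4321
set obs 500
generate ownership          = runiform()          // continuous variable
generate byte control_dummy = (ownership > 0.5)   // dummy derived from it
generate liquidity          = rnormal()           // unrelated regressor

* VIF for each regressor: regress it on the others, then VIF = 1/(1 - R^2)
local xvars ownership control_dummy liquidity
foreach v of local xvars {
    local others : list xvars - v
    quietly regress `v' `others'
    scalar vif_`v' = 1/(1 - e(r2))
    display "VIF(`v') = " %6.2f vif_`v'
}
```

In a real application one would list the model's actual right-hand-side variables (lagged dependent variable and controls included) in xvars; neither the outcome nor the estimator enters the calculation.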
        As a (hopefully not that) pedantic aside, it is better to declare from the very start the inferential tools you're using/going to use, so that interested readers can reply more helpfully. Thanks.
        Kind regards,
        Carlo
        (Stata 19.0)



        • #5
          Thank you so much, Carlo. Just to check that I understood correctly: in my model, I use ownership concentration as a continuous independent variable and Firm-control as a factor independent variable. Ownership is measured by the control rights of strategic investors, while Firm-control is derived from those control rights using a threshold: if strategic investors cross this threshold, the firm is labelled x; if they stay below it, the firm is labelled y. The purpose of the factor variable is to see how x and y firms use liquid reserves (independent variable) to fund R&D (dependent variable). Can I use two model specifications, since two hypotheses are to be tested? First, the impact of ownership on R&D funding. Second, the interaction of the firm's control type with liquid assets, to see how each category uses its assets to fund R&D. Can you please advise whether it is okay to use two separate specifications to avoid collinearity issues?

          R&D = R&D_L1 + Ownership + Liquidity + Control variables + firm-specific effects + time-specific effects + error term. (1) ---------- Specification for predicting the effect of ownership.

          R&D = R&D_L1 + (Control dummy * Liquidity) + Liquidity + Control variables + firm-specific effects + time-specific effects + error term. (2) ------- Specification for testing the use of liquidity by each control type for funding R&D.
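For specification (2), Stata's factor-variable notation can enter the interaction in one go, and margins can then report the liquidity slope separately for each control type. The sketch below is hedged: it uses simulated cross-sectional data, hypothetical variable names (rd, liquidity, control), and plain regress in place of two-step system GMM, leaving out the lagged dependent variable, controls, and firm and time effects. Note that ## also adds the control dummy's main effect, which specification (2) as written omits; # alone would enter only the interaction.

```stata
* Simulated cross-sectional toy data; names are hypothetical
clear
set seed 777
set obs 400
generate byte control = runiform() > 0.5   // 1 = "x" (above threshold), 0 = "y"
generate liquidity = rnormal()
* True liquidity slopes: 0.5 for control==0, 0.5 + 0.3 = 0.8 for control==1
generate rd = 1 + 0.5*liquidity + 0.2*control + 0.3*control*liquidity + rnormal()

label define ctrl 0 "y" 1 "x"
label values control ctrl

* ## enters both main effects and the interaction
regress rd i.control##c.liquidity

* Liquidity slope for each control type
margins control, dydx(liquidity)

* Do the two slopes differ? (the slope difference is the interaction coefficient)
test 1.control#c.liquidity
```

The margins line gives one liquidity slope per control type, which is the quantity the second hypothesis is about; the test line checks whether the two slopes are equal.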

          Many Thanks & Regards,

          Zeenat



          • #6
            Zeenat:
            given that my crush on corporate finance dates back 35 years (EBITDA was an unknown acronym in those days!), your explanation sounds good.
            Just in case, I would recommend discussing your model specifications with your colleagues/teacher/supervisor.
            Kind regards,
            Carlo
            (Stata 19.0)



            • #7
              Carlo,

              I am truly humbled, and I extend my sincere gratitude for your valuable time and expert feedback. Good to hear that you have a crush on corporate finance. Your advice is such a relief; God bless you.
