Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Non-linear relationship between IV and DV - Which is the right solution: stepwise, piecewise or nl- regression?

    Dear Statalist,

    1.Context of the problem:
    I'm trying to analyze the impact of excise duty (IV) increase with one CZK on the average daily price of gasoline(DV). Using scatter there is a clear non-linear relationship between the variables.
    The graph looks more like a step function. I tried to transform the data (log, square, etc), still the same graph with non-linear relationship, so the regular OLS estimation is not applicable.

    2. Which method to use? Broken regression or Stepwise? I tried to run stepwise and the results are the same as from linear regression.

    3. I tried to use piecewise regression, but unfortunately Stata is not helping me (or I don't know how to use it). When I use mkspline (make spline command) to create a new variable, after running the regression one of the IV is omitted due to collinearity. To be more precise: the data available is for two years 2009-2010 (daily obs.), the price of gasoline varies each day, but the excise duty has only one change from 2010 onward increases with 1 CZK per liter (from 11.84 to 12.84 CZK/l).
    This is the result of my mkspline command:

    mkspline exg1 11.84 exg2=exciseg

    . reg priceg crude exg1 exg2

    reg priceg crude exg1 exg2
    note: exg1 omitted because of collinearity

    Source | SS df MS Number of obs = 730
    -------------+------------------------------ F( 2, 727) = 819.41
    Model | 3914.31085 2 1957.15543 Prob > F = 0.0000
    Residual | 1736.42686 727 2.38848261 R-squared = 0.6927
    -------------+------------------------------ Adj R-squared = 0.6919
    Total | 5650.73771 729 7.75135488 Root MSE = 1.5455

    ------------------------------------------------------------------------------
    priceg | Coef. Std. Err. t P>|t| [95% Conf. Interval]
    -------------+----------------------------------------------------------------
    crude | 1.443327 .0675808 21.36 0.000 1.31065 1.576004
    exg1 | 0 (omitted)
    exg2 | .7173191 .1891469 3.79 0.000 .3459798 1.088658
    _cons | 17.02441 .5025775 33.87 0.000 16.03773 18.01108
    ------------------------------------------------------------------------------



    I intend to create a new IV exg1 that is 11.84 CZK corresponding to 2009 and exg2 the variable that is equal with the whole period. Crude = is the crude oil price.
    Running the regression exg1 is excluded due to the collinearity. Please help me - where I am wrong? How I'm suppose to define a correct spline??


    4. Is there possible to use nl command?? It is important to say that all the parameters are available, so I can include also knots, because it is obvious when the excise duty increases.

    Thank you very much for your help.

  • #2
    If exg takes on only two values, 11.84 and 12.84, then a spline will lead to exactly the problem you describe. I think I would treat exg as a discrete variable, and create an integer-valued dichotomy from it first, so that you can use factor-variable notation and -margins-. So:

    Code:
    egen byte _exg = group(exg), label
    regress priceg crude i._exg
    margins _exg
    The variable _exg that the -egen- statement creates will actually take on the values 1 and 2, but it will be labeled as 11.84 and 12.84, so your regression and -margins- outputs will be easier to read. Since it happens that the two values of exg differ by exactly 1, you will still be able to interpret the coefficient of 1._exg in the -regress- output as the effect of a 1 CZK increase in the excise tax.

    Comment


    • #3
      Thank you for your reply. Much appreciated. This forum is very helpful.

      Comment

      Working...
      X