Non-linear relationship between IV and DV - Which is the right solution: stepwise, piecewise or nl- regression?

Marian Dobranschi

Join Date: Mar 2015

Posts: 18
#1

Non-linear relationship between IV and DV - Which is the right solution: stepwise, piecewise or nl- regression?

03 Dec 2015, 06:57

Dear Statalist,

1.Context of the problem:
I'm trying to analyze the impact of excise duty (IV) increase with one CZK on the average daily price of gasoline(DV). Using scatter there is a clear non-linear relationship between the variables.
The graph looks more like a step function. I tried to transform the data (log, square, etc), still the same graph with non-linear relationship, so the regular OLS estimation is not applicable.

2. Which method to use? Broken regression or Stepwise? I tried to run stepwise and the results are the same as from linear regression.

3. I tried to use piecewise regression, but unfortunately Stata is not helping me (or I don't know how to use it). When I use mkspline (make spline command) to create a new variable, after running the regression one of the IV is omitted due to collinearity. To be more precise: the data available is for two years 2009-2010 (daily obs.), the price of gasoline varies each day, but the excise duty has only one change from 2010 onward increases with 1 CZK per liter (from 11.84 to 12.84 CZK/l).
This is the result of my mkspline command:

mkspline exg1 11.84 exg2=exciseg

. reg priceg crude exg1 exg2

reg priceg crude exg1 exg2
note: exg1 omitted because of collinearity

Source | SS df MS Number of obs = 730
-------------+------------------------------ F( 2, 727) = 819.41
Model | 3914.31085 2 1957.15543 Prob > F = 0.0000
Residual | 1736.42686 727 2.38848261 R-squared = 0.6927
-------------+------------------------------ Adj R-squared = 0.6919
Total | 5650.73771 729 7.75135488 Root MSE = 1.5455

------------------------------------------------------------------------------
priceg | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
crude | 1.443327 .0675808 21.36 0.000 1.31065 1.576004
exg1 | 0 (omitted)
exg2 | .7173191 .1891469 3.79 0.000 .3459798 1.088658
_cons | 17.02441 .5025775 33.87 0.000 16.03773 18.01108
------------------------------------------------------------------------------

I intend to create a new IV exg1 that is 11.84 CZK corresponding to 2009 and exg2 the variable that is equal with the whole period. Crude = is the crude oil price.
Running the regression exg1 is excluded due to the collinearity. Please help me - where I am wrong? How I'm suppose to define a correct spline??

4. Is there possible to use nl command?? It is important to say that all the parameters are available, so I can include also knots, because it is obvious when the excise duty increases.

Thank you very much for your help.
Tags: None
Clyde Schechter

Join Date: Apr 2014

Posts: 30191
#2

03 Dec 2015, 09:02

If exg takes on only two values, 11.84 and 12.84, then a spline will lead to exactly the problem you describe. I think I would treat exg as a discrete variable, and create an integer-valued dichotomy from it first, so that you can use factor-variable notation and -margins-. So:

Code:

egen byte _exg = group(exg), label regress priceg crude i._exg margins _exg

The variable _exg that the -egen- statement creates will actually take on the values 1 and 2, but it will be labeled as 11.84 and 12.84, so your regression and -margins- outputs will be easier to read. Since it happens that the two values of exg differ by exactly 1, you will still be able to interpret the coefficient of 1._exg in the -regress- output as the effect of a 1 CZK increase in the excise tax.
Comment
Marian Dobranschi

Join Date: Mar 2015

Posts: 18
#3

03 Dec 2015, 09:51

Thank you for your reply. Much appreciated. This forum is very helpful.
Comment

Announcement

Non-linear relationship between IV and DV - Which is the right solution: stepwise, piecewise or nl- regression?

Comment

Comment