
  • Log values in dependent variable

    Hello There

    I am in the process of writing an assignment for an econometrics course and I have a couple of questions:

    1) I am estimating a multiple linear regression y x1 x2 x3... To check that my model is correctly specified, I run a RESET test (ovtest) and get a p-value of 0, rejecting the null that my model has no omitted variables. When I transform my dependent variable from y to log(y), the p-value changes to 0.8, so the test no longer rejects. My question is whether this creates problems when my dependent variable is a percentage (% of waste recycled). Since most of the values are rather low percentages, the data are quite skewed, so it makes sense to use a log to correct the distribution. However, if I keep the percentages as decimals, all of my logged values become negative. I can avoid this by multiplying the dependent variable by 100 and keeping values as whole numbers (e.g. 0.33 = 33%).
    So, I wanted to ask: is it valid to log the dependent variable when it is a percentage, and is it okay to multiply it by 100 so that the logged values stay positive?
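
    For concreteness, rescaling before logging only shifts the intercept, since ln(100*y) = ln(100) + ln(y); the slope coefficients are identical either way. A minimal sketch of what I mean (the variable names recycle_share and x1-x3 are placeholders, not my actual data):

    ```
    * Assumed names: recycle_share is the decimal share in (0,1], x1 x2 x3 are regressors.
    * ln(100*y) = ln(100) + ln(y), so only the constant term changes when rescaling.
    gen ln_recycle = ln(100 * recycle_share)
    regress ln_recycle x1 x2 x3
    ```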

    2) So far I have kept the model as log(y*100). Since I have panel data, and since I know regions have many time-invariant characteristics that may affect waste recycled (e.g. geography), I am using an FE model - I also ran a Hausman test to show statistically that it is preferred over an RE model. In class we also learnt that there are situations where a first-differences model is even better, e.g. with positive autocorrelation, or when T is large and N is small. I tested for autocorrelation as follows:

    gen dy = d.y
    gen dx1 = d.x1
    gen dx2 = d.x2
    .... etc.

    then: reg d.y d.x1 d.x2.... timedummy2 timedummy3 timedummy(n)....

    I then predict the residuals, lag them (which costs me one period), and run the regression: reg resid resid_1 timedummy3 timedummy4 timedummy(n)....

    Looking at the estimated coefficient on the lagged residual (rho-hat), I could in theory tell whether there is AR(1) autocorrelation. The results show a small and insignificant value, suggesting no real evidence of autocorrelation. As a result, I chose to run both an FE and an FD model and compare the two (both with robust standard errors, e.g. xtreg, robust, since a previous test showed heteroskedasticity). What I found is that some of the beta coefficients on my independent variables changed from positive to negative. Does anyone know whether it is possible for coefficients to change sign between FE and FD?
    The only reason I could come up with is the quite high standard errors: some betas have confidence intervals that span both negative and positive values.
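
    For reference, the comparison I ran looks roughly like this (the panel identifiers region and year, and the variable names, are placeholders for my actual data):

    ```
    * Assumed names: region, year, ln_recycle, x1 x2 x3.
    xtset region year
    xtreg ln_recycle x1 x2 x3 i.year, fe vce(robust)         // fixed effects, robust SEs
    estimates store fe
    regress d.ln_recycle d.x1 d.x2 d.x3 i.year, vce(robust)  // first differences, robust SEs
    estimates store fd
    estimates table fe fd, se                                // coefficients side by side
    ```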

    Besides this, I am unsure whether I am misspecifying my model or whether I have any notable omitted variables.
    Any help would be highly appreciated

  • #2
    A logarithm that is negative just implies that the original values were below 1. That is a benign fact, not a problem to be remedied.

    That's a hint, but as you have declared this is an assignment, please note our policy as explained at #4 of https://www.statalist.org/forums/help#adviceextras

    Comment


    • #3
      Oh sorry, I hadn't seen that policy. Please ignore this post then.
      Thank you very much for the heads up!

      Comment
