
  • Is the level easier to find significant than the log?

    Hi all,

    I'm running a difference-in-differences regression with leads and lags (an event study) with fixed effects, using -reghdfe- from SSC. I notice different results when I use the log versus the level of my dependent variable Y, everything else equal (my independent variable of interest is the interaction between a time dummy and a treatment dummy). My Y takes values from 0 to 10000 and is very right-skewed, so I think I should take logs, but I want to compare the two sets of results.
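    For concreteness, here is a minimal sketch of this kind of specification. The names Y, treat, reltime, unit, and year are placeholders, not my actual variables, and I assume leads start at t = -5 so the event-time factor variable can be shifted to be non-negative (with t = -1 as the base period):

    Code:
    gen etime = reltime + 5                 // shift event time so it is non-negative; t = -1 becomes 4
    reghdfe Y ib4.etime#i.treat, absorb(unit year) vce(cluster unit)
    gen lnY = ln(Y)                         // the log version, everything else equal
    reghdfe lnY ib4.etime#i.treat, absorb(unit year) vce(cluster unit)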

    There are 3 things that confuse me:
    1. I get a similar trend (if you draw lines connecting the point estimates) but not exactly the same trend across time, as shown in the graphs below (the first uses the level of Y, the second the log of Y). For instance, from -5 to -4 one has an increasing point estimate and the other a decreasing one (likewise from 0 to 1). Why is the trend not the same?
    One intuition I have is: for small y below 1, the log transformation gives a negative value, so when such a y moves from below 1 to above 1, the effect is larger in the log version than in the level version (a toy calculation below illustrates this). I checked this by dropping all observations with y < 1, and I did then get the same trend in the connected point estimates, but the exact mechanics are still not clear to me, and the following two questions persist.
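    The toy calculation, just using Stata's display as a calculator:

    Code:
    display 1 - 0.01              // level change: .99, tiny on a 0-10000 scale
    display ln(1) - ln(0.01)      // log change: 4.61, huge relative to ln(10000) = 9.21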

    2. Why are the significance levels different at different times? For instance, at t = 4 the level estimate is insignificant whereas the log estimate is significant.

    3. How can the direction be the opposite? For instance, at t = 5 the level version says the treatment group has a higher average Y than the control group after the treatment, whereas the log version says the reverse. I understand that with a log outcome the coefficient is interpreted as a percentage change, so it depends on the size of the change and the baseline at t = -1, but at least the direction (higher/lower) should be the same. Why do I observe otherwise in the data?
    [Figure 1: event-study estimates, level Y (20240808_dd_2intfe_clonet_zkt_level_winsor.png)]
    [Figure 2: event-study estimates, log Y (20240808_dd_2intfe_clonet_zkt_winsor.png)]


    Can you share your insights?
    Thank you!

  • #2
    1. The logarithm is a highly non-linear function when applied over a wide range. If you draw a graph of y = log(x) over the range from .0001 to 10000, you will see that for x < 1 the curve is nearly vertical; by the time you get to x > 1000, it is nearly horizontal. So the ways in which the log and the level change with the values of the predictors can be very, very different. The trends do not have to be similar at all.
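    A quick way to draw that graph in Stata (a minimal sketch; any plot of the function makes the point):

    Code:
    twoway function y = ln(x), range(0.0001 1)      // nearly vertical for x < 1
    twoway function y = ln(x), range(1 10000)       // nearly horizontal for large x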

    2. This is just another good example of why statistical significance is a poor criterion for assessing the results of models. If you read statistical significance as "there is an effect" and non-significance as "there is no effect," then there appears to be a contradiction. But that is because there is, in fact, no correspondence between statistical significance and the presence of an effect. p-values are inherently continuous statistics, and statistical significance just imposes an arbitrary cutoff on them. Nor are p-values a measure of effect size: they are influenced by effect size, but also by the variance of the outcome and by the sample size, all smashed together into a single number that is misleading if you pretend it is about any one of them. In other words, if you think about statistical significance correctly, this question never arises.

    3. Because ratios and differences do not have to go the same way when the baselines differ. Your interaction term gives you either a difference in differences (level model) or a ratio of ratios (log model). Say the pre-treatment levels in the two groups are 50 and 5, and the treatment effects are +10 and +5, respectively. The difference in differences is 10 - 5 = 5 > 0. But the ratio of ratios is (60/50)/(10/5) = 1.2/2 = 0.6 < 1, and when you log-transform a ratio of ratios that is less than 1, you get a negative number. So the level result is positive and the log result is negative. It is also possible to construct examples where it works the opposite way.
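    You can verify those numbers directly with display:

    Code:
    display (60 - 50) - (10 - 5)         // difference in differences: 5 > 0
    display (60/50) / (10/5)             // ratio of ratios: .6 < 1
    display ln((60/50) / (10/5))         // log of the ratio of ratios: -.51 < 0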

    Added: The other thing to remember is that you, as the investigator/analyst, do not have the freedom to choose between a level model and a log model when the variable in question, as here, has a wide enough range that the non-linearity of the logarithm is pronounced. At most one of the two models can be a reasonable description of the data: if the log model fits the data, the level model will not, and vice versa. People are accustomed to thinking of a log transformation as a convenient way of expressing results as ratio (percentage) effects instead of difference effects. If the range of the variable is small enough that the non-linearity of log() is minimal, you can do that. But in your situation the non-linearity is enormous, and that inevitably invalidates at least one of the two models. They can't both be right.
    Last edited by Clyde Schechter; 30 Aug 2024, 20:45.



    • #3
      Originally posted by Clyde Schechter (#2)
      Hi Clyde,
      Thank you for the reply; it is super helpful! I get all your points about why the two sets of results look different and how far I can trust the significance.
      I do have some more doubts regarding your added comment. In my case, since the range of Y is very large, log(Y) involves strong non-linearity, which seems to say the log model is not valid. But on the other hand:

      1. Linear regression needs the residuals to be normal. So if my Y is highly right-skewed, does that mean my residuals are also highly right-skewed, and thus that taking logs is necessary to satisfy the normality assumption?
      ---- From your reply in another thread about OLS, which says that when the sample is large enough normality is not necessary because the central limit theorem comes to the rescue: my Y is very skewed (skewness = 3.436454, kurtosis = 14.62237), but I have a sample size of 4,280,972 (I include some FEs, but I guess I should look at the total number of observations), so this should not be a problem here, right?

      2. Is it possible that my residual distribution violates the homoscedasticity assumption? I think taking logs would help in that case, because it evens out the variance across different levels of X. Or can this be dealt with by clustering instead, so it has nothing to do with the log? (See the sketch below for what I mean.)
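      To be concrete, what I have in mind is something like this (same placeholder names as in #1; country is a hypothetical cluster variable, and country#year absorbs the country-year FE):

      Code:
      reghdfe lnY ib4.etime#i.treat, absorb(unit country#year) vce(cluster country)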

      Thank you!
      Last edited by Chengmou Lei; 31 Aug 2024, 08:43.



      • #4
        Chengmou: You said Y ranges from 0 to 10000. Is the value 0 possible? If so, what are you doing with log(0)? It can matter a lot.

        The underlying parallel trends assumption is not usually invariant to functional form. It seems closer to being satisfied using log(Y), but it still matters what you did with log(0).

        BTW, normality is not needed for OLS to have good properties. You haven't shown us the sample size or any estimates, so it's hard to say more. But if N is pretty large, you shouldn't worry too much about normality (or heteroskedasticity or serial correlation if you are clustering).
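        For example, a quick check along these lines (a sketch; Y stands for the outcome variable):

        Code:
        count if Y == 0          // any exact zeros?
        summarize Y, detail      // minimum, skewness, kurtosis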



        • #5
          Originally posted by Jeff Wooldridge (#4)
          Hi Jeff,

          Thanks for the reply! Indeed, I realized later that with a large sample I do not need to worry about normality (or heteroskedasticity or serial correlation, if I cluster). I added the information about my sample size in my response to Clyde earlier today: it is 4,280,972. But my regression does include some FEs, for instance country-year FE. Should I look at the number of observations as a whole, or at the number of observations within each country-year group?

          For log(0), I just checked. My Y is a concentration measure that theoretically ranges from 0 to 10000, but in my data the minimum is 5.53e-06, so I never actually take the log of 0. I will keep this in mind next time.
