I am analysing the effect of intervention (0 = no intervention, 1 = intervention) on terrorism, using country-year data.
It has been suggested that I keep the terrorism variable as a count variable. However, it is highly skewed, and the model also shows heteroskedasticity. I therefore took the natural logarithm of the variable: gen llta = log(ta+0.00001). I know that adding a small constant like this can be problematic when the variable contains zeros. The new variable, however, looks approximately normally distributed, and the R2 increases substantially.
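To see why the constant matters when ta contains zeros, here is a small arithmetic sketch (in Python rather than Stata, purely for illustration; the helper name log_shift is hypothetical). It compares the posted transform log(ta+0.00001) with the common alternative log(ta+1). With the tiny constant, a country-year with 0 attacks is placed about 11.5 log units below one with 1 attack, while the step from 1 to 100 attacks is only about 4.6 units, so the zeros become extreme low values that can dominate a linear regression.

```python
import math

SMALL = 0.00001  # constant from the posted command: gen llta = log(ta+0.00001)


def log_shift(ta: float, c: float) -> float:
    """Natural log of ta shifted by a constant c (c > 0 so that ta = 0 is defined)."""
    return math.log(ta + c)


# Compare the posted transform with log(ta + 1) for a few attack counts
for ta in [0, 1, 2, 10, 100]:
    print(f"ta={ta:>3}  log(ta+0.00001)={log_shift(ta, SMALL):8.3f}  "
          f"log(ta+1)={log_shift(ta, 1):6.3f}")

# Distance on the log scale: 0 -> 1 attack versus 1 -> 100 attacks
print("0 -> 1:", round(log_shift(1, SMALL) - log_shift(0, SMALL), 2))
print("1 -> 100:", round(log_shift(100, SMALL) - log_shift(1, SMALL), 2))
```

The choice of constant is arbitrary, and results on the logged scale can change noticeably if it is changed (e.g. 0.00001 versus 1), which is one reason the count-variable suggestion is often made.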
I am unsure whether it is appropriate to log the terrorism variable. When I use different intervention variables (hu1, hu2, hu3), the results sometimes differ between the logged and unlogged models. For instance, the coefficient on hu3 is positive (intervention increases terrorism) when the dependent variable is logged, but negative and significant (intervention reduces terrorism) when it is not logged. I am not sure why this happens.
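One possible mechanism for a sign flip like the one for hu3 can be shown with made-up data (hypothetical numbers, not the poster's). With a single 0/1 regressor plus a constant, the OLS slope is just the difference in group means, and a Poisson regression coefficient is the log of the ratio of group means. If the non-intervention group has many zeros and one very large count, while the intervention group has small non-zero counts, OLS on raw counts gives a negative effect, OLS on log(ta+0.00001) gives a positive one (driven by the zeros mapped to about -11.5), and a count model agrees in sign with the raw-count comparison. This is only a sketch of how the pattern can arise; whether it explains the actual results depends on the data.

```python
import math
from statistics import mean

SMALL = 0.00001  # constant from the posted command


def ols_binary_slope(y0, y1):
    """OLS slope of y on a 0/1 dummy plus intercept = mean(y | 1) - mean(y | 0)."""
    return mean(y1) - mean(y0)


def poisson_binary_coef(y0, y1):
    """Poisson MLE coefficient on a 0/1 dummy plus intercept = ln(mean1 / mean0)."""
    return math.log(mean(y1) / mean(y0))


# Hypothetical country-years, invented for illustration only
no_interv = [0, 0, 100]  # dummy = 0: mostly zeros, one very large count
interv = [1, 1, 1]       # dummy = 1: no zeros, small counts

raw = ols_binary_slope(no_interv, interv)
logged = ols_binary_slope([math.log(y + SMALL) for y in no_interv],
                          [math.log(y + SMALL) for y in interv])
pois = poisson_binary_coef(no_interv, interv)

print(f"OLS on ta:              {raw:7.2f}")
print(f"OLS on log(ta+0.00001): {logged:7.2f}")
print(f"Poisson on ta:          {pois:7.2f}")
```

In this toy case the raw-count OLS slope is about -32.3, the logged OLS slope is about +6.1, and the Poisson coefficient is about -3.5: the logged model is mostly measuring the change in the share of zeros, not the change in the number of attacks.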
I have attached boxplot.docx, which shows a box plot of terrorist attacks and illustrates the skewness.
I have also attached forum_ta.txt, which shows the regressions with and without logging, as well as the hettest output.