Hello guys,
A few weeks ago I started my first econometrics project at my university, in which I want to measure "The (Medium-Term) Effect of Smoking Cessation on Body Weight/BMI" using a panel data set.
My biggest problem right now is that I am not sure how to identify the effect I am interested in, although I have spent a lot of time researching and looking for a suitable model online.
The data set covers 2002 to 2012, and every two years participants answer questions about their smoking and body weight. Of course, only a small share of the participants take part in all waves.
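A minimal Stata sketch of how a biennial, unbalanced panel like this can be declared, using a hypothetical toy data set (not real data; the variable names `id`, `year` and `weight` are assumed to match the real data). With `xtset id year` the time unit is one calendar year, so `L2.` and `F2.` point to the previous and next wave, which is what the `l2.smoking` in the prof's model relies on; with `xtset id year, delta(2)` the same waves would be `L.` and `F.`. A skipped wave shows up as a missing lag rather than silently reusing the wave before.

```stata
* Hypothetical toy panel (not real data): person 2 skips the 2004 wave
clear
input id year weight
1 2002 80
1 2004 82
1 2006 84
2 2002 70
2 2006 72
end

* time unit = calendar year, so L2./F2. = previous/next wave (two years apart)
xtset id year
* shows which waves each person took part in
xtdescribe

* weight in the previous wave; missing if that wave was skipped
gen prev_weight = L2.weight
list, sepby(id)
```

For person 2 in 2006, `prev_weight` is missing because the 2004 wave is absent, not 70 from 2002.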
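First, I tried the model my prof suggested: regressing weight in the current and later waves (t, t+2, t+4, ...) on smoking, for people who smoked in the previous wave (L2.smoking == 1), with standard errors clustered by id.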
But the estimates seemed too small and implausible: at t+4 the estimate decreased, which I don't think should happen. Moreover, I am not sure how this model deals with people who started smoking again between the wave of L2.smoking and t+4.
Then I tried two models with (a) a dummy for the last year in which a person smoked or (b) a dummy for the first year of cessation. With these dummies I no longer had the concern from the first model, where I was not sure whether people had started smoking again.
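A minimal sketch of how the two dummies, plus a "sustained quit" flag aimed at the relapse concern with the first model, could be built with time-series operators. It uses a hypothetical toy panel (not real data), assumes the data are `xtset id year` with `smoking` coded 0/1, and the names `last_smoke`, `first_quit` and `quit_sustained` are made up for illustration. Note that in the toy data person 2 is flagged by both (a) and (b) even though they smoke again in 2006, so the dummies on their own may not rule out relapse; the sustained-quit flag codes that person as 0.

```stata
* Hypothetical toy panel (not real data):
* id 1 quits for good, id 2 quits then relapses, id 3 always smokes
clear
input id year smoking weight
1 2002 1 80
1 2004 0 82
1 2006 0 84
1 2008 0 85
2 2002 1 70
2 2004 0 71
2 2006 1 70
2 2008 1 70
3 2002 1 90
3 2004 1 90
3 2006 1 91
3 2008 1 91
end
xtset id year

* (a) last wave in which the person smoked: smokes at t, not at t+2
gen byte last_smoke = smoking == 1 & F2.smoking == 0 if !missing(smoking, F2.smoking)

* (b) first wave of cessation: does not smoke at t, smoked at t-2
gen byte first_quit = smoking == 0 & L2.smoking == 1 if !missing(smoking, L2.smoking)

* sustained quit: as (b), and still not smoking at t+2 and t+4
gen byte quit_sustained = first_quit == 1 & F2.smoking == 0 & F4.smoking == 0 if !missing(first_quit, F2.smoking, F4.smoking)

list id year smoking last_smoke first_quit quit_sustained, sepby(id)

* weight at t+4 for people who smoked at t-2; relapsers (id 2) are coded 0, not 1
reg F4.weight quit_sustained if L2.smoking == 1, vce(cluster id)
```

With the real data, `$vars` would be added to the regression, and the abstinence window in `quit_sustained` adjusted for each horizon.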
I also tried including i.year and even fixed effects, but again the estimates were small and sometimes far from significant.
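For comparison, a minimal two-way fixed-effects sketch on a hypothetical toy panel (not real data): `xtreg, fe` removes each person's time-invariant characteristics, and `i.year` absorbs shocks common to a wave. Regressing weight on current smoking status is an assumption about the specification. With the real data, controls would enter as `xtreg weight smoking $vars i.year, fe vce(cluster id)`, and any control that does not vary within a person is dropped by the fixed effects.

```stata
* Hypothetical toy panel (not real data), same layout as in the post
clear
input id year smoking weight
1 2002 1 80
1 2004 0 82
1 2006 0 84
1 2008 0 85
2 2002 1 70
2 2004 0 71
2 2006 1 70
2 2008 1 70
3 2002 1 90
3 2004 1 90
3 2006 1 91
3 2008 1 91
end
xtset id year

* person fixed effects (fe) plus wave dummies (i.year), SEs clustered by person
xtreg weight smoking i.year, fe vce(cluster id)
```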
My last idea was a difference-in-differences design, with always-smokers as the control group and people who quit smoking in 2004 or 2006 as the treatment group. I am aware that the "treatment" is not exogenous, but the results looked good.
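A minimal sketch of this difference-in-differences design on a hypothetical toy panel (not real data). Assumptions not stated in the post: the quit year is the first wave observed as a non-smoker right after a smoking wave, quitters who smoke again in a later wave are excluded, and the estimate comes from a two-way fixed-effects regression with a cohort-specific post-quit indicator (`quit_post`; this and the other new variable names are made up). The block starts with `clear`, so it replaces whatever data are in memory.

```stata
* Hypothetical toy panel (not real data): ids 1-2 quit in 2004/2006,
* ids 3-4 always smoke, id 5 quits in 2004 but relapses
clear
input id year smoking weight
1 2002 1 80
1 2004 0 82
1 2006 0 84
1 2008 0 85
2 2002 1 70
2 2004 1 70
2 2006 0 72
2 2008 0 73
3 2002 1 90
3 2004 1 90
3 2006 1 91
3 2008 1 91
4 2002 1 60
4 2004 1 61
4 2006 1 61
4 2008 1 62
5 2002 1 75
5 2004 0 76
5 2006 1 75
5 2008 1 75
end
xtset id year

* first wave observed as a non-smoker right after a smoking wave
gen byte first_quit = smoking == 0 & L2.smoking == 1 if !missing(smoking, L2.smoking)

* person-level: quit year, always-smoker flag, smoking again after quitting
bysort id (year): egen quit_year = min(cond(first_quit == 1, year, .))
by id: egen byte always_smoker = min(smoking)
by id: egen byte relapsed = max(cond(year > quit_year, smoking, .))

* treatment: quit in 2004 or 2006 and stayed quit; control: always-smokers
gen byte treated = inlist(quit_year, 2004, 2006) & relapsed == 0
keep if treated == 1 | always_smoker == 1

* 1 from the quit wave onward for treated people, always 0 for controls
gen byte quit_post = treated == 1 & year >= quit_year

* two-way fixed-effects DiD: person FE (fe) and wave effects (i.year)
xtreg weight quit_post i.year, fe vce(cluster id)
```

Because the 2004 and 2006 quitters start treatment in different waves, this is a staggered design, and a pooled two-way fixed-effects estimate can be hard to interpret if the effect changes with time since quitting. A simple check is to compare each cohort with the always-smokers separately, e.g. `xtreg weight quit_post i.year if quit_year == 2004 | always_smoker == 1, fe vce(cluster id)`.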
I have often ended up on this forum, especially since I had never used Stata before, so I would like to ask whether anyone here has an idea for me. I already asked a similar question on another statistics forum, but they suggested a very different, non-regression approach that I know nothing about. Also, the models do not have to be "perfect", since this is my first project in this field.
Thanks!
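Code for the model my prof suggested:
* assumes the panel is declared with: xtset id year  (two-year waves, so L2./F2. = previous/next wave)
reg weight smoking $vars if L2.smoking == 1, vce(cluster id)
reg F2.weight smoking $vars if L2.smoking == 1, vce(cluster id)
* and so on for F4.weight, ...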
Code for the dummy models (a) and (b):
reg weight dummy $vars, vce(cluster id)
reg F2.weight dummy $vars, vce(cluster id)
* but also:
reg dummy $vars, vce(cluster id)