Dear everyone,
Based on the literature, I have chosen a model for my thesis that is actually slightly above my head. I know the basics of Stata (OLS and so on), but never had any lectures on the more advanced models. Unfortunately, both my promoters are of the kind: "everything more advanced than OLS is scary..." I have checked various websites and read various books and PowerPoint slides I found online, but it is simply too much to comprehend all at once. I have condensed everything into the information below, and I would kindly ask you: is my reasoning correct? Am I missing something? Is the order of steps correct? Thanks a million in advance for your replies, and my apologies for the long post, but I wanted to be as complete as possible.
Situation: I want to test whether digital (il)literacy has an effect on corruption.
Dataset: 142 countries, 13 years -> +/- 30000 observations (but it is unbalanced, i.e. not every country has a value for every variable in every year; I have 810 missing values, which I coded as -9999 and then set -9999 to missing)
Main model: 1 dependent variable, 3 independent variables, 5 control variables, 2 moderating variables.
*a: I am not sure about one independent variable; it is possibly a mediating variable, and I want to test this as well.
*b: Moreover, I also want to test whether region or subregion has an effect (or whether countries within a (sub)region are homogeneous or heterogeneous), so I have added variables to my dataset that group countries by region or by subregion.
So my thoughts are as follows, and I would perform the steps in this order:
1) The DV is a rank, which is not normally distributed but of course uniformly distributed. Do I therefore need to transform the rank to a normal distribution? I found that I can do this by converting the rank with the Stata command:
Code:
* invnormal() returns missing if pctrank is exactly 0 or 100
generate zscore = invnormal(pctrank/100)
or
Code:
generate nce = invnormal(pctrank/100)*21.06 + 50
?
2) I want to test whether to use fixed effects or random effects, so I use the Hausman test? I would do that by following the commands provided in http://www.stata.com/manuals13/rhausman.pdf
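3) Then I test whether to use a dynamic or a static model (i.e. whether or not to include lagged values of the IV). I would create the lagged IV with
Code:
* xtset needs a numeric panel id; encode Country first if it is a string
sort Country Year
xtset Country Year
generate iv_lag = L1.iv
right? And then a 'stupid question': what would be my decision criterion?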
4) I perform the
Code:
xtmixed
command for a mixed model, and test either the fixed effects or the random effects model?
Is this correct so far? What about the order? What am I missing?
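For step 2, this is a minimal sketch of how I understand the Hausman comparison would run. The names dv, iv1-iv3 and c1-c5 are placeholders and not my real variable names, and I am assuming Country is a numeric id. As far as I understand, a small p-value would point towards fixed effects, but please correct me if I am wrong.

```stata
* Sketch only: dv, iv1-iv3 and c1-c5 are placeholder names
xtset Country Year

* Fixed-effects fit, stored for the comparison
xtreg dv iv1 iv2 iv3 c1 c2 c3 c4 c5, fe
estimates store fe

* Random-effects fit on the same specification
xtreg dv iv1 iv2 iv3 c1 c2 c3 c4 c5, re
estimates store re

* Hausman test: consistent estimator (fe) first, efficient estimator (re) second
hausman fe re
```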
----
Then a few additional questions which I haven't figured out yet:
*a: For one independent variable, I am not sure whether it is independent or whether it mediates the other independent variables. For simple OLS regression, I would do a Baron & Kenny test and then a Sobel test to confirm. How would this work for panel data? I found http://www.ats.ucla.edu/stat/stata/f...mediation2.htm for multilevel data, which I have, but it doesn't look like it is for panel data?
*b: To test whether to leave countries ungrouped or to group them by either region or subregion, I would estimate 3 models: 1 'flat', 1 multilevel grouped by region and 1 multilevel grouped by subregion.
Is that correct? And where in the order of steps would this fit in?
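To make *b concrete, here is a rough sketch of the three models as I imagine them. I am reading 'flat' as a model with only a country-level random intercept, which is my own assumption. The names dv, iv1-iv3 and c1-c5 are placeholders, region and subregion stand for my grouping variables, and I use mixed, the name xtmixed has had since Stata 13.

```stata
* Sketch only: dv, iv1-iv3, c1-c5 are placeholder names;
* region and subregion are assumed to be numeric group ids

* 1) 'flat': random intercept per country only
mixed dv iv1 iv2 iv3 c1 c2 c3 c4 c5 || Country:
estimates store flat

* 2) countries nested in regions
mixed dv iv1 iv2 iv3 c1 c2 c3 c4 c5 || region: || Country:
estimates store by_region

* 3) countries nested in subregions
mixed dv iv1 iv2 iv3 c1 c2 c3 c4 c5 || subregion: || Country:
estimates store by_subregion

* Likelihood-ratio test of the extra level against the flat model
lrtest flat by_region
lrtest flat by_subregion

* Region and subregion models are not nested in each other, so compare AIC/BIC
estimates stats flat by_region by_subregion
```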
----
Then some more additional questions:
a) I want to reduce the model to be as simple as possible, but I am in doubt. Would I start from 1 DV and 1 IV and slowly build up my model, performing all tests again for every extra variable I include? Or would I go the other way around: start with the most complete model and delete variable by variable?
On top of that, is there a single command for it? Or do I do it manually, variable by variable?
b) To test the moderating variables: in simple OLS regression, I would add an interaction term and compare the p-values and VIF values. Would it be just as simple for panel data?
c) On the internet there is much ambiguity as to the assumptions I have to check. Do I check, just as for OLS, for outliers, heteroskedasticity, normality of error terms, multicollinearity and independence? Or do I check other assumptions, and if so, which ones?
d) There are other measures of corruption, so I have other DVs at hand. I would want to use the other DVs to confirm my findings, and thus will need to test for robustness? Any suggestions on how to perform this?
e) In the end, I would like to say something about 'Granger causality' and would want to estimate a vector autoregression. For that, I would want to follow http://paneldataconference2015.ceu.h...ael-Abrigo.pdf which is quite a lengthy process. Is there a quick way to do it? Or shall I keep to the approach in this paper?
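For b) and d), this is a sketch of what I have in mind, so you can tell me whether it is the right idea. All variable names (dv, dv_alt1, dv_alt2, iv1, mod1, c1-c5) are placeholders, the fe option is only for illustration since I have not yet chosen between fixed and random effects, and I am assuming the IV and the moderator are continuous.

```stata
* Sketch only: dv, dv_alt1, dv_alt2, iv1, mod1 and c1-c5 are placeholder names
xtset Country Year

* b) moderation: c. marks a continuous variable, ## adds main effects plus product
xtreg dv c.iv1##c.mod1 c1 c2 c3 c4 c5, fe vce(cluster Country)
test c.iv1#c.mod1

* d) robustness: refit the same specification for each alternative DV
foreach y of varlist dv dv_alt1 dv_alt2 {
    xtreg `y' c.iv1##c.mod1 c1 c2 c3 c4 c5, fe vce(cluster Country)
    estimates store m_`y'
}

* Coefficients and standard errors side by side
estimates table m_dv m_dv_alt1 m_dv_alt2, b(%9.3f) se
```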
---
And finally: am I complete now? Am I missing something? Is the order correct? Any other useful feedback?
Thank you in advance for your feedback and reply,
Trebor
