Hello,
I’m working on the analysis part of my dissertation. I should probably put this question to a biostatistician, but since my faculty doesn’t offer that possibility I thought I would try my luck here. Sorry if this is really off the mark…
I’m trying to analyze a dataset that consists of 1 dependent variable (number of cases resistant to a certain drug per 1000 inhabitants), 5 independent variables for the same year (doses prescribed of a particular drug per 1000 inhabitants), and the same 5 independent variables for the previous year.
These data were measured over 5 years in every province (96 provinces). I therefore have 5*96=480 rows in my dataset, each of which includes the 1 dependent and the 10 independent variables. My data look something like this:
Year | Prov | Resis (dependent) | UseA_lag0 | UseA_lag1 | UseB_lag0 | UseB_lag1 | UseC_lag0 | UseC_lag1 |
2008 | 21   | 5,453             | 425,25    | 390,64    | 459,02    | 424,41    | 592,79    | 558,18    |
2009 | 21   | 6,458             | 430,85    | 425,25    | 464,62    | 459,02    | 598,39    | 592,79    |
2010 | 21   | 7,001             | 485,45    | 430,85    | 519,22    | 464,62    | 652,99    | 598,39    |
2011 | 21   | 7,009             | 490,46    | 485,45    | 524,23    | 519,22    | 658       | 652,99    |
2012 | 21   | 7,401             | 486,45    | 490,46    | 520,22    | 524,23    | 653,99    | 658       |
2008 | 45   | 3,005             | 385,65    | 486,45    | 419,42    | 520,22    | 553,19    | 653,99    |
2009 | 45   | 3,452             | 390,64    | 385,65    | 424,41    | 419,42    | 558,18    | 553,19    |
There are a couple more variables that I am correcting for, such as age and gender, which I am not mentioning here for the sake of simplicity.
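To make the layout concrete, the sample rows above can be loaded and checked with a short script. This is only a sketch in Python (not Stata), using just the Year, Prov, Resis and UseA columns. The names `parse_decimal`, `load_panel` and `check_lags` are made up for illustration, and it assumes the commas are decimal commas. It stores one record per province-year and checks that each lag-1 value equals the previous year's lag-0 value within the same province.

```python
# Sample rows copied from the table (Year, Prov, Resis, UseA_lag0, UseA_lag1).
# Assumption: the commas are decimal commas, so "425,25" means 425.25.
ROWS = """
2008 | 21 | 5,453 | 425,25 | 390,64
2009 | 21 | 6,458 | 430,85 | 425,25
2010 | 21 | 7,001 | 485,45 | 430,85
2011 | 21 | 7,009 | 490,46 | 485,45
2012 | 21 | 7,401 | 486,45 | 490,46
2008 | 45 | 3,005 | 385,65 | 486,45
2009 | 45 | 3,452 | 390,64 | 385,65
"""


def parse_decimal(text):
    """Convert a decimal-comma string such as '425,25' to a float."""
    return float(text.strip().replace(",", "."))


def load_panel(raw):
    """Return a dict keyed by (province, year), one entry per row."""
    panel = {}
    for line in raw.strip().splitlines():
        year, prov, resis, lag0, lag1 = line.split("|")
        panel[(int(prov), int(year))] = {
            "resis": parse_decimal(resis),
            "use_a_lag0": parse_decimal(lag0),
            "use_a_lag1": parse_decimal(lag1),
        }
    return panel


def check_lags(panel):
    """List (province, year) keys whose lag-1 value differs from the
    previous year's lag-0 value in the same province. Rows without a
    previous year in the data (e.g. 2008) are skipped, not validated."""
    mismatches = []
    for (prov, year), row in panel.items():
        prev = panel.get((prov, year - 1))
        if prev is not None and abs(row["use_a_lag1"] - prev["use_a_lag0"]) > 1e-9:
            mismatches.append((prov, year))
    return mismatches


panel = load_panel(ROWS)
print(len(panel), "province-year rows loaded")
print("lag mismatches:", check_lags(panel))
```

The check cannot validate first-year rows, because their lag-1 values would have to come from 2007. In the sample, the 2008 row for province 45 carries lag-1 values equal to province 21's 2012 values, which is what a shift that ignores province boundaries would produce, so the first year of each province may be worth a second look in the real data.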
I would like to see whether last year’s values of the independent variables can predict the dependent variable (one national model).
My question is not about how to construct lags. I know there are very nice ways of creating lag variables, but as you can see I just added them as extra variables.
My question is twofold:
1) I think I should call this type of data “autocorrelated” because of the repeated measures in the same area over time, but also “panel data” because the measurements are grouped by province. Is that correct?
2) I initially thought I would just use linear regression (regress command). However, I guess I need to take the autocorrelation into account, meaning that the value of a certain province in 2009 is correlated with its value in 2008. I think I would even know how to do that on its own, but the panel part makes it a bit too complicated for me. My second question is therefore: any idea what type of command I could use to run a linear regression that takes both autocorrelation and the panel structure into account?
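To make question 2 concrete, here is a minimal sketch of one common way the panel part is handled: subtracting each province's own mean from every variable (the "within" or fixed-effects transformation) before fitting an ordinary regression. It is written in Python rather than Stata, uses only the sample rows from the table with UseA_lag1 as the single regressor, and the function names are made up for illustration. It is not a full answer: it ignores the other covariates and does nothing about autocorrelation in the errors.

```python
from collections import defaultdict

# (province, year, resis, use_a_lag1) taken from the sample rows in the table,
# with decimal commas converted to decimal points.
DATA = [
    (21, 2008, 5.453, 390.64),
    (21, 2009, 6.458, 425.25),
    (21, 2010, 7.001, 430.85),
    (21, 2011, 7.009, 485.45),
    (21, 2012, 7.401, 490.46),
    (45, 2008, 3.005, 486.45),
    (45, 2009, 3.452, 385.65),
]


def demean_by_province(data):
    """Subtract each province's mean from y and x (within transformation)."""
    groups = defaultdict(list)
    for prov, _year, y, x in data:
        groups[prov].append((y, x))
    demeaned = []
    for prov, rows in groups.items():
        y_mean = sum(y for y, _ in rows) / len(rows)
        x_mean = sum(x for _, x in rows) / len(rows)
        demeaned.extend((prov, y - y_mean, x - x_mean) for y, x in rows)
    return demeaned


def within_slope(data):
    """Slope of y on x using only variation within each province."""
    d = demean_by_province(data)
    sxy = sum(y * x for _, y, x in d)
    sxx = sum(x * x for _, _, x in d)
    return sxy / sxx


def pooled_slope(data):
    """Ordinary least-squares slope that ignores the province grouping."""
    n = len(data)
    y_mean = sum(y for _, _, y, _ in data) / n
    x_mean = sum(x for _, _, _, x in data) / n
    sxy = sum((y - y_mean) * (x - x_mean) for _, _, y, x in data)
    sxx = sum((x - x_mean) ** 2 for _, _, _, x in data)
    return sxy / sxx


print("pooled slope:", pooled_slope(DATA))
print("within-province slope:", within_slope(DATA))
```

The two slopes differ whenever provinces have different baseline levels, which is the reason for treating the data as a panel in the first place. A common next step, not shown here, is to compute standard errors clustered by province so that correlation between years within the same province is allowed for.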
Sorry if this really isn't the right forum for this kind of question; as I said, I thought I’d try my luck.
Best regards,
Michiel