Multilevel Regression with plausible values as dependent variable

Maleika Krüger

Join Date: Oct 2014

Posts: 19
#1


23 Oct 2014, 05:55

Dear All,

I'm trying to run an analysis with plausible values. The dependent variable is students' English skills. Because the study design used incomplete test booklets, five plausible values were created for the test results.
To make things more complicated, the data have a multilevel structure (students nested in classes, nested in schools, nested in school types).
To start, I wanted to estimate a simple model with only gender, age and migration status as independent variables at the individual level and no variables at the other levels.
It seems to have worked (at least there was no error message), but the output looks quite different from what I'm used to, and I would be very thankful for a few hints.

This is the command I used:
pv, pv (pv1 pv2 pv3 pv4 pv5): mixed @pv Age Sex German ||idclass: , cov(un) ||idschool: , cov(un) || schooltype: , cov(un)

And this is the output I got:

Estimates for pv1 complete
Estimates for pv2 complete
Estimates for pv3 complete
Estimates for pv4 complete
Estimates for pv5 complete

Number of observations: 598
Average R-Squared: .

                        Coef     Std Err            t   t Param   P>|t|
pv5: Age          -.00253257    .0023581   -1.0739901         .       .
pv5: Sex           .16583687   .06449874    2.5711647         .       .
pv5: Deutsch       .05954325   .06045505    .98491764         .       .
pv5: ISEI          .00362554   .00172722    2.0990664         .       .

pv5:_cons         -.19917917   .11831648    -1.683444         .       .
lns1_1_1:_cons    -1.8443343   55.885289   -.03300214         .       .
lns2_1_1:_cons    -1.8443525   55.917522   -.03298344         .       .
lns3_1_1:_cons    -1.8443384   55.922589    -.0329802         .       .
lnsig_e:_cons     -.59434564   .03554486      -16.721         .       .

Is this what it is supposed to look like? If it is, why is there no indication of significance in the last two columns?
I'm quite confused, and I wasn't able to find a useful description anywhere.
Could someone help me out here?

Thanks in advance
Minka

P.S.: If there is already a thread on this topic, please let me know.
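A hedged aside on the command itself: as far as I know, mixed expects the random-effects levels to be listed from the outermost level inward (school type, then school, then class). The command above lists them the other way round, which would tell Stata that schools are nested in classes. Whether that explains why the three lns*_cons rows are almost identical with standard errors around 56 is only a guess, so check it against help mixed. The sketch below uses simulated data with made-up parameter values (u_type, u_school, u_class and score are hypothetical names; Age, Sex, idclass, idschool and schooltype follow the post) and plain mixed rather than the user-written pv prefix, which may not be installed; the same level order would go after "pv, pv(...):". The last line back-transforms a log standard deviation with exp(), assuming lns1_1_1 refers to the first level listed.

```stata
* Hypothetical simulated data: 5 school types, 10 schools each,
* 4 classes per school, 15 students per class (3,000 students)
clear
set seed 12345
set obs 5
generate schooltype = _n
generate u_type = rnormal(0, 0.5)          // school-type effect
expand 10
bysort schooltype: generate idschool = schooltype*100 + _n
generate u_school = rnormal(0, 0.4)        // school effect
expand 4
bysort idschool: generate idclass = idschool*10 + _n
generate u_class = rnormal(0, 0.3)         // class effect
expand 15
generate Sex = runiform() < 0.5
generate Age = 15 + 2*runiform()
generate score = u_type + u_school + u_class + 0.2*Sex - 0.01*Age + rnormal()

* Levels listed from the outermost (school type) to the innermost (class)
mixed score Age Sex || schooltype: || idschool: || idclass:

* Random-effect SDs are estimated on the log scale; exp() recovers the SD
display "SD at the first listed level: " exp(_b[lns1_1_1:_cons])
```

The same reading applies to the output above: exp(-1.844) is roughly 0.16, which would be a standard deviation rather than a log standard deviation.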
wbuchanan

Join Date: Mar 2014

Posts: 1362
#2

23 Oct 2014, 10:32

You may have better luck/ease using the multiple imputation commands to register the plausible values as imputed values. The other option would be to manually combine the vectors of coefficients and vce matrices, but I think that might be much more work than setting up your data as a dataset with imputed values. If you could provide a bit more context regarding the goals of your analysis you might be able to get some additional suggestions on how to handle things as well.
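The "manual" route described above can be sketched as follows: fit the same model once per plausible value, average the coefficient vectors, and combine the within- and between-imputation variances with Rubin's rules. This is only an illustration under stated assumptions: the data are simulated, pv1-pv5, sex and age are hypothetical variables, regress stands in for mixed to keep it short, and survey weights are ignored. It is not a replacement for mi estimate.

```stata
* Hypothetical simulated data: five plausible values of one latent score
clear
set seed 2014
set obs 600
generate sex   = runiform() < 0.5
generate age   = 15 + 2*runiform()
generate theta = 0.2*sex - 0.01*age + rnormal()
forvalues m = 1/5 {
    generate pv`m' = theta + rnormal(0, 0.3)
}

* Step 1: fit the same model once per plausible value, keeping b and V
local M = 5
forvalues m = 1/`M' {
    quietly regress pv`m' sex age
    matrix b`m' = e(b)
    matrix V`m' = e(V)
}

* Step 2: pooled coefficients (average of the M coefficient vectors)
matrix qbar = b1
forvalues m = 2/`M' {
    matrix qbar = qbar + b`m'
}
matrix qbar = qbar / `M'

* Step 3: within-imputation variance (average of the M VCE matrices)
matrix W = V1
forvalues m = 2/`M' {
    matrix W = W + V`m'
}
matrix W = W / `M'

* Step 4: between-imputation variance of the coefficient vectors
local Mminus1 = `M' - 1
matrix B = J(colsof(qbar), colsof(qbar), 0)
forvalues m = 1/`M' {
    matrix d = b`m' - qbar
    matrix B = B + d' * d
}
matrix B = B / `Mminus1'

* Step 5: total variance, T = W + (1 + 1/M) B
local inflate = 1 + 1/`M'
matrix T = W + `inflate' * B

display "sex: pooled b = " el(qbar, 1, 1) ", pooled SE = " sqrt(el(T, 1, 1))
```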
Maleika Krüger

Join Date: Oct 2014

Posts: 19
#3

24 Oct 2014, 02:38

Thank you for your answer wbuchanan.

Originally posted by wbuchanan

You may have better luck/ease using the multiple imputation commands to register the plausible values as imputed values.

But isn't this what the pv prefix command is especially designed to do? To deal with imputed values (since plausible values are imputed values)?

And if I wanted to register the plausible values as imputed values and use the mi commands afterwards, how would I do that? I know how to impute missing values with Stata, in which case Stata automatically knows that these are imputed values. However, the plausible values were delivered to me from another source, so they are already in my dataset. Is it still possible to declare them as imputed values now?

Originally posted by wbuchanan

If you could provide a bit more context regarding the goals of your analysis you might be able to get some additional suggestions on how to handle things as well.

Well, the goal is to see which factors (individual and class/school-level factors) influence students' English skills. As I said, I wanted to start with a simple model with only age, gender, migration status and ISEI at the individual level, just to see what the output looks like.
Hence my questions: is this what the output is supposed to look like, why are there no numbers in the last two columns (t Param and P>|t|), and how should I interpret it?

Last edited by Maleika Krüger; 24 Oct 2014, 02:59.
Lien Castelein

Join Date: May 2015

Posts: 1
#4

29 May 2015, 10:32

Dear all,

I have a question very similar to Minka's. What did you end up doing, Minka?
Did you declare them as imputed values, and did that change your results? Or did you keep using the pv command, and did that give correct results for your multilevel model?
Vincent Puchades

Join Date: Aug 2015

Posts: 6
#5

27 Aug 2015, 05:49

Hi everyone,

I am facing the same problem as Minka. Did you manage to solve it, and if so, how? Minka asked some really interesting questions, and I am still hoping for answers that would help me out.

wbuchanan, thanks for all your comments about plausible values (not only in this thread). However, I could not understand what you meant. I am a beginner and would like to know how you would deal with Minka's question (step by step, if possible; I think many people would appreciate it).

Finally, I have been reading a lot about this (not just in this forum). I have also tried the pv and repest commands. Like Minka, I got results without p-values.

"But isn't this what the pv prefix command is especially designed to do? To deal with imputed values (since plausible values are imputed values)?

And if I wanted to register the plausible values as imputed values and use the mi commands afterwards, how would I have to do that? I know how to impute missing values with Stata, in which case Stata automatically knows, that these are imputed values. However the plausible values where delivered to me from another source, so they are already there in my data set. Is it still possible to declare them as imputed values now?" (Minka, 2014)

Thank you very much

Last edited by Vincent Puchades; 27 Aug 2015, 05:56.
Stephen Jenkins

Join Date: Apr 2014

Posts: 1435
#6

27 Aug 2015, 07:37

Welcome to the Forum.

Have you read the following help-file entries?

Code:

help mi_import
help mi_set

And the corresponding Manual entries? (You can click through to them.)

After doing that reading, formulate some specific questions to ask that are related to your problem, citing help-file or manual entries where relevant, and perhaps posting a relevant extract from what you've typed into Stata and what you've got back, using CODE delimiters for legibility. All this is related to posing questions in ways that maximize the chances of getting a helpful response. Please read the Forum FAQ -- it has a lot about this issue. (Please also note its request that members use their fullnames -- firstname lastname -- and the easy way to re-register to achieve this.) thank you.

Vincent Puchades

Join Date: Aug 2015
Posts: 6
#7

27 Aug 2015, 13:16

Hi Stephen and everyone,

Stephen, thanks for your suggestions, but they do not answer the question. I will explain myself again, and I hope someone can explain it. Unfortunately, there is no clear answer about this yet.

I have read the help-file entries and manuals provided by Stata and a lot of other material, but I still have some questions.

Extra information about the research: a multilevel analysis using PISA 2012, focusing on "top performers".

Code:

use "/Users/Vincent/Desktop/BBDDPROA.dta"

* We examine data for missing values using misstable summarize:

misstable summarize
(variables nonmissing or string)

**** Example: multiple imputation: plausible values (P12_PV1MATH P12_PV2MATH P12_PV3MATH P12_PV4MATH P12_PV5MATH) (continuous measure)

mi set mlong
mi register imputed P12_PV1MATH P12_PV2MATH P12_PV3MATH P12_PV4MATH P12_PV5MATH

* For simplicity, let us consider a linear regression; I arbitrarily create 5 imputations and set the random-number seed for reproducibility:

mi impute regress P12_PV1MATH P12_PV2MATH P12_PV3MATH P12_PV4MATH P12_PV5MATH gender attitudes, add(5) rseed(123)

(variables P12_PV2MATH P12_PV3MATH P12_PV4MATH P12_PV5MATH registered as imputed and used to model variable P12_PV1MATH; this may cause some observations to be omitted from the estimation and may lead to missing imputed values)
note: variable P12_PV1MATH contains no soft missing (.) values (imputation variable is complete; imputing nothing)

So the question is: how can I impute a variable that has no missing values? The mi import command is suggested instead, but the variables are already in my dataset (import from where? I do not see the point).

Note that mi estimate analyzes multiply imputed data but requires at least two imputations. Multiple imputation has three steps: 1) imputing the missing values (mi impute), 2) analyzing each completed dataset separately, and 3) consolidating the individual estimates into a single set of MI estimates using Rubin's rules; mi estimate performs steps 2 and 3. This is explained here: http://www.stata.com/meeting/boston1..._marchenko.pdf
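For reference, the pooling step usually called Rubin's rules can be written as below for a single coefficient Q estimated in each of M completed datasets (here, one per plausible value), where Q-hat_m is the estimate from dataset m and U-hat_m its squared standard error. This is the textbook large-sample version; mi estimate may apply a small-sample degrees-of-freedom adjustment for some commands, so treat it as a statement of the idea rather than of any particular command's output.

```latex
\begin{aligned}
\bar{Q} &= \frac{1}{M}\sum_{m=1}^{M}\hat{Q}_m, &
\bar{W} &= \frac{1}{M}\sum_{m=1}^{M}\hat{U}_m, &
B &= \frac{1}{M-1}\sum_{m=1}^{M}\bigl(\hat{Q}_m - \bar{Q}\bigr)^2, \\
T &= \bar{W} + \Bigl(1 + \frac{1}{M}\Bigr)B, &
t &= \frac{\bar{Q}}{\sqrt{T}}, &
\nu &= (M-1)\left[1 + \frac{\bar{W}}{(1 + 1/M)\,B}\right]^2, \\
p &= 2\Pr\bigl(t_{\nu} > |t|\bigr). & & & &
\end{aligned}
```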

Code:

*I already have tried these commands:

repest PISA, estimate(corr pv@math pv@read pv@scie) by(cnt)
pv, pv(pv*math) weight(w_fstuwt) brr rw(w_fstr*) fays(0.5): reg @pv stratio propqual [aw=@w]
svyset [pweight= w_fstuwt], brrweight(w_fstr1-w_fstr80) vce(brr) fay(.5) mse
pv, pv(pv*math): xtmixed @pv wealth schsize || schoolid:

* But, as for Minka, Stata does not report the p-values.

So, what alternative approach would give me the p-values?
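One hedged answer to the p-value question: once a pooled estimate, the within- and between-imputation variances and the number of plausible values M are available, the t statistic, Rubin's large-sample degrees of freedom and a two-sided p-value can be computed directly with Stata's ttail() function. The numbers below are hypothetical placeholders, not results from this thread, and the rr_* scalar names are made up for the example. If the plausible values are registered for mi estimate instead, it should report these quantities itself.

```stata
* Hypothetical pooled quantities for one coefficient (placeholders, not thread results)
local M = 5
scalar rr_qbar = 0.166     // pooled estimate, Q-bar
scalar rr_w    = 0.004     // within-imputation variance, W-bar
scalar rr_b    = 0.0005    // between-imputation variance, B

* Total variance and relative increase in variance due to imputation
scalar rr_tvar = rr_w + (1 + 1/`M')*rr_b
scalar rr_r    = (1 + 1/`M')*rr_b/rr_w

* Rubin's large-sample degrees of freedom, t statistic and two-sided p-value
scalar rr_df    = (`M' - 1)*(1 + 1/rr_r)^2
scalar rr_tstat = rr_qbar/sqrt(rr_tvar)
scalar rr_p     = 2*ttail(rr_df, abs(rr_tstat))

display "t = " rr_tstat ", df = " rr_df ", p = " rr_p
```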

Thank you very much,


Stephen Jenkins

Join Date: Apr 2014

Posts: 1435
#8

27 Aug 2015, 16:12

You say that I didn't answer your questions. I was trying to address this one from "Minka" that you reproduced as if you wanted an answer:

And if I wanted to register the plausible values as imputed values and use the mi commands afterwards, how would I have to do that?

What you show in #7 doesn't help readers much. Without seeing (an extract of) a listing of key variables in your "BBDDPROA.dta" file, we can't comment on how one should import the variables so that they can be used by the mi command suite. (dataex on SSC may help you with this.) From the look of your subsequent code, I guess that you have data in wide format: each row (i.e. each observation) contains 5 plausible values followed by the value of each covariate. If so, my reading of help files and manuals suggests that your work flow should be something like

Code:

use "/Users/Vincent/Desktop/BBDDPROA.dta"
mi import wide, imputed(P12_PV1MATH P12_PV2MATH P12_PV3MATH P12_PV4MATH P12_PV5MATH)
* Use mi describe and mi varying to verify that the result is as you anticipated.
* Optionally, use mi convert to convert the data to what you consider a more convenient style.
* Run your mi estimate: regress commands

Your mi impute regress command looks odd to me. You don't need to impute values of your outcome variable -- you already have the imputed values, I understand -- they are the plausible values. Hence my reference to mi import ...

Your remarks like

The question would be how can I impute something that it has not missing values?.

were not at all clear to me.
Vincent Puchades

Join Date: Aug 2015

Posts: 6
#9

28 Aug 2015, 06:35

Hi everyone,

Thank you, Stephen, for the quick help and for the time you spent helping me. However, I am still missing something and still facing the same problem. And you are absolutely right, #7 does not help readers much. I am going to try to be more specific.

I am using Stata 13.0, and the dataset is PISA 2012 (Spain). It has:

- Observations: 25,313; variables: 929; size: 208,376,616

Variables that I am using:

1. DEPENDENT VARIABLE:

Five plausible values for math, which already contain imputed values: pv1math pv2math pv3math pv4math pv5math

2. INDEPENDENT VARIABLES:

Discrete variables:

- gender: Dummy variable (1 for female and 0 male)

- immigrant: Dummy variable ( 1 immigrant and 0 otherwise)

Continuous variables (for more information see: http://www.oecd.org/pisa/keyfindings...ol3-AnnexA.pdf and http://www.oecd.org/pisa/pisaproduct...port-final.pdf):

- wealth: The index of family wealth (WEALTH) is based on students’ responses on whether they had the following at home: a room of their own, a link to the Internet, a dishwasher (treated as a country-specific item), a DVD player, and three other country-specific items (some items in ST26); and their responses on the number of cellular phones, televisions, computers, cars and the number of rooms with a bath or shower (ST27).

- home_resources: Home educational resources: The index of home educational resources (HEDRES) is based on the items measuring the existence of educational resources at home including a desk and a quiet place to study, a computer that students can use for schoolwork, educational software, books to help with students’ school work, technical reference books and a dictionary (some items in ST26).

- cultpos: Cultural possessions: The index of cultural possessions (CULTPOSS) is based on students’ responses to whether they had the following at home: classic literature, books of poetry and works of art (some items in ST26).

- parents_education: highest occupational status of parents (HISEI), highest educational level of parents in years of education according to ISCED (PARED)

- attitudes: Attitudes towards school (learning outcomes): The index of attitudes towards school (learning outcomes) (ATSCHL) was constructed using student responses (ST88) over the extent they strongly agreed, agreed, disagreed or strongly disagreed to the following statements when asked about what they have learned in school: School has done little to prepare me for adult life when I leave school; school has been a waste of time; school has helped give me confidence to make decisions; school has taught me things which could be useful in a job.

I attached photos (.png format) showing the key variables (for level 1) and what Stata returned when I typed the command you suggested (mi import wide), the ice command, the pv command and even the repest command.

use "/Users/Vincent/Desktop/BBDDPROA.dta"
mi import wide, imputed(P12_PV1MATH P12_PV2MATH P12_PV3MATH P12_PV4MATH P12_PV5MATH)
* Use mi describe and mi varying to verify that the result is as you anticipated.
* Optionally, use mi convert to convert the data to what you consider a more convenient style.
* Run your mi estimate: regress commands

However I got this from Stata:

Code:

mi import wide, imputed(pv1math pv2math pv3math pv4math pv5math)
no; data in memory would be lost
r(4);

Then I read about the r(4), r(198), r(6) and r(7) errors, but I could not fix the problem. In addition, I found these quite interesting: http://www.stata.com/support/faqs/st...s-ice-and-mim/ and http://www.stata.com/statalist/archi.../msg00125.html. From these links I would like to highlight this: "Ice use the same imputation method, but their features are not the same... ice includes stepwise model selection and is compatible with all releases since Stata 9. And if you have Stata 11 or more recent, you can use mi ice"(Marchenko, 2015).

As I could not fix the command that Stephen provided me, I tried to find an alternative way by using ice command. Nevertheless, I did not solve my problem.

Code:

ice pv1math pv2math pv3math pv4math pv5math gender immigrant wealth home_resources cultpos occupation parents_education attitudes, saving(icedata) m(5) seed(123)

Stata returned the following message: "All relevant cases are complete, no imputation required", and r(2000). So I tried to find out what r(2000) means, and I found this: "In addition, r(2000) can mean that a string variable inhibits something intrinsically numerical. Or that there are no non-missing values." (Cox, 2009) from: http://www.stata.com/statalist/archi.../msg00125.html

With respect to this:

Your mi impute regress command looks odd to me

I found that command here: http://www.stata.com/meeting/boston1..._marchenko.pdf (slide number 12).

Thank you soooo much

P.S.: The question is the same as before (the same problem). I also want to attach the commands related to pv and repest:

Code:

pv, pv(pv*math): xtmixed @pv gender immigrant wealth home_resources cultpos occupation parents_education attitudes || schoolid:
repest PISA, estimate(corr pv@math pv@read pv@scie) by(cnt)


Last edited by Vincent Puchades; 28 Aug 2015, 06:40.
Stephen Jenkins

Join Date: Apr 2014

Posts: 1435
#10

28 Aug 2015, 07:30

You report a problem with:

Code:

mi import wide, imputed(pv1math pv2math pv3math pv4math pv5math)
no; data in memory would be lost
r(4);

Ensure you have a copy of your original data saved, and then try the following

Code:

mi import wide, imputed(pv1math pv2math pv3math pv4math pv5math) clear

Note the "clear". I got this from reading help mi_import_wide. You have access to the same resources ...
Philip Matthews

Join Date: Apr 2014

Posts: 23
#11

28 Aug 2015, 11:24

I think one source of confusion with using plausible values within Stata's mi environment stems from the following issue(s): it is indeed possible to import a PISA data set and register a set of five plausible values (e.g. pv1maths, ... pv5maths) as imputed variables. These should (in principle) be indexed as m=1, m=2,..,m=5. However, the PISA data set contains no variable that serves as an original variable (m=0) that has/had missing values. Thus it is not possible to inform Stata that, for example, mi import wide, imputed(maths = pv1maths pv2maths ... pv5maths) because there is no variable maths that can be associated with m=0. Similarly, it is not possible to ask Stata to perform a command that would require the use of maths so that Stata could get to work with the five imputed variables. I am insufficiently familiar with Stata's mi to know if a work-around is possible; but at first glance I cannot find one.
Stephen Jenkins

Join Date: Apr 2014

Posts: 1435
#12

28 Aug 2015, 11:29

That's helpful clarification -- thanks. I too don't know of a workaround. (I'm not an mi expert.) Just curious: how then does pv get around these issues? A glance at its ado code using viewsource suggests that it is applying "Rubin's Rules".
Vincent Puchades

Join Date: Aug 2015

Posts: 6
#13

28 Aug 2015, 11:48

Hi again,

Stephen, I already tried it; I attached a photo in my previous comment showing all the problems I had. Anyway, I will paste the code here.

Then, I read about r(4), r(198), r(6) and r(7) errors, but I could not fix it

Code:

mi import wide, imputed(pv1math pv2math pv3math pv3math pv5math)
no; data in memory would be lost
r(4);
mi import wide, imputed(pv1math pv2math pv3math pv3math pv5math) clear
pv2math found where = expected
r(198);
confirm existence
found where something expected
r(6);
confirm file
found where filename expected
r(7);

For this reason, I tried to find an alternative way to get the plausible values by using the ice command, but unfortunately I got this:

Code:

ice pv1math pv2math pv3math pv4math pv5math gender immigrant wealth home_resources cultpos occupation parents_education attitudes, saving(icedata) m(5) seed(123)

Stata returned the following message: "All relevant cases are complete, no imputation required", and r(2000). So I tried to find out what r(2000) means, and I found this: "In addition, r(2000) can mean that a string variable inhibits something intrinsically numerical. Or that there are no non-missing values." (Cox, 2009) from: http://www.stata.com/statalist/archi.../msg00125.html

Philip, that is exactly the problem I am trying to solve. I have been reading about it but have not found anything, which is why I am using Statalist to get a solution or some suggestions on how to deal with it. I know that PISA provides syntax for SAS and SPSS, but I am not familiar with those packages. If you have any idea how I could solve this problem, please share it; any comments or suggestions are more than welcome. I do not know about workarounds, but I am going to read about them right now. Could you explain one, or at least the idea behind it? (I would really appreciate it, and I think Statalist users would too.)

I also read that there are some software packages, such as MLwiN (http://www.bristol.ac.uk/cmm/software/mlwin/), ConQuest (used by Wu) and RealCom-impute (http://missingdata.lshtm.ac.uk/index...=50&Itemid=102), to which data can be exported directly from Stata. But I am afraid that we are moving to a different topic.

Thank you so much for all the comments

Last edited by Vincent Puchades; 28 Aug 2015, 12:09.
Stephen Jenkins

Join Date: Apr 2014

Posts: 1435
#14

28 Aug 2015, 13:53

Try the following. You don't have the m=0 variable, as Phil says, so let's create one!

1. use your data set

2. generate new variable : generate pv0math = . // missing for all cases

3. mi import wide, imputed(pv0math = pv1math pv2math pv3math pv4math pv5math) clear

As already explained, I'm no mi expert, so this proposal is simply a guess derived from re-reading help files. It assumes that no other variables contain imputations. You might have to do something about 'passive' variables but, if the suggestion works, you should be able to figure that out. Also, ensure you do all the various checks that the help files and Manuals recommend.

[Sorry, iPad use at present and can't see how to use advanced editor in this environment]
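The three steps above can be sketched end to end as follows, assuming (as Stephen guesses) that mi import wide accepts an all-missing variable as m=0. Everything before mi import is a simulated stand-in for the PISA file: the variable names follow Vincent's post, but the values and the helper variable latent are made up. The sketch ignores PISA's final student weight and replicate weights, which a real analysis would need, and uses regress to keep it fast; whether mi estimate accepts mixed in a given Stata release is worth checking in help mi_estimation.

```stata
* Hypothetical stand-in for the PISA file: five math plausible values per student
clear
set seed 2015
set obs 1000
generate gender    = runiform() < 0.5
generate immigrant = runiform() < 0.1
generate latent    = 500 + 10*gender - 20*immigrant + rnormal(0, 90)
forvalues m = 1/5 {
    generate pv`m'math = latent + rnormal(0, 30)
}
drop latent

* Step 2: an all-missing variable to play the role of m = 0
generate pv0math = .

* Step 3: pv1math-pv5math become imputations m = 1, ..., 5 of pv0math;
* clear allows Stata to replace the (unsaved) data in memory
mi import wide, imputed(pv0math = pv1math pv2math pv3math pv4math pv5math) clear

* Check the registration, as the help files recommend
mi describe

* Fit the model in each imputation and pool with Rubin's rules;
* a multilevel version might be: mi estimate: mixed pv0math gender immigrant || schoolid:
mi estimate: regress pv0math gender immigrant
```

Because mi estimate does the pooling, its output table should include the p-values that the thread found missing.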
Philip Matthews

Join Date: Apr 2014

Posts: 23
#15

28 Aug 2015, 14:56

Stephen, yes, I thought about having a variable full of missings; but doubted if that would be legal as far as what is going on 'under the hood' with mi; but perhaps it is ok. However, it is interesting/amusing to think about the (implied) rationale for imputing all values of a variable that was completely missing in the first place... In any event, PISA methodology is somewhat unusual, and in some respects of rather doubtful validity; e.g. Kreiner, Svend and Christensen, Karl Bang (2014). Analysis of Model Fit and Robustness...Psychometrika 79 (2), 210-231.