  • Multilevel Regression with plausible values as dependent variable

    Dear All,

    I'm trying to run an analysis with plausible values. The dependent variable is the English skills of students. Because the study design used incomplete booklets, 5 plausible values were created for the test results.
    To make things more complicated, the data have a multilevel structure (students nested in classes, nested in schools, nested in school types).
    To start, I wanted to fit a simple model with only gender, age and migration status as independent variables at the individual level and no variables at the other levels.
    It looks like it worked (at least there was no error message), but the output looks quite different from what I'm used to, and I would be very thankful for a few hints from some of you.

    This is the command I used:
    pv, pv (pv1 pv2 pv3 pv4 pv5): mixed @pv Age Sex German ||idclass: , cov(un) ||idschool: , cov(un) || schooltype: , cov(un)

    And this is the output I got:

    Estimates for pv1 complete
    Estimates for pv2 complete
    Estimates for pv3 complete
    Estimates for pv4 complete
    Estimates for pv5 complete

    Number of observations: 598
    Average R-Squared: .


                           Coef     Std Err            t   t Param   P>|t|
    pv5: Age         -.00253257    .0023581   -1.0739901         .       .
    pv5: Sex          .16583687   .06449874    2.5711647         .       .
    pv5: Deutsch      .05954325   .06045505    .98491764         .       .
    pv5: ISEI         .00362554   .00172722    2.0990664         .       .

    pv5:_cons        -.19917917   .11831648    -1.683444         .       .
    lns1_1_1:_cons   -1.8443343   55.885289   -.03300214         .       .
    lns2_1_1:_cons   -1.8443525   55.917522   -.03298344         .       .
    lns3_1_1:_cons   -1.8443384   55.922589    -.0329802         .       .
    lnsig_e:_cons    -.59434564   .03554486      -16.721         .       .



    Is this what it is supposed to look like? If it is, why is there no indication of significance in the last two columns?
    I'm quite confused. I wasn't able to find a useful description anywhere.
    Could someone help me out here?


    Thanks in advance
    Minka

    P.S.: If there is already a thread on this topic, please let me know.


  • #2
    You may have better luck/ease using the multiple imputation commands to register the plausible values as imputed values. The other option would be to manually combine the vectors of coefficients and vce matrices, but I think that might be much more work than setting up your data as a dataset with imputed values. If you could provide a bit more context regarding the goals of your analysis you might be able to get some additional suggestions on how to handle things as well.



    • #3
      Thank you for your answer wbuchanan.

      Originally posted by wbuchanan
      You may have better luck/ease using the multiple imputation commands to register the plausible values as imputed values.

      But isn't this what the pv prefix command is especially designed to do? To deal with imputed values (since plausible values are imputed values)?

      And if I wanted to register the plausible values as imputed values and use the mi commands afterwards, how would I have to do that? I know how to impute missing values with Stata, in which case Stata automatically knows that these are imputed values. However, the plausible values were delivered to me from another source, so they are already there in my data set. Is it still possible to declare them as imputed values now?

      Originally posted by wbuchanan
      If you could provide a bit more context regarding the goals of your analysis you might be able to get some additional suggestions on how to handle things as well.
      Well, the goal is to see which factors (individual and class/school level factors) influence the English skills of students. As I said, I wanted to start off with an easy model with only age, gender, migration status and ISEI at the individual level and just see what the output looks like.
      Hence my question: is this what the output is supposed to look like, why are there no numbers in the last two columns (t Param and P>|t|), and how should I interpret it?
      Last edited by Maleika Krüger; 24 Oct 2014, 02:59.



      • #4
        Dear all,

        I have a question very similar to Minka's. In the end, what did you do, Minka?
        Did you define them as imputed values? Did that change your results? Or did you keep defining them with the pv command, and did that give correct results for your multilevel model?



        • #5
          Hi everyone,

          I am facing the same problem as Minka. I was wondering whether you solved it and, if so, how. Furthermore, Minka asked some really interesting questions, and I am still hoping for answers that will help me out.

          wbuchanan, thanks for all your comments about plausible values (not only in this post). However, I cannot understand what you meant. I am a beginner and I would like to know how you would deal with Minka's question (step by step if possible; I think many people would appreciate it).

          Finally, I have to say that I have been reading a lot about this (not just in this forum). I have also tried the pv and repest commands. Like Minka, I got the results without p-values.


          "But isn't this what the pv prefix command is especially designed to do? To deal with imputed values (since plausible values are imputed values)?

          And if I wanted to register the plausible values as imputed values and use the mi commands afterwards, how would I have to do that? I know how to impute missing values with Stata, in which case Stata automatically knows that these are imputed values. However, the plausible values were delivered to me from another source, so they are already there in my data set. Is it still possible to declare them as imputed values now?" (Minka, 2014)


          Thank you very much

          Last edited by Vincent Puchades; 27 Aug 2015, 05:56.



          • #6
            Welcome to the Forum.

            Have you read the following help-file entries?
            Code:
            help mi_import
            
            help mi_set
            And the corresponding Manual entries? (You can click through to them.)

            After doing that reading, formulate some specific questions to ask that are related to your problem, citing help-file or manual entries where relevant, and perhaps posting a relevant extract from what you've typed into Stata and what you've got back, using CODE delimiters for legibility. All this is related to posing questions in ways that maximize the chances of getting a helpful response. Please read the Forum FAQ -- it has a lot about this issue. (Please also note its request that members use their full names -- firstname lastname -- and the easy way to re-register to achieve this.) Thank you.



            • #7
              Hi Stephen and everyone,

              Stephen, thanks for your suggestions, but you did not answer the question. Anyway, I am going to explain myself again, and I hope some researcher will try to explain it. Unfortunately, there is no clear answer about this yet.

              I read the help-file entries and manuals provided by Stata, along with a lot of other material, but I still have some questions.

              Extra information about the research: multilevel analysis using PISA 2012 focusing on "Top performers"

              Code:
              . use "/Users/Vincent/Desktop/BBDDPROA.dta"
              
              * We examine the data for missing values using misstable summarize:
              
              . misstable summarize
              (variables nonmissing or string)
              
              **** Example: multiple imputation: plausible values (P12_PV1MATH P12_PV2MATH P12_PV3MATH P12_PV4MATH P12_PV5MATH) (continuous measure)
              
              . mi set mlong
              . mi register imputed P12_PV1MATH P12_PV2MATH P12_PV3MATH P12_PV4MATH P12_PV5MATH
              
              * For simplicity, let us consider a linear regression. I arbitrarily create 5 imputations and set the random-number seed for reproducibility:
              
              . mi impute regress  P12_PV1MATH P12_PV2MATH P12_PV3MATH P12_PV4MATH P12_PV5MATH gender  attitudes, add(5) rseed(123)
              
              (variables P12_PV2MATH P12_PV3MATH P12_PV4MATH P12_PV5MATH registered as imputed and used to model variable P12_PV1MATH; this may cause some observations to be omitted from the estimation and may lead to missing imputed value
              note: variable P12_PV1MATH contains no soft missing (.) values
              (imputation variable is complete; imputing nothing)

              The question would be how can I impute something that it has not missing values?. The import command was then suggested, but I already have the variables in my dataset, so I do not see where I would be importing from.

              Bear in mind that mi estimate analyses multiply imputed data but requires at least 2 imputations. We should also not forget that the mi workflow has 3 steps: 1) mi impute performs the imputation, 2) mi estimate performs the individual completed-data analyses and 3) it uses Rubin's rules to consolidate the individual estimates into a single set of MI estimates. This is explained here: http://www.stata.com/meeting/boston1..._marchenko.pdf

              Code:
              *I already have tried these commands:
              
              repest PISA, estimate(corr pv@math pv@read pv@scie) by(cnt)
              pv, pv(pv*math) weight(w_fstuwt) brr rw(w_fstr*) fays(0.5): reg @pv stratio propqual [aw=@w]
              svyset [pweight= w_fstuwt], brrweight(w_fstr1-w_fstr80) vce(brr) fay(.5) mse
              pv, pv(pv*math): xtmixed @pv wealth schsize || schoolid:
              
              * But, as in Minka's case, Stata does not report the p-values.
              So, what would be an alternative approach to get the p-values?
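
              As a stopgap, a two-sided p-value can be worked out by hand from a reported coefficient and standard error. The line below is only a sketch: it uses the Sex row from the output in post #1 and treats the ratio as standard normal, which is an assumption made here for illustration (the reference distribution under Rubin's rules is a t with its own degrees of freedom, so the true p-value may be somewhat larger).

              ```stata
              * Two-sided p-value for Sex (coefficient and standard error copied from post #1)
              * Assumption: normal approximation instead of the Rubin's-rules t distribution
              display 2*normal(-abs(.16583687/.06449874))
              ```

              If a degrees-of-freedom value is available, 2*ttail(df, abs(t)) would be the t-based equivalent.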


              Thank you very much,




              • #8
                You say that I didn't answer your questions. I was trying to address this one from "Minka" that you reproduced as if you wanted an answer:
                And if I wanted to register the plausible values as imputed values and use the mi commands afterwards, how would I have to do that?
                What you show in #7 doesn't help readers much. Without seeing (an extract of) a listing of key variables in your "BBDDPROA.dta" file, we can't comment on how one should import the variables so that they can be used by the mi command suite. (dataex on SSC may help you with this.) From the look of your subsequent code, I guess that you have data in wide format: each row (i.e. each observation) contains 5 plausible values followed by the value of each covariate. If so, my reading of help files and manuals suggests that your work flow should be something like
                Code:
                use "/Users/Vincent/Desktop/BBDDPROA.dta"
                mi import wide, imputed(P12_PV1MATH P12_PV2MATH P12_PV3MATH P12_PV4MATH P12_PV5MATH)
                * Use mi describe and mi varying to verify that the result is as you anticipated.
                * Optionally, use mi convert to convert the data to what you consider a more convenient style.
                * Run your mi estimate: regress commands


                Your mi impute regress command looks odd to me. You don't need to impute values of your outcome variable. As I understand it, you already have the imputed values: they are the plausible values. Hence my reference to mi import ...

                Your remarks like
                The question would be how can I impute something that it has not missing values?.
                were not at all clear to me.



                • #9

                  Hi everyone,

                  Thank you, Stephen, for the quick help and for the time spent helping me. However, I am still missing something and still facing the same problem. Moreover, you are absolutely right: #7 does not help readers much. I am going to try to be more specific.

                  I am using Stata 13.0 and the database is for PISA2012(Spain) and it has:

                  - Observation: 25,313 ; Variables: 929 and Size: 208,376,616 -

                  Variables that I am using:

                  1. DEPENDENT VARIABLE:

                  5 plausible values (math), which already contain imputed values: pv1math pv2math pv3math pv4math pv5math

                  2. INDEPENDENT VARIABLES:

                  Discrete variables:

                  - gender: Dummy variable (1 for female and 0 male)

                  - immigrant: Dummy variable ( 1 immigrant and 0 otherwise)

                  Continuous variables: (For more information see: http://www.oecd.org/pisa/keyfindings...ol3-AnnexA.pdf and http://www.oecd.org/pisa/pisaproduct...port-final.pdf)

                  - wealth: The index of family wealth (WEALTH) is based on students’ responses on whether they had the following at home: a room of their own, a link to the Internet, a dishwasher (treated as a country-specific item), a DVD player, and three other country-specific items (some items in ST26); and their responses on the number of cellular phones, televisions, computers, cars and the number of rooms with a bath or shower (ST27).

                  - home_resources: Home educational resources: The index of home educational resources (HEDRES) is based on the items measuring the existence of educational resources at home including a desk and a quiet place to study, a computer that students can use for schoolwork, educational software, books to help with students’ school work, technical reference books and a dictionary (some items in ST26).

                  - cultpos: Cultural possessions: The index of cultural possessions (CULTPOSS) is based on students’ responses to whether they had the following at home: classic literature, books of poetry and works of art (some items in ST26).

                  - parents_education: highest occupational status of parents (HISEI), highest educational level of parents in years of education according to ISCED (PARED)

                  - attitudes: Attitudes towards school (learning outcomes): The index of attitudes towards school (learning outcomes) (ATSCHL) was constructed using student responses (ST88) over the extent they strongly agreed, agreed, disagreed or strongly disagreed to the following statements when asked about what they have learned in school: School has done little to prepare me for adult life when I leave school; school has been a waste of time; school has helped give me confidence to make decisions; school has taught me things which could be useful in a job.

                  I attached photos (.png format) showing the key variables (for level 1) and what I got from Stata when I typed the command that you suggested (mi import wide), the ice command, the pv command and even the repest command.

                  use "/Users/Vincent/Desktop/BBDDPROA.dta"
                  mi import wide, imputed(P12_PV1MATH P12_PV2MATH P12_PV3MATH P12_PV4MATH P12_PV5MATH)
                  * Use mi describe and mi varying to verify that the result is as you anticipated.
                  * Optionally, use mi convert to convert the data to what you consider a more convenient style.
                  * Run your mi estimate: regress commands
                  However I got this from Stata:

                  Code:
                  mi import wide, imputed(pv1math pv2math pv3math pv4math pv5math)
                  no; data in memory would be lost
                  r(4);
                  Then I read about the r(4), r(198), r(6) and r(7) errors, but I could not fix it. In addition, I found these quite interesting: http://www.stata.com/support/faqs/st...s-ice-and-mim/ and http://www.stata.com/statalist/archi.../msg00125.html . From these links I would like to highlight this: "Ice use the same imputation method, but their features are not the same... ice includes stepwise model selection and is compatible with all releases since Stata 9. And if you have Stata 11 or more recent, you can use mi ice"(Marchenko, 2015).

                  As I could not get the command that Stephen provided to work, I tried to find an alternative by using the ice command. Nevertheless, that did not solve my problem.

                  Code:
                  ice pv1math pv2math pv3math pv4math pv5math gender immigrant wealth home_resources cultpos occupation parents_education attitudes, saving(icedata) m(5) seed(123)
                  I got from Stata the following message: "All relevant cases are complete, no imputation required". and r(2000). So, I started to try to understand what was r(2000), and I found this: "In addition, r(2000) can mean that a string variable inhibits something intrinsically numerical. Or that there are no non-missing values." (Cox, 2009) from: http://www.stata.com/statalist/archi.../msg00125.html

                  With respect to this:
                  Your mi impute regress command looks odd to me
                  I found that command here: http://www.stata.com/meeting/boston1..._marchenko.pdf (slide number 12).


                  Thank you soooo much



                  P.S.: The question is the same as before (the same problem), and I also want to attach the commands related to pv and repest:

                  Code:
                  pv, pv(pv*math): xtmixed @pv gender immigrant wealth home_resources cultpos occupation parents_education attitudes || schoolid:
                  repest PISA, estimate(corr pv@math pv@read pv@scie) by(cnt)
                  Last edited by Vincent Puchades; 28 Aug 2015, 06:40.



                  • #10
                    You report a problem with:
                    Code:
                     mi import wide, imputed(pv1math pv2math pv3math pv4math pv5math)
                    no; data in memory would be lost r(4);
                    


                    Ensure you have a copy of your original data saved, and then try the following
                    Code:
                    mi import wide, imputed(pv1math pv2math pv3math pv4math pv5math) clear
                    Note the "clear". I got this from reading help mi import wide. You have access to the same resources ...



                    • #11
                      I think one source of confusion with using plausible values within Stata's mi environment stems from the following issue(s): it is indeed possible to import a PISA data set and register a set of five plausible values (e.g. pv1maths, ... pv5maths) as imputed variables. These should (in principle) be indexed as m=1, m=2,..,m=5. However, the PISA data set contains no variable that serves as an original variable (m=0) that has/had missing values. Thus it is not possible to tell Stata, for example, mi import wide, imputed(maths = pv1maths pv2maths ... pv5maths), because there is no variable maths that can be associated with m=0. Similarly, it is not possible to ask Stata to perform a command that would require the use of maths so that Stata could get to work with the five imputed variables. I am insufficiently familiar with Stata's mi to know if a work-around is possible; but at first glance I cannot find one.



                      • #12
                        That's helpful clarification -- thanks. I too don't know of a workaround. (I'm not an mi expert.) Just curious: how then does pv get around these issues? A glance at its ado code using viewsource suggests that it is applying "Rubin's Rules".
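
                        For readers who have not met them, Rubin's rules for M plausible values can be written as below. This is the standard textbook form, given here as a sketch; I have not checked that pv implements exactly this, in particular which degrees of freedom it uses. Here \hat\theta_m and U_m denote the estimate and its squared standard error from the analysis of the m-th plausible value (notation introduced for this note).

                        ```latex
                        \begin{align*}
                        \bar\theta &= \frac{1}{M}\sum_{m=1}^{M}\hat\theta_m
                          && \text{(pooled estimate)}\\
                        \bar U &= \frac{1}{M}\sum_{m=1}^{M} U_m
                          && \text{(within-imputation variance)}\\
                        B &= \frac{1}{M-1}\sum_{m=1}^{M}\bigl(\hat\theta_m-\bar\theta\bigr)^2
                          && \text{(between-imputation variance)}\\
                        T &= \bar U + \Bigl(1+\frac{1}{M}\Bigr)B
                          && \text{(total variance)}\\
                        t &= \frac{\bar\theta}{\sqrt{T}}, \qquad
                        \nu = (M-1)\Bigl(1+\frac{\bar U}{(1+1/M)\,B}\Bigr)^{2}
                          && \text{(test statistic and its degrees of freedom)}
                        \end{align*}
                        ```

                        With M = 5 the p-value would then come from a t distribution with \nu degrees of freedom.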



                        • #13
                          Hi again,

                          Stephen, I already tried it and I attached a photo in the previous comment mentioning all the problems I had. Anyway, I am going to use the code here.

                          Then, I read about r(4), r(198), r(6) and r(7) errors, but I could not fix it
                          Code:
                          . mi import wide, imputed(pv1math pv2math pv3math pv3math pv5math)
                          no; data in memory would be lost
                          r(4);
                          
                          . mi import wide, imputed(pv1math pv2math pv3math pv3math pv5math) clear
                          pv2math found where = expected
                          r(198);
                          
                          . confirm existence
                          found where something expected
                          r(6);
                          
                          . confirm file
                          found where filename expected
                          r(7);
                          For this reason, I tried to find an alternative way to get the plausible values by using ice command, but unfortunately I got this:

                          Code:
                          ice pv1math pv2math pv3math pv4math pv5math gender immigrant wealth home_resources cultpos occupation parents_education attitudes, saving(icedata) m(5) seed(123)
                          I got from Stata the following message: "All relevant cases are complete, no imputation required". and r(2000). So, I started to try to understand what was r(2000), and I found this: "In addition, r(2000) can mean that a string variable inhibits something intrinsically numerical. Or that there are no non-missing values." (Cox, 2009) from: http://www.stata.com/statalist/archi.../msg00125.html


                          Philip, that is exactly the problem that I am trying to solve. I have been reading about it but did not find anything. That is why I am using Statalist: to get a solution or some suggestion about how I could deal with this. I know that PISA provides syntax for SAS and SPSS, but I am not familiar with that software. If you have any idea about how I could solve this problem, please feel free to share it; any comments or suggestions are more than welcome. I do not know about a work-around, but I am going to read about it right now. Could you explain it, or at least the idea behind it? (I would really appreciate it, and I think Statalist users would too.)


                          I also read that there are software packages such as MLwiN (http://www.bristol.ac.uk/cmm/software/mlwin/), ConQuest (used by Wu) and RealCom-impute (http://missingdata.lshtm.ac.uk/index...=50&Itemid=102) to which data can be exported directly from Stata. But I am afraid that we are moving to a different topic.



                          Thank you so much for all the comments
                          Last edited by Vincent Puchades; 28 Aug 2015, 12:09.



                          • #14
                            Try the following. You don't have the m0 variable, as Phil says, so let's create one!

                            1. use your data set

                            2. generate new variable : generate pv0math = . // missing for all cases

                            3. mi import wide, imputed(pv0math = pv1math pv2math pv3math pv4math pv5math) clear

                            As already explained, I'm no mi expert, so this proposal is simply a guess derived from re-reading help files. It assumes that no other variables contain imputations. You might have to do something about 'passive' variables but, if the suggestion works, you should be able to figure that out. Also, ensure you do all the various checks that the help files and Manuals recommend.
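
                            Put together, the three steps above might look like the following do-file sketch. It is untested on the PISA file. The variable names are the ones used earlier in this thread, the choice of covariates is arbitrary, schoolid is assumed to identify schools, and sampling and replicate weights are ignored here, so treat it as a starting point and not a finished analysis.

                            ```stata
                            * Sketch only: file path and variable names are taken from earlier posts
                            use "/Users/Vincent/Desktop/BBDDPROA.dta", clear

                            * Placeholder for m=0: missing for every case, because PISA has no observed score
                            generate pv0math = .

                            * Declare pv1math-pv5math as imputations 1 to 5 of pv0math
                            mi import wide, imputed(pv0math = pv1math pv2math pv3math pv4math pv5math) clear

                            * Check that Stata reports M = 5 and that nothing unexpected varies across imputations
                            mi describe
                            mi varying

                            * Fit the two-level model once per plausible value and pool the results
                            * (assumption: schoolid is the school identifier; weights are not used here)
                            mi estimate: mixed pv0math gender immigrant wealth || schoolid:
                            ```

                            If this runs, the mi estimate output should include pooled standard errors and p-values, which is what was missing from the pv output.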

                            [Sorry, iPad use at present and can't see how to use advanced editor in this environment]



                            • #15
                              Stephen, yes, I thought about having a variable full of missings; but doubted if that would be legal as far as what is going on 'under the hood' with mi; but perhaps it is ok. However, it is interesting/amusing to think about the (implied) rationale for imputing all values of a variable that was completely missing in the first place... In any event, PISA methodology is somewhat unusual, and in some respects of rather doubtful validity; e.g. Kreiner, Svend and Christensen, Karl Bang (2014). Analysis of Model Fit and Robustness...Psychometrika 79 (2), 210-231.
