Svyset command for a two-stage sampling survey dataset - PISA 2012

Mindy Yin

Join Date: Oct 2017

Posts: 6
#1

21 Oct 2017, 19:13

Hi everyone,

I'm working with PISA 2012 data and have run into much the same svyset problem as the one described below, which Laura posted back in 2015.
The svyset command I have usually seen for PISA is the following, which uses the student weights "w_fstuwt":

Code:
svyset [pweight=w_fstuwt], brr(w_fstr*) vce(brr) fay(.5) mse
This code takes the student weights into account, but what should I do to add a structure that also includes the school level? Students are nested in schools, which in PISA also have their own weights. To svyset the data including the schools, I have found the following code:

Code:
svyset schoolid [pw=w_fstuwt], brr(w_fstr*) vce(brr) fay(.5) mse
However, I wonder why the school weights (w_fschwt) are not included in this last code.

I have tried the following, but it doesn't work in my Stata 12.1:

Code:
svyset schoolid, weight(w_fschwt) || _n, weight(w_fstuwt)
I get the following error message:
Code:
option weight() not allowed
Of course I would also like to svyset my data with the PISA-specific options (brr(w_fstr*) vce(brr) fay(.5) mse). Having read the svyset help for Stata 12.1 and looked at the examples, I couldn't find anywhere in the syntax to put the school weights.

Could you give me a hint on how to svyset the data so that the PISA weighting scheme is applied correctly, including the schools and their weights?

M-
Philip Matthews

Join Date: Apr 2014

Posts: 23
#2

22 Oct 2017, 07:54

Not a direct answer to your question (well, in truth that would be "I don't know"); but I suspect that even if there were a way of putting in the weights as you wish, it might lead to errors in analysing PISA data. As you may know, the sampling scheme used in PISA is rather unlike that used in other major surveys — especially as it relates to schools. If you have not already done so, take a look at the PISA Data Analysis Manual: SPSS, Second Edition (available at http://www.oecd-ilibrary.org/education/pisa_19963777 ). Even though it refers to 'SPSS', the text is just as relevant to Stata users. For example, on page 145 there is the following quote:
Although the student’s (sic) samples were drawn from within a sample of schools, the school sample was
designed to optimize the resulting sample of students, rather than to give an optimal sample of schools.
For this reason, it is always preferable to analyse the school-level variables as attributes of students,
rather than as elements in their own right.
The pdf gives examples of the correct, and incorrect, use of student and school weights.
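As a rough illustration of what "school-level variables as attributes of students" could look like in Stata, here is a minimal sketch. It reuses the student-level svyset line from post #1 and assumes lowercase variable names; pv1math (first plausible value in mathematics) and schsize (a school-level variable already merged onto each student record) are assumed names and may differ in your file.

```stata
* Sketch only: declare the design at the student level, using the
* final student weight and the Fay-adjusted BRR replicate weights.
svyset [pweight = w_fstuwt], brrweight(w_fstr*) fay(.5) vce(brr) mse

* schsize is an ASSUMED name for a school-level variable that has been
* merged onto the student records; it enters as a student attribute.
* pv1math is an ASSUMED name for the first plausible value.
svy: regress pv1math schsize
```

Here the school is never declared as a sampling stage; the replicate weights carry the design information, in line with the manual's advice quoted above.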
Also, in case you might not know, there is a set of user-written commands for use with PISA data that you can download. In Stata, type findit pisatools
Best wishes, Philip
Mindy Yin

Join Date: Oct 2017

Posts: 6
#3

22 Oct 2017, 20:17

Thank you so much, Philip. I'm so glad to hear from someone who has experience with the PISA data. I actually have two follow-up questions.
1. What do you suggest for handling missing data in PISA? After the default listwise deletion, the number of observations drops from 5000 to 1500 per country. Would you recommend multiple imputation? How could the mi commands work with PISATOOLS?
2. My model is a two-level negative binomial model. Do you have any ideas about using PISATOOLS with multilevel modeling?

Your time and help are very much appreciated!
M-
Philip Matthews

Join Date: Apr 2014

Posts: 23
#4

23 Oct 2017, 14:43

As far as missing values are concerned, there is no point in trying to impute values for the missing data in the student scores for the sets of question booklets (assuming that might be your intention). Those data are missing by design, and the pattern is an integral part of both the PISA study and the method used to produce the sets of plausible values. Trying to impute the raw scores for individual questions would be futile. Dealing with missing data in other variables, such as level of parent education, socio-economic status etc., is another matter. Approaches to imputing missing data are outlined in the Stata multiple-imputation manual, and should be relatively straightforward to implement. Also, the following document has information about imputations:
Annex A8 to PISA 2006: Science Competencies for Tomorrow’s World, Vol. 1. (available at http://www.oecd.org/pisa/39730305.pdf )
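For a background variable with missing values, the basic mi workflow might look roughly like the sketch below. All variable names other than w_fstuwt are assumptions (escs as the partly missing variable, female and pv1math as complete predictors), and the imputation model is purely illustrative. The analysis step uses only the final student weight and ignores the BRR replicate weights, so its standard errors are not PISA-correct.

```stata
* Sketch only; escs, female and pv1math are ASSUMED variable names.
mi set wide
mi register imputed escs

* Impute escs by linear regression; the predictors must have no
* missing values for every case to be imputed.
mi impute regress escs i.female pv1math, add(5) rseed(12345)

* Fit the model on each completed dataset and combine the results.
* NOTE: weighted by w_fstuwt only; BRR variance estimation is omitted.
mi estimate: regress pv1math escs [pweight = w_fstuwt]
```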

Stata can certainly be used for multilevel modelling of the PISA data: see
https://www.stata.com/features/overv...h-survey-data/
and this paper:
Rabe-Hesketh, S., & Skrondal, A. (2006). Multilevel modelling of complex survey data. Journal of the Royal Statistical Society A, 169(4), 805–827.
(It can be obtained at www.gllamm.org/RSSAsurvey_06.pdf )
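For readers on Stata 14 or later (not the 12.1 release mentioned in post #1, which rejects the weight() option), a weighted two-level model could be set up along these lines. This is a sketch under several assumptions: that w_fschwt has been merged onto the student records, that a within-school student weight can be approximated by dividing the final student weight by the school weight (similar in spirit to the rescaling discussed by Rabe-Hesketh and Skrondal), and that absences and escs are hypothetical names for a count outcome and a predictor.

```stata
* Sketch only; requires Stata 14 or later for stage-level weights.
* ASSUMPTION: w_fschwt is present on every student record.
generate double w_stu_cond = w_fstuwt / w_fschwt

* Schools are stage 1 with their own weight; students (_n) are stage 2.
svyset schoolid, weight(w_fschwt) || _n, weight(w_stu_cond)

* Two-level negative binomial model with a random school intercept.
* absences and escs are HYPOTHETICAL variable names.
svy: menbreg absences escs || schoolid:
```

This declaration uses linearized variance estimation rather than the BRR replicate weights, so it is a different choice from the student-level svyset line in post #1, not an extension of it.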

However, using Stata to analyse all five plausible values as a set (if that is your wish) will be rather more complicated; but depending on your aims/objectives it may not be necessary — I think that somewhere in the SPSS data manual there is a comment that using single plausible values in multilevel models might be acceptable. Other software e.g. HLM7 (Scientific Software International www.ssicentral.com) has built-in support for using plausible value sets. What little multilevel modelling I have done with PISA data was done with HLM7. I doubt if the pisatools package was designed to be used in multilevel modelling — apologies for confusing the issue.
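To give an idea of what analysing the plausible values "as a set" involves, the sketch below fits the same single-level model to each of the five plausible values and combines the results with the usual multiple-imputation formulas (average of the estimates; total variance equal to the average sampling variance plus (1 + 1/M) times the between-value variance). It is not a multilevel example, and pv1math to pv5math and escs are assumed variable names.

```stata
* Sketch only; pv1math-pv5math and escs are ASSUMED variable names.
svyset [pweight = w_fstuwt], brrweight(w_fstr*) fay(.5) vce(brr) mse

local M = 5
scalar pv_sumb = 0
scalar pv_sumv = 0
forvalues i = 1/`M' {
    quietly svy: regress pv`i'math escs
    scalar pv_b`i' = _b[escs]                    // estimate for this PV
    scalar pv_sumb = pv_sumb + _b[escs]
    scalar pv_sumv = pv_sumv + _se[escs]^2       // sampling variance
}
scalar pv_bbar   = pv_sumb / `M'                 // combined estimate
scalar pv_within = pv_sumv / `M'                 // mean sampling variance

scalar pv_between = 0
forvalues i = 1/`M' {
    scalar pv_between = pv_between + (scalar(pv_b`i') - pv_bbar)^2 / (`M' - 1)
}
scalar pv_total = pv_within + (1 + 1/`M') * pv_between

display "Combined coefficient: " pv_bbar
display "Combined std. error:  " sqrt(pv_total)
```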

As far as I can see, some PISA data files contain no information about school-level weights. Indeed I don’t think there is firm advice to be found in the PISA documentation about using weights with school-level measures. (However, there is advice on the importance of using student weights cf. my previous post.) In most published analyses the issue seems to be ignored. Perhaps the best approach would be for you to read some of the published papers on PISA and multilevel models. (I can give you some citations; but it might be best to pursue this off-list by emailing me directly.)

There is a useful basic introduction to analysing TIMSS and PISA data (including multilevel models, weights and imputations) at https://www.scribd.com/document/9411...005-0503UNESCO

Hope the above is of some help, Philip
