Structure of survey question IDs

Leonardo Guizzetti

#1

10 Aug 2018, 09:03

Hello Statalisters,

This isn't a question about the use of Stata per se, but there are many experienced people here who may be able to help. We sometimes organize multi-round surveys to explore consensus on statements about what is relevant to study for a specific disease, in a process similar to RAND, Delphi, or other consensus methods.

I would like a scheme that assigns a unique identification label to each survey question/statement, which I can use for programmatic manipulation and analysis later.

The ID must be able to:
- identify each unique question and sub-question (e.g., 1a, 1b, 1c)
- identify the survey round (iteration)
- if the content of a question has been split into two or more new questions, make it easy to identify the originating question in the previous round
- if the content of a sub-question has been split into two or more new sub-questions, make it easy to identify the originating sub-question in the previous round

My initial thought is to place the ID text in the question (to work with our survey design program) and then later parse the ID into variables. The scheme could look like
R<Round #>.<Item #>.<Sub-item letter>, but this does not track a question's history.
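Parsing the embedded ID back into separate variables is a single pattern-matching step. A minimal sketch in Python, assuming IDs follow the R<Round #>.<Item #>.<Sub-item letter> form exactly, with a single lowercase letter and an optional sub-item part; the function name parse_question_id is hypothetical:

```python
import re

# Assumed format: R<round>.<item>[.<sub-item letter>], e.g. "R2.14.b" or "R1.3".
ID_PATTERN = re.compile(r"R(\d+)\.(\d+)(?:\.([a-z]))?")


def parse_question_id(qid):
    """Split a question ID into (round, item, sub_item); sub_item may be None."""
    match = ID_PATTERN.fullmatch(qid.strip())
    if match is None:
        # Reject anything that does not follow the scheme instead of guessing.
        raise ValueError(f"malformed question ID: {qid!r}")
    round_no, item, sub_item = match.groups()
    return int(round_no), int(item), sub_item


print(parse_question_id("R2.14.b"))  # (2, 14, 'b')
print(parse_question_id("R1.3"))     # (1, 3, None)
```

Rejecting malformed IDs at parse time, rather than coercing them, catches typos in hand-entered labels before they reach the analysis.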

I have no experience in designing hierarchical schemes like this, so I am reaching out to you all.
Phil Bromiley

#2

15 Aug 2018, 14:39

You didn't get a quick answer. You'll increase your chances of a useful answer by following the FAQ on asking questions: provide Stata code in code delimiters, readable Stata output, and sample data using -dataex-.

Personally, I am inclined to use numerical IDs. I'd keep the levels apart by multiplying each one by a different power of 10, something like:
* store as long: the default float type cannot hold every large integer exactly
gen long id = round*1000000 + qu*10000 + item*100

or something similar. Alternatively, you can use a string identifier as you proposed. If you do, I'd also keep a separate variable for each component, which is often helpful in programming. You might need a macro identifier if you want to set this up as a panel data set.
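Phil's packed number can be split back into its parts with integer division. A minimal Python sketch, assuming qu and item each stay within 0 to 99 (the two decimal digits the formula gives them) and that sub-item letters have already been mapped to numbers; encode_id and decode_id are hypothetical names:

```python
def encode_id(round_no, qu, item):
    """Pack round, question and item into one integer using Phil's multipliers."""
    # qu and item each get two decimal digits; larger values would spill into
    # the neighbouring field and silently produce a duplicate ID.
    if round_no < 0 or not (0 <= qu < 100) or not (0 <= item < 100):
        raise ValueError("round must be >= 0; qu and item must be in 0..99")
    return round_no * 1_000_000 + qu * 10_000 + item * 100


def decode_id(packed):
    """Recover (round, qu, item) from a packed integer ID."""
    round_no, rest = divmod(packed, 1_000_000)
    qu, rest = divmod(rest, 10_000)
    item, _ = divmod(rest, 100)  # the last two digits are always 0 here
    return round_no, qu, item


packed = encode_id(2, 14, 3)
print(packed)             # 2140300
print(decode_id(packed))  # (2, 14, 3)
```

The range check matters: with item = 100 the term item*100 becomes 10,000, the same contribution as qu = 1, so (1, 0, 100) and (1, 1, 0) would share one ID.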

However, all of this depends on how you're going to analyze the data.
Leonardo Guizzetti

#3

15 Aug 2018, 15:35

Thanks for your idea, Phil. I like that your approach accomplishes the same task. One potential drawback I can see is that a typo could be easier to introduce, though it's certainly as easy to program. I also know that a string ID would need separate variables to keep track of round, question and item, much as they would be derived from your -id-. I suppose any scheme that lends itself to a tree structure works.
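One way to get that tree structure is to store, next to each ID, the ID of the question it was derived from in the previous round. The parent field and the example IDs below are assumptions for illustration, not part of either proposal above. A minimal Python sketch that lists which questions descend from a given one:

```python
from collections import defaultdict

# Hypothetical rows: (question ID, ID it was derived from in the previous round).
# None marks an original round-1 question.
questions = [
    ("R1.1", None),
    ("R1.2.a", None),
    ("R2.1", "R1.1"),      # carried forward unchanged
    ("R2.4", "R1.1"),      # new question split off from R1.1
    ("R2.2.a", "R1.2.a"),  # sub-question R1.2.a split in two
    ("R2.2.b", "R1.2.a"),
]


def build_children(rows):
    """Map each question ID to the IDs derived from it, in input order."""
    children = defaultdict(list)
    for qid, parent in rows:
        if parent is not None:
            children[parent].append(qid)
    return dict(children)


children = build_children(questions)
print(children["R1.1"])    # ['R2.1', 'R2.4']
print(children["R1.2.a"])  # ['R2.2.a', 'R2.2.b']
```

Keeping the parent in its own field means the visible ID can stay short (R<Round #>.<Item #>.<Sub-item letter>) while the split history lives alongside it.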

The data analyses for my purposes amount to little more than descriptive statistics and distributions. However, the current process is entirely manually curated, and it would be ideal to have better version control over how these items are generated, modified, and archived. Occasionally reviewers have asked us how specific questions were arrived at, and answering that means a lot of manual work to reconstruct the history.
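With parent links recorded for every question, answering a reviewer's "where did this question come from?" becomes a walk up the tree. A minimal Python sketch, assuming a hypothetical mapping from each ID to its previous-round parent (None for original round-1 questions):

```python
# Hypothetical parent links: question ID -> ID it was derived from.
parent_of = {
    "R1.1": None,
    "R2.1": "R1.1",
    "R2.4": "R1.1",
    "R3.4.a": "R2.4",
    "R3.4.b": "R2.4",
}


def history(qid, parents):
    """Return the chain of IDs from qid back to its round-1 origin."""
    chain = [qid]
    seen = {qid}
    # IDs missing from the mapping are treated as origins.
    while parents.get(chain[-1]) is not None:
        parent = parents[chain[-1]]
        if parent in seen:  # guard against cycles in hand-edited records
            raise ValueError(f"cycle detected at {parent!r}")
        chain.append(parent)
        seen.add(parent)
    return chain


print(history("R3.4.b", parent_of))  # ['R3.4.b', 'R2.4', 'R1.1']
```

Because missing IDs are treated as origins, a gap in the parent records shortens the chain rather than raising an error, so it is worth checking that every non-round-1 ID has a parent entry.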