
  • Assess survey interview completeness

    Dear All,

    This is not a Stata question per se, but I hope the topic is not too far removed from what you are encountering and that you can still provide helpful advice.

    Consider the task of assessing the completeness of interviews obtained from some survey. I would like to discuss the following scenarios (1, 2 and 3), as shown in the attached image:
    1. The first scenario has no logical conditions or skips: all 4 questions are mandatory, and we have answers Q1=x and Q2=y, but no answers to questions Q3 and Q4. I take it as uncontested that the completeness of this interview is AnsweredQ/TotalQ = 2/4 = 0.5 (or 50%). (If this matters, assume that the arrows here do not prevent moving forward, meaning we can omit Q3 and continue to Q4, but the diamonds do: we cannot move into any branch before we know the values of everything that affects the condition in the diamond.)
    2. The second scenario involves branching depending on the value of Q2, where questions Q3 and Q4 are asked only in some situations (we may or may not know how frequently these arise in the population we study; let's assume we don't). Unfortunately, Q2 was left unanswered, so Q3 and Q4 were never asked and are also unanswered. How would you assess the completeness of this interview?
    3. The third scenario involves branching similarly to the second, but we know for sure that whichever direction we take, there will be 2 questions to follow (either Q3 and Q4 on the upper path, or Q5 and Q6 on the lower path). Likewise, I would like to see how this affects the estimation of completeness.
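The scenario-1 arithmetic can be sketched in a few lines (a minimal illustration only; the function name and the dictionary layout are invented for this example):

```python
def completeness(answers, total_questions):
    """Fraction of mandatory questions answered.

    `answers` maps a question name to its value, or None if unanswered.
    """
    answered = sum(1 for value in answers.values() if value is not None)
    return answered / total_questions

# Scenario 1: Q1 and Q2 answered, Q3 and Q4 left blank.
interview = {"Q1": "x", "Q2": "y", "Q3": None, "Q4": None}
print(completeness(interview, 4))  # 0.5
```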
    Thank you for any guidance and recommendations.

    Best, Sergiy Radyakin

    [Attached image: interview_scenarios.png]


  • #2
    This is a tricky question and I'm not sure there is a right answer, per se. One thing to consider is how you're going to use your percentage-complete measure, and from there, how different choices about constructing the measure would affect your end outcome.

    I've done this with a survey to get a measure of completeness that we then used to decide which data we would keep as "complete enough" and which respondents we would drop entirely.
    When I did this, I used as my denominator the number of questions for which I had information confirming the respondent should have gotten that question. I had situations like scenario 2, and in those cases I did not take Q3 & Q4 into account when calculating completeness, because without information from Q2 I did not know whether the respondent should have answered them. So, despite those unanswered questions, I would still count the respondent as 50% complete (1 question of 2).

    I didn't deal with skip patterns like scenario 3, but that strikes me as slightly different, because you know that the respondent would have gotten two more questions, even if you didn't know which ones they were. In that case I might count the respondent in the third scenario as having answered 1 out of 4 possible questions. I think, though, that you could make an argument for still counting it as 1 of 2, depending on what your exact goals are. In that case I would think of the denominator as the questions that the respondent actually saw (assuming this was a programmed survey and not paper with complicated skip patterns that respondents had to navigate on their own).
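The denominator rule described above could be sketched roughly as follows (the data layout and names are my own invention, not from any survey package; each question carries its answer plus a flag for whether we can confirm it applied to the respondent):

```python
def completeness_confirmed(questions):
    """Completeness using only questions confirmed applicable.

    `questions` is a list of (answer, confirmed_applicable) pairs; an
    answer of None means the question was not answered.
    """
    applicable = [answer for answer, confirmed in questions if confirmed]
    if not applicable:
        return 0.0
    answered = sum(1 for answer in applicable if answer is not None)
    return answered / len(applicable)

# Scenario 2: Q1 answered, Q2 missing, so Q3/Q4 applicability is unknown
# and they drop out of the denominator entirely.
scenario2 = [("x", True), (None, True), (None, False), (None, False)]
print(completeness_confirmed(scenario2))  # 0.5, i.e. 1 of 2

# Scenario 3 variant: Q2 is still unanswered, but we know two questions
# follow on either branch, so one could instead use 1 / (2 + 2) = 0.25.
```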

    I'm not sure that's much help, beyond affirming that it is indeed a difficult question to answer.



    • #3
      Hello Sarah,

      Thank you very much for sharing your advice. Indeed the situation is not trivial, and case 3 is brought up precisely because I am not satisfied with treating it the same as case 2, given that some knowledge about the survey questionnaire's structure is available and can be used to improve the estimate.

      I agree that the completeness here should not be taken as absolute, but in a specific context. But the uses are many, and the figure will be looked at by both the respondents and the assessors. From the respondents' view it answers the question "How close am I to finishing?"; from the assessors', "Is it worth rejecting this interview to acquire the missing information?". In the end, the completeness of the information is used to update the `Carma` score associated with each respondent over multiple iterations of interviewing.

      The practical difficulty is that the interview contains hundreds of branching points, loops, and other conditions, which makes such assessments rather tedious and makes the naïve approach of looking only at the 'immediately available' questions very attractive as low-hanging fruit. On the other hand, it results in "you are always 99% complete" readings when the survey has many questions that open up sequentially (each requiring answers to all previous ones). That is incredibly irritating for the respondents and, frankly speaking, useless for the assessors.
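The saturation effect described above is easy to see numerically: if questions unlock one at a time, the only unanswered question ever visible is the current one, so the naive denominator is always the answered count plus one and the figure approaches 100% no matter how much remains (illustrative numbers only):

```python
def naive_completeness(n_answered):
    """Naive 'visible questions' completeness under sequential unlocking.

    Only one unanswered question is visible at a time, so the
    denominator is always n_answered + 1.
    """
    return n_answered / (n_answered + 1)

for n in (1, 10, 99):
    print(f"{n} answered -> {naive_completeness(n):.0%} complete")
# 1 answered -> 50% complete
# 10 answered -> 91% complete
# 99 answered -> 99% complete
```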

      "...that it is indeed a difficult question to answer"
      Indeed, and that's why I am really interested in getting inputs from the experts.

      Best, Sergiy



      • #4
        Sergiy, is it critical to have just a single number? It should be relatively easy to provide a minimum and a maximum instead, and I think those numbers would be meaningful to both respondents and supervisors.



        • #5
          Hello Hemanshu Kumar, it doesn't have to be, but before this turns into two or more separate indicators, I would have to justify why I can't present a single indicator to all of the users. Thank you, Sergiy



          • #6
            I think wanting to do this as the respondent goes along makes it even more difficult. I was thinking you were looking for an end measure of completeness based on missing data after the survey had been administered.

            I think there are a few ways to think about it. Unfortunately I can't come up with a strategy that does not result in having to map out how many questions result from various paths through the survey, which, as you note, is bound to be tedious.
            1. Determine the maximum number of questions a respondent could encounter and always use that as the denominator.
            2. Determine the minimum number of questions a respondent could encounter and always use that as the denominator.
            If there are a lot of questions and the range between the min and max possible questions under the skip patterns isn't too large, then in most cases you could go with either figure and it would be fine (or you could decide which to use based on whether you prefer to over- or under-state completeness).
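A rough sketch of strategies 1 and 2: if the questionnaire is represented as a branching graph (the structure below is invented, loosely following scenario 3 but with one branch made shorter so the bounds differ), the min and max denominators are just the shortest and longest root-to-end paths:

```python
def path_lengths(graph, node):
    """Return the set of possible question counts from `node` to any end.

    `graph` maps each question to the list of questions that can follow
    it; an empty list marks the end of the interview.
    """
    if not graph.get(node):
        return {1}  # terminal question counts as itself
    return {1 + n for successor in graph[node]
            for n in path_lengths(graph, successor)}

# Q1 -> Q2, then either Q3 -> Q4 (long branch) or Q5 alone (short branch).
graph = {"Q1": ["Q2"], "Q2": ["Q3", "Q5"],
         "Q3": ["Q4"], "Q4": [], "Q5": []}
lengths = path_lengths(graph, "Q1")
print(min(lengths), max(lengths))  # 3 4
```

With loops or hundreds of branching points this naive recursion would need memoization and a cycle cap, but the min/max idea carries over.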

            If the skip patterns mean that there's a wide range in the number of questions a respondent could be asked, it's harder to figure out what to use as the estimate.
            You could take a simple mathematical average of the number of questions, but depending on how common various paths through the survey are that might not be the number the average respondent is asked. If you have a history of survey responses to look at you could potentially use the mean of actual questions asked (only easily done if your survey software produces different codes for non-response and skips due to survey design, which hopefully it does).
            With enough survey history you could even get fancy and create a predictive model of how many questions a respondent might get based on their early responses, but in most cases that's probably going to be overkill.
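The history-based denominator mentioned above might look like this (the response codes "answered"/"refused"/"skipped" are hypothetical stand-ins for whatever the survey software records; the point is only that designed skips are excluded from the asked count while refusals are not):

```python
def mean_questions_asked(history):
    """Mean number of questions actually presented per past interview.

    `history` is a list of interviews; each interview is a list of
    per-question codes. "skipped" means skipped by survey design and is
    excluded; "answered" and "refused" both count as asked.
    """
    asked_counts = [
        sum(1 for code in interview if code != "skipped")
        for interview in history
    ]
    return sum(asked_counts) / len(asked_counts)

history = [
    ["answered", "answered", "skipped", "skipped"],   # short path: 2 asked
    ["answered", "answered", "answered", "refused"],  # long path: 4 asked
    ["answered", "refused", "answered", "answered"],  # long path: 4 asked
]
print(mean_questions_asked(history))  # 10/3, about 3.33
```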

            All that said, I imagine this is an area where there is actually some expertise in the survey administration community. This group has a lot of expertise in analyzing survey data and I'm sure there are some experts in collecting survey data, but I do wonder if there are other forums where you would find more experts in the details of programming and administering surveys.

