  • Using tab with missing values

    Hi,

    I'm currently analysing experimental data that I collected on Qualtrics. I'm using Stata v18 on Mac. I want to run a chi-square test to compare scores in condition 1 and condition 2, but the way the data is exported from Qualtrics means that I have missing values in both conditions (I have a between-subjects design: a missing value in condition 1 means that the participant was allocated to another condition). I've tried this command: tab cond1 cond2, missing chi2 so that Stata treats missing values as other values (otherwise I get an error message that there are missing values). I cannot replace the missing values with other values such as the mean.

    I'm a bit confused by the output:

    tab c1_sc3_qa c2_sc3_qa, missing chi2

     Would you |       Would you be upset?
     be upset? |         0          1          . |     Total
    -----------+---------------------------------+----------
             0 |         0          0         40 |        40
             1 |         0          0          7 |         7
             . |        29         21          1 |        51
    -----------+---------------------------------+----------
         Total |        29         21         48 |        98

              Pearson chi2(4) =  94.0768   Pr = 0.000


    Am I right in thinking that Stata cannot include the missing values in the chi-square test formula, i.e. this is the correct result? Alternatively, are there other options to run chi-square tests, and/or to deal with the missing values?

    Thanks in advance!


  • #2
    tabulate is indeed following your instructions and including missing values as if they were a valid category, as can be seen in various ways:

    by your having 4 df, not 1 df, as would apply if you had a 2 x 2 table;

    by the fact that chi-square on 0 0 \ 0 0 would not be instructive;

    and by an independent check

    Code:
    . tabi 0 0 40 \ 0 0 7 \ 29 21 1 , chi2
    
               |               col
           row |         1          2          3 |     Total
    -----------+---------------------------------+----------
             1 |         0          0         40 |        40 
             2 |         0          0          7 |         7 
             3 |        29         21          1 |        51 
    -----------+---------------------------------+----------
         Total |        29         21         48 |        98 
    
              Pearson chi2(4) =  94.0768   Pr = 0.000

    That said, I don't know anything about Qualtrics and don't follow what makes sense for your set-up.


    • #3
      This organization of the data is not compatible with the analysis you wish to do. It has separate outcome variables for the two conditions, and each participant has a missing value on one of the two outcome variables. You need to get the outcomes for both groups into one variable and, if it does not already exist, create a variable indicating which condition the participant was assigned to. Then you can use -tab- not to cross-tabulate the response in group 1 with the response in group 2, but the response with the group. Something like this:
      Code:
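      * every participant should be missing on at least one of the two outcomes
      assert missing(c1_sc3_qa, c2_sc3_qa)
      * min() ignores missing arguments, so this picks up the one observed response
      gen score = min(c1_sc3_qa, c2_sc3_qa)
      * condition 2 if the condition-1 outcome is missing, otherwise condition 1
      gen group = cond(missing(c1_sc3_qa), 2, 1)
      tab score group, col chi2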
      Note: Because no example data was provided, this code is untested. It may contain typographical or other errors.
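
      As a rough, self-contained illustration of the restructuring above, the sketch below rebuilds a toy dataset from the cell counts shown in #1 and then applies the same steps. The frequency variable n and the if !missing(score) qualifier are assumptions added for the illustration, not part of the original suggestion; the qualifier merely keeps the one participant with no response in either condition from being assigned to a group.

      ```stata
      clear
      * toy data: one row per response pattern, n = number of participants (counts from #1)
      input c1_sc3_qa c2_sc3_qa n
      0 . 40
      1 . 7
      . 0 29
      . 1 21
      . . 1
      end
      expand n        // one observation per participant
      drop n

      * nobody should have a response recorded in both conditions
      assert missing(c1_sc3_qa, c2_sc3_qa)

      * single outcome variable: min() returns the non-missing response
      gen score = min(c1_sc3_qa, c2_sc3_qa)

      * group indicator, left missing when there is no response at all
      gen group = cond(missing(c1_sc3_qa), 2, 1) if !missing(score)

      * response by condition: a 2 x 2 table, so the chi-square test has 1 df
      tab score group, col chi2
      ```

      With the data in this layout the test compares the distribution of responses across the two conditions, which is the comparison asked about in #1. The same 2 x 2 table could also be checked directly with tabi 40 29 \ 7 21, chi2.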

      In the future, when asking for help with code, please provide suitable example data, and use the -dataex- command to do so. If you are running version 18, 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.
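
      A minimal sketch of what that might look like for this thread, assuming the two outcome variables named in #1 are the relevant ones and that the first 20 observations are representative (both are assumptions, so adjust the variable list and range to your data):

      ```stata
      * produce a copy-and-paste data example for the two outcome variables
      dataex c1_sc3_qa c2_sc3_qa in 1/20
      ```

      The output in the Results window can then be pasted straight into a forum post.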

      Added: Crossed with #2. Unlike Nick, I have worked with Qualtrics data a few times in the past. I found it to be awful and now insist that my collaborators not use it for data collection.
      Last edited by Clyde Schechter; 11 Jul 2023, 11:17.


      • #4
        Thank you, Nick, for the quick reply!
        Last edited by Reka Blazsek; 11 Jul 2023, 11:21.


        • #5
          Clyde: thank you so much! I'll try this right now. Apologies for not using dataex, I'll definitely do so next time.
          Regarding Qualtrics: unfortunately my hands are tied and for now I have to use the tools my institution is subscribed to. But for future reference, what other platform would you recommend?


          • #6
            I like REDCap best.
