  • Pairwise Comparisons with Two Groups

    Hi all,

    I want to assess whether there is racial discrimination against job applicants using paired auditors who are identical on all relevant criteria and differ only in race. However, this kind of study design is very new to me, and I was recently tasked by colleagues with doing the analysis. I was hoping someone could direct me to the best strategy.

    Below is the data. The first column is an anonymized job ID. Then I have columns for two pairs of auditors, so there are a total of two black auditors and two white auditors. 0 = no discrimination; 1 and 2 = particular types of discrimination the team coded for.

    I know that t-tests are a simple way of gauging differences between two groups (in my case, black and white). I wondered whether there are special considerations because I have pairs within the same group, or other types of analysis I should be aware of for this kind of data setup. Thank you in advance.

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input byte(job black1 white1 black2 white2)
     1 0 0 1 0
     2 1 0 1 0
     3 0 0 2 0
     4 0 0 0 0
     5 0 0 1 0
     6 0 0 0 0
     7 0 0 0 0
     8 0 0 0 0
     9 0 0 0 0
    10 0 0 0 0
    11 0 0 0 0
    12 . . . .
    13 0 0 0 0
    14 0 0 0 0
    15 0 0 0 0
    16 0 0 0 0
    17 0 0 0 0
    18 0 0 0 0
    19 0 0 0 0
    20 0 0 0 0
    21 2 0 0 0
    22 . . . .
    23 0 0 0 0
    24 0 0 0 0
    25 1 0 0 0
    26 0 0 0 0
    27 0 0 0 0
    28 0 0 0 0
    29 0 0 0 0
    30 0 0 0 0
    31 0 0 0 0
    32 0 0 0 0
    end

  • #2
    This data design is too complex for a simple t-test. Indeed, I think it is more complex than your description implies. I do not see this as paired data: it is either a set of quadruplets or a set of pairs of pairs.

    So the first key question here is: are all four auditors identical on everything except race? If so, you have quadruplets, not pairs. Or is it just that black1 and white1 are identical to each other, and black2 and white2 are identical to each other, but the latter two may differ from black1 and white1? In this case you have pairs of pairs. Also very important to know: are black1, white1, black2 and white2 the same four people for all the jobs (id)?

    Another key question: what is this 0/1/2 discrimination variable? Do 1 and 2 represent different qualitative categories of discrimination? Or is it an intensity measure, where 2 represents more discrimination than 1? Is it a numeric interval variable, i.e., is the difference in amount of discrimination between 2 and 1 the same as between 1 and 0?

    The types of analysis that could be appropriate for these data depend on the answers to these questions.

    • #3
      Clyde, thank you so much for your response and for helping me navigate this.

      All auditors are identical except for race. The team wanted to use two sets of applicants to be more confident about the potential effect of racial discrimination. So black1 and black2 are identical, and white1 and white2 are identical. To your second point, yes, it is the same four people.

      Regarding the scale, 1 and 2 are qualitatively different categories. It wasn't something the team was necessarily interested in or expecting, but it was recorded nonetheless. The 1/0 difference is the primary outcome of interest. I hope this helps.

      • #4
        OK, so first you need to -reshape- your data into long layout, so you have four observations per job, one corresponding to each auditor, and then create a separate variable indicating the race of each auditor. Then, recode the discrimination rating to just 0 vs 1. Then you are set up to do a conditional logistic regression to estimate the association (odds ratio) between race and a finding of discrimination. The code would go like this:

        Code:
        * give the four auditor columns a common stub so -reshape- can use it
        rename (black1 white1 black2 white2) disc=
        * long layout: one observation per job-auditor combination
        reshape long disc, i(job) j(auditor) string
        encode auditor, gen(rater)
        * the first 5 characters of the auditor name ("black"/"white") identify the race
        gen race = substr(auditor, 1, 5)
        encode race, gen(_race)
        drop race
        rename _race race
        
        * collapse the two discrimination categories into a single 0/1 indicator
        recode disc (2 = 1)
        
        * conditional (fixed-effects) logistic regression, with white as the base category
        xtset job
        xtlogit disc ib2.race, fe
        Now, I should warn you that this code fails on your example data. There are a couple of problems. One is that the white auditors never reported discrimination. The other is that for nearly all of the jobs, nobody reported discrimination. In matched data like this (matched quadruplets in this case), when the outcome is the same for all observations in the quadruplet, that quadruplet is uninformative about factors affecting the outcome variable and so is dropped from the estimation sample. Similarly, because white race is always associated with a zero outcome in the example data, the race variable gets omitted from the model, and all of the observations for white auditors are dropped from the estimation sample because of this "perfect prediction." The data set you are left with is then just a handful of observations from the black auditors, and the logistic regression ultimately fails to converge. Hopefully, in your real data set, there is enough variation in the outcome for both races that the analysis can give you a usable result.
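
        You can see both problems directly in the reshaped data. Here is a quick check (a sketch only; n_disc and informative are illustrative names, and it assumes the long-layout data created by the code above is still in memory):

        Code:
        * number of discrimination reports within each quadruplet (missing counted as 0)
        bysort job: egen byte n_disc = total(disc)
        * a quadruplet is informative for -xtlogit, fe- only if the outcome varies within it
        gen byte informative = inrange(n_disc, 1, 3)
        tab informative
        * cross-tabulation showing that white auditors never report discrimination
        tab race disc, missing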

        If the example is actually your entire data set, then you could perhaps make it work with a linear probability model, which would be the same code except substituting -xtreg- for -xtlogit-. But it's not really satisfactory, because most of the observed outcome values are 0, which makes the linear probability model a poor choice: most of the model's predicted probabilities will be less than zero. And I think it would be better to just report summary statistics on the number of auditors who reported discrimination for each job, or something simple like that.
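
        For completeness, the linear probability variant and the simple descriptive summary would look something like this (a sketch only, reusing the long-layout data and variable names from the code above; n_reports is just an illustrative name):

        Code:
        * fixed-effects linear probability model: same setup, with -xtreg- in place of -xtlogit-
        xtset job
        xtreg disc ib2.race, fe
        * simple descriptive alternative: count reports of discrimination per job
        bysort job: egen byte n_reports = total(disc)
        tab n_reports if auditor == "black1"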

        • #5
          Thank you again, Clyde. Yes, this is the real data. Wouldn't something like a t-test be a sufficient way to analyze data like this? I'm more familiar with the kind of models you've described. In this case, however, since we control for everything but race and there is a small number of cases, I thought some other strategy like a t-test would be OK. Curious to hear your thoughts.

          • #6
            If you had true matched pairs instead of quadruplets, you could use a paired t-test. But you don't. The closest analog to a paired t-test that would be a valid analysis for these quadruplets is the analysis proposed in #4, but using -xtreg- instead of -xtlogit-. However, as I indicated, this is a bit dicey because that model mostly predicts negative probabilities.

            If you were to try to do it by breaking up the quadruplets into pairs, white1-black1 and white2-black2, and then doing a paired t-test, you would be neglecting the dependence between the observations in the two pairs, so your standard errors, confidence intervals, t-statistics, and p-values would be wrong. That is something you might be able to partially ameliorate by using the -vce(cluster job)- option in the -xtreg- command. But again, the paired t-test approach is really equivalent (with paired data) to a linear probability model here, and you still have the problem of so many zeroes leading to negative predicted probabilities (even though in the t-test itself you wouldn't look at those, that's just hiding the problem, not solving it).
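
            In Stata terms, that partial fix would look something like this (a sketch only, again assuming the long-layout data and variable names from #4):

            Code:
            * fixed-effects linear probability model with standard errors clustered on job,
            * accounting for the dependence among the four auditors within each job
            xtset job
            xtreg disc ib2.race, fe vce(cluster job)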
