
  • Multiple Proportions Multiple Groups Statistical Test

    Greetings Statalisters. I'm trying to find an appropriate method to test my hypothesis that regulations/policies aimed at professionalizing an occupation increase the proportion of job ads for that occupation that include certain key words, as defined by the occupation's professional association. My data will include ~30 indicator variables, one per key word, coded 0 = key word not included and 1 = key word included. The unit of observation is the job ad, and the ads come from 4 different states. Two states began regulating the occupation more than 10 years ago and the other 2 began regulating within the last 18 months. The theory is that the states with longer-term regulation will have a higher proportion of job ads with the key words.

    I'm thinking prtest may be appropriate. Looking for input on this or more appropriate methods.

    Thanks,

    Tammie

  • #2
    A reasonable place to start is to create an index for each job ad that is the count of keywords present in that ad, and compare the mean of that index between the "old" vs. "new" regulation states, with a confidence interval or a hypothesis test. Do you have some reason not to like something like that? That method of analysis would ignore any possible differences across keywords in the effect of the old/new difference, of course. In the worst case, some of the keywords might have an effect in one direction and others in the opposite direction, and analyzing them together would wash out those effects by mixing them.

    Another possibility, as you suggest, would be to do a proportion confidence interval or test separately on each key word (possibly with an adjustment for the studywise error rate if such traditions prevail in your discipline), but that would ignore that the keywords constitute repeated measures on ads. There are fancier methods to manage this complication, but they'd likely require some kind of logistic regression model.
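
    As a rough sketch of what such a logistic regression model might look like, assuming a dataset with one row per ad, keyword indicators named keyword1 to keyword30, and a 0/1 variable oldreg marking the long-regulated states (the names adid and word are made up for illustration): the data can be reshaped to one row per ad-keyword pair, and the dependence among keywords within an ad handled either with a random intercept for ad or with ad-clustered standard errors. This is one possible specification, not a recommendation, and it assumes the same oldreg effect for every keyword.

    Code:
    // Sketch only: assumes one row per ad with keyword1-keyword30 and oldreg
    gen long adid = _n                       // unique ad identifier (hypothetical name)
    reshape long keyword, i(adid) j(word)    // one row per ad-keyword pair
    // Option A: random intercept for ad (keywords are repeated measures on ads)
    melogit keyword i.oldreg || adid:
    // Option B: ordinary logit with standard errors clustered on ad
    logit keyword i.oldreg, vce(cluster adid)

    Adding i.word, or an i.oldreg#i.word interaction, would let the baseline rate or the old/new effect differ by keyword, at the cost of many more parameters.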

    Perhaps someone else here can suggest a tabular analysis method to address these issues, but I'm not thinking of anything offhand. I would note that the sample sizes (ads within states) and the distribution of yes/no for each keyword would affect which statistical methods might be desirable, so you could help by providing that kind of information. If, for example, "yes" or "no" occurs at low frequency for one or more keywords, that could make things hard.

    Here is some simulated data which I would guess resembles what you have, along with demonstrations of the two analyses. This might offer you useful illustrations, and also facilitate suggestions from others of fancier approaches.

    Code:
    // Simulate data like yours
    clear
    set seed 375904
    local nvar = 30
    set obs 4
    gen byte state = _n
    gen byte oldreg = mod(state,2)
    gen int nad = 5 + ceil(runiform() * 10)
    expand nad
    bysort state: gen int IDad = _n
    order state IDad
    forval i = 1/`nvar' {
      // assume expected oldreg effect same for all words:
      // P(keyword present) = 0.2 in "new" states, 0.3 in "old" states
      gen byte keyword`i' = ///
         cond(oldreg == 0, runiform() > 0.8, runiform() > 0.7)
    }
    // end simulating data
    desc
    browse  // Does this conceptually resemble your data?
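    // Analysis 1: count the keywords present in each ad and compare group means
    egen totyes = rowtotal(keyword*)
    label var totyes "Number of keywords present"
    ttest totyes, by(oldreg)
    // Analysis 2: test each keyword separately, old vs. new regulation states
    foreach kw of varlist keyword* {
       tab `kw' oldreg, col chi2 exact //  I like this data display better
       prtest `kw', by(oldreg)
    }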
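
    If an adjustment for the studywise error rate is wanted for the keyword-by-keyword tests, one simple possibility is a Bonferroni correction. The sketch below assumes the simulated data above is in memory. It recomputes the two-sided p-value from the z statistic that prtest leaves behind in r(z) and multiplies it by the number of keywords tested. Bonferroni is my choice here only because it is the simplest adjustment; the macro names kwlist and nkw are arbitrary. A keyword with no variation would give a missing z and would need to be handled separately.

    Code:
    // Sketch only: Bonferroni-adjusted p-values for the per-keyword prtest
    unab kwlist : keyword*                  // expand the varlist
    local nkw : word count `kwlist'         // number of tests performed
    foreach kw of local kwlist {
       quietly prtest `kw', by(oldreg)
       local p = 2 * normal(-abs(r(z)))     // two-sided p-value from z
       display as text "`kw'" _col(14) as result %6.4f `p' ///
          _col(24) %6.4f min(1, `p' * `nkw')
    }

    The first column of numbers is the unadjusted p-value and the second is the Bonferroni-adjusted one, capped at 1.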


