Hi, which test should I use in Stata for a correlation between a binary variable (0 = no, 1 = yes) and a nominal variable (e.g., city, with 7 categories)?
. use "C:\Program Files\Stata17\ado\base\a\auto.dta"
(1978 automobile data)

. ktau foreign rep78, stats(taua taub obs p)

  Number of obs =      69
Kendall's tau-a =       0.3095
Kendall's tau-b =       0.5589
Kendall's score =     726
    SE of score =     145.056   (corrected for ties)

Test of H0: foreign and rep78 are independent
     Prob > |z| =       0.0000  (continuity corrected)

. logit foreign i.rep78

note: 1.rep78 != 0 predicts failure perfectly;
      1.rep78 omitted and 2 obs not used.

note: 2.rep78 != 0 predicts failure perfectly;
      2.rep78 omitted and 8 obs not used.

note: 5.rep78 omitted because of collinearity.
Iteration 0:  log likelihood = -38.411464
Iteration 1:  log likelihood = -27.676628
Iteration 2:  log likelihood = -27.446054
Iteration 3:  log likelihood = -27.444671
Iteration 4:  log likelihood = -27.444671

Logistic regression                                     Number of obs =     59
                                                        LR chi2(2)    =  21.93
                                                        Prob > chi2   = 0.0000
Log likelihood = -27.444671                             Pseudo R2     = 0.2855

------------------------------------------------------------------------------
     foreign | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
       rep78 |
          1  |          0  (empty)
          2  |          0  (empty)
          3  |  -3.701302   .9906975    -3.74   0.000    -5.643033   -1.759571
          4  |  -1.504077   .9128709    -1.65   0.099    -3.293271    .2851168
          5  |          0  (omitted)
             |
       _cons |   1.504077    .781736     1.92   0.054    -.0280969    3.036252
------------------------------------------------------------------------------
. sysuse auto
(1978 automobile data)

. tab foreign rep78, chi2

           |                   Repair record 1978
Car origin |         1          2          3          4          5 |     Total
-----------+-------------------------------------------------------+----------
  Domestic |         2          8         27          9          2 |        48
   Foreign |         0          0          3          9          9 |        21
-----------+-------------------------------------------------------+----------
     Total |         2          8         30         18         11 |        69

          Pearson chi2(4) =  27.2640   Pr = 0.000
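clear all

* One replication: build a 2x2 table whose four cell counts are independent
* Poisson(10) draws, so rows and columns are unrelated by construction.
program define sim
    drop _all
    set obs 4
    gen row = ceil(_n/2)
    gen col = mod(_n,2) + 1
    gen freq = rpoisson(10)
    tab row col [fw=freq], chi2 exact nofreq
end

* Collect the Pearson chi-squared and Fisher's exact p-values over 40,000 replications.
simulate p = r(p) exact=r(p_exact), reps(40000): sim

* simpplot is a community-contributed command and is assumed to be installed
* (for example, from SSC). It compares the simulated p-values with their nominal levels.
simpplot p exact, overall reps(5000)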