2017-05-18 21:03:28
实际观测到的属性交叉频数
| 实际 | C1 | C2 | … | Cn | 合计 |
|---|---|---|---|---|---|
| R1 | n11 | n12 | … | n1n | \(S_{R_1} = \\ \sum_{j=1}^{n}n_{1j}\) |
| R2 | n21 | n22 | … | n2n | \(S_{R_2} = \\ \sum_{j=1}^{n}n_{2j}\) |
| … | … | … | … | … | … |
| Rm | nm1 | nm2 | … | nmn | \(S_{R_m} = \\ \sum_{j=1}^{n}n_{mj}\) |
| 合计 | \(S_{C_1} = \\ \sum_{i=1}^{m}n_{i1}\) | \(S_{C_2} = \\ \sum_{i=1}^{m}n_{i2}\) | … | \(S_{C_n} = \\ \sum_{i=1}^{m}n_{in}\) | \(S = \\ \sum_{i=1}^{m}\sum_{j=1}^{n}n_{ij}\) |
行列之和相乘,除以总数,为该交叉点的期望频数
| 理论 | C1 | C2 | … | Cn | 合计 |
|---|---|---|---|---|---|
| R1 | \(\frac{S_{R_1}S_{C_1}}{S}\) | \(\frac{S_{R_1}S_{C_2}}{S}\) | … | \(\frac{S_{R_1}S_{C_n}}{S}\) | \(\frac{S_{R_1}\sum_{j=1}^{n}S_{Cj}}{S} \\= S_{R_1}\) |
| R2 | \(\frac{S_{R_2}S_{C_1}}{S}\) | \(\frac{S_{R_2}S_{C_2}}{S}\) | … | \(\frac{S_{R_2}S_{C_n}}{S}\) | \(\frac{S_{R_2}\sum_{j=1}^{n}S_{Cj}}{S} \\= S_{R_2}\) |
| … | … | … | … | … | … |
| Rm | \(\frac{S_{R_m}S_{C_1}}{S}\) | \(\frac{S_{R_m}S_{C_2}}{S}\) | … | \(\frac{S_{R_m}S_{C_n}}{S}\) | \(\frac{S_{R_m}\sum_{j=1}^{n}S_{Cj}}{S} \\= S_{R_m}\) |
| 合计 | \(\frac{S_{C_1}\sum_{i=1}^{m}S_{Ri}}{S} \\= S_{C_1}\) | \(\frac{S_{C_2}\sum_{i=1}^{m}S_{Ri}}{S} \\= S_{C_2}\) | … | \(\frac{S_{C_n}\sum_{i=1}^{m}S_{Rn}}{S} \\= S_{C_n}\) | S |
MASS::eagles数据集,不同体型的秃鹰偷食的成功率有无不同?
| 实际频数 | yes | no | total |
|---|---|---|---|
| L | 83 | 17 | 100 |
| S | 17 | 43 | 60 |
| total | 100 | 60 | 160 |
| 理论频数 | yes | no | total |
|---|---|---|---|
| L | 62.5 | 37.5 | 100 |
| S | 37.5 | 22.5 | 60 |
| total | 100 | 60 | 160 |
chisq.test
> library(MASS); library(data.table)
> d <- dcast(data.table(eagles), P~., sum, value.var=c("y","n"))
> d$n <- d$n - d$y; d
P y n
1 L 83 17
2 S 17 43
> chisq.test(d[,.(y, n)])
Pearson's Chi-squared test with Yates' continuity correction
data: d[, .(y, n)]
X-squared = 45.511, df = 1, p-value = 1.518e-11
prop.trend.test / coin::lbl_test
> library(MASS); library(data.table) > Insurance <- dcast(Insurance, Group~Age, + sum, value.var="Holders") > Insurance Group <25 25-29 30-35 >35 1 <1l 337 509 526 3575 2 1-1.5l 517 1085 1477 8384 3 1.5-2l 241 578 765 3786 4 >2l 43 164 239 1133
> library(coin) > lbl_test(xtabs(value~Group+variable, + melt(Insurance, id="Group"))) Asymptotic Linear-by-Linear Association Test data: variable (ordered) by Group (<1l < 1-1.5l < 1.5-2l < >2l) Z = 2.0104, p-value = 0.04438 alternative hypothesis: two.sided
table、ftable、xtabs命令生成> library(vcd)
> str(with(Arthritis,
+ table(Improved, Treatment)))
'table' int [1:3, 1:2] 29 7 7 13 7 21
- attr(*, "dimnames")=List of 2
..$ Improved : chr [1:3]
"None" "Some" "Marked"
..$ Treatment: chr [1:2]
"Placebo" "Treated"
> library(reshape2)
> str(dcast(Arthritis, Improved~Treatment))
'data.frame': 3 obs. of 3 variables:
$ Improved: Ord.factor w/ 3 levels
"None"<"Some"<..: 1 2 3
$ Placebo : int 29 7 7
$ Treated : int 13 7 21> library(vcd)
> dat <- apply(PreSex, 2:1, sum)
> dat
MaritalStatus
ExtramaritalSex Divorced Married
Yes 98 (a) 23 (b)
No 396 (c) 519 (d)
离婚组和未离婚组的婚外性行为暴露率分别为:
> dat[1,]/rowSums(dat) Divorced Married 0.19838057 0.04243542
OR(Odds ratio)值= \(\frac{ad}{bc}\) = (98*519)/(396*23) = 5.584
> library(epiR)
> epi.2by2(dat, method="case.control")
Outcome+ Outcome- Total Prevalence* Odds
Exposed+ 98 23 121 81.0 4.261
Exposed- 396 519 915 43.3 0.763
Total 494 542 1036 47.7 0.911
Point estimates and 95 % CIs:
------------------------------------------------------
Odds ratio (W) 5.58 (3.48, 8.96)
Attrib prevalence * 37.71 (30.02, 45.41)
Attrib prevalence in population * 4.40 (-0.02, 8.83)
Attrib fraction (est) in exposed (%)
82.06 (70.91, 89.34)
Attrib fraction (est) in population (%)
16.29 (12.23, 20.15)
------------------------------------------------------
X2 test statistic: 60.929 p-value: < 0.001
Wald confidence limits
* Outcomes per 100 population units > dat <- as.table(matrix(c(670,202,74,640),
+ nrow = 2, byrow = TRUE))
> colnames(dat) <- c("Dis+","Dis-")
> rownames(dat) <- c("Test+","Test-")
> addmargins(dat)
Dis+ Dis- Sum
Test+ 670 202 872
Test- 74 640 714
Sum 744 842 1586
> epi.tests(dat, conf.level = 0.95)
Outcome + Outcome - Total Test + 670 202 872 Test - 74 640 714 Total 744 842 1586 Point estimates and 95 % CIs: --------------------------------------------- Apparent prevalence 0.55 (0.52, 0.57) True prevalence 0.47 (0.44, 0.49) Sensitivity 0.90 (0.88, 0.92) Specificity 0.76 (0.73, 0.79) Positive predictive value 0.77 (0.74, 0.80) Negative predictive value 0.90 (0.87, 0.92) Positive likelihood ratio 3.75 (3.32, 4.24) Negative likelihood ratio 0.13 (0.11, 0.16) ---------------------------------------------
> Job <- matrix(c(1,2,1,0, 3,3,6,1, 10,10,14,9,
+ 6,7,12,11), 4, 4, dimnames = list(
+ income = c("< 15k", "15-25k", "25-40k", "> 40k"),
+ satisfaction = c("VeryD", "LittleD", "ModerateS", "VeryS")))
> Job
satisfaction
income VeryD LittleD ModerateS VeryS
< 15k 1 3 10 6
15-25k 2 3 10 7
25-40k 1 6 14 12
> 40k 0 1 9 11
> chisq.test(Job)$expected
satisfaction
income VeryD LittleD ModerateS VeryS
< 15k 0.8333333 2.708333 8.958333 7.500
15-25k 0.9166667 2.979167 9.854167 8.250
25-40k 1.3750000 4.468750 14.781250 12.375
> 40k 0.8750000 2.843750 9.406250 7.875
> chisq.test(Job)
Pearson's Chi-squared test
data: Job
X-squared = 5.9655, df = 9, p-value = 0.7434
> fisher.test(Job)
Fisher's Exact Test for Count Data
data: Job
p-value = 0.7827
alternative hypothesis: two.sidedAlan Agresti (1990) 总统表现调查: 时隔一个月后,公民对总统的表现是否认可?
> Performance <- matrix(
+ c(794, 86, 150, 570),
+ nrow = 2, dimnames = list(
+ "1st Survey" = c("Approve", "Disapprove"),
+ "2nd Survey" = c("Approve", "Disapprove")))
> addmargins(Performance)
2nd Survey
1st Survey Approve Disapprove Sum
Approve 794 150 944
Disapprove 86 570 656
Sum 880 720 1600
同一批人两次采样,显然不是独立样本。匹配样本定性数据应用McNemar检验。
> mcnemar.test(Performance)
McNemar's Chi-squared test with continuity correction
data: Performance
McNemar's chi-squared = 16.818, df = 1, p-value = 4.115e-05
> dat0 <- apply(UCBAdmissions, 1:2, sum)
> addmargins(dat0)
Gender
Admit Male Female Sum
Admitted 1198 557 1755
Rejected 1493 1278 2771
Sum 2691 1835 4526
> UCBAdmissions , , Dept = A Gender Admit Male Female Admitted 512 89 Rejected 313 19 , , Dept = B Gender Admit Male Female Admitted 353 17 Rejected 207 8
, , Dept = C Gender Admit Male Female Admitted 120 202 Rejected 205 391 , , Dept = D Gender Admit Male Female Admitted 138 131 Rejected 279 244 , , Dept = E Gender Admit Male Female Admitted 53 94 Rejected 138 299 , , Dept = F Gender Admit Male Female Admitted 22 24 Rejected 351 317
> chisq.test(dat0)
Pearson's Chi-squared test with Yates' continuity correction
data: dat0
X-squared = 91.61, df = 1, p-value < 2.2e-16
> o <- sapply(1:6, function(i) {
+ unlist(chisq.test(UCBAdmissions[,,i])[c("statistic", "p.value")])})
> colnames(o) <- dimnames(UCBAdmissions)$Dept
> format(o, scientific = FALSE, digits=2)
A B C D E F
statistic.X-squared 16.371774 0.085098 0.633224 0.221594 0.808048 0.218243
p.value 0.000052 0.770504 0.426175 0.637828 0.368698 0.640382
stat::mantelhaen.test> mantelhaen.test(UCBAdmissions)
Mantel-Haenszel chi-squared test with continuity correction
data: UCBAdmissions
Mantel-Haenszel X-squared = 1.4269, df = 1, p-value = 0.2323
alternative hypothesis: true common odds ratio is not equal to 1
95 percent confidence interval:
0.7719074 1.0603298
sample estimates:
common odds ratio
0.9046968
epiR::epi.mh综合计算OR值> dat <- reshape2::melt(addmargins(UCBAdmissions))
> dat <- reshape2::dcast(dat[dat$Dept != "Sum" & dat$Gender != "Sum",], Dept~Admit+Gender); dat
Dept Admitted_Male Admitted_Female Rejected_Male Rejected_Female Sum_Male Sum_Female
1 A 512 89 313 19 825 108
2 B 353 17 207 8 560 25
3 C 120 202 205 391 325 593
4 D 138 131 279 244 417 375
5 E 53 94 138 299 191 393
6 F 22 24 351 317 373 341
> with(dat, epiR::epi.mh(ev.trt=Admitted_Male, n.trt=Sum_Male, ev.ctrl=Admitted_Female,
+ n.ctrl=Sum_Female, names=Dept))
$OR
est lower upper
1 0.3492120 0.2086756 0.5843954
2 0.8025007 0.3403815 1.8920166
3 1.1330596 0.8545328 1.5023696
4 0.9212838 0.6863345 1.2366620
5 1.2216312 0.8250748 1.8087848
6 0.8278727 0.4552059 1.5056335
$OR.summary
est lower upper
1 0.9046968 0.6447247 1.086692
Thank you!