R continuous probability distributions: F-distribution : df(), pf(), qf(), rf()

R Analysis and Programming / R Statistical Analysis  2015. 9. 25. 01:14

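Continuous probability distributions include

- normal distribution

: norm()

- uniform distribution

: unif()

- exponential distribution

: exp()

- t-distribution

: t()

- F-distribution

: f()

- chi-squared distribution

: chisq()

and others. In R, each name above is the suffix that follows a d, p, q, or r prefix (for example dnorm(), pnorm(), qnorm(), rnorm()); exp() and t() called on their own are unrelated base functions.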

The t-distribution, F-distribution, and chi-squared distribution are used to describe sampling distributions. This post introduces the F-distribution.

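The F-distribution is used in the F-test, which checks whether two populations have equal variances, and in analysis of variance (ANOVA), which compares the means of two or more groups through a ratio of variances. Because an F statistic is a ratio of variances (each built from squared deviations, as with the chi-squared distribution), it takes only positive values. The density is asymmetric: most of the mass sits on the left and a long tail stretches to the right, much like the chi-squared distribution.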
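As a rough illustration of why F values are positive and right-skewed, the sketch below builds them as a ratio of two chi-squared draws, each divided by its degrees of freedom, and compares the result with pf(). The sample size, the seed, and the variable names (u1, u2, f_sim) are assumptions chosen for this example and do not come from the original post.

```r
set.seed(123)                      # arbitrary seed, only for reproducibility
n   <- 100000                      # number of simulated values (assumption)
df1 <- 5
df2 <- 10

u1 <- rchisq(n, df = df1)          # numerator chi-squared draws
u2 <- rchisq(n, df = df2)          # denominator chi-squared draws
f_sim <- (u1 / df1) / (u2 / df2)   # ratio of scaled chi-squared variables

min(f_sim) > 0                     # every simulated value is positive
mean(f_sim <= 2)                   # empirical P(F <= 2)
pf(2, df1 = df1, df2 = df2)        # theoretical value from the F-distribution
```

The empirical proportion should land close to the pf() value, which suggests the simulated ratio behaves like F(5, 10).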

The F-distribution functions and their parameters in R are as follows.

Function type	Prefix	Function / parameters for f()
Density function	d	df(x, df1, df2)
Cumulative distribution function	p	pf(q, df1, df2, lower.tail=TRUE/FALSE)
Quantile function	q	qf(p, df1, df2, lower.tail=TRUE/FALSE)
Random number generation	r	rf(n, df1, df2)

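(1) F-distribution plot : stat_function(fun=df, args=list(df1=xx, df2=xx))

> library(ggplot2)

> # (1) F-distribution plot : fun=df
> # note: with ggplot2 3.4.0 or later, linewidth= replaces size= for lines
> ggplot(data.frame(x=c(0,5)), aes(x=x)) +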
+   stat_function(fun=df, args=list(df1=5, df2=10), colour="blue", size=1) +
+   stat_function(fun=df, args=list(df1=10, df2=30), colour="red", size=2) +
+   stat_function(fun=df, args=list(df1=50, df2=100), colour="yellow", size=3) +
+   annotate("segment", x=3, xend=3.5, y=1.4, yend=1.4, colour="blue", size=1) +
+   annotate("segment", x=3, xend=3.5, y=1.2, yend=1.2, colour="red", size=2) + 
+   annotate("segment", x=3, xend=3.5, y=1.0, yend=1.0, colour="yellow", size=3) + 
+   annotate("text", x=4.3, y=1.4, label="F(df1=5, df2=10)") +
+   annotate("text", x=4.3, y=1.2, label="F(df1=10, df2=30)") + 
+   annotate("text", x=4.3, y=1.0, label="F(df1=50, df2=100)") +
+   ggtitle("F Distribution")

(2) Cumulative F-distribution plot
: stat_function(fun=pf, args=list(df1=xx, df2=xx))

> # (2) Cumulative F-distribution plot : fun=pf
> ggplot(data.frame(x=c(0,5)), aes(x=x)) +
+   stat_function(fun=pf, args=list(df1=5, df2=10), colour="blue", size=1) +
+   ggtitle("Cumulative F-distribution : F(df1=5, df2=10)")

(3) F-distribution cumulative probability
: pf(q, df1, df2, lower.tail=TRUE)

> # (3) F-distribution cumulative probability : pf(q, df1, df2, lower.tail=TRUE/FALSE)
> pf(q=2, df1=5, df2=10, lower.tail = TRUE)
[1] 0.835805
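
pf() returns the lower-tail probability P(F <= q) by default. The short sketch below reuses the same q and degrees of freedom to show the upper tail via lower.tail = FALSE, which is how a p-value for an observed F statistic is usually read off. The variable names lower and upper are only illustrative.

```r
lower <- pf(q = 2, df1 = 5, df2 = 10, lower.tail = TRUE)   # P(F <= 2)
upper <- pf(q = 2, df1 = 5, df2 = 10, lower.tail = FALSE)  # P(F > 2)

lower + upper                    # the two tails add up to 1
all.equal(upper, 1 - lower)      # TRUE
```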

(4) F-distribution quantile value : qf(p, df1, df2, lower.tail=TRUE)

> # (4) F-distribution quantile value : qf(p, df1, df2, lower.tail=TRUE/FALSE)
> qf(p=0.835805, df1=5, df2=10, lower.tail = TRUE)
[1] 2

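The quantile function is the inverse of the cumulative distribution function from (3): pf() maps a quantile to a probability, and qf() maps that probability back to the quantile.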
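
The inverse relationship can be checked directly: feeding the output of pf() into qf() should recover the original quantiles, up to floating-point error. The vector of test points q below is an arbitrary choice for illustration, as are the names p and q_back.

```r
q <- c(0.5, 1, 2, 3.5)                  # arbitrary quantiles (assumption)
p <- pf(q, df1 = 5, df2 = 10)           # quantiles -> cumulative probabilities
q_back <- qf(p, df1 = 5, df2 = 10)      # probabilities -> back to quantiles

max(abs(q - q_back))                    # round-trip error, expected to be tiny

# Upper 5% critical value of F(5, 10), as used in a one-sided F-test
qf(0.95, df1 = 5, df2 = 10)
```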

(5) F-distribution random number generation : rf(n, df1, df2)

> # store the draws under a name other than rf, so the rf() function is not masked
> rf_sample <- rf(100, df1=5, df2=10)

> rf_sample
  [1] 0.62010969 3.88443539 0.49659061 0.80159363 0.17508238 0.27307005 0.26126223 0.73465551
  [9] 0.42973215 0.95448854 0.33259309 0.80825296 1.18304459 0.28760500 0.81315967 1.44000014
 [17] 0.83041456 0.76020586 2.40585264 1.66243746 0.71626772 4.18541745 1.81731056 0.32127326
 [25] 0.59385259 1.48154031 1.28662657 0.51072359 2.95967809 1.48123950 0.85420041 0.29173710
 [33] 1.55212508 0.82884045 0.38375658 1.75641755 2.39041073 0.86752014 0.87916583 0.54687123
 [41] 0.95937758 1.08620415 2.56198072 3.33297651 0.35509530 1.33342398 0.81011460 0.46871978
 [49] 0.67320976 1.82654953 1.57868396 0.51026464 1.81435701 1.59062772 1.79081953 0.26546396
 [57] 0.70741755 0.56732326 2.03751027 0.81498081 1.27629405 0.65799392 1.82927296 0.45875179
 [65] 3.62942043 0.57416234 0.21089950 1.30947117 1.08285304 0.90621979 1.14831429 0.31727995
 [73] 0.11544574 1.53316697 2.16508104 1.15088174 2.26276896 1.50965228 6.53052329 1.96294253
 [81] 0.78687566 0.41327969 0.52918777 0.96117638 0.44835433 2.34021493 3.89895973 0.84022777
 [89] 1.49158411 1.75840630 0.05668181 0.82050282 1.14190710 0.46880962 0.70338470 1.09613826
 [97] 3.95759814 2.81014843 2.08048954 0.30376581

> hist(rf_sample, breaks=20)

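Random numbers change on every run, so your results will differ from the ones above. The histogram should still show an asymmetric shape: values piled up on the left, with a thin tail stretching out to the right like a dinosaur's tail.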
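
If you want the same random numbers on every run, one option is to fix the seed with set.seed() before calling rf(). The seed value 2015 and the names x and y below are arbitrary choices for illustration. The sketch also prints the sample mean next to the theoretical mean df2 / (df2 - 2), which is defined for df2 > 2.

```r
set.seed(2015)                         # arbitrary seed (assumption)
x <- rf(100, df1 = 5, df2 = 10)        # 100 draws from F(5, 10)

set.seed(2015)                         # resetting the seed reproduces the draws
y <- rf(100, df1 = 5, df2 = 10)
identical(x, y)                        # TRUE

mean(x)                                # sample mean, depends on the seed
10 / (10 - 2)                          # theoretical mean df2 / (df2 - 2) = 1.25
```

With only 100 draws the sample mean can sit noticeably away from 1.25, because the long right tail makes the mean sensitive to a few large values.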

I hope this was helpful.

License: Attribution, NonCommercial, NoDerivatives

Posted by Rfriend
