Posts tagged 'stat_function(fun=df)'

R continuous probability distributions, the F-distribution : df(), pf(), qf(), rf()

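The F-distribution is used in the F-test, which compares the variances of two groups, and in analysis of variance (ANOVA), which compares the means of two or more groups through a ratio of variances. Because an F statistic is a ratio of variances, which are built from squared values (as with the chi-squared distribution), it takes only positive values. Its density is asymmetric: most of the mass sits on the left, with a long tail to the right (similar in shape to the chi-squared distribution).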

The R functions and parameters for the F-distribution are as follows.

Function type	Prefix	Function / parameters (distribution suffix: f)
Density function	d	df(x, df1, df2)
Cumulative distribution function	p	pf(q, df1, df2, lower.tail=TRUE/FALSE)
Quantile function	q	qf(p, df1, df2, lower.tail=TRUE/FALSE)
Random number generation	r	rf(n, df1, df2)
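
A quick way to check that F values are always positive is to build them by hand. The sketch below assumes the textbook construction, in which an F(df1, df2) variable is the ratio of two independent chi-squared variables, each divided by its degrees of freedom. The seed, the sample size, and the variable names are illustrative choices and are not from the original post.

```r
# Build F(5, 10) draws from two independent chi-squared samples (assumed construction).
set.seed(123)                      # arbitrary seed, only for reproducibility
n   <- 100000                      # illustrative sample size
df1 <- 5
df2 <- 10

num   <- rchisq(n, df = df1) / df1   # scaled chi-squared in the numerator
den   <- rchisq(n, df = df2) / df2   # scaled chi-squared in the denominator
f_sim <- num / den                   # simulated F(5, 10) values

all(f_sim > 0)                       # every simulated value is positive
mean(f_sim <= 2)                     # empirical P(F <= 2)
pf(2, df1, df2)                      # theoretical P(F <= 2), for comparison
```

The empirical and theoretical probabilities should agree to about two decimal places at this sample size.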

(1) F-distribution plot : stat_function(fun=df, args=list(df1=xx, df2=xx))

> library(ggplot2)

> ggplot(data.frame(x=c(0,5)), aes(x=x)) +
+   stat_function(fun=df, args=list(df1=5, df2=10), colour="blue", size=1) +
+   stat_function(fun=df, args=list(df1=10, df2=30), colour="red", size=2) +
+   stat_function(fun=df, args=list(df1=50, df2=100), colour="yellow", size=3) +
+   annotate("segment", x=3, xend=3.5, y=1.4, yend=1.4, colour="blue", size=1) +
+   annotate("segment", x=3, xend=3.5, y=1.2, yend=1.2, colour="red", size=2) + 
+   annotate("segment", x=3, xend=3.5, y=1.0, yend=1.0, colour="yellow", size=3) + 
+   annotate("text", x=4.3, y=1.4, label="F(df1=5, df2=10)") +
+   annotate("text", x=4.3, y=1.2, label="F(df1=10, df2=30)") + 
+   annotate("text", x=4.3, y=1.0, label="F(df1=50, df2=100)") +
+   ggtitle("F Distribution")
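
As an alternative to drawing the legend by hand with annotate(), the colour can be mapped inside aes() so that ggplot2 builds the legend itself. This is only a sketch of that option: the legend title, the label strings, and the use of orange instead of yellow are my own choices, not the author's.

```r
library(ggplot2)

# Labels used both as the colour mapping and as legend entries (illustrative names).
labs_f <- c("F(5, 10)", "F(10, 30)", "F(50, 100)")

p <- ggplot(data.frame(x = c(0, 5)), aes(x = x)) +
  stat_function(fun = df, args = list(df1 = 5,  df2 = 10),  aes(colour = "F(5, 10)")) +
  stat_function(fun = df, args = list(df1 = 10, df2 = 30),  aes(colour = "F(10, 30)")) +
  stat_function(fun = df, args = list(df1 = 50, df2 = 100), aes(colour = "F(50, 100)")) +
  scale_colour_manual(
    name   = "Distribution",
    breaks = labs_f,                                   # keep the legend in this order
    values = setNames(c("blue", "red", "orange"), labs_f)
  ) +
  ggtitle("F Distribution")

print(p)
```

With this approach, adding a fourth curve needs one more stat_function() line and one more entry in labs_f, with no manual positioning of segments and text.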

(2) Cumulative F-distribution plot
: stat_function(fun=pf, args=list(df1=xx, df2=xx))

> # (2) Cumulative F-distribution plot : fun=pf
> ggplot(data.frame(x=c(0,5)), aes(x=x)) +
+   stat_function(fun=pf, args=list(df1=5, df2=10), colour="blue", size=1) +
+   ggtitle("Cumulative F-distribution : F(df1=5, df2=10)")

(3) Computing a cumulative F-distribution probability
: pf(q, df1, df2, lower.tail=TRUE)

> # (3) Cumulative F-distribution probability : pf(q, df1, df2, lower.tail=TRUE/FALSE)
> pf(q=2, df1=5, df2=10, lower.tail = TRUE)
[1] 0.835805
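
The lower.tail argument switches between P(F <= q) and P(F > q). A small sketch of the upper-tail call, which is the one that gives a p-value for an observed F statistic (the variable names are illustrative):

```r
q   <- 2
df1 <- 5
df2 <- 10

lower <- pf(q, df1, df2, lower.tail = TRUE)    # P(F <= 2)
upper <- pf(q, df1, df2, lower.tail = FALSE)   # P(F > 2), the upper-tail probability

lower + upper                                  # the two tails sum to 1
stopifnot(isTRUE(all.equal(lower + upper, 1)))
```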

(4) Computing an F-distribution quantile : qf(p, df1, df2, lower.tail=TRUE)

> # (4) F-distribution quantile : qf(p, df1, df2, lower.tail=TRUE/FALSE)
> qf(p=0.835805, df1=5, df2=10, lower.tail = TRUE)
[1] 2

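The quantile function is the inverse of the cumulative distribution function in (3): pf() maps the quantile 2 to the probability 0.835805, and qf() maps that probability back to 2.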
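
The inverse relationship can be checked as a round trip over several values at once. This is a minimal sketch; the test points in q and the 5% level are arbitrary choices for illustration.

```r
df1 <- 5
df2 <- 10
q   <- c(0.5, 1, 2, 3.5)        # arbitrary quantiles to test

p      <- pf(q, df1, df2)       # quantiles -> cumulative probabilities
q_back <- qf(p, df1, df2)       # probabilities -> back to quantiles

stopifnot(isTRUE(all.equal(q, q_back)))

# The same idea gives a critical value: the point that leaves 5% in the upper tail.
crit <- qf(0.95, df1, df2)
pf(crit, df1, df2, lower.tail = FALSE)   # recovers the 5% upper-tail probability
```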

(5) F-distribution random number generation : rf(n, df1, df2)

> rf_sample <- rf(100, df1=5, df2=10)  # a separate name, so the rf() function is not masked

> rf_sample
  [1] 0.62010969 3.88443539 0.49659061 0.80159363 0.17508238 0.27307005 0.26126223 0.73465551
  [9] 0.42973215 0.95448854 0.33259309 0.80825296 1.18304459 0.28760500 0.81315967 1.44000014
 [17] 0.83041456 0.76020586 2.40585264 1.66243746 0.71626772 4.18541745 1.81731056 0.32127326
 [25] 0.59385259 1.48154031 1.28662657 0.51072359 2.95967809 1.48123950 0.85420041 0.29173710
 [33] 1.55212508 0.82884045 0.38375658 1.75641755 2.39041073 0.86752014 0.87916583 0.54687123
 [41] 0.95937758 1.08620415 2.56198072 3.33297651 0.35509530 1.33342398 0.81011460 0.46871978
 [49] 0.67320976 1.82654953 1.57868396 0.51026464 1.81435701 1.59062772 1.79081953 0.26546396
 [57] 0.70741755 0.56732326 2.03751027 0.81498081 1.27629405 0.65799392 1.82927296 0.45875179
 [65] 3.62942043 0.57416234 0.21089950 1.30947117 1.08285304 0.90621979 1.14831429 0.31727995
 [73] 0.11544574 1.53316697 2.16508104 1.15088174 2.26276896 1.50965228 6.53052329 1.96294253
 [81] 0.78687566 0.41327969 0.52918777 0.96117638 0.44835433 2.34021493 3.89895973 0.84022777
 [89] 1.49158411 1.75840630 0.05668181 0.82050282 1.14190710 0.46880962 0.70338470 1.09613826
 [97] 3.95759814 2.81014843 2.08048954 0.30376581

> hist(rf_sample, breaks=20)

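Random numbers change on every run, so your output will differ from the above, but the histogram should still show an asymmetric shape: piled up on the left, with a long, thin tail stretching to the right like a dinosaur's tail.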
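
If you want the same draws on every run, one option is to fix the seed with set.seed() before calling rf(). The sketch below also overlays the theoretical density on the histogram; the seed value and the names f_sample and f_again are arbitrary and not from the original post.

```r
set.seed(2015)                              # arbitrary seed; any fixed integer works
f_sample <- rf(100, df1 = 5, df2 = 10)

set.seed(2015)                              # same seed again
f_again <- rf(100, df1 = 5, df2 = 10)
identical(f_sample, f_again)                # TRUE: the draws are reproduced exactly

# Histogram on the density scale, with the theoretical F(5, 10) curve on top.
hist(f_sample, breaks = 20, freq = FALSE,
     main = "rf(100, df1 = 5, df2 = 10)", xlab = "x")
curve(df(x, df1 = 5, df2 = 10), from = 0, to = max(f_sample),
      add = TRUE, col = "blue")
```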

I hope this was helpful.


License: Attribution, NonCommercial, NoDerivatives


Posted by Rfriend

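R ggplot2 continuous probability distribution curves, stat_function()

R Analysis and Programming / R Graphs_Visualization 2015. 9. 7. 23:58

Probability is one of the basic concepts you cannot leave out of statistics. Sampling from a population relies on probability, as in random sampling and stratified random sampling. Estimation and testing use probability distributions. Regression analysis, discriminant analysis, and similar methods test whether variables follow a normal distribution. Simulations generate random numbers according to the population's probability distribution.

Anyone who has studied some statistics will know the normal distribution. For the other distributions, though, many people have heard the names but are not sure what the curves look like or when to use them.

Drawing continuous probability distribution curves with R ggplot2 helps you understand the shape of each distribution, and lets you see how the shape changes with the parameters.

This post mainly covers the density functions, whose names start with 'd', and shows how to draw them with ggplot2 for the normal (norm), t (t), chi-squared (chisq), exponential (exp), F (f), gamma (gamma), and uniform (unif) distributions.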

[ Summary table of R functions by continuous probability distribution and function type, for plotting with ggplot2 ]

Distribution	R suffix	Density function d	Cumulative distribution function p	Quantile function q	Random number generation r
Normal	norm	dnorm()	pnorm()	qnorm()	rnorm()
t	t	dt()	pt()	qt()	rt()
Chi-squared	chisq	dchisq()	pchisq()	qchisq()	rchisq()
Exponential	exp	dexp()	pexp()	qexp()	rexp()
F	f	df()	pf()	qf()	rf()
Gamma	gamma	dgamma()	pgamma()	qgamma()	rgamma()
Uniform	unif	dunif()	punif()	qunif()	runif()
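
The naming pattern in the table is a prefix (d, p, q, r) joined to a distribution suffix. A short base R sketch of the four prefixes for the normal distribution follows, along with one way to build a function name from its parts. The use of match.fun() here is only an illustration of the pattern, and dist_suffix and d_fun are hypothetical names.

```r
dnorm(0)         # density at 0, equal to 1/sqrt(2*pi), about 0.3989
pnorm(1.96)      # P(X <= 1.96), about 0.975
qnorm(0.975)     # the quantile for probability 0.975, about 1.96
set.seed(1)      # arbitrary seed for reproducible draws
rnorm(3)         # three random draws from N(0, 1)

# The same prefix + suffix pattern works for any row of the table.
dist_suffix <- "chisq"
d_fun <- match.fun(paste0("d", dist_suffix))   # looks up dchisq()
d_fun(1, df = 2)                               # same value as dchisq(1, df = 2)
```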

ggplot2 is a package that must be installed and loaded separately, so run the following first.

> install.packages("ggplot2")
Installing package into ‘C:/Users/user/Documents/R/win-library/3.2’
(as ‘lib’ is unspecified)
trying URL 'http://cran.rstudio.com/bin/windows/contrib/3.2/ggplot2_1.0.1.zip'
Content type 'application/zip' length 2676292 bytes (2.6 MB)
downloaded 2.6 MB

package ‘ggplot2’ successfully unpacked and MD5 sums checked

The downloaded binary packages are in
	C:\Users\user\AppData\Local\Temp\RtmpGAPkIo\downloaded_packages
> library(ggplot2)

(1) Normal distribution probability density curve

: stat_function(fun = dnorm)

> # Normal distribution : fun = dnorm
> ggplot(data.frame(x=c(-3,3)), aes(x=x)) +
+   stat_function(fun=dnorm, colour="blue", size=1) +
+   ggtitle("Normal Distribution")

(2) Shading a specific range of the normal distribution

> # Shade a specific range under the function
> dnorm_range <- function(x) {
+   y <- dnorm(x)
+   y[x < -1 | x > 2] <- NA  # no shading outside the range -1 to 2
+   return(y)
+ }
> 
> ggplot(data.frame(x=c(-3,3)), aes(x=x)) +
+   stat_function(fun=dnorm, colour="blue", size=1) +
+   stat_function(fun=dnorm_range, geom="area", fill="grey", alpha=0.5) + 
+   ggtitle("Normal Distribution of x~N(0,1) with colour from -1 to 2")
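
The shaded region from -1 to 2 corresponds to a probability, which can be computed from the cumulative distribution function. As a sketch, the same number is also obtained by integrating the density numerically; p_range and area are illustrative names.

```r
# Probability that a standard normal value falls between -1 and 2.
p_range <- pnorm(2) - pnorm(-1)

# Cross-check: numerically integrate the density over the same range.
area <- integrate(dnorm, lower = -1, upper = 2)$value

p_range   # about 0.82
area      # agrees with p_range up to numerical error
```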

(3) Cumulative normal distribution : stat_function(fun = pnorm)

> # Cumulative normal distribution : fun = pnorm
> ggplot(data.frame(x=c(-3,3)), aes(x=x)) +
+   stat_function(fun=pnorm, colour="black", size=1.5) +
+   ggtitle("Cumulative Normal Distribution of x~N(0,1)")

(4) Normal distribution with a specified mean and standard deviation : stat_function(fun = dnorm, args=list(mean=2, sd=1))

> # Normal distribution: specify the mean and standard deviation
> ggplot(data.frame(x = c(-5, 5)), aes(x=x)) +
+   stat_function(fun=dnorm, args=list(mean=2, sd=1), colour="black", size=1.5) +
+   geom_vline(xintercept=2, colour="grey", linetype="dashed", size=1) + # add a vertical line at the mean
+   annotate("text", x=0, y=0.3, label="x ~ N(2, 1)") +
+   ggtitle("Normal Distribution of x~N(2,1)")

(5) t-distribution : stat_function(fun = dt)

> # t-distribution : fun = dt
> ggplot(data.frame(x=c(-3,3)), aes(x=x)) +
+   stat_function(fun=dt, args=list(df=2), colour="red", size=2) +
+   ggtitle("t-Distribution of df=2")
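
One thing the df=2 curve illustrates is that the t-distribution has heavier tails than the standard normal, and that the difference shrinks as the degrees of freedom grow. A small numeric sketch of that comparison follows; the cutoff of 2 and the chosen df values are arbitrary.

```r
dfs <- c(2, 10, 30, 100)            # illustrative degrees of freedom

tail_t <- 2 * pt(-2, df = dfs)      # P(|T| > 2) for each df
tail_z <- 2 * pnorm(-2)             # P(|Z| > 2) for the standard normal

data.frame(df = dfs, tail_t = tail_t, tail_z = tail_z)
```

The tail_t column should decrease toward tail_z as df increases, while staying above it.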

(6) Chi-squared distribution probability density curve
: stat_function(fun = dchisq)

> # Chi-squared distribution : fun = dchisq
> ggplot(data.frame(x=c(0,10)), aes(x=x)) +
+   stat_function(fun=dchisq, args=list(df=1), colour="black", size=1.2) +
+   annotate("text", x=0.6, y=1, label="df=1") +
+   stat_function(fun=dchisq, args=list(df=2), colour="blue", size=1.2) +
+   annotate("text", x=0, y=0.55, label="df=2") +
+   stat_function(fun=dchisq, args=list(df=3), colour="red", size=1.2) +
+   annotate("text", x=0.5, y=0.05, label="df=3") +
+   ggtitle("Chisq-Distribution")

(7) Exponential distribution : stat_function(fun = dexp)

> # Exponential distribution : fun = dexp
> ggplot(data.frame(x=c(0,10)), aes(x=x)) +
+   stat_function(fun=dexp, colour="brown", size=1.5) +
+   ggtitle("Exponential Distribution")

(8) F-distribution : stat_function(fun = df)

> # F-distribution : fun = df
> ggplot(data.frame(x=c(0,5)), aes(x=x)) +
+   stat_function(fun=df, args=list(df1=5, df2=10), colour="purple", size=1) +
+   ggtitle("F Distribution of (df1=5, df2=10)")

(9) Gamma distribution : stat_function(fun = dgamma)

> # Gamma distribution : fun = dgamma
> ggplot(data.frame(x=c(0, 400)), aes(x=x)) +
+   stat_function(fun=dgamma, args=list(shape=5, rate=0.05), colour="green") +
+   ggtitle("Gamma Distribution of (shape=5, rate=0.05)")
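
The x range of 0 to 400 makes sense once the mean and spread implied by these parameters are worked out. For a gamma distribution parameterised by shape and rate, the mean is shape/rate and the standard deviation is sqrt(shape)/rate. The sketch below checks the mean numerically; the variable names are illustrative.

```r
shape <- 5
rate  <- 0.05

mean_theory <- shape / rate          # 5 / 0.05 = 100
sd_theory   <- sqrt(shape) / rate    # about 44.7

# Numerical check of the mean: integrate x * density over the positive axis.
mean_num <- integrate(function(x) x * dgamma(x, shape = shape, rate = rate),
                      lower = 0, upper = Inf)$value

c(mean_theory = mean_theory, mean_num = mean_num, sd_theory = sd_theory)

# Share of the probability that lies inside the plotted range 0 to 400.
pgamma(400, shape = shape, rate = rate)
```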

(10) Uniform distribution : stat_function(fun = dunif)

> # Uniform distribution : fun = dunif
> ggplot(data.frame(x=c(-2,20)), aes(x=x)) +
+   stat_function(fun=dunif, args=list(min = 0, max = 10), colour="black", size=1) +
+   ggtitle("Uniform Distribution of (min=0, max=10)")

As a bonus, let's also draw the curves of the common logarithm, sine, and cosine functions. These are ordinary functions rather than probability distributions, but stat_function() draws them in the same way.

(11) Common logarithm function : stat_function(fun = log10)

> # Common logarithm function : fun = log10
> ggplot(data.frame(x=c(1,100)), aes(x=x)) +  # start at 1, because log10(0) is -Inf
+   stat_function(fun=log10, colour="black", size=1.5) +
+   geom_vline(xintercept=10, colour="grey", linetype="dashed", size=1) +
+   geom_vline(xintercept=100, colour="grey", linetype="dashed", size=1) +
+   ggtitle("Common Logarithm Function")

(12) Sine function curve, cosine function curve :

stat_function(fun = sin), stat_function(fun = cos)

> # Sine function : fun = sin, cosine function : fun = cos
> ggplot(data.frame(x=c(0, 2*pi)), aes(x=x)) +
+   stat_function(fun=sin, colour="blue", size=1) +
+   annotate("text", x=0.2, y=0, label="sine curve") +
+   stat_function(fun=cos, colour="yellow", size=1) +
+   annotate("text", x=0.2, y=1, label="cosine curve") +
+   geom_vline(xintercept=pi, colour="grey", linetype="dashed", size=1) + # add a vertical line at pi
+   geom_vline(xintercept=2*pi, colour="grey", linetype="dashed", size=1) + # add a vertical line at 2*pi
+   ggtitle("Sine(blue curve), Cosine(yellow curve) Function")

I hope this was helpful.


Posted by Rfriend


A friend for analysis and programming in R and Python (by R Friend)
