R, Python 분석과 프로그래밍의 친구 (by R Friend)

R ggplot2 막대그림(geom_bar())

R 분석과 프로그래밍/R 그래프_시각화 2015. 8. 22. 00:58

변수의 개수 및 데이터의 형태에 따라서 그래프, 시각화 방법이 달라지는데요,

지난번 포스팅에서는 일변량 연속형 데이터의 시각화 방법으로

- 히스토그램(Histogram)
: geom_histogram()

- 커널 밀도 곡선(Kernel Density Curve)
: geom_density()

- 박스 그래프(Box Plot)
: geom_boxplot()

- 바이올린 그래프(Violin Plot)
: geom_violin()

에 대해서 알아보았습니다.

이번 포스팅에서는 일변량 범주형 데이터의 시각화 방법으로서

- 막대그림(Bar Chart): geom_bar()

- 원그림(Pie Chart): geom_bar() + coord_polar()

에 대해서 소개해드리겠습니다.

[ 변수 개수 및 데이터 형태에 따른 그래프 ]

먼저, 범주별 도수를 구하고 이를 막대 형태로 나타낸 막대 그래프 (Bar Chart)를 ggplot2의 geom_bar() 로 그려보겠습니다.

사용할 데이터는 MASS 패키지에 있는 Cars93 데이터 프레임에서 자동차 유형(Type), 제조국(Origin) 등의 범주형/요인(factor)형 변수를 사용하겠습니다.

> library(MASS)
> str(Cars93)
'data.frame':	93 obs. of  27 variables:
 $ Manufacturer      : Factor w/ 32 levels "Acura","Audi",..: 1 1 2 2 3 4 4 4 4 5 ...
 $ Model             : Factor w/ 93 levels "100","190E","240",..: 49 56 9 1 6 24 54 74 73 35 ...
 $ Type              : Factor w/ 6 levels "Compact","Large",..: 4 3 1 3 3 3 2 2 3 2 ...
 $ Min.Price         : num  12.9 29.2 25.9 30.8 23.7 14.2 19.9 22.6 26.3 33 ...
 $ Price             : num  15.9 33.9 29.1 37.7 30 15.7 20.8 23.7 26.3 34.7 ...
 $ Max.Price         : num  18.8 38.7 32.3 44.6 36.2 17.3 21.7 24.9 26.3 36.3 ...
 $ MPG.city          : int  25 18 20 19 22 22 19 16 19 16 ...
 $ MPG.highway       : int  31 25 26 26 30 31 28 25 27 25 ...
 $ AirBags           : Factor w/ 3 levels "Driver & Passenger",..: 3 1 2 1 2 2 2 2 2 2 ...
 $ DriveTrain        : Factor w/ 3 levels "4WD","Front",..: 2 2 2 2 3 2 2 3 2 2 ...
 $ Cylinders         : Factor w/ 6 levels "3","4","5","6",..: 2 4 4 4 2 2 4 4 4 5 ...
 $ EngineSize        : num  1.8 3.2 2.8 2.8 3.5 2.2 3.8 5.7 3.8 4.9 ...
 $ Horsepower        : int  140 200 172 172 208 110 170 180 170 200 ...
 $ RPM               : int  6300 5500 5500 5500 5700 5200 4800 4000 4800 4100 ...
 $ Rev.per.mile      : int  2890 2335 2280 2535 2545 2565 1570 1320 1690 1510 ...
 $ Man.trans.avail   : Factor w/ 2 levels "No","Yes": 2 2 2 2 2 1 1 1 1 1 ...
 $ Fuel.tank.capacity: num  13.2 18 16.9 21.1 21.1 16.4 18 23 18.8 18 ...
 $ Passengers        : int  5 5 5 6 4 6 6 6 5 6 ...
 $ Length            : int  177 195 180 193 186 189 200 216 198 206 ...
 $ Wheelbase         : int  102 115 102 106 109 105 111 116 108 114 ...
 $ Width             : int  68 71 67 70 69 69 74 78 73 73 ...
 $ Turn.circle       : int  37 38 37 37 39 41 42 45 41 43 ...
 $ Rear.seat.room    : num  26.5 30 28 31 27 28 30.5 30.5 26.5 35 ...
 $ Luggage.room      : int  11 15 14 17 13 16 17 21 14 18 ...
 $ Weight            : int  2705 3560 3375 3405 3640 2880 3470 4105 3495 3620 ...
 $ Origin            : Factor w/ 2 levels "USA","non-USA": 2 2 2 2 2 1 1 1 1 1 ...
 $ Make              : Factor w/ 93 levels "Acura Integra",..: 1 2 4 3 5 6 7 9 8 10 ...

자동차 유형(Type)별 도수를 가지고 막대그림을 그려보겠습니다.

> ggplot(Cars93, aes(x=Type)) + 
+   geom_bar(fill="white", colour="black") + 
+   ggtitle("Bar Chart of Frequency by Car Type")

위와 똑같은 그래프를 그려볼건데요, 이번에는 aes(x, y)의 x변수와 도수에 해당하는 y변수로 된 데이터프레임을 만들어서 이를 직접 x, y에 입력해서 그래프를 그려보겠습니다 (간편하게는 위의 방식 사용하면 되구요, 아래 처럼 데이터가 구성이 되어있다면 이번 방식을 이용하면 되겠습니다). 아래 예제에서는 자동차 유형(Type)별로 도수를 집계(aggregation)할 때 sqldf 패키지를 사용하였습니다.

> install.packages("sqldf")

> library(sqldf)
> 
> Car_Type_cnt <- sqldf( 'select Type, count(*) as Type_cnt
+                           from Cars93
+                           group by Type
+                           order by Type
+                         ')
> 
> Car_Type_cnt
     Type Type_cnt
1 Compact       16
2   Large       11
3 Midsize       22
4   Small       21
5  Sporty       14
6     Van        9
> 
> sapply(Car_Type_cnt, class)
     Type  Type_cnt 
 "factor" "integer"

다음으로 자동차 유형(Type)별로 geom_bar()를 이용하여 막대그림을 그려보도록 하겠습니다. y에 직접 입력해주고, geom_bar()에 stat="identity"를 설정해주어야 합니다.

> # 자동차 유형별 도수 막대 그림
> library(ggplot2)
> 
> ggplot(Car_Type_cnt, aes(x=Type, y=Type_cnt)) + 
+   geom_bar(stat="identity", fill="white", colour="black") + 
+   ggtitle("Bar Chart of Frequency by Car Type")

일변량에 더해서, 이번에는 2개의 변수를 사용한 막대그림도 살펴보도록 하겠습니다. 차종(Type) 별 제조국(Origin) 별 자동차 수를 가지고 막대그림을 그려보도록 하겠습니다.

> # Origin별 구분 추가하기
> ggplot(Cars93, aes(x=Type, fill=Origin)) + 
+   geom_bar(position="dodge", colour="black") + 
+   scale_fill_brewer(palette=1) +
+   ggtitle("Bar Chart of Frequency by Car Type & Origin")

이번에는 위와 동일한 그래프를 그릴건데요, sqldf()로 차종(Type)별 & Origin 별 자동차 도수를 집계를 해서 데이터프레임을 만들어서 막대그림을 그려보겠습니다.

> # 차종(Type) 별 실린더개수(Cylinders) 별 자동차 개수 > library(sqldf)

> Car_Type_Origin_cnt <- sqldf( 'select Type, Origin, count(*) as Type_Origin_cnt + from Cars93 + group by Type, Origin + order by Type, Origin + ') > Car_Type_Origin_cnt Type Origin Type_Origin_cnt 1 Compact USA 7 2 Compact non-USA 9 3 Large USA 11 4 Midsize USA 10 5 Midsize non-USA 12 6 Small USA 7 7 Small non-USA 14 8 Sporty USA 8 9 Sporty non-USA 6 10 Van USA 5 11 Van non-USA 4 >

geom_bar()로 막대그림을 그리되, 처음의 일변량 때와는 다르게 fill=Origin 로 하여서 제조국별로 구분을 해보겠습니다. position="dodge" 를 하면 수평으로 나란히 Origin별로 그려집니다.

> ggplot(Car_Type_Origin_cnt, aes(x=Type, y=Type_Origin_cnt, fill=Origin)) + 
+      geom_bar(stat="identity", position="dodge", colour="black") + 
+      scale_fill_brewer(palette=1) +
+      ggtitle("Bar Chart of Frequency by Car Type & Origin_1")

만약 position="dodge" 옵션을 지정하지 않으면 아래와 같이 세로로 올라탄 그래프 형식으로 제시됩니다.

> # without position="dodge"
> ggplot(Car_Type_Origin_cnt, aes(x=Type, y=Type_Origin_cnt, fill=Origin)) + 
+   geom_bar(stat="identity", colour="black") +  # position="dodge" 미지정
+   scale_fill_brewer(palette=1) +
+   ggtitle("Bar Chart of Frequency by Car Type & Origin, without podge option")

* 누적 막대 그래프 (stacked bar chart)

아래와 같이 생긴 데이터프레임에서 'id' 그룹별로 'bin_val' 값을 이용해서 누적 막대그래프 (stacked bar chart)를 그려보겠습니다. 이때 막대그래프의 색깔은 'color' 칼럼의 색으로 지정해서 그려보겠습니다.

parsed.txt

df = read.table('parsed.txt', sep=',', header=T)

df <- transform(df, bin_val = bin_end - bin_start)

A data.frame: 12 × 7
id	color_cd	color	bin_start	bin_end	bin_range	bin_val
<fct>	<fct>	<fct>	<int>	<int>	<fct>	<int>
AAA	a	red	0	100	[0,100)	100
AAA	b	blue	100	200	[100,200)	100
AAA	a	red	200	300	[200,300)	100
AAA	b	blue	300	400	[300,400)	100
BBB	a	red	0	250	[0,250)	250
BBB	b	blue	250	350	[250,350)	100
BBB	a	red	350	450	[350,450)	100
BBB	b	blue	450	550	[450,550)	100
BBB	a	red	550	650	[550,650)	100
BBB	b	blue	650	750	[650,750)	100
BBB	a	red	750	800	[750,800)	50
BBB	b	blue	800	910	[800,910)	110

library(ggplot2)

ggplot(df, aes(x=id, y=bin_val, fill=color, group=id)) +

geom_bar(stat="identity") +

scale_fill_manual("legend", values = c("red" = "red", "blue" = "blue"))

많은 도움이 되었기를 바랍니다.

다음번 포스팅에서는 원그림(Pie Chart)를 알아보겠습니다.

이번 포스팅이 도움이 되었다면 아래의 '공감 ~♡' 단추를 꾸욱 눌러주세요.^^

728x90

저작자표시 비영리 변경금지

'R 분석과 프로그래밍 > R 그래프_시각화' 카테고리의 다른 글

R 모자이크 그림: vcd패키지 mosaic() 함수 (2)	2015.08.22
R ggplot2 원그림(geom_bar() + coord_polar()) (0)	2015.08.22
R ggplot2 박스 그래프 (geom_boxplot()), 바이올린 그래프(geom_violin()) (0)	2015.08.21
R ggplot2 히스토그램 (goem_histogram()), 커널 밀도 곡선 (Kernel Density Curve) (15)	2015.08.20
R 그래프 패키지 소개 (Graphics, Lattice, ggplot2) (0)	2015.08.19

Posted by Rfriend

R, Python 분석과 프로그래밍의 친구 (by R Friend)

R ggplot2 막대그림(geom_bar())

'R 분석과 프로그래밍 > R 그래프_시각화' 카테고리의 다른 글

카테고리

태그목록

티스토리툴바