R, Python 분석과 프로그래밍의 친구 (by R Friend)

(5) 체르노프 얼굴그림 (Chernoff faces)

등이 있습니다.

이번 포스팅에서는 (5) 체르노프 얼굴그림 (Chernoff faces)에 대해서 소개하겠습니다.

체르노프 얼굴그림은 다변량 변수의 속성값들을 아래의 표에 나오는 것처럼 15가지의 얼굴의 생김새(얼굴 높이, 얼굴 넓이, 입 높이, 입 넓이...등) 특성에 매핑해서 얼굴 모양이 달라지게 하는 방식입니다.

얼굴 특성 (face characteristics)		다변량 변수 (multivariate mapping)
1. 얼굴의 높이	"height of face "	"Price"
2. 얼굴의 넓이	"width of face "	"MPG.highway"
3. 얼굴의 구조	"structure of face"	"Horsepower"
4. 입의 높이	"height of mouth "	"RPM"
5. 입의 넓이	"width of mouth "	"Length"
6. 웃음	"smiling "	"Weight"
7. 눈의 높이	"height of eyes "	"Price"
8. 눈의 넓이	"width of eyes "	"MPG.highway"
9. 머리카락 높이	"height of hair "	"Horsepower"
10. 머리카락 넓이	"width of hair "	"RPM"
11. 헤어스타일	"style of hair "	"Length"
12. 코 높이	"height of nose "	"Weight"
13. 코 넓이	"width of nose "	"Price"
14. 귀 넓이	"width of ear "	"MPG.highway"
15. 귀 높이	"height of ear "	"Horsepower"

체르노프 얼굴그림은 얼굴 모양을 가지고 데이터 관측치들의 특성을 직관적으로 파악할 수 있다는 장점이 있습니다. 다만, 각 변수가 얼굴 모양의 어느 특성에 매핑이 되었는지를 확인하고자 한다면 앞서 살펴본 레이터 차트나 별그림, 평행좌표그림 등에 비해 불편한 편이고, 왠지 official한 느낌은 덜 듭니다. 그래서 저 같은 경우는 회사에서 보고서에 체르노프 얼굴그림을 사용해본 적은 아직까지는 없습니다. ^^; 그래도 다변량 데이터를 신속하게, 직관적으로 탐색적분석 하는 용도로는 알아듬직 하므로 이번 포스팅을 이어가 보겠습니다.

예제에 사용할 데이터는 MASS Package에 내장되어있는 Cars93 dataframe을 사용하겠으며, 전체 93개의 관측치가 있는데요, 이를 모두 그리자니 너무 많아서요, 1번째 관측치부터 20번째 관측치까지만 사용하겠습니다. 체르노프 얼굴그림 그릴 때 사용할 변수로는 가격("Price"), 고속도로연비("MPG.highway"), 마력("Horsepower"), RPM("RPM"), 차길이("Length"), 차무게("Weight")의 5개만 선별해서 사용하겠습니다. 아래처럼 Cars93_1 이라는 새로운 이름의 데이터프레임을 만들었습니다.

> # dataset preparation
> library(MASS)
> str(Cars93)
'data.frame':	93 obs. of  27 variables:
 $ Manufacturer      : Factor w/ 32 levels "Acura","Audi",..: 1 1 2 2 3 4 4 4 4 5 ...
 $ Model             : Factor w/ 93 levels "100","190E","240",..: 49 56 9 1 6 24 54 74 73 35 ...
 $ Type              : Factor w/ 6 levels "Compact","Large",..: 4 3 1 3 3 3 2 2 3 2 ...
 $ Min.Price         : num  12.9 29.2 25.9 30.8 23.7 14.2 19.9 22.6 26.3 33 ...
 $ Price             : num  15.9 33.9 29.1 37.7 30 15.7 20.8 23.7 26.3 34.7 ...
 $ Max.Price         : num  18.8 38.7 32.3 44.6 36.2 17.3 21.7 24.9 26.3 36.3 ...
 $ MPG.city          : int  25 18 20 19 22 22 19 16 19 16 ...
 $ MPG.highway       : int  31 25 26 26 30 31 28 25 27 25 ...
 $ AirBags           : Factor w/ 3 levels "Driver & Passenger",..: 3 1 2 1 2 2 2 2 2 2 ...
 $ DriveTrain        : Factor w/ 3 levels "4WD","Front",..: 2 2 2 2 3 2 2 3 2 2 ...
 $ Cylinders         : Factor w/ 6 levels "3","4","5","6",..: 2 4 4 4 2 2 4 4 4 5 ...
 $ EngineSize        : num  1.8 3.2 2.8 2.8 3.5 2.2 3.8 5.7 3.8 4.9 ...
 $ Horsepower        : int  140 200 172 172 208 110 170 180 170 200 ...
 $ RPM               : int  6300 5500 5500 5500 5700 5200 4800 4000 4800 4100 ...
 $ Rev.per.mile      : int  2890 2335 2280 2535 2545 2565 1570 1320 1690 1510 ...
 $ Man.trans.avail   : Factor w/ 2 levels "No","Yes": 2 2 2 2 2 1 1 1 1 1 ...
 $ Fuel.tank.capacity: num  13.2 18 16.9 21.1 21.1 16.4 18 23 18.8 18 ...
 $ Passengers        : int  5 5 5 6 4 6 6 6 5 6 ...
 $ Length            : int  177 195 180 193 186 189 200 216 198 206 ...
 $ Wheelbase         : int  102 115 102 106 109 105 111 116 108 114 ...
 $ Width             : int  68 71 67 70 69 69 74 78 73 73 ...
 $ Turn.circle       : int  37 38 37 37 39 41 42 45 41 43 ...
 $ Rear.seat.room    : num  26.5 30 28 31 27 28 30.5 30.5 26.5 35 ...
 $ Luggage.room      : int  11 15 14 17 13 16 17 21 14 18 ...
 $ Weight            : int  2705 3560 3375 3405 3640 2880 3470 4105 3495 3620 ...
 $ Origin            : Factor w/ 2 levels "USA","non-USA": 2 2 2 2 2 1 1 1 1 1 ...
 $ Make              : Factor w/ 93 levels "Acura Integra",..: 1 2 4 3 5 6 7 9 8 10 ...
>

> # sampling observations from 1st to 20th, selecting 5 variables

> Cars93_1 <- Cars93[c(1:20),c("Price", "MPG.highway", "Horsepower", "RPM", "Length", "Weight")]
> Cars93_1
   Price MPG.highway Horsepower  RPM Length Weight
1   15.9          31        140 6300    177   2705
2   33.9          25        200 5500    195   3560
3   29.1          26        172 5500    180   3375
4   37.7          26        172 5500    193   3405
5   30.0          30        208 5700    186   3640
6   15.7          31        110 5200    189   2880
7   20.8          28        170 4800    200   3470
8   23.7          25        180 4000    216   4105
9   26.3          27        170 4800    198   3495
10  34.7          25        200 4100    206   3620
11  40.1          25        295 6000    204   3935
12  13.4          36        110 5200    182   2490
13  11.4          34        110 5200    184   2785
14  15.1          28        160 4600    193   3240
15  15.9          29        110 5200    198   3195
16  16.3          23        170 4800    178   3715
17  16.6          20        165 4000    194   4025
18  18.8          26        170 4200    214   3910
19  38.0          25        300 5000    179   3380
20  18.4          28        153 5300    203   3515

체로노프 얼굴그림을 그리기 위해 R의 aplpack Package의 faces() 함수를 사용하겠습니다.

install.package() 함수를 써서 설치하고, library() 함수로 호출해 보겠습니다.

> install.packages("aplpack")
Installing package into ‘C:/Users/Owner/Documents/R/win-library/3.2’
(as ‘lib’ is unspecified)
trying URL 'https://cran.rstudio.com/bin/windows/contrib/3.2/aplpack_1.3.0.zip'
Content type 'application/zip' length 3156450 bytes (3.0 MB)
downloaded 3.0 MB

package ‘aplpack’ successfully unpacked and MD5 sums checked

The downloaded binary packages are in
	C:\Users\Owner\AppData\Local\Temp\Rtmpk3jrgb\downloaded_packages
>

> library(aplpack)
필요한 패키지를 로딩중입니다: tcltk

이제 faces() 함수로 체르노프 얼굴 그림을 그려보겠습니다.

faces(dataset, face.type = 0/1/2, main = "title") 의 형식으로 사용합니다.

face.type = 0 (line drawing faces)은 색깔 없이 선으로만 얼굴을 그립니다.

face.type = 1 (the elements of the faces are painted)는 색깔도 같이 칠해서 얼굴을 그려줍니다.

face.type = 2 (Santa Claus faces are drawn)는 산타클로스 얼굴에 색을 칠해서 그려주고요.

아래에 하나씩 예를 들어보겠습니다.

face.type = 0 (line drawing faces)

> # face.type = 0 : line drawing faces
> faces(Cars93_1, face.type = 0, main = "Chernoff faces: face.type = 0")
effect of variables:
 modified item       Var          
 "height of face   " "Price"      
 "width of face    " "MPG.highway"
 "structure of face" "Horsepower" 
 "height of mouth  " "RPM"        
 "width of mouth   " "Length"     
 "smiling          " "Weight"     
 "height of eyes   " "Price"      
 "width of eyes    " "MPG.highway"
 "height of hair   " "Horsepower" 
 "width of hair   "  "RPM"        
 "style of hair   "  "Length"     
 "height of nose  "  "Weight"     
 "width of nose   "  "Price"      
 "width of ear    "  "MPG.highway"
 "height of ear   "  "Horsepower"

face.type = 1 (the elements of the faces are painted)

> # face.type = 1 : the elements of the faces are painted
> faces(Cars93_1, face.type = 1, main = "Chernoff faces: face.type = 1")
effect of variables:
 modified item       Var          
 "height of face   " "Price"      
 "width of face    " "MPG.highway"
 "structure of face" "Horsepower" 
 "height of mouth  " "RPM"        
 "width of mouth   " "Length"     
 "smiling          " "Weight"     
 "height of eyes   " "Price"      
 "width of eyes    " "MPG.highway"
 "height of hair   " "Horsepower" 
 "width of hair   "  "RPM"        
 "style of hair   "  "Length"     
 "height of nose  "  "Weight"     
 "width of nose   "  "Price"      
 "width of ear    "  "MPG.highway"
 "height of ear   "  "Horsepower"

face.type = 2 (Santa Claus faces are drawn)

> # face.type = 2 : Santa Claus faces are drawn
> faces(Cars93_1, face.type = 2, main = "Chernoff faces: face.type = 2")
effect of variables:
 modified item       Var          
 "height of face   " "Price"      
 "width of face    " "MPG.highway"
 "structure of face" "Horsepower" 
 "height of mouth  " "RPM"        
 "width of mouth   " "Length"     
 "smiling          " "Weight"     
 "height of eyes   " "Price"      
 "width of eyes    " "MPG.highway"
 "height of hair   " "Horsepower" 
 "width of hair   "  "RPM"        
 "style of hair   "  "Length"     
 "height of nose  "  "Weight"     
 "width of nose   "  "Price"      
 "width of ear    "  "MPG.highway"
 "height of ear   "  "Horsepower"

산타클로스 얼굴은 정신이 하도 산만해서 관측치들간의 유사성이나 차이가 눈에 잘 안들어오네요. @@~

체로노프 얼굴그림에 이름 추가하기 : labels =

> # putting labels as face names : labels
> faces(Cars93_1, face.type = 1, labels = Cars93[1:20,]$Model, 
+       main = "putting labels as face names : labels = ")
 effect of variables:
 modified item       Var          
 "height of face   " "Price"      
 "width of face    " "MPG.highway"
 "structure of face" "Horsepower" 
 "height of mouth  " "RPM"        
 "width of mouth   " "Length"     
 "smiling          " "Weight"     
 "height of eyes   " "Price"      
 "width of eyes    " "MPG.highway"
 "height of hair   " "Horsepower" 
 "width of hair   "  "RPM"        
 "style of hair   "  "Length"     
 "height of nose  "  "Weight"     
 "width of nose   "  "Price"      
 "width of ear    "  "MPG.highway"
 "height of ear   "  "Horsepower"

산점도에 체르노프 얼굴그림 겹쳐 그르기 (overlapping chernoff faces over scatter plot)

(1) 먼저 산점도를 plot() 함수를 사용해서 그립니다.

(2) 그 다음에 faces() 함수로 체르노프 얼굴그림을 실행시킵니다. 이때 scale = TRUE, plot = FALSE 옵션을 사용해줍니다. 그래프는 화면에 안나타나구요, (3)번 스텝에서 그래프가 그려질 수 있도록 데이터가 준비된 상태입니다.

(3) plot.faces() 함수를 사용해서 산점도 위에 (2)번에서 생성해 놓은 체르노프 얼굴그림을 겹쳐서 그려줍니다. width 와 height 는 x축과 y축의 단위를 보고서 trial & error 를 해보면서 숫자를 조금씩 바꿔가면서 그려본 후에 가장 마음에 드는 걸로 선택하면 되겠습니다.

체르노프 얼굴그림을 산점도에 겹쳐서 그리니 제법 유용한 다차원 그래프이지 않은가요? ^^

> # Overlapping Chernoff faces over scatter plot (MPG.highway*Weight)
> plot(Cars93_1[,c("MPG.highway", "Weight")], 
+      bty="n", # To make a plot with no box around the plot area
+      main = "Chernoff faces of Cars93")

> Cars93_1_faces <- faces(Cars93_1, scale = TRUE, plot=FALSE)
effect of variables:
 modified item       Var          
 "height of face   " "Price"      
 "width of face    " "MPG.highway"
 "structure of face" "Horsepower" 
 "height of mouth  " "RPM"        
 "width of mouth   " "Length"     
 "smiling          " "Weight"     
 "height of eyes   " "Price"      
 "width of eyes    " "MPG.highway"
 "height of hair   " "Horsepower" 
 "width of hair   "  "RPM"        
 "style of hair   "  "Length"     
 "height of nose  "  "Weight"     
 "width of nose   "  "Price"      
 "width of ear    "  "MPG.highway"
 "height of ear   "  "Horsepower" 
>

> plot.faces(Cars93_1_faces, 
+            Cars93_1[,c("MPG.highway")], 
+            Cars93_1[,c("Weight")], 
+            width = 2, 
+            height = 250)

체르노프 얼굴그림에 대해서 좀더 알고 싶은 분은 아래의 Reference를 참고하시기 바랍니다.

[ Reference ] http://www.inside-r.org/packages/cran/aplpack/docs/faces

많은 도움 되었기를 바랍니다.

이번 포스팅이 도움이 되었다면 아래의 '공감 ~♡' 단추를 꾸욱 눌러주세요.^^

728x90

(1) 레이더 차트 (radar chart) or 거미줄 그림(spider plot)

'R 분석과 프로그래밍 > R 그래프_시각화' 카테고리의 다른 글

[R 그래프/시각화] 방향성 있는 가중 네트워크 시각화 (directec and weighted network visualization), R igraph package (4)	2016.07.03
[R 그래프/시각화] Sankey Diagram, 두께와 색깔로 흐름, 관계 시각화 (0)	2016.07.02
R 다변량 그래프 (4) 3차원 산점도 (3 dimensional scatter plot) : scatterplot3d() (0)	2016.02.21
R 다변량 그래프 (3) 평행 좌표 그림 (parallel coordinate plot) : parcoord() (2)	2016.02.20
R 다변량 그래프 (2) 별 그래프 (star graph, segment diagrams) (0)	2016.02.12

Posted by Rfriend

R 다변량 그래프 (3) 평행 좌표 그림 (parallel coordinate plot) : parcoord()

R 분석과 프로그래밍/R 그래프_시각화 2016. 2. 20. 23:16

변수 3개 이상의 다변량 데이터(multivariate data set)를 2차원 평면에 효과적으로 시각화할 수 있는 방법으로

(2) 별 그래프 (star graph) (레이더 차트와 유사, 중심점 차이 존재)

(3) 평행 좌표 그림 (parallel coordinate plot)

(5) 체르노프 얼굴그림 (Chernoff faces)

등이 있습니다.

산점도 행렬(http://rfriend.tistory.com/83, http://rfriend.tistory.com/82)과 모자이크 그림(http://rfriend.tistory.com/71)은 이전 포스팅을 참고하시기 바랍니다.

이번 포스팅에서는 (3) 평행 좌표 그림 (parallel coordinate plot)에 대해서 소개하겠습니다.

MASS Package의 parcoord() 함수를 사용하겠으며, 예제 데이터 역시 MASS Package에 내장된 Cars93 데이터 프레임의 차종(Type)별로 선모양(line type)과 색깔(color)을 달리하여 가격(Price), 고속도로연비(MPG.highway), 마력(Horsepower), 분당회전수(RPM), 길이(Length), 무게(Weight) 변수를 가지고 그래프를 그려보겠습니다.

> ##--------------------------------------
> ## parallel coordinate plot
> ##--------------------------------------
> 
> library(MASS)
> str(Cars93)
'data.frame':	93 obs. of  27 variables:
 $ Manufacturer      : Factor w/ 32 levels "Acura","Audi",..: 1 1 2 2 3 4 4 4 4 5 ...
 $ Model             : Factor w/ 93 levels "100","190E","240",..: 49 56 9 1 6 24 54 74 73 35 ...
 $ Type              : Factor w/ 6 levels "Compact","Large",..: 4 3 1 3 3 3 2 2 3 2 ...
 $ Min.Price         : num  12.9 29.2 25.9 30.8 23.7 14.2 19.9 22.6 26.3 33 ...
 $ Price             : num  15.9 33.9 29.1 37.7 30 15.7 20.8 23.7 26.3 34.7 ...
 $ Max.Price         : num  18.8 38.7 32.3 44.6 36.2 17.3 21.7 24.9 26.3 36.3 ...
 $ MPG.city          : int  25 18 20 19 22 22 19 16 19 16 ...
 $ MPG.highway       : int  31 25 26 26 30 31 28 25 27 25 ...
 $ AirBags           : Factor w/ 3 levels "Driver & Passenger",..: 3 1 2 1 2 2 2 2 2 2 ...
 $ DriveTrain        : Factor w/ 3 levels "4WD","Front",..: 2 2 2 2 3 2 2 3 2 2 ...
 $ Cylinders         : Factor w/ 6 levels "3","4","5","6",..: 2 4 4 4 2 2 4 4 4 5 ...
 $ EngineSize        : num  1.8 3.2 2.8 2.8 3.5 2.2 3.8 5.7 3.8 4.9 ...
 $ Horsepower        : int  140 200 172 172 208 110 170 180 170 200 ...
 $ RPM               : int  6300 5500 5500 5500 5700 5200 4800 4000 4800 4100 ...
 $ Rev.per.mile      : int  2890 2335 2280 2535 2545 2565 1570 1320 1690 1510 ...
 $ Man.trans.avail   : Factor w/ 2 levels "No","Yes": 2 2 2 2 2 1 1 1 1 1 ...
 $ Fuel.tank.capacity: num  13.2 18 16.9 21.1 21.1 16.4 18 23 18.8 18 ...
 $ Passengers        : int  5 5 5 6 4 6 6 6 5 6 ...
 $ Length            : int  177 195 180 193 186 189 200 216 198 206 ...
 $ Wheelbase         : int  102 115 102 106 109 105 111 116 108 114 ...
 $ Width             : int  68 71 67 70 69 69 74 78 73 73 ...
 $ Turn.circle       : int  37 38 37 37 39 41 42 45 41 43 ...
 $ Rear.seat.room    : num  26.5 30 28 31 27 28 30.5 30.5 26.5 35 ...
 $ Luggage.room      : int  11 15 14 17 13 16 17 21 14 18 ...
 $ Weight            : int  2705 3560 3375 3405 3640 2880 3470 4105 3495 3620 ...
 $ Origin            : Factor w/ 2 levels "USA","non-USA": 2 2 2 2 2 1 1 1 1 1 ...
 $ Make              : Factor w/ 93 levels "Acura Integra",..: 1 2 4 3 5 6 7 9 8 10 ...

차종(Type)별로 선 모양(lty)과 선 색깔(col)을 다르게 하기 위해서 데이터 전처리를 해보겠습니다. 차종이 현재는 "Compact", "Large" 등과 같이 character 로 되어있는데요, 1, 2, ..., 6 의 numeric 으로 변환하겠습니다.

> # making Type_number variable to put line type and color by Car Type
> Cars93_1 <- transform(Cars93, 
+                       Type_no = ifelse(Type == "Compact", 1, 
+                                        ifelse(Type == "Large", 2, 
+                                               ifelse(Type == "Midsize", 3, 
+                                                      ifelse(Type == "Small", 4, 
+                                                             ifelse(Type == "Sporty", 5, 6)))))
+                       )
> # checking top 10 observations
> head(Cars93_1[,c("Type", "Type_no")], n=10)
      Type Type_no
1    Small       4
2  Midsize       3
3  Compact       1
4  Midsize       3
5  Midsize       3
6  Midsize       3
7    Large       2
8    Large       2
9  Midsize       3
10   Large       2
>

이제 준비가 되었네요. MASS Package의 parcoord() 함수를 사용해서 평행 좌표 그림(parallel coordinate plot)을 그려보겠습니다.

Cars93_1[, c("MPG.highway", "RPM", "Horsepower", "Weight", "Length", "Price")] 은 평행좌표그림을 그릴 대상 변수만 선별해오는 명령문입니다.

위에서 Cars93_1 이라는 새로운 데이터 프레임에 Type_no 라는 numeric 변수를 만들었는데요, 선 유형(lty, line type)과 색깔(col, color)를 Cars93_1$Type_no 로 지정을 해줘서 차종(Type)에 따라서 선 유형과 색깔이 달라지게 했습니다.

var.label = TRUE 옵션을 설정하면 각 변수별로 minimum value, maximum value 가 하단과 상단에 표기됩니다.

main = "parallel coordinate plot of Cars93 by Type" 옵션은 그래프에 제목 넣을 때 사용합니다.

아래 그래프 상단 우측에 범례가 들어가 있는데요, legend() 함수를 사용해서 추가한 것입니다.

> # parallel coordinate plot
> library(MASS)

> parcoord(Cars93_1[, c("MPG.highway", "RPM", "Horsepower", "Weight", "Length", "Price")], 
+          lty = Cars93_1$Type_no,
+          col = Cars93_1$Type_no,

+          var.label = TRUE, 
+          main = "parallel coordinate plot of Cars93 by Type")
> 
> # putting legend
> legend("topright", 
+        legend = c("Compact", "Large", "Midsize", "Small", "Sporty", "Van"), 
+        lty = c(1:6), 
+        col = c(1:6), 
+        lwd = 2, # line width
+        cex = 0.7)  # character size

다음번 포스팅에서는 3차원 산점도 (3 dimensional scatter plot)에 대해서 소개하도록 하겠습니다.

많은 도움 되었기를 바랍니다.

이번 포스팅이 도움이 되었다면 아래의 '공감 ~♡' 단추를 꾸욱 눌러주세요.^^

728x90

(1) 레이더 차트 (radar chart) or 거미줄 그림(spider plot)

'R 분석과 프로그래밍 > R 그래프_시각화' 카테고리의 다른 글

R 다변량 그래프 (5) 체르노프 얼굴그림 (Chernoff faces) : aplpack package, faces() 함수 (0)	2016.03.05
R 다변량 그래프 (4) 3차원 산점도 (3 dimensional scatter plot) : scatterplot3d() (0)	2016.02.21
R 다변량 그래프 (2) 별 그래프 (star graph, segment diagrams) (0)	2016.02.12
R 다변량 그래프 (1) 레이더 차트(radar chart), or 거미줄 그림(spider plot) (0)	2016.02.11
R Graphics 낮은 수준의 그래프 함수 (7) 다각형 추가 : polygon(x, y, ...) (0)	2016.01.12

Posted by Rfriend

R 다변량 그래프 (2) 별 그래프 (star graph, segment diagrams)

R 분석과 프로그래밍/R 그래프_시각화 2016. 2. 12. 13:21

변수 3개 이상의 다변량 데이터(multivariate data set)를 2차원 평면에 효과적으로 시각화할 수 있는 방법으로

(2) 별 그래프 (star graph) (레이더 차트와 유사, 중심점 차이 존재)

(3) 평행 좌표 그림 (parallel coordinate plot)

(5) 체르노프 얼굴그림 (Chernoff faces)

등이 있습니다.

산점도 행렬(http://rfriend.tistory.com/83, http://rfriend.tistory.com/82)과 모자이크 그림(http://rfriend.tistory.com/71)은 이전 포스팅을 참고하시기 바랍니다.

이번 포스팅에서는 (2) 별 그래프 (star graph)에 대해서 소개하겠습니다.

graphics Package의 stars() 함수를 사용하겠습니다. graphics Package는 base Package로서 R 설치할 때 기본으로 설치되므로 stars() 함수를 사용하기 위해 추가로 별도 패키지 설치는 필요하지 않습니다.

stars() 함수는 dataframe 이나 matrix 형태의 데이터셋을 사용합니다. scale = TRUE 옵션을 사용하면 (minimum value) 0 ~ (maximum value) 1 사이의 값으로 다변량 변수들의 값을 표준화해줍니다. 별 그래프(star graph)의 기본 원리는 중심점(center point)으로 부터 각 관측치별/ 각 변수별로 거리(distance) 혹은 반지름(radius)이 얼마나 떨어져있는가를 시각화한 것입니다.

실습을 위해서 MASS Package에 내장된 Cars93 dataframe을 사용하겠습니다. 이전 포스팅 레이더 차트(radar chart) 와 비교하기 쉽도록 이번에도 차종(Type), 가격(Price), 고속도로연비(MPG.highway), 마력(Horsepower), 분당회전수(RPM), 길이(Length), 무게(Weight) 의 7개 변수를 똑같이 사용하겠습니다.

> library(MASS)
> str(Cars93)
'data.frame':	93 obs. of  27 variables:
 $ Manufacturer      : Factor w/ 32 levels "Acura","Audi",..: 1 1 2 2 3 4 4 4 4 5 ...
 $ Model             : Factor w/ 93 levels "100","190E","240",..: 49 56 9 1 6 24 54 74 73 35 ...
 $ Type              : Factor w/ 6 levels "Compact","Large",..: 4 3 1 3 3 3 2 2 3 2 ...
 $ Min.Price         : num  12.9 29.2 25.9 30.8 23.7 14.2 19.9 22.6 26.3 33 ...
 $ Price             : num  15.9 33.9 29.1 37.7 30 15.7 20.8 23.7 26.3 34.7 ...
 $ Max.Price         : num  18.8 38.7 32.3 44.6 36.2 17.3 21.7 24.9 26.3 36.3 ...
 $ MPG.city          : int  25 18 20 19 22 22 19 16 19 16 ...
 $ MPG.highway       : int  31 25 26 26 30 31 28 25 27 25 ...
 $ AirBags           : Factor w/ 3 levels "Driver & Passenger",..: 3 1 2 1 2 2 2 2 2 2 ...
 $ DriveTrain        : Factor w/ 3 levels "4WD","Front",..: 2 2 2 2 3 2 2 3 2 2 ...
 $ Cylinders         : Factor w/ 6 levels "3","4","5","6",..: 2 4 4 4 2 2 4 4 4 5 ...
 $ EngineSize        : num  1.8 3.2 2.8 2.8 3.5 2.2 3.8 5.7 3.8 4.9 ...
 $ Horsepower        : int  140 200 172 172 208 110 170 180 170 200 ...
 $ RPM               : int  6300 5500 5500 5500 5700 5200 4800 4000 4800 4100 ...
 $ Rev.per.mile      : int  2890 2335 2280 2535 2545 2565 1570 1320 1690 1510 ...
 $ Man.trans.avail   : Factor w/ 2 levels "No","Yes": 2 2 2 2 2 1 1 1 1 1 ...
 $ Fuel.tank.capacity: num  13.2 18 16.9 21.1 21.1 16.4 18 23 18.8 18 ...
 $ Passengers        : int  5 5 5 6 4 6 6 6 5 6 ...
 $ Length            : int  177 195 180 193 186 189 200 216 198 206 ...
 $ Wheelbase         : int  102 115 102 106 109 105 111 116 108 114 ...
 $ Width             : int  68 71 67 70 69 69 74 78 73 73 ...
 $ Turn.circle       : int  37 38 37 37 39 41 42 45 41 43 ...
 $ Rear.seat.room    : num  26.5 30 28 31 27 28 30.5 30.5 26.5 35 ...
 $ Luggage.room      : int  11 15 14 17 13 16 17 21 14 18 ...
 $ Weight            : int  2705 3560 3375 3405 3640 2880 3470 4105 3495 3620 ...
 $ Origin            : Factor w/ 2 levels "USA","non-USA": 2 2 2 2 2 1 1 1 1 1 ...
 $ Make              : Factor w/ 93 levels "Acura Integra",..: 1 2 4 3 5 6 7 9 8 10 ...

93개의 차량 관측치가 있는데요, 이것을 6개의 차종(Type)을 기준으로 평균 통계량으로 요약한 후에, 차종별로 6개의 평균치 다변량 변수를 가지고 별 그래프를 그려보겠습니다.

> # cross tabulation by Car Type
> table(Cars93$Type)

Compact   Large Midsize   Small  Sporty     Van 
     16      11      22      21      14       9

> # mean of multivariates by Car Type
> install.packages("doBy")

> library(doBy)

> 
> mean_by_Type <- summaryBy(MPG.highway + RPM + Horsepower + Weight + Length + Price ~ Type, 
+                           data=Cars93, 
+                           FUN = c(mean))
> 
> mean_by_Type
     Type MPG.highway.mean RPM.mean Horsepower.mean Weight.mean Length.mean Price.mean
1 Compact         29.87500 5362.500        131.0000    2918.125    182.1250   18.21250
2   Large         26.72727 4672.727        179.4545    3695.455    204.8182   24.30000
3 Midsize         26.72727 5336.364        173.0909    3400.000    192.5455   27.21818
4   Small         35.47619 5633.333         91.0000    2312.857    167.1905   10.16667
5  Sporty         28.78571 5392.857        160.1429    2899.643    175.2143   19.39286
6     Van         21.88889 4744.444        149.4444    3830.556    185.6667   19.10000

stars() 함수에서는 rownames 를 가져다가 labeling 을 합니다. 현재 rownames 는 1, 2,..., 6 의 숫자로 되어있으므로, 이를 차종(Type) 이름으로 변경하도록 하겠습니다. (굷게 파란색으로 밑줄친 부분)

> # creating row names with Type
> rownames(mean_by_Type) <- mean_by_Type$Type
> 
> mean_by_Type
           Type MPG.highway.mean RPM.mean Horsepower.mean Weight.mean Length.mean Price.mean
Compact Compact         29.87500 5362.500        131.0000    2918.125    182.1250   18.21250
Large     Large         26.72727 4672.727        179.4545    3695.455    204.8182   24.30000
Midsize Midsize         26.72727 5336.364        173.0909    3400.000    192.5455   27.21818
Small     Small         35.47619 5633.333         91.0000    2312.857    167.1905   10.16667
Sporty   Sporty         28.78571 5392.857        160.1429    2899.643    175.2143   19.39286
Van         Van         21.88889 4744.444        149.4444    3830.556    185.6667   19.10000

위에서 rownames() 로 뭐가 바뀌었나 잘 모를수도 있는데요, 아래에 화면 캡쳐한 그래프를 참고하시기 바랍니다. 제일 왼쪽에 rowname 이 숫자에서 차종(Type)으로 바뀐게 보이지요?

doBy Package로 요약통계량을 생성하면 변수명 뒤에 자동으로 통계량 이름이 따라 붙습니다. 이번 예제의 경우에는 평균을 구했으므로 MPG.highway.mean, RPM.mean, ... 이런 식으로요. 변수명이 너무 길다보니 나중에 labeling 할 때 옆으로 삐죽 튀어나가서 보기 싫어서요, 변수명을 좀더 짧게 변경해보겠습니다. 변수명 뒤에 .mean 을 생략하고 사용하겠습니다. 위에 rownames() 함수는 stars() 함수를 사용하려면 꼭 해줘야 하는 것이구요, 아래의 renames()는 필수사항은 아닙니다.

> # renaming of variables
> library(reshape)
> mean_by_Type <- rename(mean_by_Type, 
+                        c(MPG.highway.mean = "MPG.highway",  
+                          RPM.mean = "RPM",  
+                          Horsepower.mean = "Horsepower",  
+                          Weight.mean = "Weight", 
+                          Length.mean = "Length", 
+                          Price.mean = "Price"
+                          )
+                        )
> mean_by_Type
           Type MPG.highway      RPM Horsepower   Weight   Length    Price
Compact Compact    29.87500 5362.500   131.0000 2918.125 182.1250 18.21250
Large     Large    26.72727 4672.727   179.4545 3695.455 204.8182 24.30000
Midsize Midsize    26.72727 5336.364   173.0909 3400.000 192.5455 27.21818
Small     Small    35.47619 5633.333    91.0000 2312.857 167.1905 10.16667
Sporty   Sporty    28.78571 5392.857   160.1429 2899.643 175.2143 19.39286
Van         Van    21.88889 4744.444   149.4444 3830.556 185.6667 19.10000

이제 드디어 데이터셋이 준비가 되었습니다. stars() 함수를 사용해서 별 그래프를 그려보겠습니다.

stars(x, ...) 의 x 자리에는 dataframe 이나 matrix 형태의 다변량 데이터셋 이름을 입력하면 됩니다.

locations = NULL, nrow = 2, ncol = 4 옵션은 행이 2줄, 열이 4줄인 square layout 으로 배열하라는 뜻입니다.

scale = TRUE 는 변수별로 단위(scale)가 달라서 들쭉날쭉한 값들을 변수별로 모두 최소값 0 ~ 최대값 1 사이로 변수별 값들을 표준화(standardization) 합니다.

full = TRUE 로 지정하면 360도의 전체 원으로 그래프를 그립니다. full = FALSE 로 지정하면 1180도짜리 반원(semi-circle)으로 그래프가 그려집니다.

radius = TRUE 로 지정하면 반지름 선이 그려집니다. 만약 radius = FALSE 로 하면 반지름 선이 안그려지는데요, 보기에 좀 휑~합니다. ^^'

frame.plot = TRUE 로 하면 그래프의 외곽에 네모 박스 선으로 테두리가 쳐집니다.

main 은 제목을 입력하는 옵션입니다. sub 는 부제목 입력하는 옵션이구요.

cex 는 글자 크기 지정하는 옵션인데요, default 가 1이고 숫자가 커질 수록 글자 크기가 커집니다.

lwd 는 선 두께 (line width) 지정하는 옵션입니다. default 가 1이며, 숫자가 커질 수록 선이 두꺼워집니다.

key.loc = c(7.5, 1.5) 는 x, y 좌표 위치에 각 변수들의 이름(unit key)을 범례로 집어넣습니다.

말로 설명해놓긴 했는데요, 잘 이해가 안갈수도 있겠습니다. 아래에 stars() 함수를 복사해놓고서 옵션마다 하나씩 '#'을 붙여가면서 실행을 해보시기 바랍니다. 그러면 '#'을 붙이기 전과 비교가 될테고, 옵션별 기능을 바로 확인할 수 있습니다.

> # star plot
> stars(mean_by_Type[, 2:7], # dataframe or matrix
+       locations = NULL, # locations = NULL, the segment plots will be placed in a rectangular grid
+       nrow = 2, # number of rows at a square layout (w/locations = NULL)
+       ncol = 4, # number of columns at a square layout (w/locations = NULL)
+       scale = TRUE, # the columns are scaled independently (max in each column: 1, min: 0)
+       full = TRUE, # TRUE: occupy a full circle, FALSE : semi-circle
+       radius = TRUE, # the radii corresponding to each variable in the data will be drawn
+       frame.plot = TRUE, # if TRUE, the plot region is framed
+       main = "Star plot - means of multivariate by Car Type", # a main title for the plot
+       cex = 1, # size of label character (by default, cex = 1)
+       # labels = NULL # if NULL, no attempt is made to construct labels
+       lwd = 1, # line width (by default, lwd = 1)
+       key.loc = c(7.5, 1.5) # vector with x and y coordinates of the unit key
+       )

[ stars() 함수로 radar chart 그리기 ]

이전 포스팅에서 소개했던 레이더 차트 or 거미줄 그림(radar chart, or spider plot)도 stars() 함수로 그릴 수 있습니다. 별 그래프(star plot)이 개별 관측치마다 location을 부여하여 하나씩 다변량 그래프를 그린 것이라면, 레이더 차트(radar chart) or 거미줄 그림(spider plot)은 하나의 공통된 location을 중심점으로 하여 관측치들을 중첩하여 그린 다변량 그래프입니다. locations = c(0, 0)으로 중심점을 한개로 통일하였고, key.loc = c(0, 0) 으로 똑같이 지정해주어서 이 중심점 좌표를 기준으로 변수명을 labeling 할 수 있게 하였습니다.

radius = FALSE 로 바꾸어서 반지름 선은 표시하지 않게끔 하였습니다. 6개 차종(Type)의 그래프가 중첩이 되다보니 radius = TRUE 로 했더니 선이 겹쳐서 아예 안보이는게 있어서요.

관치치들의 다변량 변수 간에 존재하는 패턴에 대해서 관심이 있는 분석가라면 아무래도 그룹별로 선 색깔을 달리하여 그린 레이더 차트 (or 거미줄 그림)가 별 그래프(star chart)보다는 좀더 유용한 편입니다. col.lines = c(1:6) 옵션으로 6개 차종(Type)별 색깔을 구분하였습니다.

범례(legend)는 legend(x= , y= , ...) 함수로 추가를 하였습니다. x, y 좌표는 몇 번 숫자를 넣어보고 시행착오를 거치면서 적당한 좌표를 찾아나가야 합니다. "topright", "topleft" 이런 식으로 범례 좌표를 지정했더니 레이더 차트랑 자꾸 겹쳐서요. ^^;

> # radar chart (or spider chart) 
> stars(mean_by_Type[, 2:7], 
+       locations = c(0, 0), 
+       key.loc = c(0, 0), 
+       scale = TRUE, 
+       radius = FALSE, 
+       cex = 1, 
+       lty = 2, 
+       col.lines = c(1:6), 
+       lwd = 2, 
+       main = "radar chart - means of multivariate by Car Type" 
+       )
> 
> legend(x=1, y=1, legend = mean_by_Type$Type, lty = 2, col = c(1:6), lwd = 2)

위에 stars() 함수로 레이더 차트를 그리기는 했는데요, 이전 포스팅에서 fmsb Package의 radarchart() 함수로 그린 radar chart 보다는 가독성이 좀 떨어져보입니다.

여러개의 그룹을 레이더 차트 (radar chart)로 보려면 fmsb Package의 radarchart() 함수를 사용하고, 개별 그룹 단위로 분리해서 보려면 graphics Package의 stars() 함수로 별 그림 (star plot)을 그려서 보면 좋을 것 같습니다.

다음 번에는 평행 좌표 그림 (parallel coordinate plot)에 대해서 알아보겠습니다.

많은 도움 되었기를 바랍니다.

이번 포스팅이 도움이 되었다면 아래의 '공감 ~♡' 단추를 꾸욱 눌러주세요.^^

728x90

(3) 평행 좌표 그림 (parallel coordinate plot)

'R 분석과 프로그래밍 > R 그래프_시각화' 카테고리의 다른 글

R 다변량 그래프 (4) 3차원 산점도 (3 dimensional scatter plot) : scatterplot3d() (0)	2016.02.21
R 다변량 그래프 (3) 평행 좌표 그림 (parallel coordinate plot) : parcoord() (2)	2016.02.20
R 다변량 그래프 (1) 레이더 차트(radar chart), or 거미줄 그림(spider plot) (0)	2016.02.11
R Graphics 낮은 수준의 그래프 함수 (7) 다각형 추가 : polygon(x, y, ...) (0)	2016.01.12
R Graphics 낮은 수준의 그래프 함수 (6) 범례 추가 : legend(x, y, legend, ...) (0)	2016.01.10

Posted by Rfriend

R 다변량 그래프 (1) 레이더 차트(radar chart), or 거미줄 그림(spider plot)

R 분석과 프로그래밍/R 그래프_시각화 2016. 2. 11. 01:55

실전 업무에서는 다변량 데이터(multivariate data set)를 사용하는 경우가 다반사입니다. 그리고 분석업무 초반에 탐색적 분석을 수행할 때 시각화를 통해 변수들 간의 관계, 패턴을 탐색하는 분석 기법이 굉장히 유용합니다.

하지만 다변량 데이터 중에서도 특히 3개 이상의 변수를 가지는 다변량 데이터의 경우 그동안 소개해드렸던 히스토그램, 막대그림, 박스 그림, 산점도, 선그림/시계열 그림 등을 활용해서 2차원 평면에 나타낼 수 없는 한계가 있습니다. (물론, 색깔이라든지 모양을 데이터 그룹 별로 달리하면 3차원, 4차원의 정보를 시각화할 수 있기는 합니다만...)

변수 3개 이상의 다변량 데이터를 2차원 평면에 효과적으로 시각화할 수 있는 방법으로

(1) 레이더 차트 (radar chart) or 거미줄 그림(spider plot)

(2) 별 그림 (star graph) (레이더 차트와 유사, 중심점 다름)

(5) 체르노프 얼굴그림 (Chernoff faces)

등이 있습니다.

산점도 행렬(http://rfriend.tistory.com/83, http://rfriend.tistory.com/82)과 모자이크 그림(http://rfriend.tistory.com/71)은 이전 포스팅을 참고하시기 바랍니다.

이번 포스팅에서는 (1) 레이더 차트 (radar chart)에 대해서 소개하겠습니다. 방사형의 레이더 차트가 마치 거미줄을 닮아서 거미줄 그림 (spider plot)이라고도 부릅니다.

별 그림 (star plot) 도 레이더 차트와 형태는 거의 유사한데요, 약간 현태가 다른 점이 있고 stars 라는 R 패키지가 별도로 있고 해서 다음번에 따로 설명을 드리겠습니다.

R 실습에 사용할 데이터는 MASS 패키지에 내장되어 있는 Cars93 데이터프레임입니다. 분석 대상 변수로는 차 유형(Type), 가격(Price), 고속도로연비(MPG.highway), 마력(Horsepower), 분당회전수RPM(RPM), 길이(Length), 무게(Weight) 등의 7개 변수입니다.

> library(MASS)
> str(Cars93)
'data.frame':	93 obs. of  27 variables:
 $ Manufacturer      : Factor w/ 32 levels "Acura","Audi",..: 1 1 2 2 3 4 4 4 4 5 ...
 $ Model             : Factor w/ 93 levels "100","190E","240",..: 49 56 9 1 6 24 54 74 73 35 ...
 $ Type              : Factor w/ 6 levels "Compact","Large",..: 4 3 1 3 3 3 2 2 3 2 ...
 $ Min.Price         : num  12.9 29.2 25.9 30.8 23.7 14.2 19.9 22.6 26.3 33 ...
 $ Price             : num  15.9 33.9 29.1 37.7 30 15.7 20.8 23.7 26.3 34.7 ...
 $ Max.Price         : num  18.8 38.7 32.3 44.6 36.2 17.3 21.7 24.9 26.3 36.3 ...
 $ MPG.city          : int  25 18 20 19 22 22 19 16 19 16 ...
 $ MPG.highway       : int  31 25 26 26 30 31 28 25 27 25 ...
 $ AirBags           : Factor w/ 3 levels "Driver & Passenger",..: 3 1 2 1 2 2 2 2 2 2 ...
 $ DriveTrain        : Factor w/ 3 levels "4WD","Front",..: 2 2 2 2 3 2 2 3 2 2 ...
 $ Cylinders         : Factor w/ 6 levels "3","4","5","6",..: 2 4 4 4 2 2 4 4 4 5 ...
 $ EngineSize        : num  1.8 3.2 2.8 2.8 3.5 2.2 3.8 5.7 3.8 4.9 ...
 $ Horsepower        : int  140 200 172 172 208 110 170 180 170 200 ...
 $ RPM               : int  6300 5500 5500 5500 5700 5200 4800 4000 4800 4100 ...
 $ Rev.per.mile      : int  2890 2335 2280 2535 2545 2565 1570 1320 1690 1510 ...
 $ Man.trans.avail   : Factor w/ 2 levels "No","Yes": 2 2 2 2 2 1 1 1 1 1 ...
 $ Fuel.tank.capacity: num  13.2 18 16.9 21.1 21.1 16.4 18 23 18.8 18 ...
 $ Passengers        : int  5 5 5 6 4 6 6 6 5 6 ...
 $ Length            : int  177 195 180 193 186 189 200 216 198 206 ...
 $ Wheelbase         : int  102 115 102 106 109 105 111 116 108 114 ...
 $ Width             : int  68 71 67 70 69 69 74 78 73 73 ...
 $ Turn.circle       : int  37 38 37 37 39 41 42 45 41 43 ...
 $ Rear.seat.room    : num  26.5 30 28 31 27 28 30.5 30.5 26.5 35 ...
 $ Luggage.room      : int  11 15 14 17 13 16 17 21 14 18 ...
 $ Weight            : int  2705 3560 3375 3405 3640 2880 3470 4105 3495 3620 ...
 $ Origin            : Factor w/ 2 levels "USA","non-USA": 2 2 2 2 2 1 1 1 1 1 ...
 $ Make              : Factor w/ 93 levels "Acura Integra",..: 1 2 4 3 5 6 7 9 8 10 ...
>

먼저 table() 함수를 사용하여 차 유형별 분할표를 만들어보았습니다. 6종류의 차 유형별로 10~20여대씩 분포하고 있음을 알 수 있습니다.

> # cross tabulation by car type

> table(Cars93$Type)

Compact   Large Midsize   Small  Sporty     Van 
     16      11      22      21      14       9

다음으로, 차 유형(Type)별로 가격(Price), 고속도로연비(MPG.highway), 마력(Housepower), 분당회전수RPM(RPM), 길이(Length), 무게(Weight) 등 6개 변수별 평균(mean)을 구해보겠습니다.

doBy package 의 summaryBy() 함수를 사용하면 연속형변수의 다변량 데이터에 일괄적으로 요약통계량을 편리하게 계산할 수 있습니다. Base package가 아니므로 install.packages("doBy")로 설치하시고, library(doBy)로 호출한 후에 summaryBy() 함수의 FUN = c(mean, min, max, sd, ...) 처럼 원하는 통계량 함수를 입력하면 됩니다. 이번에는 평균만 사용할 것이므로 아래 예에서는 FUN = c(mean) 만 입력하였습니다.

> # mean of multivariates by Car Type
> install.packages("doBy")
Installing package into ‘C:/Users/user/Documents/R/win-library/3.2’
(as ‘lib’ is unspecified)
trying URL 'http://cran.rstudio.com/bin/windows/contrib/3.2/doBy_4.5-14.zip'
Content type 'application/zip' length 3419973 bytes (3.3 MB)
downloaded 3.3 MB

package ‘doBy’ successfully unpacked and MD5 sums checked

The downloaded binary packages are in
	C:\Users\user\AppData\Local\Temp\RtmpGGgn4j\downloaded_packages
> library(doBy)
필요한 패키지를 로딩중입니다: survival
Warning message:
패키지 ‘doBy’는 R 버전 3.2.3에서 작성되었습니다

> mean_by_Type <- summaryBy(MPG.highway + RPM + Horsepower + Weight + Length + Price ~ Type, 
+           data=Cars93, 
+           FUN = c(mean))

> mean_by_Type


     Type MPG.highway.mean RPM.mean Horsepower.mean Weight.mean Length.mean Price.mean
1 Compact         29.87500 5362.500        131.0000    2918.125    182.1250   18.21250
2   Large         26.72727 4672.727        179.4545    3695.455    204.8182   24.30000
3 Midsize         26.72727 5336.364        173.0909    3400.000    192.5455   27.21818
4   Small         35.47619 5633.333         91.0000    2312.857    167.1905   10.16667
5  Sporty         28.78571 5392.857        160.1429    2899.643    175.2143   19.39286
6     Van         21.88889 4744.444        149.4444    3830.556    185.6667   19.10000

다음으로, 레이더 차트(radar chart)를 그리려면 fmsb Package 를 사용합니다. install.packages("fmsb")로 설치하고, library(fmsb)로 호출해보겠습니다.

> install.packages("fmsb")
Installing package into ‘C:/Users/user/Documents/R/win-library/3.2’
(as ‘lib’ is unspecified)
trying URL 'http://cran.rstudio.com/bin/windows/contrib/3.2/fmsb_0.5.2.zip'
Content type 'application/zip' length 214358 bytes (209 KB)
downloaded 209 KB

package ‘fmsb’ successfully unpacked and MD5 sums checked

The downloaded binary packages are in
	C:\Users\user\AppData\Local\Temp\RtmpGGgn4j\downloaded_packages
> library(fmsb)
Warning message:
패키지 ‘fmsb’는 R 버전 3.2.3에서 작성되었습니다

fmsb Package의 radarchart() 함수를 사용하기 위한 데이터 형태는

(1) 데이터 구조는 Dataframe

(2) 첫번째 행(1st row)에 최대값(max value)

(3) 두번째 행(2nd row)에 최소값(min value)

(4) 세번째 행부터는 원래의 관측값

이 오도록 데이터를 전처리해주어야 합니다.

[ fmsb Package의 radrchart() 함수 사용하기 위한 데이터 준비 ]

R 사용자정의함수로 첫번째 행에 최대값, 두번째 행에 최소값이 오도록 하여 Dataframe으로 묶는 명령어는 아래와 같습니다.

사용자정의함수에 더하여 scale() 함수를 사용해서 6개의 변수를 표준화 하였습니다.

> # manipulating dataset for radar chart

> # data frame includes possible maximum values as row 1 
> # and possible minimum values as row 2> df_radarchart <- function(df) {
+   df <- data.frame(df)
+   dfmax <- apply(df, 2, max) 
+   dfmin <- apply(df, 2, min) 
+   as.data.frame(rbind(dfmax, dfmin, df))
+ }
> 
> # maximum value as row 1, minimum value as row 2 : user-defined function df_radarchart
> # standardization : scale()
> mean_by_Type_trans <- df_radarchart(scale(mean_by_Type[,c(2:7)]))

> mean_by_Type_scale MPG.highway.mean RPM.mean Horsepower.mean Weight.mean Length.mean Price.mean 1 2.6145692 2.1399576 1.98554516 2.1439959 2.53293142 2.2793069 2 -2.4199059 -2.3321490 -2.73029381 -2.5089822 -2.31904165 -2.6344948 11 0.3636458 0.4429717 -0.50216519 -0.4509575 -0.18708691 -0.2596045 21 -0.3393415 -1.3321490 0.98554516 0.9078356 1.53293142 0.7806413 3 -0.3393415 0.3757101 0.79016106 0.3913731 0.60272622 1.2793069 4 1.6145692 1.1399576 -1.73029381 -1.5089822 -1.31904165 -1.6344948 5 0.1203738 0.5210954 0.39261423 -0.4832648 -0.71088103 -0.0579024 6 -1.4199059 -1.1475858 0.06413856 1.1439959 0.08135194 -0.1079465

드디어 radarchart() 함수를 사용해서 레이더 차트를 그려보겠습니다. 각 옵션에 대한 기능은 아래 R 명령어에 부가설명을 달아놓았습니다.

범례는 legend() 함수를 사용해서 왼쪽 상단에 추가하였습니다.

> # radar chart (or spider plot)

> radarchart(df = mean_by_Type_scale, # The data frame to be used to draw radarchart
+            seg = 6, # The number of segments for each axis 
+            pty = 16, # A vector to specify point symbol: Default 16 (closed circle)
+            pcol = 1:6, # A vector of color codes for plot data
+            plty = 1:6, # A vector of line types for plot data
+            plwd = 2, # A vector of line widths for plot data
+            title = c("radar chart by Car Types") # putting title at the top-middle
+            )
>

> # adding legend

> legend("topleft", legend = mean_by_Type$Type, col = c(1:6), lty = c(1:6), lwd = 2)

선의 형태(plty)나 선의 색깔(pcol)을 프로그래밍 편하라고 1:6 이라고 했는데요, 원하는 선 모양이나 색깔을 순서대로 지정할 수 있습니다.

다음번 포스팅에서는 별 그림(star graph)에 대해서 알아보겠습니다.

많은 도움 되어기를 바랍니다.

이번 포스팅이 도움이 되었다면 아래의 '공감 ~♡' 단추를 꾸욱 눌러주세요.^^

728x90