이번 포스팅에서는 Python 을 이용해서 의사결정나무 (Decision Tree) 를 시각화하는 3가지 방법을 소개하겠습니다. 

 

(1) sklearn.tree.export_text 메소드를 이용해서 의사결정나무를 텍스트로 인쇄하기 (text representation)

(2) sklearn.tree.plot_tree 메소드와 matplotlib 을 이용해서 의사결정나무 시각화

(3) sklearn.tree.export_graphviz 메소드와 graphviz 를 이용해서 의사결정나무 시각화

 

 

먼저, 예제로 사용할 iris 데이터셋과 필요한 모듈을 불어오겠습니다. 

 

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt 

df = pd.DataFrame({
    'x1': [1, 2, 2, 2, 2, 3, 3, 4, 5, 5], 
    'x2': [2, 2, 3, 4, 5, 4, 6, 3, 2, 5], 
    'cat': [0, 1, 1, 0, 0, 1, 0, 1, 1, 0]
})


df

# x1	x2	cat
# 0	1	2	0
# 1	2	2	1
# 2	2	3	1
# 3	2	4	0
# 4	2	5	0
# 5	3	4	1
# 6	3	6	0
# 7	4	3	1
# 8	5	2	1
# 9	5	5	0


## scatter plot
import seaborn as sns

plt.figure(figsize=(7, 7))
sns.scatterplot(x='x1', y='x2', hue='cat', style='cat', s=100, data=df)
plt.show()

scatter plot by group

 

 

 

(0) sklearn.tree 메소드를 이용해서 의사결정나무(Decision Tree) 모델 훈련하기

 

from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score
from sklearn import tree

## getting X, y values
X = df[['x1', 'x2']]
y = df['cat']

## initiating DecisionTreeClassifer method
dt_clf = DecisionTreeClassifier(random_state = 1004)


## fitting a decision tree classifier
dt_clf_model = dt_clf.fit(X, y)


## feature importances
dt_clf_model.feature_importances_

#array([0.43809524, 0.56190476])

 

 

 

(1) sklearn.tree.export_text 메소드를 이용해서 의사결정나무를 텍스트로 인쇄하기 (text representation)

 

의사결정나무 모델을 적합한 후에 사용자 인터페이스가 없는 애플리케이션에서 로그(log) 텍스트로 모델 적합 결과를 확인할 때 사용할 수 있습니다. 또는 의사결정나무의 적합된 룰을 SQL 로 구현하여 운영 모드로 적용하고자 할 때 아래처럼 텍스트로 인쇄를 하면 유용하게 사용할 수 있습니다. 

 

## Text Representation
dt_clf_model_text = tree.export_text(dt_clf_model)

print(dt_clf_model_text)

|--- feature_1 <= 4.50
|   |--- feature_0 <= 1.50
|   |   |--- class: 0
|   |--- feature_0 >  1.50
|   |   |--- feature_1 <= 3.50
|   |   |   |--- class: 1
|   |   |--- feature_1 >  3.50
|   |   |   |--- feature_0 <= 2.50
|   |   |   |   |--- class: 0
|   |   |   |--- feature_0 >  2.50
|   |   |   |   |--- class: 1
|--- feature_1 >  4.50
|   |--- class: 0

 

 

 

(2) sklearn.tree.plot_tree 메소드와 matplotlib 을 이용해서 의사결정나무 시각화

 

(3)번 처럼 Graphviz 를 설치하지 않아도, matplotlib 을 사용해서 간편하게 의사결정나무를 시각화 할 때 유용합니다. 편리한 대신에 (3)번 Graphviz 대비 자유도가 떨어지는 단점이 있습니다. 

 

## Plot Tree with plot_tree
fig = plt.figure(figsize=(15, 8))
_ = tree.plot_tree(dt_clf_model, 
                  feature_names=['x1', 'x2'],
                  class_names=['0', '1'],
                  filled=True)

 

 

 

(3) sklearn.tree.export_graphviz 메소드와 graphviz 를 이용해서 의사결정나무 시각화

 

Graphviz 의 시각화 기능을 fully 사용할 수 있으므로 매우 강력한 시각화 방법이라고 할 수 있습니다. 

Graphviz 설치하고 Decision Tree 시각화하기는 https://rfriend.tistory.com/382 를 참고하세요. 

DOT language 에 대해서는 https://graphviz.org/doc/info/lang.html 를 참고하세요. 

 

## Visualizing Tree using Graphviz
from sklearn import tree
import graphviz

## exporting tree in DOT format
## refer to: https://scikit-learn.org/stable/modules/generated/sklearn.tree.export_graphviz.html
tree_dot = tree.export_graphviz(
    dt_clf_model, 
    feature_names=['x1', 'x2'], 
    class_names=['0', '1'],
    filled=True
)


## draw graph using Graphviz
dt_graph = graphviz.Source(tree_dot, format='png')
dt_graph

 

decition tree visualization using graphviz

 

 

이번 포스팅이 많은 도움이 되었기를 바랍니다. 

행복한 데이터 과학자 되세요~!

 

 

728x90
반응형
Posted by Rfriend
,

Graphviz AT&T Bell Labs에서 만든 오픈소스 시각화 소프트웨어입니다. Graphviz 구조화된 정보를 추상화된 그래프나 네트워크 형태의 다이어그램으로 제시 해줍니다. 가령, 기계학습의 Decision Tree 학습 결과를 Tree 형태로 시각화 한다든지, Process Mining 통해 찾은 workflow 방향성 있는 네트워크 형태로 시각화 Graphviz 사용할 있습니다. 




PyGraphviz 는 Python으로 Graphviz 소프트웨어를 사용할 수 있게 인터페이스를 해주는 Python 패키지입니다. PyGraphviz를 사용하여 Graphviz 그래프의 데이터 구조와 배열 알고리즘에 접근하여 그래프를 생성, 편집, 읽기, 쓰기, 그리기 등을 할 수 있습니다. 



Python으로 Graphviz를 사용하려면 (a) 먼저 Graphviz S/W를 설치하고, (b) 다음으로 PyGraphviz를 설치해야 합니다.  만약 순서가 바뀌어서 Graphviz 소프트웨어를 설치하지 않은 상태에서 PyGraphviz를 설치하려고 하면 Graphviz를 먼저 설치하라는 에러 메시지가 뜰 겁니다. 순서가 중요합니다! 


  Your Graphviz installation could not be found.

  

          1) You don't have Graphviz installed:

             Install Graphviz (http://graphviz.org) 




이번 포스팅에서는 


(1) Mac OS High Sierra Graphviz 소프트웨어 설치하기

(2) Python 2.7 PyGraphviz library 설치하기

(3) Graphviz와 PyGraphviz를 사용하여 Decision Tree 시각화 해보기


에 대해서 소개하겠습니다. 



  (1) Mac OS High Sierra  Graphviz 소프트웨어 설치하기


(참고로, 저는 Mac OS High Sierra version 10.13.6을 사용하고 있습니다.)


(1-1) Homebrew 를 설치합니다. 

Homebrew는 애플 Mac OS 에서 소프트웨어 패키지를 설치를 간소화해주는 소프트웨어 패키지 관리 오픈소스 툴입니다. 터미널을 하나 열고 아래의 코드를 복사해서 실행하면 됩니다. 


$ ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)" < /dev/null 2> /dev/null



ihongdon-ui-MacBook-Pro:~ ihongdon$ ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)" < /dev/null 2> /dev/null
==> This script will install:
/usr/local/bin/brew
/usr/local/share/doc/homebrew
/usr/local/share/man/man1/brew.1
/usr/local/share/zsh/site-functions/_brew
/usr/local/etc/bash_completion.d/brew
/usr/local/Homebrew
==> The following existing directories will be made group writable:
/usr/local/bin
/usr/local/include
/usr/local/lib
/usr/local/share
/usr/local/lib/pkgconfig
/usr/local/share/info
/usr/local/share/man
/usr/local/share/man/man1
/usr/local/share/man/man3
/usr/local/share/man/man5
/usr/local/share/man/man7
==> The following existing directories will have their owner set to ihongdon:
/usr/local/bin
/usr/local/include
/usr/local/lib
/usr/local/share
/usr/local/lib/pkgconfig
/usr/local/share/info
/usr/local/share/man
/usr/local/share/man/man1
/usr/local/share/man/man3
/usr/local/share/man/man5
/usr/local/share/man/man7
==> The following existing directories will have their group set to admin:
/usr/local/bin
/usr/local/include
/usr/local/lib
/usr/local/share
/usr/local/lib/pkgconfig
/usr/local/share/info
/usr/local/share/man
/usr/local/share/man/man1
/usr/local/share/man/man3
/usr/local/share/man/man5
/usr/local/share/man/man7
==> The following new directories will be created:
/usr/local/Cellar
/usr/local/Homebrew
/usr/local/Frameworks
/usr/local/etc
/usr/local/opt
/usr/local/sbin
/usr/local/share/zsh
/usr/local/share/zsh/site-functions
/usr/local/var
==> /usr/bin/sudo /bin/chmod u+rwx /usr/local/bin /usr/local/include /usr/local/lib /usr/local/share /usr/local/lib/pkgconfig /usr/local/share/info /usr/local/share/man /usr/local/share/man/man1 /usr/local/share/man/man3 /usr/local/share/man/man5 /usr/local/share/man/man7
Password:
==> /usr/bin/sudo /bin/chmod g+rwx /usr/local/bin /usr/local/include /usr/local/lib /usr/local/share /usr/local/lib/pkgconfig /usr/local/share/info /usr/local/share/man /usr/local/share/man/man1 /usr/local/share/man/man3 /usr/local/share/man/man5 /usr/local/share/man/man7
==> /usr/bin/sudo /usr/sbin/chown ihongdon /usr/local/bin /usr/local/include /usr/local/lib /usr/local/share /usr/local/lib/pkgconfig /usr/local/share/info /usr/local/share/man /usr/local/share/man/man1 /usr/local/share/man/man3 /usr/local/share/man/man5 /usr/local/share/man/man7
==> /usr/bin/sudo /usr/bin/chgrp admin /usr/local/bin /usr/local/include /usr/local/lib /usr/local/share /usr/local/lib/pkgconfig /usr/local/share/info /usr/local/share/man /usr/local/share/man/man1 /usr/local/share/man/man3 /usr/local/share/man/man5 /usr/local/share/man/man7
==> /usr/bin/sudo /bin/mkdir -p /usr/local/Cellar /usr/local/Homebrew /usr/local/Frameworks /usr/local/etc /usr/local/opt /usr/local/sbin /usr/local/share/zsh /usr/local/share/zsh/site-functions /usr/local/var
==> /usr/bin/sudo /bin/chmod g+rwx /usr/local/Cellar /usr/local/Homebrew /usr/local/Frameworks /usr/local/etc /usr/local/opt /usr/local/sbin /usr/local/share/zsh /usr/local/share/zsh/site-functions /usr/local/var
==> /usr/bin/sudo /bin/chmod 755 /usr/local/share/zsh /usr/local/share/zsh/site-functions
==> /usr/bin/sudo /usr/sbin/chown ihongdon /usr/local/Cellar /usr/local/Homebrew /usr/local/Frameworks /usr/local/etc /usr/local/opt /usr/local/sbin /usr/local/share/zsh /usr/local/share/zsh/site-functions /usr/local/var
==> /usr/bin/sudo /usr/bin/chgrp admin /usr/local/Cellar /usr/local/Homebrew /usr/local/Frameworks /usr/local/etc /usr/local/opt /usr/local/sbin /usr/local/share/zsh /usr/local/share/zsh/site-functions /usr/local/var
==> /usr/bin/sudo /bin/mkdir -p /Users/ihongdon/Library/Caches/Homebrew
==> /usr/bin/sudo /bin/chmod g+rwx /Users/ihongdon/Library/Caches/Homebrew
==> /usr/bin/sudo /usr/sbin/chown ihongdon /Users/ihongdon/Library/Caches/Homebrew
==> /usr/bin/sudo /bin/mkdir -p /Library/Caches/Homebrew
==> /usr/bin/sudo /bin/chmod g+rwx /Library/Caches/Homebrew
==> /usr/bin/sudo /usr/sbin/chown ihongdon /Library/Caches/Homebrew
==> Downloading and installing Homebrew...
HEAD is now at 1c7c876f3 Merge pull request #4736 from scpeters/bottle_json_local_filename
==> Homebrew is run entirely by unpaid volunteers. Please consider donating:
https://github.com/Homebrew/brew#donations
==> Tapping homebrew/core
ihongdon-ui-MacBook-Pro:~ ihongdon$
ihongdon-ui-MacBook-Pro:~ ihongdon$

 




(1-2) Graphviz를 설치합니다.


Homebrew를 설치하였으면,이제 아래의 Homebrew 코드를 터미널에서 실행하여 Graphviz를 설치해줍니다.


$ brew install graphviz




ihongdon-ui-MacBook-Pro:~ ihongdon$
ihongdon-ui-MacBook-Pro:~ ihongdon$ brew install graphviz
==> Installing dependencies for graphviz: libtool, libpng, freetype, fontconfig, jpeg, libtiff, webp, gd
==> Installing graphviz dependency: libtool
==> Downloading https://homebrew.bintray.com/bottles/libtool-2.4.6_1.high_sierra.bottle.tar.gz
######################################################################## 100.0%
==> Pouring libtool--2.4.6_1.high_sierra.bottle.tar.gz
==> Caveats
In order to prevent conflicts with Apple's own libtool we have prepended a "g"
so, you have instead: glibtool and glibtoolize.
==> Summary
🍺 /usr/local/Cellar/libtool/2.4.6_1: 71 files, 3.7MB
==> Installing graphviz dependency: libpng
==> Downloading https://homebrew.bintray.com/bottles/libpng-1.6.35.high_sierra.bottle.tar.gz
######################################################################## 100.0%
==> Pouring libpng--1.6.35.high_sierra.bottle.tar.gz
🍺 /usr/local/Cellar/libpng/1.6.35: 26 files, 1.2MB
==> Installing graphviz dependency: freetype
==> Downloading https://homebrew.bintray.com/bottles/freetype-2.9.1.high_sierra.bottle.tar.gz
######################################################################## 100.0%
==> Pouring freetype--2.9.1.high_sierra.bottle.tar.gz
🍺 /usr/local/Cellar/freetype/2.9.1: 60 files, 2.6MB
==> Installing graphviz dependency: fontconfig
==> Downloading https://homebrew.bintray.com/bottles/fontconfig-2.13.0.high_sierra.bottle.tar.gz
######################################################################## 100.0%
==> Pouring fontconfig--2.13.0.high_sierra.bottle.tar.gz
==> Regenerating font cache, this may take a while
==> /usr/local/Cellar/fontconfig/2.13.0/bin/fc-cache -frv
🍺 /usr/local/Cellar/fontconfig/2.13.0: 511 files, 3.2MB
==> Installing graphviz dependency: jpeg
==> Downloading https://homebrew.bintray.com/bottles/jpeg-9c.high_sierra.bottle.tar.gz
######################################################################## 100.0%
==> Pouring jpeg--9c.high_sierra.bottle.tar.gz
🍺 /usr/local/Cellar/jpeg/9c: 21 files, 724.5KB
==> Installing graphviz dependency: libtiff
==> Downloading https://homebrew.bintray.com/bottles/libtiff-4.0.9_4.high_sierra.bottle.tar.gz
######################################################################## 100.0%
==> Pouring libtiff--4.0.9_4.high_sierra.bottle.tar.gz
🍺 /usr/local/Cellar/libtiff/4.0.9_4: 246 files, 3.5MB
==> Installing graphviz dependency: webp
==> Downloading https://homebrew.bintray.com/bottles/webp-1.0.0.high_sierra.bottle.tar.gz
######################################################################## 100.0%
==> Pouring webp--1.0.0.high_sierra.bottle.tar.gz
🍺 /usr/local/Cellar/webp/1.0.0: 38 files, 2MB
==> Installing graphviz dependency: gd
==> Downloading https://homebrew.bintray.com/bottles/gd-2.2.5.high_sierra.bottle.tar.gz
######################################################################## 100.0%
==> Pouring gd--2.2.5.high_sierra.bottle.tar.gz
🍺 /usr/local/Cellar/gd/2.2.5: 35 files, 1.1MB
==> Installing graphviz
==> Downloading https://homebrew.bintray.com/bottles/graphviz-2.40.1.high_sierra.bottle.1.tar.gz
######################################################################## 100.0%
==> Pouring graphviz--2.40.1.high_sierra.bottle.1.tar.gz
🍺 /usr/local/Cellar/graphviz/2.40.1: 500 files, 11.2MB
==> Caveats
==> libtool
In order to prevent conflicts with Apple's own libtool we have prepended a "g"
so, you have instead: glibtool and glibtoolize.
ihongdon-ui-MacBook-Pro:~ ihongdon$
ihongdon-ui-MacBook-Pro:~ ihongdon$

 





  (2) Python 2.7 PyGraphviz library 설치하기



PyGraphviz는 Python 3.x 버전, 그리고 Python 2.7 버전에서 사용할 수 있습니다. 이번 포스팅에서는 Python 2.7 버전에 PyGraphviz 패키지를 설치해보겠습니다. 


conda env list로 가상환경 리스트를 확인하고, source activate 로 Python2.7 버전의 가상환경을 활성화시킨 후에, pip install --upgrade pip 로 pip 버전 업그레이드 한 후에, easy_install pygraphviz 로 PyGraphviz를 설치하였습니다. 


왜 그런지 이유는 모르겠으나 pip install pygraphviz 로 설치하려고 하니 설치가 되다가 막판에 에러가 났습니다. 

pip install git://github.com/pygraphviz/pygraphviz.git 도 시도를 해봤는데 역시 에러가 났습니다. 

다행히 easy_install pygrapviz 로 설치가 되네요. 




ihongdon-ui-MacBook-Pro:~ ihongdon$ conda env list

# conda environments:

#

base                  *  /Users/ihongdon/anaconda3

py2.7_tf1.4              /Users/ihongdon/anaconda3/envs/py2.7_tf1.4

py3.5_tf1.4              /Users/ihongdon/anaconda3/envs/py3.5_tf1.4

ihongdon-ui-MacBook-Pro:~ ihongdon$ 

ihongdon-ui-MacBook-Pro:~ ihongdon$ source activate py2.7_tf1.4

(py2.7_tf1.4) ihongdon-ui-MacBook-Pro:~ ihongdon$ pip install --upgrade pip

Requirement already up-to-date: pip in ./anaconda3/envs/py2.7_tf1.4/lib/python2.7/site-packages (18.0)

(py2.7_tf1.4) ihongdon-ui-MacBook-Pro:~ ihongdon$ 

(py2.7_tf1.4) ihongdon-ui-MacBook-Pro:~ ihongdon$ 

(py2.7_tf1.4) ihongdon-ui-MacBook-Pro:~ ihongdon$ easy_install pygraphviz

Searching for pygraphviz

Reading https://pypi.python.org/simple/pygraphviz/

Downloading https://files.pythonhosted.org/packages/87/5e/40efbb2d02ee9d0282f6c8b9e477f6444a025a7ecf8cc0b15fe87a288708

/pygraphviz-1.4rc1.zip#sha256=e0b3a7f1d9203f9748b94e8365656755201966b562e53fd6424bed89e98fdc4e

Best match: pygraphviz 1.4rc1

Processing pygraphviz-1.4rc1.zip

Writing /var/folders/6q/mtq6ftrj6_z4txn_zsxcfyxc0000gn/T/easy_install-U3TQDK/pygraphviz-1.4rc1/setup.cfg

Running pygraphviz-1.4rc1/setup.py -q bdist_egg --dist-dir /var/folders/6q/mtq6ftrj6_z4txn_zsxcfyxc0000gn/T/easy_install-U3TQDK/pygraphviz-1.4rc1/egg-dist-tmp-wc0yf4

warning: no previously-included files matching '*~' found anywhere in distribution

warning: no previously-included files matching '*.pyc' found anywhere in distribution

warning: no previously-included files matching '.svn' found anywhere in distribution

no previously-included directories found matching 'doc/build'

pygraphviz/graphviz_wrap.c:3354:12: warning: incompatible pointer to integer conversion returning 'Agsym_t *' (aka 'struct Agsym_s *') from a function with result type 'int' [-Wint-conversion]

    return agattr(g, kind, name, val);

           ^~~~~~~~~~~~~~~~~~~~~~~~~~

pygraphviz/graphviz_wrap.c:3438:7: warning: unused variable 'fd1' [-Wunused-variable]

  int fd1 ;

      ^

pygraphviz/graphviz_wrap.c:3439:13: warning: unused variable 'mode_obj1' [-Wunused-variable]

  PyObject *mode_obj1 ;

            ^

pygraphviz/graphviz_wrap.c:3440:13: warning: unused variable 'mode_byte_obj1' [-Wunused-variable]

  PyObject *mode_byte_obj1 ;

            ^

pygraphviz/graphviz_wrap.c:3441:9: warning: unused variable 'mode1' [-Wunused-variable]

  char *mode1 ;

        ^

pygraphviz/graphviz_wrap.c:3509:7: warning: unused variable 'fd2' [-Wunused-variable]

  int fd2 ;

      ^

pygraphviz/graphviz_wrap.c:3510:13: warning: unused variable 'mode_obj2' [-Wunused-variable]

  PyObject *mode_obj2 ;

            ^

pygraphviz/graphviz_wrap.c:3511:13: warning: unused variable 'mode_byte_obj2' [-Wunused-variable]

  PyObject *mode_byte_obj2 ;

            ^

pygraphviz/graphviz_wrap.c:3512:9: warning: unused variable 'mode2' [-Wunused-variable]

  char *mode2 ;

        ^

9 warnings generated.

zip_safe flag not set; analyzing archive contents...

pygraphviz.graphviz: module references __file__

pygraphviz.release: module references __file__

pygraphviz.tests.test: module references __file__

creating /Users/ihongdon/anaconda3/envs/py2.7_tf1.4/lib/python2.7/site-packages/pygraphviz-1.4rc1-py2.7-macosx-10.6-x86_64.egg

Extracting pygraphviz-1.4rc1-py2.7-macosx-10.6-x86_64.egg to /Users/ihongdon/anaconda3/envs/py2.7_tf1.4/lib/python2.7/site-packages

Adding pygraphviz 1.4rc1 to easy-install.pth file


Installed /Users/ihongdon/anaconda3/envs/py2.7_tf1.4/lib/python2.7/site-packages/pygraphviz-1.4rc1-py2.7-macosx-10.6-x86_64.egg

Processing dependencies for pygraphviz

Finished processing dependencies for pygraphviz

(py2.7_tf1.4) ihongdon-ui-MacBook-Pro:~ ihongdon$ 

(py2.7_tf1.4) ihongdon-ui-MacBook-Pro:~ ihongdon$ 







  (3) Graphviz와 PyGraphviz를 사용하여 Decision Tree 시각화 해보기



이제 Graphviz와 PyGraphviz를 사용해서 Jupyter Notebook 에서 Decision Tree를 시각화해보겠습니다. Iris 데이터셋을 사용해서 Decision Tree로 Iris 종류 분류하는 예제입니다. 







# Common imports

import numpy as np

import os


# To plot pretty figures

%matplotlib inline

import matplotlib

import matplotlib.pyplot as plt


# Where to save the figures

PROJECT_ROOT_DIR = "."

SUB_DIR = "decision_trees"


def image_path(fig_id):

    return os.path.join(PROJECT_ROOT_DIR, "images", SUB_DIR, fig_id)

 


# Iris dataset import

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier


iris = load_iris()
list(iris.keys())
['target_names', 'data', 'target', 'DESCR', 'feature_names']


print(iris.DESCR)
Iris Plants Database
====================

Notes
-----
Data Set Characteristics:
    :Number of Instances: 150 (50 in each of three classes)
    :Number of Attributes: 4 numeric, predictive attributes and the class
    :Attribute Information:
        - sepal length in cm
        - sepal width in cm
        - petal length in cm
        - petal width in cm
        - class:
                - Iris-Setosa
                - Iris-Versicolour
                - Iris-Virginica
    :Summary Statistics:

    ============== ==== ==== ======= ===== ====================
                    Min  Max   Mean    SD   Class Correlation
    ============== ==== ==== ======= ===== ====================
    sepal length:   4.3  7.9   5.84   0.83    0.7826
    sepal width:    2.0  4.4   3.05   0.43   -0.4194
    petal length:   1.0  6.9   3.76   1.76    0.9490  (high!)
    petal width:    0.1  2.5   1.20  0.76     0.9565  (high!)
    ============== ==== ==== ======= ===== ====================

    :Missing Attribute Values: None
    :Class Distribution: 33.3% for each of 3 classes.
    :Creator: R.A. Fisher
    :Donor: Michael Marshall (MARSHALL%PLU@io.arc.nasa.gov)
    :Date: July, 1988

This is a copy of UCI ML iris datasets.
http://archive.ics.uci.edu/ml/datasets/Iris

The famous Iris database, first used by Sir R.A Fisher

This is perhaps the best known database to be found in the
pattern recognition literature.  Fisher's paper is a classic in the field and
is referenced frequently to this day.  (See Duda & Hart, for example.)  The
data set contains 3 classes of 50 instances each, where each class refers to a
type of iris plant.  One class is linearly separable from the other 2; the
latter are NOT linearly separable from each other.

References
----------
   - Fisher,R.A. "The use of multiple measurements in taxonomic problems"
     Annual Eugenics, 7, Part II, 179-188 (1936); also in "Contributions to
     Mathematical Statistics" (John Wiley, NY, 1950).
   - Duda,R.O., & Hart,P.E. (1973) Pattern Classification and Scene Analysis.
     (Q327.D83) John Wiley & Sons.  ISBN 0-471-22361-1.  See page 218.
   - Dasarathy, B.V. (1980) "Nosing Around the Neighborhood: A New System
     Structure and Classification Rule for Recognition in Partially Exposed
     Environments".  IEEE Transactions on Pattern Analysis and Machine
     Intelligence, Vol. PAMI-2, No. 1, 67-71.
   - Gates, G.W. (1972) "The Reduced Nearest Neighbor Rule".  IEEE Transactions
     on Information Theory, May 1972, 431-433.
   - See also: 1988 MLC Proceedings, 54-64.  Cheeseman et al"s AUTOCLASS II
     conceptual clustering system finds 3 classes in the data.
   - Many, many more ...

 

 


iris.feature_names

['sepal length (cm)',
 'sepal width (cm)',
 'petal length (cm)',
 'petal width (cm)']


iris.data[:5,]

array([[ 5.1,  3.5,  1.4,  0.2],
       [ 4.9,  3. ,  1.4,  0.2],
       [ 4.7,  3.2,  1.3,  0.2],
       [ 4.6,  3.1,  1.5,  0.2],
       [ 5. ,  3.6,  1.4,  0.2]])

 





Scikit Learn 의 DecisionTreeClassifier 클래스를 사용하여 iris 분류 모델을 적합시켜 보겠습니다. 





X = iris.data[:, 2:] # petal length and width

y = iris.target

 

# Train the model

tree_clf = DecisionTreeClassifier(max_depth=2)

tree_clf.fit(X, y)






위의 Decision Tree 모형을 export_grapviz() 함수를 사용하여 graphviz의 dot format 파일로 내보내기(export)해보겠습니다. 





# Visualization

from sklearn.tree import export_graphviz


export_graphviz(

        tree_clf,

        out_file=image_path("iris_tree.dot"),

        feature_names=iris.feature_names[2:],

        class_names=iris.target_names,

        rounded=True,

        filled=True

    )

 




위의 export_graphviz() 코드를 실행시키면 ""/Users/ihongdon/images/decision_trees/" 폴더에 iris_tree.dot 파일이 생성됩니다. 이 dot format 파일을 워드나 노트패드를 사용해서 열어보면 아래와 같이 되어있습니다. 



digraph Tree {
node [shape=box, style="filled, rounded", color="black", fontname=helvetica] ;
edge [fontname=helvetica] ;
0 [label="petal width (cm) <= 0.8\ngini = 0.667\nsamples = 150\nvalue = [50, 50, 50]\nclass = setosa", fillcolor="#e5813900"] ;
1 [label="gini = 0.0\nsamples = 50\nvalue = [50, 0, 0]\nclass = setosa", fillcolor="#e58139ff"] ;
0 -> 1 [labeldistance=2.5, labelangle=45, headlabel="True"] ;
2 [label="petal width (cm) <= 1.75\ngini = 0.5\nsamples = 100\nvalue = [0, 50, 50]\nclass = versicolor", fillcolor="#39e58100"] ;
0 -> 2 [labeldistance=2.5, labelangle=-45, headlabel="False"] ;
3 [label="gini = 0.168\nsamples = 54\nvalue = [0, 49, 5]\nclass = versicolor", fillcolor="#39e581e5"] ;
2 -> 3 ;
4 [label="gini = 0.043\nsamples = 46\nvalue = [0, 1, 45]\nclass = virginica", fillcolor="#8139e5f9"] ;
2 -> 4 ;
}

 





마지막으로, 위의 iris_tree.dot 의 dot format파일을 가지고 pygraphviz 패키지를 사용하여 Decision Tree를 시각화해보겠습니다. 





import pygraphviz as pgv

from IPython.display import Image

graph = pgv.AGraph("/Users/ihongdon/images/decision_trees/iris_tree.dot")

graph.draw('iris_tree_out.png', prog='dot')

Image('iris_tree_out.png')

 




[Reference]

  • Graphviz: https://graphviz.gitlab.io/
  • PyGraphviz: https://pygraphviz.github.io/


많은 도움이 되었기를 바랍니다. 


728x90
반응형
Posted by Rfriend
,