
[Python] Polynomial degree transformation and interaction feature generation: sklearn.preprocessing.PolynomialFeatures()

Python Analysis and Programming / Python Data Preprocessing 2016. 12. 21. 23:59

The previous posts covered:

- Binarization of continuous variables using Python's sklearn.preprocessing.Binarizer()

- Binarization of categorical variables using Python's sklearn.preprocessing.OneHotEncoder()

- Discretization of continuous variables using Python's np.digitize() and np.where()

This post introduces polynomial degree transformation and interaction feature generation using sklearn.preprocessing.PolynomialFeatures().

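In regression analysis, polynomial terms are used to represent non-linear patterns and relationships, and products of variables are used to represent interaction effects. PolynomialFeatures() creates both kinds of variables.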
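To make the motivation concrete, here is a minimal sketch of a regression that recovers a quadratic relationship once second-degree terms are added. The synthetic data (y = 1 + 2x + 3x^2, no noise) and the use of make_pipeline with LinearRegression are assumptions chosen for illustration; they are not part of the original post.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic, noise-free data following y = 1 + 2x + 3x^2 (illustrative assumption)
x = np.linspace(-3, 3, 20).reshape(-1, 1)
y = 1.0 + 2.0 * x.ravel() + 3.0 * x.ravel() ** 2

# include_bias=False: LinearRegression fits its own intercept,
# so the constant column of ones is not needed here
model = make_pipeline(
    PolynomialFeatures(degree=2, include_bias=False),
    LinearRegression(),
)
model.fit(x, y)

linreg = model.named_steps["linearregression"]
intercept = linreg.intercept_
coefs = linreg.coef_  # coefficients for x and x^2, in that order
print(intercept, coefs)
```

A plain linear fit on x alone could not capture the curvature; with the added x^2 column, the linear model has what it needs to reproduce the quadratic.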

First, let's import the required modules and create an example array dataset.

In [1]: import numpy as np

In [2]: import pandas as pd

In [3]: from sklearn.preprocessing import PolynomialFeatures

In [4]: X = np.arange(6).reshape(3, 2)

In [5]: X

Out[5]:

array([[0, 1],
       [2, 3],
       [4, 5]])

(1) Creating second-degree terms with sklearn.preprocessing.PolynomialFeatures()

# create degree-2 polynomial features

In [6]: poly = PolynomialFeatures(degree=2)

# transform from (x1, x2) to (1, x1, x2, x1^2, x1*x2, x2^2)

In [7]: poly.fit_transform(X)

Out[7]:

array([[ 1.,   0.,   1.,   0.,   0.,   1.],
       [ 1.,   2.,   3.,   4.,   6.,   9.],
       [ 1.,   4.,   5., 16., 20., 25.]])

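With three variables, there are more polynomial and interaction terms (the example below uses degree=2, generating second-degree and interaction terms). Note that the number of output columns grows quickly as variables are added: for n input variables and degree d it is C(n+d, d), which for degree=2 is (n+1)(n+2)/2, and it grows faster still at higher degrees.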

In [8]: X2 = np.arange(9).reshape(3, 3)

In [9]: X2

Out[9]:

array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

# transform from (x1, x2, x3) to (1, x1, x2, x3, x1^2, x1*x2, x1*x3, x2^2, x2*x3, x3^2)

In [10]: poly.fit_transform(X2)

Out[10]:

array([[ 1.,   0.,   1.,   2.,   0.,   0.,   0.,   1.,   2.,   4.],
       [ 1.,   3.,   4.,   5.,   9., 12., 15., 16., 20., 25.],
       [ 1.,   6.,   7.,   8., 36., 42., 48., 49., 56., 64.]])
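The growth in the number of generated columns can be checked directly. The sketch below compares the closed-form count C(n+d, d) (all terms of total degree at most d, including the constant) with the n_output_features_ attribute of a fitted transformer. The helper name n_poly_features and the chosen values of n are hypothetical, used only for this illustration.

```python
from math import comb

import numpy as np
from sklearn.preprocessing import PolynomialFeatures


def n_poly_features(n_features: int, degree: int) -> int:
    """Number of monomials of total degree <= degree in n_features variables."""
    return comb(n_features + degree, degree)


counts = {}
for n in (2, 3, 10, 50):
    # Only the shape matters for counting, so a single row of zeros is enough
    fitted = PolynomialFeatures(degree=2).fit(np.zeros((1, n)))
    counts[n] = (n_poly_features(n, 2), fitted.n_output_features_)
    print(n, counts[n])
```

With degree=2, going from 3 to 50 input variables takes the output from 10 columns to over a thousand, which is worth keeping in mind before applying the transformation to a wide dataset.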

(2) Creating interaction terms only: interaction_only=True

To generate only interaction terms, without powers of a single variable, set interaction_only=True. The degree parameter then controls the highest order of interaction that is generated.

In [11]: X2

Out[11]:

array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

# transform from (x1, x2, x3) to (1, x1, x2, x3, x1*x2, x1*x3, x2*x3)

In [12]: poly_d2 = PolynomialFeatures(degree=2, interaction_only=True)

In [13]: poly_d2.fit_transform(X2)

Out[13]:

array([[ 1.,   0.,   1.,   2.,   0.,   0.,   2.],
       [ 1.,   3.,   4.,   5., 12., 15., 20.],
       [ 1.,   6.,   7.,   8., 42., 48., 56.]])

# transform from (x1, x2, x3) to (1, x1, x2, x3, x1*x2, x1*x3, x2*x3, x1*x2*x3)

In [14]: poly_d3 = PolynomialFeatures(degree=3, interaction_only=True)

In [15]: poly_d3.fit_transform(X2)

Out[15]:

array([[   1.,    0.,    1.,    2.,    0.,    0.,    2.,    0.],
       [   1.,    3.,    4.,    5.,   12.,   15.,   20.,   60.],
       [   1.,    6.,    7.,    8.,   42.,   48.,   56., 336.]])

That concludes this introduction to polynomial degree transformation and interaction feature generation with sklearn.preprocessing.PolynomialFeatures().

The next post will cover reshaping datasets (pivot, reshape).

I hope this was helpful.


License: Attribution, NonCommercial, NoDerivatives


Posted by Rfriend
