[Python NumPy] Creating Multidimensional Arrays (ndarray)

Python Analysis and Programming / Python Data Preprocessing, 2017. 1. 14. 22:57

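So far we have looked at a range of methods and functions in Python Pandas.

Starting with this post, I plan to write a series on Python NumPy. NumPy is the module that was always loaded alongside Pandas in the earlier posts: [ import numpy as np ] was (almost) always the first line.

If licorice is the ingredient an herbal pharmacy never leaves out of a prescription, NumPy is the module that almost never fails to appear when you analyze data with Python. (You could also say that the quick-quick-slow basic step of Latin dance is NumPy, and the patterns you dance with a partner are Pandas. If the basic step is shaky, you cannot dance properly!)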

With NumPy (Numerical Python) you can create the ndarray, an n-dimensional array that is very fast, space-efficient, and supports vectorized arithmetic operations.

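NumPy can also apply standard mathematical functions to a whole array very quickly, without writing loops. I once used a loop to preprocess a DataFrame with 2.5 million rows and it had not finished after 4 hours; when I switched to NumPy functions it finished in just a few minutes, which surprised me. (Put differently, explicit loops in Python are very bad for speed and performance. Keep performance in mind when looping over large data, and always check whether a NumPy function can replace the loop.)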
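As a rough illustration of the loop-versus-NumPy point, the sketch below computes the same result twice: once with an explicit Python loop and once with a single vectorized expression. The array size and the names (values, loop_result, vec_result) are made up for this example; how much faster the vectorized form runs depends on the data and the machine.

```python
import numpy as np

# hypothetical data: 100,000 floats (size chosen only for illustration)
values = np.arange(100_000, dtype=np.float64)

# loop version: double each element one at a time in pure Python
loop_result = np.empty_like(values)
for i in range(values.shape[0]):
    loop_result[i] = values[i] * 2.0

# vectorized version: one expression applied to the whole array
vec_result = values * 2.0

# both approaches produce the same numbers
print(np.array_equal(loop_result, vec_result))
```

Wrapping each version in a timer such as time.perf_counter() is an easy way to see the difference on your own data.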

NumPy is also used for linear algebra, random number generation, and more.

This post covers how to create ndarrays of various forms.

All of the elements in an ndarray, across its rows and columns, must be the same data type.
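One way to see the same-type rule in action is to pass mixed inputs to np.array() and inspect the resulting dtype. This is a small sketch with made-up variable names (mixed_num, mixed_str); NumPy picks a single type that can hold every element.

```python
import numpy as np

# integers and a float together: every element is stored as float64
mixed_num = np.array([1, 2, 3.5])
print(mixed_num.dtype)        # float64

# numbers and a string together: every element is stored as a string
mixed_str = np.array([1, 2, "three"])
print(mixed_str.dtype.kind)   # U (unicode string)
```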

[ Creating NumPy's ndarray ]

* array image source: https://www.pinterest.com/pin/342062534176033515/

When importing the NumPy module, 'np' is used as the alias name.

# importing numpy module

In [1]: import numpy as np

Let's create an ndarray with NumPy's array().

# making ndarrays : np.array() function

In [2]: arr1 = np.array([1, 2, 3, 4, 5])

In [3]: arr1

Out[3]: array([1, 2, 3, 4, 5])

You can also create a list object first and convert it to an array with np.array().

In [4]: data_list = [6, 7, 8, 9, 10] # a list

In [5]: arr2 = np.array(data_list) # converting a list into an array

In [6]: arr2

Out[6]: array([ 6,  7,  8,  9, 10])

A list that contains several lists of equal length can also be converted into an array with np.array().

# a list of equal-length lists

In [7]: data_lists = [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]]

# converting a list of equal-length lists into a multi-dimensional array

In [8]: arr12 = np.array(data_lists)

In [9]: arr12

Out[9]:

array([[ 1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10]])

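Use the array.shape attribute to check the size of each dimension of a multidimensional array. Indexing into shape returns the size of a single axis, here the number of rows (shape[0]) and the number of columns (shape[1]); this comes up fairly often.

The array.dtype attribute shows the data type of the elements.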

# a tuple indicating the size of each dimension

In [10]: arr12.shape # (2, 5)

Out[10]: (2, 5)

# indexing ndarray.shape[0], ndarray.shape[1]

In [11]: arr12.shape[0] # 2 : size of the first axis (number of rows)

Out[11]: 2

In [12]: arr12.shape[1] # 5 : size of the second axis (number of elements in each nested list)

Out[12]: 5

# an object describing the data type of the array

In [13]: arr12.dtype # dtype('int32') here; the default integer type is platform dependent (often int64)

Out[13]: dtype('int32')
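Since shape[0] is easy to confuse with the number of dimensions, here is a short sketch that shows shape next to the related ndim and size attributes. It rebuilds arr12 so it runs on its own; n_rows and n_cols are names chosen for this example.

```python
import numpy as np

arr12 = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])

n_rows, n_cols = arr12.shape   # unpack the shape tuple
print(arr12.ndim)              # 2  : number of dimensions (axes)
print(n_rows, n_cols)          # 2 5
print(arr12.size)              # 10 : total number of elements
```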

np.asarray() can also be used to create an array.

# np.asarray : Convert the input to an array

In [14]: a = [1, 2, 3, 4, 5] # a list

In [15]: type(a)

Out[15]: list

In [16]: a = np.asarray(a)

In [17]: type(a)

Out[17]: numpy.ndarray

Unlike np.array(), however, np.asarray() does not make a copy when the input is already an ndarray.

# Existing arrays are not copied

In [18]: b = np.array([1, 2])

In [19]: np.asarray(b) is b

Out[19]: True

If a data type (dtype) is specified, np.asarray() copies the array only when that dtype differs from the array's existing one.

# If dtype is set, array is copied only if dtype does not match

In [20]: c = np.array([1, 2], dtype=np.float32)

In [21]: np.asarray(c, dtype=np.float32) is c # not copied

Out[21]: True

In [22]: np.asarray(c, dtype=np.float64) is c # copied

Out[22]: False
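The practical consequence of copying versus not copying shows up when you modify the result. The sketch below is an illustration with made-up names (copied, shared): writing through the np.asarray() result changes the original array, while the np.array() copy is unaffected.

```python
import numpy as np

b = np.array([1, 2])

copied = np.array(b)      # np.array() copies by default
shared = np.asarray(b)    # np.asarray() returns b itself here

shared[0] = 100           # also changes b, because shared is b

print(b.tolist())         # [100, 2]
print(copied.tolist())    # [1, 2]
```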

To create an array whose elements are floats, use the np.asfarray() function. (np.asfarray() was removed in NumPy 2.0, so this call requires an older release.)

# np.asfarray: Convert input to a floating point ndarray

In [23]: d = [6, 7, 8, 9, 10]

In [24]: np.asfarray(d)

Out[24]: array([ 6., 7., 8., 9., 10.])
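On NumPy versions where np.asfarray() is not available, the same result can be obtained by asking np.asarray() for a float dtype. A minimal sketch, with d_float as a name chosen for this example:

```python
import numpy as np

d = [6, 7, 8, 9, 10]

# request a float dtype explicitly instead of calling np.asfarray()
d_float = np.asarray(d, dtype=float)

print(d_float.dtype)      # float64
print(d_float.tolist())   # [6.0, 7.0, 8.0, 9.0, 10.0]
```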

With np.asarray_chkfinite(), a 'ValueError' is raised if the input data contains a missing value (NaN) or an infinite number. Let's check by putting in a missing value with np.nan and an infinity with np.inf.

# asarray_chkfinite : Convert a list into an array.
# If all elements are finite asarray_chkfinite is identical to asarray

In [25]: e = [11, 12, 13, 14, 15]

In [26]: np.asarray_chkfinite(e, dtype=float)

Out[26]: array([ 11., 12., 13., 14., 15.])

In [27]: e_2 = [11, 12, 13, np.nan, 15] # with NaN

In [28]: np.asarray_chkfinite(e_2, dtype=float)

ValueError: array must not contain infs or NaNs

In [29]: e_3 = [11, 12, 13, 14, np.inf] # with infinite

In [30]: np.asarray_chkfinite(e_3, dtype=float)

ValueError: array must not contain infs or NaNs
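In a script you would usually catch this error rather than let it stop the program. The sketch below shows one possible way to do that, plus np.isfinite() as an alternative that locates the offending values; the names checked and finite_mask are made up for this example.

```python
import numpy as np

e_2 = [11, 12, 13, np.nan, 15]   # same data as above, with a NaN

try:
    checked = np.asarray_chkfinite(e_2, dtype=float)
except ValueError as err:
    checked = None               # conversion was rejected
    print("rejected:", err)

# alternative: find where the non-finite values are instead of failing
finite_mask = np.isfinite(np.asarray(e_2, dtype=float))
print(finite_mask.tolist())      # [True, True, True, False, True]
```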

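np.zeros(), np.ones(), and np.empty() create an array with as many elements as the number in the parentheses, filled with 0s, filled with 1s, or left uninitialized, respectively. np.empty() only allocates memory, so its contents are arbitrary and should be assigned before use.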

# making arrays of 0's or 1's
# np.zeros : Produce an array of all 0's with the given shape and dtype

In [31]: np.zeros(5) # an array of 5 zeros

Out[31]: array([ 0., 0., 0., 0., 0.])

# np.ones : Produce an array of all 1's with the given shape and dtype

In [32]: np.ones(10) # an array of 10 ones

Out[32]: array([ 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])

# making new arrays by allocating new memory,
# but do not populate with any values like ones and zeros

In [33]: np.empty(10) # an array of 10 uninitialized values (contents are arbitrary)

Out[33]: array([ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

Passing a tuple inside the parentheses, as in np.zeros((2, 5)), np.ones((5, 3)), and np.empty((4, 3)), creates a multi-dimensional array.

# multi-dimensional array by passing a tuple for the shape

In [34]: np.zeros((2, 5)) # 2 by 5 array with '0' elements

Out[34]:

array([[ 0., 0., 0., 0., 0.],
       [ 0., 0., 0., 0., 0.]])

In [35]: np.ones((5, 3)) # 5 by 3 array with '1' elements

Out[35]:

array([[ 1., 1., 1.],
       [ 1., 1., 1.],
       [ 1., 1., 1.],
       [ 1., 1., 1.],
       [ 1., 1., 1.]])

In [36]: np.empty((4, 3))

Out[36]:

array([[ 9.88131292e-324,   0.00000000e+000,   0.00000000e+000],
       [ 0.00000000e+000,   0.00000000e+000,   0.00000000e+000],
       [ 0.00000000e+000,   0.00000000e+000,   0.00000000e+000],
       [ 0.00000000e+000,   0.00000000e+000,   0.00000000e+000]])
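The stray 9.88131292e-324 in the output above is leftover memory, which is why np.empty() results should be overwritten before they are read. The sketch below shows that pattern, along with the dtype argument that these functions accept; int_zeros and buf are names made up for this example.

```python
import numpy as np

# the dtype argument overrides the float64 default
int_zeros = np.zeros((2, 3), dtype=int)
print(int_zeros.dtype.kind)   # i (signed integer)

# np.empty() contents are arbitrary, so assign before reading
buf = np.empty((2, 3))
buf[:] = 7.0                  # overwrite every element
print(buf.tolist())           # [[7.0, 7.0, 7.0], [7.0, 7.0, 7.0]]
```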

np.arange() works like Python's range() function: it returns the integers from 0 up to, but not including, the number in the parentheses, as an array rather than a list. I use this function fairly often too.

# np.arange : an array-valued version of the built-in Python range function
# Like the built-in range but returns an ndarray instead of a list

In [38]: f = np.arange(10)

In [39]: f

Out[39]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
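Like range(), np.arange() also accepts start, stop, and step arguments. A short sketch with made-up variable names; in each case the stop value itself is excluded.

```python
import numpy as np

from_two = np.arange(2, 10)        # start at 2, stop before 10
by_three = np.arange(0, 10, 3)     # step of 3
countdown = np.arange(5, 0, -1)    # negative step counts down

print(from_two.tolist())    # [2, 3, 4, 5, 6, 7, 8, 9]
print(by_three.tolist())    # [0, 3, 6, 9]
print(countdown.tolist())   # [5, 4, 3, 2, 1]
```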

np.zeros_like(), np.ones_like(), and np.empty_like() take an existing array and return a new array with the same shape and data type, filled with 0s, filled with 1s, or left uninitialized, respectively.

# np.zeros_like(), np.ones_like(), np.empty_like()
# Return an array of ones or zeros with the same shape and type as a given array

In [40]: f_2_5 = f.reshape(2, 5)

In [41]: f_2_5

Out[41]:

array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])

# return an array of zeros with the same shape and type

In [42]: np.zeros_like(f_2_5)

Out[42]:

array([[0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0]])

# return an array of ones with the same shape and type

In [43]: np.ones_like(f_2_5)

Out[43]:

array([[1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1]])

# return an array of uninitialized values with the same shape, type

In [44]: np.empty_like(f_2_5)

Out[44]:

array([[0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0]])
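To confirm that the data type really is carried over, and to see how to override it when needed, here is a small sketch. It rebuilds f_2_5 so it runs on its own; same_type and as_float are names chosen for this example.

```python
import numpy as np

f_2_5 = np.arange(10).reshape(2, 5)    # integer array, shape (2, 5)

same_type = np.ones_like(f_2_5)                # keeps shape and dtype
as_float = np.ones_like(f_2_5, dtype=float)    # keeps shape, changes dtype

print(same_type.shape, same_type.dtype == f_2_5.dtype)   # (2, 5) True
print(as_float.dtype)                                     # float64
```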

np.identity() or np.eye() creates an identity matrix (also called a unit matrix): a square matrix with 1s on the diagonal and 0s everywhere else.

For more on special types of matrices, see the post below.

☞ http://rfriend.tistory.com/141

# np.identity, np.eye : making a square N x N identity matrix, unit matrix
# identity matrix : 1's on the diagonal and 0's elsewhere

In [45]: np.identity(5)

Out[45]:

array([[ 1., 0., 0., 0., 0.],
       [ 0., 1., 0., 0., 0.],
       [ 0., 0., 1., 0., 0.],
       [ 0., 0., 0., 1., 0.],
       [ 0., 0., 0., 0., 1.]])

In [46]: np.eye(5)

Out[46]:

array([[ 1., 0., 0., 0., 0.],
       [ 0., 1., 0., 0., 0.],
       [ 0., 0., 1., 0., 0.],
       [ 0., 0., 0., 1., 0.],
       [ 0., 0., 0., 0., 1.]])
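The two functions give the same result for a square matrix, but np.eye() is the more general one: it also takes a column count and a diagonal offset k. A brief sketch, with shifted as a name made up for this example:

```python
import numpy as np

# both calls build the same 3 x 3 identity matrix
same = np.array_equal(np.identity(3), np.eye(3))
print(same)   # True

# np.eye() can be rectangular, with the ones on a shifted diagonal
shifted = np.eye(3, 4, k=1)
print(shifted.tolist())
# [[0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
```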

That wraps up this introduction to creating ndarrays with the Python NumPy module.

I hope you found it helpful.

License: Attribution, NonCommercial, NoDerivatives

Posted by Rfriend
