[Python NumPy] Numerical Operations between Arrays, and between Arrays and Scalars

Python Analysis and Programming / Python Data Preprocessing 2017. 2. 4. 23:49

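When you study machine learning or deep learning with Python, you find NumPy operations underneath almost everything. If you know NumPy, studying is comfortable; if you don't, the code will look like a cipher.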

In this post, I will go over numerical operations between arrays, and between arrays and scalars, in Python NumPy. It is not particularly difficult.

When you perform arithmetic operations on two numeric arrays of the same size, NumPy ndarrays let you run very fast batch computations through vectorization, without writing 'for loops'.
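As a minimal sketch of what vectorization replaces, the snippet below adds two arrays once with an explicit Python loop and once with a single array expression. The variable names `loop_sum` and `vectorized_sum` are my own, not from the original post.

```python
import numpy as np

x = np.array([1., 1., 2., 2.])
y = np.array([1., 2., 3., 4.])

# Explicit Python loop: add the arrays one element at a time.
loop_sum = np.empty_like(x)
for i in range(len(x)):
    loop_sum[i] = x[i] + y[i]

# Vectorized form: one expression, no Python-level loop.
vectorized_sum = x + y

print(loop_sum.tolist())
print(vectorized_sum.tolist())
print(np.array_equal(loop_sum, vectorized_sum))  # True
```

Both forms give the same values; the difference is that the vectorized form runs the loop inside NumPy's compiled code.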

If the two arrays have different shapes, NumPy applies Broadcasting to carry out the operation where the shapes are compatible. I will introduce that separately in the next post.

[ Operations between NumPy Arrays and Scalars ]

First, let's import the NumPy module and create some example arrays.

#%% Numerical Operations between Arrays and Scalars
# vectorization

## importing modules

In [1]: import numpy as np

## making an ndarray

In [2]: x = np.array([1., 1., 2., 2.])

In [3]: y = np.array([1., 2., 3., 4.])

In [4]: x

Out[4]: array([ 1., 1., 2., 2.])

In [5]: y

Out[5]: array([ 1., 2., 3., 4.])

(1) Arithmetic operations with an array and a scalar

Arithmetic operators : addition (+), subtraction (-), multiplication (*), division (/), floor division (//, the quotient), modulus (%, the remainder), exponentiation (**)

In these operations the scalar is applied to every element of the array in turn (propagating the value to each element).
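One way to picture "propagating the value to each element" is to compare the scalar operation with the same operation against an array filled with that scalar. This is only an illustration of the resulting values; the name `filled` is hypothetical, and NumPy is not claimed to build such an array internally.

```python
import numpy as np

y = np.array([1., 2., 3., 4.])

# The scalar 1 is applied to each element of y.
with_scalar = y + 1

# Same values, written out with an array that repeats the scalar.
filled = np.full_like(y, 1.0)
with_array = y + filled

print(with_scalar.tolist())
print(np.array_equal(with_scalar, with_array))  # True
```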

## (1) Arithmetic operations with scalars, propagating the value to each element
# (1-1) addition

In [6]: y + 1

Out[6]: array([ 2., 3., 4., 5.])

# (1-2) subtraction

In [7]: y - 1

Out[7]: array([ 0., 1., 2., 3.])

# (1-3) multiplication

In [8]: y*2

Out[8]: array([ 2., 4., 6., 8.])

# (1-4) division

In [9]: y/2

Out[9]: array([ 0.5, 1. , 1.5, 2. ])

# (1-5) floor division

In [10]: y//2

Out[10]: array([ 0., 1., 1., 2.])

# (1-6) modulus : returns remainder after division

In [11]: y%2

Out[11]: array([ 1., 0., 1., 0.])

# (1-7) exponent

In [12]: y**2

Out[12]: array([ 1., 4., 9., 16.])

In [13]: 2**y

Out[13]: array([ 2., 4., 8., 16.])

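Speed comparison between NumPy and Pure Python

I said above that NumPy (vectorization) is much faster than Pure Python (for loops), so I ran a quick benchmark. For the same operation (adding the scalar 1 to one million elements), NumPy came out about 62.4 times faster than Pure Python in this run. This matters a lot. When you practice with short, simple examples while learning Python, you may not notice the speed difference between NumPy and Pure Python 'for loops'. But on big data with millions or tens of millions of rows, a carelessly written for loop can still be running after several hours, while the same computation written well with NumPy can finish within minutes.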

## comparing processing time between NumPy and Pure Python
# NumPy

In [14]: a = np.arange(1000000)

In [15]: a

Out[15]: array([ 0, 1, 2, ..., 999997, 999998, 999999])

In [16]: %timeit a + 1

1000 loops, best of 3: 1.45 ms per loop

# Pure Python

In [17]: b = range(1000000)

In [18]: b

Out[18]: range(0, 1000000)

In [19]: %timeit [i+1 for i in b]

10 loops, best of 3: 90.5 ms per loop

# how much faster is NumPy than Pure Python?

In [20]: 90.5/1.45

Out[20]: 62.41379310344828
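The `%timeit` magic only works inside IPython. As a rough sketch of the same comparison as a plain script, the standard-library `timeit` module can be used instead. The repeat count of 5 is an arbitrary choice of mine, and the timings and ratio will differ from the figures above depending on machine and NumPy version.

```python
import timeit

import numpy as np

n = 1_000_000
a = np.arange(n)
b = range(n)

runs = 5  # arbitrary; keeps the script short

# Average seconds per run for each approach.
numpy_time = timeit.timeit(lambda: a + 1, number=runs) / runs
python_time = timeit.timeit(lambda: [i + 1 for i in b], number=runs) / runs

print(f"NumPy:       {numpy_time * 1e3:.2f} ms per loop")
print(f"Pure Python: {python_time * 1e3:.2f} ms per loop")
print(f"Ratio:       {python_time / numpy_time:.1f}x")
```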

(2) Element-wise arithmetic operations between equal-size arrays

Note that the product x*y in NumPy is not the matrix product used in linear algebra; it is simply the product of the elements at the same positions (element-wise multiplication).
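To make the distinction concrete, here is a small sketch that puts `*` next to the `@` operator (matrix product, available from Python 3.5 and NumPy 1.10). The 2x2 matrices `A` and `B` are made up for illustration and do not appear in the original post.

```python
import numpy as np

x = np.array([1., 1., 2., 2.])
y = np.array([1., 2., 3., 4.])

elementwise = x * y   # multiplies matching positions
inner = x @ y         # inner product: 1*1 + 1*2 + 2*3 + 2*4

A = np.array([[1., 2.], [3., 4.]])
B = np.array([[0., 1.], [1., 0.]])

print(elementwise.tolist())   # [1.0, 2.0, 6.0, 8.0]
print(float(inner))           # 17.0
print((A * B).tolist())       # [[0.0, 2.0], [3.0, 0.0]]
print((A @ B).tolist())       # [[2.0, 1.0], [4.0, 3.0]]
```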

## (2) Arithmetic elementwise operations between equal-size arrays

In [21]: x

Out[21]: array([ 1., 1., 2., 2.])

In [22]: y

Out[22]: array([ 1., 2., 3., 4.])

# (2-1) addition

In [23]: x + y

Out[23]: array([ 2., 3., 5., 6.])

# (2-2) subtraction

In [24]: x - y

Out[24]: array([ 0., -1., -1., -2.])

# (2-3) Array multiplication is not matrix multiplication

In [25]: x*y

Out[25]: array([ 1., 2., 6., 8.])

# (2-4) division

In [26]: y/x

Out[26]: array([ 1. , 2. , 1.5, 2. ])

# (2-5) floor division : returns the quotient

In [27]: y//x

Out[27]: array([ 1., 2., 1., 2.])

# (2-6) modulus : returns remainder after division

In [28]: y%x

Out[28]: array([ 0., 0., 1., 0.])

# (2-7) power

In [29]: y**x

Out[29]: array([ 1., 2., 9., 16.])

(3) Comparison operations between equal-size arrays

Element-wise comparison : np.equal(x, y), np.not_equal(x, y), np.greater(x, y), np.greater_equal(x, y), np.less(x, y), np.less_equal(x, y)

Each element of the result is True where the comparison holds and False where it does not.
Array-wise comparison : np.array_equal(x, y)
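The comparison functions above have infix counterparts (`==`, `!=`, `>`, `>=`, `<`, `<=`). The sketch below checks that the two spellings agree and shows how an element-wise result can be reduced to a single answer with `.all()` or `.any()`. The variable names are my own.

```python
import numpy as np

x = np.array([1., 1., 2., 2.])
y = np.array([1., 2., 3., 4.])

# Infix operators and the np.* functions return the same boolean arrays.
same_as_equal = np.array_equal(x == y, np.equal(x, y))
same_as_less = np.array_equal(x < y, np.less(x, y))

# Reducing an element-wise comparison to one boolean.
all_equal = bool((x == y).all())   # every position equal?
any_equal = bool((x == y).any())   # at least one position equal?

print(same_as_equal, same_as_less)   # True True
print(all_equal, any_equal)          # False True
```

For arrays of the same shape, `np.array_equal(x, y)` gives the same answer as `(x == y).all()`; it also returns False, rather than raising or broadcasting, when the shapes differ.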

## (3) Comparison operations : returns boolean array

In [30]: x

Out[30]: array([ 1., 1., 2., 2.])

In [31]: y

Out[31]: array([ 1., 2., 3., 4.])

# element-wise comparisons
# (3-1) np.equal(x, y) : x==y

In [32]: np.equal(x, y)

Out[32]: array([ True, False, False, False], dtype=bool)

# (3-2) np.not_equal(x, y) : x != y

In [33]: np.not_equal(x, y)

Out[33]: array([False, True, True, True], dtype=bool)

# (3-3) np.greater(x, y) : x > y

In [34]: np.greater(x, y)

Out[34]: array([False, False, False, False], dtype=bool)

# (3-4) np.greater_equal(x, y) : x >= y

In [35]: np.greater_equal(x, y)

Out[35]: array([ True, False, False, False], dtype=bool)

# (3-5) np.less(x, y) : x < y

In [36]: np.less(x, y)

Out[36]: array([False, True, True, True], dtype=bool)

# (3-6) np.less_equal(x, y) : x <= y

In [37]: np.less_equal(x, y)

Out[37]: array([ True, True, True, True], dtype=bool)

# (3-7) array-wise comparisons

In [38]: np.array_equal(x, y)

Out[38]: False

In [39]: z = y.copy()

In [40]: z

Out[40]: array([ 1., 2., 3., 4.])

In [41]: np.array_equal(y, z)

Out[41]: True

(4) Assignment operations between equal-size arrays

Assignment operators : Add AND (m += n), Subtract AND (m -= n), Multiply AND (m *= n), Divide AND (m /= n), Floor Division AND (m //= n), Modulus AND (m %= n), Exponent AND (m **= n)

##----------------
## (4) Assignment operations

In [42]: m = np.array([1., 1., 2., 2.])

In [43]: n = np.array([1., 2., 3., 4.])

# (4-1) Add AND : +=

In [44]: m += n # equivalent to m = m + n

In [45]: m

Out[45]: array([ 2., 3., 5., 6.])

# m += n is equivalent to m = m + n

In [46]: m = np.array([1., 1., 2., 2.])

In [47]: m = m + n

In [48]: m

Out[48]: array([ 2., 3., 5., 6.])

# (4-2) Subtract AND : -=

In [49]: m = np.array([1., 1., 2., 2.])

In [50]: m -= n # equivalent to m = m - n

In [51]: m

Out[51]: array([ 0., -1., -1., -2.])

# (4-3) Multiply AND : *=

In [52]: m = np.array([1., 1., 2., 2.])

In [53]: m *= n # equivalent to m = m*n

In [54]: m

Out[54]: array([ 1., 2., 6., 8.])

# (4-4) Divide AND : /=

In [55]: m = np.array([1., 1., 2., 2.])

In [56]: m /= n # equivalent to m = m/n

In [57]: m

Out[57]: array([ 1. , 0.5 , 0.66666667, 0.5 ])

# (4-5) Floor Division : //=

In [58]: m = np.array([1., 1., 2., 2.])

In [59]: m //= n # equivalent to m = m//n

In [60]: m

Out[60]: array([ 1., 0., 0., 0.])

# (4-6) Modulus AND : %=

In [61]: m = np.array([1., 1., 2., 2.])

In [62]: m %= n # equivalent to m = m % n

In [63]: m

Out[63]: array([ 0., 1., 2., 2.])

# (4-7) Exponent AND : **=

In [64]: m = np.array([1., 1., 2., 2.])

In [65]: m **= n # equivalent to m = m**n

In [66]: m

Out[66]: array([ 1., 1., 8., 16.])
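The comments above describe `m += n` as equivalent to `m = m + n`, which is true for the resulting values. One difference worth knowing: with NumPy arrays `m += n` updates the existing array in place, while `m = m + n` creates a new array and rebinds the name. A minimal sketch, using alias variables of my own naming to make the difference visible:

```python
import numpy as np

n = np.array([1., 2., 3., 4.])

# In-place: the alias and m refer to the same array, so both see the update.
m = np.array([1., 1., 2., 2.])
alias_inplace = m
m += n
inplace_same_object = alias_inplace is m

# Rebinding: m = m + n builds a new array; the old one is left untouched.
m = np.array([1., 1., 2., 2.])
alias_rebound = m
m = m + n
rebound_same_object = alias_rebound is m

print(alias_inplace.tolist(), inplace_same_object)   # [2.0, 3.0, 5.0, 6.0] True
print(alias_rebound.tolist(), rebound_same_object)   # [1.0, 1.0, 2.0, 2.0] False
```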

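(5) Logical operations between equal-size arrays

Logical operations
- np.logical_and(a, b) : returns True where the elements of both arrays are non-zero
- np.logical_or(a, b) : returns True where at least one of the two elements is non-zero
- np.logical_xor(a, b) : returns True where exactly one of the two elements is non-zero (that is, where their truth values differ)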
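The logical functions work on truth values, whereas the infix operators `&`, `|`, `^` are bitwise. The two agree on boolean arrays but can differ on integer arrays, as this small sketch shows. The integer arrays `p` and `q` are made up for illustration.

```python
import numpy as np

a = np.array([1, 1, 0, 0], dtype=bool)
b = np.array([1, 0, 1, 0], dtype=bool)

# On boolean arrays the infix operator and the function agree.
bool_agree = np.array_equal(a & b, np.logical_and(a, b))

# On integer arrays & works bit by bit: 1 & 2 == 0 and 2 & 1 == 0,
# while logical_and only asks whether both values are non-zero.
p = np.array([1, 2])
q = np.array([2, 1])
bitwise = p & q
logical = np.logical_and(p, q)

print(bool_agree)         # True
print(bitwise.tolist())   # [0, 0]
print(logical.tolist())   # [True, True]
```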

##----------------
## (5) Logical operators

In [67]: a = np.array([1, 1, 0, 0], dtype=bool)

In [68]: b = np.array([1, 0, 1, 0], dtype=bool)

# (5-1) np.logical_and : (a and b) is true
# if both operands are non-zero then the condition becomes true

# equivalent to the infix operator & for boolean arrays

In [69]: np.logical_and(a, b)

Out[69]: array([ True, False, False, False], dtype=bool)

# (5-2) np.logical_or : (a or b) is true
# if any of the operands is non-zero then the condition becomes true

# equivalent to the infix operator | for boolean arrays

In [70]: np.logical_or(a, b)

Out[70]: array([ True, True, True, False], dtype=bool)

# (5-3) np.logical_xor : exactly one of a and b is true
# the condition becomes true when the truth values of the two operands differ

# equivalent to the infix operator ^ for boolean arrays

In [71]: np.logical_xor(a, b)

Out[71]: array([False, True, True, False], dtype=bool)

(6) Membership operators

Membership operators : in, not in
`in` returns True if the array (or list) contains the given object and False if it does not; `not in` returns the opposite.
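`in` gives a single True or False for one object. When you want an element-wise answer for several candidates at once, `np.isin` (available from NumPy 1.13) returns a boolean array instead. A brief sketch; the `candidates` array is my own example and not part of the original post.

```python
import numpy as np

q = np.array(["P", "Q"])
candidates = np.array(["P", "R", "Q"])

# `in` asks one question: does any element of q equal "P"?
has_p = "P" in q
same_check = bool((q == "P").any())

# np.isin answers the question for each candidate separately.
mask = np.isin(candidates, q)

print(has_p, same_check)   # True True
print(mask.tolist())       # [True, False, True]
```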

# (6) Membership operators

In [72]: p = "P"

In [73]: q = np.array(["P", "Q"])

In [74]: r = np.array(["R", "S"])

# (6-1) in

In [75]: p in q

Out[75]: True

In [76]: p in r

Out[76]: False

# (6-2) not in

In [77]: p not in q

Out[77]: False

In [78]: p not in r

Out[78]: True

When the arrays differ in dimension or size

=> ValueError: operands could not be broadcast together with shapes (4,) (5,)

##--------------
## Shape mismatches

In [79]: x4 = np.array([1., 1., 2., 2.])

In [80]: x5 = np.arange(5)

# ValueError: operands could not be broadcast together with shapes (4,) (5,)

In [81]: x4 + x5

Traceback (most recent call last):

File "<ipython-input-82-5e88af36d6f1>", line 1, in <module>

x4 + x5

ValueError: operands could not be broadcast together with shapes (4,) (5,)

Because the two arrays have different sizes, (4,) and (5,), the operation raised a ValueError, with a message saying that the operands cannot be 'broadcast' together (could not be broadcast together...).
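In a script, as opposed to an interactive session, you may want to handle this case rather than let it stop the program. A minimal sketch, assuming you simply want to report the mismatch; the `message` variable is my own naming.

```python
import numpy as np

x4 = np.array([1., 1., 2., 2.])
x5 = np.arange(5)

message = None
try:
    x4 + x5
except ValueError as err:
    # NumPy reports the two shapes it could not combine.
    message = str(err)
    print("ValueError:", message)

# Comparing shapes up front is another way to catch the mismatch early.
shapes_match = x4.shape == x5.shape
print(x4.shape, x5.shape, shapes_match)   # (4,) (5,) False
```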

In the next post, I will look at Broadcasting.

I hope this was helpful.

Attribution, Non-Commercial, No Derivatives

Posted by Rfriend
