
[Keras] How to upload, preprocess and visualize image files

Deep Learning (TF, Keras, PyTorch) 2019. 3. 5. 23:51

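When building image classification models with a CNN (Convolutional Neural Network), you have probably practiced on toy projects that use datasets such as MNIST or CIFAR-10. These datasets ship with TensorFlow and Keras and can be loaded with a single function call.

When you have to analyze real image files, though, you may have wondered at some point: "Wait, how do I load these images, how do I preprocess them, and how do I visualize them?"

This post answers exactly that question.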

Let's import the Python libraries we need.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import keras
```

1. Download dog and cat images from Kaggle

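Download the dog and cat images from the Kaggle page below. You need a Kaggle account before you can download them. The training set contains 25,000 images. The class of each image is encoded in its file name (dog.*.jpg or cat.*.jpg), and the competition labels dogs as 1 and cats as 0.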

https://www.kaggle.com/c/dogs-vs-cats/data
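Because the class is part of each file name (for example dog.8011.jpg or cat.5077.jpg), you can derive a numeric label directly from the name. The sketch below is minimal and assumes that Kaggle naming pattern. The helper `label_from_filename` is a hypothetical name; it is not part of Keras or of any Kaggle tooling.

```python
import os

def label_from_filename(fname):
    """Return 1 for a dog image and 0 for a cat image, based on the file-name prefix.

    Hypothetical helper: assumes Kaggle-style names such as 'dog.8011.jpg' or 'cat.5077.jpg'.
    """
    prefix = os.path.basename(fname).split('.')[0].lower()
    if prefix == 'dog':
        return 1
    if prefix == 'cat':
        return 0
    raise ValueError('unexpected file name: {}'.format(fname))

print(label_from_filename('dog.8011.jpg'))  # 1
print(label_from_filename('cat.5077.jpg'))  # 0
```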

2. Select 30 cat images and copy them to a separate folder

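Unzip the compressed file you downloaded into your Downloads folder.

If you preview the images in a file browser such as Windows Explorer, you will see that half are cats and half are dogs.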

Import two more libraries: os, for managing directories and paths, and shutil, for copying files from a source path to a destination path.

```python
import os      # miscellaneous operating system interfaces
import shutil  # high-level file operations
```

Set the path to read the images from. I unzipped the train folder to 'Downloads/dogs-vs-cats/train'; check the folder path and adjust it for your own machine.

```python
# The path to the directory where the original dataset was uncompressed
base_dir = '/Users/admin/Downloads'
img_dir = '/Users/admin/Downloads/dogs-vs-cats/train'
```

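The train folder contains 25,000 dog and cat images in total. Indexing the first 10 entries returned by os.listdir(img_dir) shows what the file names look like. os.listdir() returns names in arbitrary order, so your 10 names may differ.

```python
len(os.listdir(img_dir))
```

```
25000
```

```python
os.listdir(img_dir)[:10]
```

```
['dog.8011.jpg',
 'cat.5077.jpg',
 'dog.7322.jpg',
 'cat.2718.jpg',
 'cat.10151.jpg',
 'cat.3406.jpg',
 'dog.1753.jpg',
 'cat.4369.jpg',
 'cat.7660.jpg',
 'dog.5535.jpg']
```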
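You can also check the half-cats, half-dogs split without opening a file browser by counting file-name prefixes. The sketch below is minimal and assumes the Kaggle naming pattern shown above. `count_by_class` is a hypothetical helper; with the real data you would pass it `os.listdir(img_dir)`.

```python
from collections import Counter

def count_by_class(fnames):
    """Count files per class using the part of the name before the first dot."""
    return Counter(fname.split('.')[0] for fname in fnames)

sample = ['dog.8011.jpg', 'cat.5077.jpg', 'dog.7322.jpg', 'cat.2718.jpg']
print(count_by_class(sample))  # Counter({'dog': 2, 'cat': 2})

# With the real data: count_by_class(os.listdir(img_dir))
```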

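Next, select a sample of 30 images and copy them to a different folder. First, create the folder (cats30_dir) that will hold the 30 cat images.

```python
# Directory with 30 cat pictures
cats30_dir = os.path.join(base_dir, 'cats30')

# Make the directory (exist_ok=True avoids an error if it already exists)
os.makedirs(cats30_dir, exist_ok=True)
```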

Now copy only 30 cat images from the source path to the destination path with shutil.copyfile(src, dst).

```python
# Copy the first 30 cat images (cat.0.jpg ... cat.29.jpg) to cats30_dir
fnames = ['cat.{}.jpg'.format(i) for i in range(30)]
for fname in fnames:
    src = os.path.join(img_dir, fname)
    dst = os.path.join(cats30_dir, fname)
    shutil.copyfile(src, dst)
```
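The copy loop above can be wrapped in a reusable function, for example to copy the first n images of either class into its own folder. This sketch makes a few assumptions. `copy_first_n` is a hypothetical name. Images are assumed to be named `<prefix>.<index>.jpg`, as in the Kaggle training set. Missing files are skipped instead of raising an error.

```python
import os
import shutil

def copy_first_n(src_dir, dst_dir, prefix, n):
    """Copy '<prefix>.0.jpg' ... '<prefix>.<n-1>.jpg' from src_dir to dst_dir.

    Hypothetical helper. Returns the list of copied file names; files that do not exist are skipped.
    """
    os.makedirs(dst_dir, exist_ok=True)
    copied = []
    for i in range(n):
        fname = '{}.{}.jpg'.format(prefix, i)
        src = os.path.join(src_dir, fname)
        if not os.path.isfile(src):
            continue  # skip missing files instead of raising FileNotFoundError
        shutil.copyfile(src, os.path.join(dst_dir, fname))
        copied.append(fname)
    return copied

# Example with the paths from this post:
# copy_first_n(img_dir, os.path.join(base_dir, 'dogs30'), 'dog', 30)
```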

Let's check the list of the 30 cat image files copied to cats30_dir.

```python
# check that the pictures were copied to the cats30 directory
os.listdir(cats30_dir)
```

```
['cat.6.jpg',
 'cat.24.jpg',
 'cat.18.jpg',
 'cat.19.jpg',
 'cat.25.jpg',
 'cat.7.jpg',
 'cat.5.jpg',
 'cat.27.jpg',
 'cat.26.jpg',
 'cat.4.jpg',
 'cat.0.jpg',
 'cat.22.jpg',
 'cat.23.jpg',
 'cat.1.jpg',
 'cat.3.jpg',
 'cat.21.jpg',
 'cat.20.jpg',
 'cat.2.jpg',
 'cat.11.jpg',
 'cat.10.jpg',
 'cat.12.jpg',
 'cat.13.jpg',
 'cat.9.jpg',
 'cat.17.jpg',
 'cat.16.jpg',
 'cat.8.jpg',
 'cat.28.jpg',
 'cat.14.jpg',
 'cat.15.jpg',
 'cat.29.jpg']
```
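As the output shows, os.listdir() returns names in an arbitrary order that is neither numeric nor alphabetical. Sometimes you need a stable order, for example to put cat.0.jpg in the top-left cell of a grid. In that case you can sort by the number embedded in the file name. The sketch below is minimal and assumes names of the form `<class>.<index>.jpg`. `sort_by_index` is a hypothetical helper.

```python
def sort_by_index(fnames):
    """Sort names like 'cat.10.jpg' by their integer index (0, 1, 2, ..., 10, ...)."""
    return sorted(fnames, key=lambda fname: int(fname.split('.')[1]))

names = ['cat.6.jpg', 'cat.24.jpg', 'cat.10.jpg', 'cat.2.jpg']
print(sort_by_index(names))  # ['cat.2.jpg', 'cat.6.jpg', 'cat.10.jpg', 'cat.24.jpg']
print(sorted(names))         # ['cat.10.jpg', 'cat.2.jpg', 'cat.24.jpg', 'cat.6.jpg'] (plain string sort)
```

With the files from this post, the call would be sort_by_index(os.listdir(cats30_dir)).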

3. Load an image file, convert it to a float array, and preprocess it

Import the image module from Keras preprocessing. Then load an image file with load_img() and convert it to an array with img_to_array(). The Python OpenCV library can do this as well.

```python
# a picture of one cat as an example
img_name = 'cat.10.jpg'
img_path = os.path.join(cats30_dir, img_name)

# Load the image resized to 250x250 and convert it to a 3D float32 array (height, width, channels)
from keras.preprocessing import image

img = image.load_img(img_path, target_size=(250, 250))
img_tensor = image.img_to_array(img)
```
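If you use OpenCV instead, note that cv2.imread() returns the channels in BGR order. load_img() and matplotlib's imshow() both work in RGB. Reversing the last axis converts between the two, and OpenCV's own cv2.cvtColor(img, cv2.COLOR_BGR2RGB) does the same. The sketch below uses a tiny synthetic array instead of a real file so that it runs without OpenCV. `bgr_to_rgb` is a hypothetical helper; with OpenCV you would apply it to the result of cv2.imread(img_path).

```python
import numpy as np

def bgr_to_rgb(img_bgr):
    """Reverse the channel axis: (H, W, [B, G, R]) -> (H, W, [R, G, B])."""
    return img_bgr[..., ::-1]

# A 1x2 "image" in BGR order: a pure blue pixel followed by a pure red pixel
img_bgr = np.array([[[255, 0, 0], [0, 0, 255]]], dtype=np.uint8)
img_rgb = bgr_to_rgb(img_bgr)
print(img_rgb[0, 0])  # blue pixel, now [0, 0, 255] in RGB order
print(img_rgb[0, 1])  # red pixel, now [255, 0, 0] in RGB order
```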

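Next, use np.expand_dims() to add one dimension to the 3D array. The new leading axis is the batch axis, which tells individual image samples apart. Then divide the values by 255. so that they fall in the [0, 1] range. Strictly speaking, this is rescaling rather than standardization.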

```python
# expand a dimension (3D -> 4D): (250, 250, 3) -> (1, 250, 250, 3)
img_tensor = np.expand_dims(img_tensor, axis=0)
img_tensor.shape
```

```
(1, 250, 250, 3)
```

```python
# scaling into [0, 1]
img_tensor /= 255.
```
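You can check both steps on a tiny array. This sketch uses a random 4x4 RGB array as a stand-in for a loaded image; that stand-in is an assumption made for illustration. It shows that np.expand_dims(x, axis=0) gives the same result as x[np.newaxis, ...]. It also shows that dividing by 255. maps the 0 to 255 pixel range onto 0.0 to 1.0. The in-place `/=` works only because img_to_array() returns a float32 array. On a uint8 array, NumPy refuses the in-place true division, so convert the array first.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for image.img_to_array(img): a (height, width, channels) float32 array of 0..255 values
img_array = rng.integers(0, 256, size=(4, 4, 3)).astype(np.float32)

batch = np.expand_dims(img_array, axis=0)                   # (4, 4, 3) -> (1, 4, 4, 3)
print(batch.shape)                                          # (1, 4, 4, 3)
print(np.array_equal(batch, img_array[np.newaxis, ...]))    # True

batch /= 255.                                               # in-place division keeps dtype float32
print(batch.dtype, batch.min() >= 0.0, batch.max() <= 1.0)  # float32 True True

# For a uint8 array, divide out of place instead: x = x.astype('float32') / 255.
```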

Here is the array data for this cat image, img_tensor[0], which is the first and only sample in the batch. It looks a bit like the scene in The Matrix where streams of numbers rain down the screen.

```python
img_tensor[0]
```

```
array([[[0.10196079, 0.11764706, 0.15294118],
        [0.07450981, 0.09019608, 0.1254902 ],
        [0.03137255, 0.04705882, 0.09019608],
        ...,
        [0.5058824 , 0.6313726 , 0.61960787],
        [0.49411765, 0.61960787, 0.60784316],
        [0.49019608, 0.6156863 , 0.6039216 ]],

       [[0.11764706, 0.13333334, 0.16862746],
        [0.13725491, 0.15294118, 0.1882353 ],
        [0.08627451, 0.10196079, 0.13725491],
        ...,
        [0.50980395, 0.63529414, 0.62352943],
        [0.49803922, 0.62352943, 0.6117647 ],
        [0.4862745 , 0.6117647 , 0.6       ]],

       [[0.11372549, 0.14117648, 0.16470589],
        [0.16470589, 0.19215687, 0.22352941],
        [0.15294118, 0.18039216, 0.21176471],
        ...,
        [0.50980395, 0.63529414, 0.62352943],
        [0.5019608 , 0.627451  , 0.6156863 ],
        [0.49019608, 0.6156863 , 0.6039216 ]],

       ...,

       [[0.69411767, 0.6431373 , 0.46666667],
        [0.6862745 , 0.63529414, 0.45882353],
        [0.6627451 , 0.6117647 , 0.4392157 ],
        ...,
        [0.7254902 , 0.70980394, 0.04313726],
        [0.6745098 , 0.6509804 , 0.03921569],
        [0.64705884, 0.6156863 , 0.05490196]],

       [[0.64705884, 0.5921569 , 0.45490196],
        [0.6117647 , 0.5568628 , 0.4117647 ],
        [0.5686275 , 0.5176471 , 0.3529412 ],
        ...,
        [0.7254902 , 0.7137255 , 0.01960784],
        [0.6862745 , 0.67058825, 0.00784314],
        [0.6509804 , 0.6313726 , 0.        ]],

       [[0.6039216 , 0.54901963, 0.4117647 ],
        [0.5882353 , 0.53333336, 0.3882353 ],
        [0.5803922 , 0.5294118 , 0.3647059 ],
        ...,
        [0.7254902 , 0.7137255 , 0.01960784],
        [0.6862745 , 0.67058825, 0.00784314],
        [0.6509804 , 0.6313726 , 0.        ]]], dtype=float32)
```
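Each innermost triple in the output above is the [R, G, B] value of one pixel. For instance, the first one, [0.10196079, 0.11764706, 0.15294118], corresponds to the 8-bit values 26, 30 and 39 divided by 255. You usually don't need to read the full printout. A few summary values are enough to confirm that preprocessing worked: the shape, the dtype, and a value range inside [0, 1]. The sketch below uses a hypothetical helper, `describe_tensor`, demonstrated on a random stand-in array. With the real data you would call describe_tensor(img_tensor).

```python
import numpy as np

def describe_tensor(t):
    """Return the shape, dtype and min/max/mean of an image tensor as a dict."""
    return {
        'shape': t.shape,
        'dtype': str(t.dtype),
        'min': float(t.min()),
        'max': float(t.max()),
        'mean': float(t.mean()),
    }

# Stand-in for the preprocessed 4D tensor (1, height, width, 3) with values in [0, 1)
fake_tensor = np.random.default_rng(42).random((1, 250, 250, 3), dtype=np.float32)
print(describe_tensor(fake_tensor))
print(fake_tensor[0, 0, 0])  # [R, G, B] of the top-left pixel
```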

4. Visualize the array of a single image file

Let's use the matplotlib library to visualize the array we converted and preprocessed in step 3. As an example, img_tensor[0] selects the cat image (cat.10.jpg) from the 4D tensor.

```python
# Image show
import matplotlib.pyplot as plt

plt.rcParams['figure.figsize'] = (10, 10)  # set figure size
plt.imshow(img_tensor[0])
plt.show()
```
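plt.imshow() interprets an RGB array according to its dtype. Floats are expected in [0, 1] and integers in [0, 255]; values outside those ranges are clipped, with a warning. This is why the division by 255. in step 3 matters. A float array with values from 0 to 255, such as the raw output of img_to_array(), would be displayed mostly white. The sketch below is a hedged helper, `to_displayable` (a hypothetical name), that converts either form into a float array in [0, 1].

```python
import numpy as np

def to_displayable(img):
    """Return a float32 RGB array in [0, 1] suitable for plt.imshow().

    Hypothetical helper:
    - integer input (assumed 0..255) is divided by 255.
    - float input is assumed to be in [0, 1] already; out-of-range values are clipped.
    """
    img = np.asarray(img)
    if np.issubdtype(img.dtype, np.integer):
        return img.astype(np.float32) / 255.
    return np.clip(img.astype(np.float32), 0.0, 1.0)

raw = np.array([[[0, 128, 255]]], dtype=np.uint8)
print(to_displayable(raw))  # values scaled into [0, 1]
# Usage: plt.imshow(to_displayable(some_array)); plt.show()
```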

5. Visualize 30 images in a 6*5 grid (6 rows, 5 columns)

I wrapped the preprocessing steps from step 3 into a user-defined function (UDF) named preprocess_img(). It loads the image file, converts it to an array, adds one dimension, and rescales the values to [0, 1].

```python
from keras.preprocessing import image

# UDF that pre-processes an image file into a 4D tensor of shape (1, target_size, target_size, 3)
def preprocess_img(img_path, target_size=100):
    img = image.load_img(img_path, target_size=(target_size, target_size))
    img_tensor = image.img_to_array(img)

    # expand a dimension
    img_tensor = np.expand_dims(img_tensor, axis=0)

    # scaling into [0, 1]
    img_tensor /= 255.

    return img_tensor
```
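preprocess_img() returns a batch of one image, with shape (1, target_size, target_size, 3). You can therefore join several results along axis 0 with np.concatenate() to build the kind of 4D batch a CNN takes as input. In the sketch below, random arrays stand in for the preprocess_img() outputs so the code runs without image files.

```python
import numpy as np

rng = np.random.default_rng(0)
target_size = 100

# Stand-ins for [preprocess_img(path, target_size) for path in paths]
tensors = [rng.random((1, target_size, target_size, 3), dtype=np.float32) for _ in range(30)]

# Join along the batch axis: 30 arrays of (1, 100, 100, 3) -> (30, 100, 100, 3)
batch = np.concatenate(tensors, axis=0)
print(batch.shape)  # (30, 100, 100, 3)
```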

Now let's visualize the 30 cat image arrays in a grid layout of 6 rows by 5 columns. To make the grid easier to read, black lines separate the cat pictures. These lines are the margins of the canvas, which are left at zero.
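The size of the canvas follows from the layout. Vertically, it is n_row images of target_size pixels plus n_row - 1 margins; horizontally, it is the same for the columns. With 30 images, 5 columns, target_size = 100 and margin = 3 (the values used in the grid code), that gives 6*100 + 5*3 = 615 rows by 5*100 + 4*3 = 512 columns of pixels. The hypothetical helper `grid_geometry` below computes the canvas shape and the top-left pixel offset of each cell.

```python
import math

def grid_geometry(n_pic, n_col, target_size, margin):
    """Return the canvas (height, width) and the (row, col) pixel offset of each picture."""
    n_row = math.ceil(n_pic / n_col)
    height = n_row * target_size + (n_row - 1) * margin
    width = n_col * target_size + (n_col - 1) * margin
    step = target_size + margin  # distance between the starts of two neighbouring cells
    offsets = [((k // n_col) * step, (k % n_col) * step) for k in range(n_pic)]
    return (height, width), offsets

shape, offsets = grid_geometry(30, 5, 100, 3)
print(shape)        # (615, 512)
print(offsets[:3])  # [(0, 0), (0, 103), (0, 206)]
print(offsets[5])   # (103, 0) -> first picture of the second row
```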

Note that the preprocess_img() UDF defined just above is called inside the for loop of the code below.

```python
# layout: 30 pictures in 5 columns -> 6 rows
n_pic = 30
n_col = 5
n_row = int(np.ceil(n_pic / n_col))

# plot & margin size (in pixels)
target_size = 100
margin = 3

# blank (black) matrix to store results; the margins stay 0 and show up as black lines
total = np.zeros((n_row * target_size + (n_row - 1) * margin,
                  n_col * target_size + (n_col - 1) * margin,
                  3))

# append the image tensors to the 'total' matrix
img_seq = 0
for i in range(n_row):
    for j in range(n_col):
        fname = 'cat.{}.jpg'.format(img_seq)
        img_path = os.path.join(cats30_dir, fname)
        img_tensor = preprocess_img(img_path, target_size)

        # row i sets the vertical offset, column j the horizontal offset
        row_start = i * target_size + i * margin
        row_end = row_start + target_size
        col_start = j * target_size + j * margin
        col_end = col_start + target_size

        total[row_start:row_end, col_start:col_end, :] = img_tensor[0]
        img_seq += 1

# display the pictures in grid
# (a moderate figure size; a huge one such as (200, 200) inches needs a very large amount of memory)
plt.figure(figsize=(10, 12))
plt.imshow(total)
plt.show()
```
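The nested loop can also be written as a function that takes an already preprocessed batch. An example input is the (30, 100, 100, 3) array you get by concatenating preprocess_img() results. This sketch also handles a picture count that does not fill the last row. `make_grid` is a hypothetical helper. As in the code above, the black separators come from initializing the canvas with zeros.

```python
import numpy as np

def make_grid(images, n_col, margin=3):
    """Tile a batch of images (N, H, W, C) into one canvas with black margins.

    Hypothetical helper. Cells in the last row that have no image are left black.
    """
    n_pic, h, w, c = images.shape
    n_row = int(np.ceil(n_pic / n_col))
    canvas = np.zeros((n_row * h + (n_row - 1) * margin,
                       n_col * w + (n_col - 1) * margin,
                       c), dtype=images.dtype)
    for k in range(n_pic):
        i, j = divmod(k, n_col)  # row and column of picture k
        top = i * (h + margin)
        left = j * (w + margin)
        canvas[top:top + h, left:left + w, :] = images[k]
    return canvas

# 7 random stand-in images in 3 columns -> 3 rows, the last one partly empty
demo = np.random.default_rng(1).random((7, 10, 10, 3), dtype=np.float32)
grid = make_grid(demo, n_col=3, margin=2)
print(grid.shape)  # (34, 34, 3)
# To display: plt.imshow(grid); plt.show()
```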

I hope you found this helpful.


License: Creative Commons Attribution-NonCommercial-NoDerivatives (CC BY-NC-ND)


Posted by Rfriend


A Friend for R and Python Analysis and Programming (by R Friend)
