
[Python pandas] Loading Excel data into a DataFrame with Python (How to read Excel data using Python pandas)

Python Analysis and Programming / Python Data Preprocessing 2019. 7. 31. 22:44

In this post I will show how to use the read_excel() function in Python pandas to read a dataset from an Excel sheet and turn it into a pandas DataFrame.

The file attached below, 'sales_per_region.xlsx', is the Excel file used in the example.

sales_per_region.xlsx

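When reading the Excel data above into a pandas DataFrame, I want to meet the following requirements.

(1) The dataset to read is on the first sheet, named 'Sheet1', in the cells from row 3, column A through row 10, column D. (sheet_name = 'Sheet1', header = 2)
(2) Row 3 holds the column names (header), and the 'id' column in column A should be used as the index. header is 0-based, so row 3 is header = 2. (header = 2, index_col='id')
(3) The 'region' column is a string, the 'sales_representative' column is an integer, and the 'sales_amount' column is a float. (dtype = {'region': str, 'sales_representative': np.int64, 'sales_amount': float})
(4) Once loaded into the DataFrame, the 'sales_amount' column should not contain the thousands separator comma (','). According to the pandas documentation, this option only matters for cells stored as text in Excel; numeric cells are parsed as numbers regardless of their display format. (thousands = ',')
(5) The total number of rows to read should be limited to 10. (nrows=10)
(6) When a row starts with '#' (the comment character), like '# ignore this line' in row 11, everything after the '#' on that row should be ignored. (comment = '#')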

# import libraries
import numpy as np
import pandas as pd
import os

# set the directory to your own
base_dir = 'D:/admin/Documents'
excel_file = 'sales_per_region.xlsx'
excel_dir = os.path.join(base_dir, excel_file)  # full path to the Excel file

# read an Excel file into a DataFrame
# (reading .xlsx files needs an Excel engine such as openpyxl to be installed)
df_from_excel = pd.read_excel(excel_dir,  # path to your Excel file
                              sheet_name = 'Sheet1',
                              header = 2,  # 0-based: row 3 holds the column names
                              #names = ['region', 'sales_representative', 'sales_amount'],
                              dtype = {'region': str,
                                       'sales_representative': np.int64,
                                       'sales_amount': float},  # dictionary type
                              index_col = 'id',
                              na_values = 'NaN',
                              thousands = ',',
                              nrows = 10,
                              comment = '#')
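
For readers who do not have the attachment at hand, the options above can be tried on a small in-memory text table. This is only a sketch under some assumptions: it uses pd.read_csv() as a stand-in, because read_csv() accepts the same header, index_col, dtype, thousands, nrows and comment options and needs no Excel file or Excel engine. The region names come from the example file, but the sales_representative and sales_amount values are made-up sample values.

```python
import io

import pandas as pd

# Made-up sample text: two non-data rows, a header on the third row,
# three data rows, and a trailing comment row.
sample_text = """Sales per region (title row)
(second non-data row)
id,region,sales_representative,sales_amount
1,seoul,12,"1,250,000.5"
2,inchon,8,"980,000"
3,busan,10,"1,100,000"
# ignore this line
"""

df_sample = pd.read_csv(
    io.StringIO(sample_text),
    header=2,            # 0-based: the third row holds the column names
    index_col="id",      # use the 'id' column as the index
    dtype={"region": str, "sales_representative": "int64"},
    thousands=",",       # strip the comma from text such as "1,250,000.5"
    nrows=10,            # read at most 10 data rows
    comment="#",         # ignore everything after '#'
)

print(df_sample)
print(df_sample.dtypes)
```

The 'sales_amount' column is left out of dtype here so that pandas infers it as a float after removing the commas.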

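If the Excel data has no column names (header), pass header = None and supply the column names yourself, for example names = ['region', 'sales_representative', 'sales_amount']. pandas locates the rows and columns that contain data on its own, and it also infers the data types, so you do not have to specify each one by hand.

The 'sales_per_region.xlsx' Excel file has now been loaded into a DataFrame named 'df_from_excel'. To confirm that it loaded correctly, let's look at the index, the data types, and the 'region' column.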
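
When a sheet has no header row, the pandas documentation says to pass header=None explicitly together with names. The sketch below illustrates this under some assumptions: it writes a tiny headerless workbook into memory (so it needs the openpyxl package), the values are made-up sample data, and the names buffer, col_names and df_no_header are hypothetical.

```python
import io

import pandas as pd

# Made-up sample rows with no header row (illustration only).
rows = [
    [1, "seoul", 12, 1250000.5],
    [2, "inchon", 8, 980000.0],
]

# Write the rows to an in-memory .xlsx workbook without a header or index.
buffer = io.BytesIO()
with pd.ExcelWriter(buffer, engine="openpyxl") as writer:
    pd.DataFrame(rows).to_excel(writer, header=False, index=False)
buffer.seek(0)

# header=None tells pandas that the first row is data, not column names;
# names supplies the column names instead.
col_names = ["id", "region", "sales_representative", "sales_amount"]
df_no_header = pd.read_excel(buffer, header=None, names=col_names)

print(df_no_header)
```

Without header=None, read_excel() would treat the first data row as the header and then replace it with names, so that row would be lost.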

# check index

df_from_excel.index

Int64Index([1, 2, 3, 4, 5, 6, 7], dtype='int64', name='id')

# check data type

df_from_excel.dtypes

region                   object
sales_representative      int64
sales_amount            float64
dtype: object

# check 'region' column

df_from_excel['region']

id
1      seoul
2     inchon
3      busan
4    guangju
5      ulsan
6     sejong
7     jeunju
Name: region, dtype: object
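
The same checks can also be written as assertions, so that a script stops as soon as the loaded data does not look as expected. The sketch below is hypothetical: check_loaded_frame is not a pandas function, and df_check rebuilds only the ids and regions printed above rather than reading the file. Note also that pandas 2.0 and later removed Int64Index, so the index shown above prints there as Index([1, 2, 3, 4, 5, 6, 7], dtype='int64', name='id').

```python
import pandas as pd

# Rebuild only the part of the result shown above (ids and regions).
df_check = pd.DataFrame(
    {"region": ["seoul", "inchon", "busan", "guangju",
                "ulsan", "sejong", "jeunju"]},
    index=pd.Index(range(1, 8), name="id"),
)


def check_loaded_frame(df, n_rows):
    """Hypothetical helper: raise AssertionError if the frame looks wrong."""
    assert df.index.name == "id"                       # 'id' became the index
    assert pd.api.types.is_integer_dtype(df.index)     # integer index values
    assert len(df) == n_rows                           # expected number of rows
    assert all(isinstance(v, str) for v in df["region"])  # strings only
    return True


print(check_loaded_frame(df_check, 7))
```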

I hope this was helpful.

For how to write (export) a Python pandas DataFrame to Excel, see https://rfriend.tistory.com/466.


License: Attribution-NonCommercial-NoDerivatives


Posted by Rfriend
