'not 'builtin_function_or_method'' 태그의 글 목록

'not 'builtin_function_or_method''에 해당되는 글 1건

2019.08.31 [Python] 웹으로 부터 JSON 포맷 데이터 읽어와서 pandas DataFrame으로 만들기

[Python] 웹으로 부터 JSON 포맷 데이터 읽어와서 pandas DataFrame으로 만들기

Python 분석과 프로그래밍/Python 데이터 전처리 2019. 8. 31. 20:22

지난번 포스팅에서는 Python의 json.dumps() 를 사용해서 JSON 포맷 데이터를 쓰거나, json.loads()를 사용해서 JSON 포맷 데이터를 python으로 읽어오는 방법(https://rfriend.tistory.com/474)을 소개하였습니다.

이번 포스팅에서는 이어서 웹에 있는 JSON 포맷 데이터를 Python으로 읽어와서 pandas DataFrame으로 만드는 방법(How to read JSON formate data from WEB API and convert it to pandas DataFrame in python)을 소개하겠습니다.

JSON 포맷 파일을 가져올 수 있는 사이트로 "Awesome JSON Datasets (https://github.com/jdorfman/awesome-json-datasets)" 를 예로 들어서 설명해보겠습니다.

여러개의 JSON Datasets 이 올라가 있는데요, 이중에서 'Novel Prize' JSON 포맷 데이터(http://api.nobelprize.org/v1/prize.json)를 읽어와서 DataFrame으로 만들어보겠습니다.

(1) API 웹 사이트에서 JSON 포맷 자료를 Python으로 읽어오기

이제 urllib 모듈의 rulopen 함수를 사용해서 JSON 데이터가 있는 URL로 요청(request)을 보내서 URL을 열고 JSON 데이터를 읽어와서, python의 json.loads() 를 사용하여 novel_prize_json 이라 이름의 Python 객체로 만들어보겠습니다.

# parse a JSON string using json.loads() method : returns a dictionary

import json

import urllib

import pandas as pd

# API request to the URL

import sys

if sys.version_info[0] == 3:

from urllib.request import urlopen # for Python 3.x

else:

from urllib import urlopen # for Python 2.x

with urlopen("http://api.nobelprize.org/v1/prize.json") as url:

novel_prize_json_file = url.read()

urllib 모듈의 (web open) request 메소드를 불러오 때 Python 2.x 버전에서는 from urllib import urlopen 을 사용하는 반면, Python 3.x 버전에서는 from urllib.request import urlopen 을 사용합니다. 따라서 만약 Python 3.x 사용자가 아래처럼 (Python 2.x 버전에서 사용하는) from urllib import urlopen 이라고 urlopen을 importing 하려고 하면 ImportError: cannot import name 'urlopen' 이라는 에러가 납니다.

# ImportError: cannot import name 'urlopen' at python 3.x

from urllib import urlopen # It's only for Python 2.x. It's not working at Python 3.x

with urlopen("http://api.nobelprize.org/v1/prize.json") as url:

novel_prize_json_file = url.read()

---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
<ipython-input-1-81c8fae1a1fd> in <module>()
      1 # API request to the URL
----> 2 from urllib import urlopen
      3 
      4 with urlopen("http://api.nobelprize.org/v1/prize.json") as url:
      5     novel_prize_json_file = url.read()

ImportError: cannot import name 'urlopen'

다음으로, 위에서 읽어온 JSON 포맷 데이터를 Python의 json.loads() 메소드를 이용해서 decoding 해보겠습니다. 이때 decode('utf-8') 로 설정해주었습니다.

# decoding to python object

novel_prize_json = json.loads(novel_prize_json_file.decode('utf-8'))

decoding을 할 때 'utf-8' 을 설정을 안해주니 아래처럼 TypeError 가 나네요. (TypeError: the JSON object must be str, not 'builtin_function_or_method')

# decoding TypeError. decode using decode('utf-8')

novel_prize_json = json.loads(novel_prize_json_file.decode)

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-9-1fda68be6386> in <module>()
      1 # decoding TypeError
----> 2 novel_prize_json = json.loads(novel_prize_json_file.decode)

C:\Users\admin\Anaconda3\lib\json\__init__.py in loads(s, encoding, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
    310     if not isinstance(s, str):
    311         raise TypeError('the JSON object must be str, not {!r}'.format(
--> 312                             s.__class__.__name__))
    313     if s.startswith(u'\ufeff'):
    314         raise JSONDecodeError("Unexpected UTF-8 BOM (decode using utf-8-sig)",

TypeError: the JSON object must be str, not 'builtin_function_or_method'

Novel Prize JSON 파일을 Python 객체로 읽어왔으니, keys() 메소드로 키를 확인해보겠습니다. 그리고 'prizes' 키의 첫번째 데이터(novel_prize_json['prizes'][0])인 물리학(Physics) 분야 노벨상 수상자 정보를 인쇄해보겠습니다.

novel_prize_json.keys()

dict_keys(['prizes'])

novel_prize_json['prizes'][0].keys()

dict_keys(['laureates', 'year', 'overallMotivation', 'category'])

novel_prize_json['prizes'][0]

{'category': 'physics',
 'laureates': [{'firstname': 'Arthur',
   'id': '960',
   'motivation': '"for the optical tweezers and their application to biological systems"',
   'share': '2',
   'surname': 'Ashkin'},
  {'firstname': 'Gérard',
   'id': '961',
   'motivation': '"for their method of generating high-intensity, ultra-short optical pulses"',
   'share': '4',
   'surname': 'Mourou'},
  {'firstname': 'Donna',
   'id': '962',
   'motivation': '"for their method of generating high-intensity, ultra-short optical pulses"',
   'share': '4',
   'surname': 'Strickland'}],
 'overallMotivation': '"for groundbreaking inventions in the field of laser physics"',

'year': '2018'}

가독성을 높이기 위해서 json.dumps(obj, indent=4) 를 사용해서 4칸 들여쓰기 (indentation)을 해보겠습니다.

print(json.dumps(novel_prize_json['prizes'][0], indent=4))

{
    "laureates": [
        {
            "share": "2",
            "id": "960",
            "surname": "Ashkin",
            "motivation": "\"for the optical tweezers and their application to biological systems\"",
            "firstname": "Arthur"
        },
        {
            "share": "4",
            "id": "961",
            "surname": "Mourou",
            "motivation": "\"for their method of generating high-intensity, ultra-short optical pulses\"",
            "firstname": "G\u00e9rard"
        },
        {
            "share": "4",
            "id": "962",
            "surname": "Strickland",
            "motivation": "\"for their method of generating high-intensity, ultra-short optical pulses\"",
            "firstname": "Donna"
        }
    ],
    "year": "2018",
    "overallMotivation": "\"for groundbreaking inventions in the field of laser physics\"",
    "category": "physics"
}

키(keys)를 기준으로 정렬하는 것까지 포함해서 다시 한번 프린트를 해보겠습니다.

print(json.dumps(novel_prize_json['prizes'][0], indent=4, sort_keys=True))

{
    "category": "physics",
    "laureates": [
        {
            "firstname": "Arthur",
            "id": "960",
            "motivation": "\"for the optical tweezers and their application to biological systems\"",
            "share": "2",
            "surname": "Ashkin"
        },
        {
            "firstname": "G\u00e9rard",
            "id": "961",
            "motivation": "\"for their method of generating high-intensity, ultra-short optical pulses\"",
            "share": "4",
            "surname": "Mourou"
        },
        {
            "firstname": "Donna",
            "id": "962",
            "motivation": "\"for their method of generating high-intensity, ultra-short optical pulses\"",
            "share": "4",
            "surname": "Strickland"
        }
    ],
    "overallMotivation": "\"for groundbreaking inventions in the field of laser physics\"",
    "year": "2018"
}

(2) JSON 포맷 데이터를 pandas DataFrame으로 만들기

다음으로 Python으로 불러온 JSON 포맷의 데이터 중의 일부분을 indexing하여 pandas DataFrame으로 만들어보겠습니다.

예로, ['prizes'][0] 은 물리학(physics) 노벨상이며, ['prizes'][0]['laureates'] 로 물리학 노벨상 수상자 정보만 선별해서 pd.DataFrame() 으로 DataFrame을 만들어보겠습니다.

novel_prize_physics = pd.DataFrame(novel_prize_json['prizes'][0]["laureates"])

novel_prize_physics

	firstname	id	motivation	share	surname
0	Arthur	960	"for the optical tweezers and their applicatio...	2	Ashkin
1	Gérard	961	"for their method of generating high-intensity...	4	Mourou
2	Donna	962	"for their method of generating high-intensity...	4	Strickland

DataFrame을 만들 때 칼럼 순서를 columns 로 지정을 해줄 수 있습니다.

novel_prize_physics = pd.DataFrame(novel_prize_json['prizes'][0]["laureates"],

columns = ['id', 'firstname', 'surname', 'share', 'motivation'])

novel_prize_physics

	id	firstname	surname	share	motivation
0	960	Arthur	Ashkin	2	"for the optical tweezers and their applicatio...
1	961	Gérard	Mourou	4	"for their method of generating high-intensity...
2	962	Donna	Strickland	4	"for their method of generating high-intensity...

많은 도움이 되었기를 바랍니다 .

다음 포스팅에서는 웹에서 XML 포맷 데이터를 Python으로 읽어와서 pandas DataFrame으로 만드는 방법을 소개하겠습니다.

이번 포스팅이 도움이 되었다면 아래의 '공감~'를 꾹 눌러주세요.

728x90

저작자표시 비영리 변경금지

'Python 분석과 프로그래밍 > Python 데이터 전처리' 카테고리의 다른 글

[Python] pandas DataFrame: ValueError: If using all scalar values, you must pass an index 에러 해결 방법 (0)	2019.09.15
[Python] 웹에서 XML 포맷 데이터를 Python으로 읽어와서 DataFrame으로 만들기 (18)	2019.09.04
[Python] Python으로 JSON 데이터 읽고 쓰기 (Read and Write JSON data by Python) (4)	2019.08.31
[Python] 사전 자료형의 키, 값 기준으로 정렬하기 (sort a Dictionary by key, value) (0)	2019.08.28
[Python pandas] 한개의 문자열 칼럼을 구분자로 나누어서 여러개의 칼럼을 가진 DataFrame 만들기 (0)	2019.08.25

Posted by Rfriend

이전 1 다음

R, Python 분석과 프로그래밍의 친구 (by R Friend)

'not 'builtin_function_or_method''에 해당되는 글 1건

[Python] 웹으로 부터 JSON 포맷 데이터 읽어와서 pandas DataFrame으로 만들기

'Python 분석과 프로그래밍 > Python 데이터 전처리' 카테고리의 다른 글

카테고리

태그목록

티스토리툴바