Hive Data Manipulation Language (DML)

Hive 2016. 6. 25. 20:10

This post looks at the Hive data manipulation language (HiveQL DML) statements used for tasks such as:

- (1) Loading data into a table

- (2) Inserting a query result into a table

- (3) Inserting data with dynamic partitioning

- (4) Creating a table and loading data in one statement

- (5) Exporting data to the local filesystem

- (6) Writing data to several directories at once

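-- Loading data into a managed table
-- LOCAL copies the files from the local filesystem;
-- OVERWRITE replaces whatever the partition already contains
LOAD DATA LOCAL INPATH '${env:HOME}/2016-06'
OVERWRITE INTO TABLE my_table
PARTITION (year = '2016', month = '6');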
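
LOAD DATA also works without the LOCAL keyword. As a minimal sketch, assuming the source files already sit in HDFS under a hypothetical directory `/user/hive/staging/2016-06`, the statement below moves (rather than copies) them into the partition directory. Leaving out OVERWRITE keeps whatever the partition already holds.

```sql
-- Assumption: '/user/hive/staging/2016-06' is a hypothetical HDFS directory
-- whose files already match my_table's storage format.
-- Without LOCAL, Hive moves the files from the HDFS source path.
-- Without OVERWRITE, existing files in the partition are kept.
LOAD DATA INPATH '/user/hive/staging/2016-06'
INTO TABLE my_table
PARTITION (year = '2016', month = '6');
```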

-- Inserting data from a query result

-- INTO: append rows to the partition
-- (SELECT * must return exactly my_table's non-partition columns;
--  list the columns explicitly if another_table has extra ones)
INSERT INTO TABLE my_table
PARTITION (year = '2016', month = '6')
SELECT * FROM another_table a
WHERE a.year = '2016' AND a.month = '6';

-- OVERWRITE: replace the partition's existing rows
INSERT OVERWRITE TABLE my_table
PARTITION (year = '2016', month = '6')
SELECT * FROM another_table a
WHERE a.year = '2016' AND a.month = '6';

-- Scan the source table once and overwrite several partitions
FROM another_table a
INSERT OVERWRITE TABLE my_table
    PARTITION (year = '2016', month = '6')
    SELECT * WHERE a.year = '2016' AND a.month = '6'
INSERT OVERWRITE TABLE my_table
    PARTITION (year = '2016', month = '7')
    SELECT * WHERE a.year = '2016' AND a.month = '7'
INSERT OVERWRITE TABLE my_table
    PARTITION (year = '2016', month = '8')
    SELECT * WHERE a.year = '2016' AND a.month = '8';

-- Dynamic partition insert
-- Enable dynamic partitioning
SET hive.exec.dynamic.partition = true;
-- nonstrict: allow every partition column to be dynamic
SET hive.exec.dynamic.partition.mode = nonstrict;
-- Maximum number of dynamic partitions each mapper or reducer may create
SET hive.exec.max.dynamic.partitions.pernode = 500;

-- Dynamic partition columns go last in the SELECT list, in PARTITION-clause order
INSERT OVERWRITE TABLE my_table
PARTITION (year, month)
SELECT ..., a.year, a.month
FROM another_table a;
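
After a dynamic partition insert it can be useful to confirm which partitions Hive actually created. A small check, using the same my_table as above, might look like this:

```sql
-- List every partition of my_table
SHOW PARTITIONS my_table;

-- Restrict the listing to one year by giving a partial partition spec
SHOW PARTITIONS my_table PARTITION (year = '2016');
```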

-- Partial dynamic partition insert
-- Static partition keys must come before dynamic ones.
-- Only the dynamic column (month) is selected; year is fixed in the PARTITION clause.
INSERT OVERWRITE TABLE my_table
PARTITION (year = '2016', month)
SELECT ..., a.month
FROM another_table a
WHERE a.year = '2016';

-- Creating a table and loading data in one statement (CTAS); handy when building analysis marts
-- The new table takes its columns from the SELECT list
CREATE TABLE my_table_201606
AS SELECT var_1, var_2, var_5
FROM my_table
WHERE year = '2016' AND month = '6';
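
A CTAS statement can also name the storage format of the new table. The sketch below assumes the ORC format is available on the cluster (Hive 0.11 or later) and uses a hypothetical table name, `my_table_201606_orc`, that does not appear elsewhere in this post.

```sql
-- Assumption: my_table_201606_orc is a hypothetical table name,
-- and the ORC file format (Hive 0.11+) is available.
CREATE TABLE my_table_201606_orc
STORED AS ORC
AS SELECT var_1, var_2, var_5
FROM my_table
WHERE year = '2016' AND month = '6';
```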

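-- Exporting data to the local filesystem

-- (1) Copy the data files as they are (run from a shell, not the Hive prompt)
-- -cp copies between paths Hadoop can address;
-- use `hadoop fs -get` instead to pull HDFS files onto the local disk
hadoop fs -cp source_path target_path

-- (2) INSERT OVERWRITE LOCAL DIRECTORY ...
INSERT OVERWRITE LOCAL DIRECTORY '/tmp/my_table_201606'
SELECT var_1, var_2, var_5
FROM my_table
WHERE year = '2016' AND month = '6';

-- Check the exported files from the Hive CLI
-- List the output files
hive> !ls /tmp/my_table_201606;
-- Print the contents of the first output file
hive> !cat /tmp/my_table_201606/000000_0;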
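
By default the exported files use Hive's default field delimiter (Ctrl-A, `\001`), which is awkward to read in other tools. As a sketch, assuming Hive 0.11 or later, a row format can be given with the export so that the output is comma-separated text:

```sql
-- Assumption: Hive 0.11+ (ROW FORMAT is accepted on directory exports from that version)
INSERT OVERWRITE LOCAL DIRECTORY '/tmp/my_table_201606'
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
SELECT var_1, var_2, var_5
FROM my_table
WHERE year = '2016' AND month = '6';
```

A comma delimiter is only safe when the selected columns contain no commas themselves; a tab (`'\t'`) is a common alternative.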

-- Writing query results to several directories in a single pass
FROM my_table a
INSERT OVERWRITE DIRECTORY '/tmp/my_table_201606'
     SELECT * WHERE year = '2016' AND month = '6'
INSERT OVERWRITE DIRECTORY '/tmp/my_table_201607'
     SELECT * WHERE year = '2016' AND month = '7'
INSERT OVERWRITE DIRECTORY '/tmp/my_table_201608'
     SELECT * WHERE year = '2016' AND month = '8';

[Reference]

- Programming Hive, Edward Capriolo, Dean Wampler, Jason Rutherglen, O'Reilly


License: Attribution, NonCommercial, NoDerivatives (CC BY-NC-ND)


Posted by Rfriend
