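In this post, I walk through the full procedure of pulling a base Greenplum Database (hereafter GPDB) Docker image, installing MADlib, PL/R, and PL/Python on top of it for large-scale statistical analysis and machine learning, building a new Docker image from the result, and pushing that image to Docker Hub.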


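This post covers how to set up a GPDB-based analytics environment with Docker, so it should be most useful to readers interested in data engineering. If you would rather go straight to data analysis, see the next post (http://rfriend.tistory.com/379). You can skip this post entirely and still use the Docker-based GPDB analytics environment right away.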



[ Procedure: install MADlib, PL/R, and PL/Python on the base GPDB Docker image, build a new gpdb-analytics Docker image, and push it to Docker Hub ]






0. Sign up for Docker Hub and download/install Docker Desktop




Windows users should download and install either 'Docker CE for Windows' or 'Docker Toolbox', whichever matches their Windows version.





1. Pull the base GPDB image from Docker Hub and use it

 

<In a terminal>

1-1. Download the Docker image



docker pull hdlee2u/gpdb-base

docker images

REPOSITORY              TAG                 IMAGE ID            CREATED             SIZE

centos                  7                   d123f4e55e12        9 months ago        197MB

hdlee2u/gpdb-base      latest              bfe4e63b8e81        2 years ago         1.17GB

 




1-2. Run the Docker image, publishing the database on port 5432 and SSH on host port 2022



docker run -i -d -p 5432:5432 -p 2022:22 --name gpdb  hdlee2u/gpdb-base

docker ps -a
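
The `-p 5432:5432 -p 2022:22` flags above forward host port 5432 to the database port and host port 2022 to port 22 (sshd) inside the container. As a rough sketch of how a client on the host could then reach the container, the snippet below only prints the commands. The `gpadmin` user and database names come from the psql session later in this post; whether the image accepts password or key login for that user is an assumption I have not verified.

```shell
#!/usr/bin/env bash
# Host-side ports published by `docker run -p 5432:5432 -p 2022:22`.
DB_PORT=5432     # host port forwarded to the database port in the container
SSH_PORT=2022    # host port forwarded to port 22 (sshd) in the container

# Print (do not execute) the client commands for reaching the container.
# Assumption: the image allows logins as gpadmin; credentials are not shown in this post.
connect_cmds() {
  echo "psql -h localhost -p ${DB_PORT} -U gpadmin -d gpadmin"
  echo "ssh -p ${SSH_PORT} gpadmin@localhost"
}

connect_cmds
```

If another PostgreSQL instance already occupies port 5432 on the host, changing only the left-hand side of the mapping (for example `-p 15432:5432`) and `DB_PORT` should be enough.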

 




1-3. Stop and restart the container, then check that it is running



docker stop gpdb

docker start gpdb

docker ps

CONTAINER ID    IMAGE                   COMMAND                   CREATED             STATUS              PORTS                   NAMES

4e6b947be0d6    hdlee2u/gpdb-base   "/bin/sh -c 'echo \"1…"   29 hours ago        Up 3 hours        0.0.0.0:5432->5432/tcp, 0.0.0.0:2022->22/tcp   gpdb


 




2. Install MADlib on the Docker GPDB

 

2-1. Check the GPDB version (in pgAdmin III)



# select version()

 

PostgreSQL 8.2.15 (Greenplum Database 4.3.7.1 build 1) on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.4.2 compiled on Jan 21 2016 15:51:02

 




2-2. Download MADlib from Pivotal Network (https://network.pivotal.io) (registration required)


Under the Pivotal Greenplum Releases: 4.3.18.0 category

>> Select Greenplum Advanced Analytics

>> Download MADlib 1.12

(https://network.pivotal.io/products/pivotal-gpdb/#/releases/8538/file_groups/764)

 






2-3. Locate the downloaded MADlib file and create a /setup directory in the container



cd Downloads/

ls

madlib-1.12-gp4.3-rhel5-x86_64.tar.gz

 



docker exec -i -t gpdb /bin/bash

[root@4e6b947be0d6 ~]# mkdir /setup

 

[Note] To move files that are already inside the container:

[root@4e6b947be0d6 ~]# mv mad* /setup/

 




2-4. With the /bin/bash process running inside the Docker container, copy the MADlib file into the Docker GPDB

 

From the Downloads directory in another terminal:


docker cp madlib-1.12-gp4.3-rhel5-x86_64.tar.gz gpdb:/setup/
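
The same copy-then-chown pattern is repeated for every package in this post, so it could be wrapped in a small helper. The following is only a sketch: `copy_packages`, `run`, and the `DRY_RUN` switch are names I made up for illustration, and by default nothing is executed, the docker commands are just printed. It assumes the container is named `gpdb` and that `docker exec` runs as root, as in the transcript.

```shell
#!/usr/bin/env bash
# Copy each package into the container's /setup directory, then hand ownership to gpadmin.
# DRY_RUN=1 (the default here) only prints the docker commands; set DRY_RUN=0 to execute them.
DRY_RUN="${DRY_RUN:-1}"
CONTAINER="gpdb"   # container name from `docker run --name gpdb`

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

copy_packages() {
  local f
  for f in "$@"; do
    run docker cp "$f" "${CONTAINER}:/setup/"
  done
  run docker exec "$CONTAINER" chown -R gpadmin:gpadmin /setup
}

copy_packages madlib-1.12-gp4.3-rhel5-x86_64.tar.gz
```

Doing the chown through `docker exec` from the host avoids switching back to the container terminal after every copy.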




 




2-5. Grant ownership to gpadmin

 


[root@4e6b947be0d6 ~]# chown -R gpadmin:gpadmin /setup

[root@4e6b947be0d6 ~]# cd /setup

[root@4e6b947be0d6 setup]# ls -la

total 7492

drwxr-xr-x 2 gpadmin gpadmin    4096 Aug  8 07:44 .

drwxr-xr-x 1 root    root       4096 Aug  8 07:44 ..

-rw-r--r-- 1 gpadmin gpadmin 3932160 Aug  8 05:10 madlib-1.12-gp4.3-rhel5-x86_64.tar

-rw-r--r-- 1 gpadmin gpadmin 3728292 Aug 29  2017 madlib-ossv1.12_pv1.9.9_gpdb4.3orca-rhel5-x86_64.gppkg

 




2-6. Extract the MADlib archive and install it



[root@4e6b947be0d6 setup]# su - gpadmin

-bash-4.1$ cd /setup

-bash-4.1$ ls -la

total 7492

drwxr-xr-x 2 gpadmin gpadmin    4096 Aug  8 07:44 .

drwxr-xr-x 1 root    root       4096 Aug  8 07:44 ..

-rw-r--r-- 1 gpadmin gpadmin 3932160 Aug  8 05:10 madlib-1.12-gp4.3-rhel5-x86_64.tar

 

-bash-4.1$ tar xf madlib-1.12-gp4.3-rhel5-x86_64.tar

-bash-4.1$ ls -la

total 7692

drwxr-xr-x 2 gpadmin gpadmin    4096 Aug  8 07:45 .

drwxr-xr-x 1 root    root       4096 Aug  8 07:44 ..

-rw-r--r-- 1 gpadmin gpadmin 3932160 Aug  8 05:10 madlib-1.12-gp4.3-rhel5-x86_64.tar

-rw-r--r-- 1 gpadmin gpadmin 3728292 Aug 29  2017 madlib-ossv1.12_pv1.9.9_gpdb4.3orca-rhel5-x86_64.gppkg

-rw-r--r-- 1 gpadmin gpadmin  135530 Aug 29  2017 open_source_license_MADlib_1.12_GA.txt

-rw-r--r-- 1 gpadmin gpadmin   51900 Aug 29  2017 ReleaseNotes.txt

 

-bash-4.1$ gppkg -i madlib-ossv1.12_pv1.9.9_gpdb4.3orca-rhel5-x86_64.gppkg

20180808:07:47:11:000512 gppkg:4e6b947be0d6:gpadmin-[INFO]:-Starting gppkg with args: -i madlib-ossv1.12_pv1.9.9_gpdb4.3orca-rhel5-x86_64.gppkg

20180808:07:47:11:000512 gppkg:4e6b947be0d6:gpadmin-[INFO]:-Installing package madlib-ossv1.12_pv1.9.9_gpdb4.3orca-rhel5-x86_64.gppkg

20180808:07:47:11:000512 gppkg:4e6b947be0d6:gpadmin-[INFO]:-Validating rpm installation cmdStr='rpm --test -i /usr/local/greenplum-db-4.3.7.1/.tmp/madlib-1.12-1.x86_64.rpm --dbpath /usr/local/greenplum-db-4.3.7.1/share/packages/database --prefix /usr/local/greenplum-db-4.3.7.1'

20180808:07:47:11:000512 gppkg:4e6b947be0d6:gpadmin-[INFO]:-Installing madlib-ossv1.12_pv1.9.9_gpdb4.3orca-rhel5-x86_64.gppkg locally

20180808:07:47:11:000512 gppkg:4e6b947be0d6:gpadmin-[INFO]:-Validating rpm installation cmdStr='rpm --test -i /usr/local/greenplum-db-4.3.7.1/.tmp/madlib-1.12-1.x86_64.rpm --dbpath /usr/local/greenplum-db-4.3.7.1/share/packages/database --prefix /usr/local/greenplum-db-4.3.7.1'

20180808:07:47:11:000512 gppkg:4e6b947be0d6:gpadmin-[INFO]:-Installing rpms cmdStr='rpm -i /usr/local/greenplum-db-4.3.7.1/.tmp/madlib-1.12-1.x86_64.rpm --dbpath /usr/local/greenplum-db-4.3.7.1/share/packages/database --prefix=/usr/local/greenplum-db-4.3.7.1'

20180808:07:47:12:000512 gppkg:4e6b947be0d6:gpadmin-[INFO]:-Completed local installation of madlib-ossv1.12_pv1.9.9_gpdb4.3orca-rhel5-x86_64.gppkg.

20180808:07:47:12:000512 gppkg:4e6b947be0d6:gpadmin-[INFO]:-Please run the following command to deploy MADlib

usage:  madpack install [-s schema_name] -p greenplum -c user@host:port/database

Example:

       $ $GPHOME/madlib/bin/madpack install -s madlib -p greenplum -c gpadmin@mdw:5432/testdb

       This will install MADlib objects into a Greenplum database named "testdb"

       running on server "mdw" on port 5432. Installer will try to login as "gpadmin"

       and will prompt for password. The target schema will be "madlib".

       To upgrade to a new version of MADlib from version v1.0 or later, use option "upgrade",

       instead of "install"

For additional options run:

$ madpack --help

Release notes and additional documentation can be found at http://madlib.apache.org

20180808:07:47:12:000512 gppkg:4e6b947be0d6:gpadmin-[INFO]:-madlib-ossv1.12_pv1.9.9_gpdb4.3orca-rhel5-x86_64.gppkg successfully installed.

-bash-4.1$  madpack install ^C

-bash-4.1$ hostname

4e6b947be0d6

-bash-4.1$ cat /etc/hosts

127.0.0.1       localhost

::1      localhost ip6-localhost ip6-loopback

fe00::0 ip6-localnet

ff00::0 ip6-mcastprefix

ff02::1 ip6-allnodes

ff02::2 ip6-allrouters

172.17.0.2     4e6b947be0d6

127.0.0.1 72ba20be3774

 




Check the database list


 

-bash-4.1$ psql

psql (8.2.15)

Type "help" for help.

 

gpadmin=# \l

                  List of databases

   Name    |  Owner  | Encoding |  Access privileges 

-----------+---------+----------+---------------------

 gpadmin   | gpadmin | UTF8     |

 postgres  | gpadmin | UTF8     |

 template0 | gpadmin | UTF8     | =c/gpadmin         

                                : gpadmin=CTc/gpadmin

 template1 | gpadmin | UTF8     | =c/gpadmin         

                                : gpadmin=CTc/gpadmin

(4 rows)

 

gpadmin=# \q




-bash-4.1$ $GPHOME/madlib/bin/madpack install -s madlib -p greenplum -c gpadmin@127.0.0.1:5432/gpadmin

madpack.py : INFO : Detected Greenplum DB version 4.3.7.

madpack.py : INFO : *** Installing MADlib ***

madpack.py : INFO : MADlib tools version    = 1.12 (/usr/local/greenplum-db-4.3.7.1/madlib/Versions/1.12/bin/../madpack/madpack.py)

madpack.py : INFO : MADlib database version = None (host=127.0.0.1:5432, db=gpadmin, schema=madlib)

madpack.py : INFO : Testing PL/Python environment...

madpack.py : INFO : > Creating language PL/Python...

madpack.py : INFO : > PL/Python environment OK (version: 2.6.2)

madpack.py : INFO : Installing MADlib into MADLIB schema...

madpack.py : INFO : > Creating MADLIB schema

madpack.py : INFO : > Creating MADLIB.MigrationHistory table

madpack.py : INFO : > Writing version info in MigrationHistory table

madpack.py : INFO : > Creating objects for modules:

madpack.py : INFO : > - array_ops

madpack.py : INFO : > - bayes

madpack.py : INFO : > - crf

madpack.py : INFO : > - elastic_net

madpack.py : INFO : > - linalg

madpack.py : INFO : > - pmml

madpack.py : INFO : > - prob

madpack.py : INFO : > - sketch

madpack.py : INFO : > - svec

madpack.py : INFO : > - svm

madpack.py : INFO : > - tsa

madpack.py : INFO : > - stemmer

madpack.py : INFO : > - conjugate_gradient

madpack.py : INFO : > - knn

madpack.py : INFO : > - lda

madpack.py : INFO : > - stats

madpack.py : INFO : > - svec_util

madpack.py : INFO : > - utilities

madpack.py : INFO : > - assoc_rules

madpack.py : INFO : > - convex

madpack.py : INFO : > - glm

madpack.py : INFO : > - graph

madpack.py : INFO : > - linear_systems

madpack.py : INFO : > - recursive_partitioning

madpack.py : INFO : > - regress

madpack.py : INFO : > - sample

madpack.py : INFO : > - summary

madpack.py : INFO : > - kmeans

madpack.py : INFO : > - pca

madpack.py : INFO : > - validation

madpack.py : INFO : MADlib 1.12 installed successfully in MADLIB schema.
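
The `-c` argument of madpack follows the `user@host:port/database` form shown in the gppkg usage message above. As a small illustration, the hypothetical helper `madpack_conn` below assembles that string from its four parts and prints the install command used in this post without running it.

```shell
#!/usr/bin/env bash
# Build the -c argument for madpack in the form user@host:port/database.
# madpack_conn is an illustrative helper, not part of MADlib.
madpack_conn() {
  local user="$1" host="$2" port="$3" db="$4"
  printf '%s@%s:%s/%s\n' "$user" "$host" "$port" "$db"
}

# Print the install command from this section; run it as gpadmin inside the container.
echo "\$GPHOME/madlib/bin/madpack install -s madlib -p greenplum -c $(madpack_conn gpadmin 127.0.0.1 5432 gpadmin)"
```

To install into a different database, only the last argument changes, for example `madpack_conn gpadmin 127.0.0.1 5432 testdb`. As far as I know, madpack also offers an `install-check` subcommand that takes the same `-p` and `-c` arguments; `madpack --help` lists the options available in your version.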

 



 

Check the schema list and the MADlib version



-bash-4.1$ psql

psql (8.2.15)

Type "help" for help.

 

gpadmin=# \dn

       List of schemas

        Name        |  Owner 

--------------------+---------

 gp_toolkit         | gpadmin

 information_schema | gpadmin

 madlib             | gpadmin

 pg_aoseg           | gpadmin

 pg_bitmapindex     | gpadmin

 pg_catalog         | gpadmin

 pg_toast           | gpadmin

 public             | gpadmin

(8 rows)

 

gpadmin=# select madlib.version();

MADlib version: 1.12, git revision: rc/1.12-rc1, cmake configuration time: Wed Aug 23 22:33:09 UTC 2017, build type: Release, build system: Linux-2.6.18-238.27.1.el5.hotfix.bz516490, C compiler: gcc 4.4.0, C++ compiler: g++ 4.4.0

(1 row)

 

---- MADlib installed successfully! ----

 

gpadmin=# \q

 




3. Install PL/R on the Docker GPDB

 

Check the CentOS version



-bash-4.1$ cat /etc/redhat-release

CentOS release 6.7 (Final)

 




3-1. Download PL/R from Pivotal Network

Under the Pivotal Greenplum Releases: 4.3.18.0 category

>> Greenplum Procedural Languages

>> Download PL/R for RHEL 6

(https://network.pivotal.io/products/pivotal-gpdb/#/releases/8538/file_groups/768)

 




3-2. (In another terminal) Copy PL/R to the /setup folder of the Docker GPDB



ihongdon-ui-MacBook-Pro:~ ihongdon$ cd Downloads/

ihongdon-ui-MacBook-Pro:Downloads ihongdon$ ls -la

total 90512

-rw-r--r--@  1 ihongdon  staff   3788987  8  8 14:10 madlib-1.12-gp4.3-rhel5-x86_64.tar.gz

-rw-r--r--@  1 ihongdon  staff  39640735  8  8 16:51 plr-2.3.1-GPDB4.3-rhel6-x86_64.gppkg

 

ihongdon-ui-MacBook-Pro:Downloads ihongdon$ docker cp plr-2.3.1-GPDB4.3-rhel6-x86_64.gppkg gpdb:/setup

 




3-3. (In the Docker terminal) Grant ownership and switch to gpadmin



[root@4e6b947be0d6 setup]# su - gpadmin

-bash-4.1$ ls

madlib-1.12-gp4.3-rhel5-x86_64.tar                     

open_source_license_MADlib_1.12_GA.txt  ReleaseNotes.txt

madlib-ossv1.12_pv1.9.9_gpdb4.3orca-rhel5-x86_64.gppkg 

plr-2.3.1-GPDB4.3-rhel6-x86_64.gppkg

-bash-4.1$ exit

logout

 

[root@4e6b947be0d6 setup]# chown gpadmin:gpadmin plr-2.3.1-GPDB4.3-rhel6-x86_64.gppkg

 

 [root@4e6b947be0d6 setup]# su - gpadmin

-bash-4.1$ cd /setup

-bash-4.1$ ls

madlib-1.12-gp4.3-rhel5-x86_64.tar                     

open_source_license_MADlib_1.12_GA.txt  ReleaseNotes.txt

madlib-ossv1.12_pv1.9.9_gpdb4.3orca-rhel5-x86_64.gppkg 

plr-2.3.1-GPDB4.3-rhel6-x86_64.gppkg

 

-bash-4.1$ ls -la

total 46404

drwxr-xr-x 2 gpadmin gpadmin     4096 Aug  8 07:52 .

drwxr-xr-x 1 root    root        4096 Aug  8 07:44 ..

-rw-r--r-- 1 gpadmin gpadmin  3932160 Aug  8 05:10 madlib-1.12-gp4.3-rhel5-x86_64.tar

-rw-r--r-- 1 gpadmin gpadmin      566 Aug 29  2017 ._madlib-ossv1.12_pv1.9.9_gpdb4.3orca-rhel5-x86_64.gppkg

-rw-r--r-- 1 gpadmin gpadmin  3728292 Aug 29  2017 madlib-ossv1.12_pv1.9.9_gpdb4.3orca-rhel5-x86_64.gppkg

-rw-r--r-- 1 gpadmin gpadmin     1362 Aug 29  2017 ._open_source_license_MADlib_1.12_GA.txt

-rw-r--r-- 1 gpadmin gpadmin   135530 Aug 29  2017 open_source_license_MADlib_1.12_GA.txt

-rw-r--r-- 1 gpadmin gpadmin 39640735 Aug  8 07:51 plr-2.3.1-GPDB4.3-rhel6-x86_64.gppkg

-rw-r--r-- 1 gpadmin gpadmin      594 Aug 29  2017 ._ReleaseNotes.txt

-rw-r--r-- 1 gpadmin gpadmin    51900 Aug 29  2017 ReleaseNotes.txt

 



 

3-4. Install PL/R



-bash-4.1$ gppkg -i plr-2.3.1-GPDB4.3-rhel6-x86_64.gppkg

20180808:07:53:41:006779 gppkg:4e6b947be0d6:gpadmin-[INFO]:-Starting gppkg with args: -i plr-2.3.1-GPDB4.3-rhel6-x86_64.gppkg

20180808:07:53:42:006779 gppkg:4e6b947be0d6:gpadmin-[INFO]:-Installing package plr-2.3.1-GPDB4.3-rhel6-x86_64.gppkg

20180808:07:53:42:006779 gppkg:4e6b947be0d6:gpadmin-[INFO]:-Validating rpm installation cmdStr='rpm --test -i /usr/local/greenplum-db-4.3.7.1/.tmp/R-3.3.3-1.x86_64.rpm /usr/local/greenplum-db-4.3.7.1/.tmp/plr-2.3-1.x86_64.rpm --dbpath /usr/local/greenplum-db-4.3.7.1/share/packages/database --prefix /usr/local/greenplum-db-4.3.7.1'

20180808:07:53:42:006779 gppkg:4e6b947be0d6:gpadmin-[INFO]:-Installing plr-2.3.1-GPDB4.3-rhel6-x86_64.gppkg locally

20180808:07:53:42:006779 gppkg:4e6b947be0d6:gpadmin-[INFO]:-Validating rpm installation cmdStr='rpm --test -i /usr/local/greenplum-db-4.3.7.1/.tmp/R-3.3.3-1.x86_64.rpm /usr/local/greenplum-db-4.3.7.1/.tmp/plr-2.3-1.x86_64.rpm --dbpath /usr/local/greenplum-db-4.3.7.1/share/packages/database --prefix /usr/local/greenplum-db-4.3.7.1'

20180808:07:53:42:006779 gppkg:4e6b947be0d6:gpadmin-[INFO]:-Installing rpms cmdStr='rpm -i /usr/local/greenplum-db-4.3.7.1/.tmp/R-3.3.3-1.x86_64.rpm /usr/local/greenplum-db-4.3.7.1/.tmp/plr-2.3-1.x86_64.rpm --dbpath /usr/local/greenplum-db-4.3.7.1/share/packages/database --prefix=/usr/local/greenplum-db-4.3.7.1'

20180808:07:53:45:006779 gppkg:4e6b947be0d6:gpadmin-[INFO]:-Completed local installation of plr-2.3.1-GPDB4.3-rhel6-x86_64.gppkg.

20180808:07:53:45:006779 gppkg:4e6b947be0d6:gpadmin-[INFO]:--

==========================================================================

PL/R installation is complete! To proceed, follow these steps:

1. Source your new $GPHOME/greenplum_path.sh file and restart the database.

2. Create PL/R language in the target database with "createlang plr -d mydatabase"

3. Optionally you can enable PL/R helper functions in this database by running

    "psql -f $GPHOME/share/postgresql/contrib/plr.sql -d mydatabase"

==========================================================================

20180808:07:53:45:006779 gppkg:4e6b947be0d6:gpadmin-[INFO]:-plr-2.3.1-GPDB4.3-rhel6-x86_64.gppkg successfully installed.

 




3-5. GPDB stop
 



-bash-4.1$ gpstop -af

20180808:07:54:30:006833 gpstop:4e6b947be0d6:gpadmin-[INFO]:-Starting gpstop with args: -af

20180808:07:54:30:006833 gpstop:4e6b947be0d6:gpadmin-[INFO]:-Gathering information and validating the environment...

20180808:07:54:30:006833 gpstop:4e6b947be0d6:gpadmin-[INFO]:-Obtaining Greenplum Master catalog information

20180808:07:54:30:006833 gpstop:4e6b947be0d6:gpadmin-[INFO]:-Obtaining Segment details from master...

20180808:07:54:30:006833 gpstop:4e6b947be0d6:gpadmin-[INFO]:-Greenplum Version: 'postgres (Greenplum Database) 4.3.7.1 build 1'

20180808:07:54:30:006833 gpstop:4e6b947be0d6:gpadmin-[INFO]:-There are 2 connections to the database

20180808:07:54:30:006833 gpstop:4e6b947be0d6:gpadmin-[INFO]:-Commencing Master instance shutdown with mode='fast'

20180808:07:54:30:006833 gpstop:4e6b947be0d6:gpadmin-[INFO]:-Master host=72ba20be3774

20180808:07:54:30:006833 gpstop:4e6b947be0d6:gpadmin-[INFO]:-Detected 2 connections to database

20180808:07:54:30:006833 gpstop:4e6b947be0d6:gpadmin-[INFO]:-Switching to WAIT mode

20180808:07:54:30:006833 gpstop:4e6b947be0d6:gpadmin-[INFO]:-Will wait for shutdown to complete, this may take some time if

20180808:07:54:30:006833 gpstop:4e6b947be0d6:gpadmin-[INFO]:-there are a large number of active complex transactions, please wait...

20180808:07:54:30:006833 gpstop:4e6b947be0d6:gpadmin-[INFO]:-Commencing Master instance shutdown with mode=fast

20180808:07:54:30:006833 gpstop:4e6b947be0d6:gpadmin-[INFO]:-Master segment instance directory=/gpdata/master/gpseg-1

20180808:07:54:31:006833 gpstop:4e6b947be0d6:gpadmin-[INFO]:-Attempting forceful termination of any leftover master process

20180808:07:54:31:006833 gpstop:4e6b947be0d6:gpadmin-[INFO]:-Terminating processes for segment /gpdata/master/gpseg-1

20180808:07:54:31:006833 gpstop:4e6b947be0d6:gpadmin-[INFO]:-No standby master host configured

20180808:07:54:31:006833 gpstop:4e6b947be0d6:gpadmin-[INFO]:-Commencing parallel segment instance shutdown, please wait...

20180808:07:54:31:006833 gpstop:4e6b947be0d6:gpadmin-[INFO]:-0.00% of jobs completed

20180808:07:54:41:006833 gpstop:4e6b947be0d6:gpadmin-[INFO]:-100.00% of jobs completed

20180808:07:54:41:006833 gpstop:4e6b947be0d6:gpadmin-[INFO]:-----------------------------------------------------

20180808:07:54:41:006833 gpstop:4e6b947be0d6:gpadmin-[INFO]:-   Segments stopped successfully      = 2

20180808:07:54:41:006833 gpstop:4e6b947be0d6:gpadmin-[INFO]:-   Segments with errors during stop   = 0

20180808:07:54:41:006833 gpstop:4e6b947be0d6:gpadmin-[INFO]:-----------------------------------------------------

20180808:07:54:41:006833 gpstop:4e6b947be0d6:gpadmin-[INFO]:-Successfully shutdown 2 of 2 segment instances

20180808:07:54:41:006833 gpstop:4e6b947be0d6:gpadmin-[INFO]:-Database successfully shutdown with no errors reported

20180808:07:54:41:006833 gpstop:4e6b947be0d6:gpadmin-[INFO]:-Cleaning up leftover gpmmon process

20180808:07:54:41:006833 gpstop:4e6b947be0d6:gpadmin-[INFO]:-No leftover gpmmon process found

20180808:07:54:41:006833 gpstop:4e6b947be0d6:gpadmin-[INFO]:-Cleaning up leftover gpsmon processes

20180808:07:54:41:006833 gpstop:4e6b947be0d6:gpadmin-[INFO]:-No leftover gpsmon processes on some hosts. not attempting forceful termination on these hosts

20180808:07:54:41:006833 gpstop:4e6b947be0d6:gpadmin-[INFO]:-Cleaning up leftover shared memory

 




3-6. Source your new $GPHOME/greenplum_path.sh file



-bash-4.1$ vi ~/.bash_profile

-bash-4.1$ source ~/.bash_profile

-bash-4.1$ env | grep R

MASTER_DATA_DIRECTORY=/gpdata/master/gpseg-1

TERM=xterm

USER=gpadmin

LD_LIBRARY_PATH=/usr/local/greenplum-db/./ext/R-3.3.3/lib:/usr/local/greenplum-db/./ext/R-3.3.3/extlib:/usr/local/greenplum-db/./lib:/usr/local/greenplum-db/./ext/python/lib:/usr/local/greenplum-db/./lib:/usr/local/greenplum-db/./ext/python/lib:

LS_COLORS=rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:mi=01;05;37;41:

su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arj=01;31:*.taz=01;31:*.lzh=01;31:

*.lzma=01;31:*.tlz=01;31:*.txz=01;31:*.zip=01;31:*.z=01;31:*.Z=01;31:*.dz=01;31:*.gz=01;31:*.lz=01;31:*.xz=01;31:*.bz2=01;31:

*.tbz=01;31:*.tbz2=01;31:*.bz=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.rar=01;31:*.ace=01;31:*.zoo=01;31:

*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.jpg=01;35:*.jpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:

*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.svgz=01;35:

*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.ogm=01;35:

*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:

*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.flv=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:

*.xwd=01;35:*.yuv=01;35:*.cgm=01;35:*.emf=01;35:*.axv=01;35:*.anx=01;35:*.ogv=01;35:*.ogx=01;35:

*.aac=01;36:*.au=01;36:*.flac=01;36:*.mid=01;36:*.midi=01;36:*.mka=01;36:*.mp3=01;36:*.mpc=01;36:

*.ogg=01;36:*.ra=01;36:*.wav=01;36:*.axa=01;36:*.oga=01;36:*.spx=01;36:*.xspf=01;36:

PATH=/usr/local/greenplum-db/./ext/R-3.3.3/bin:/usr/local/greenplum-db/./bin:/usr/local/greenplum-db/./ext/python/bin:/usr/local/greenplum-db/./bin:/usr/local/greenplum-db/./ext/python/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin

HISTCONTROL=ignoredups

R_HOME=/usr/local/greenplum-db/./ext/R-3.3.3

G_BROKEN_FILENAMES=1
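
The transcript does not show what was changed in ~/.bash_profile. One plausible edit, assuming the goal is simply to source greenplum_path.sh at every login, is to append the source line if it is not already there. The sketch below uses a hypothetical helper `ensure_line` and writes to a scratch file by default, so nothing real is modified when you try it. The path /usr/local/greenplum-db is inferred from the PATH value in the output above.

```shell
#!/usr/bin/env bash
# Append a line to a file only if that exact line is not already present.
# ensure_line is an illustrative helper, not part of Greenplum.
ensure_line() {
  local file="$1" line="$2"
  touch "$file"
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Scratch file by default; set PROFILE="$HOME/.bash_profile" to edit the real one.
PROFILE="${PROFILE:-$(mktemp)}"
ensure_line "$PROFILE" 'source /usr/local/greenplum-db/greenplum_path.sh'
ensure_line "$PROFILE" 'source /usr/local/greenplum-db/greenplum_path.sh'   # second call adds nothing

# Count matching lines to confirm the entry appears exactly once.
grep -c 'greenplum_path.sh' "$PROFILE"
```

Also note that `env | grep R` matches every variable containing a capital R, which is why LS_COLORS floods the output. A narrower filter such as `env | grep -E '^(R_HOME|PATH|LD_LIBRARY_PATH)='` shows only the variables that matter for PL/R.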

 




3-7. Create PL/R language in the target database with "createlang plr -d mydatabase" and restart the database.



-bash-4.1$ createlang plr -d gpadmin

createlang: could not connect to database gpadmin: could not connect to server: No such file or directory

         Is the server running locally and accepting

         connections on Unix domain socket "/tmp/.s.PGSQL.5432"?

(createlang fails here because GPDB is still stopped from step 3-5. Start the database with gpstart first, as shown next, and then re-run createlang plr -d gpadmin.)

 

-bash-4.1$ gpstart -a

20180808:07:56:17:006995 gpstart:4e6b947be0d6:gpadmin-[INFO]:-Starting gpstart with args: -a

20180808:07:56:17:006995 gpstart:4e6b947be0d6:gpadmin-[INFO]:-Gathering information and validating the environment...

20180808:07:56:17:006995 gpstart:4e6b947be0d6:gpadmin-[INFO]:-Greenplum Binary Version: 'postgres (Greenplum Database) 4.3.7.1 build 1'

20180808:07:56:17:006995 gpstart:4e6b947be0d6:gpadmin-[INFO]:-Greenplum Catalog Version: '201310150'

20180808:07:56:17:006995 gpstart:4e6b947be0d6:gpadmin-[INFO]:-Starting Master instance in admin mode

20180808:07:56:18:006995 gpstart:4e6b947be0d6:gpadmin-[INFO]:-Obtaining Greenplum Master catalog information

20180808:07:56:18:006995 gpstart:4e6b947be0d6:gpadmin-[INFO]:-Obtaining Segment details from master...

20180808:07:56:18:006995 gpstart:4e6b947be0d6:gpadmin-[INFO]:-Setting new master era

20180808:07:56:18:006995 gpstart:4e6b947be0d6:gpadmin-[INFO]:-Master Started...

20180808:07:56:18:006995 gpstart:4e6b947be0d6:gpadmin-[INFO]:-Shutting down master

20180808:07:56:19:006995 gpstart:4e6b947be0d6:gpadmin-[INFO]:-Commencing parallel segment instance startup, please wait...

..

20180808:07:56:21:006995 gpstart:4e6b947be0d6:gpadmin-[INFO]:-Process results...

20180808:07:56:21:006995 gpstart:4e6b947be0d6:gpadmin-[INFO]:-----------------------------------------------------

20180808:07:56:21:006995 gpstart:4e6b947be0d6:gpadmin-[INFO]:-   Successful segment starts                                            = 2

20180808:07:56:21:006995 gpstart:4e6b947be0d6:gpadmin-[INFO]:-   Failed segment starts                                                = 0

20180808:07:56:21:006995 gpstart:4e6b947be0d6:gpadmin-[INFO]:-   Skipped segment starts (segments are marked down in configuration)   = 0

20180808:07:56:21:006995 gpstart:4e6b947be0d6:gpadmin-[INFO]:-----------------------------------------------------

20180808:07:56:21:006995 gpstart:4e6b947be0d6:gpadmin-[INFO]:-

20180808:07:56:21:006995 gpstart:4e6b947be0d6:gpadmin-[INFO]:-Successfully started 2 of 2 segment instances

20180808:07:56:21:006995 gpstart:4e6b947be0d6:gpadmin-[INFO]:-----------------------------------------------------

20180808:07:56:21:006995 gpstart:4e6b947be0d6:gpadmin-[INFO]:-Starting Master instance 72ba20be3774 directory /gpdata/master/gpseg-1

20180808:07:56:22:006995 gpstart:4e6b947be0d6:gpadmin-[INFO]:-Command pg_ctl reports Master 72ba20be3774 instance active

20180808:07:56:23:006995 gpstart:4e6b947be0d6:gpadmin-[INFO]:-No standby master configured.  skipping...

20180808:07:56:23:006995 gpstart:4e6b947be0d6:gpadmin-[INFO]:-Database successfully started

 

-bash-4.1$ psql -f $GPHOME/share/postgresql/contrib/plr.sql -d gpadmin

SET

CREATE FUNCTION

CREATE FUNCTION

CREATE FUNCTION

REVOKE

CREATE FUNCTION

 

-bash-4.1$ exit

logout
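
To summarize the order that the PL/R installer message asks for, here is a dry-run sketch of the post-install steps, starting from a running database right after `gppkg -i`. The `step` and `plr_post_install` names and the `DRY_RUN` switch are my own illustration. By default the commands are only printed, and the point is that createlang comes after gpstart, not before.

```shell
#!/usr/bin/env bash
# Post-install order for PL/R, following the gppkg message: source the path file,
# restart the database, then create the language while the database is running.
# DRY_RUN=1 (the default) prints each step instead of executing it.
DRY_RUN="${DRY_RUN:-1}"
DB="gpadmin"   # target database used in this post

# Hypothetical helper: print the command in dry-run mode, otherwise evaluate it.
step() {
  if [ "$DRY_RUN" = "1" ]; then echo "$*"; else eval "$*"; fi
}

plr_post_install() {
  step 'source "$GPHOME/greenplum_path.sh"'
  step 'gpstop -af'
  step 'gpstart -a'
  step "createlang plr -d $DB"
  step "psql -f \"\$GPHOME/share/postgresql/contrib/plr.sql\" -d $DB"
}

plr_post_install
```

Run as gpadmin inside the container with DRY_RUN=0 to execute the steps for real.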

 



 

4. Install DataSciencePython (PL/Python) and DataScienceR

 

4-1. Download the Python and R Data Science packages from Pivotal Network

Under the Pivotal Greenplum Releases: 4.3.18.0 category

>> Greenplum Procedural Languages

>> Download Python Data Science Package for RHEL 6

>> Download R Data Science Package for RHEL 6

(https://network.pivotal.io/products/pivotal-gpdb/#/releases/8538/file_groups/768)

 


4-2. (In another terminal) Copy the files to the Docker GPDB



ihongdon-ui-MacBook-Pro:Downloads ihongdon$ ls -la

total 549520

-rw-r--r--@  1 ihongdon  staff  103557483  8  8 16:58 DataSciencePython-1.0.0-gp4-rhel6-x86_64.gppkg

-rw-r--r--@  1 ihongdon  staff  114736878  8  8 16:58 DataScienceR-1.0.0-gp4-rhel6-x86_64.gppkg

-rw-r--r--@  1 ihongdon  staff    3788987  8  8 14:10 madlib-1.12-gp4.3-rhel5-x86_64.tar.gz

-rw-r--r--@  1 ihongdon  staff   39640735  8  8 16:51 plr-2.3.1-GPDB4.3-rhel6-x86_64.gppkg

 

ihongdon-ui-MacBook-Pro:Downloads ihongdon$ docker cp DataSciencePython-1.0.0-gp4-rhel6-x86_64.gppkg gpdb:/setup

ihongdon-ui-MacBook-Pro:Downloads ihongdon$ docker cp DataScienceR-1.0.0-gp4-rhel6-x86_64.gppkg gpdb:/setup

 




4-3. (In the Docker terminal) Grant ownership and switch to gpadmin



[root@4e6b947be0d6 setup]# chown gpadmin:gpadmin *.gppkg

[root@4e6b947be0d6 setup]# su - gpadmin

-bash-4.1$ ls

gpAdminLogs

-bash-4.1$ cd  /setup

-bash-4.1$ ls

DataSciencePython-1.0.0-gp4-rhel6-x86_64.gppkg 

madlib-1.12-gp4.3-rhel5-x86_64.tar                     

open_source_license_MADlib_1.12_GA.txt 

ReleaseNotes.txt

DataScienceR-1.0.0-gp4-rhel6-x86_64.gppkg      

madlib-ossv1.12_pv1.9.9_gpdb4.3orca-rhel5-x86_64.gppkg 

plr-2.3.1-GPDB4.3-rhel6-x86_64.gppkg

 

-bash-4.1$ ls -la

total 259584

drwxr-xr-x 2 gpadmin gpadmin      4096 Aug  8 07:59 .

drwxr-xr-x 1 root    root         4096 Aug  8 07:44 ..

-rw-r--r-- 1 gpadmin gpadmin 103557483 Aug  8 07:58 DataSciencePython-1.0.0-gp4-rhel6-x86_64.gppkg

-rw-r--r-- 1 gpadmin gpadmin 114736878 Aug  8 07:58 DataScienceR-1.0.0-gp4-rhel6-x86_64.gppkg

-rw-r--r-- 1 gpadmin gpadmin   3932160 Aug  8 05:10 madlib-1.12-gp4.3-rhel5-x86_64.tar

-rw-r--r-- 1 gpadmin gpadmin       566 Aug 29  2017 ._madlib-ossv1.12_pv1.9.9_gpdb4.3orca-rhel5-x86_64.gppkg

-rw-r--r-- 1 gpadmin gpadmin   3728292 Aug 29  2017 madlib-ossv1.12_pv1.9.9_gpdb4.3orca-rhel5-x86_64.gppkg

-rw-r--r-- 1 gpadmin gpadmin      1362 Aug 29  2017 ._open_source_license_MADlib_1.12_GA.txt

-rw-r--r-- 1 gpadmin gpadmin    135530 Aug 29  2017 open_source_license_MADlib_1.12_GA.txt

-rw-r--r-- 1 gpadmin gpadmin  39640735 Aug  8 07:51 plr-2.3.1-GPDB4.3-rhel6-x86_64.gppkg

-rw-r--r-- 1 gpadmin gpadmin       594 Aug 29  2017 ._ReleaseNotes.txt

-rw-r--r-- 1 gpadmin gpadmin     51900 Aug 29  2017 ReleaseNotes.txt

 




4-4. Install DataSciencePython



-bash-4.1$ gppkg -i DataSciencePython-1.0.0-gp4-rhel6-x86_64.gppkg

20180808:08:00:58:007333 gppkg:4e6b947be0d6:gpadmin-[INFO]:-Starting gppkg with args: -i DataSciencePython-1.0.0-gp4-rhel6-x86_64.gppkg

20180808:08:00:59:007333 gppkg:4e6b947be0d6:gpadmin-[INFO]:-Installing package DataSciencePython-1.0.0-gp4-rhel6-x86_64.gppkg

20180808:08:00:59:007333 gppkg:4e6b947be0d6:gpadmin-[INFO]:-Validating rpm installation cmdStr='rpm --test -i /usr/local/greenplum-db-4.3.7.1/.tmp/DataSciencePython-1.0-0.x86_64.rpm --dbpath /usr/local/greenplum-db-4.3.7.1/share/packages/database --prefix /usr/local/greenplum-db-4.3.7.1'

20180808:08:00:59:007333 gppkg:4e6b947be0d6:gpadmin-[INFO]:-SQLITE3 Installed

20180808:08:01:00:007333 gppkg:4e6b947be0d6:gpadmin-[INFO]:-TK Not installed

20180808:08:01:00:007333 gppkg:4e6b947be0d6:gpadmin-[INFO]:-TCL Not installed

20180808:08:01:00:007333 gppkg:4e6b947be0d6:gpadmin-[INFO]:-Installing DataSciencePython-1.0.0-gp4-rhel6-x86_64.gppkg locally

20180808:08:01:00:007333 gppkg:4e6b947be0d6:gpadmin-[INFO]:-Validating rpm installation cmdStr='rpm --test -i /usr/local/greenplum-db-4.3.7.1/.tmp/DataSciencePython-1.0-0.x86_64.rpm --dbpath /usr/local/greenplum-db-4.3.7.1/share/packages/database --prefix /usr/local/greenplum-db-4.3.7.1'

20180808:08:01:01:007333 gppkg:4e6b947be0d6:gpadmin-[INFO]:-Installing rpms cmdStr='rpm -i /usr/local/greenplum-db-4.3.7.1/.tmp/DataSciencePython-1.0-0.x86_64.rpm --dbpath /usr/local/greenplum-db-4.3.7.1/share/packages/database --prefix=/usr/local/greenplum-db-4.3.7.1'

20180808:08:01:14:007333 gppkg:4e6b947be0d6:gpadmin-[INFO]:-Completed local installation of DataSciencePython-1.0.0-gp4-rhel6-x86_64.gppkg.

20180808:08:01:14:007333 gppkg:4e6b947be0d6:gpadmin-[INFO]:-Please source your $GPHOME/greenplum_path.sh file and restart the database.

20180808:08:01:14:007333 gppkg:4e6b947be0d6:gpadmin-[INFO]:-DataSciencePython-1.0.0-gp4-rhel6-x86_64.gppkg successfully installed.

 




4-5. Install DataScienceR



-bash-4.1$ gppkg -i DataScienceR-1.0.0-gp4-rhel6-x86_64.gppkg

20180808:08:01:48:007393 gppkg:4e6b947be0d6:gpadmin-[INFO]:-Starting gppkg with args: -i DataScienceR-1.0.0-gp4-rhel6-x86_64.gppkg

20180808:08:01:48:007393 gppkg:4e6b947be0d6:gpadmin-[INFO]:-Installing package DataScienceR-1.0.0-gp4-rhel6-x86_64.gppkg

20180808:08:01:49:007393 gppkg:4e6b947be0d6:gpadmin-[INFO]:-Validating rpm installation cmdStr='rpm --test -i /usr/local/greenplum-db-4.3.7.1/.tmp/DataScienceR-1.0-0.x86_64.rpm --dbpath /usr/local/greenplum-db-4.3.7.1/share/packages/database --prefix /usr/local/greenplum-db-4.3.7.1'

20180808:08:01:49:007393 gppkg:4e6b947be0d6:gpadmin-[INFO]:-Installing DataScienceR-1.0.0-gp4-rhel6-x86_64.gppkg locally

20180808:08:01:50:007393 gppkg:4e6b947be0d6:gpadmin-[INFO]:-Validating rpm installation cmdStr='rpm --test -i /usr/local/greenplum-db-4.3.7.1/.tmp/DataScienceR-1.0-0.x86_64.rpm --dbpath /usr/local/greenplum-db-4.3.7.1/share/packages/database --prefix /usr/local/greenplum-db-4.3.7.1'

20180808:08:01:50:007393 gppkg:4e6b947be0d6:gpadmin-[INFO]:-Installing rpms cmdStr='rpm -i /usr/local/greenplum-db-4.3.7.1/.tmp/DataScienceR-1.0-0.x86_64.rpm --dbpath /usr/local/greenplum-db-4.3.7.1/share/packages/database --prefix=/usr/local/greenplum-db-4.3.7.1'

20180808:08:02:05:007393 gppkg:4e6b947be0d6:gpadmin-[INFO]:-Completed local installation of DataScienceR-1.0.0-gp4-rhel6-x86_64.gppkg.

20180808:08:02:05:007393 gppkg:4e6b947be0d6:gpadmin-[INFO]:-Please source your $GPHOME/greenplum_path.sh file and restart the database.

20180808:08:02:05:007393 gppkg:4e6b947be0d6:gpadmin-[INFO]:-DataScienceR-1.0.0-gp4-rhel6-x86_64.gppkg successfully installed.
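
Steps 4-4 and 4-5 run the same `gppkg -i` command once per package, so they could be folded into a loop followed by a query of what is installed. This is a dry-run sketch with made-up helper names (`run`, `install_gppkgs`). It relies on `gppkg -q --all`, which to my knowledge lists the installed packages, so please confirm with `gppkg --help` on your version.

```shell
#!/usr/bin/env bash
# Install several .gppkg files in turn, then list the installed packages.
# DRY_RUN=1 (the default) only prints the commands; set DRY_RUN=0 to execute them.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
  if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi
}

install_gppkgs() {
  local pkg
  for pkg in "$@"; do
    run gppkg -i "$pkg"
  done
  run gppkg -q --all   # query all installed packages to confirm the result
}

install_gppkgs DataSciencePython-1.0.0-gp4-rhel6-x86_64.gppkg \
               DataScienceR-1.0.0-gp4-rhel6-x86_64.gppkg
```

As the installer messages say, the database still needs greenplum_path.sh sourced and a restart afterwards.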

 




4-6. Source the path settings


-bash-4.1$ source $GPHOME/greenplum_path.sh




4-7. GPDB stop
 



-bash-4.1$ gpstop -af

20180808:08:02:18:007430 gpstop:4e6b947be0d6:gpadmin-[INFO]:-Starting gpstop with args: -af

20180808:08:02:18:007430 gpstop:4e6b947be0d6:gpadmin-[INFO]:-Gathering information and validating the environment...

20180808:08:02:18:007430 gpstop:4e6b947be0d6:gpadmin-[INFO]:-Obtaining Greenplum Master catalog information

20180808:08:02:18:007430 gpstop:4e6b947be0d6:gpadmin-[INFO]:-Obtaining Segment details from master...

20180808:08:02:18:007430 gpstop:4e6b947be0d6:gpadmin-[INFO]:-Greenplum Version: 'postgres (Greenplum Database) 4.3.7.1 build 1'

20180808:08:02:18:007430 gpstop:4e6b947be0d6:gpadmin-[INFO]:-There are 0 connections to the database

20180808:08:02:18:007430 gpstop:4e6b947be0d6:gpadmin-[INFO]:-Commencing Master instance shutdown with mode='fast'

20180808:08:02:18:007430 gpstop:4e6b947be0d6:gpadmin-[INFO]:-Master host=72ba20be3774

20180808:08:02:18:007430 gpstop:4e6b947be0d6:gpadmin-[INFO]:-Detected 0 connections to database

20180808:08:02:18:007430 gpstop:4e6b947be0d6:gpadmin-[INFO]:-Using standard WAIT mode of 120 seconds

20180808:08:02:18:007430 gpstop:4e6b947be0d6:gpadmin-[INFO]:-Commencing Master instance shutdown with mode=fast

20180808:08:02:18:007430 gpstop:4e6b947be0d6:gpadmin-[INFO]:-Master segment instance directory=/gpdata/master/gpseg-1

20180808:08:02:19:007430 gpstop:4e6b947be0d6:gpadmin-[INFO]:-Attempting forceful termination of any leftover master process

20180808:08:02:20:007430 gpstop:4e6b947be0d6:gpadmin-[INFO]:-Terminating processes for segment /gpdata/master/gpseg-1

20180808:08:02:20:007430 gpstop:4e6b947be0d6:gpadmin-[INFO]:-No standby master host configured

20180808:08:02:20:007430 gpstop:4e6b947be0d6:gpadmin-[INFO]:-Commencing parallel segment instance shutdown, please wait...

20180808:08:02:20:007430 gpstop:4e6b947be0d6:gpadmin-[INFO]:-0.00% of jobs completed

20180808:08:02:30:007430 gpstop:4e6b947be0d6:gpadmin-[INFO]:-100.00% of jobs completed

20180808:08:02:30:007430 gpstop:4e6b947be0d6:gpadmin-[INFO]:-----------------------------------------------------

20180808:08:02:30:007430 gpstop:4e6b947be0d6:gpadmin-[INFO]:-   Segments stopped successfully      = 2

20180808:08:02:30:007430 gpstop:4e6b947be0d6:gpadmin-[INFO]:-   Segments with errors during stop   = 0

20180808:08:02:30:007430 gpstop:4e6b947be0d6:gpadmin-[INFO]:-----------------------------------------------------

20180808:08:02:30:007430 gpstop:4e6b947be0d6:gpadmin-[INFO]:-Successfully shutdown 2 of 2 segment instances

20180808:08:02:30:007430 gpstop:4e6b947be0d6:gpadmin-[INFO]:-Database successfully shutdown with no errors reported

20180808:08:02:30:007430 gpstop:4e6b947be0d6:gpadmin-[INFO]:-Cleaning up leftover gpmmon process

20180808:08:02:30:007430 gpstop:4e6b947be0d6:gpadmin-[INFO]:-No leftover gpmmon process found

20180808:08:02:30:007430 gpstop:4e6b947be0d6:gpadmin-[INFO]:-Cleaning up leftover gpsmon processes

20180808:08:02:30:007430 gpstop:4e6b947be0d6:gpadmin-[INFO]:-No leftover gpsmon processes on some hosts. not attempting forceful termination on these hosts

20180808:08:02:30:007430 gpstop:4e6b947be0d6:gpadmin-[INFO]:-Cleaning up leftover shared memory

 




4-8. GPDB restart
 



-bash-4.1$ gpstart -a

20180808:08:02:39:007584 gpstart:4e6b947be0d6:gpadmin-[INFO]:-Starting gpstart with args: -a

20180808:08:02:39:007584 gpstart:4e6b947be0d6:gpadmin-[INFO]:-Gathering information and validating the environment...

20180808:08:02:39:007584 gpstart:4e6b947be0d6:gpadmin-[INFO]:-Greenplum Binary Version: 'postgres (Greenplum Database) 4.3.7.1 build 1'

20180808:08:02:39:007584 gpstart:4e6b947be0d6:gpadmin-[INFO]:-Greenplum Catalog Version: '201310150'

20180808:08:02:39:007584 gpstart:4e6b947be0d6:gpadmin-[INFO]:-Starting Master instance in admin mode

20180808:08:02:40:007584 gpstart:4e6b947be0d6:gpadmin-[INFO]:-Obtaining Greenplum Master catalog information

20180808:08:02:40:007584 gpstart:4e6b947be0d6:gpadmin-[INFO]:-Obtaining Segment details from master...

20180808:08:02:40:007584 gpstart:4e6b947be0d6:gpadmin-[INFO]:-Setting new master era

20180808:08:02:40:007584 gpstart:4e6b947be0d6:gpadmin-[INFO]:-Master Started...

20180808:08:02:40:007584 gpstart:4e6b947be0d6:gpadmin-[INFO]:-Shutting down master

20180808:08:02:42:007584 gpstart:4e6b947be0d6:gpadmin-[INFO]:-Commencing parallel segment instance startup, please wait...

..

20180808:08:02:44:007584 gpstart:4e6b947be0d6:gpadmin-[INFO]:-Process results...

20180808:08:02:44:007584 gpstart:4e6b947be0d6:gpadmin-[INFO]:-----------------------------------------------------

20180808:08:02:44:007584 gpstart:4e6b947be0d6:gpadmin-[INFO]:-   Successful segment starts                                            = 2

20180808:08:02:44:007584 gpstart:4e6b947be0d6:gpadmin-[INFO]:-   Failed segment starts                                                = 0

20180808:08:02:44:007584 gpstart:4e6b947be0d6:gpadmin-[INFO]:-   Skipped segment starts (segments are marked down in configuration)   = 0

20180808:08:02:44:007584 gpstart:4e6b947be0d6:gpadmin-[INFO]:-----------------------------------------------------

20180808:08:02:44:007584 gpstart:4e6b947be0d6:gpadmin-[INFO]:-

20180808:08:02:44:007584 gpstart:4e6b947be0d6:gpadmin-[INFO]:-Successfully started 2 of 2 segment instances

20180808:08:02:44:007584 gpstart:4e6b947be0d6:gpadmin-[INFO]:-----------------------------------------------------

20180808:08:02:44:007584 gpstart:4e6b947be0d6:gpadmin-[INFO]:-Starting Master instance 72ba20be3774 directory /gpdata/master/gpseg-1

20180808:08:02:45:007584 gpstart:4e6b947be0d6:gpadmin-[INFO]:-Command pg_ctl reports Master 72ba20be3774 instance active

20180808:08:02:45:007584 gpstart:4e6b947be0d6:gpadmin-[INFO]:-No standby master configured.  skipping...

20180808:08:02:45:007584 gpstart:4e6b947be0d6:gpadmin-[INFO]:-Database successfully started

-bash-4.1$

-bash-4.1$

 



 

5. Create a Docker image and push it to Docker Hub

 

5-1. Create the Docker image and confirm that it was created



 # Create a Docker image: $ docker commit [OPTIONS] CONTAINER [REPOSITORY[:TAG]]


ihongdon-ui-MacBook-Pro:~ ihongdon$ docker commit \

-a "pivotal-data-science-holee" \

-m "GPDB 4.3.7 with MADlib, PL/R, PL/Python" \

gpdb \

gpdb-analytics:first

sha256:c1450e5e5a58f0518ce747a408846968b3a8d2746e6a147c1a3d8ce2910dfff1

ihongdon-ui-MacBook-Pro:~ ihongdon$

ihongdon-ui-MacBook-Pro:~ ihongdon$ docker images

REPOSITORY                         TAG                 IMAGE ID            CREATED             SIZE

gpdb-analytics                     first               c1450e5e5a58        3 minutes ago       2.83GB

centos                             7                   d123f4e55e12        9 months ago        197MB

pivotaldata/gpdb-base              latest              bfe4e63b8e81        2 years ago         1.17GB

ihongdon-ui-MacBook-Pro:~ ihongdon$

 


 

 

5-2. Add a tag to the Docker image and push it to Docker Hub



 # Add a tag: $ docker tag SOURCE_IMAGE[:TAG] TARGET_IMAGE[:TAG]


ihongdon-ui-MacBook-Pro:~ ihongdon$ docker tag gpdb-analytics:first hdlee2u/gpdb-analytics:latest

ihongdon-ui-MacBook-Pro:~ ihongdon$

ihongdon-ui-MacBook-Pro:~ ihongdon$ docker images

REPOSITORY                         TAG                 IMAGE ID            CREATED             SIZE

hdlee2u/gpdb-analytics             latest               c1450e5e5a58        About an hour ago   2.83GB

gpdb-analytics                     first               c1450e5e5a58        About an hour ago   2.83GB

centos                             7                   d123f4e55e12        9 months ago        197MB

pivotaldata/gpdb-base              latest              bfe4e63b8e81        2 years ago         1.17GB

 

# Log in to Docker Hub

ihongdon-ui-MacBook-Pro:~ ihongdon$ docker login

Authenticating with existing credentials...

Login Succeeded

 



 # Push the Docker image to Docker Hub: $ docker push [OPTIONS] NAME[:TAG]


ihongdon-ui-MacBook-Pro:~ ihongdon$ docker push hdlee2u/gpdb-analytics:latest

The push refers to repository [docker.io/hdlee2u/gpdb-analytics]

e603ef714b8e: Pushing [========>                                          ]  275.1MB/1.663GB

8c65eead2327: Mounted from hdlee2u/gpdb-base

516e5ab465c7: Mounted from hdlee2u/gpdb-base

5f70bf18a086: Mounted from hdlee2u/gpdb-base

69bd93b9db4e: Mounted from hdlee2u/gpdb-base





[ The gpdb-analytics Docker image as uploaded to https://hub.docker.com/r/hdlee2u/gpdb-analytics/ ]
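
The commit, tag, and push steps above can be collected into one small shell function. This is only a minimal sketch and not part of the original procedure: the function name publish_image and the DRY_RUN switch are hypothetical, while the container and image names are the ones used in this post. With DRY_RUN=1 it prints the docker commands instead of running them, which is a safe way to check the sequence first.

```shell
#!/bin/sh
# Hypothetical helper: commit a container, tag the image, and push it.
# With DRY_RUN=1 the docker commands are printed instead of executed.
publish_image() {
  container=$1     # e.g. gpdb
  local_image=$2   # e.g. gpdb-analytics:first
  remote_image=$3  # e.g. hdlee2u/gpdb-analytics:latest
  run=""
  if [ "${DRY_RUN:-0}" = "1" ]; then
    run="echo"
  fi
  # Stop at the first step that fails.
  $run docker commit "$container" "$local_image" &&
  $run docker tag "$local_image" "$remote_image" &&
  $run docker push "$remote_image"
}

# Dry run: show what would be executed without touching Docker.
plan=$(DRY_RUN=1 publish_image gpdb gpdb-analytics:first hdlee2u/gpdb-analytics:latest)
printf '%s\n' "$plan"
```

Dropping DRY_RUN=1 would run the three commands for real, so docker login has to succeed beforehand.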




The next post shows how to pull the gpdb-analytics Docker image from Docker Hub and use it.


I hope this was helpful.



Posted by Rfriend

This post introduces Greenplum Database, a data platform.


There are many databases in the world. Among them, Greenplum Database (GPDB for short) is an open-source RDBMS built on an MPP (Massively Parallel Processing) architecture.


The Greenplum Database community site describes Greenplum DB as "The World's First Open-Source & Massively Parallel Data Platform".



[ Greenplum Database, "The World's First Open-Source & Massively Parallel Data Platform" ]




Greenplum DB (hereafter GPDB) started in 2003 in San Jose, California, and was acquired by EMC in 2010. Since 2012 it has been called Pivotal Greenplum Database, and Pivotal Software sells and supports it as commercial enterprise software. The open-source community and Pivotal continue to upgrade it following Agile practices.


To explain why an MPP architecture is needed for big data analytics on tens to hundreds of terabytes, or even several petabytes, here is a brief comparison of the SMP (Symmetric Multi-Processing) and MPP (Massively Parallel Processing) architectures.



[ SMP(Symmetric Multi-Processing) vs. MPP(Massively Parallel Processing) ]





The SMP (Symmetric Multi-Processing) architecture is often called a "Shared Everything Architecture": all CPUs share a single memory and disk. This makes it easy to manage, but as the volume of data to manage and analyze grows, scaling up eventually hits a limit.


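By contrast, the MPP (Massively Parallel Processing) architecture is also called a "Shared Nothing Architecture": data is distributed across multiple disks, and each CPU is given its own memory and disk, which makes distributed parallel processing possible. To add capacity you scale out by adding more disk, memory, and CPU side by side and connecting them over the network, so in theory you can keep expanding. Because the work runs in parallel, processing speed ideally increases in proportion to the number of nodes. The drawbacks are that it is more complex, harder to manage, and more expensive than an SMP architecture.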
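
As a toy illustration of the shared-nothing idea, the sketch below spreads ten rows over two "segments" by taking the row key modulo the number of segments, so each segment holds and processes only its own share. The modulo rule and the variable names are assumptions made for this example; Greenplum's real distribution hash is more elaborate.

```shell
#!/bin/sh
# Toy illustration only: assign 10 row keys to 2 "segments" by key modulo.
# This is not Greenplum's actual hash function.
segments=2
distribution=$(seq 1 10 | awk -v n="$segments" '
  { count[$1 % n]++ }   # each key lands on exactly one segment
  END { for (s = 0; s < n; s++) printf "segment %d: %d rows\n", s, count[s] }')
printf '%s\n' "$distribution"
```

With an even spread like this, each segment scans only half of the rows, which is where the near-linear speedup of scale-out comes from.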


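Greenplum Database has the advantages of a PostgreSQL-based MPP architecture and, as open source, offers a relatively lower TCO (Total Cost of Ownership) and a faster pace of innovation than proprietary commercial databases. And because Pivotal Software sells it and provides maintenance support for enterprises, companies can adopt it without concern.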


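Pivotal (Pivotal Labs) was an early advocate of Agile software development and DevOps, and it is a software company that also develops and sells a cloud platform called PCF (Pivotal Cloud Foundry). It went public on the New York Stock Exchange in April 2018. Among software companies, Pivotal is one of the most active supporters of the open-source philosophy and participates as a committer.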



[ Pivotal Greenplum and Open Source Eco. ]



Real-time data (Fast Data) is collected and processed with tools such as Kafka, Spring Cloud Data Flow, RabbitMQ, redis, and GemFire, and these can be connected to Greenplum DB to collect and store the data.


Greenplum DB can also connect to Hadoop platforms such as Cloudera, Hortonworks, and MapR through a dedicated connector and load large volumes of data from the Hadoop file system very quickly, in parallel.



[ Greenplum architecture based on MPP (Massively Parallel Processing) ]





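In particular, for data analysts and data scientists who work with very large data, Greenplum DB supports data preprocessing, aggregation, and analysis in SQL. You can also install MADlib, which does big data machine learning in SQL, together with the many statistics and machine learning libraries of R and Python, and run analysis, model training, and scoring on large data in parallel. It also supports PL/x (Procedural Language) extensions that run languages such as R, Python, Java, C, and Perl in parallel inside the database, so you can program complex user-defined functions, which is a major attraction.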


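Typically, when analyzing data with R, Python, MATLAB, or SAS, a great deal of time goes into downloading data from the DB, loading it again, and moving the results back into the DB. This also adds complexity to system operations and makes management harder. In addition, R and Python can only analyze as much data as the disk and memory of the analysis machine can handle, so analysis is usually done on sampled data, which limits accuracy (and raises the risk of overfitting).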


With Greenplum Database, however, you can analyze very large data and train machine learning models in-database, using distributed parallel processing, without moving the data.



[ Greenplum: an in-database platform for integrated analysis of structured and unstructured data ]




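Greenplum DB also handles text data (with Solr-based GPText), geospatial data (with PostGIS), and graph data (with MADlib) in addition to structured data. Because all of these can be analyzed together with structured data, and used for machine learning, inside a single DB, it is a powerful next-generation analytics platform that gives analysts more freedom and can surface insights that were previously out of reach.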


Greenplum DB runs not only on-premise but also on virtualized private clouds and on public clouds (Amazon AWS, Google Cloud, Microsoft Azure). (Docker container-based Greenplum is expected in the second half of 2018.)



[ Greenplum runs on-premise, in virtualized environments, and in the cloud ]





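For companies, data scientists, and data engineers who need statistical analysis and machine learning across very large structured and unstructured data, Greenplum DB is a leading candidate for a next-generation analytics platform.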


The next post shows how to install MADlib, PL/R, and PL/Python on Greenplum DB, build a Docker image from it, and push it to Docker Hub.


I hope this was helpful.


[Reference site]

(1) https://pivotal.io/

(2) https://network.pivotal.io/

(3) https://greenplum.org/

(4) https://github.com/greenplum-db

(5) http://madlib.apache.org/

(6) https://gptext.docs.pivotal.io/

(7) Greenplum Database: The First Open Source Data Warehouse (https://www.youtube.com/watch?v=E11aE38YoQ0)


Finally, here is a YouTube talk by Ivan Novick, Greenplum DB product manager at Pivotal.




This post covers Linux regular expressions.


A regular expression is a special sequence of characters that helps with searching data and matching complex patterns. "Regular expression" is often shortened to 'regexp' or 'regex'.


Regular expressions can be used not only on Linux but also on Unix and in SQL, R, Python, and more. Combined with grep, introduced in the previous post, they make string pattern matching and searching very powerful.


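If you see someone else's regular expression for the first time without knowing the syntax, you may well think, 'What is this? Did a baby hammer randomly on the keyboard?' Once you know the syntax you can decode it, but until then it looks so odd that you wonder whether it is a programming language at all. ^^;  Either way, it is very useful for string pattern matching and searching.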



There are three types of regular expressions.


  • Basic Regular Expressions
  • Interval Regular Expressions
  • Extended Regular Expressions





The following text file is used in the examples.



MacBook-Pro:Documents rfriend$ cat mylove.txt

drum

photography

data science

greenplum

python

R

book

movie

dancing

singing

milk

english

gangnam style

new face

soccer

pingpong

sleeping

martial art

jogging

blogging

apple

grape

banana

tomato

bibimbab

kimchi

@email

123_abc_d4e5

xyz123_abc_d4e5

123_abc_d4e5.xyz

xyz123_abc_d4e5.xyz

 



First, let's look at basic regular expressions with examples.

The regular expression is written inside double quotes (" ") together with the characters to match.



1. Match the start of a line: ^


Note that -n prints the line number.



MacBook-Pro:Documents rfriend$ grep -n "^m" mylove.txt

8:movie

11:milk

18:martial art

 




2. Match the end of a line: $



MacBook-Pro:Documents rfriend$ grep -n "m$" mylove.txt

1:drum

4:greenplum

 




3. Match any single character, one per dot: .



MacBook-Pro:Documents rfriend$ grep -n "m..." mylove.txt

8:movie

11:milk

13:gangnam style

18:martial art

24:tomato

25:bibimbab

26:kimchi


MacBook-Pro:Documents rfriend$ grep -n "m......" mylove.txt

13:gangnam style

18:martial art


MacBook-Pro:Documents rfriend$ grep -n "..m..." mylove.txt

13:gangnam style

24:tomato

25:bibimbab

26:kimchi


MacBook-Pro:Documents rfriend$ grep -n "....m" mylove.txt

4:greenplum

13:gangnam style

25:bibimbab

 




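4. Match zero or more of the preceding character: *

In "app*" the * applies only to the second p, so the pattern matches "ap" followed by zero or more "p" characters. That is why photography and grape also match.


MacBook-Pro:Documents rfriend$ grep "app*" mylove.txt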

photography

apple

grape


MacBook-Pro:Documents rfriend$ grep "^app" mylove.txt

apple

 




5. Match a special character literally: \



MacBook-Pro:Documents rfriend$ grep "\@" mylove.txt

@email

 




6. Find and print every line that starts with a or b: ^[ab]



MacBook-Pro:Documents rfriend$ grep "^[ab]" mylove.txt

book

blogging

apple

banana

bibimbab

 




7. Lines that start with a digit from 0 to 9: ^[0-9]



MacBook-Pro:Documents rfriend$ grep "^[0-9]" mylove.txt

123_abc_d4e5

123_abc_d4e5.xyz

 




8. Lines that end with a letter from x to z: [x-z]$



MacBook-Pro:Documents rfriend$ grep "[x-z]$" mylove.txt

photography

123_abc_d4e5.xyz

xyz123_abc_d4e5.xyz
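
The basic patterns above can be tried on a throwaway file. The sketch below is a minimal example under stated assumptions: it uses only a subset of the words in mylove.txt, written to a temporary file, and the variable names are made up for illustration. It uses single quotes around the patterns, which keep the shell from interpreting $ or [ ] before grep sees them.

```shell
#!/bin/sh
# Build a small sample file (a subset of the words in mylove.txt).
sample=$(mktemp)
printf '%s\n' drum greenplum movie milk apple banana 123_abc_d4e5 > "$sample"

# -c prints the number of matching lines instead of the lines themselves.
starts_with_m=$(grep -c '^m' "$sample")        # movie, milk
ends_with_m=$(grep -c 'm$' "$sample")          # drum, greenplum
starts_with_digit=$(grep '^[0-9]' "$sample")   # 123_abc_d4e5
starts_with_a_or_b=$(grep '^[ab]' "$sample")   # apple, banana

echo "start with m: $starts_with_m, end with m: $ends_with_m"
echo "start with a digit: $starts_with_digit"
printf '%s\n' "$starts_with_a_or_b"
rm -f "$sample"
```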

 




The next example is an interval regular expression.

Interval regular expressions match on how many times a particular character occurs in a row.


9. Match the preceding character exactly n times: {n}


Use the form grep -E "character{n}".



MacBook-Pro:Documents rfriend$ grep "g" mylove.txt

photography

greenplum

dancing

singing

english

gangnam style

pingpong

sleeping

jogging

blogging

grape


MacBook-Pro:Documents rfriend$ grep -E "g{2}" mylove.txt

jogging

blogging
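
Intervals also combine with anchors and bracket expressions. The following is a small sketch using a hypothetical four-word list (three of the words come from mylove.txt); it is an illustration of the {n} syntax, not output from the original session.

```shell
#!/bin/sh
words=$(printf '%s\n' jogging blogging singing grape)

# Lines that contain two g's in a row.
double_g=$(printf '%s\n' "$words" | grep -E 'g{2}')
# Lines that consist of exactly five lowercase letters.
five_letters=$(printf '%s\n' "$words" | grep -E '^[a-z]{5}$')

printf '%s\n' "$double_g" "$five_letters"
```

Without the ^ and $ anchors, '[a-z]{5}' would match any line that contains five lowercase letters in a row, so all four words would be printed.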

 




Finally, extended regular expressions. Extended regular expressions let you combine more than one expression.



10. Match one or more occurrences of the preceding character: \+  (with grep -E, the same operator is written as a plain +)



MacBook-Pro:Documents rfriend$ grep "k" mylove.txt

book

milk

kimchi


-- To select only the lines where one or more 'o' appear immediately before 'k', use the regular expression "o\+k"

MacBook-Pro:Documents rfriend$ grep "o\+k" mylove.txt

book
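
With grep -E the extended operators need no backslash. The sketch below, again on a hypothetical word list borrowed from mylove.txt, shows + (one or more), | (alternation), and a repeated group; the variable names are made up for this example.

```shell
#!/bin/sh
words=$(printf '%s\n' book milk kimchi banana)

one_or_more_o=$(printf '%s\n' "$words" | grep -E 'o+k')        # book
either_word=$(printf '%s\n' "$words" | grep -E 'milk|kimchi')  # milk, kimchi
repeated_group=$(printf '%s\n' "$words" | grep -E '(an)+a$')   # banana

printf '%s\n' "$one_or_more_o" "$either_word" "$repeated_group"
```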

 



* Reference : https://www.guru99.com/linux-regular-expressions.html


I hope this was helpful.


This post shows various options for using the Linux grep command to find and print lines that match a pattern, in a single file or in multiple files in a directory. Combined with regular expressions, grep becomes a very powerful way to find patterns in files.



[ LINUX grep command: search for a string pattern and print the matches ]





First, I prepared a text file named demo_file.txt to use in the examples.



MacBook-Pro:~ rfriend$ ssh gpadmin@192.168.188.131

gpadmin@192.168.188.131's password: xxxxxxxx

Last login: Sat Dec  9 01:06:33 2017 from 192.168.188.1

[gpadmin@mdw ~]$ ls -al

total 544860

drwx------  8 gpadmin gpadmin      4096 2017-12-09 01:12 .

drwxr-xr-x. 4 root    root         4096 2017-11-08 20:04 ..

-rw-------  1 gpadmin gpadmin      4792 2017-12-09 01:10 .bash_history

-rw-r--r--  1 gpadmin gpadmin        18 2014-10-16 22:56 .bash_logout

-rw-r--r--  1 gpadmin gpadmin       398 2017-12-04 13:45 .bash_profile

-rw-r--r--  1 gpadmin gpadmin       124 2014-10-16 22:56 .bashrc

-rw-rw-r--  1 gpadmin gpadmin        28 2017-11-08 20:12 .gphostcache

drwxrwxr-x  2 gpadmin gpadmin      4096 2017-12-04 13:43 .oracle_jre_usage

-rw-------  1 gpadmin gpadmin        32 2017-11-08 20:21 .pgpass

-rw-rw-r--  1 gpadmin gpadmin         0 2017-11-08 20:21 .pgpass.1510140115

-rw-------  1 gpadmin gpadmin      2066 2017-12-09 00:21 .psql_history

drwx------  2 gpadmin gpadmin      4096 2017-11-08 20:05 .ssh

drwxrwxr-x  2 gpadmin gpadmin      4096 2017-12-06 20:10 command

-rw-r--r--  1 gpadmin gpadmin       233 2017-12-09 01:12 demo_file.txt

drwxrwxr-x  2 gpadmin gpadmin      4096 2017-12-08 09:45 gpAdminLogs

drwxr-xr-x  2 gpadmin gpadmin      4096 2017-11-08 20:14 gpconfigs

drwxrwxr-x  2 gpadmin gpadmin      4096 2017-12-07 20:29 gptext

-rwxr-xr-x  1 gpadmin gpadmin      2916 2017-12-04 13:42 gptext_install_config

-rwxr-xr-x  1 gpadmin gpadmin 192264305 2017-09-19 09:20 greenplum-text-2.1.3-rhel6_x86_64.bin

-rw-r--r--  1 gpadmin gpadmin 191435368 2017-12-04 12:36 greenplum-text-2.1.3-rhel6_x86_64.tar.gz

-rw-r--r--  1 gpadmin gpadmin 174157387 2017-12-04 13:28 jdk-8u161-linux-x64.rpm


[gpadmin@mdw ~]$ cat demo_file.txt

THIS LINE IS THE 1ST UPPER CASE LINE IN THIS FILE.

this line is the 1st lower case line in this file.

This Line Has All Its First Character Of The Word With Upper Case.


Two lines above this line is empty.

And this is the last line.


[gpadmin@mdw ~]$ 

 



The basic syntax of grep is as follows.


  grep [OPTIONS] PATTERN [FILE...] 



I'll go through the options one by one with examples.




1. Search one file for one pattern and print the matching lines: grep PATTERN [FILE]



[gpadmin@mdw ~]$ grep "line" demo_file.txt

this line is the 1st lower case line in this file.

Two lines above this line is empty.

And this is the last line.

[gpadmin@mdw ~]$ 

 




2. Highlight the part that matches the search pattern in a different color: --color



[gpadmin@mdw ~]$ grep  --color  "line"  demo_file.txt

this line is the 1st lower case line in this file.

Two lines above this line is empty.

And this is the last line.

[gpadmin@mdw ~]$ 

 



Putting GREP_COLOR="1;32" in front of grep --color sets the color of the matched part.

  • GREP_COLOR="1;32"  : green
  • GREP_COLOR="1;34"  : blue
  • GREP_COLOR="1;36"  : cyan (light blue)




3. Case-insensitive pattern search: -i


The uppercase "LINE", which was not found in example 2, is matched this time.



[gpadmin@mdw ~]$ grep  --color  -i  "line"  demo_file.txt

THIS LINE IS THE 1ST UPPER CASE LINE IN THIS FILE.

this line is the 1st lower case line in this file.

This Line Has All Its First Character Of The Word With Upper Case.

Two lines above this line is empty.

And this is the last line.

[gpadmin@mdw ~]$ 

 




4. Search multiple files by using the wildcard * (asterisk) in the file name: filename*.txt


First, I used cp to copy demo_file.txt to demo_file2.txt so that there are two files.



[gpadmin@mdw ~]$ cp  demo_file.txt  demo_file2.txt




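Next, with the wildcard '*' in the middle of the file name, the shell expands demo_*.txt to every file whose name matches, whatever appears in place of the *, and grep searches all of them and prints the matching lines. Unlike example 3, each output line now starts with the file name (demo_file.txt or demo_file2.txt).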



[gpadmin@mdw ~]$ grep --color -i "line" demo_*.txt

demo_file.txt:THIS LINE IS THE 1ST UPPER CASE LINE IN THIS FILE.

demo_file.txt:this line is the 1st lower case line in this file.

demo_file.txt:This Line Has All Its First Character Of The Word With Upper Case.

demo_file.txt:Two lines above this line is empty.

demo_file.txt:And this is the last line.

demo_file2.txt:THIS LINE IS THE 1ST UPPER CASE LINE IN THIS FILE.

demo_file2.txt:this line is the 1st lower case line in this file.

demo_file2.txt:This Line Has All Its First Character Of The Word With Upper Case.

demo_file2.txt:Two lines above this line is empty.

demo_file2.txt:And this is the last line.





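5. Search every file in the current directory: * (asterisk only, without a file name)

Note that * expands only to the entries of the current directory. To descend into subdirectories as well, add -r (--recursive).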



[gpadmin@mdw ~]$ grep  --color -i  "line"  *

demo_file.txt:THIS LINE IS THE 1ST UPPER CASE LINE IN THIS FILE.

demo_file.txt:this line is the 1st lower case line in this file.

demo_file.txt:This Line Has All Its First Character Of The Word With Upper Case.

demo_file.txt:Two lines above this line is empty.

demo_file.txt:And this is the last line.

demo_file2.txt:THIS LINE IS THE 1ST UPPER CASE LINE IN THIS FILE.

demo_file2.txt:this line is the 1st lower case line in this file.

demo_file2.txt:This Line Has All Its First Character Of The Word With Upper Case.

demo_file2.txt:Two lines above this line is empty.

demo_file2.txt:And this is the last line.

greenplum-text-2.1.3-rhel6_x86_64.bin:support, invoicing, and online services. Licensee is responsible for obtaining

greenplum-text-2.1.3-rhel6_x86_64.bin:    if tablog.getNumLines() > 1:

greenplum-text-2.1.3-rhel6_x86_64.bin:        for line in open(to_append_file).readlines():

greenplum-text-2.1.3-rhel6_x86_64.bin:            line = line.strip()

greenplum-text-2.1.3-rhel6_x86_64.bin:            if line.startswith('#') or line == '':

greenplum-text-2.1.3-rhel6_x86_64.bin:            if len(line.split()) != 1:

greenplum-text-2.1.3-rhel6_x86_64.bin:        for line in open(to_append_file).readlines():

greenplum-text-2.1.3-rhel6_x86_64.bin:            line = line.strip()

greenplum-text-2.1.3-rhel6_x86_64.bin:            if line.startswith('#') or line == '':

greenplum-text-2.1.3-rhel6_x86_64.bin:            if line.find('=>') == -1:

greenplum-text-2.1.3-rhel6_x86_64.bin:            parts = line.split('=>')

greenplum-text-2.1.3-rhel6_x86_64.bin:        for line in open(to_append_file).readlines():

greenplum-text-2.1.3-rhel6_x86_64.bin:            line = line.strip()

greenplum-text-2.1.3-rhel6_x86_64.bin:            if line.startswith('#') or line == '':

greenplum-text-2.1.3-rhel6_x86_64.bin:            if line.find('=>') != -1:

greenplum-text-2.1.3-rhel6_x86_64.bin:                parts = line.split('=>')

greenplum-text-2.1.3-rhel6_x86_64.bin:            sys.stdin.readline()

-- output omitted --

Binary file greenplum-text-2.1.3-rhel6_x86_64.tar.gz matches

Binary file jdk-8u161-linux-x64.rpm matches
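
The difference between * and -r is easy to check in a scratch directory. The sketch below is a minimal example under stated assumptions: the directory layout and file names are hypothetical, created with mktemp -d and removed at the end. -l prints only the names of files that contain a match.

```shell
#!/bin/sh
# Temporary directory tree: one file at the top, two in a subdirectory.
workdir=$(mktemp -d)
mkdir -p "$workdir/sub"
printf 'first line\n'   > "$workdir/top.txt"
printf 'nested line\n'  > "$workdir/sub/nested.txt"
printf 'nothing here\n' > "$workdir/sub/other.txt"

# "*" expands only to entries of the current directory; "sub" is not searched.
# The "is a directory" message goes to stderr and is discarded here.
flat_hits=$(cd "$workdir" && grep -l "line" * 2>/dev/null || true)
# -r walks the whole tree under the given directory.
recursive_count=$(grep -r -l "line" "$workdir" | wc -l | tr -d ' ')

echo "without -r: $flat_hits"
echo "with -r: $recursive_count files"
rm -rf "$workdir"
```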





6. Invert the search and print the lines that do not match: -v



[gpadmin@mdw ~]$ cat demo_file.txt

THIS LINE IS THE 1ST UPPER CASE LINE IN THIS FILE.

this line is the 1st lower case line in this file.

This Line Has All Its First Character Of The Word With Upper Case.


Two lines above this line is empty.

And this is the last line.

[gpadmin@mdw ~]$ 

[gpadmin@mdw ~]$ 

[gpadmin@mdw ~]$ grep  -i  -v  "case"  demo_file.txt


Two lines above this line is empty.

And this is the last line.





7. Show the line number of each matching line: -n



[gpadmin@mdw ~]$ grep  --color  -i  -n  "case"  demo_file.txt

1:THIS LINE IS THE 1ST UPPER CASE LINE IN THIS FILE.

2:this line is the 1st lower case line in this file.

3:This Line Has All Its First Character Of The Word With Upper Case.





8. Print only lines where the whole line matches exactly: -x



[gpadmin@mdw ~]$ grep  -n  -x  "this line is the 1st lower case line in this file."  demo_file.txt

2:this line is the 1st lower case line in this file.





9. Print the number of matching lines for each input file: -c, --count



[gpadmin@mdw ~]$ grep  -c  "line"  demo_file.txt

3

[gpadmin@mdw ~]$ grep  --count  "line"  demo_file.txt

3
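
Keep in mind that -c counts matching lines, not individual matches. The sketch below recreates demo_file.txt in a temporary file (the content is copied from this post; the variable names are made up) and compares -c with counting every match through -o.

```shell
#!/bin/sh
# Recreate demo_file.txt from this post in a temporary file.
demo=$(mktemp)
cat > "$demo" <<'EOF'
THIS LINE IS THE 1ST UPPER CASE LINE IN THIS FILE.
this line is the 1st lower case line in this file.
This Line Has All Its First Character Of The Word With Upper Case.

Two lines above this line is empty.
And this is the last line.
EOF

# -c: number of lines that contain "line" at least once.
matching_lines=$(grep -c "line" "$demo")
# -o prints each match on its own line, so wc -l counts every occurrence.
occurrences=$(grep -o "line" "$demo" | wc -l | tr -d ' ')

echo "matching lines: $matching_lines, occurrences: $occurrences"
rm -f "$demo"
```

The two numbers differ because "line" appears twice in some lines, including once inside the word "lines".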





10. Also print NUM lines of context before (Before), after (After), or around (Context) each matching line: -B NUM, -A NUM, -C NUM



--   10-1. Also print NUM lines before each matching line: -B NUM



[gpadmin@mdw ~]$ cat  demo_file.txt

THIS LINE IS THE 1ST UPPER CASE LINE IN THIS FILE.

this line is the 1st lower case line in this file.

This Line Has All Its First Character Of The Word With Upper Case.


Two lines above this line is empty.

And this is the last line.


[gpadmin@mdw ~]$ 

[gpadmin@mdw ~]$ grep  --color -n  -B 2 "First"  demo_file.txt

1-THIS LINE IS THE 1ST UPPER CASE LINE IN THIS FILE.

2-this line is the 1st lower case line in this file.

3:This Line Has All Its First Character Of The Word With Upper Case.

[gpadmin@mdw ~]$ 

 



--   10-2. Also print NUM lines after each matching line: -A NUM



[gpadmin@mdw ~]$ cat demo_file.txt

THIS LINE IS THE 1ST UPPER CASE LINE IN THIS FILE.

this line is the 1st lower case line in this file.

This Line Has All Its First Character Of The Word With Upper Case.


Two lines above this line is empty.

And this is the last line.


[gpadmin@mdw ~]$ 

[gpadmin@mdw ~]$ grep  --color  -n  -A 2  "First"  demo_file.txt

3:This Line Has All Its First Character Of The Word With Upper Case.

4-

5-Two lines above this line is empty.

[gpadmin@mdw ~]$ 

 



--   10-3. Also print NUM lines of context both before and after each matching line: -C NUM



[gpadmin@mdw ~]$ cat demo_file.txt

THIS LINE IS THE 1ST UPPER CASE LINE IN THIS FILE.

this line is the 1st lower case line in this file.

This Line Has All Its First Character Of The Word With Upper Case.


Two lines above this line is empty.

And this is the last line.


[gpadmin@mdw ~]$ 

[gpadmin@mdw ~]$ grep --color -n  -C 2 "First" demo_file.txt

1-THIS LINE IS THE 1ST UPPER CASE LINE IN THIS FILE.

2-this line is the 1st lower case line in this file.

3:This Line Has All Its First Character Of The Word With Upper Case.

4-

5-Two lines above this line is empty.

[gpadmin@mdw ~]$ 

 




11. Stop reading the file after NUM matching lines: -m NUM



[gpadmin@mdw ~]$ grep  --color  -n  -i  -m 2  "line"  demo_file.txt

1:THIS LINE IS THE 1ST UPPER CASE LINE IN THIS FILE.

2:this line is the 1st lower case line in this file.

[gpadmin@mdw ~]$ 

 




12. Print only the matching part of each matching line: -o, --only-matching



[gpadmin@mdw ~]$ grep  -n  -o  "First"  demo_file.txt

3:First

[gpadmin@mdw ~]$ grep  -n  --only-matching  "First"  demo_file.txt

3:First

 




13. Print the byte offset of the match, counted from the start of the file: -b



[gpadmin@mdw ~]$ grep  -o  -b "First" demo_file.txt

124:First
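
The value 124 is counted from the beginning of the file, not from the beginning of the line. The sketch below recreates demo_file.txt in a temporary file and, as an assumption-labeled alternative, uses awk's index() function to get the 1-based column inside the line when that is what you actually need. It assumes GNU grep, where -b together with -o reports the offset of the matching part itself.

```shell
#!/bin/sh
# Recreate demo_file.txt from this post in a temporary file.
demo=$(mktemp)
cat > "$demo" <<'EOF'
THIS LINE IS THE 1ST UPPER CASE LINE IN THIS FILE.
this line is the 1st lower case line in this file.
This Line Has All Its First Character Of The Word With Upper Case.

Two lines above this line is empty.
And this is the last line.
EOF

# Byte offset from the start of the file (two 51-byte lines + 22 bytes).
byte_offset=$(grep -o -b "First" "$demo")
# Line number and 1-based column of the match within its line.
column=$(awk '{ p = index($0, "First"); if (p) print NR ":" p }' "$demo")

echo "grep -o -b : $byte_offset"
echo "line:column: $column"
rm -f "$demo"
```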

 




14. Print the names of the files that contain a match: -l



[gpadmin@mdw ~]$ grep  -l  "First"  demo_*

demo_file.txt

demo_file2.txt

 




15. Print a summary of grep options and exit: --help


This is the most important (?) option in today's post: even if you forget everything else, remembering this one will keep you covered.

Help me, help~!  ^^



[gpadmin@mdw ~]$ grep --help

Usage: grep [OPTION]... PATTERN [FILE]...

Search for PATTERN in each FILE or standard input.

PATTERN is, by default, a basic regular expression (BRE).

Example: grep -i 'hello world' menu.h main.c


Regexp selection and interpretation:

  -E, --extended-regexp     PATTERN is an extended regular expression (ERE)

  -F, --fixed-strings       PATTERN is a set of newline-separated fixed strings

  -G, --basic-regexp        PATTERN is a basic regular expression (BRE)

  -P, --perl-regexp         PATTERN is a Perl regular expression

  -e, --regexp=PATTERN      use PATTERN for matching

  -f, --file=FILE           obtain PATTERN from FILE

  -i, --ignore-case         ignore case distinctions

  -w, --word-regexp         force PATTERN to match only whole words

  -x, --line-regexp         force PATTERN to match only whole lines

  -z, --null-data           a data line ends in 0 byte, not newline


Miscellaneous:

  -s, --no-messages         suppress error messages

  -v, --invert-match        select non-matching lines

  -V, --version             print version information and exit

      --help                display this help and exit

      --mmap                ignored for backwards compatibility


Output control:

  -m, --max-count=NUM       stop after NUM matches

  -b, --byte-offset         print the byte offset with output lines

  -n, --line-number         print line number with output lines

      --line-buffered       flush output on every line

  -H, --with-filename       print the filename for each match

  -h, --no-filename         suppress the prefixing filename on output

      --label=LABEL         print LABEL as filename for standard input

  -o, --only-matching       show only the part of a line matching PATTERN

  -q, --quiet, --silent     suppress all normal output

      --binary-files=TYPE   assume that binary files are TYPE;

                            TYPE is `binary', `text', or `without-match'

  -a, --text                equivalent to --binary-files=text

  -I                        equivalent to --binary-files=without-match

  -d, --directories=ACTION  how to handle directories;

                            ACTION is `read', `recurse', or `skip'

  -D, --devices=ACTION      how to handle devices, FIFOs and sockets;

                            ACTION is `read' or `skip'

  -R, -r, --recursive       equivalent to --directories=recurse

      --include=FILE_PATTERN  search only files that match FILE_PATTERN

      --exclude=FILE_PATTERN  skip files and directories matching FILE_PATTERN

      --exclude-from=FILE   skip files matching any file pattern from FILE

      --exclude-dir=PATTERN  directories that match PATTERN will be skipped.

  -L, --files-without-match  print only names of FILEs containing no match

  -l, --files-with-matches  print only names of FILEs containing matches

  -c, --count               print only a count of matching lines per FILE

  -T, --initial-tab         make tabs line up (if needed)

  -Z, --null                print 0 byte after FILE name


Context control:

  -B, --before-context=NUM  print NUM lines of leading context

  -A, --after-context=NUM   print NUM lines of trailing context

  -C, --context=NUM         print NUM lines of output context

  -NUM                      same as --context=NUM

      --color[=WHEN],

      --colour[=WHEN]       use markers to highlight the matching strings;

                            WHEN is `always', `never', or `auto'

  -U, --binary              do not strip CR characters at EOL (MSDOS)

  -u, --unix-byte-offsets   report offsets as if CRs were not there (MSDOS)


`egrep' means `grep -E'.  `fgrep' means `grep -F'.

Direct invocation as either `egrep' or `fgrep' is deprecated.

With no FILE, or when FILE is -, read standard input.  If less than two FILEs

are given, assume -h.  Exit status is 0 if any line was selected, 1 otherwise;

if any error occurs and -q was not given, the exit status is 2.


Report bugs to: bug-grep@gnu.org

GNU Grep home page: <http://www.gnu.org/software/grep/>

General help using GNU software: <http://www.gnu.org/gethelp/>

[gpadmin@mdw ~]$ 

 



The next post covers regular expressions.


I hope this was helpful.


* Reference: https://www.computerhope.com/unix/ugrep.htm









This post introduces the most basic Linux shell commands that I use often when connecting to a database installed on Linux for analysis.

 

For reference, I use a MacBook Pro with macOS Sierra, with Linux CentOS 6.6 installed in VMware and Greenplum Database 4.3.18 (one master node and two segment nodes) installed on CentOS.

 

The examples below are simple Linux commands used while connected to Greenplum DB.

 

1. List files: ls

 

  script      description
  ls          list files
  ls -l       list more info
  ls -a       list all
  ls -t       list by time
  ls -lat     combine (list more info & all & by time)
  ls *sh      list all files whose names end in 'sh'

 

 

2. Clear the terminal console: clear

 

 

3. View the text content of a file: cat file_name

 



[gpadmin@mdw ~]$ ls -al
total 544856
drwx------  8 gpadmin gpadmin      4096 2017-12-06 19:57 .
drwxr-xr-x. 4 root    root         4096 2017-11-08 20:04 ..
-rw-------  1 gpadmin gpadmin      4462 2017-12-09 00:21 .bash_history
-rw-r--r--  1 gpadmin gpadmin        18 2014-10-16 22:56 .bash_logout
-rw-r--r--  1 gpadmin gpadmin       398 2017-12-04 13:45 .bash_profile
-rw-r--r--  1 gpadmin gpadmin       124 2014-10-16 22:56 .bashrc
-rw-rw-r--  1 gpadmin gpadmin        28 2017-11-08 20:12 .gphostcache
drwxrwxr-x  2 gpadmin gpadmin      4096 2017-12-04 13:43 .oracle_jre_usage
-rw-------  1 gpadmin gpadmin        32 2017-11-08 20:21 .pgpass
-rw-rw-r--  1 gpadmin gpadmin         0 2017-11-08 20:21 .pgpass.1510140115
-rw-------  1 gpadmin gpadmin      2066 2017-12-09 00:21 .psql_history
drwx------  2 gpadmin gpadmin      4096 2017-11-08 20:05 .ssh
drwxrwxr-x  2 gpadmin gpadmin      4096 2017-12-06 20:10 command
drwxrwxr-x  2 gpadmin gpadmin      4096 2017-12-08 09:45 gpAdminLogs
drwxr-xr-x  2 gpadmin gpadmin      4096 2017-11-08 20:14 gpconfigs
drwxrwxr-x  2 gpadmin gpadmin      4096 2017-12-07 20:29 gptext
-rwxr-xr-x  1 gpadmin gpadmin      2916 2017-12-04 13:42 gptext_install_config
-rwxr-xr-x  1 gpadmin gpadmin 192264305 2017-09-19 09:20 greenplum-text-2.1.3-rhel6_x86_64.bin
-rw-r--r--  1 gpadmin gpadmin 191435368 2017-12-04 12:36 greenplum-text-2.1.3-rhel6_x86_64.tar.gz
-rw-r--r--  1 gpadmin gpadmin 174157387 2017-12-04 13:28 jdk-8u161-linux-x64.rpm
[gpadmin@mdw ~]$
[gpadmin@mdw ~]$
[gpadmin@mdw ~]$ cd command
[gpadmin@mdw command]$ ls -al
total 32
drwxrwxr-x 2 gpadmin gpadmin 4096 2017-12-06 20:10 .
drwx------ 8 gpadmin gpadmin 4096 2017-12-06 19:57 ..
-rw-rw-r-- 1 gpadmin gpadmin  830 2017-12-06 20:04 check_tb.txt
-rw-rw-r-- 1 gpadmin gpadmin   41 2017-12-06 20:10 gp_command.txt
[gpadmin@mdw command]$
[gpadmin@mdw command]$
[gpadmin@mdw command]$ cat gp_command.txt
# starting gpdb
gpstart -a
# stopping gpdb
gpstop -af


[gpadmin@mdw command]$ 
[gpadmin@mdw command]$ 
 

 

 

>> Check the CentOS version and the hosts file: cat /etc/redhat-release, cat /etc/hosts

 


[root@4e6b947be0d6 setup]# su - gpadmin
-bash-4.1$ cat /etc/redhat-release
CentOS release 6.7 (Final)


-bash-4.1$ cat /etc/hosts
127.0.0.1       localhost
::1      localhost ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
172.17.0.2     4e6b947be0d6
127.0.0.1 72ba20be3774


-bash-4.1$ exit
logout
 

 

>> Change file ownership: chown



[root@4e6b947be0d6 setup]# chown gpadmin:gpadmin plr-2.3.1-GPDB4.3-rhel6-x86_64.gppkg
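
The general form is chown OWNER:GROUP file. The sketch below is only an illustration under stated assumptions: the file name example.gppkg is hypothetical, and it assigns the file to the current user's own numeric ids so that it runs without root. In the session above, root hands the package over to gpadmin instead.

```shell
# Hypothetical package file in a scratch directory.
pkg_dir=$(mktemp -d)
pkg="$pkg_dir/example.gppkg"
touch "$pkg"

# chown takes OWNER:GROUP followed by the file. Numeric ids are used here so
# the sketch runs as any user; as root you would name the target account,
# for example gpadmin:gpadmin.
chown "$(id -u):$(id -g)" "$pkg"

# The third column of `ls -ln` is the numeric owner id.
owner_uid=$(ls -ln "$pkg" | awk '{print $3}')
echo "owner uid: $owner_uid"
```

Checking the result with ls -l (or ls -ln) right after chown is a cheap way to confirm the change took effect before installing the package as gpadmin.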
 

 

 

 

4. Open and edit a file: vi

 

script description 
  vi filename   open the file in the vi editor
  press Esc, then :q!   close vi without saving (not save and exit)
  press Esc, then :wq!   save the changes and close vi (save and exit)

 

 

 

5. Connect to the Greenplum Database host: ssh user_id@ip

 



[MacBook-Pro:~ rfriend$ ssh gpadmin@192.168.188.131
gpadmin@192.168.188.131's password: xxxxxxxx 
Last login: Fri Dec  8 22:23:24 2017 from 192.168.188.1
[gpadmin@mdw ~]$ 
 

 

 

 

6. Connect to Greenplum Database by running a saved con.sh script

 

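When you have to switch between several databases, memorizing and typing every IP address gets tedious, so a small script helps. The example below defines an alias for each of three databases (gpdb, gpdb2, gpdb3).

 

chmod +x con.sh adds execute permission to con.sh so that the script inside it can be run directly. (chmod: change mode)

 

./con.sh gpdb runs con.sh from the current directory with the argument gpdb, which executes the command under the gpdb label (that is, ssh gpadmin@192.168.188.131). 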

 



[MacBook-Pro:~ rfriend$ cd


[MacBook-Pro:~ rfriend$ ls *.sh
con.sh


[MacBook-Pro:~ rfriend$ vi con.sh


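#!/bin/bash
# Connect to a Greenplum host by alias, e.g. ./con.sh gpdb

case "$1" in
  gpdb)
    ssh gpadmin@192.168.188.131
    ;;
  gpdb2)
    ssh gpadmin@192.168.188.141
    ;;
  gpdb3)
    ssh gpadmin@192.168.188.151
    ;;
  *)
    # Unknown or missing alias: print usage and exit with a non-zero status
    echo "Usage: $0 target"
    exit 1
    ;;
esac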
~
~
:wq!




[MacBook-Pro:~ rfriend$ chmod +x con.sh


[MacBook-Pro:~ rfriend$ ./con.sh gpdb
gpadmin@192.168.188.131's password:xxxxxxxx 
Last login: Fri Dec  8 23:05:01 2017 from 192.168.188.1
[gpadmin@mdw ~]$ 
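
To see the effect of chmod +x without opening a real connection, here is a minimal sketch. It writes a variant of con.sh (named con_demo.sh here, a hypothetical name) that only prints the ssh command for each alias, then compares the owner's execute bit before and after chmod.

```shell
work_dir=$(mktemp -d)
script="$work_dir/con_demo.sh"

# Write a variant of con.sh that prints the ssh command instead of running it.
cat > "$script" <<'EOF'
#!/bin/bash
case "$1" in
  gpdb)  echo "ssh gpadmin@192.168.188.131" ;;
  gpdb2) echo "ssh gpadmin@192.168.188.141" ;;
  gpdb3) echo "ssh gpadmin@192.168.188.151" ;;
  *)     echo "Usage: $0 target"; exit 1 ;;
esac
EOF

# Character 4 of the `ls -l` mode string is the owner's execute bit.
perm_before=$(ls -l "$script" | cut -c4)
chmod +x "$script"
perm_after=$(ls -l "$script" | cut -c4)
echo "owner execute bit before: $perm_before, after: $perm_after"

# Now that the file is executable it can be run by path.
target_cmd=$("$script" gpdb2)
echo "$target_cmd"
```

Without the chmod step, running ./con.sh fails with a "Permission denied" error, which is why the session above runs chmod +x once before the first use.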

 

 

 

7. Start and stop Greenplum DB: gpstart, gpstop

 

script description 
  gpstart   start Greenplum Database
  gpstart -a   start Greenplum Database without prompting for confirmation (answers YES automatically)
  gpstart -m   start only the Greenplum Database master node, in maintenance mode
  gpstop -af   stop Greenplum Database without prompting, in fast mode (whatever is still running is interrupted)
  gpstop -r   stop Greenplum Database and then start it again (restart)
  gpstop -u   reload configuration file changes without interrupting the running Greenplum Database system

 

 

 

8. Check the status of Greenplum DB: gpstate

 

script description 
  gpstate   check whether Greenplum Database is up and running
  (the output also shows the PostgreSQL and Greenplum DB versions)
  gpstate -e   show details on primary/mirror segment pairs that have potential issues


Show details on primary/mirror segment pairs that have potential issues such as 1) the active segment is running in change tracking mode, meaning a segment is down 2) the active segment is in resynchronization mode, meaning it is catching up changes to the mirror 3) a segment is not in its preferred role, for example, a segment that was a primary at system initialization time is now acting as a mirror, meaning you may have one or more segment hosts with unbalanced processing load.

 



[MacBook-Pro:~ rfriend$ ssh gpadmin@192.168.188.131
gpadmin@192.168.188.131's password: xxxxxxxx 
Last login: Fri Dec  8 23:48:08 2017 from 192.168.188.1


[gpadmin@mdw ~]$ gpstate
20171208:23:50:17:109233 gpstate:mdw:gpadmin-[INFO]:-Starting gpstate with args: 
20171208:23:50:17:109233 gpstate:mdw:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 4.3.18.0 build 1'
20171208:23:50:17:109233 gpstate:mdw:gpadmin-[INFO]:-master Greenplum Version: 'PostgreSQL 8.2.15 (Greenplum Database 4.3.18.0 build 1) on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.4.2 compiled on Nov 22 2017 18:54:31'
20171208:23:50:17:109233 gpstate:mdw:gpadmin-[INFO]:-Obtaining Segment details from master...
20171208:23:50:17:109233 gpstate:mdw:gpadmin-[INFO]:-Gathering data from segments...
. 
20171208:23:50:18:109233 gpstate:mdw:gpadmin-[INFO]:-Greenplum instance status summary
20171208:23:50:18:109233 gpstate:mdw:gpadmin-[INFO]:-----------------------------------------------------
20171208:23:50:18:109233 gpstate:mdw:gpadmin-[INFO]:-   Master instance                                = Active
20171208:23:50:18:109233 gpstate:mdw:gpadmin-[INFO]:-   Master standby                                 = No master standby configured
20171208:23:50:18:109233 gpstate:mdw:gpadmin-[INFO]:-   Total segment instance count from metadata     = 4
20171208:23:50:18:109233 gpstate:mdw:gpadmin-[INFO]:-----------------------------------------------------
20171208:23:50:18:109233 gpstate:mdw:gpadmin-[INFO]:-   Primary Segment Status
20171208:23:50:18:109233 gpstate:mdw:gpadmin-[INFO]:-----------------------------------------------------
20171208:23:50:18:109233 gpstate:mdw:gpadmin-[INFO]:-   Total primary segments                         = 4
20171208:23:50:18:109233 gpstate:mdw:gpadmin-[INFO]:-   Total primary segment valid (at master)        = 4
20171208:23:50:18:109233 gpstate:mdw:gpadmin-[INFO]:-   Total primary segment failures (at master)     = 0
20171208:23:50:18:109233 gpstate:mdw:gpadmin-[INFO]:-   Total number of postmaster.pid files missing   = 0
20171208:23:50:18:109233 gpstate:mdw:gpadmin-[INFO]:-   Total number of postmaster.pid files found     = 4
20171208:23:50:18:109233 gpstate:mdw:gpadmin-[INFO]:-   Total number of postmaster.pid PIDs missing    = 0
20171208:23:50:18:109233 gpstate:mdw:gpadmin-[INFO]:-   Total number of postmaster.pid PIDs found      = 4
20171208:23:50:18:109233 gpstate:mdw:gpadmin-[INFO]:-   Total number of /tmp lock files missing        = 0
20171208:23:50:18:109233 gpstate:mdw:gpadmin-[INFO]:-   Total number of /tmp lock files found          = 4
20171208:23:50:18:109233 gpstate:mdw:gpadmin-[INFO]:-   Total number postmaster processes missing      = 0
20171208:23:50:18:109233 gpstate:mdw:gpadmin-[INFO]:-   Total number postmaster processes found        = 4
20171208:23:50:18:109233 gpstate:mdw:gpadmin-[INFO]:-----------------------------------------------------
20171208:23:50:18:109233 gpstate:mdw:gpadmin-[INFO]:-   Mirror Segment Status
20171208:23:50:18:109233 gpstate:mdw:gpadmin-[INFO]:-----------------------------------------------------
20171208:23:50:18:109233 gpstate:mdw:gpadmin-[INFO]:-   Mirrors not configured on this array
20171208:23:50:18:109233 gpstate:mdw:gpadmin-[INFO]:-----------------------------------------------------
[gpadmin@mdw ~]$ 
[gpadmin@mdw ~]$ 
[gpadmin@mdw ~]$ gpstate -e
20171208:23:50:28:109293 gpstate:mdw:gpadmin-[INFO]:-Starting gpstate with args: -e
20171208:23:50:28:109293 gpstate:mdw:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 4.3.18.0 build 1'
20171208:23:50:28:109293 gpstate:mdw:gpadmin-[INFO]:-master Greenplum Version: 'PostgreSQL 8.2.15 (Greenplum Database 4.3.18.0 build 1) on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.4.2 compiled on Nov 22 2017 18:54:31'
20171208:23:50:28:109293 gpstate:mdw:gpadmin-[INFO]:-Obtaining Segment details from master...
20171208:23:50:28:109293 gpstate:mdw:gpadmin-[INFO]:-Physical mirroring is not configured
[gpadmin@mdw ~]$ 
[gpadmin@mdw ~]$ 
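
The summary above is long, so it can help to pull out a single figure with grep and awk. The sketch below is an illustration only: primary_failures is a hypothetical helper name, and the sample input is two lines copied from the gpstate output shown above rather than a live call.

```shell
# Read gpstate output on stdin and print the primary-segment failure count.
primary_failures() {
  grep 'Total primary segment failures' | awk -F'= ' '{print $2}'
}

# Two lines copied from the gpstate output above serve as sample input.
sample='20171208:23:50:18:109233 gpstate:mdw:gpadmin-[INFO]:-   Total primary segments                         = 4
20171208:23:50:18:109233 gpstate:mdw:gpadmin-[INFO]:-   Total primary segment failures (at master)     = 0'

failures=$(printf '%s\n' "$sample" | primary_failures)
echo "primary segment failures: $failures"
```

On a real cluster the same helper would be fed directly, as in gpstate | primary_failures, and a non-zero result is a cue to look at gpstate -e.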
 

 

 

 

9. Start ZooKeeper and GPText on Greenplum DB

 

script description 
  zkManager start   start ZooKeeper 
  (ZooKeeper must be running before you run gptext-start below)
  gptext-start   start GPText
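
The ordering constraint above can be captured in a small wrapper. This is a sketch under assumptions, not an official procedure: run, DRY_RUN and start_gptext_stack are hypothetical names introduced here, and the commands themselves (gpstart -a, zkManager start, gptext-start) are the ones from this post. With DRY_RUN=1 the wrapper only prints what it would run.

```shell
# DRY_RUN=1 prints each command instead of executing it (hypothetical switch).
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Start each service only if the previous one started successfully.
start_gptext_stack() {
  run gpstart -a &&
  run zkManager start &&
  run gptext-start
}

# Preview the sequence without touching a real cluster.
preview=$(DRY_RUN=1 start_gptext_stack)
echo "$preview"
```

Chaining with && means a failed ZooKeeper start stops the sequence, so gptext-start is never attempted against a missing ZooKeeper.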

 

 

 

10. Log out of the Greenplum DB host: exit

 



[gpadmin@mdw ~]$ 
[gpadmin@mdw ~]$ exit
logout
Connection to 192.168.188.131 closed.
[MacBook-Pro:~ rfriend$ 
 

 

 

 

11. Use psql from the terminal to list schemas and tables, run SQL queries, and quit psql

 

script description 
  $ psql   start psql from the terminal
  # \l   list databases
  # \dn   list schemas
  # \dt   list tables
  # \dt public.e*   list all tables in the public schema whose names start with 'e'
  # select * from table_name;   query all columns of a table
  # alter role gpadmin password 'newpassword';   change the password
  # \q   quit psql

 



[MacBook-Pro:~ rfriend$ ssh gpadmin@192.168.188.131
gpadmin@192.168.188.131's password: xxxxxxxx
Last login: Fri Dec  8 23:50:12 2017 from 192.168.188.1
[gpadmin@mdw ~]$ psql
psql (8.2.15)
Type "help" for help.


testdb=# \l
                  List of databases
   Name    |  Owner  | Encoding |  Access privileges 
-----------+---------+----------+---------------------
 gpperfmon | gpadmin | UTF8     | gpadmin=CTc/gpadmin
                                : =c/gpadmin
 postgres  | gpadmin | UTF8     |
 template0 | gpadmin | UTF8     | =c/gpadmin         
                                : gpadmin=CTc/gpadmin
 template1 | gpadmin | UTF8     | =c/gpadmin         
                                : gpadmin=CTc/gpadmin
 testdb    | gpadmin | UTF8     |
(5 rows)



testdb=# \dn
       List of schemas
        Name        |  Owner  
--------------------+---------
 gp_toolkit         | gpadmin
 gpdemo             | gpadmin
 gptext             | gpadmin
 information_schema | gpadmin
 madlib             | gpadmin
 pg_aoseg           | gpadmin
 pg_bitmapindex     | gpadmin
 pg_catalog         | gpadmin
 pg_toast           | gpadmin
 plp                | gpadmin
 public             | gpadmin
 test               | gpadmin
(12 rows)


testdb=# \dt public.e*
                 List of relations
 Schema |     Name     | Type  |  Owner  | Storage 
--------+--------------+-------+---------+---------
 public | edge         | table | gpadmin | heap
 public | employeelist | table | gpadmin | heap
 public | enron        | table | gpadmin | heap
(3 rows)




testdb=# select eid, firstname, lastname, email_id from employeelist limit 5;
 eid | firstname | lastname |          email_id          
-----+-----------+----------+----------------------------
  19 | Lindy     | Donoho   | lindy.donoho@enron.com
 115 | Lisa      | Gang     | lisa.gang@enron.com
 127 | Kenneth   | Lay      | kenneth.lay@enron.com
 107 | Louise    | Kitchen  | louise.kitchen@enron.com
  43 | Larry     | Campbell | larry.f.campbell@enron.com
(5 rows)
testdb=# 
testdb=#


testdb=# alter role gpadmin password 'newpassword';



testdb=# \q
[gpadmin@mdw ~]$ 
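
The same kind of lookup can be scripted without an interactive session. The sketch below is an approximation of \dt public.e* rather than an exact equivalent, and tables_sql and list_tables are hypothetical helper names. It builds a query against information_schema.tables and passes it to psql with -c; the database name testdb is taken from the session above.

```shell
# Build a catalog query for tables in a schema whose names start with a prefix.
# Inputs are interpolated without escaping, so pass trusted values only.
tables_sql() {
  printf "select table_name from information_schema.tables where table_schema = '%s' and table_type = 'BASE TABLE' and table_name like '%s%%' order by 1;" "$1" "$2"
}

# Run the query through psql: -A unaligned output, -t rows only, -c one command.
list_tables() {
  psql -d testdb -At -c "$(tables_sql "$1" "$2")"
}

# Show the generated SQL; call `list_tables public e` on a host with psql.
query=$(tables_sql public e)
echo "$query"
```

Unaligned, rows-only output is convenient when the table names are fed into another command, for example a loop that exports each table.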

 

 

 

12. Check the IP address and network interfaces: ifconfig

 



[gpadmin@mdw ~]$ ifconfig
eth0      Link encap:Ethernet  HWaddr 00:0C:29:4E:9B:86  
          inet addr:192.168.188.131  Bcast:192.168.188.255  Mask:255.255.255.0
          inet6 addr: fe80::20c:29ff:fe4e:9b86/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:1069849 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1586899 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:973723679 (928.6 MiB)  TX bytes:1690761826 (1.5 GiB)

lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:3683 errors:0 dropped:0 overruns:0 frame:0
          TX packets:3683 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:519553 (507.3 KiB)  TX bytes:519553 (507.3 KiB)
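
When only the address is needed (for the ssh and scp commands in this post, for example), it can be filtered out of the ifconfig output. The sketch below assumes the older "inet addr:" layout shown above; newer ifconfig versions print a different format, so the pattern would need adjusting there. ipv4_addrs is a hypothetical helper name, and the sample input is copied from the output above.

```shell
# Print every IPv4 address found in old-style ifconfig output ("inet addr:...").
ipv4_addrs() {
  sed -n 's/.*inet addr:\([0-9.]*\).*/\1/p'
}

# Lines copied from the ifconfig output above serve as sample input.
sample='eth0      Link encap:Ethernet  HWaddr 00:0C:29:4E:9B:86
          inet addr:192.168.188.131  Bcast:192.168.188.255  Mask:255.255.255.0
          inet6 addr: fe80::20c:29ff:fe4e:9b86/64 Scope:Link
lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0'

addrs=$(printf '%s\n' "$sample" | ipv4_addrs)
echo "$addrs"
```

On the host itself the helper would be used as ifconfig | ipv4_addrs; the inet6 lines are skipped because they do not contain the "inet addr:" marker.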

 

 

 

 

13. Remote-copy a file from the Downloads folder into a folder on the GPDB host

      : scp file_name gpadmin@ip:directory

 

The example below checks the current working directory with pwd, 

changes into the Downloads folder (where the file was downloaded) with cd Downloads, 

and then copies the file to the remote host with scp file_name gpadmin@ip:directory. 

 



[MacBook-Pro:~ rfriend$ pwd


/Users/rfriend
[MacBook-Pro:~ rfriend$ cd Downloads/
[MacBook-Pro:Downloads rfriend$ scp file_name gpadmin@192.168.188.131:/home/gpadmin/gpdb
gpadmin@192.168.188.131's password: xxxxxxxx 
file_name                                         100%   72MB  79.0MB/s   00:00    


[MacBook-Pro:Downloads rfriend$ 
 

 

 

14. Search for string patterns: grep

grep will be covered with examples in the next post. 

 

 

15. Check the GPDB version, the master and segment node configuration, the MADlib version, and the installed PL/Languages

 



-- Check the GPDB version


# select version()


PostgreSQL 8.2.15 (Greenplum Database 4.3.18.0 build 1) on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.4.2 compiled on Nov 22 2017 18:54:31




-- Check the GPDB master and segment node configuration
# select * from pg_catalog.gp_segment_configuration
 





-- Check the MADlib version


# select madlib.version()
MADlib version: 1.12, git revision: rc/1.12-rc1, cmake configuration time: Wed Aug 23 22:33:09 UTC 2017, build type: Release, build system: Linux-2.6.18-238.27.1.el5.hotfix.bz516490, C compiler: gcc 4.4.0, C++ compiler: g++ 4.4.0


# select * from madlib.migrationhistory








-- Check the installed PL/Languages


# select * from pg_catalog.pg_language
 




 

 

I hope this was helpful. 

 

Posted by Rfriend