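In this post I'll introduce various options and ways to use the Linux grep command to find and print the lines that match a pattern, either in a single file or across multiple files in a directory. Combined with regular expressions, grep becomes a very powerful tool for finding specific patterns inside files.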
[ LINUX grep command: search for a string pattern and print the matches ]
First, I prepared a text file named demo_file.txt to use in the examples.
MacBook-Pro:~ rfriend$ ssh gpadmin@192.168.188.131
gpadmin@192.168.188.131's password: xxxxxxxx
Last login: Sat Dec 9 01:06:33 2017 from 192.168.188.1
[gpadmin@mdw ~]$ ls -al
total 544860
drwx------ 8 gpadmin gpadmin 4096 2017-12-09 01:12 .
drwxr-xr-x. 4 root root 4096 2017-11-08 20:04 ..
-rw------- 1 gpadmin gpadmin 4792 2017-12-09 01:10 .bash_history
-rw-r--r-- 1 gpadmin gpadmin 18 2014-10-16 22:56 .bash_logout
-rw-r--r-- 1 gpadmin gpadmin 398 2017-12-04 13:45 .bash_profile
-rw-r--r-- 1 gpadmin gpadmin 124 2014-10-16 22:56 .bashrc
-rw-rw-r-- 1 gpadmin gpadmin 28 2017-11-08 20:12 .gphostcache
drwxrwxr-x 2 gpadmin gpadmin 4096 2017-12-04 13:43 .oracle_jre_usage
-rw------- 1 gpadmin gpadmin 32 2017-11-08 20:21 .pgpass
-rw-rw-r-- 1 gpadmin gpadmin 0 2017-11-08 20:21 .pgpass.1510140115
-rw------- 1 gpadmin gpadmin 2066 2017-12-09 00:21 .psql_history
drwx------ 2 gpadmin gpadmin 4096 2017-11-08 20:05 .ssh
drwxrwxr-x 2 gpadmin gpadmin 4096 2017-12-06 20:10 command
-rw-r--r-- 1 gpadmin gpadmin 233 2017-12-09 01:12 demo_file.txt
drwxrwxr-x 2 gpadmin gpadmin 4096 2017-12-08 09:45 gpAdminLogs
drwxr-xr-x 2 gpadmin gpadmin 4096 2017-11-08 20:14 gpconfigs
drwxrwxr-x 2 gpadmin gpadmin 4096 2017-12-07 20:29 gptext
-rwxr-xr-x 1 gpadmin gpadmin 2916 2017-12-04 13:42 gptext_install_config
-rwxr-xr-x 1 gpadmin gpadmin 192264305 2017-09-19 09:20 greenplum-text-2.1.3-rhel6_x86_64.bin
-rw-r--r-- 1 gpadmin gpadmin 191435368 2017-12-04 12:36 greenplum-text-2.1.3-rhel6_x86_64.tar.gz
-rw-r--r-- 1 gpadmin gpadmin 174157387 2017-12-04 13:28 jdk-8u161-linux-x64.rpm
[gpadmin@mdw ~]$ cat demo_file.txt
THIS LINE IS THE 1ST UPPER CASE LINE IN THIS FILE.
this line is the 1st lower case line in this file.
This Line Has All Its First Character Of The Word With Upper Case.

Two lines above this line is empty.
And this is the last line.
[gpadmin@mdw ~]$
The basic syntax of grep is as follows.
grep [OPTIONS] PATTERN [FILE...]
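Because FILE is optional, grep can also filter text that arrives on standard input, for example through a pipe. The following is a minimal sketch of that usage; the sample lines are made up for illustration and are not part of demo_file.txt.

```shell
# With no FILE argument, grep filters whatever arrives on standard input.
# The three sample lines below are illustrative only.
piped=$(printf 'alpha\nbeta line\ngamma\n' | grep "line")
echo "$piped"

# Giving "-" as FILE also means "read standard input".
mixed=$(printf 'stdin line\n' | grep -c "line" -)
echo "$mixed"
```

This is why commands such as ls -al can be piped straight into grep without naming a file.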
Let's go through the options one by one with examples.
1. Search a single file for one pattern and print the matching lines: grep PATTERN [FILE]
[gpadmin@mdw ~]$ grep "line" demo_file.txt
this line is the 1st lower case line in this file.
Two lines above this line is empty.
And this is the last line.
[gpadmin@mdw ~]$
2. Highlight the part that matches the search pattern in a different color: --color
[gpadmin@mdw ~]$ grep --color "line" demo_file.txt
this line is the 1st lower case line in this file.
Two lines above this line is empty.
And this is the last line.
[gpadmin@mdw ~]$
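Putting GREP_COLOR="1;32" in front of grep --color sets the color of the part that matches the pattern, for example GREP_COLOR="1;32" grep --color "line" demo_file.txt.
- GREP_COLOR="1;32" : green
- GREP_COLOR="1;34" : blue (for violet/magenta, use "1;35")
- GREP_COLOR="1;36" : light blue (cyan)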
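As a hedged sketch of setting the highlight color for a single invocation: the example below assumes GNU grep and uses GREP_COLORS with the mt= key, which recent GNU grep releases prefer over GREP_COLOR. The --color=always form is used only so that the escape codes survive when the output is captured into a variable; the sample line is made up.

```shell
# Highlight matches in bold green for this one invocation only.
# Assumes GNU grep; mt= sets the color of the matched text.
colored=$(printf 'this line is short\n' | GREP_COLORS='mt=1;32' grep --color=always "line")
printf '%s\n' "$colored"

# The captured text carries the SGR code "1;32" around the match.
case "$colored" in
  *'1;32m'*) has_green=yes ;;
  *)         has_green=no ;;
esac
echo "$has_green"
```

Because the variable assignment is written on the same line as the command, it does not change the color for later grep calls in the same shell.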
3. Search for a pattern case-insensitively and print the matching lines: -i
The uppercase "LINE", which was not matched in example 2, is matched this time.
[gpadmin@mdw ~]$ grep --color -i "line" demo_file.txt
THIS LINE IS THE 1ST UPPER CASE LINE IN THIS FILE.
this line is the 1st lower case line in this file.
This Line Has All Its First Character Of The Word With Upper Case.
Two lines above this line is empty.
And this is the last line.
[gpadmin@mdw ~]$
4. Search multiple files by using the wildcard * (asterisk) in the file name: filename*.txt
First, I used cp to copy demo_file.txt to demo_file2.txt, so that there are two files.
[gpadmin@mdw ~]$ cp demo_file.txt demo_file2.txt
Next, with the wildcard * (asterisk) in the middle of the file name, demo_*.txt matches every file regardless of what stands in place of the *, and grep searches all of them. Unlike example 3, each output line now starts with the name of the file it came from (demo_file.txt, demo_file2.txt).
[gpadmin@mdw ~]$ grep --color -i "line" demo_*.txt
demo_file.txt:THIS LINE IS THE 1ST UPPER CASE LINE IN THIS FILE.
demo_file.txt:this line is the 1st lower case line in this file.
demo_file.txt:This Line Has All Its First Character Of The Word With Upper Case.
demo_file.txt:Two lines above this line is empty.
demo_file.txt:And this is the last line.
demo_file2.txt:THIS LINE IS THE 1ST UPPER CASE LINE IN THIS FILE.
demo_file2.txt:this line is the 1st lower case line in this file.
demo_file2.txt:This Line Has All Its First Character Of The Word With Upper Case.
demo_file2.txt:Two lines above this line is empty.
demo_file2.txt:And this is the last line.
5. Search every file in the current directory: * (asterisk only, without a file name)
The shell expands * to every entry in the current directory, so grep searches each of those files. Note that * alone does not make grep descend into subdirectories; for a truly recursive search, use the -r (or -R) option listed in the --help output at the end of this post.
[gpadmin@mdw ~]$ grep --color -i "line" *
demo_file.txt:THIS LINE IS THE 1ST UPPER CASE LINE IN THIS FILE.
demo_file.txt:this line is the 1st lower case line in this file.
demo_file.txt:This Line Has All Its First Character Of The Word With Upper Case.
demo_file.txt:Two lines above this line is empty.
demo_file.txt:And this is the last line.
demo_file2.txt:THIS LINE IS THE 1ST UPPER CASE LINE IN THIS FILE.
demo_file2.txt:this line is the 1st lower case line in this file.
demo_file2.txt:This Line Has All Its First Character Of The Word With Upper Case.
demo_file2.txt:Two lines above this line is empty.
demo_file2.txt:And this is the last line.
greenplum-text-2.1.3-rhel6_x86_64.bin:support, invoicing, and online services. Licensee is responsible for obtaining
greenplum-text-2.1.3-rhel6_x86_64.bin: if tablog.getNumLines() > 1:
greenplum-text-2.1.3-rhel6_x86_64.bin: for line in open(to_append_file).readlines():
greenplum-text-2.1.3-rhel6_x86_64.bin: line = line.strip()
greenplum-text-2.1.3-rhel6_x86_64.bin: if line.startswith('#') or line == '':
greenplum-text-2.1.3-rhel6_x86_64.bin: if len(line.split()) != 1:
greenplum-text-2.1.3-rhel6_x86_64.bin: for line in open(to_append_file).readlines():
greenplum-text-2.1.3-rhel6_x86_64.bin: line = line.strip()
greenplum-text-2.1.3-rhel6_x86_64.bin: if line.startswith('#') or line == '':
greenplum-text-2.1.3-rhel6_x86_64.bin: if line.find('=>') == -1:
greenplum-text-2.1.3-rhel6_x86_64.bin: parts = line.split('=>')
greenplum-text-2.1.3-rhel6_x86_64.bin: for line in open(to_append_file).readlines():
greenplum-text-2.1.3-rhel6_x86_64.bin: line = line.strip()
greenplum-text-2.1.3-rhel6_x86_64.bin: if line.startswith('#') or line == '':
greenplum-text-2.1.3-rhel6_x86_64.bin: if line.find('=>') != -1:
greenplum-text-2.1.3-rhel6_x86_64.bin: parts = line.split('=>')
greenplum-text-2.1.3-rhel6_x86_64.bin: sys.stdin.readline()
-- (output omitted) --
Binary file greenplum-text-2.1.3-rhel6_x86_64.tar.gz matches
Binary file jdk-8u161-linux-x64.rpm matches
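To actually descend into subdirectories, the -r option from the --help listing can be used. Below is a minimal sketch that contrasts the two, assuming GNU grep; the directory layout and the names top.txt and sub/nested.txt are hypothetical and created only for this example.

```shell
# Build a throwaway directory tree: one file at the top, one in a subdirectory.
workdir=$(mktemp -d)
mkdir "$workdir/sub"
printf 'top line\n' > "$workdir/top.txt"
printf 'nested line\n' > "$workdir/sub/nested.txt"

# "*" only expands to the entries of the current directory; -d skip makes
# grep silently ignore the directory "sub" instead of trying to read it.
flat=$(cd "$workdir" && grep -d skip "line" *)

# -r walks the whole tree below "." (sorted here for a stable order).
deep=$(cd "$workdir" && grep -r "line" . | sort)

echo "$flat"
echo "$deep"
rm -rf "$workdir"
```

Only the -r run reports sub/nested.txt; the plain * run sees just the files sitting directly in the directory.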
6. Invert the search condition and print the lines that do not match: -v
Lines 1 to 3 contain "case" in some letter case, so only the remaining lines are printed, including the empty line 4.
[gpadmin@mdw ~]$ cat demo_file.txt
THIS LINE IS THE 1ST UPPER CASE LINE IN THIS FILE.
this line is the 1st lower case line in this file.
This Line Has All Its First Character Of The Word With Upper Case.

Two lines above this line is empty.
And this is the last line.
[gpadmin@mdw ~]$
[gpadmin@mdw ~]$
[gpadmin@mdw ~]$ grep -i -v "case" demo_file.txt

Two lines above this line is empty.
And this is the last line.
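A common use of -v is filtering unwanted lines out of a file. The sketch below uses a small made-up sample (not demo_file.txt) and combines -v with -c to count what is left; the pattern ^$ is a regular expression that matches an empty line.

```shell
# A small illustrative sample: two lines with "case", one empty, one other.
sample=$(mktemp)
printf 'Upper Case\n\nlower case\nno match here\n' > "$sample"

# Count the lines that do NOT contain "case" in any letter case.
not_case=$(grep -i -v -c "case" "$sample")

# Count the lines that are NOT empty (^$ matches an empty line).
non_empty=$(grep -v -c '^$' "$sample")

echo "$not_case $non_empty"
rm -f "$sample"
```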
7. Show the line number of each matching line: -n
[gpadmin@mdw ~]$ grep --color -i -n "case" demo_file.txt
1:THIS LINE IS THE 1ST UPPER CASE LINE IN THIS FILE.
2:this line is the 1st lower case line in this file.
3:This Line Has All Its First Character Of The Word With Upper Case.
8. Print only the lines that match the pattern exactly, as a whole line: -x
[gpadmin@mdw ~]$ grep -n -x "this line is the 1st lower case line in this file." demo_file.txt
2:this line is the 1st lower case line in this file.
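To see how strict -x is, it helps to compare it with a plain search and with -w (whole words only), which appears in the --help listing. This is a small sketch on three made-up lines, not on demo_file.txt.

```shell
# Three illustrative input lines.
sample=$(printf 'line\nlines\nthis line\n')

plain=$(printf '%s\n' "$sample" | grep -c "line")     # substring match
word=$(printf '%s\n' "$sample" | grep -c -w "line")   # whole word only
whole=$(printf '%s\n' "$sample" | grep -c -x "line")  # whole line only

echo "$plain $word $whole"
```

The plain search also counts "lines", -w drops it, and -x keeps only the line that consists of nothing but "line".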
9. Print the number of matching lines for each input file: -c, --count
[gpadmin@mdw ~]$ grep -c "line" demo_file.txt
3
[gpadmin@mdw ~]$ grep --count "line" demo_file.txt
3
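Note that -c counts matching lines, not individual occurrences. If the number of occurrences is needed, one possible approach (a sketch, not the only way) is to print every match on its own line with -o and count those lines with wc -l. The sample text is made up.

```shell
# Illustrative input: "line" appears twice on the first line.
sample=$(mktemp)
printf 'line line\nno match\nlines\n' > "$sample"

# -c counts lines that contain at least one match.
matching_lines=$(grep -c "line" "$sample")

# -o prints each match on its own line, so wc -l counts occurrences.
occurrences=$(grep -o "line" "$sample" | wc -l | tr -d ' ')

echo "$matching_lines $occurrences"
rm -f "$sample"
```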
10. Print NUM lines before (Before), after (After), or on both sides (Context) of each matching line: -B NUM, -A NUM, -C NUM
-- 10-1. Also print NUM lines before (Before) each matching line: -B NUM
With -n, matching lines are marked with ":" after the line number and context lines with "-".
[gpadmin@mdw ~]$ cat demo_file.txt
THIS LINE IS THE 1ST UPPER CASE LINE IN THIS FILE.
this line is the 1st lower case line in this file.
This Line Has All Its First Character Of The Word With Upper Case.

Two lines above this line is empty.
And this is the last line.
[gpadmin@mdw ~]$
[gpadmin@mdw ~]$ grep --color -n -B 2 "First" demo_file.txt
1-THIS LINE IS THE 1ST UPPER CASE LINE IN THIS FILE.
2-this line is the 1st lower case line in this file.
3:This Line Has All Its First Character Of The Word With Upper Case.
[gpadmin@mdw ~]$
-- 10-2. Also print NUM lines after (After) each matching line: -A NUM
[gpadmin@mdw ~]$ cat demo_file.txt
THIS LINE IS THE 1ST UPPER CASE LINE IN THIS FILE.
this line is the 1st lower case line in this file.
This Line Has All Its First Character Of The Word With Upper Case.

Two lines above this line is empty.
And this is the last line.
[gpadmin@mdw ~]$
[gpadmin@mdw ~]$ grep --color -n -A 2 "First" demo_file.txt
3:This Line Has All Its First Character Of The Word With Upper Case.
4-
5-Two lines above this line is empty.
[gpadmin@mdw ~]$
-- 10-3. Print NUM lines of context (Context) both before and after each matching line: -C NUM
[gpadmin@mdw ~]$ cat demo_file.txt
THIS LINE IS THE 1ST UPPER CASE LINE IN THIS FILE.
this line is the 1st lower case line in this file.
This Line Has All Its First Character Of The Word With Upper Case.

Two lines above this line is empty.
And this is the last line.
[gpadmin@mdw ~]$
[gpadmin@mdw ~]$ grep --color -n -C 2 "First" demo_file.txt
1-THIS LINE IS THE 1ST UPPER CASE LINE IN THIS FILE.
2-this line is the 1st lower case line in this file.
3:This Line Has All Its First Character Of The Word With Upper Case.
4-
5-Two lines above this line is empty.
[gpadmin@mdw ~]$
11. Stop reading the file after NUM matching lines: -m NUM
[gpadmin@mdw ~]$ grep --color -n -i -m 2 "line" demo_file.txt
1:THIS LINE IS THE 1ST UPPER CASE LINE IN THIS FILE.
2:this line is the 1st lower case line in this file.
[gpadmin@mdw ~]$
12. Print only the matching part (Only matching) of each matching line: -o, --only-matching
[gpadmin@mdw ~]$ grep -n -o "First" demo_file.txt
3:First
[gpadmin@mdw ~]$ grep -n --only-matching "First" demo_file.txt
3:First
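13. Print the byte offset of each match: -b
The offset is counted in bytes from the start of the file (starting at 0), not from the start of the line. Combined with -o, it is the offset of the matching part itself: the first two lines take 51 bytes each including the newline, and "First" begins 22 characters into line 3, so 51 + 51 + 22 = 124.
[gpadmin@mdw ~]$ grep -o -b "First" demo_file.txt
124:First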
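The offset arithmetic can be checked with a small sketch that recreates demo_file.txt as shown at the top of this post (line 4 is empty) and runs -b with and without -o. The temporary file name is generated by mktemp and is not part of the original session.

```shell
# Recreate the demo file from this post (line 4 is empty).
demo=$(mktemp)
printf '%s\n' \
  'THIS LINE IS THE 1ST UPPER CASE LINE IN THIS FILE.' \
  'this line is the 1st lower case line in this file.' \
  'This Line Has All Its First Character Of The Word With Upper Case.' \
  '' \
  'Two lines above this line is empty.' \
  'And this is the last line.' > "$demo"

# With -o, -b reports where the matched text itself starts in the file.
match_offset=$(grep -o -b "First" "$demo")

# Without -o, -b reports where the matching line starts.
line_offset=$(grep -b "First" "$demo" | cut -d: -f1)

echo "$match_offset"
echo "$line_offset"
rm -f "$demo"
```

Without -o the reported offset is 102, the start of line 3; with -o it moves 22 bytes further to the start of "First".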
14. Print only the names of the files that contain a match: -l
[gpadmin@mdw ~]$ grep -l "First" demo_*
demo_file.txt
demo_file2.txt
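The counterpart of -l is -L, which the --help listing describes as printing only the names of files containing no match. A minimal sketch with two hypothetical files, has_match.txt and no_match.txt, created just for this example:

```shell
# Two throwaway files: only the first one contains "First".
workdir=$(mktemp -d)
printf 'First line\n' > "$workdir/has_match.txt"
printf 'second line\n' > "$workdir/no_match.txt"

# -l lists files with at least one match, -L lists files with none.
with_match=$(cd "$workdir" && grep -l "First" *.txt)
without_match=$(cd "$workdir" && grep -L "First" *.txt)

echo "$with_match"
echo "$without_match"
rm -rf "$workdir"
```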
15. Print a summary of the grep options and exit: --help
This may be the most important(?) option in today's post: even if you forget everything else, remembering this one will have your back.
Help me, help~! ^^
[gpadmin@mdw ~]$ grep --help
Usage: grep [OPTION]... PATTERN [FILE]...
Search for PATTERN in each FILE or standard input.
PATTERN is, by default, a basic regular expression (BRE).
Example: grep -i 'hello world' menu.h main.c

Regexp selection and interpretation:
  -E, --extended-regexp     PATTERN is an extended regular expression (ERE)
  -F, --fixed-strings       PATTERN is a set of newline-separated fixed strings
  -G, --basic-regexp        PATTERN is a basic regular expression (BRE)
  -P, --perl-regexp         PATTERN is a Perl regular expression
  -e, --regexp=PATTERN      use PATTERN for matching
  -f, --file=FILE           obtain PATTERN from FILE
  -i, --ignore-case         ignore case distinctions
  -w, --word-regexp         force PATTERN to match only whole words
  -x, --line-regexp         force PATTERN to match only whole lines
  -z, --null-data           a data line ends in 0 byte, not newline

Miscellaneous:
  -s, --no-messages         suppress error messages
  -v, --invert-match        select non-matching lines
  -V, --version             print version information and exit
      --help                display this help and exit
      --mmap                ignored for backwards compatibility

Output control:
  -m, --max-count=NUM       stop after NUM matches
  -b, --byte-offset         print the byte offset with output lines
  -n, --line-number         print line number with output lines
      --line-buffered       flush output on every line
  -H, --with-filename       print the filename for each match
  -h, --no-filename         suppress the prefixing filename on output
      --label=LABEL         print LABEL as filename for standard input
  -o, --only-matching       show only the part of a line matching PATTERN
  -q, --quiet, --silent     suppress all normal output
      --binary-files=TYPE   assume that binary files are TYPE;
                            TYPE is `binary', `text', or `without-match'
  -a, --text                equivalent to --binary-files=text
  -I                        equivalent to --binary-files=without-match
  -d, --directories=ACTION  how to handle directories;
                            ACTION is `read', `recurse', or `skip'
  -D, --devices=ACTION      how to handle devices, FIFOs and sockets;
                            ACTION is `read' or `skip'
  -R, -r, --recursive       equivalent to --directories=recurse
      --include=FILE_PATTERN  search only files that match FILE_PATTERN
      --exclude=FILE_PATTERN  skip files and directories matching FILE_PATTERN
      --exclude-from=FILE   skip files matching any file pattern from FILE
      --exclude-dir=PATTERN  directories that match PATTERN will be skipped.
  -L, --files-without-match  print only names of FILEs containing no match
  -l, --files-with-matches  print only names of FILEs containing matches
  -c, --count               print only a count of matching lines per FILE
  -T, --initial-tab         make tabs line up (if needed)
  -Z, --null                print 0 byte after FILE name

Context control:
  -B, --before-context=NUM  print NUM lines of leading context
  -A, --after-context=NUM   print NUM lines of trailing context
  -C, --context=NUM         print NUM lines of output context
  -NUM                      same as --context=NUM
      --color[=WHEN],
      --colour[=WHEN]       use markers to highlight the matching strings;
                            WHEN is `always', `never', or `auto'
  -U, --binary              do not strip CR characters at EOL (MSDOS)
  -u, --unix-byte-offsets   report offsets as if CRs were not there (MSDOS)

`egrep' means `grep -E'.  `fgrep' means `grep -F'.
Direct invocation as either `egrep' or `fgrep' is deprecated.
With no FILE, or when FILE is -, read standard input.  If less than two FILEs
are given, assume -h.  Exit status is 0 if any line was selected, 1 otherwise;
if any error occurs and -q was not given, the exit status is 2.

Report bugs to: bug-grep@gnu.org
GNU Grep home page: <http://www.gnu.org/software/grep/>
General help using GNU software: <http://www.gnu.org/gethelp/>
[gpadmin@mdw ~]$
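The help text ends with a note on exit status: 0 if any line was selected, 1 otherwise. Together with -q (suppress all normal output), this makes grep usable as a condition in shell scripts. A minimal sketch, using one line from demo_file.txt as input:

```shell
# -q prints nothing; only the exit status tells whether a match was found.
if printf 'And this is the last line.\n' | grep -q "last"; then
  found=yes
else
  found=no
fi

# No match: the exit status is 1 (captured with || so the script continues).
status=0
printf 'And this is the last line.\n' | grep -q "first" || status=$?

echo "$found $status"
```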
In the next post, I'll cover regular expressions.
I hope this was helpful.
* Reference: https://www.computerhope.com/unix/ugrep.htm