Friday, October 26, 2018

regex (regular expressions)

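A regular expression (regex, or regexp for short) is a special text string that describes a search pattern. You may be familiar with wildcards such as *.txt, which finds all text files in a file manager. The equivalent regular expression is ^.*\.txt$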
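
To see the wildcard-to-regex correspondence in action, here is a minimal Python sketch. The file names are made up for illustration, and the standard `re` module is assumed.

```python
import re

# Regex counterpart of the *.txt wildcard: any characters, then a literal ".txt" at the end.
TXT_PATTERN = re.compile(r"^.*\.txt$")

# Hypothetical file names, used only for illustration.
names = ["notes.txt", "photo.png", "txt", "archive.txt.bak", ".txt"]

# Keep only the names that the pattern matches.
txt_files = [name for name in names if TXT_PATTERN.search(name)]
print(txt_files)  # ['notes.txt', '.txt']
```

Note that "txt" is rejected because \. requires a real dot, and "archive.txt.bak" is rejected because $ pins ".txt" to the end.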

Common characters
?   (the preceding character is optional)
\   (escape)
.   Match any single character (e.g. a.b matches acb, a1b)
\.  (a literal dot; the backslash cancels the dot's special meaning)
*   (the previous character can occur zero or more times)
+   (the previous character can occur one or more times)
^   Match the empty string that occurs at the beginning of a line or string (start of string)
$   Match the empty string that occurs at the end of a line or string (end of string)
\b  (word boundary)
\B  (not a word boundary)
\d or [0-9] Match any single digit.
\D Match any single non-digit character.
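\s  (any single whitespace character, e.g. a space, \t, or \n)
[xyz]   (any one of x, y, or z)
[^xyz]  (any single character other than x, y, or z)
[a-z]   (any single lowercase letter)
[^a-z]  (any single character that is not a lowercase letter)
[A-Z]   (any single uppercase letter)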
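
A quick way to check how these characters behave is to try them against short strings. The sketch below uses Python's `re.fullmatch`, so the whole string has to match. The test strings are invented for illustration. Note in particular that * accepts zero occurrences while + needs at least one.

```python
import re

# (pattern, text, whether the whole text is expected to match)
cases = [
    (r"a.b", "a1b", True),         # . stands for any single character
    (r"a\.b", "a1b", False),       # \. only matches a literal dot
    (r"ab?c", "ac", True),         # ? : the b is optional
    (r"ab*c", "ac", True),         # * : zero or more b
    (r"ab+c", "ac", False),        # + : at least one b is required
    (r"ab+c", "abbbc", True),      # + : several b are fine
    (r"\d\D\s", "7x ", True),      # digit, non-digit, whitespace
    (r"[xyz]", "y", True),         # one of x, y, z
    (r"[^xyz]", "y", False),       # anything except x, y, z
    (r"[a-z][A-Z]", "aB", True),   # lowercase letter, then uppercase letter
]

# Run every pattern against its text and record whether it matched.
results = [re.fullmatch(pattern, text) is not None for pattern, text, _ in cases]

for (pattern, text, expected), got in zip(cases, results):
    print(f"{pattern!r:16} {text!r:10} -> {got}")
```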


e.g.
grep -i word regex.txt
(find the lines in regex.txt that contain the string 'word'; -i ignores case)

grep -E '^word' regex.txt
(The -E option enables extended regular expressions.)
(find the lines that begin with 'word')

e.g.
grep ^root /etc/passwd
(find each line that starts with 'root')

grep :$ /etc/group
(find each line that ends with ':')

grep '\<c...h\>'
(find 5-character words that start with c and end with h)
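
The same three patterns can be tried outside grep. This is a rough Python sketch, not a grep replacement. The sample lines are invented (real /etc/passwd and /etc/group contents will differ), and since Python's `re` has no \< and \>, \b is used for the word edges instead. Unlike grep, which prints whole lines, `re.findall` returns just the matched words.

```python
import re

# Made-up sample data; real system files will look different.
passwd_lines = [
    "root:x:0:0:root:/root:/bin/bash",
    "chroot:x:1001:1001::/home/chroot:/bin/sh",
]
group_lines = ["wheel:x:10:alice", "audio:x:29:"]
text = "the coach sat on a couch by the church and ate a peach"

# ^root : lines that start with 'root'
starts_with_root = [line for line in passwd_lines if re.search(r"^root", line)]

# :$ : lines that end with ':'
ends_with_colon = [line for line in group_lines if re.search(r":$", line)]

# \bc...h\b : 5-character words that start with c and end with h
five_letter_words = re.findall(r"\bc...h\b", text)

print(starts_with_root)   # ['root:x:0:0:root:/root:/bin/bash']
print(ends_with_colon)    # ['audio:x:29:']
print(five_letter_words)  # ['coach', 'couch']
```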

e.g.
find 'color' or 'colour' in a text file
use colou?r
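
A small Python check of this pattern, using an invented sentence: the ? makes the u optional, so both spellings are found, while a misspelling without the second o is not.

```python
import re

# Invented sample sentence.
text = "The color red and the colour blue; colr is a typo."

# u? means the letter u may or may not be present.
spellings = re.findall(r"colou?r", text)
print(spellings)  # ['color', 'colour']
```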

e.g.
find 'port' or 'ports' as whole words, without matching the 'port' inside export, portable, important
use \bports?\b
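
The effect of \b is easiest to see by running the pattern with and without it. A minimal Python sketch on an invented sentence:

```python
import re

# Invented sample sentence containing 'port' both alone and inside longer words.
text = "export the port list; portable ports are important"

# With \b on both sides, only the standalone words match.
whole_words = re.findall(r"\bports?\b", text)

# Without \b, the 'port' inside export, portable and important matches too.
anywhere = re.findall(r"ports?", text)

print(whole_words)  # ['port', 'ports']
print(anywhere)     # ['port', 'port', 'port', 'ports', 'port']
```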

e.g.
find TPE,KHH,LAX in a text file
use \b[A-Z][A-Z][A-Z]\b
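
A short Python sketch of this pattern on an invented sentence. It also tries the shorter form \b[A-Z]{3}\b, which is not in the notes above but should be equivalent, since {3} repeats the class exactly three times.

```python
import re

# Invented sample sentence.
text = "Flights: TPE to KHH, then LAX. NASA and Tpe are skipped."

# Three uppercase letters with a word boundary on each side.
codes = re.findall(r"\b[A-Z][A-Z][A-Z]\b", text)

# Same idea written with a repetition count (an alternative spelling, not from the notes).
codes_short = re.findall(r"\b[A-Z]{3}\b", text)

print(codes)  # ['TPE', 'KHH', 'LAX']
```

NASA is skipped because it has four letters, and Tpe because it is not all uppercase.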

e.g. 
find phone numbers
use 09\d\d-?\d\d\d-?\d\d\d
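
A minimal Python sketch of this pattern. The numbers are invented, and the pattern only covers 10-digit numbers that start with 09, with optional hyphens after the 4th and 7th digits.

```python
import re

PHONE_PATTERN = re.compile(r"09\d\d-?\d\d\d-?\d\d\d")

# Invented sample text.
text = "Call 0912-345-678 or 0987654321, not 02-1234-5678."

# -? makes each hyphen optional, so both written forms are found.
phones = PHONE_PATTERN.findall(text)
print(phones)  # ['0912-345-678', '0987654321']

# Strip the hyphens to compare numbers in one form.
normalized = [phone.replace("-", "") for phone in phones]
print(normalized)  # ['0912345678', '0987654321']
```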

e.g.
search for all IP addresses
use \d+\.\d+\.\d+\.\d+
(\. matches a literal dot)
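
One thing to keep in mind is that \d+ accepts any number of digits, so this pattern also matches strings such as 999.1.1.1. The Python sketch below runs the pattern on an invented log line, then adds a range check on each part. That extra check is an assumption of this example and is not part of the pattern above.

```python
import re

IP_PATTERN = re.compile(r"\d+\.\d+\.\d+\.\d+")

# Invented log line for illustration.
log = "from 192.168.0.1 to 10.0.0.255, bogus 999.1.1.1, version 1.2.3"

# Everything that looks like four dot-separated numbers.
candidates = IP_PATTERN.findall(log)

# Extra step (not part of the regex): keep only candidates whose parts are all 0-255.
valid = [
    ip for ip in candidates
    if all(int(part) <= 255 for part in ip.split("."))
]

print(candidates)  # ['192.168.0.1', '10.0.0.255', '999.1.1.1']
print(valid)       # ['192.168.0.1', '10.0.0.255']
```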

e.g.
delete all blank lines in a text file
delete ^\s*$
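
The pattern ^\s*$ matches a line that is empty or holds only whitespace, so deleting its matches removes blank lines and leaves the spaces inside other lines alone. A minimal Python sketch, applying the pattern line by line to an invented text:

```python
import re

BLANK_LINE = re.compile(r"^\s*$")

# Invented sample text with empty and whitespace-only lines.
text = "first line\n\n   \nsecond line\n\t\nthird line"

# Test each line separately and keep the ones that are not blank.
kept = [line for line in text.splitlines() if not BLANK_LINE.search(line)]
cleaned = "\n".join(kept)

print(cleaned)
# first line
# second line
# third line
```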

e.g.
find email addresses
use \b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\b
(match case-insensitively, since the character classes list only uppercase letters)
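
A sketch of applying this email pattern in Python. It is compiled with `re.IGNORECASE` because the character classes list only uppercase letters. The addresses are invented, and the pattern is a practical approximation, not a full validator.

```python
import re

# local part, @, domain, a literal dot, then a top-level domain of 2 or more letters
EMAIL_PATTERN = re.compile(
    r"\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\b",
    re.IGNORECASE,
)

# Invented sample text.
text = "Write to alice@example.com or Bob.Smith+news@mail.example.org; not to alice@localhost."

emails = EMAIL_PATTERN.findall(text)
print(emails)  # ['alice@example.com', 'Bob.Smith+news@mail.example.org']
```

alice@localhost is skipped because the pattern requires a dot followed by at least two letters after the domain name.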