R - How do I deal with special characters like \^$.?*|+()[{ in my regex?


I want to match a regular expression special character, one of \^$.?*|+()[{. I tried:

x <- "a[b"
grepl("[", x)
## Error: invalid regular expression '[', reason 'Missing ']''

(Equivalently, stringr::str_detect(x, "[") or stringi::stri_detect_regex(x, "[").)

Doubling the value to escape it doesn't work:

grepl("[[", x)
## Error: invalid regular expression '[[', reason 'Missing ']''

Neither does using a backslash:

grepl("\[", x)
## Error: '\[' is an unrecognized escape in character string starting ""\["

How do I match special characters?


Some special cases of this appear in questions that are old and well written enough that it would be cheeky to close them as duplicates of this one:
Escaped periods in R regular expressions
How to escape a question mark in R?
Escaping pipe ("|") in a regex

Escape with a double backslash

R treats backslashes as escape values for character constants. (So do regular expressions, hence the need for two backslashes when supplying a character argument as a pattern. The first one isn't actually a character; rather, it makes the second one into a character.) You can see how they are processed using cat.

y <- "double quote: \", tab: \t, newline: \n, unicode point: \u20ac"
print(y)
## [1] "double quote: \", tab: \t, newline: \n, unicode point: €"
cat(y)
## double quote: ", tab:    , newline: 
## , unicode point: €

Further reading: Escaping a backslash with a backslash in R produces 2 backslashes in a string, not 1

To use special characters in a regular expression, the simplest method is usually to escape them with a backslash but, as noted above, the backslash itself needs to be escaped.

grepl("\\[", "a[b")
## [1] TRUE

To match backslashes, you need to double escape, resulting in four backslashes.

grepl("\\\\", c("a\\b", "a\nb"))
## [1]  TRUE FALSE

The rebus package contains constants for each of the special characters to save you mistyping slashes.

library(rebus)
OPEN_BRACKET
## [1] "\\["
BACKSLASH
## [1] "\\\\"
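To make the two levels of escaping concrete, here is a minimal sketch that compares what is typed in R source with the string the regex engine actually receives. It uses only base R; the raw string syntax r"(...)" is an addition not mentioned above and assumes R 4.0.0 or later.

```r
# What is typed in source versus what the regex engine receives.
pattern <- "\\["
nchar(pattern)        # the string holds 2 characters: a backslash and a bracket
writeLines(pattern)   # shows the pattern exactly as the regex engine sees it

# Raw strings (assumes R >= 4.0.0) skip R's own escaping layer,
# so only the regex-level backslash has to be written.
raw_pattern <- r"(\[)"
identical(pattern, raw_pattern)

grepl(raw_pattern, "a[b")

# A literal backslash needs two backslashes in a raw string instead of four.
grepl(r"(\\)", c("a\\b", "a\nb"))
```

With a raw string the pattern reads the same as it would in regex documentation, which can make long patterns easier to check by eye.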

Form a character class

You can also wrap the special characters in square brackets to form a character class.

grepl("[?]", "a?b")
## [1] TRUE

Two of the special characters have special meaning inside character classes: \ and ^.

Backslash still needs to be escaped even if it is inside a character class.

grepl("[\\\\]", c("a\\b", "a\nb"))
## [1]  TRUE FALSE

Caret only needs to be escaped if it is directly after the opening square bracket.

grepl("[ ^]", "a^b")  # matches spaces as well.
## [1] TRUE
grepl("[\\^]", "a^b")
## [1] TRUE

rebus also lets you form a character class.

char_class("?")
## <regex> [?]
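As a quick check of the character class approach, the sketch below wraps each special character other than \ and ^ in square brackets and tests it against a string containing that character. It uses base R only; the variable names are made up for illustration.

```r
# Special characters that need no extra escaping inside a character class.
specials <- c(".", "?", "*", "|", "+", "(", ")", "[", "{", "$")

# Build one single-character class per special character, e.g. "[.]" or "[[]".
patterns <- paste0("[", specials, "]")
subjects <- paste0("a", specials, "b")

# Match each pattern against the string containing its own character.
matched <- mapply(grepl, patterns, subjects, USE.NAMES = FALSE)
names(matched) <- specials
matched

# None of the patterns should match a string without special characters.
false_hits <- vapply(patterns, grepl, logical(1), x = "ab", USE.NAMES = FALSE)
any(false_hits)
```

Backslash and caret are left out on purpose, since they are the two characters that keep a special meaning inside a class.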

Use a pre-existing character class

If you want to match all punctuation, you can use the [:punct:] character class.

grepl("[[:punct:]]", c("//", "[", "(", "{", "?", "^", "$"))
## [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE

stringi maps this to the Unicode General Category (UGC) for punctuation, so its behaviour is slightly different.

stringi::stri_detect_regex(c("//", "[", "(", "{", "?", "^", "$"), "[[:punct:]]")
## [1]  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE

You can also use the cross-platform syntax for accessing a UGC.

stringi::stri_detect_regex(c("//", "[", "(", "{", "?", "^", "$"), "\\p{P}")
## [1]  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE

Use \Q \E escapes

Placing characters between \\Q and \\E makes the regular expression engine treat them literally rather than as regular expressions.

grepl("\\Q.\\E", "a.b")
## [1] TRUE

rebus lets you write literal blocks of regular expressions.

literal(".")
## <regex> \Q.\E
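The \Q and \E escapes are handy when the text to match arrives in a variable. The following is a minimal sketch, not a general-purpose quoting function: the names needle and haystack are hypothetical, and perl = TRUE is my choice here so that the pattern is handled by PCRE. It assumes the search text does not itself contain \E, which would end the literal block early.

```r
needle <- "1+1=2"
haystack <- "is 1+1=2?"

# Used as a regex, "+" is a quantifier, so the pattern asks for one or more
# "1" followed by "1=2", which the text does not contain.
as_regex <- grepl(needle, haystack)

# Wrapping the text in \Q...\E makes every character literal.
quoted <- paste0("\\Q", needle, "\\E")
as_literal <- grepl(quoted, haystack, perl = TRUE)

c(as_regex = as_regex, as_literal = as_literal)
```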

Don't use regular expressions

Regular expressions are not always the answer. If you want to match a fixed string then you can do, for example:

grepl("[", "a[b", fixed = TRUE)
stringr::str_detect("a[b", fixed("["))
stringi::stri_detect_fixed("a[b", "[")
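The fixed = TRUE argument is also accepted by other base R pattern functions. As a small illustration (gsub and strsplit are not used above, so treat this as a side note), compare a replacement done with and without it:

```r
# As a regex, "." matches any character, so every character is replaced.
regex_result <- gsub(".", "-", "a.b")
regex_result
## [1] "---"

# As a fixed string, only the literal dot is replaced.
fixed_result <- gsub(".", "-", "a.b", fixed = TRUE)
fixed_result
## [1] "a-b"

# Splitting on a literal pipe without any escaping.
parts <- strsplit("a|b|c", "|", fixed = TRUE)[[1]]
parts
## [1] "a" "b" "c"
```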
