R - How do I deal with special characters like \^$.?*|+()[{ in my regex?
I want to match a regular expression special character, \^$.?*|+()[{. I tried:
x <- "a[b"
grepl("[", x)
## Error: invalid regular expression '[', reason 'Missing ']''
(Equivalently stringr::str_detect(x, "[") or stringi::stri_detect_regex(x, "[").)
Doubling the value to escape it doesn't work:
grepl("[[", x)
## Error: invalid regular expression '[[', reason 'Missing ']''
Neither does using a backslash:
grepl("\[", x)
## Error: '\[' is an unrecognized escape in character string starting ""\["
How do I match special characters?
Some special cases of this, in questions that are old and well enough written that it would be cheeky to close them as duplicates of this:
Escaped periods in R regular expressions
How to escape a question mark in R?
Escaping a pipe ("|") in a regex
Escape with a double backslash
R treats backslashes as escape values for character constants. (... and so do regular expressions. Hence the need for two backslashes when supplying a character argument for a pattern. The first one isn't actually a character; rather, it makes the second one into a character.) You can see how they are processed using cat.
y <- "double quote: \", tab: \t, newline: \n, unicode point: \u20ac"
print(y)
## [1] "double quote: \", tab: \t, newline: \n, unicode point: €"
cat(y)
## double quote: ", tab: 	, newline:
## , unicode point: €
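Another way to see the same processing is to count characters: the literal "\\[" is four characters in the source file but only two in the resulting string. A minimal check using only base R:

```r
# "\\[" is typed as four characters but stored as two: a backslash and "["
pattern <- "\\["
nchar(pattern)         # 2
writeLines(pattern)    # prints \[
# The regex engine receives \[ and treats "[" as a literal bracket
grepl(pattern, "a[b")  # TRUE
```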
Further reading: Escaping a backslash with a backslash in R produces 2 backslashes in a string, not 1
To use special characters in a regular expression, the simplest method is usually to escape them with a backslash, but as noted above, the backslash itself needs to be escaped.
grepl("\\[", "a[b")
## [1] TRUE
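The same approach works for every character in the list from the question. A small sketch, using base R with the default regex engine, that escapes each one and checks that it matches literally:

```r
# Every metacharacter from the question, written as R string literals
specials <- c("\\", "^", "$", ".", "?", "*", "|", "+", "(", ")", "[", "{")
# Prefix each with a (doubled, in source) backslash to make it literal
patterns <- paste0("\\", specials)
# Surround each character with "a" and "b" so there is something to search
texts <- paste0("a", specials, "b")
matched <- mapply(grepl, patterns, texts)
all(matched)  # TRUE
```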
To match backslashes, you need to double escape, resulting in four backslashes.
grepl("\\\\", c("a\\b", "a\nb"))
## [1] TRUE FALSE
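If you are on R 4.0.0 or later (an assumption; older versions will not parse this code), raw string literals remove one level of escaping, because the R parser leaves backslashes inside r"(...)" untouched. Only the regex-level escape remains:

```r
# Requires R >= 4.0.0, which introduced raw string literals r"(...)".
# Inside a raw string, the two backslashes are kept as two characters,
# which the regex engine reads as one escaped (literal) backslash.
pat_raw <- r"(\\)"
pat_escaped <- "\\\\"
identical(pat_raw, pat_escaped)    # TRUE
grepl(pat_raw, c("a\\b", "a\nb"))  # TRUE FALSE
```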
The rebus package contains constants for each of the special characters to save you mistyping slashes.
library(rebus)
OPEN_BRACKET
## [1] "\\["
BACKSLASH
## [1] "\\\\"
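If you would rather not add a dependency, a small helper can insert the backslashes for you. `escape_special()` is a hypothetical name, not a function from rebus or any other package, and this is only a sketch that assumes the escaped pattern is used with `perl = TRUE`:

```r
# Hypothetical helper (not from rebus or any other package): put a
# backslash before every regex metacharacter so the text matches literally.
escape_special <- function(x) {
  # The class lists . \ | ( ) [ ] { } ^ $ * + ?
  # In the replacement, "\\\\" is a literal backslash and "\\1" is the
  # matched character.
  gsub("([.\\\\|()\\[\\]{}^$*+?])", "\\\\\\1", x, perl = TRUE)
}

escape_special("a[b")                                       # "a\\[b"
grepl(escape_special("1+1=2?"), "is 1+1=2?", perl = TRUE)  # TRUE
grepl(escape_special("a.b"), "axb", perl = TRUE)           # FALSE
```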
Form a character class
You can wrap special characters in square brackets to form a character class.
grepl("[?]", "a?b")
## [1] TRUE
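Applied to the rest of the list from the question (leaving out \ and ^, which behave differently inside a class), the bracket approach can be checked in one go. A sketch in base R:

```r
# Wrap each metacharacter in square brackets to make it literal
specials <- c("$", ".", "?", "*", "|", "+", "(", ")", "[", "{")
patterns <- paste0("[", specials, "]")  # "[$]", "[.]", "[?]", ...
texts <- paste0("a", specials, "b")
all(mapply(grepl, patterns, texts))     # TRUE
# Inside a class "." is literal, so it no longer matches any character
grepl("[.]", "axb")                     # FALSE
```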
Two of the special characters have special meaning inside character classes: \ and ^.
Backslash still needs to be escaped even if it is inside a character class.
grepl("[\\\\]", c("a\\b", "a\nb"))
## [1] TRUE FALSE
Caret only needs to be escaped if it is directly after the opening square bracket.
grepl("[ ^]", "a^b")  # matches spaces as well.
## [1] TRUE
grepl("[\\^]", "a^b")
## [1] TRUE
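The position of the caret matters because a caret immediately after [ turns the class into a negated class. A small base R illustration:

```r
# "[^?]" means "any character except ?", so a string that is only "?"
# does not match
grepl("[^?]", "?")  # FALSE
# Anywhere else in the class the caret is literal
grepl("[?^]", "^")  # TRUE
grepl("[?^]", "a")  # FALSE
```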
rebus also lets you form a character class.
char_class("?")
## <regex> [?]
Use a pre-existing character class
If you want to match all punctuation, you can use the [:punct:] character class.
grepl("[[:punct:]]", c("//", "[", "(", "{", "?", "^", "$"))
## [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE
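A common use of this class is stripping punctuation from text, for example with gsub() in base R:

```r
# Remove every character in the [:punct:] class
gsub("[[:punct:]]", "", "a.b[c]?d")  # "abcd"
```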
stringi maps this to the Unicode General Category for punctuation, so its behaviour is slightly different.
stri_detect_regex(c("//", "[", "(", "{", "?", "^", "$"), "[[:punct:]]")
## [1] TRUE TRUE TRUE TRUE TRUE FALSE FALSE
You can also use the cross-platform syntax for accessing a UGC.
stri_detect_regex(c("//", "[", "(", "{", "?", "^", "$"), "\\p{P}")
## [1] TRUE TRUE TRUE TRUE TRUE FALSE FALSE
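If you prefer to stay in base R, the same \p{P} syntax should work with the PCRE engine by setting perl = TRUE. This assumes your R build's PCRE has Unicode property support, which is the usual case; the results are expected to agree with the stringi output above, since ^ and $ are Unicode symbols rather than punctuation:

```r
# Base R: \p{P} (Unicode punctuation) needs the PCRE engine
x <- c("//", "[", "(", "{", "?", "^", "$")
grepl("\\p{P}", x, perl = TRUE)
## [1]  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE
```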
Use \Q \E escapes
Placing characters between \\Q and \\E makes the regular expression engine treat them literally rather than as regular expressions.
grepl("\\Q.\\E", "a.b")
## [1] TRUE
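The \Q...\E syntax comes from Perl and is supported by PCRE, so a sketch that sets perl = TRUE avoids relying on the default engine. It is handy when the literal text comes from a variable, with the caveat that a string containing \E itself would end the literal block early. `user_text` below is just an illustrative variable:

```r
# With perl = TRUE, everything between \Q and \E is literal,
# so "." and "*" lose their special meaning
grepl("\\Q.*\\E", c("a.*b", "ab"), perl = TRUE)  # TRUE FALSE

# Wrapping text held in a variable (breaks if the text contains "\E")
user_text <- "(1+1)"
grepl(paste0("\\Q", user_text, "\\E"), "is (1+1) two?", perl = TRUE)  # TRUE
```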
rebus lets you write literal blocks of regular expressions.
literal(".")
## <regex> \Q.\E
Don't use regular expressions
Regular expressions are not always the answer. If you want to match a fixed string, you can do, for example:
grepl("[", "a[b", fixed = TRUE)
stringr::str_detect("a[b", fixed("["))
stringi::stri_detect_fixed("a[b", "[")
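The fixed argument is not limited to grepl(); base R's sub(), gsub(), regexpr() and strsplit() accept fixed = TRUE too, which is often the simplest fix for splitting or replacing on a metacharacter:

```r
# Split on a literal dot rather than "any character"
strsplit("a.b.c", ".", fixed = TRUE)[[1]]  # "a" "b" "c"
# Replace a literal dollar sign rather than the end-of-string anchor
gsub("$", " dollars", "5$", fixed = TRUE)   # "5 dollars"
# Find the position of a literal question mark
regexpr("?", "why?", fixed = TRUE)          # match at position 4
```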