6.12 Regexp Character Classes

20180608 A character class is a set of characters grouped together so that they can be matched as a unit. We enclose the characters to be grouped within square brackets []. The pattern then matches any single character from the set. For example, the character class [0-9] matches any one digit from 0 to 9, and a leading ^ inside the brackets, as in [^0-9], negates the class so that it matches any one character that is not in the set.

s <- c("abc12", "@#$", "345", "ABcd")
grep(pattern="[0-9]+", s, value=TRUE)
## [1] "abc12" "345"
grep(pattern="[A-Z]+", s, value=TRUE)
## [1] "ABcd"
grep(pattern="[^@#$]+", s, value=TRUE)
## [1] "abc12" "345"   "ABcd"
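Note that grep() reports a match when the pattern is found anywhere in the string, so the negated class [^@#$]+ selects strings that contain at least one character other than @, # and $, not strings that are free of them. The following is a minimal sketch of the difference, using a hypothetical vector x and the anchors ^ and $ to require that every character in the string comes from the class.

```r
# Hypothetical test vector; "a@b" mixes allowed and excluded characters.
x <- c("abc12", "@#$", "a@b", "345")

# Unanchored: one character outside the set anywhere is enough to match.
contains_other <- grep(pattern="[^@#$]+", x, value=TRUE)
contains_other
## [1] "abc12" "a@b"   "345"

# Anchored: the whole string must consist of characters outside the set.
only_other <- grep(pattern="^[^@#$]+$", x, value=TRUE)
only_other
## [1] "abc12" "345"
```

The anchored form is usually what is wanted when the aim is to exclude strings containing particular characters.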

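R also supports POSIX character classes such as [:alpha:] and [:upper:]. The square brackets are part of the class name, and the name must itself appear inside a bracket expression, which is why these classes are written with double square brackets, as in [[:alpha:]].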

grep(pattern="[[:alpha:]]", s, value=TRUE)
## [1] "abc12" "ABcd"
grep(pattern="[[:upper:]]", s, value=TRUE)
## [1] "ABcd"
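Because a POSIX class name sits inside an ordinary bracket expression, it can likely be combined with other classes or negated in the same way as a literal set. The following sketch illustrates this with the same vector s; the result variable names are introduced here purely for illustration.

```r
s <- c("abc12", "@#$", "345", "ABcd")

# Strings containing at least one digit.
digits <- grep(pattern="[[:digit:]]", s, value=TRUE)
digits
## [1] "abc12" "345"

# Strings containing at least one punctuation character.
punct <- grep(pattern="[[:punct:]]", s, value=TRUE)
punct
## [1] "@#$"

# Two classes in one bracket expression: only letters and digits throughout.
alnum_only <- grep(pattern="^[[:alpha:][:digit:]]+$", s, value=TRUE)
alnum_only
## [1] "abc12" "345"   "ABcd"

# Negated class: strings with no alphabetic characters at all.
no_alpha <- grep(pattern="^[^[:alpha:]]+$", s, value=TRUE)
no_alpha
## [1] "@#$" "345"
```

Unlike a range such as [A-Z], the POSIX classes depend on the current locale, which may make them the more portable choice for text that is not plain ASCII.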


Copyright © 1995-2022 Graham.Williams@togaware.com Creative Commons Attribution-ShareAlike 4.0