Struct regex::bytes::Regex

pub struct Regex(_);

A compiled regular expression for matching arbitrary bytes.

It can be used to search, split or replace text. All searching is done with an implicit .*? at the beginning and end of an expression. To force an expression to match the whole string (or a prefix or a suffix), you must use an anchor like ^ or $ (or \A and \z).

Like the Regex type in the parent module, matches with this regex return byte offsets into the search text. Unlike the parent Regex type, these byte offsets may not correspond to UTF-8 sequence boundaries since the regexes in this module can match arbitrary bytes.

Methods

`impl Regex`

`fn new(re: &str) -> Result<Regex, Error>`

Compiles a regular expression. Once compiled, it can be used repeatedly to search, split or replace text in a byte string.

If an invalid expression is given, then an error is returned.

`fn with_size_limit(size: usize, re: &str) -> Result<Regex, Error>`

Compiles a regular expression with the given size limit.

The size limit is applied to the size of the compiled data structure. If the data structure exceeds the size given, then an error is returned.

`fn is_match(&self, text: &[u8]) -> bool`

Returns true if and only if the regex matches the string given.

It is recommended to use this method if all you need to do is test a match, since the underlying matching engine may be able to do less work.

Example

Test if some text contains at least one word with exactly 13 ASCII word bytes:

let text = b"I categorically deny having triskaidekaphobia.";
assert!(Regex::new(r"\b\w{13}\b").unwrap().is_match(text));

`fn find(&self, text: &[u8]) -> Option<(usize, usize)>`

Returns the start and end byte range of the leftmost-first match in text. If no match exists, then None is returned.

Note that this should only be used if you want to discover the position of the match. Testing the existence of a match is faster if you use is_match.

Example

Find the start and end location of the first word with exactly 13 ASCII word bytes:

let text = b"I categorically deny having triskaidekaphobia.";
let pos = Regex::new(r"\b\w{13}\b").unwrap().find(text);
assert_eq!(pos, Some((2, 15)));

`fn find_iter<'r, 't>(&'r self, text: &'t [u8]) -> FindMatches<'r, 't>`

Returns an iterator for each successive non-overlapping match in text, returning the start and end byte indices with respect to text.

Example

Find the start and end location of every word with exactly 13 ASCII word bytes:

let text = b"Retroactively relinquishing remunerations is reprehensible.";
for pos in Regex::new(r"\b\w{13}\b").unwrap().find_iter(text) {
    println!("{:?}", pos);
}
// Output:
// (0, 13)
// (14, 27)
// (28, 41)
// (45, 58)

`fn captures<'t>(&self, text: &'t [u8]) -> Option<Captures<'t>>`

Returns the capture groups corresponding to the leftmost-first match in text. Capture group 0 always corresponds to the entire match. If no match is found, then None is returned.

You should only use captures if you need access to submatches. Otherwise, find is faster for discovering the location of the overall match.

Examples

Say you have some text with movie names and their release years, like "'Citizen Kane' (1941)". It'd be nice if we could search for text looking like that, while also extracting the movie name and its release year separately.

let re = Regex::new(r"'([^']+)'\s+\((\d{4})\)").unwrap();
let text = b"Not my favorite movie: 'Citizen Kane' (1941).";
let caps = re.captures(text).unwrap();
assert_eq!(caps.at(1), Some(&b"Citizen Kane"[..]));
assert_eq!(caps.at(2), Some(&b"1941"[..]));
assert_eq!(caps.at(0), Some(&b"'Citizen Kane' (1941)"[..]));
// You can also access the groups by index using the Index notation.
// Note that this will panic on an invalid index.
assert_eq!(&caps[1], b"Citizen Kane");
assert_eq!(&caps[2], b"1941");
assert_eq!(&caps[0], b"'Citizen Kane' (1941)");

Note that the full match is at capture group 0. Each subsequent capture group is indexed by the order of its opening (.

We can make this example a bit clearer by using named capture groups:

let re = Regex::new(r"'(?P<title>[^']+)'\s+\((?P<year>\d{4})\)")
               .unwrap();
let text = b"Not my favorite movie: 'Citizen Kane' (1941).";
let caps = re.captures(text).unwrap();
assert_eq!(caps.name("title"), Some(&b"Citizen Kane"[..]));
assert_eq!(caps.name("year"), Some(&b"1941"[..]));
assert_eq!(caps.at(0), Some(&b"'Citizen Kane' (1941)"[..]));
// You can also access the groups by name using the Index notation.
// Note that this will panic on an invalid group name.
assert_eq!(&caps["title"], b"Citizen Kane");
assert_eq!(&caps["year"], b"1941");
assert_eq!(&caps[0], b"'Citizen Kane' (1941)");

Here we name the capture groups, which we can access with the name method or the Index notation with a &str. Note that the named capture groups are still accessible with at or the Index notation with a usize.

The 0th capture group is always unnamed, so it must always be accessed with at(0) or [0].

`fn captures_iter<'r, 't>(&'r self, text: &'t [u8]) -> FindCaptures<'r, 't>`

Returns an iterator over all the non-overlapping capture groups matched in text. This is operationally the same as find_iter, except it yields information about submatches.

Example

We can use this to find all movie titles and their release years in some text, where the movie is formatted like "'Title' (xxxx)":

use std::str;

let re = Regex::new(r"'(?P<title>[^']+)'\s+\((?P<year>\d{4})\)")
               .unwrap();
let text = b"'Citizen Kane' (1941), 'The Wizard of Oz' (1939), 'M' (1931).";
for caps in re.captures_iter(text) {
    // The captured bytes are ASCII here, so converting to &str cannot fail.
    let title = str::from_utf8(&caps["title"]).unwrap();
    let year = str::from_utf8(&caps["year"]).unwrap();
    println!("Movie: {}, Released: {}", title, year);
}
// Output:
// Movie: Citizen Kane, Released: 1941
// Movie: The Wizard of Oz, Released: 1939
// Movie: M, Released: 1931

`fn split<'r, 't>(&'r self, text: &'t [u8]) -> Splits<'r, 't>`

Returns an iterator of substrings of text delimited by a match of the regular expression. Namely, each element of the iterator corresponds to text that isn't matched by the regular expression.

This method will not copy the text given.

Example

To split a string delimited by arbitrary amounts of spaces or tabs:

let re = Regex::new(r"[ \t]+").unwrap();
let fields: Vec<&[u8]> = re.split(b"a b \t  c\td    e").collect();
assert_eq!(fields, vec![
    &b"a"[..], &b"b"[..], &b"c"[..], &b"d"[..], &b"e"[..],
]);

`fn splitn<'r, 't>(&'r self, text: &'t [u8], limit: usize) -> SplitsN<'r, 't>`

Returns an iterator of at most limit substrings of text delimited by a match of the regular expression. (A limit of 0 will return no substrings.) Namely, each element of the iterator corresponds to text that isn't matched by the regular expression. The remainder of the string that is not split will be the last element in the iterator.

This method will not copy the text given.

Example

Get the first two words in some text. With a limit of 3, the third element holds the unsplit remainder:

let re = Regex::new(r"\W+").unwrap();
let fields: Vec<&[u8]> = re.splitn(b"Hey! How are you?", 3).collect();
assert_eq!(fields, vec![&b"Hey"[..], &b"How"[..], &b"are you?"[..]]);

`fn replace<R: Replacer>(&self, text: &[u8], rep: R) -> Vec<u8>`

Replaces the leftmost-first match with the replacement provided. The replacement can be a regular byte string (where $N and $name are expanded to match capture groups) or a function that takes the match's Captures and returns the replaced byte string.

If no match is found, then a copy of the byte string is returned unchanged.

Examples

Note that this function is polymorphic with respect to the replacement. In typical usage, this can just be a normal byte string:

let re = Regex::new("[^01]+").unwrap();
assert_eq!(re.replace(b"1078910", &b""[..]), b"1010");

But anything satisfying the Replacer trait will work. For example, a closure of type |&Captures| -> Vec<u8> provides direct access to the captures corresponding to a match. This allows one to access submatches easily:

let re = Regex::new(r"([^,\s]+),\s+(\S+)").unwrap();
let result = re.replace(b"Springsteen, Bruce", |caps: &Captures| {
    let mut replacement = caps[2].to_owned();
    replacement.push(b' ');
    replacement.extend(&caps[1]);
    replacement
});
assert_eq!(result, b"Bruce Springsteen");

But this is a bit cumbersome to use all the time. Instead, a simple syntax is supported that expands $name into the corresponding capture group. Here's the last example, but using this expansion technique with named capture groups:

let re = Regex::new(r"(?P<last>[^,\s]+),\s+(?P<first>\S+)").unwrap();
let result = re.replace(b"Springsteen, Bruce", &b"$first $last"[..]);
assert_eq!(result, b"Bruce Springsteen");

Note that using $2 instead of $first or $1 instead of $last would produce the same result. To write a literal $ use $$.

If $name isn't a valid capture group (whether the name doesn't exist or isn't a valid index), then it is replaced with the empty string.

The longest possible name is used. For example, $1a looks up the capture group named 1a and not the capture group at index 1. To exert more precise control over the name, use braces, e.g., ${1}a.

Finally, sometimes you just want to replace a literal string with no submatch expansion. This can be done by wrapping a byte string with NoExpand:

use regex::bytes::NoExpand;

let re = Regex::new(r"(?P<last>[^,\s]+),\s+(\S+)").unwrap();
let result = re.replace(b"Springsteen, Bruce", NoExpand(b"$2 $last"));
assert_eq!(result, b"$2 $last");

`fn replace_all<R: Replacer>(&self, text: &[u8], rep: R) -> Vec<u8>`

Replaces all non-overlapping matches in text with the replacement provided. This is the same as calling replacen with limit set to 0.

See the documentation for replace for details on how to access submatches in the replacement text.

`fn replacen<R: Replacer>(&self, text: &[u8], limit: usize, rep: R) -> Vec<u8>`

Replaces at most limit non-overlapping matches in text with the replacement provided. If limit is 0, then all non-overlapping matches are replaced.

See the documentation for replace for details on how to access submatches in the replacement text.

`fn shortest_match(&self, text: &[u8]) -> Option<usize>`

Returns the end location of a match in the text given.

This method may have the same performance characteristics as is_match, except it provides an end location for a match. In particular, the location returned may come before the proper end of the leftmost-first match.

Example

Typically, a+ would match the entire first sequence of a in some text, but shortest_match can give up as soon as it sees the first a.

let text = b"aaaaa";
let pos = Regex::new(r"a+").unwrap().shortest_match(text);
assert_eq!(pos, Some(1));

`fn as_str(&self) -> &str`

Returns the original string of this regex.

`fn capture_names(&self) -> CaptureNames`

Returns an iterator over the capture names.

`fn captures_len(&self) -> usize`

Returns the number of captures.

Trait Implementations

`impl Clone for Regex`

`fn clone(&self) -> Regex`

Returns a copy of the value.

`fn clone_from(&mut self, source: &Self)`
1.0.0

Performs copy-assignment from source.

`impl Display for Regex`

`fn fmt(&self, f: &mut Formatter) -> Result`

Shows the original regular expression.

`impl Debug for Regex`

`fn fmt(&self, f: &mut Formatter) -> Result`

Shows the original regular expression.

`impl FromStr for Regex`

`type Err = Error`

The associated error which can be returned from parsing.

`fn from_str(s: &str) -> Result<Regex, Error>`

Attempts to parse a string into a regular expression.
