11| ⚠️ This package is under active development, which will include breaking changes. ⚠️ |
2- | --------------------------------------------------------------------- |
2+ | :--------------------------------------------------------------------------------: |
33# Regex for Humans
44The goal of this crate is simple: give everybody the power of regular expressions without having
55to learn the complicated syntax. It is inspired by [ReadableRegex.jl](https://github.com/jkrumbiegel/ReadableRegex.jl).
@@ -24,100 +24,100 @@ fn main() {
2424```
2525
2626# Roadmap
27- The eventual goal of this crate is to support all of the syntax in the [core Rust regex library](https://crates.io/crates/regex) through a human-readable API. Here is where we currently stand:
27+ The eventual goal of this crate is to support all the syntax in the [core Rust regex library](https://crates.io/crates/regex) through a human-readable API. Here is where we currently stand:
2828
2929## Character Classes
3030### Single Character
3131
32- | Implemented? | Expression | Description |
33- | :----------: | :--------: | :---------- |
32+ | Implemented? | Expression | Description |
33+ | :----------: | :--------: | :------------------------------------------------------------ |
3434| ` any() ` | ` . ` | any character except new line (includes new line with s flag) |
35- | ` digit() ` | ` \d ` | digit (\p{Nd}) |
36- | ` non_digit() ` | ` \D ` | not digit |
37- | | ` \pN ` | One-letter name Unicode character class |
38- | | ` \p{Greek} ` | Unicode character class (general category or script) |
39- | | ` \PN ` | Negated one-letter name Unicode character class |
40- | | ` \P{Greek} ` | negated Unicode character class (general category or script) |
35+ | ` digit() ` | ` \d ` | digit (\p{Nd}) |
36+ | ` non_digit() ` | ` \D ` | not digit |
37+ | | ` \pN ` | One-letter name Unicode character class |
38+ | | ` \p{Greek} ` | Unicode character class (general category or script) |
39+ | | ` \PN ` | Negated one-letter name Unicode character class |
40+ | | ` \P{Greek} ` | negated Unicode character class (general category or script) |
4141
4242### Perl Character Classes
4343
44- | Implemented? | Expression | Description |
45- | :---------------: | :--------: | :---------- |
46- | ` digit() ` | ` \d ` | digit (\p{Nd}) |
47- | ` non_digit() ` | ` \D ` | not digit |
48- | ` whitespace() ` | ` \s ` | whitespace (\p{White_Space}) |
49- | ` non_whitespace() ` | ` \S ` | not whitespace |
44+ | Implemented? | Expression | Description |
45+ | :---------------: | :--------: | :----------------------------------------------------------------------- |
46+ | ` digit() ` | ` \d ` | digit (\p{Nd}) |
47+ | ` non_digit() ` | ` \D ` | not digit |
48+ | ` whitespace() ` | ` \s ` | whitespace (\p{White_Space}) |
49+ | ` non_whitespace() ` | ` \S ` | not whitespace |
5050| ` word() ` | ` \w ` | word character (\p{Alphabetic} + \p{M} + \d + \p{Pc} + \p{Join_Control}) |
51- | ` non_word() ` | ` \W ` | not word character |
51+ | ` non_word() ` | ` \W ` | not word character |
5252
5353### ASCII Character Classes
5454
55- | Implemented? | Expression | Description |
56- | :---------------: | :------------: | :---------- |
57- | | ` [[:alnum:]] ` | alphanumeric ([0-9A-Za-z]) |
58- | | ` [[:alpha:]] ` | alphabetic ([A-Za-z]) |
59- | | ` [[:ascii:]] ` | ASCII ([\x00-\x7F]) |
60- | | ` [[:blank:]] ` | blank ([\t ]) |
61- | | ` [[:cntrl:]] ` | control ([\x00-\x1F\x7F]) |
62- | ` digit() ` | ` [[:digit:]] ` | digits ([0-9]) |
63- | | ` [[:graph:]] ` | graphical ([!-~]) |
64- | | ` [[:lower:]] ` | lower case ([a-z]) |
65- | | ` [[:print:]] ` | printable ([ -~]) |
66- | | ` [[:punct:]] ` | punctuation ([!-/:-@\[-`{-~]) |
67- | | ` [[:space:]] ` | whitespace ([\t\n\v\f\r ]) |
68- | | ` [[:upper:]] ` | upper case ([A-Z]) |
55+ | Implemented? | Expression | Description |
56+ | :---------------: | :------------: | :----------------------------- |
57+ | | ` [[:alnum:]] ` | alphanumeric ([0-9A-Za-z]) |
58+ | | ` [[:alpha:]] ` | alphabetic ([A-Za-z]) |
59+ | | ` [[:ascii:]] ` | ASCII ([\x00-\x7F]) |
60+ | | ` [[:blank:]] ` | blank ([\t ]) |
61+ | | ` [[:cntrl:]] ` | control ([\x00-\x1F\x7F]) |
62+ | ` digit() ` | ` [[:digit:]] ` | digits ([0-9]) |
63+ | | ` [[:graph:]] ` | graphical ([!-~]) |
64+ | | ` [[:lower:]] ` | lower case ([a-z]) |
65+ | | ` [[:print:]] ` | printable ([ -~]) |
66+ | | ` [[:punct:]] ` | punctuation ([!-/:-@\[-`{-~]) |
67+ | | ` [[:space:]] ` | whitespace ([\t\n\v\f\r ]) |
68+ | | ` [[:upper:]] ` | upper case ([A-Z]) |
6969| ` word() ` | ` [[:word:]] ` | word characters ([0-9A-Za-z_]) |
70- | | ` [[:xdigit:]] ` | hex digit ([0-9A-Fa-f]) |
70+ | | ` [[:xdigit:]] ` | hex digit ([0-9A-Fa-f]) |
7171
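To make the ranges in the table above concrete, here is a small sketch that checks a character against each ASCII class using only `char` methods from the Rust standard library. The helper name `in_ascii_class` is hypothetical and is not part of this crate or of the `regex` crate; it only illustrates which characters each class covers.

```rust
/// Hypothetical helper (not part of this crate): reports whether `c` belongs
/// to the named ASCII class, or `None` if the class name is unknown.
pub fn in_ascii_class(class: &str, c: char) -> Option<bool> {
    let hit = match class {
        "alnum" => c.is_ascii_alphanumeric(),        // [0-9A-Za-z]
        "alpha" => c.is_ascii_alphabetic(),          // [A-Za-z]
        "ascii" => c.is_ascii(),                     // [\x00-\x7F]
        "blank" => c == '\t' || c == ' ',            // [\t ]
        "cntrl" => c.is_ascii_control(),             // [\x00-\x1F\x7F]
        "digit" => c.is_ascii_digit(),               // [0-9]
        "graph" => c.is_ascii_graphic(),             // [!-~]
        "lower" => c.is_ascii_lowercase(),           // [a-z]
        "print" => c == ' ' || c.is_ascii_graphic(), // [ -~]
        "punct" => c.is_ascii_punctuation(),         // [!-/:-@\[-`{-~]
        // `char::is_ascii_whitespace` omits vertical tab (\x0B), so the set
        // is spelled out here to match [\t\n\v\f\r ].
        "space" => matches!(c, '\t' | '\n' | '\x0B' | '\x0C' | '\r' | ' '),
        "upper" => c.is_ascii_uppercase(),           // [A-Z]
        "word" => c.is_ascii_alphanumeric() || c == '_', // [0-9A-Za-z_]
        "xdigit" => c.is_ascii_hexdigit(),           // [0-9A-Fa-f]
        _ => return None,
    };
    Some(hit)
}
```

For example, `in_ascii_class("punct", '_')` and `in_ascii_class("word", '_')` both return `Some(true)`, since the underscore falls in both sets.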
7272## Repetitions
7373
74- | Implemented? | Expression | Description |
75- | :----------------------: | :------------: | :---------- |
76- | ` zero_or_more(x) ` | ` x* ` | zero or more of x (greedy) |
77- | ` one_or_more(x) ` | ` x+ ` | one or more of x (greedy) |
78- | ` zero_or_one(x) ` | ` x? ` | zero or one of x (greedy) |
79- | ` zero_or_more(x).lazy() ` | ` x*? ` | zero or more of x (ungreedy/lazy) |
80- | ` one_or_more(x).lazy() ` | ` x+? ` | one or more of x (ungreedy/lazy) |
81- | ` zero_or_one(x).lazy() ` | ` x?? ` | zero or one of x (ungreedy/lazy) |
82- | ` at_least_at_most(n, m, x) ` | ` x{n,m} ` | at least n x and at most m x (greedy) |
83- | ` at_least(n, x) ` | ` x{n,} ` | at least n x (greedy) |
84- | ` exactly(n, x) ` | ` x{n} ` | exactly n x |
85- | ` at_least_at_most(n, m, x).lazy() ` | ` x{n,m}? ` | at least n x and at most m x (ungreedy/lazy) |
86- | ` at_least(n, x).lazy() ` | ` x{n,}? ` | at least n x (ungreedy/lazy) |
74+ | Implemented? | Expression | Description |
75+ | :-----------------------: | :------------: | :------------------------------------------- |
76+ | ` zero_or_more(x) ` | ` x* ` | zero or more of x (greedy) |
77+ | ` one_or_more(x) ` | ` x+ ` | one or more of x (greedy) |
78+ | ` zero_or_one(x) ` | ` x? ` | zero or one of x (greedy) |
79+ | ` zero_or_more(x).lazy() ` | ` x*? ` | zero or more of x (ungreedy/lazy) |
80+ | ` one_or_more(x).lazy() ` | ` x+? ` | one or more of x (ungreedy/lazy) |
81+ | ` zero_or_one(x).lazy() ` | ` x?? ` | zero or one of x (ungreedy/lazy) |
82+ | ` between(n, m, x) ` | ` x{n,m} ` | at least n x and at most m x (greedy) |
83+ | ` at_least(n, x) ` | ` x{n,} ` | at least n x (greedy) |
84+ | ` exactly(n, x) ` | ` x{n} ` | exactly n x |
85+ | ` between(n, m, x).lazy() ` | ` x{n,m}? ` | at least n x and at most m x (ungreedy/lazy) |
86+ | ` at_least(n, x).lazy() ` | ` x{n,}? ` | at least n x (ungreedy/lazy) |
8787
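The mapping in the repetition table can be sketched as plain string building. The code below is a minimal illustration under stated assumptions, not this crate's implementation: the `Pattern` type, the `&str` arguments, and the private `quantify` helper are all hypothetical, and only the function names are taken from the table.

```rust
/// Hypothetical stand-in for a pattern builder: it only holds regex source text.
#[derive(Debug, Clone, PartialEq)]
pub struct Pattern(pub String);

impl Pattern {
    /// Turns a greedy quantifier into its lazy form by appending `?`.
    pub fn lazy(self) -> Pattern {
        Pattern(format!("{}?", self.0))
    }
}

/// Wraps `x` in a non-capturing group so the quantifier applies to all of it.
fn quantify(x: &str, suffix: &str) -> Pattern {
    Pattern(format!("(?:{}){}", x, suffix))
}

pub fn zero_or_more(x: &str) -> Pattern {
    quantify(x, "*")
}

pub fn one_or_more(x: &str) -> Pattern {
    quantify(x, "+")
}

pub fn zero_or_one(x: &str) -> Pattern {
    quantify(x, "?")
}

pub fn exactly(n: u32, x: &str) -> Pattern {
    // `{{` and `}}` are escaped braces in `format!`, so this yields `{n}`.
    quantify(x, &format!("{{{}}}", n))
}

pub fn at_least(n: u32, x: &str) -> Pattern {
    quantify(x, &format!("{{{},}}", n))
}

pub fn between(n: u32, m: u32, x: &str) -> Pattern {
    quantify(x, &format!("{{{},{}}}", n, m))
}
```

In this sketch, `zero_or_one("a").lazy()` produces the text `(?:a)??`. Grouping every operand is one simple way to keep quantifiers from binding to only the last character of a longer fragment.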
8888## Composites
8989
9090| Implemented? | Expression | Description |
9191| :---------------: | :------------: | :------------------------------ |
9292| ` + ` | ` xy ` | concatenation (x followed by y) |
93- | ` or() ` | ` x\|y ` | alternation (x or y, prefer x) |
93+ | ` or() ` | ` x\|y ` | alternation (x or y, prefer x) |
9494
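Concatenation through `+` and alternation through `or()` could be modeled with operator overloading. The following is a rough sketch rather than the crate's actual code: the `Fragment` type, the `text` constructor, and the slice-based signature of `or` are assumptions made for illustration, and inputs are assumed to be valid regex fragments already (nothing is escaped).

```rust
use std::ops::Add;

/// Hypothetical wrapper around regex source text.
#[derive(Debug, Clone, PartialEq)]
pub struct Fragment(pub String);

/// Builds a fragment from text that is assumed to be valid regex syntax.
pub fn text(s: &str) -> Fragment {
    Fragment(s.to_string())
}

impl Add for Fragment {
    type Output = Fragment;

    /// `x + y` is concatenation: x followed by y.
    fn add(self, rhs: Fragment) -> Fragment {
        Fragment(format!("{}{}", self.0, rhs.0))
    }
}

/// Alternation of the given options, preferring earlier ones. The result is
/// wrapped in a non-capturing group so neighbors are not pulled into a branch.
pub fn or(options: &[Fragment]) -> Fragment {
    let joined = options
        .iter()
        .map(|f| f.0.as_str())
        .collect::<Vec<_>>()
        .join("|");
    Fragment(format!("(?:{})", joined))
}
```

With these definitions, `text("gr") + or(&[text("a"), text("e")]) + text("y")` yields the pattern text `gr(?:a|e)y`.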
9595## Empty matches
9696
97- | Implemented? | Expression | Description |
98- | :---------------: | :------------: | :------------------------------ |
99- | ` begin() ` | ` ^ ` | the beginning of text (or start-of-line with multi-line mode) |
100- | ` end() ` | ` $ ` | the end of text (or end-of-line with multi-line mode) |
101- | | ` \A ` | only the beginning of text (even with multi-line mode enabled) |
102- | | ` \z ` | only the end of text (even with multi-line mode enabled) |
103- | | ` \b ` | a Unicode word boundary (\w on one side and \W, \A, or \z on other) |
104- | | ` \B ` | not a Unicode word boundary |
97+ | Implemented? | Expression | Description |
98+ | :------------------: | :------------: | :------------------------------------------------------------------ |
99+ | ` begin() ` | ` ^ ` | the beginning of text (or start-of-line with multi-line mode) |
100+ | ` end() ` | ` $ ` | the end of text (or end-of-line with multi-line mode) |
101+ | | ` \A ` | only the beginning of text (even with multi-line mode enabled) |
102+ | | ` \z ` | only the end of text (even with multi-line mode enabled) |
103+ | ` word_boundary() ` | ` \b ` | a Unicode word boundary (\w on one side and \W, \A, or \z on other) |
104+ | ` non_word_boundary() ` | ` \B ` | not a Unicode word boundary |
105105
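The word boundary rule in the table (a word character on one side and a non-word character or the edge of the text on the other) can be spelled out directly. This sketch is ASCII-only for brevity, whereas `\b` in the `regex` crate is Unicode-aware by default; the function names are hypothetical and exist only for illustration.

```rust
/// ASCII-only approximation of `\w`: letters, digits, and underscore.
fn is_word(c: char) -> bool {
    c.is_ascii_alphanumeric() || c == '_'
}

/// Returns true if char position `i` (0 up to and including the text length)
/// is a word boundary: exactly one of its two neighbors is a word character.
/// The start and end of the text count as non-word neighbors.
pub fn is_word_boundary(text: &str, i: usize) -> bool {
    let chars: Vec<char> = text.chars().collect();
    let before = i > 0 && i <= chars.len() && is_word(chars[i - 1]);
    let after = i < chars.len() && is_word(chars[i]);
    before != after
}
```

Negating the result gives the `\B` case: a position where both neighbors are word characters, or neither is.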
106106## Groupings and Flags
107107
108- | Implemented? | Expression | Description |
109- | :---------------: | :------------: | :------------------------------ |
110- | | ` (exp) ` | numbered capture group (indexed by opening parenthesis) |
111- | | ` (?P<name>exp) ` | named (also numbered) capture group |
112- | | ` (?:exp) ` | non-capturing group |
113- | | ` (?flags) ` | set flags within current group |
114- | | ` (?flags:exp) ` | set flags for exp (non-capturing) |
115-
116- | Implemented? | Expression | Description |
117- | :---------------: | :------------: | :------------------------------ |
118- | | ` i ` | case-insensitive: letters match both upper and lower case |
119- | | ` m ` | multi-line mode: ` ^ ` and ` $ ` match begin/end of line |
120- | | ` s ` | allow ` . ` to match ` \n ` |
121- | | ` U ` | swap the meaning of ` x* ` and ` x*? ` |
122- | | ` u ` | Unicode support (enabled by default) |
123- | | ` x ` | ignore whitespace and allow line comments (starting with ` # ` ) |
108+ | Implemented? | Expression | Description |
109+ | :---------------: | :-------------: | :------------------------------------------------------ |
110+ | | ` (exp) ` | numbered capture group (indexed by opening parenthesis) |
111+ | | ` (?P<name>exp) ` | named (also numbered) capture group |
112+ | Handled implicitly through functional composition | ` (?:exp) ` | non-capturing group |
113+ | | ` (?flags) ` | set flags within current group |
114+ | | ` (?flags:exp) ` | set flags for exp (non-capturing) |
115+
116+ | Implemented? | Expression | Description |
117+ | :---------------: | :------------: | :------------------------------------------------------------ |
118+ | | ` i ` | case-insensitive: letters match both upper and lower case |
119+ | | ` m ` | multi-line mode: ` ^ ` and ` $ ` match begin/end of line |
120+ | | ` s ` | allow ` . ` to match ` \n ` |
121+ | | ` U ` | swap the meaning of ` x* ` and ` x*? ` |
122+ | | ` u ` | Unicode support (enabled by default) |
123+ | | ` x ` | ignore whitespace and allow line comments (starting with ` # ` ) |