Commit 89d619b

Cleaned up the roadmap, added examples, and a few features too.
1 parent 496e51a commit 89d619b

6 files changed: +100 -84 lines changed

Cargo.toml

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 [package]
 name = "human_regex"
-version = "0.1.2"
+version = "0.1.3"
 authors = ["Chris McComb <ccmcc2012@gmail.com>"]
 description = "A regex library for humans"
 edition = "2021"

README.md

Lines changed: 70 additions & 70 deletions
@@ -1,5 +1,5 @@
 | ⚠️ This package is under active development which will include breaking changes. ⚠️ |
-| --------------------------------------------------------------------- |
+| :--------------------------------------------------------------------------------: |
 # Regex for Humans
 The goal of this crate is simple: give everybody the power of regular expressions without having
 to learn the complicated syntax. It is inspired by [ReadableRegex.jl](https://github.com/jkrumbiegel/ReadableRegex.jl).
@@ -24,100 +24,100 @@ fn main() {
2424
```
 
 # Roadmap
-The eventual goal of this crate is to support all of the syntax in the [core Rust regex library](https://crates.io/crates/regex) through a human-readable API. Here is where we currently stand:
+The eventual goal of this crate is to support all the syntax in the [core Rust regex library](https://crates.io/crates/regex) through a human-readable API. Here is where we currently stand:
 
 ## Character Classes
 ### Single Character
 
-| Implemented? | Expression | Description |
-| :----------: | :--------: | :---------- |
+| Implemented? | Expression | Description |
+| :----------: | :--------: | :------------------------------------------------------------ |
 | `any()` | `.` | any character except new line (includes new line with s flag) |
-| `digit()` | `\d` | digit (\p{Nd}) |
-| `non_digit()` | `\D` | not digit |
-| |`\pN` | One-letter name Unicode character class |
-| |`\p{Greek}` | Unicode character class (general category or script) |
-| |`\PN` | Negated one-letter name Unicode character class |
-| |`\P{Greek}` | negated Unicode character class (general category or script) |
+| `digit()` | `\d` | digit (\p{Nd}) |
+| `non_digit()` | `\D` | not digit |
+| |`\pN` | One-letter name Unicode character class |
+| |`\p{Greek}` | Unicode character class (general category or script) |
+| |`\PN` | Negated one-letter name Unicode character class |
+| |`\P{Greek}` | negated Unicode character class (general category or script) |
 
 ### Perl Character Classes
 
-| Implemented? | Expression | Description |
-| :---------------: | :--------: | :---------- |
-| `digit()` | `\d` | digit (\p{Nd}) |
-| `non_digit()` | `\D` | not digit |
-| `whitespace()` | `\s` | whitespace (\p{White_Space}) |
-| `non_whitespace()` | `\S` | not whitespace |
+| Implemented? | Expression | Description |
+| :---------------: | :--------: | :----------------------------------------------------------------------- |
+| `digit()` | `\d` | digit (\p{Nd}) |
+| `non_digit()` | `\D` | not digit |
+| `whitespace()` | `\s` | whitespace (\p{White_Space}) |
+| `non_whitespace()` | `\S` | not whitespace |
 | `word()` | `\w` | word character (\p{Alphabetic} + \p{M} + \d + \p{Pc} + \p{Join_Control}) |
-| `non_word()` | `\W` | not word character |
+| `non_word()` | `\W` | not word character |
 
 ### ASCII Character Classes
 
-| Implemented? | Expression | Description |
-| :---------------: | :------------: | :---------- |
-| | `[[:alnum:]]` | alphanumeric ([0-9A-Za-z]) |
-| | `[[:alpha:]]` | alphabetic ([A-Za-z]) |
-| | `[[:ascii:]]` | ASCII ([\x00-\x7F]) |
-| | `[[:blank:]]` | blank ([\t ]) |
-| | `[[:cntrl:]]` | control ([\x00-\x1F\x7F]) |
-| `digit()` | `[[:digit:]]` | digits ([0-9]) |
-| | `[[:graph:]]` | graphical ([!-~]) |
-| | `[[:lower:]]` | lower case ([a-z]) |
-| | `[[:print:]]` | printable ([ -~]) |
-| | `[[:punct:]]` | punctuation ([!-/:-@\[-`{-~]) |
-| | `[[:space:]]` | whitespace ([\t\n\v\f\r ]) |
-| | `[[:upper:]]` | upper case ([A-Z]) |
+| Implemented? | Expression | Description |
+| :---------------: | :------------: | :----------------------------- |
+| | `[[:alnum:]]` | alphanumeric ([0-9A-Za-z]) |
+| | `[[:alpha:]]` | alphabetic ([A-Za-z]) |
+| | `[[:ascii:]]` | ASCII ([\x00-\x7F]) |
+| | `[[:blank:]]` | blank ([\t ]) |
+| | `[[:cntrl:]]` | control ([\x00-\x1F\x7F]) |
+| `digit()` | `[[:digit:]]` | digits ([0-9]) |
+| | `[[:graph:]]` | graphical ([!-~]) |
+| | `[[:lower:]]` | lower case ([a-z]) |
+| | `[[:print:]]` | printable ([ -~]) |
+| | `[[:punct:]]` | punctuation ([!-/:-@\[-`{-~]) |
+| | `[[:space:]]` | whitespace ([\t\n\v\f\r ]) |
+| | `[[:upper:]]` | upper case ([A-Z]) |
 | `word()` | `[[:word:]]` | word characters ([0-9A-Za-z_]) |
-| | `[[:xdigit:]]` | hex digit ([0-9A-Fa-f]) |
+| | `[[:xdigit:]]` | hex digit ([0-9A-Fa-f]) |
 
 ## Repetitions
 
-| Implemented? | Expression | Description |
-| :----------------------: | :------------: | :---------- |
-| `zero_or_more(x)` | `x*` | zero or more of x (greedy) |
-| `one_or_more(x)` | `x+` | one or more of x (greedy) |
-| `zero_or_one(x)` | `x?` | zero or one of x (greedy) |
-| `zero_or_more(x)` | `x*?` | zero or more of x (ungreedy/lazy) |
-| `one_or_more(x).lazy()` | `x+?` | one or more of x (ungreedy/lazy) |
-| `zero_or_more(x).lazy()` | `x??` | zero or one of x (ungreedy/lazy) |
-| `at_least_at_most(n, m, x)` | `x{n,m}` | at least n x and at most m x (greedy) |
-| `at_least(n, x)` | `x{n,}` | at least n x (greedy) |
-| `exactly(n, x)` | `x{n}` | exactly n x |
-| `at_least_at_most(n, m, x).lazy()`| `x{n,m}?` | at least n x and at most m x (ungreedy/lazy) |
-| `at_least(n, x).lazy()` | `x{n,}?` | at least n x (ungreedy/lazy) |
+| Implemented? | Expression | Description |
+| :-----------------------: | :------------: | :------------------------------------------- |
+| `zero_or_more(x)` | `x*` | zero or more of x (greedy) |
+| `one_or_more(x)` | `x+` | one or more of x (greedy) |
+| `zero_or_one(x)` | `x?` | zero or one of x (greedy) |
+| `zero_or_more(x).lazy()` | `x*?` | zero or more of x (ungreedy/lazy) |
+| `one_or_more(x).lazy()` | `x+?` | one or more of x (ungreedy/lazy) |
+| `zero_or_one(x).lazy()` | `x??` | zero or one of x (ungreedy/lazy) |
+| `between(n, m, x)` | `x{n,m}` | at least n x and at most m x (greedy) |
+| `at_least(n, x)` | `x{n,}` | at least n x (greedy) |
+| `exactly(n, x)` | `x{n}` | exactly n x |
+| `between(n, m, x).lazy()` | `x{n,m}?` | at least n x and at most m x (ungreedy/lazy) |
+| `at_least(n, x).lazy()` | `x{n,}?` | at least n x (ungreedy/lazy) |
 
 ## Composites
 
 | Implemented? | Expression | Description |
 | :---------------: | :------------: | :------------------------------ |
 | `+` | `xy` | concatenation (x followed by y) |
-| `or()` | `x\|y` | alternation (x or y, prefer x) |
+| `or()` | `x\|y` | alternation (x or y, prefer x) |
 
 ## Empty matches
 
-| Implemented? | Expression | Description |
-| :---------------: | :------------: | :------------------------------ |
-| `begin()` | `^` | the beginning of text (or start-of-line with multi-line mode) |
-| `end()` | `$` | the end of text (or end-of-line with multi-line mode) |
-| |`\A` | only the beginning of text (even with multi-line mode enabled) |
-| | `\z` | only the end of text (even with multi-line mode enabled) |
-| |`\b` | a Unicode word boundary (\w on one side and \W, \A, or \z on other) |
-| | `\B` | not a Unicode word boundary |
+| Implemented? | Expression | Description |
+| :------------------: | :------------: | :------------------------------------------------------------------ |
+| `begin()` | `^` | the beginning of text (or start-of-line with multi-line mode) |
+| `end()` | `$` | the end of text (or end-of-line with multi-line mode) |
+| | `\A` | only the beginning of text (even with multi-line mode enabled) |
+| | `\z` | only the end of text (even with multi-line mode enabled) |
+| `word_boundary()` | `\b` | a Unicode word boundary (\w on one side and \W, \A, or \z on other) |
+| `non_word_boundary()` | `\B` | not a Unicode word boundary |
 
 ## Groupings and Flags
 
-| Implemented? | Expression | Description |
-| :---------------: | :------------: | :------------------------------ |
-| | `(exp)` | numbered capture group (indexed by opening parenthesis) |
-| | `(?P<name>exp)` | named (also numbered) capture group |
-| | `(?:exp)` | non-capturing group |
-| | `(?flags)` | set flags within current group |
-| | `(?flags:exp)` | set flags for exp (non-capturing) |
-
-| Implemented? | Expression | Description |
-| :---------------: | :------------: | :------------------------------ |
-| | `i` | case-insensitive: letters match both upper and lower case |
-| | `m` | multi-line mode: `^` and `$` match begin/end of line |
-| | `s` | allow `.` to match `\n` |
-| | `U` | swap the meaning of `x*` and `x*`? |
-| | `u` | Unicode support (enabled by default) |
-| | `x` | ignore whitespace and allow line comments (starting with `#`) |
+| Implemented? | Expression | Description |
+| :---------------: | :-------------: | :------------------------------------------------------ |
+| | `(exp)` | numbered capture group (indexed by opening parenthesis) |
+| | `(?P<name>exp)` | named (also numbered) capture group |
+| Handled implicitly through functional composition | `(?:exp)` | non-capturing group |
+| | `(?flags)` | set flags within current group |
+| | `(?flags:exp)` | set flags for exp (non-capturing) |
+
+| Implemented? | Expression | Description |
+| :---------------: | :------------: | :------------------------------------------------------------ |
+| | `i` | case-insensitive: letters match both upper and lower case |
+| | `m` | multi-line mode: `^` and `$` match begin/end of line |
+| | `s` | allow `.` to match `\n` |
+| | `U` | swap the meaning of `x*` and `x*?` |
+| | `u` | Unicode support (enabled by default) |
+| | `x` | ignore whitespace and allow line comments (starting with `#`) |
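
The Repetitions table pairs each greedy form with a `.lazy()` variant whose expression gains a trailing `?`. A minimal sketch of that mapping follows. It assumes `lazy()` simply appends `?` to the pattern built so far, which this diff does not show, and it uses a hypothetical `Pattern` newtype and `&str` targets in place of the crate's `HumanRegex` and generic parameters.

```rust
// Hypothetical stand-in for the crate's HumanRegex: a newtype around the raw pattern.
#[derive(Debug, Clone, PartialEq)]
struct Pattern(String);

impl Pattern {
    /// Assumed behaviour: make the preceding quantifier lazy by appending `?`.
    fn lazy(self) -> Pattern {
        Pattern(format!("{}?", self.0))
    }
}

/// One or more of `target`, greedy.
fn one_or_more(target: &str) -> Pattern {
    Pattern(format!("(?:{})+", target))
}

/// Zero or more of `target`, greedy.
fn zero_or_more(target: &str) -> Pattern {
    Pattern(format!("(?:{})*", target))
}

/// Zero or one of `target`, greedy.
fn zero_or_one(target: &str) -> Pattern {
    Pattern(format!("(?:{})?", target))
}

fn main() {
    // Greedy forms keep the bare quantifier.
    assert_eq!(one_or_more("x").0, "(?:x)+");
    // Lazy forms add a trailing `?`, matching the `x+?`, `x*?`, `x??` rows.
    assert_eq!(one_or_more("x").lazy().0, "(?:x)+?");
    assert_eq!(zero_or_more("x").lazy().0, "(?:x)*?");
    assert_eq!(zero_or_one("x").lazy().0, "(?:x)??");
    println!("{}", one_or_more("x").lazy().0);
}
```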

src/lib.rs

Lines changed: 3 additions & 3 deletions
@@ -35,16 +35,16 @@
 
 mod shorthand;
 pub use shorthand::{
-    any, begin, digit, direct_regex, end, non_digit, non_whitespace, non_word, text, whitespace,
-    word,
+    any, begin, digit, direct_regex, end, non_digit, non_whitespace, non_word, non_word_boundary,
+    text, whitespace, word, word_boundary,
 };
 
 mod humanregex;
 pub use humanregex::{fmt, HumanRegex};
 
 mod repetitions;
 pub use repetitions::{
-    at_least, at_least_at_most, exactly, one_or_more, optional, zero_or_more, zero_or_one,
+    at_least, between, exactly, one_or_more, optional, zero_or_more, zero_or_one,
 };
 
 mod logical;

src/logical.rs

Lines changed: 1 addition & 1 deletion
@@ -14,7 +14,7 @@ where
 {
     let mut regex_string = format!("({})", options[0].to_string());
     for idx in 1..options.len() {
-        regex_string = format!("{}|({})", regex_string, options[idx].to_string())
+        regex_string = format!("{}|(?:{})", regex_string, options[idx].to_string())
     }
     HumanRegex(regex_string)
 }
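
To make the effect of this loop concrete, here is a minimal sketch of the string it builds. It assumes the intended wrapper for the second and later options is the non-capturing group `(?:exp)` listed in the README table, and it uses a hypothetical `or_pattern` helper that returns a plain `String` instead of the crate's `HumanRegex`.

```rust
/// Hypothetical stand-in for the alternation builder in this hunk.
/// Like the code above, it indexes the first option directly, so it
/// panics on an empty slice.
fn or_pattern(options: &[&str]) -> String {
    // The first option keeps the plain parentheses used in the hunk.
    let mut regex_string = format!("({})", options[0]);
    // Every later option is joined with `|` inside a non-capturing group.
    for option in &options[1..] {
        regex_string = format!("{}|(?:{})", regex_string, option);
    }
    regex_string
}

fn main() {
    assert_eq!(or_pattern(&["cat"]), "(cat)");
    assert_eq!(or_pattern(&["cat", "dog", "bird"]), "(cat)|(?:dog)|(?:bird)");
    println!("{}", or_pattern(&["cat", "dog", "bird"]));
}
```

Note the order of the characters: `(?:dog)` is a non-capturing group, whereas `(:?dog)` is an ordinary capturing group that matches an optional colon followed by `dog`.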

src/repetitions.rs

Lines changed: 9 additions & 9 deletions
@@ -10,20 +10,20 @@ pub fn at_least<T>(n: u8, target: T) -> HumanRegex
 where
     T: Into<String> + fmt::Display,
 {
-    HumanRegex(format!("({}){{{},}}", target, n))
+    HumanRegex(format!("(?:{}){{{},}}", target, n))
 }
 
 /// Match at least _n_ and at most _m_ of a certain target
 /// ```
-/// let regex_string = human_regex::at_least_at_most(3, 5, "a");
+/// let regex_string = human_regex::between(3, 5, "a");
 /// assert!(regex_string.to_regex().is_match("aaaa"));
 /// assert!(!regex_string.to_regex().is_match("aa"));
 /// ```
-pub fn at_least_at_most<T>(n: u8, m: u8, target: T) -> HumanRegex
+pub fn between<T>(n: u8, m: u8, target: T) -> HumanRegex
 where
     T: Into<String> + fmt::Display,
 {
-    HumanRegex(format!("({}){{{},{}}}", target, n, m))
+    HumanRegex(format!("(?:{}){{{},{}}}", target, n, m))
 }
 
 /// Match one or more of a certain target
@@ -36,7 +36,7 @@ pub fn one_or_more<T>(target: T) -> HumanRegex
 where
     T: Into<String> + fmt::Display,
 {
-    HumanRegex(format!("({})+", target))
+    HumanRegex(format!("(?:{})+", target))
 }
 
 /// Match zero or more of a certain target
@@ -49,7 +49,7 @@ pub fn zero_or_more<T>(target: T) -> HumanRegex
 where
     T: Into<String> + fmt::Display,
 {
-    HumanRegex(format!("({})*", target))
+    HumanRegex(format!("(?:{})*", target))
 }
 
 /// Match zero or one of a certain target
@@ -62,7 +62,7 @@ pub fn zero_or_one<T>(target: T) -> HumanRegex
 where
     T: Into<String> + fmt::Display,
 {
-    HumanRegex(format!("({})?", target))
+    HumanRegex(format!("(?:{})?", target))
 }
 
 /// Match zero or one of a certain target
@@ -75,7 +75,7 @@ pub fn optional<T>(target: T) -> HumanRegex
 where
     T: Into<String> + fmt::Display,
 {
-    HumanRegex(format!("({})?", target))
+    HumanRegex(format!("(?:{})?", target))
 }
 
 /// Match exactly _n_ of a certain target
@@ -88,5 +88,5 @@ pub fn exactly<T>(n: u8, target: T) -> HumanRegex
 where
     T: Into<String> + fmt::Display,
 {
-    HumanRegex(format!("({}){{{}}}", target, n))
+    HumanRegex(format!("(?:{}){{{}}}", target, n))
 }
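
The doubled braces in these `format!` calls are easy to misread, so the sketch below prints the patterns the counted repetitions produce. It assumes the non-capturing wrapper `(?:exp)` from the README table, and the `*_pattern` helpers are hypothetical stand-ins that return a `String` rather than the crate's `HumanRegex`.

```rust
use std::fmt::Display;

/// Hypothetical stand-in for `at_least`: `{{` and `}}` emit literal braces.
fn at_least_pattern<T: Display>(n: u8, target: T) -> String {
    format!("(?:{}){{{},}}", target, n)
}

/// Hypothetical stand-in for `between`.
fn between_pattern<T: Display>(n: u8, m: u8, target: T) -> String {
    format!("(?:{}){{{},{}}}", target, n, m)
}

/// Hypothetical stand-in for `exactly`.
fn exactly_pattern<T: Display>(n: u8, target: T) -> String {
    format!("(?:{}){{{}}}", target, n)
}

fn main() {
    assert_eq!(at_least_pattern(2, "a"), "(?:a){2,}");
    assert_eq!(between_pattern(3, 5, "a"), "(?:a){3,5}");
    // The group makes the count apply to the whole target:
    // `(?:ab){2}` repeats "ab", while a bare `ab{2}` would repeat only "b".
    assert_eq!(exactly_pattern(2, "ab"), "(?:ab){2}");
    println!("{}", between_pattern(3, 5, "a"));
}
```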

src/shorthand.rs

Lines changed: 16 additions & 0 deletions
@@ -13,6 +13,12 @@ pub fn any() -> HumanRegex {
 }
 
 /// A function for the digit character class (i.e., the digits 0 through 9)
+/// ```
+/// use human_regex::{begin, end, one_or_more, digit};
+/// let regex_string = begin() + one_or_more(digit()) + end();
+/// assert!(regex_string.to_regex().is_match("010101010100100100100101"));
+/// assert!(!regex_string.to_regex().is_match("a string that is not composed of digits will fail"));
+/// ```
 pub fn digit() -> HumanRegex {
     HumanRegex(r"\d".to_string())
 }
@@ -107,3 +113,13 @@ where
 pub fn direct_regex(text: &str) -> HumanRegex {
     HumanRegex(text.to_string())
 }
+
+/// A function to match a word boundary
+pub fn word_boundary() -> HumanRegex {
+    HumanRegex(r"\b".to_string())
+}
+
+/// A function to match anything BUT a word boundary
+pub fn non_word_boundary() -> HumanRegex {
+    HumanRegex(r"\B".to_string())
+}
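
The two new functions are meant to be combined with other pieces through `+`, as the `digit()` doc test does with `begin()` and `end()`. The sketch below shows how such a composition could assemble a whole-word pattern. The `Pattern` newtype, its `Add` implementation, and the `literal` helper are hypothetical stand-ins for the crate's own types, and `literal` does no escaping.

```rust
use std::ops::Add;

// Hypothetical stand-in for the crate's HumanRegex.
#[derive(Debug, Clone, PartialEq)]
struct Pattern(String);

// Concatenation: `a + b` yields the pattern for "a followed by b".
impl Add for Pattern {
    type Output = Pattern;
    fn add(self, rhs: Pattern) -> Pattern {
        Pattern(format!("{}{}", self.0, rhs.0))
    }
}

fn word_boundary() -> Pattern {
    Pattern(r"\b".to_string())
}

fn non_word_boundary() -> Pattern {
    Pattern(r"\B".to_string())
}

/// Hypothetical helper: wraps text as-is, without escaping metacharacters.
fn literal(text: &str) -> Pattern {
    Pattern(text.to_string())
}

fn main() {
    // "cat" as a whole word, with a boundary on each side.
    let whole_word = word_boundary() + literal("cat") + word_boundary();
    assert_eq!(whole_word.0, r"\bcat\b");

    // "cat" only when it sits inside a longer word.
    let inside_word = non_word_boundary() + literal("cat") + non_word_boundary();
    assert_eq!(inside_word.0, r"\Bcat\B");
    println!("{} {}", whole_word.0, inside_word.0);
}
```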
