This is the fourth part of the series on everyday programming tasks in the average CRUD application.
So far we covered:
- initial setup with axum and postgres, logging and dates
- input validation on incoming JSON objects
- working with environment variables
This time we’re going to look at another humble task: regular expressions and constants (often seen together).
As it turns out, this post dives deeper into some Rust intricacies than the previous ones. So be cautious! You might learn something…
The beginning is not going to be complicated. We need two new crates:
regex = "1.5"
lazy_static = "1.4"
First you will see how regular expressions work, and then how to make sure they are compiled only once (using constants), which is what you want for performance.
Imagine you need to remove punctuation from a sentence. For this you could use this regular expression:
The regex crate uses Perl-style expressions, which is also what Java does (although the crate leaves out look-around and backreferences).
To make this work in rust:
use regex::Regex;
let punct_re = Regex::new(r#"[\d\.:,"'\(\)\[\]|/?!;]+"#).unwrap();
I did not use highlighting in this snippet, because the highlighter on this page is actually incorrect in that it doesn’t ‘escape’ the double-quote character in the middle, thinking it’s the end of the string. The code uses a raw string:
If I hadn’t included a double-quote as part of the expression, this would have been valid as well:
And if the expression contained a double-quote directly followed by a pound sign (#), I would need to add a second pound sign on each side of the string and write this:
This syntax avoids having to use escaping with backslash and makes the expression more readable.
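The raw-string variants can be compared side by side. This is a small sketch that only uses the standard library; the function names and the shortened patterns are made up for illustration.

```rust
/// Raw string with one `#` on each side: it may contain double quotes,
/// and backslashes are taken literally.
fn with_quote() -> &'static str {
    r#"[\d"']+"#
}

/// The same pattern as an ordinary string: every backslash and every
/// double quote has to be escaped.
fn with_quote_escaped() -> &'static str {
    "[\\d\"']+"
}

/// Without a double quote in the pattern the `#` delimiters are optional.
fn without_quote() -> &'static str {
    r"[\d']+"
}

/// A pattern that contains `"#` needs two `#` on each side.
fn with_quote_and_pound() -> &'static str {
    r##"["#]+"##
}
```

The first two functions return exactly the same text, which shows what the raw string saves you from typing.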
The newly created punct_re expression can simply be used like this:
let it_contains_punctuation = punct_re.is_match("!");
Check the docs for more information on all available methods.
In our case we need to use replace_all and pass an empty string to effectively remove all unwanted characters:
let result = punct_re.replace_all("hello world!", "");
Now it will get a little bit tricky, because replace_all does not return a String or string slice, but instead a Cow…
No, not you!
A COW as in Clone On Write:
A clone-on-write smart pointer. The type Cow is a smart pointer providing clone-on-write functionality: it can enclose and provide immutable access to borrowed data, and clone the data lazily when mutation or ownership is required. The type is designed to work with general borrowed data via the Borrow trait. Cow implements Deref, which means that you can call non-mutating methods directly on the data it encloses. If mutation is desired, to_mut will obtain a mutable reference to an owned value, cloning if necessary.
What is this and why is it used in replace_all?
To start with the latter: it was put in for efficiency, returning a reference to the original string in case nothing needed replacing. And a Cow can also hold an owned value that may be mutated, whereas a shared smart pointer like Rc hands out immutable access by default. That is useful when you do need to replace.
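To make the borrowed-versus-owned idea concrete, here is a minimal sketch with the standard library only. The function strip_bangs is a hypothetical stand-in for replace_all (it just removes exclamation marks), and is_borrowed is a helper invented for this example.

```rust
use std::borrow::Cow;

/// Hypothetical stand-in for `replace_all`: strips every `!` from the input.
/// Returns the input untouched (borrowed) when there is nothing to strip,
/// and allocates a new String (owned) only when a change is needed.
fn strip_bangs(input: &str) -> Cow<'_, str> {
    if input.contains('!') {
        Cow::Owned(input.replace('!', ""))
    } else {
        Cow::Borrowed(input)
    }
}

/// Reports whether a Cow still points at the original data.
fn is_borrowed(value: &Cow<'_, str>) -> bool {
    matches!(value, Cow::Borrowed(_))
}
```

With "hello world!" the result is an owned "hello world"; with "hello world" nothing is allocated and the Cow simply borrows the input.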
If you want you can read more here
As the docs state, Cow implements Deref. This means that something like the C-language * operation for pointers is applied automatically by the compiler, turning the smart pointer to a value into the value itself.
I have included the types to show what goes on and because line 4 wouldn’t compile without it.
- you get the result as a Cow<str>
- you say you want a string slice, so the compiler deref’s the Cow
- and you get a new reference, a &str
Without dereferencing you would get a &Cow<str> instead, which isn’t helpful at all.
One last thing: let result twice? Yes, that’s Rust’s shadowing. Really handy to avoid (quasi) Hungarian notation.
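The two-step pattern from the list above (get a Cow<str>, then ask for a &str under the same name) can be sketched without any extra crates. As an assumption for this example, String::from_utf8_lossy from the standard library stands in for replace_all, because it also returns a Cow<str>; the function names are hypothetical.

```rust
use std::borrow::Cow;

/// Takes a plain string slice; it knows nothing about Cow.
fn shout(text: &str) -> String {
    text.to_uppercase()
}

/// Hypothetical helper showing the annotate-and-shadow pattern.
fn decode_and_shout(bytes: &[u8]) -> String {
    // Like `replace_all`, `String::from_utf8_lossy` returns a Cow<str>.
    let result: Cow<str> = String::from_utf8_lossy(bytes);
    // Shadowing: same name, new type. `&result` is a `&Cow<str>`, but the
    // `&str` annotation makes the compiler dereference it for us.
    let result: &str = &result;
    shout(result)
}
```

Removing the &str annotation on the second let would leave result as a &Cow<str>.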
Rust has a const keyword for declaring constants:
const A: usize = 1;
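You cannot do the same for a regex, though. The following does not compile, because Regex::new is not a const fn and so cannot be called in a constant:
const punct_re: Regex = Regex::new(r#"[\d\.:,'\(\)\[\]|/?!;]+"#).unwrap();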
To work around this we need lazy_static.
This is a macro, and the code that you put in it is guaranteed to run only once: the first time the value is used.
We could simply put it in a function, right where we need it:
(I left the double-quote out of the expression, and used highlighting again)
Important! I cannot return a &str here. When something was replaced, the result lives in a String that is owned by the function, and that memory is reclaimed when the function finishes. A reference to it would be a dangling pointer, and the compiler refuses to build such code. That’s why we have to copy the value to an owned String and return that. This has a performance impact. Try to avoid copying as much as possible!
Working with regular expressions and constants isn’t really difficult, but it opens the door to some more advanced concepts in the Rust type system.
I highly recommend https://rust-unofficial.github.io/too-many-lists/index.html. Don’t just read it. Don’t copy-paste the code. Don’t even copy it manually.
Read it, hide the browser tab, and try to write a variation on the linked list from memory. Reopen the tab whenever you are stuck. And don’t despair!