Creating a Rust function that returns a &str or String
We learned how to create a function that accepts String or &str as an argument. Now I want to show you how to create a function that returns either String or &str. I also want to discuss why we would want to do this. To start, let us write a function to remove all the spaces from a given string. Our function might look something like this:
fn remove_spaces(input: &str) -> String {
    let mut buf = String::with_capacity(input.len());
    for c in input.chars() {
        if c != ' ' {
            buf.push(c);
        }
    }
    buf
}
This function allocates memory for a string buffer, loops through each character of input and appends all non-space characters to the string buffer. Now I ask: what if my input did not contain spaces at all? The value of input would be the same as buf. In that case, it would be more efficient to not create buf in the first place. Instead, we would like to just return the given input back to the caller. However, input is a &str and our function returns a String. We could change the type of input to String:
fn remove_spaces(input: String) -> String { ... }
but this causes two problems. First, by making input a String we force the caller to move ownership of input into our function. This prevents the caller from using that value afterwards. We should only take ownership of input if we actually need it. Second, the input might already be a &str, and we are now forcing the caller to convert it into a String, which defeats our attempt to avoid allocating new memory for buf.
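To make both problems concrete, here is a minimal sketch. The names remove_spaces_owned, caller_keeps_string and caller_has_str are hypothetical and exist only for this illustration; they show what a caller has to do when the function insists on a String.

```rust
// Hypothetical variant from the text that takes ownership of its input.
fn remove_spaces_owned(input: String) -> String {
    input.chars().filter(|&c| c != ' ').collect()
}

// Hypothetical caller that starts with a String and wants to keep using it.
fn caller_keeps_string() -> (String, String) {
    let name = String::from("Herman Radtke");
    // Passing `name` directly would move it into the function, so we have
    // to clone it (an extra allocation) to keep using it afterwards.
    let stripped = remove_spaces_owned(name.clone());
    (name, stripped)
}

// Hypothetical caller that only has a &str.
fn caller_has_str(name: &str) -> String {
    // An allocation is forced here even when `name` has no spaces at all.
    remove_spaces_owned(name.to_string())
}
```

In both callers an allocation happens before remove_spaces_owned has done any work, which is exactly the cost we were trying to avoid.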
Clone-on-write
What we really want is the ability to return our input string (&str) if there are no spaces and to return a new string (String) if there are spaces we need to remove. This is where the clone-on-write or Cow type can be used. The Cow type allows us to abstract away whether something is Owned or Borrowed. In our example, the &str is a reference to an existing string, so that is borrowed data. If there are spaces, then we need to allocate memory for a new String. That new String is owned by the buf variable. Normally, we would move the ownership of buf by returning it to the caller. When using Cow, we instead move the ownership of buf into the Cow type and return that.
use std::borrow::Cow;

fn remove_spaces<'a>(input: &'a str) -> Cow<'a, str> {
    if input.contains(' ') {
        let mut buf = String::with_capacity(input.len());
        for c in input.chars() {
            if c != ' ' {
                buf.push(c);
            }
        }
        return Cow::Owned(buf);
    }
    return Cow::Borrowed(input);
}
Our function now checks whether the given input contains a space and only then allocates memory for a new buffer. If the input does not contain a space, the input is simply returned. We are adding a bit of runtime cost (an extra scan of the input) to optimize how we allocate memory. Notice that our Cow type has the same lifetime as the &str. As we discussed previously, the compiler needs to track the &str reference to know when it can safely free (or Drop) the memory.
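One way to convince ourselves that the borrowed case really hands back the original data is to compare pointers. The sketch below repeats remove_spaces so it compiles on its own; reuses_input is a hypothetical helper added only for this check.

```rust
use std::borrow::Cow;

// Same function as in the text, repeated so the sketch is self-contained.
fn remove_spaces<'a>(input: &'a str) -> Cow<'a, str> {
    if input.contains(' ') {
        let mut buf = String::with_capacity(input.len());
        for c in input.chars() {
            if c != ' ' {
                buf.push(c);
            }
        }
        return Cow::Owned(buf);
    }
    Cow::Borrowed(input)
}

// Hypothetical helper: reports whether the result still points at the input.
fn reuses_input(input: &str) -> bool {
    match remove_spaces(input) {
        // Borrowed: the data pointer is the caller's own, so nothing was copied.
        Cow::Borrowed(s) => s.as_ptr() == input.as_ptr(),
        // Owned: a new buffer was allocated for the result.
        Cow::Owned(_) => false,
    }
}
```

With this in place, reuses_input("Herman") should be true and reuses_input("Herman Radtke") should be false.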
The beauty of Cow is that it implements the Deref trait, so you can call immutable functions without knowing whether the result is a new string buffer or the original input. Example:
let s = remove_spaces("Herman Radtke");
println!("Length of string is {}", s.len());
If I do need to mutate s, then I can convert it into an owned variable using the into_owned() function. If the variant of Cow was already Owned, then we are simply moving ownership. If the variant of Cow is Borrowed, then we are allocating memory. This allows us to lazily clone (allocate memory) only when we want to write (or mutate) the variable.
Example where a Cow::Borrowed is mutated:
let s = remove_spaces("Herman"); // s is a Cow::Borrowed variant
let len = s.len(); // immutable function call using Deref
let owned: String = s.into_owned(); // memory is allocated for a new string
Example where a Cow::Owned is mutated:
let s = remove_spaces("Herman Radtke"); // s is a Cow::Owned variant
let len = s.len(); // immutable function call using Deref
let owned: String = s.into_owned(); // no new memory allocated as we already had a String
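If we want to mutate the value while keeping it wrapped in a Cow, the standard library also offers Cow::to_mut(), which clones borrowed data only at the moment a mutable reference is requested. A minimal sketch, where shout is a hypothetical helper used only for illustration:

```rust
use std::borrow::Cow;

// Hypothetical helper: appends "!" to a Cow<str> in place.
fn shout<'a>(mut s: Cow<'a, str>) -> Cow<'a, str> {
    // to_mut() copies borrowed data into a new String only if needed,
    // then hands back a mutable reference to the owned String.
    s.to_mut().push('!');
    s
}
```

Passing Cow::Borrowed("Herman") allocates once inside to_mut(), while passing a Cow::Owned reuses the String it already holds. Either way the returned value is the Owned variant.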
The idea behind Cow is two-fold:
- Delay the allocation of memory for as long as possible. In the best case, we never have to allocate any new memory.
- Allow the caller of our remove_spaces function to not care if memory was allocated or not. The usage of the Cow type is the same in either case.
Leveraging the Into Trait
We previously discussed using the Into trait to convert a &str into a String. We can also use the Into trait to convert the &str or String into the proper Cow variant. When we call .into(), the compiler picks the conversion based on the function's return type. Using .into() will not speed up or slow down the code. It is simply an option to avoid having to specify Cow::Owned or Cow::Borrowed explicitly.
fn remove_spaces<'a>(input: &'a str) -> Cow<'a, str> {
    if input.contains(' ') {
        let mut buf = String::with_capacity(input.len());
        // Iterate over the chars directly; collecting them into a Vec first
        // would add an allocation we are trying to avoid.
        for c in input.chars() {
            if c != ' ' {
                buf.push(c);
            }
        }
        return buf.into();
    }
    return input.into();
}
We can also clean this up a bit using just iterators:
fn remove_spaces<'a>(input: &'a str) -> Cow<'a, str> {
    if input.contains(' ') {
        input
            .chars()
            .filter(|&x| x != ' ')
            .collect::<String>()
            .into()
    } else {
        input.into()
    }
}
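The .into() calls rely on two From implementations in the standard library: From<&str> for Cow<str> and From<String> for Cow<str>. The sketch below isolates each conversion; from_borrowed and from_owned are hypothetical helper names used only to show which variant comes out.

```rust
use std::borrow::Cow;

// Hypothetical helper: a &str converts into the Borrowed variant.
fn from_borrowed(s: &str) -> Cow<'_, str> {
    s.into() // no allocation, the Cow just wraps the reference
}

// Hypothetical helper: a String converts into the Owned variant.
fn from_owned(s: String) -> Cow<'static, str> {
    s.into() // the String is moved into the Cow, no copy is made
}
```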
Real World Uses of Cow
My example of removing spaces may seem a bit contrived, but there are some great real-world applications of this strategy. Inside of Rust core there is a function that converts bytes to UTF-8 in a lossy manner (String::from_utf8_lossy) and a function that will translate CRLF to LF. Both of these functions have a case where a &str can be returned in the optimal case and another case where a String has to be allocated. Other examples I can think of are properly encoding an XML/HTML string or properly escaping a SQL query. In many cases, the input is already properly encoded or escaped. In those cases, it is better to just return the input string back to the caller. When the input does need to be modified, we are forced to allocate new memory, in the form of a String buffer, and return that to the caller.
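Here is a rough sketch of two of these cases. String::from_utf8_lossy is the real standard library function and returns a Cow<str>. The other two names are hypothetical: escape_html is a simplified illustration that only handles three characters and is not a complete or vetted escaper, and lossy_is_borrowed is a small wrapper for inspecting the variant.

```rust
use std::borrow::Cow;

// Hypothetical, simplified escaper: only handles '&', '<' and '>'.
fn escape_html(input: &str) -> Cow<'_, str> {
    // Fast path: nothing to escape, so hand the input straight back.
    if !input.contains(|c: char| matches!(c, '&' | '<' | '>')) {
        return Cow::Borrowed(input);
    }
    let mut buf = String::with_capacity(input.len());
    for c in input.chars() {
        match c {
            '&' => buf.push_str("&amp;"),
            '<' => buf.push_str("&lt;"),
            '>' => buf.push_str("&gt;"),
            _ => buf.push(c),
        }
    }
    Cow::Owned(buf)
}

// Hypothetical wrapper: true when the lossy conversion did not allocate.
fn lossy_is_borrowed(bytes: &[u8]) -> bool {
    // Valid UTF-8 comes back Borrowed; invalid bytes force an Owned String
    // with replacement characters substituted in.
    matches!(String::from_utf8_lossy(bytes), Cow::Borrowed(_))
}
```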
Why use String::with_capacity()?
While we are on the topic of efficient memory management, notice that I used String::with_capacity() instead of String::new() when creating the string buffer. You can use String::new() instead of String::with_capacity(), but it is more efficient to allocate memory for the buffer all at once instead of re-allocating memory as we push more chars onto the buffer. Let us walk through what Rust does when we use String::new() and then push characters onto the string.
A String is really a Vec<u8> that is guaranteed to hold valid UTF-8. When String::new() is called, Rust creates a vector with zero bytes of capacity. If we then push the character a onto the string buffer, like buf.push('a'), Rust has to increase the capacity of the vector. At the time this was written, it would allocate 2 bytes of memory. As we push more characters and exceed the capacity, Rust will double the size of the string by re-allocating memory. It will continue to double the size each time the capacity is exceeded. The sequence of memory allocation is 0, 2, 4, 8, 16, 32 ... 2^n where n is the number of times Rust detected that capacity was exceeded. (The exact starting size and growth factor are implementation details and may differ between Rust versions.) Re-allocating memory is really slow (edit: kmc_v3 explained that it might not be as slow as I thought). Not only does Rust have to ask the allocator (and possibly the kernel) for new memory, it may also have to copy the contents of the vector from the old memory space to the new memory space. Check out the source code for Vec::push to see the resizing logic first-hand.
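Rather than trusting any particular sequence of numbers, we can observe the growth directly. The sketch below uses a hypothetical helper, observed_capacities, that records every distinct capacity a String reports while single-byte characters are pushed onto it. The values you see depend on your Rust version, so no expected output is given.

```rust
// Hypothetical helper: pushes `n` ASCII characters onto an empty String and
// records each distinct capacity value seen along the way.
fn observed_capacities(n: usize) -> Vec<usize> {
    let mut buf = String::new();
    let mut seen = vec![buf.capacity()]; // starts at 0, nothing allocated yet
    for _ in 0..n {
        buf.push('a');
        if buf.capacity() != *seen.last().unwrap() {
            seen.push(buf.capacity()); // the buffer just grew
        }
    }
    seen
}
```

Printing observed_capacities(1000) with {:?} shows how few growth steps it takes to reach a large buffer, and each step is a potential copy of everything pushed so far.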
In general, we want to allocate new memory only when we need it and only allocate as much as we need. For small strings, like remove_spaces("Herman Radtke"), the overhead of re-allocating memory is not a big deal. What if I wanted to remove all of the spaces in each JavaScript file for my website? The overhead of re-allocating memory for a buffer is much higher. When pushing data onto a vector (String or otherwise) it can be a good idea to specify a capacity to start with. The best situation is when you already know the length and the capacity can be exactly set. The code comments for Vec give a similar warning.
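As a quick check of that claim, the following sketch pre-sizes a buffer and confirms that filling it up to the requested size does not move or grow the allocation. fill_without_realloc is a hypothetical helper written for this illustration.

```rust
// Hypothetical helper: fills a pre-sized buffer and reports whether the
// underlying allocation stayed put while doing so.
fn fill_without_realloc(n: usize) -> bool {
    let mut buf = String::with_capacity(n);
    let start_ptr = buf.as_ptr();
    let start_cap = buf.capacity(); // at least `n`, possibly more
    for _ in 0..n {
        buf.push('a'); // one byte each, so we never exceed `n` bytes
    }
    // Same pointer and same capacity means no re-allocation happened.
    buf.as_ptr() == start_ptr && buf.capacity() == start_cap
}
```

Note that this works for remove_spaces because the output can never be longer than the input, so input.len() is a safe upper bound for the buffer.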