23 May 2021

Vectors

This post presents one of Rust’s most useful standard types, vectors.

A vector is NOT an Iterator

Like a string, a Vec (std::vec::Vec) is a sequence of values. In a string, the values are characters; in a vector, they can be any type, as long as they are all the same type. We call the values in a vector elements or sometimes items. There are several ways to create a new vector. The simplest is the vec! macro, with the elements enclosed in square brackets ([ and ]):

let v1 = vec![1,2,3,4];
let v2 = vec!["Moonbase Alpha", "Star Trek", "Babylon 5"];

The first example is a vec of four integers. The second is a vec of three strings. The elements in a vec all have to be the same type. We can cheat, by creating our own enum type, that in turn can contain different types. You’ll learn more about that in a later post.

A vec within another vec is nested. Although it is possible to nest instances of vec, it’s not advisable. If you really need a nested vec there is a special crate just for you:

Crate nested

This is a two dimensional collection you can use instead of nesting one vec inside another.

We call a vec that contains no elements an empty vec; you can create one with empty brackets, vec![], or with Vec::new().
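Here is a small sketch of both ways to create an empty vec. The function name empty_then_filled is made up for this example, and push(), which appends one element at the end, is a standard Vec method that this post doesn’t cover otherwise.

```rust
// Hypothetical helper, for illustration only.
fn empty_then_filled() -> Vec<i32> {
    // An empty vec has no elements to infer a type from,
    // so we spell the type out in the annotation.
    let mut v: Vec<i32> = Vec::new();
    assert!(v.is_empty());

    // push() appends one element at the end.
    v.push(1);
    v.push(2);
    v
}

fn main() {
    // vec![] creates an empty vec as well.
    let empty: Vec<i32> = vec![];
    println!("{:?} {:?}", empty, empty_then_filled());
}
```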

Vectors can mutate

To access the elements of a vec you can use the bracket operator. The expression inside the brackets specifies the index. Remember that the indices start at 0:

fn main() {
    let my_vec= vec!["H","e","l","l","o"];
    for i in 0..my_vec.len() {
        print!("{}",my_vec[i]);
    }
    println!("");
}

Unlike a string, a vec lets us change a single element through its index, as long as we declare the vec mutable with mut. We use the bracket operator on the left side of an assignment. That way we can identify the element of the vec that we want to bind a value to.

fn main() {
    let mut v = vec![20, 30, 40, 50];
    print!("Elements of vector are :");
    v[1] = 60;
    for mut item in v {
        item += 20;
        print!("{} ", item);
    }
}

The one-eth element of v, which first was 30, is then changed to 60. Then the loop takes the items out of v one by one, adds 20 to its own copy of each item and prints it out. The loop uses up v in the process, so the vector itself is not updated:

Elements of vector are :40 80 60 70 %

Looks ok… hang on, what’s that percent sign at the end? Where did that come from? It’s not in the vec, is it some “null terminator” thing like in some other languages that comes out in the printing?

Nope. It isn’t Rust at all. Some shells, zsh for example, print a percent sign to show that the output ended without a line ending. A single println!(""); after the loop fixes that:

fn main() {
    let mut v = vec![20, 30, 40, 50];
    print!("Elements of vector are :");
    v[1] = 60;
    for mut item in v {
        item += 20;
        print!("{} ", item);
    }
    println!("");
}

And now the printout looks like it should:

Elements of vector are :40 80 60 70

Sweet!

A warning about looping over vectors: if you try to read or write an element that does not exist, Rust will panic. So don’t do that.
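If you aren’t sure that an index exists, you can ask for the element with get() instead of the brackets. It returns an Option rather than panicking. This is only a sketch: element_or is a made-up helper name and the fallback value is our own choice.

```rust
// Hypothetical helper: returns the element at `index`, or `fallback`
// when the index is out of bounds.
fn element_or(v: &[i32], index: usize, fallback: i32) -> i32 {
    match v.get(index) {
        Some(value) => *value, // the index exists
        None => fallback,      // out of bounds, but no panic
    }
}

fn main() {
    let v = vec![20, 30, 40, 50];
    println!("{}", element_or(&v, 1, 0));  // index 1 exists
    println!("{}", element_or(&v, 10, 0)); // index 10 does not
    // v[10] would panic here instead.
}
```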

Traversing a vector

The most common way to traverse the elements of a vector is with a for loop, using this syntax:

fn main() {
    let v = vec![20, 30, 40, 50];
    for mut item in v {
        item += 20;
        print!("{} ", item);
    }
    println!("");
}

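You don’t need to use the brackets like you do in many other languages. Keep in mind that item is the loop’s own value: adding 20 to it changes what we print, not what is stored in the vector, and the loop uses up v so we can’t use it afterwards.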
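If you do want to change the elements that are stored in the vector, one way is to loop over iter_mut(), which hands out a mutable reference to each element. A minimal sketch, where add_to_each is a hypothetical helper name:

```rust
// Hypothetical helper: adds `amount` to every element in place.
fn add_to_each(v: &mut Vec<i32>, amount: i32) {
    for item in v.iter_mut() {
        // `item` is a &mut i32, so we dereference it to reach the element.
        *item += amount;
    }
}

fn main() {
    let mut v = vec![20, 30, 40, 50];
    add_to_each(&mut v, 20);
    // v is still usable here, and its contents have changed.
    println!("{:?}", v);
}
```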

Vector operations

There are a lot of methods defined for the Vec type, but not all work all the time. Some of them only work with certain types of content inside the vector. For instance the method dedup removes duplicate items. So it needs to compare items to see if they are equal. Sometimes the compiler complains when you create your own types, or try to use dedup. That is because the content in the vector needs to have a special trait, PartialEq.

The dedup method only removes duplicates that sit next to each other, so to get rid of all duplicates we need a sorted vector. So let’s start with sorting. That requires the content type to have the trait Ord. That in turn requires three other traits: PartialOrd, Eq and PartialEq. Once we meet the requirements, sorting isn’t complicated:

fn main() {
    let mut v = vec![20, 50, 40, 30];
    print!("Elements of vector are: ");
    v[1] = 60;
    v.sort(); // can't be much easier than that
    for mut item in v {
        item += 20;
        print!("{} ", item);
    }
    println!("");
}

Note that the vector that you sort must be mutable, since sort will mutate the vector.

Now we can test dedup:

fn main() {
    let mut v = vec![20, 50, 40, 30, 50, 20, 50, 20, 40];
    print!("Elements of vector are: ");
    v[1] = 60;
    v.sort();
    v.dedup();
    for mut item in v {
        item += 20;
        print!("{} ", item);
    }
    println!("");
}

And this prints out:

Elements of vector are: 40 50 60 70 80
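For your own types, the usual way to meet the trait requirements mentioned earlier is to derive them. The sketch below assumes a made-up Score type and a made-up helper sorted_unique; neither comes from a library.

```rust
// A made-up type for illustration. The derives supply the traits that
// sort() (Ord, PartialOrd, Eq) and dedup() (PartialEq) ask for.
#[derive(Debug, PartialEq, Eq, PartialOrd, Ord)]
struct Score {
    points: u32,
}

// Hypothetical helper: sorts, then removes the now adjacent duplicates.
fn sorted_unique(mut scores: Vec<Score>) -> Vec<Score> {
    scores.sort();
    scores.dedup();
    scores
}

fn main() {
    let scores = vec![
        Score { points: 50 },
        Score { points: 20 },
        Score { points: 50 },
    ];
    println!("{:?}", sorted_unique(scores));
}
```

Without the derive line, the compiler rejects both sort() and dedup() for this type.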

Now, this way of sorting works well unless you sort words. Some words may start with a lowercase letter, and some with an uppercase initial. In UTF-8, as well as in ASCII, the uppercase letters A to Z all come before the lowercase letters a to z. Let’s say we have a vector like this: ["richard", "Peter", "charles"]. If we sort it in the same way we did in the previous example, it’ll come out like this: ["Peter", "charles", "richard"]. That’s probably not what you want.

If you want to sort without regard to capitalisation, you’re better off using sort_by().

It takes a function, here called some_sorting_function(), that describes the sorting criterion. Then it gives that function two items from the vector at a time to compare.

fn main() {
    let mut my_vector = vec!["richard", "Peter", "charles"];
    // some_sorting_function is a placeholder name; we build the real thing below
    my_vector.sort_by(some_sorting_function);
    println!("{:?}", my_vector);
}

It wants the comparison function to deliver a result type of Ordering, so let’s take a look at what that is first.

An Ordering is a kind of enum. That’s short for enumeration, and we’ll be creating our own version of an enum later. This particular enum can hold one of three different values: Less, Equal, and Greater.

We get two items from the sort_by() and we call the first a and the second b. Our function needs to return an Ordering::Less, if a should come before b. It should return Ordering::Greater if b should come before a. If we return Ordering::Equal nothing changes, a and b keep their relative placing.
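To make the three values concrete, here is a hand-written comparison function that puts shorter words first. It is only a sketch: shorter_first is a made-up name, and sorting by length is a criterion picked for the example, not the one this post is after.

```rust
use std::cmp::Ordering;

// Hypothetical comparison function: shorter words come first.
fn shorter_first(a: &str, b: &str) -> Ordering {
    if a.len() < b.len() {
        Ordering::Less // a should come before b
    } else if a.len() > b.len() {
        Ordering::Greater // b should come before a
    } else {
        Ordering::Equal // keep their relative placing
    }
}

fn main() {
    let mut v = vec!["richard", "Peter", "charles"];
    // sort_by hands the closure two items at a time.
    v.sort_by(|a, b| shorter_first(a, b));
    println!("{:?}", v);
}
```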

As luck would have it, there already exists such a comparison function that we can use. So we don’t need to create one ourselves. We can use cmp() on an item to compare it to another item, and it does have the return type Ordering.

So if we use it on a to compare it to b, we’ll get exactly what we want: a.cmp(&b). We only need to make sure that a and b both are either entirely lowercase or uppercase first. This is to make sure that case doesn’t matter.

You can use the method to_lowercase() to turn a borrowed string into a lowercase version for comparison. Then you use cmp() to actually do the comparison, like so:

fn main() {
    let mut v = vec!["richard", "Peter", "charles"];
    v.sort_by(|a, b|
    a.to_lowercase().cmp(&b.to_lowercase()));
    println!("{:?}", v);
}

This prints out ["charles", "Peter", "richard"], just like we wanted.

How about combining vectors, is that possible?

Of course, and there are a couple of ways to do it. If you want to create a new vector, that has all the elements of two other vectors, you can use concat(), like this:

fn main() {
    let old1 = vec!["richard", "Peter", "charles"];
    let old2 = vec!["Maria", "Helena", "Rachel"];
    let new_one = [&old1[..], &old2[..]].concat();
    println!("{:?}", old1);
    println!("{:?}", new_one);
}

So what is happening on that third line? What’s up with that [..] after the vector names? It looks like indexing but without numbers, kinda strange, don’t you think? Well, it is indexing, and since start and end index are missing, it means all of the items in the vector.

Why the &-signs then? What are they doing there?

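The method concat() isn’t called on a single vector. It is defined for a slice of slices, and that is what the outer brackets give us here: an array that holds two slices. It works even when the slices are slices of vectors instead of strings. To create a slice from a vector we put a & in front and brackets with a range behind, to show what part of the vector we want.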

If we don’t want to create a new vector, but take an old vector and add a second to it, we can use extend() instead, like so:

fn main() {
    let mut my_vector = vec!["richard", "Peter", "charles"];
    let my_other_vector = vec!["Maria", "Helena", "Rachel"];
    my_vector.extend(&my_other_vector);
    println!("{:?}", my_vector);
    println!("{:?}", my_other_vector);
}

This prints out:

["richard", "Peter", "charles", "Maria", "Helena", "Rachel"]
["Maria", "Helena", "Rachel"]

But, if we remove the & inside the call to extend, we get something altogether different:

fn main() {
    let mut my_vector = vec!["richard", "Peter", "charles"];
    let my_other_vector = vec!["Maria", "Helena", "Rachel"];
    my_vector.extend(my_other_vector);
    println!("{:?}", my_vector);
    println!("{:?}", my_other_vector);
}

Now the program doesn’t compile, and the compiler prints this instead:

error[E0382]: borrow of moved value: `my_other_vector`
  --> src/main.rs:19:22
   |
16 |     let my_other_vector = vec!["Maria", "Helena", "Rachel"];
   |         --------------- move occurs because `my_other_vector` has type `std::vec::Vec<&str>`, which does not implement the `Copy` trait
17 |     my_vector.extend(my_other_vector);
   |                      --------------- value moved here
18 |     println!("{:?}", my_vector);
19 |     println!("{:?}", my_other_vector);
   |                      ^^^^^^^^^^^^^^^ value borrowed here after move

error: aborting due to previous error

For more information about this error, try `rustc --explain E0382`.

Say what? What is this Copy trait and why does it matter that Vec doesn’t have it? And what does “move” mean in this context?

Let’s start with that last question, what does “move” mean? It means that ownership of the value that was bound to the variable my_other_vector is handed over to extend(). The method takes the elements out one by one and adds them to my_vector, which gets a bigger area on the heap if it runs out of room. After the call, my_other_vector no longer owns anything, so the compiler won’t let us use it again.

So, what about the Copy trait then? Well, it is a trait that will create a copy of a value, instead of moving it when you use an assignment. An example:

let mut x = 5;
let y = x;
x = 20;

In this example, y is still 5, as x was before the change. That is because the value that was bound to x is copied, not moved, because all the integer types have the trait Copy.
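The difference is easier to see side by side. The sketch below wraps the three lines above in a function and sets them next to the same kind of assignment with a Vec, which does not have Copy. Both function names are made up for the example.

```rust
// Hypothetical helper: the three lines from above, returning (x, y).
fn copy_demo() -> (i32, i32) {
    let mut x = 5;
    let y = x; // i32 is Copy, so y gets its own 5
    x = 20;    // changing x afterwards does not touch y
    (x, y)
}

// Hypothetical helper: a Vec is not Copy, so the assignment moves it.
fn move_demo() -> Vec<i32> {
    let a = vec![1, 2, 3];
    let b = a; // the vector now belongs to b
    // println!("{:?}", a); // would not compile: borrow of moved value
    b
}

fn main() {
    println!("{:?}", copy_demo());
    println!("{:?}", move_demo());
}
```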

Many of the primitive types, such as the integers, the floats, bool and char, have the Copy trait. And so do a lot of other types. You can see this if you scroll down to the headline “Implementors” on this page in the documentation:

Documentation of the Copy Trait

There is a way to copy manually too, that we can use on types that don’t have Copy. We use the method clone() instead, if our type has that. Luckily it does, so we can use it:

fn main() {
    let mut my_vector = vec!["richard", "Peter", "charles"];
    let my_other_vector = vec!["Maria", "Helena", "Rachel"];
    my_vector.extend(my_other_vector.clone());
    println!("{:?}", my_vector);
    println!("{:?}", my_other_vector);
}

Once again, the printout is what we want:

["richard", "Peter", "charles", "Maria", "Helena", "Rachel"]
["Maria", "Helena", "Rachel"]

The repeat method repeats a slice (or a vector) a given number of times and returns the result as a new vector:

fn main() {
    let zeroes = [0].repeat(4);
    let numbers = [1,2,3].repeat(3);
    println!("{:?}", zeroes);
    println!("{:?}", numbers);
}

The first example repeats 0 four times. The second example repeats 1,2,3 three times, and we get this printout:

[0, 0, 0, 0]
[1, 2, 3, 1, 2, 3, 1, 2, 3]

Vector slices

The slice operator also works on vectors, as we’ve already tried in the example with concat() earlier.

If you omit the first index, the slice starts at the beginning. If you omit the second, the slice goes to the end. So if you omit both, the slice covers the whole vector. That’s what we used in the concat() example. Note that a slice is a borrowed view of the elements, not a copy.

If you have a mutable vector, it is often useful to make a copy, with clone() or to_vec(), before performing operations that change it.
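A small sketch of the difference between borrowing a slice and making a copy. The helper name middle is made up; to_vec() is the standard slice method that copies the elements into a new vector.

```rust
// Hypothetical helper: borrows everything between the first and last element.
fn middle(v: &[i32]) -> &[i32] {
    if v.len() < 2 {
        return &[];
    }
    &v[1..v.len() - 1]
}

fn main() {
    let original = vec![1, 2, 3, 4, 5];

    // A slice borrows from the vector; nothing is copied.
    let view = middle(&original);
    println!("{:?}", view);

    // to_vec() turns the borrowed slice into an owned, independent vector.
    let mut copy = original[..].to_vec();
    copy[0] = 100;
    println!("{:?} {:?}", original, copy);
}
```

Changing copy leaves original alone, because the two no longer share any elements.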

You would think that a slice operator on the left side of an assignment could update many elements:

t[1..3] = ["x", "y"];

But, this doesn’t make sense even to me, and it certainly doesn’t to the compiler:

--> src/main.rs:26:15
   |
26 |     t[1..3] = ["x", "y"];
   |               ^^^^^^^^^^ expected slice `[&str]`, found array `[&str; 2]`

error[E0277]: the size for values of type `[&str]` cannot be known at compilation time
  --> src/main.rs:26:5
   |
26 |     t[1..3] = ["x", "y"];
   |     ^^^^^^^ doesn't have a size known at compile-time
   |
   = help: the trait `std::marker::Sized` is not implemented for `[&str]`
   = note: to learn more, visit <https://doc.rust-lang.org/book/ch19-04-advanced-types.html#dynamically-sized-types-and-the-sized-trait>
   = note: the left-hand-side of an assignment must have a statically known size

I mean, what are the types? On the left we have a slice of the vector, on the other we have… what? An array?

No, if we want to replace many elements of the vector, and we often do, we need to use another method, called splice(). We invoke the method on the vector we want to change. It takes two arguments: a range that specifies what part of the vector we want to replace, and something that can be turned into an iterator (anything with the IntoIterator trait) to replace it with. A Vec is not an iterator itself, but it can be turned into one, so why not use a vector!

We don’t even need an assignment, a line like this will do:

    let mut t = vec!["a", "b", "c", "d"];
    t.splice(1..3, vec!["x", "y"]);

After the splice, t will be ["a", "x", "y", "d"] as expected.
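splice() also hands back the elements it removed, as an iterator, so you can collect them if you need them. A sketch under the assumption that the vector has at least three elements; replace_middle is a made-up helper name.

```rust
// Hypothetical helper: replaces the elements at index 1 and 2 and returns
// both the changed vector and the removed elements.
// Panics if `t` has fewer than three elements.
fn replace_middle(mut t: Vec<&'static str>) -> (Vec<&'static str>, Vec<&'static str>) {
    // The iterator returned by splice() yields the removed elements.
    let removed: Vec<&'static str> = t.splice(1..3, vec!["x", "y"]).collect();
    (t, removed)
}

fn main() {
    let (t, removed) = replace_middle(vec!["a", "b", "c", "d"]);
    println!("{:?} {:?}", t, removed);
}
```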

Map, filter and fold

fold

To add up all the numbers in a list, you can use a loop like this:

fn main() {
    let mut total = 0;
    let numbers = vec![0,1,2,3];
    for number in numbers {
        total = total + number;
    }
    println!("The sum is: {}", total);
}

Here, we initialize total to 0. Each time through the loop, number gets one element from the vector.

As the loop runs, total accumulates the sum of the elements. A variable used this way is sometimes called an accumulator.

Adding up the elements of an Iterator is such a common operation that Rust provides it as a method on Iterator, sum():

fn main() {
    let numbers = vec![0,1,2,3];
    let total: i32 = numbers.iter().sum();
    println!("The sum is: {}", total);
}

An operation like this that combines a sequence of elements into a single value is sometimes called reduce in other languages. In functional programming, and in Rust, we call it fold.
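The Iterator trait has a fold() method too. It takes a start value for the accumulator and a closure that combines the accumulator with the next element. A minimal sketch, with made-up function names:

```rust
// Hypothetical helper: the same sum as above, written with fold().
fn sum_with_fold(numbers: &[i32]) -> i32 {
    // Start at 0, then add each number to the running total.
    numbers.iter().fold(0, |total, number| total + number)
}

// Hypothetical helper: fold() works for other accumulations as well.
fn product_with_fold(numbers: &[i32]) -> i32 {
    // Start at 1, then multiply the running product by each number.
    numbers.iter().fold(1, |product, number| product * number)
}

fn main() {
    let numbers = vec![1, 2, 3, 4];
    println!("The sum is: {}", sum_with_fold(&numbers));
    println!("The product is: {}", product_with_fold(&numbers));
}
```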

map

Sometimes you want to traverse one vector while building another. For example, the following function takes a vector of strings and returns a new vector that contains capitalized strings:

fn capitalize_all(t: Vec<String>) -> Vec<String> {
    let mut res = vec![];
    for item in t {
        // push() appends one capitalized string to the result
        res.push(capitalize(item));
    }
    res
}

fn capitalize(s: String) -> String {
    let mut c = s.chars();
    match c.next() {
        None => String::new(),
        Some(f) => f.to_uppercase().collect::<String>() + c.as_str(),
    }
}

Let’s first concentrate on the top function, capitalize_all and we’ll examine the capitalize later.

First, we initialize res with an empty vector; each time through the loop, we append the next element. So res is a kind of accumulator.

An operation like capitalize_all is sometimes called a map. It “maps” a function (in this case capitalize) onto each of the elements in a sequence. There is in fact a method called map() on the trait Iterator that we could use instead:

fn capitalize_all(t: Vec<String>) -> Vec<String> {
    // map() is lazy: nothing runs until collect() consumes the iterator
    // and gathers the capitalized strings into a new vector.
    t.into_iter().map(capitalize).collect()
}

This is easier to read, and that’s always important. You want to be able to understand your code later, when you come back to refactor.

The second function, capitalize uses a little trick: let mut c = s.chars(); creates an iterator called c.

fn capitalize(s: String) -> String {
    let mut c = s.chars();
    match c.next() {
        None => String::new(),
        Some(f) => f.to_uppercase().collect::<String>() + c.as_str(),
    }
}

Iterators have a method called next() that gets the next item in the object, in this case the next character. When we call next() on c one of two things will be true; there either is a letter or there isn’t.

The method next() returns an Option, which means we either get a None if there isn’t a letter, or a Some(<char>) if there is a letter.

To handle the Option we create a match clause, where we decide what to do with either case.

If there isn’t a letter, we get a None. That means we called the function with an empty string as argument. To return something of the correct return type, we create an empty String. Easy.

If there is a letter, we pick it up and turn it into uppercase. The method to_uppercase() returns an iterator, because the uppercase form of one character can be several characters, so we collect it into a String. Then we use the + to concatenate it with the rest of the string, and then we return that. Not that difficult either.

filter

Another common operation is to select some of the elements from a vec and return a smaller vec. For example, the following program takes a vec of numbers, and puts all the positive ones in a new vec:

fn main() {
    let new_numbers = vec![-3, 0, 3, 2, -3];
    let res: Vec<i32> = new_numbers.iter()
        .filter(|x| x > &&0) // strictly positive, so 0 is left out
        .cloned()
        .collect();
    println!("The positive: {:?}", res);
}

The Iterator type that iter() returns has the method filter() and it takes a function as an argument. The function should take an item from the iterator and return a bool.

All items in the iterator are sent, one by one to that function. If the function returns false, we discard the item. If the function returns true we include the item in the resulting vec.

The odd &&0 in the closure in the call to filter comes from the fact that the parameter x has the type &&i32. It is a borrow from a borrow. A bit strange, but that’s what happens when you combine iter() and filter like that.

We can express almost all common vector operations as a combination of map, filter and fold.
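As a sketch of how the three fit together, the function below keeps the even numbers, squares them, and adds the squares up. The name sum_of_even_squares and the task itself are invented for the example.

```rust
// Hypothetical helper: sum of the squares of the even numbers.
fn sum_of_even_squares(numbers: &[i32]) -> i32 {
    numbers
        .iter()
        .filter(|x| **x % 2 == 0)                // keep the even numbers
        .map(|x| x * x)                          // square each of them
        .fold(0, |total, square| total + square) // add the squares up
}

fn main() {
    let numbers = vec![1, 2, 3, 4];
    println!("{}", sum_of_even_squares(&numbers));
}
```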

Deleting elements

There are several ways to delete elements from a vec. If you know the index of the element you want, you can use remove():

fn main() {
    let mut numbers = vec![10, 11, 12, 13, 14];
    let one_of_them = numbers.remove(3);
    println!(
        "The removed item was: {}, 'numbers' is now {:?}",
        one_of_them, numbers);
}

This prints out:

The removed item was: 13, 'numbers' is now [10, 11, 12, 14]

remove() modifies the vector and returns the removed element. If you don’t provide an index, the program won’t compile. If you provide an index that is out of bounds, Rust panics. Making Rust panic is generally considered to be bad.

If you know the element you want to remove (but not the index), you can use retain(). The retain() is similar to filter() with the exception that filter() requires an Iterator object to operate on. The method retain() is implemented directly on vec itself. To keep the even values in the vec below we use retain with a function that checks to see which items are divisible by 2:

let mut vec = vec![1, 2, 3, 4];
vec.retain(|&x| x % 2 == 0);
println!("{:?}", vec);
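The retain() call above leaves [2, 4] in the vec. To remove one particular value instead, you can turn the test around and keep everything that isn’t that value. If you only want to remove the first match, one option is to look up its index with position() and then call remove(). Both helpers below are hypothetical names, sketched for illustration.

```rust
// Hypothetical helper: removes every occurrence of `unwanted`.
fn without_value(mut v: Vec<i32>, unwanted: i32) -> Vec<i32> {
    v.retain(|&x| x != unwanted);
    v
}

// Hypothetical helper: removes only the first occurrence of `unwanted`.
fn remove_first(mut v: Vec<i32>, unwanted: i32) -> Vec<i32> {
    // position() returns the index of the first match, if there is one,
    // so remove() is never called with an index that does not exist.
    if let Some(index) = v.iter().position(|&x| x == unwanted) {
        v.remove(index);
    }
    v
}

fn main() {
    println!("{:?}", without_value(vec![1, 2, 3, 2], 2));
    println!("{:?}", remove_first(vec![1, 2, 3, 2], 2));
}
```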

To remove several elements in sequence, you can use drain() with a range:

let mut v = vec![1, 2, 3];
v.drain(1..);

This removes the last two items and v is now [1].

As usual, the range selects all the elements up to but not including the second index; here the second index is left out, so the range runs to the end. If you want to move the drained items into another vector, you can collect them into it directly:

let mut v = vec![1, 2, 3];
let u: Vec<_> = v.drain(1..).collect();

Now v is [1], and u is [2,3].

Vectors and strings

A string is a sequence of characters and a vector is a sequence of values. But a vector of characters is not the same as a string. To convert from a string to a vector of characters, you can use chars(). This turns it into a Chars which is an iterator over char, that you can collect into a vector of char:

fn main() {
    let my_string = String::from("spam");
    let my_vector: Vec<char> = my_string.chars().collect();
    println!("{:?}", my_vector);
}

This prints out:

['s', 'p', 'a', 'm']

The chars() method breaks a string into a Chars iterator with individual characters. If you want to break a string into words, you can use the split() method:

let v: Vec<&str> = "The quick brown fox jumped over the lazy dog".split(' ').collect();

An argument called a delimiter specifies which characters to use as word boundaries. You can use any one that works for the string you are splitting. But if you want to split on spaces, tabs or any other whitespace, you’re better off using split_whitespace. Or its sibling split_ascii_whitespace:

let type_test = String::from("The quick brown fox jumped over the lazy dog");
let type_test_vector: Vec<&str> = type_test
    .split_whitespace()
    .collect();
println!("{:?}", type_test_vector);

This will print out ["The", "quick", "brown", "fox", "jumped", "over", "the", "lazy", "dog"].

The method split() has a return type of Split, which is an iterator over the split items. Using the collect() method on it reassembles the iterator into the type we specify on the left side of the assignment, Vec<&str>.

join() is the inverse of split(). It takes a vector of strings and concatenates the elements. join is a vector method, so you have to invoke it on the vector and pass the delimiter as a parameter:

let new_type_test = type_test_vector.join(" ");
println!("{}", new_type_test);

And this prints out the original sentence: The quick brown fox jumped over the lazy dog.

In this case the delimiter is a space character, so join puts a space between words. To concatenate into a string without spaces, you can use the empty string, "" as a delimiter.
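The trip back from a vector to a string can be sketched like this. A vector of char can be collected straight into a String, and join() accepts any delimiter, including the empty string. The two function names are made up for the example.

```rust
// Hypothetical helper: turns a sequence of chars back into a String.
fn chars_to_string(chars: &[char]) -> String {
    chars.iter().collect()
}

// Hypothetical helper: joins words with the given delimiter.
fn join_words(words: &[&str], delimiter: &str) -> String {
    words.join(delimiter)
}

fn main() {
    let letters = vec!['s', 'p', 'a', 'm'];
    println!("{}", chars_to_string(&letters));

    let words = vec!["quick", "brown", "fox"];
    println!("{}", join_words(&words, ", ")); // comma and space between words
    println!("{}", join_words(&words, ""));   // nothing between words
}
```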

Vectors as arguments

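When you pass a vector to a function by value, the vector moves into the function and the caller can’t use it afterwards. To let the caller keep it, you pass a reference instead: &v for read-only access, or &mut v if the function is allowed to change it. If the function modifies the vector through a &mut reference, the caller sees the change.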

Don’t do that.

If you want a function to help you change the contents of a vector, write a function that creates and returns a new vector with the desired content.

Always leave the original vector unmodified. Always. The same goes for all other types you use as parameters: always create a copy in the function, modify the copy and return the modified copy. Leave originals alone!

Modifying things that exist outside a function is called a side effect and it is a bad thing, except when it is absolutely necessary, as in outputs and UI. Side effects often lead to nasty bugs that can be difficult to discover and stamp out.
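A minimal sketch of that advice: the function borrows the vector read-only and builds a new one, so the caller’s original stays exactly as it was. The name doubled is hypothetical.

```rust
// Hypothetical helper: reads from a borrowed slice and returns a new vector.
fn doubled(v: &[i32]) -> Vec<i32> {
    v.iter().map(|x| x * 2).collect()
}

fn main() {
    let original = vec![1, 2, 3];
    let changed = doubled(&original);
    // The original is untouched, and the compiler guarantees it:
    // a & reference does not allow modification.
    println!("{:?} {:?}", original, changed);
}
```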

Debugging

Careless use of vectors (and other mutable objects) can lead to long hours of debugging. Here are some common pitfalls and ways to avoid them:

Some vec methods change the vec they are called on and return a single item.

This is the opposite of string methods like to_lowercase(), which return a new string and leave the original alone. Before using methods on vec, you should read the documentation carefully and then test them in a small program.

Pick an idiom and stick with it.

Part of the problem with vectors is that there are many ways to do the same thing. For example, to remove elements from a vector, you can use retain(), remove(), drain() or even iter().filter().

To add elements, you can use the extend() method or concat().

Try to stick to one method of adding and one for removing in each program or at least in each function if you can. The code gets easier to read that way.

Make copies to avoid aliasing.

If you want to use a method like sort() that modifies the vector it is called on, but you need to keep the original as well, make a copy. The method clone() will serve you well for that purpose.
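As a short sketch of that last tip, with a made-up helper name sorted_copy:

```rust
// Hypothetical helper: returns a sorted copy and leaves the input alone.
fn sorted_copy(v: &Vec<i32>) -> Vec<i32> {
    let mut copy = v.clone(); // sort() changes the copy, not the original
    copy.sort();
    copy
}

fn main() {
    let original = vec![20, 50, 40, 30];
    let sorted = sorted_copy(&original);
    println!("{:?} {:?}", original, sorted);
}
```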

Learning Rust Series, part 10: Vectors
