Course 2: Rust Language

Ownership, borrowing, mutability, heap and stack in Rust (2h on computers)

Pierre Cochard, Tanguy Risset

Move semantics and Copy semantics
References and borrowing
- course: Reference in Rust
Mutable Reference
- course: Mutable reference
Heap and Stack: the String example
Smart Pointers
Recalls on Heap and Stack

course: Ownership (from https://doc.rust-lang.org/book/)

Ownership is a set of rules that govern how a Rust program manages memory. All programs have to manage the way they use a computer's memory while running. Some languages have garbage collection that regularly looks for no-longer-used memory as the program runs; in other languages, the programmer must explicitly allocate and free the memory.

Rust uses a third approach: memory is managed through a system of ownership with a set of rules that the compiler checks.

If any of the rules are violated, the program won't compile. None of the features of ownership will slow down your program while it's running.

Because ownership is a new concept for many programmers, it does take some time to get used to. When you understand ownership, you'll have a solid foundation for understanding the features that make Rust unique.

Here are the Ownership Rules:

Each value in Rust has an owner.
There can only be one owner at a time.
When the owner goes out of scope, the value will be dropped.

The notion of scope is (for the moment) the same as in traditional languages such as C.
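The three rules can be observed on a small sketch. The function name `scopes` and the variable names are hypothetical, chosen only for illustration; the commented-out lines are the ones the compiler would reject.

```rust
// Hypothetical helper: creates values in nested scopes and returns the survivor.
fn scopes() -> String {
    let outer = String::from("outer"); // `outer` owns this value
    {
        let inner = String::from("inner"); // `inner` owns another value
        println!("inside the block: {}, {}", outer, inner);
    } // `inner` goes out of scope here: its value is dropped
    // println!("{}", inner); // would not compile: `inner` no longer exists

    let new_owner = outer; // ownership moves: there is still a single owner
    // println!("{}", outer); // would not compile: the value was moved
    new_owner // ownership moves once more, to the caller
}

fn main() {
    let s = scopes();
    println!("returned: {}", s);
}
```

Uncommenting either of the two commented lines is a quick way to see the corresponding compiler error.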

Move semantics and Copy semantics

As we saw in the previous course, the following program compiles even though x is used after being assigned to y:

The default assignment semantics for the type Cube is move.
However, deriving the Copy trait turns it into copy semantics, hence x and y represent two different values.

    #[derive(Debug, Clone, Copy)]
    struct Cube {
        c: f32,
    }

    fn main() {
        let x = [Cube{c:0.5},Cube{c:0.75},Cube{c:1.0}];
        let y = x;
        println!("x is: {:?}", x);
        println!("y is: {:?}", y);
    }
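For contrast, here is a minimal sketch of what happens when Copy is not derived. The type `Brick` and the function `duplicate` are hypothetical names introduced for this illustration: the assignment is then a move, and duplication has to be requested explicitly with clone().

```rust
// Hypothetical type: like Cube, but it derives Clone only, not Copy.
#[derive(Debug, Clone, PartialEq)]
struct Brick {
    c: f32,
}

// Takes ownership of a Brick and gives it back together with an explicit clone.
fn duplicate(b: Brick) -> (Brick, Brick) {
    let copy = b.clone(); // explicit duplication
    (b, copy)
}

fn main() {
    let x = Brick { c: 0.5 };
    let y = x; // move: `x` can no longer be used
    // println!("{:?}", x); // error[E0382]: borrow of moved value: `x`
    let (y, z) = duplicate(y);
    println!("y is: {:?}, z is: {:?}", y, z);
}
```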

Try this program:

    fn main() {
        let x = [(10,20),(30,40),(50,60)];
        let y = x;
        println!("x is: {:?}", x);
        println!("y is: {:?}", y);
    }

Why does it work?

References and borrowing

There is an alternative to moving or duplicating (i.e. cloning) a value: you can borrow it. Borrowing in Rust is done with the reference operator: '&'.

In the original Cube program, which does not derive the Copy trait, create a reference to x with let y = &x. Can you still print x after that?

course: Reference in Rust

References in Rust are equivalent to references in any language: a pointer to the same content, except that, because of the strong static verifications performed by the compiler, a reference is always guaranteed to point to a valid value of a particular type for the life of that reference¹.

References are created with the '&' operator. As in C, the opposite of referencing is dereferencing, which is accomplished with the dereference operator: '*'. However, in practice, the '*' operator can often be omitted: the compiler dereferences automatically in method calls and field accesses, and converts between reference types where needed. This is called autoderef or deref coercion (it relies on the Deref trait, which is implemented for all references).

Automatic dereferencing covers many common cases, but not assignment through a mutable reference, where the dereference must be written explicitly:

    let mut x = 10;
    let y = &mut x;

    *y = 20; //explicit dereferencing is required here
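A short sketch of where the dereference is implicit and where it is not. The function names `side` and `set_to` are hypothetical, introduced only for this example.

```rust
struct Cube {
    c: f32,
}

// Reads a field through a reference: no explicit `*` is needed.
fn side(r: &Cube) -> f32 {
    r.c // same as (*r).c
}

// Assigns through a mutable reference: the explicit `*` is required.
fn set_to(target: &mut i32, value: i32) {
    *target = value;
}

fn main() {
    let x = Cube { c: 0.75 };
    let r = &x;
    println!("{} {}", side(r), (*r).c); // both forms read the same field

    let text = String::from("hello");
    let t = &text;
    println!("{}", t.len()); // method call: `t` is dereferenced automatically

    let mut n = 10;
    set_to(&mut n, 20);
    println!("{}", n); // prints 20
}
```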

Borrowing is extremely useful in function calls. Each time you pass a value to a function, the ownership of that value is transferred to the function (more precisely, to the formal parameter of the function), unless its type has copy semantics. If, instead, you pass a reference to the value, the ownership does not change, so you can call many functions that only read a value, without modifying it, by passing references.
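The difference between the two calling styles can be sketched as follows. The functions `volume` and `into_side` are hypothetical; Cube is the struct used in this course, here without the Copy trait.

```rust
#[derive(Debug)]
struct Cube {
    c: f32,
}

// Only borrows the cube: the caller keeps ownership.
fn volume(cube: &Cube) -> f32 {
    cube.c * cube.c * cube.c
}

// Takes ownership of the cube: the caller cannot use it afterwards.
fn into_side(cube: Cube) -> f32 {
    cube.c
} // `cube` is dropped here

fn main() {
    let x = Cube { c: 2.0 };
    let v1 = volume(&x);
    let v2 = volume(&x); // `x` was only borrowed, so it is still usable
    println!("{:?} has volume {} (computed twice: {})", x, v1, v2);

    let side = into_side(x); // ownership moves into the function
    // println!("{:?}", x); // error[E0382]: borrow of moved value: `x`
    println!("side was {}", side);
}
```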

Mutable Reference

Sometimes, you wish to have a function call that modifies an object. For that, you can use a mutable reference with the syntax: let y = &mut x. Mutable references in Rust do not change ownership. They only provide exclusive access to a value for mutation while ensuring that the ownership of the value remains unchanged.

Using a mutable reference to x (let y = &mut x), write a function called double that doubles the size of your cube x.

course: Mutable reference

Ownership in Rust means having full control over a value (here a value is to be understood as an l-value, i.e. a value stored in a memory location). The owner is responsible for managing the value's lifetime (we will talk about lifetimes later) and for cleaning up its resources when it goes out of scope. Ownership can be transferred (moved) but is unique at any given time (except in very special cases that we will see).

Borrowing (via references, either &T or &mut T) allows you to access a value without transferring ownership. Immutable borrow (&T): grants read-only access to a value. Mutable borrow (&mut T): grants exclusive read and write access to a value.

Rules of Mutable References:

You can only have one mutable reference to a value at a time.
While a mutable reference exists, no other references (mutable or immutable) to the same value are allowed.
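These rules can be tried on a vector of integers. This is only a sketch: the helper `push_zero` is a hypothetical name, and the two commented-out lines are the ones the compiler would reject at that position.

```rust
// Hypothetical helper: appends one element through a mutable reference.
fn push_zero(v: &mut Vec<i32>) {
    v.push(0);
}

fn main() {
    let mut data = vec![1, 2, 3];

    // Any number of immutable references may coexist.
    let r1 = &data;
    let r2 = &data;
    println!("{} {}", r1.len(), r2.len());

    // A mutable reference must be the only reference in use.
    let m = &mut data;
    // let r3 = &data;     // error[E0502]: `data` is also borrowed as mutable
    // let m2 = &mut data; // error[E0499]: second mutable borrow of `data`
    push_zero(m);
    println!("{:?}", m); // prints [1, 2, 3, 0]
}
```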

It is important to understand that the Rust compiler evaluates the scope of variables very precisely.

Only one of the two programs below is correct. Which one, and why?

    #[derive(Debug)]
    struct Cube {
        c: f32,
    }

    fn double(y: &mut Cube) {
        y.c = 2.0 * y.c;
    }

    fn main() {
        let mut x = Cube { c: 0.75 };
        let y = &mut x;
        double(y);
        println!("My cube is: {:?}", x);
        println!("My cube is: {:?}", y);    
    }


    #[derive(Debug)]
    struct Cube {
        c: f32,
    }

    fn double(y: &mut Cube) {
        y.c = 2.0 * y.c;
    }

    fn main() {
        let mut x = Cube { c: 0.75 };
        let y = &mut x;
        double(y);
        println!("My cube is: {:?}", y);
        println!("My cube is: {:?}", x);    
    }

Heap and Stack: the String example

Many programming languages don't require you to think about the stack and the heap very often. But in a systems programming language like Rust, whether a value is on the stack or the heap affects how the language behaves and why you have to make certain decisions.

The section "Recalls on Heap and Stack" at the end of this document recalls the basics that everyone should know about the heap and the stack; please read it if you are not very familiar with these concepts.

The following code manipulates a string that contains hello:

    let s1 = String::from("hello");
    let s2 = s1;

As you know, if s1 were bound to an integer (say 5), then s2 would be set to a copy of 5, because i32 has copy semantics by default. But here, s1 is bound to a String. We will study strings in more detail later, but this is a good example to understand the difference between the heap and the stack.

A String is made up of three parts, shown on the left of the figure below (taken from the Rust book): a pointer to the memory that holds the contents of the string, a length, and a capacity. This group of data is stored on the stack. On the right is the memory on the heap that holds the contents. The reason for this is that a string might contain an arbitrarily long sequence of characters, but the size used to store the structural information (i.e., pointer, length, and capacity) does not change from one string to another; it is known statically.

(a) Representation in memory of a String holding the value "hello" bound to `s1`. (b) Representation in memory of the variable `s2` that has a copy of the pointer, length, and capacity of `s1`

When we assign s1 to s2, the String data is copied, meaning we copy the pointer, the length, and the capacity that are on the stack. We do not copy the data on the heap that the pointer refers to. In other words, the data representation in memory looks like the right part of the figure above.

Note that the actual content of the string (i.e. the 'hello' characters) is not duplicated. Moreover, it can no longer be reached through s1: String has move semantics, so s1 is moved to s2 and the data is now owned by s2².
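This behaviour can be checked by looking at the three parts directly. In the sketch below, the helper `parts` is a hypothetical name; as_ptr, len and capacity are existing methods of the standard String type. The capacity value itself is not printed, because it is an implementation choice of the allocator.

```rust
// Returns the three parts kept on the stack: (pointer to the heap buffer, length, capacity).
fn parts(s: &String) -> (*const u8, usize, usize) {
    (s.as_ptr(), s.len(), s.capacity())
}

fn main() {
    let s1 = String::from("hello");
    let before = parts(&s1);

    let s2 = s1; // move: only pointer, length and capacity are copied
    let after = parts(&s2);
    println!("same heap buffer after the move: {}", before.0 == after.0);

    let s3 = s2.clone(); // clone: the heap content itself is duplicated
    println!("the clone has its own buffer: {}", parts(&s3).0 != after.0);
    println!("length = {}, capacity >= length: {}", after.1, after.2 >= after.1);
}
```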

Write a function fn append_word(s: &mut String) and call it with a mutable reference to s2. You can use the String method pub fn push_str(&mut self, string: &str).

Smart Pointers

Smart pointers are inherited from other languages such as C++. Smart pointers are data structures that act like a pointer but also have additional metadata and capabilities. Rust has a variety of smart pointers defined in the standard library that provide functionality beyond that provided by references. To explore the general concept, we'll look at a couple of different examples of smart pointers, including a reference counting smart pointer type (Rc) and a unique pointer on the heap (Box).

The most straightforward smart pointer is a Box, whose type is written Box<T>. Boxes allow you to store data on the heap rather than the stack, behind a unique (owning) pointer. What remains on the stack is the pointer to the heap data. This is useful, for instance, to create recursive types.
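Before moving to recursive types, here is a minimal sketch of a Box holding a simple struct. The function `boxed_cube` is a hypothetical helper; Box::new and std::mem::size_of come from the standard library.

```rust
#[derive(Debug)]
struct Cube {
    c: f32,
}

// Builds a cube on the heap and returns the unique pointer to it.
fn boxed_cube(c: f32) -> Box<Cube> {
    Box::new(Cube { c })
}

fn main() {
    let b = boxed_cube(1.5);
    println!("side through the box: {}", b.c); // dereferenced automatically

    let b2 = b; // moves the pointer; the Cube on the heap is not copied
    // println!("{:?}", b); // error[E0382]: borrow of moved value: `b`

    let cube: Cube = *b2; // moves the value out of the box
    println!("{:?}", cube);

    // On the stack, a Box<Cube> occupies exactly one pointer.
    println!(
        "{}",
        std::mem::size_of::<Box<Cube>>() == std::mem::size_of::<usize>()
    );
}
```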

Create a type List based on the following structure: a list is either (the "either" corresponds to an enum) the constant Nil or the concatenation of an integer and a List: Cons(i32, List). Try first without Box, then with Box.

You will have to declare the use of the created symbols after the definition of List by writing: use crate::List::{Cons,Nil};

In the majority of cases, ownership is clear: you know exactly which variable owns a given value. However, there are applications where a single value might have multiple "owners". For example, in graph data structures, multiple edges might point to the same node, and that node is conceptually owned by all of the edges that point to it. A node shouldn't be cleaned up unless it doesn't have any edges pointing to it and so has no owners.

You have to enable multiple ownership explicitly by using the Rust type Rc<T>, which is an abbreviation for reference counting. We use the Rc<T> type when we want to allocate some data on the heap for multiple parts of our program to read and we can't determine at compile time which part will finish using the data last. Note that Rc<T> is only for use in single-threaded scenarios; other constructs are used in multithreaded programs.
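The counting mechanism can be observed with Rc::strong_count. The sketch below shares a plain String rather than a list; the helper `count_with_three_owners` is a hypothetical name used only here.

```rust
use std::rc::Rc;

// Hypothetical helper: one shared String with three owners;
// returns the reference count while all three are alive.
fn count_with_three_owners() -> usize {
    let a = Rc::new(String::from("shared"));
    let b = Rc::clone(&a); // new owner: increments the count, no deep copy
    let c = Rc::clone(&a);
    println!("{} {} {}", a, b, c);
    Rc::strong_count(&a)
}

fn main() {
    let a = Rc::new(String::from("shared"));
    println!("count = {}", Rc::strong_count(&a)); // 1
    {
        let b = Rc::clone(&a);
        println!("count = {} ({})", Rc::strong_count(&a), b); // 2
    } // `b` is dropped here: the count goes back down
    println!("count = {}", Rc::strong_count(&a)); // 1
    println!("{}", count_with_three_owners()); // 3
}
```

The heap data is freed only when the last owner is dropped, i.e. when the count reaches zero.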

Consider the diagram below, where a list (a) is shared by two other lists (b and c). Write a program that creates this object by using Rc<T> instead of Box<T> in the List definition. (As Rc is not in the prelude, you have to import it with use std::rc::Rc;)

Recalls on Heap and Stack

Although knowing exactly how memory is managed is generally not necessary for a programmer, in many cases (system programming or embedded programming for instance, often done in Rust) it is crucial to understand how memory is handled by the compiler and the OS. From the programmer's point of view, and thanks to the virtual memory system, everything happens as if the program had all the memory to itself.

Memory management is more or less the same for every language and system; what differs is what is visible to the programmer: explicit memory management (malloc/free), garbage collection, etc. This memory is organized in different sections, almost always in the following way:

(a) Representation of the memory as seen by the programmer

The "code" section contains the assembled (machine) code of the program. The "static" section contains all the "static" variables (i.e. variables that are available during the whole execution of the program). The two other sections are managed dynamically during execution:

The heap is used for dynamic memory allocation: malloc (in C) or new (in object-oriented languages). Objects stored in the heap have a lifetime that is independent of function execution: they can survive after the function that created them has finished. The heap can be managed explicitly (as in C with malloc and free) or implicitly (using a garbage collector, as in Python for instance).
The stack is used to manage the execution of functions (or procedures in general), which includes in particular the allocation and management of the functions' local variables.

The stack starts from high addresses and grows downward, although it is often represented upside-down as below: low addresses at the top, high addresses at the bottom. The heap grows upward; when the two bounds meet, the system is out of memory.

The stack execution principle is important to know. When a function is called, a space is allocated on the stack to store its local variables: this space is called the function frame. When the function ends, its frame is freed and the stack goes back to the frame of the calling function.

Below is an illustration of the evolution of the stack during a function call. Two registers of the processor are indicated: the stack pointer (SP), which indicates the top of the stack, and the frame pointer (FP), which indicates the beginning of the frame of the current function. The frame contains all the information needed for the execution of the function, including room for local variables.

Before the call, the frame pointer FP points to the frame of the calling function.
During the call, the stack grows (i.e. SP is decreased, as the stack is upside-down) to make room for the frame of the called function. This includes room for the local variables of the function, the parameters given to the function, the information needed to return from the function (the return address in the code, because a given function can be called from many places in the code), room for the function result, as well as some bookkeeping information such as saved values of the processor registers.
After the call, the frame of the called function has disappeared. Actually its content is still there but cannot be accessed anymore, because the stack pointer SP has been put back to its location before the call.

Important to remember: function variables whose size is known at compile time are usually stored on the stack. Values whose size is only known during execution, such as the content of a String or objects created by new, are usually stored on the heap.
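In Rust terms, this can be sketched as follows. The function `make_squares` is a hypothetical example, and the placement described in the comments is the usual one, not a guarantee of the language.

```rust
// The elements of the returned Vec live on the heap, so they survive the end
// of this function; only the small (pointer, length, capacity) handle is moved out.
fn make_squares(n: usize) -> Vec<usize> {
    let bounds = [0, n]; // fixed-size array, size known at compile time: in the frame
    let mut squares = Vec::with_capacity(n); // length chosen at run time: heap buffer
    for i in bounds[0]..bounds[1] {
        squares.push(i * i);
    }
    squares // `bounds` disappears with the frame; the heap buffer survives
}

fn main() {
    let v = make_squares(5);
    println!("{:?}", v); // prints [0, 1, 4, 9, 16]
}
```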

¹ This is a major difference between Rust and many other languages: there are no "null pointers" (a reference can never be null). Interestingly enough, null references were introduced by Tony Hoare in the 1960s, a decision he later called his "billion-dollar mistake": https://news.ycombinator.com/item?id=12427069

² It is important to know that the pointer used internally by a String has the "Unique<T>" type, which forbids the object pointed to by this pointer from having two owners at the same time. Hence the String type cannot have copy semantics.

5TC-Rust
