Ownership, borrowing, mutability, heap and stack in Rust (2h on computers)
Pierre Cochard, Tanguy Risset
- Move semantics and Copy semantics
- References and borrowing
- Mutable Reference
- Heap and Stack: the String example
- Smart Pointers
- Rust Lifetimes
- Recalls on Heap and Stack
Course: Ownership (from https://doc.rust-lang.org/book/)
Ownership is a set of rules that govern how a Rust program manages memory. All programs have to manage the way they use a computer's memory while running. Some languages have garbage collection that regularly looks for no-longer-used memory as the program runs; in other languages, the programmer must explicitly allocate and free the memory.
Rust uses a third approach: memory is managed through a system of ownership with a set of rules that the compiler checks.
If any of the rules are violated, the program won't compile. None of the features of ownership will slow down your program while it's running.
Because ownership is a new concept for many programmers, it does take some time to get used to. When you understand ownership, you'll have a solid foundation for understanding the features that make Rust unique.
Here are the Ownership Rules:
- Each value in Rust has an owner.
- There can only be one owner at a time.
- When the owner goes out of scope, the value will be dropped.
The scope notion is (for the moment) the same as in traditional languages such as C.
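As a minimal sketch of the three rules, the program below reuses the Cube type from the previous course; the function name consume is only an illustrative assumption.

```rust
#[derive(Debug)]
struct Cube {
    c: f32,
}

// Takes ownership of the cube: the parameter becomes the new owner.
fn consume(cube: Cube) -> f32 {
    cube.c
} // `cube` goes out of scope here, so the value is dropped

fn main() {
    let x = Cube { c: 0.5 }; // rule 1: `x` owns the value
    let y = x;               // rule 2: ownership moves from `x` to `y`
    // println!("{:?}", x);  // would not compile: `x` was moved
    {
        let z = Cube { c: 2.0 };
        println!("z is: {:?}", z);
    } // rule 3: `z` goes out of scope, its value is dropped
    let side = consume(y);   // ownership moves into the function
    println!("side: {}", side);
}
```

Uncommenting the println! on x makes the compiler report a use of a moved value.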
Move semantics and Copy semantics
As we have seen in the previous course, the following program compiles because:
- The default semantics of assignment for the type Cube is move.
- But deriving the Copy trait turns it into copy semantics, hence x and y represent two different values.
#[derive(Debug, Clone, Copy)]
struct Cube {
    c: f32,
}
fn main() {
    let x = [Cube { c: 0.5 }, Cube { c: 0.75 }, Cube { c: 1.0 }];
    let y = x;
    println!("x is: {:?}", x);
    println!("y is: {:?}", y);
}
Try this program:
fn main() {
    let x = [(10, 20), (30, 40), (50, 60)];
    let y = x;
    println!("x is: {:?}", x);
    println!("y is: {:?}", y);
}
Why does it work?
Correction
In Rust, primitive types such as numbers and Booleans have copy semantics, and so do tuples and arrays whose elements all have copy semantics. Here x is an array of pairs of integers, hence it has copy (value) semantics.
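The sketch below contrasts the two cases. The helper duplicate is a hypothetical name introduced for illustration; the second half of main shows that the same kind of array is moved as soon as one element type (here String) is not Copy.

```rust
// Copies the array and returns both the original and the copy,
// which is only possible because the array is Copy.
fn duplicate(x: [(i32, i32); 2]) -> ([(i32, i32); 2], [(i32, i32); 2]) {
    let y = x; // bitwise copy: `x` stays usable
    (x, y)
}

fn main() {
    let (x, y) = duplicate([(10, 20), (30, 40)]);
    println!("x is: {:?}, y is: {:?}", x, y);

    // String is not Copy, so an array of pairs containing a String is moved.
    let a = [(10, String::from("ten")), (20, String::from("twenty"))];
    let b = a.clone(); // explicit deep copy: `a` stays usable
    let c = a;         // move: `a` cannot be used after this line
    // println!("{:?}", a); // would not compile: value moved into `c`
    println!("b is: {:?}, c is: {:?}", b, c);
}
```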
References and borrowing
There is an alternative to moving or duplicating (i.e. cloning) a
value: you can borrow it. Borrowing in Rust is done with the
reference operator: ’&’.
In the original Cube program, which does not derive the Copy trait, create a reference to x by using let y = &x. Can you print x after that?
Correction
#[derive(Debug)]
struct Cube {
    c: f32,
}
fn main() {
    let x = Cube { c: 0.75 }; // no `mut` needed: x is only read
    let y = &x;
    println!("My cube size is: {}", x.c);
    println!("My cube is: {:?}", y);
}
Yes, we can print both x and y: y only borrows the cube, so both names
give access to the same object, and the cube is still owned by x.
Course: Reference in Rust
References in Rust are similar to references in other languages: a pointer to the same content. The difference is that, thanks to the strong static checks performed by the compiler, a reference is always guaranteed to point to a valid value of a particular type for the life of that reference [1].
References are created with the ’&’ operator. As in C, the opposite of
referencing is dereferencing, which is accomplished with the dereference
operator: ’*’. In practice, however, the ’*’ operator can often be
omitted, for instance when accessing a field or calling a method through
a reference; this is called autoderef, and the related deref coercion
relies on the Deref trait, which is implemented for all references.
One case where the ’*’ cannot be omitted is when you assign a value through a mutable reference:
fn main() {
    let mut x = 10;
    let y = &mut x;
    *y = 20; // explicit dereferencing is required here
}
Borrowing is extremely useful in function calls. Each time you call a function with a parameter, the ownership of the object passed as a parameter is transferred to the function (actually, it is transferred to the formal parameter of the function). If, instead, you pass a reference to the object, the ownership does not change, so you can call many functions that only use an object without modifying it by using references.
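A minimal sketch of the difference between the two kinds of call is given below. The function names volume and into_side are assumptions chosen for this illustration.

```rust
#[derive(Debug)]
struct Cube {
    c: f32,
}

// Borrows the cube: the caller keeps ownership.
fn volume(cube: &Cube) -> f32 {
    cube.c * cube.c * cube.c // field access through the reference, no explicit `*`
}

// Takes ownership: the caller can no longer use the cube afterwards.
fn into_side(cube: Cube) -> f32 {
    cube.c
}

fn main() {
    let x = Cube { c: 2.0 };
    println!("volume: {}", volume(&x));
    println!("volume again: {}", volume(&x)); // `x` is still owned by main
    println!("side: {}", into_side(x));       // ownership moves into the function
    // println!("{:?}", x); // would not compile: `x` was moved
}
```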
Mutable Reference
Sometimes, you wish to have a function call that modifies an object.
For that, you can use a mutable reference with the syntax:
let y = &mut x. Mutable references in Rust do not change ownership.
They only provide exclusive access to a value for mutation while
ensuring that the ownership of the value remains unchanged.
By using a mutable reference to x (let y = &mut x), write a function called double that doubles the size of your cube x.
Correction
#[derive(Debug)]
struct Cube {
    c: f32,
}
fn double(y: &mut Cube) {
    y.c = 2. * y.c;
}
fn main() {
    let mut x = Cube { c: 0.75 };
    let y = &mut x;
    double(y);
    println!("My cube is: {:?}", x);
}
Course: Mutable reference
Ownership in Rust means having full control over a value (here a value is to be understood as an L-value, i.e. a value which is stored in a memory box). The owner is responsible for managing the value's lifetime (we will talk later about lifetimes) and cleaning up its resources when it goes out of scope. Ownership can be transferred (moved) but is unique at any given time (except in very special cases that we will see).
Borrowing (via references, either &T or &mut T) allows you to
access a value without transferring ownership. An immutable borrow (&T)
grants read-only access to a value. A mutable borrow (&mut T) grants
exclusive write access to a value.
Rules of Mutable References:
- You can only have one mutable reference to a value at a time.
- While a mutable reference exists, no other references (mutable or immutable) to the same value are allowed.
It is important to understand that the Rust compiler determines very precisely the scope of each variable, and in particular how long each borrow lasts.
Only one of the two programs below is correct. Which one, and why?
#[derive(Debug)]
struct Cube {
    c: f32,
}
fn double(y: &mut Cube) {
    y.c = 2. * y.c;
}
fn main() {
    let mut x = Cube { c: 0.75 };
    let y = &mut x;
    double(y);
    println!("My cube is: {:?}", x);
    println!("My cube is: {:?}", y);
}

#[derive(Debug)]
struct Cube {
    c: f32,
}
fn double(y: &mut Cube) {
    y.c = 2. * y.c;
}
fn main() {
    let mut x = Cube { c: 0.75 };
    let y = &mut x;
    double(y);
    println!("My cube is: {:?}", y);
    println!("My cube is: {:?}", x);
}
Correction
Only the second program (right side) is correct. In the first program
(left side), y is still used after x is printed, so the scope of y
includes the use of x in the println!, which means accessing the value
of x while it is still mutably borrowed by y. In the second program,
the scope of y does not extend after it is printed (it is not used
anymore), hence x can be used again.
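The same behaviour can be observed on a plain integer. The sketch below is only an illustration (the function increment is a hypothetical helper); the commented lines indicate what the compiler would reject.

```rust
fn increment(n: &mut i32) {
    *n += 1; // explicit dereference needed to assign through the reference
}

fn main() {
    let mut x = 1;

    let r = &mut x;        // mutable borrow of `x` starts here
    increment(r);
    println!("r = {}", r); // last use of `r`: the borrow ends here

    println!("x = {}", x); // allowed: no live mutable borrow remains
    let a = &x;
    let b = &x;            // several shared borrows may coexist
    println!("a = {}, b = {}", a, b);

    // let w = &mut x;
    // println!("{}", a); // would not compile: `a` is still used while `w` exists
}
```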
Heap and Stack: the String example
Many programming languages don't require you to think about the stack and the heap very often. But in a systems programming language like Rust, whether a value is on the stack or the heap affects how the language behaves and why you have to make certain decisions.
The final section, Recalls on Heap and Stack, covers the basics that everyone should know about the heap and the stack; please read it if you are not very familiar with these concepts.
The following code manipulates a string that contains hello:
fn main() {
    let s1 = String::from("hello");
    let s2 = s1;
}
As you know, if s1 were set to an integer (say 5), then s2 would
have been set to a copy of 5, because i32 has copy semantics by
default. But here, s1 is bound to a String. We will study strings
in more detail later, but this is a good example to understand the
difference between the heap and the stack.
A String is made up of three parts, shown on the left of figure (a) below (taken from the Rust book): a pointer to the memory that holds the contents of the string, a length, and a capacity. This group of data is stored on the stack. On the right of the figure is the memory on the heap that holds the contents. The reason for this split is that a string might contain an arbitrarily long sequence of characters, but the size used to store the structural information (i.e., pointer, length, and capacity) does not change from one string to another; it is known statically.
(a) Representation in memory of a String holding the value "hello" bound to s1.
(b) Representation in memory of the variable s2 that has a copy of the pointer, length, and capacity of s1
When we assign s1 to s2, the String data is copied, meaning we copy
the pointer, the length, and the capacity that are on the stack. We do
not copy the data on the heap that the pointer refers to. In other
words, the data representation in memory looks like figure (b)
above.
Note that the actual content of the string (i.e. the hello
characters) is not duplicated. Moreover, it can no longer be reached
through s1: String has move semantics, so s1 is moved to s2 (the data
is now owned by s2) [2].
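The following sketch checks these claims experimentally. The helper names move_keeps_buffer and clone_allocates are assumptions made for this example; as_ptr returns the address of the heap buffer, and size_of_val returns the size of the part stored on the stack.

```rust
use std::mem::size_of_val;

// Moves `s1` into a new binding and reports whether the heap buffer
// address is unchanged by the move.
fn move_keeps_buffer(s1: String) -> bool {
    let before = s1.as_ptr(); // address of the heap buffer
    let s2 = s1;              // move: only pointer, length and capacity are copied
    before == s2.as_ptr()
}

// Clones `s` and reports whether the clone uses a different heap buffer.
fn clone_allocates(s: &String) -> bool {
    let copy = s.clone(); // deep copy: a new heap buffer is allocated
    copy.as_ptr() != s.as_ptr()
}

fn main() {
    let short = String::from("hello");
    let long = String::from("a much longer string than hello");
    // The stack part has the same size whatever the length of the content.
    println!("{} {}", size_of_val(&short), size_of_val(&long));
    println!("move keeps buffer: {}", move_keeps_buffer(short));
    println!("clone allocates: {}", clone_allocates(&long));
}
```

If a real duplicate of the characters is needed, clone has to be called explicitly, which is what clone_allocates illustrates.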
Write a function fn append_world(s: &mut String) which appends " world" to the string s. Call it with a mutable reference to s2. You can use the method pub fn push_str(&mut self, string: &str).
Correction
fn append_world(s: &mut String) {
    s.push_str(" world");
}
fn main() {
    let s1 = String::from("hello");
    let mut s2 = s1;
    append_world(&mut s2);
    println!("my string: {:?}", s2);
}
Smart Pointers
Smart pointers are inherited from other languages such as C++. Smart
pointers are data structures that act like a pointer but also have
additional metadata and capabilities. Rust has a variety of smart
pointers defined in the standard library that provide functionality
beyond that provided by references. To explore the general concept,
we'll look at a couple of different examples of smart pointers,
including a reference counting smart pointer type (Rc) and a unique
pointer on the heap (Box).
The most straightforward smart pointer is a Box, whose type is
written Box<T>. Boxes allow you to store data on the heap rather than
the stack, with a unique pointer. What remains on the stack is the
pointer to the heap data. This is useful, for instance, to create
recursive types.
Create a type List based on the following structure: a list is either (the "either" corresponds to an enum) the constant Nil or the concatenation of an integer and a List: Cons(i32, List). Try without using Box, then using Box.
You will have to declare the use of the created symbols after the definition of List by writing: use crate::List::{Cons,Nil};
Correction
If we write this:
enum List{
    Cons(i32,List),
    Nil,
}
use crate::List::{Cons, Nil};
fn main() {
    let l1 = Cons(4, Cons(3, Nil));
}
Compilation error is:
error[E0072]: recursive type `List` has infinite size
--> src/main.rs:1:1
|
1 | enum List{
| ^^^^^^^^^
2 | Cons(i32,List),
| ---- recursive without indirection
But this works:
#[derive(Debug)]
enum List {
    Cons(i32, Box<List>),
    Nil,
}
use crate::List::{Cons, Nil};
fn main() {
    let l1 = Cons(4, Box::new(Cons(3, Box::new(Nil))));
    println!("L1 = {:?}", l1);
}
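To see how such a boxed list is used, here is a small sketch that traverses it by borrowing. The function sum is a hypothetical helper that is not part of the exercise, and the variants are written with their full path (List::Cons) only to keep the example self-contained.

```rust
#[derive(Debug)]
enum List {
    Cons(i32, Box<List>),
    Nil,
}

// Sums the elements by borrowing the list, so the caller keeps ownership.
fn sum(list: &List) -> i32 {
    match list {
        // `tail` is a `&Box<List>`; deref coercion turns it into a `&List`.
        List::Cons(value, tail) => *value + sum(tail),
        List::Nil => 0,
    }
}

fn main() {
    let l1 = List::Cons(4, Box::new(List::Cons(3, Box::new(List::Nil))));
    println!("sum of {:?} = {}", l1, sum(&l1));
}
```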
In the majority of cases, ownership is clear: you know exactly which variable owns a given value. However, there are applications where a single value might have multiple "owners". For example, in graph data structures, multiple edges might point to the same node, and that node is conceptually owned by all of the edges that point to it. A node shouldn't be cleaned up unless it doesn't have any edge pointing to it and so has no owners.
You have to enable multiple ownership explicitly by using the Rust type
Rc<T>, which is an abbreviation for reference counting. We use
the Rc<T> type when we want to allocate some data on the heap for
multiple parts of our program to read and we can't determine at compile
time which part will finish using the data last. Note that Rc<T> is
only for use in single-threaded scenarios; other constructs are used in
multithreaded programs.
Consider the scheme below where a list (a) is shared by two other lists (b and c). Write a program that creates this object by using Rc<T> instead of Box<T>. For that you will have to:
- Declare use std::rc::Rc;
- Provide 2 clones (for b and c) of references to a: Rc::clone(&a)
Correction
#[derive(Debug)]
enum List {
    Cons(i32, Rc<List>),
    Nil,
}
use crate::List::{Cons, Nil};
use std::rc::Rc;
fn main() {
    let a = Rc::new(Cons(5, Rc::new(Cons(10, Rc::new(Nil)))));
    let b = Cons(3, Rc::clone(&a));
    let c = Cons(3, Rc::clone(&a));
    println!("a={:?}", a);
    println!("b={:?}", b);
    println!("c={:?}", c);
    drop(a); // b and c still exist
    println!("b={:?}", b);
    println!("c={:?}", c);
    drop(b);
    drop(c);
}
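The reference counter itself can be observed with Rc::strong_count. The sketch below is an illustration under the same List definition; the function counts is a hypothetical helper that records the counter at three points.

```rust
use std::rc::Rc;

#[derive(Debug)]
enum List {
    Cons(i32, Rc<List>),
    Nil,
}

// Returns the reference count of `a` before sharing, while it is shared
// by two other lists, and after one of them has been dropped.
fn counts() -> (usize, usize, usize) {
    let a = Rc::new(List::Cons(5, Rc::new(List::Cons(10, Rc::new(List::Nil)))));
    let alone = Rc::strong_count(&a);
    let b = List::Cons(3, Rc::clone(&a));
    let c = List::Cons(4, Rc::clone(&a));
    let shared = Rc::strong_count(&a);
    drop(b); // dropping `b` drops the Rc it holds, which decrements the count
    let after_drop = Rc::strong_count(&a);
    println!("c = {:?}", c); // `c` is still alive here
    (alone, shared, after_drop)
}

fn main() {
    println!("{:?}", counts());
}
```

Rc::clone does not copy the list: it only increments the counter, and the shared data is freed when the counter reaches zero.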
Rust Lifetimes
Rust’s memory model aims to guarantee safety (no dangling pointers, no data races) without a garbage collector. To achieve this, Rust enforces a system of ownership, borrowing, and finally, the essential but sometimes confusing concept of lifetimes.
Lifetimes don’t control allocation or deallocation. Instead, they allow the compiler to reason about the validity of references. Their purpose is to ensure that no reference outlives the data it points to.
Why Rust Needs Lifetimes
Consider a function that returns a reference:
fn get_ref<'a>(s: &'a String) -> &'a str {
    &s[..]
}
Rust must ensure that:
- the returned reference is valid, and
- it never points to data that might be freed or moved before it is used.
Rust cannot always infer the relationships between the lifetimes of multiple references. When inference is too ambiguous, it requires explicit lifetime annotations.
A lifetime is therefore a static constraint, used at compile time, to avoid errors like dangling references.
Course: Lifetimes
A lifetime indicates how long a reference must remain valid. It is typically written as 'a, 'b, etc.
Example:
fn demo<'a>(x: &'a i32) {
    /* ... */
}
This means:
- The reference x must stay valid for the entire duration of lifetime 'a.
Lifetimes correspond to regions of code. Importantly, they are not stored at runtime. They exist only for the compiler’s static analysis.
Rust can often infer lifetimes automatically. Three lifetime elision rules apply to function signatures, which cover most cases. For instance:
fn len(s: &str) -> usize {
    s.len()
}
We do not annotate anything, but Rust implicitly understands:
- the input reference has a lifetime 'a, and
- the return value does not depend on the input’s lifetime.
Rust only requires explicit lifetimes when the elision rules cannot determine the lifetime of a returned reference, typically when there are multiple input references and the function returns one of them.
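When the function does return a reference and has a single input reference, elision ties the output to that input. The sketch below shows an elided signature next to its explicit equivalent; the names first_word and first_word_explicit are illustrative assumptions.

```rust
// Elided form: there is one input reference, so its lifetime is given
// to the returned reference.
fn first_word(s: &str) -> &str {
    s.split_whitespace().next().unwrap_or("")
}

// Explicit form with the same meaning as the elided one.
fn first_word_explicit<'a>(s: &'a str) -> &'a str {
    s.split_whitespace().next().unwrap_or("")
}

fn main() {
    let text = String::from("hello world");
    println!("{} {}", first_word(&text), first_word_explicit(&text));
}
```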
Why is the following Rust code invalid? Try to correct it.
fn longest(x: &str, y: &str) -> &str {
    if x.len() > y.len() { x } else { y }
}
Correction
Rust rejects this function: the signature does not say whether the returned reference is tied to `x` or to `y`, so the compiler cannot know how long the result remains valid. The correct version is:
fn longest<'a>(x: &'a str, y: &'a str) -> &'a str {
    if x.len() > y.len() { x } else { y }
}
- The returned reference cannot outlive either x or y.
- Its lifetime is the minimum of the two input lifetimes.
With this annotation, the compiler can verify the function’s safety.
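A short usage sketch shows what the annotation implies for the caller. The variable names are arbitrary, and the final comment describes a variant that the compiler would reject.

```rust
fn longest<'a>(x: &'a str, y: &'a str) -> &'a str {
    if x.len() > y.len() { x } else { y }
}

fn main() {
    let outer = String::from("a long string");
    {
        let inner = String::from("short");
        // `result` may borrow from either argument, so it must not be
        // used after `inner` goes out of scope.
        let result = longest(outer.as_str(), inner.as_str());
        println!("longest: {}", result);
    }
    // Declaring `result` before the block and printing it here would be
    // rejected: `inner` does not live long enough.
}
```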
Lifetimes in Structs
A struct containing references must almost always define lifetimes:
struct Holder<'a> {
    value: &'a str,
}
This means:
- A Holder instance cannot outlive the value it references.
This is essential for safety: if a struct contains a reference, Rust ensures that the struct is destroyed before the data it refers to.
Lifetimes in Methods: self and Lifetime Propagation
In method implementations, lifetimes apply to self:
impl<'a> Holder<'a> {
    fn get(&self) -> &str {
        self.value
    }
}
Thanks to the elision rules, Rust gives the returned reference the lifetime of the &self borrow, which cannot be longer than 'a.
No explicit annotation is needed.
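The sketch below illustrates the difference between the elided signature and an explicit one. The method get_long and the function value_after_holder are hypothetical names added for this example: returning &'a str ties the result to the referenced data rather than to the Holder itself.

```rust
struct Holder<'a> {
    value: &'a str,
}

impl<'a> Holder<'a> {
    // Elided: the result is tied to the borrow of `self`.
    fn get(&self) -> &str {
        self.value
    }

    // Explicit: the result is tied to the referenced data, not to `self`.
    fn get_long(&self) -> &'a str {
        self.value
    }
}

fn value_after_holder<'a>(text: &'a str) -> &'a str {
    let holder = Holder { value: text };
    println!("{}", holder.get());
    // Valid even though `holder` is dropped at the end of this function;
    // returning `holder.get()` instead would be rejected.
    holder.get_long()
}

fn main() {
    let text = String::from("hello");
    println!("{}", value_after_holder(&text));
}
```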
The most famous lifetime is 'static. It indicates that the data:
- is available for the entire program (e.g., string literals), or
- is stored in a location that will never be freed prematurely.
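As a minimal illustration (the function name greeting is an assumption), a string literal has type &'static str, so a reference to it can be kept after any local scope has ended:

```rust
// String literals are stored in the program binary, so a reference to
// them is valid for the whole execution.
fn greeting() -> &'static str {
    "hello"
}

fn main() {
    let kept: &'static str;
    {
        let local = greeting();
        kept = local; // allowed: the data outlives this block
    }
    println!("{}", kept);
}
```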
Recalls on Heap and Stack
Although knowing exactly how memory is managed is generally not necessary for a programmer, in many cases (system programming or embedded programming for instance, often done in Rust) it is crucial to understand how memory is handled by the compiler and the OS. From the programmer's point of view, and thanks to the virtual memory system, everything happens as if all the memory were available.
Memory management is more or less the same for every language and
system; what differs is what is visible to the programmer: explicit
memory management (malloc/free), garbage collection, etc. This memory
is organized in different sections, almost always in the following way:
The "code" section contains the assembled (machine) code of the program. The "static" section contains all the "static" variables (i.e. variables that are available during the whole execution of the program). The two other sections are managed dynamically during execution:
- The heap is used for dynamic memory allocation: malloc (in C) or new (in object-oriented languages). The objects stored in the heap have a lifetime that is independent of function execution: they can survive after the function that created them has finished. The heap can be managed explicitly (as in C with malloc and free) or implicitly (using a garbage collector, as in Python for instance).
- The stack is used to manage the execution of functions (or procedures in general), which includes in particular the allocation and management of the functions' local variables.
The stack starts from high addresses and grows downward, although it is often represented upside-down as below: small addresses at the top, big addresses at the bottom. The heap grows upward; when the two bounds meet, the system is out of memory.
The stack execution principle is important to know. When a function is called, a space is allocated on the stack to store its local variables: this space is called the function frame. When the function ends, its frame is freed and the stack goes back to the frame of the calling function.
Below is an illustration of the evolution of the stack during a function call. Two registers of the processor are indicated: the stack pointer (SP), which indicates the top of the stack, and the frame pointer (FP), which indicates the beginning of the frame of the current function. The frame contains all the information needed for the execution of the function, including room for local variables.
    
    
Figure: the stack (a) before the call, (b) during the call, and (c) after the call.
- Before the call, the frame pointer FP points to the frame of the calling function.
- During the call, the stack is increased (i.e. SP is decreased, as the stack is upside-down) to make room for the frame of the called function. This includes room for the local variables of the function, the parameters given to the function, the information for returning from the function (the return address in the code, because a given function can be called from many places in the code), room for the function result, as well as some bookkeeping information such as saved values of the processor registers.
- After the call, the frame of the called function has disappeared. Actually its content is still there, but it cannot be accessed anymore because the stack pointer SP has been put back to its location before the call.
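A recursive function gives a simple way to picture these three steps. In the sketch below (factorial is just an illustrative example), each call gets its own frame holding its own copy of n and partial, and each frame is released when the corresponding call returns.

```rust
// Each call to `factorial` has its own frame with its own `n` and `partial`.
fn factorial(n: u64) -> u64 {
    if n <= 1 {
        return 1;
    }
    let partial = factorial(n - 1); // a new frame is pushed for this call
    n * partial                     // the frame of the callee has been popped
}

fn main() {
    println!("5! = {}", factorial(5));
}
```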
Important to remember: the function variables whose size is known
at compile time are usually stored on the stack. The variables whose
size is only known during execution, such as the content of a String
or objects created by new, are usually stored on the heap.
[1] This is a major difference between Rust and other languages: references can never be null. Interestingly enough, the decision to allow null references was taken by Tony Hoare during the 60's; it is known as his "billion dollar mistake": https://news.ycombinator.com/item?id=12427069
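Where another language would return a null pointer, Rust code typically returns an Option, which forces the caller to handle the missing case. The function first_even below is a hypothetical example of this pattern.

```rust
// Returns a reference to the first even element, or None if there is none.
fn first_even(values: &[i32]) -> Option<&i32> {
    values.iter().find(|v| **v % 2 == 0)
}

fn main() {
    let data = [1, 4, 6];
    match first_even(&data) {
        Some(v) => println!("first even value: {}", v),
        None => println!("no even value"),
    }
}
```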
[2] It is important to know that the pointer used in a String has the
"Unique<T>" type, which forbids the object pointed to by this pointer
from having two owners at the same time. Hence the String type cannot
have copy semantics.