References and Pointers

Passing large values by value can be slow, because all of their data has to be copied. Instead of copying the data, you can pass a reference to it. This approach is called passing by reference.

Passing a mutable reference additionally lets the receiving code change the data the reference points to.

There are two ways to pass by reference in Spawn: references and pointers. Regular code almost always uses references because they are safe and easy to use. Pointers are used primarily for working with external code (for example, with C libraries), as well as for writing high-performance low-level, but not always safe, code.
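The difference between copying a value and passing its address can be sketched in C, the language this article later uses for comparison. This is only an illustration of the general idea, not Spawn code; the `Big` type and the three functions are hypothetical names made up for this example.

```c
#include <stddef.h>

/* Hypothetical large type: 4096 ints, about 16 KiB where int is 4 bytes. */
#define BIG_LEN 4096

typedef struct {
    int values[BIG_LEN];
} Big;

/* By value: the entire struct is copied into the parameter `b`. */
long sum_by_value(Big b) {
    long total = 0;
    for (size_t i = 0; i < BIG_LEN; i++) {
        total += b.values[i];
    }
    return total;
}

/* By address: only a pointer is passed; `const` makes it read-only. */
long sum_by_address(const Big *b) {
    long total = 0;
    for (size_t i = 0; i < BIG_LEN; i++) {
        total += b->values[i];
    }
    return total;
}

/* Through a non-const pointer the callee can change the caller's data. */
void set_first(Big *b, int value) {
    b->values[0] = value;
}
```

Both sum functions return the same result; they differ only in how much data is moved on each call. `set_first` plays the role of a function that takes a mutable reference.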

References

References are a special data type that stores an address in memory where a value is stored, rather than the value itself.

Create a references.sp file for the code in this article.

Consider the following code:

a := 42

Here 42 is the data that is stored in the variable a. Using the & operator we can get the address in memory where the variable a is stored:

a := 42
b := &a

Now b is a reference to a. In fact, b now stores the address in memory where a is stored. Knowing the address, we can get the value located at this address using the * dereference operator:

a := 42
b := &a
c := *b
println(c) // 42

Mutable references

In the example above, b is a reference to a, and by default all references are immutable. This means that we cannot change the value referenced by b. To make a reference mutable, use the &mut operator:

a := 42
b := &mut a

If you try to compile this code, the compiler will throw an error:

error(E0144): cannot take mutable reference to immutable variable `a`
 --> references.sp:133:10:14
     |
 133 |     b := &mut a
     |          ^^^^ - this variable is immutable

help: consider changing variable `a` to be mutable
 --> references.sp:132:5:5
     |
 132 |     mut a := 42
     |     +++

In the code above, we tried to take a mutable reference to the immutable variable a, which is not allowed. To fix the error, you need to make the a variable mutable:

mut a := 42
b := &mut a

Now, using the * operator we already know, we can change the value of the a variable:

mut a := 42
b := &mut a
*b = 43
println(a) // 43

To write a value to the memory referenced by b, we use the * operator, which in this case appears to the left of the = assignment operator. Let's take this line of code apart piece by piece:

  * b = 43  // <- the value we write
//^ ^ ^
//| | |  since the dereference is to the left of the assignment operator,
//| | |  the value 43 is written to the memory referenced by `b`
//| |
//| the address stored in variable `b`, which is dereferenced
//|
//dereference operator

If you are familiar with pointers from languages like C or C++, the code shown above should be familiar to you. However, references in Spawn, due to their safety and the absence of pointer arithmetic (which we will talk about a little later), allow you to write this line more simply:

b = 43

Here we do not dereference b manually. The compiler knows that 43, or any other number, is not a valid memory address. Since the value referenced by b has type i32, the same type as 43, the compiler automatically dereferences b and writes the value 43 to memory.

When assigning one reference to another, the reference on the left is not dereferenced, since we are directly assigning the address from one reference to the other:

mut a := 42
mut b := &mut a
c := &mut a
b = c // now b and c point to the same memory address
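For readers who know C, the two kinds of assignment discussed above (writing a value through a reference versus copying an address from one reference to another) correspond to two visibly different statements. The sketch below is an analogy only; `rebind` and `copy_value` are hypothetical helper names, not anything from Spawn.

```c
/* Copies the address: afterwards *dst points at the same int as src. */
void rebind(int **dst, int *src) {
    *dst = src;
}

/* Copies the value: the int that dst points at is overwritten. */
void copy_value(int *dst, const int *src) {
    *dst = *src;
}
```

In C the programmer chooses between the two forms with an explicit `*`. According to the text above, Spawn makes the same choice from the type of the right-hand side.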

Let's practice! Create a function inc, which will increase the value of a variable by 1:

fn inc(a &mut i32) {
    a++
}

fn main() {
    mut a := 42
    inc(&mut a)
    println(a) // 43
}

Since we want to change the value, we must pass a mutable reference to the function. To do this, we use the &mut operator when calling the inc function. Now, inside the inc function, we can change the value of the variable a as if it were a variable that simply stores a number.

As a result, after calling the inc function, the value of the a variable will increase by 1 and the number 43 will be output.

Reference safety

If you come from languages like C or C++, then you know that working with pointers can be dangerous: the risks range from dereferencing null or invalid pointers after arithmetic operations to dangling pointers and memory leaks.

In Spawn, references are safe by default. A reference cannot store a null address, pointer arithmetic is prohibited, and you cannot get a reference except through the & or &mut operators. Thanks to escape analysis, a technique that analyzes whether a reference will live longer than the object it refers to, the compiler can ensure that the reference is not dangling.

The garbage collector ensures that there are no memory leaks.

Let's look at an example with a dangling pointer, which is so easy to get in C:

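#include <stdio.h>

int *get_int(void) {
    int a = 42;
    return &a; // returns the address of a local variable
}

int main(void) {
    int *b = get_int();
    printf("%d\n", *b); // undefined behavior: `a` no longer exists
    return 0;
}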

In this small example, the get_int function returns a pointer to the local variable a, which lives in the stack frame of get_int. Once the function returns, that frame is released and its memory can be reused for other data at any time, so dereferencing such a pointer may produce unexpected results. Such pointers are called dangling.

In Spawn this situation is impossible. Let's look at the same example, but in Spawn:

fn get_int() -> &i32 {
    a := 42
    return &a
}

fn main() {
    b := get_int()
    println(b)
}

Here, as in the C example, we return a reference to the local variable a. However, the compiler can determine that a must live longer than the call to get_int, so it allocates a on the heap rather than on the stack. Thanks to this, after get_int completes, a remains in a valid memory area that cannot be accidentally overwritten by other data.
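For contrast, here is a sketch of what a C programmer has to write by hand to get a comparable guarantee: the value is placed on the heap explicitly and must later be released explicitly. This is only an analogy for the behavior described above, not a description of how the Spawn compiler is implemented; the name `get_int_heap` is made up for this example.

```c
#include <stdlib.h>

/*
 * Allocates the int on the heap so that it outlives this call.
 * Returns NULL if the allocation fails.
 * The caller owns the result and must release it with free().
 */
int *get_int_heap(void) {
    int *a = malloc(sizeof *a);
    if (a == NULL) {
        return NULL;
    }
    *a = 42;
    return a;
}
```

A caller would check the result for `NULL`, use `*b`, and then call `free(b)`. In Spawn, as described above, the heap placement is decided by the compiler and the cleanup is left to the garbage collector.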

Taking a reference

As mentioned earlier, to get a reference to a variable, the & and &mut operators are used. However, we cannot take a reference to just any expression, only to one that has an address in memory.

For example, you cannot take a reference to a literal:

a := &42 // error

This code will not compile, because a literal does not have an address in memory and therefore cannot be referenced. To understand whether an expression can be referenced, consider whether the expression has a specific address in memory.

For example, variables definitely have an address in memory, but the expression a + 2 does not, since the result of that expression is not stored anywhere. The fields of structures also have an address in memory, since the structure itself has an address in memory, and the field is located at an offset relative to this address.

&a           // ok
&person.name // ok
&arr[0]      // ok
&(a + 2)     // error
&42          // error
&foo()       // error

In case you want to take the address of an expression that does not have an address, you can assign the expression to a variable and take the address from it:

fn main() {
    a := 42
    b := &a // ok
    println(b) // 42
}
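The same workaround applies to computed expressions such as a + 2, and C has the same restriction, so the pattern can be sketched there. The helper names `twice` and `twice_of_sum` are hypothetical and exist only for this illustration.

```c
/* Hypothetical helper that needs an address to read from. */
int twice(const int *p) {
    return *p * 2;
}

int twice_of_sum(int a) {
    /* `&(a + 2)` does not compile: the sum is a temporary with no address. */
    int sum = a + 2;    /* give the result a home in memory */
    return twice(&sum); /* now taking the address is fine */
}
```

Storing the result in a named variable gives it a location in memory, which is exactly what the address-of operator needs.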

Auto-dereference

As shown earlier, when assigning a value to referenced memory, the * operator can be omitted. In the absence of an explicit dereference operator, the compiler automatically inserts one into the code when necessary. This is called auto-dereference.

The main use of auto-dereference is to access struct fields/methods via reference:

struct Person {
    name string
}

fn main() {
    p := Person{ name: 'Bob' }
    b := &p
    println(b.name) // Bob
}

Here we do not dereference b explicitly to get the structure and then read its name field. The compiler automatically dereferences b and obtains the name field from the structure. Thanks to the reference safety described above, this auto-dereference is safe and cannot lead to invalid memory access.

Auto-dereferencing exists to make working with references as easy as possible. When using references, you almost never have to think about how they work under the hood, since operations such as assigning or reading a value need no explicit dereferencing. This way, you get all the advantages of C pointers without their disadvantages.
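To see what auto-dereference saves, compare field access through a pointer in C, where the dereference always has to be spelled out in one of two equivalent ways. The `Person` struct mirrors the Spawn example above; the two accessor functions are hypothetical names used only for this sketch.

```c
typedef struct {
    const char *name;
} Person;

/* Explicit dereference followed by field access. */
const char *name_via_star(const Person *p) {
    return (*p).name;
}

/* The arrow operator is shorthand for the form above. */
const char *name_via_arrow(const Person *p) {
    return p->name;
}
```

Both functions return the same pointer. In Spawn, as the text above explains, plain `b.name` covers this case and the compiler inserts the dereference.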

Nullable references

Null pointers are often used as a sign of missing data, but in Spawn references cannot be null. To indicate the absence of data, a special type Option is used, which we will look at in more detail in Chapter X.

For now, remember that this type is written as ?T, where T is the data that is stored inside Option. Thus, ?&i32 specifies a reference to i32 which can be none, a special value of type Option that denotes no data.

Let's create a simple singly linked list that will store integers:

struct Node {
    value i32
    next  ?&Node
}

fn main() {
    // head -> tail -> {none}
    mut head := Node{ value: 1 }
    mut tail := Node{ value: 2 }

    head.next = &tail
    tail.next = none
}

In this example, the next field stores a reference to the next element in the list. Since the last element of the list does not have a next element, we use the special value none to indicate the absence of data.

Now we can iterate over the list and output all its elements:

struct Node {
    value i32
    next  ?&Node
}

fn main() {
    // head -> tail -> {none}
    mut head := Node{ value: 1 }
    mut tail := Node{ value: 2 }

    head.next = &tail
    tail.next = none

    mut node := &head as ?&Node
    for node != none {
        println(node.value)
        node = node.next
    }
}

Here we use a for loop to traverse the list. In the first line, we create a temporary variable node that will store a reference to the current list element. Since there is no next value at the end of the list, node must be able to store none, so we explicitly cast a reference to Node to type ?&Node using the as operator.

Now in the loop, we simply output the current node and move on to the next one. When we reach the end of the list, node will store none and the loop will end.

Despite the seeming overhead of the Option type, ?&Node is represented in memory as a normal pointer (not yet implemented) and none is represented as a null pointer. Thus, on the Spawn side, this code is safe, and when compiled into machine code, it is as efficient as C code.

?&i32 in Spawn === i32* in C
none  in Spawn === NULL in C
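Following this correspondence, the list traversal from the Spawn example can be written in C with a plain pointer and a `NULL` check. This is a sketch for comparison only; `sum_list` is a hypothetical function that adds the values up instead of printing them, so the result is easy to check.

```c
#include <stddef.h>

typedef struct Node {
    int value;
    struct Node *next; /* NULL plays the role of `none` */
} Node;

/* Walks the list from `head` until `next` is NULL and sums the values. */
int sum_list(const Node *head) {
    int total = 0;
    for (const Node *node = head; node != NULL; node = node->next) {
        total += node->value;
    }
    return total;
}
```

For the two-element list from the example (values 1 and 2) the function returns 3, and for an empty list (`NULL`) it returns 0. The loop has the same shape as the Spawn loop, which is the point of the claim about efficiency above.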

Unlike pointers in C, ?&i32 cannot be dereferenced if it stores none, since before any reference can be dereferenced, it must be obtained from Option. If there is none there when accessing the data, the program will panic and exit with an error.

opt_ref := none as ?&i32
println(opt_ref.unwrap()) // panic: unwrap on a none value

If you read the example code and the paragraph above carefully, you might wonder: if node is ?&Node, how can we access the value field without explicitly getting the value out of Option? The point is that the compiler can figure out when an Option value is definitely not none, and in that case it lets you use the value implicitly, without calling the unwrap method. In the example above, the loop condition has already checked that node is not equal to none, so inside the body we can be sure that node holds a reference, since otherwise the loop would have ended.
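C offers a useful contrast here. The check-before-use pattern looks the same, but nothing in the language enforces it: removing the `if` below still compiles and simply misbehaves at run time when given `NULL`. The function `value_or` is a hypothetical helper written for this sketch.

```c
#include <stddef.h>

/* Returns *p when p is non-NULL, otherwise `fallback`. */
int value_or(const int *p, int fallback) {
    if (p != NULL) {
        /* Safe only because of the check above; C itself does not verify it. */
        return *p;
    }
    return fallback;
}
```

In Spawn, according to the paragraph above, the compiler tracks such a check itself, and that is what allows the value to be used without an explicit unwrap call.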

In the next article, we'll look at pointers, a more powerful but much less safe way of working with data.
