References and Pointers
When working with large data types, passing them by value can be slow because all of the data has to be copied. In such cases, instead of copying the data, you can pass a reference to it, that is, its address. This approach is called passing by reference.
Passing by mutable reference also lets you change the data the reference points to.
There are two ways to pass by reference in Spawn: references and pointers. Regular code almost always uses references because they are safe and easy to use. Pointers are used primarily for working with external code (for example, C libraries) and for writing high-performance, low-level code that is not always safe.
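This article often compares Spawn with C, so here is a C sketch of the cost difference described above. Passing a large struct by value copies every byte of it. Passing a pointer copies only an address. All names here (struct Big, sum_by_value, sum_by_pointer, fill) are hypothetical. A const pointer is only a rough stand-in for an immutable reference, and a plain pointer is only a rough stand-in for a mutable one.

```c
#include <stddef.h>

#define BIG_LEN 1024

/* Hypothetical large type: 1024 ints, about 4 KiB where int is 4 bytes. */
struct Big {
    int data[BIG_LEN];
};

/* By value: the caller's whole struct is copied into parameter b. */
long sum_by_value(struct Big b) {
    long total = 0;
    for (size_t i = 0; i < BIG_LEN; i++) {
        total += b.data[i];
    }
    return total;
}

/* By pointer, read-only: only an address is copied; const forbids writes. */
long sum_by_pointer(const struct Big *b) {
    long total = 0;
    for (size_t i = 0; i < BIG_LEN; i++) {
        total += b->data[i];
    }
    return total;
}

/* By pointer, writable: the caller's data is modified in place. */
void fill(struct Big *b, int value) {
    for (size_t i = 0; i < BIG_LEN; i++) {
        b->data[i] = value;
    }
}
```

Both sum functions return the same result. The pointer version avoids copying the struct on every call.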
References
A reference is a special data type that stores the memory address of a value rather than the value itself.
Create a references.sp file for the code in this article.
Consider the declaration a := 42.
Here 42 is the data stored in the variable a. Using the & operator, as in b := &a, we can get the address in memory where the variable a is stored.
Now b is a reference to a: b stores the memory address where a is stored. Knowing the address, we can read the value located there with the * dereference operator, so *b evaluates to 42.
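In C, the same two operators do the same two jobs. Below is a minimal C sketch built around a hypothetical helper, read_via. Here &a produces the address and *b fetches the value stored at it. A C pointer, unlike a Spawn reference, can be NULL, so the helper has to check for that explicitly.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: reads the int stored at the address held in b. */
int read_via(const int *b) {
    assert(b != NULL);   /* a C pointer may be NULL; a Spawn reference cannot */
    return *b;           /* * dereferences: fetch the value at that address */
}
```

A caller would write int a = 42; int *b = &a; and then read_via(b) yields 42.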
Mutable references
In the example above, b is a reference to a, and by default all references are immutable.
This means that we cannot change the value referenced by b. To make a reference mutable, use the &mut operator instead, as in b := &mut a.
If you try to compile this code while a is still declared as a := 42, the compiler reports an error:
error(E0144): cannot take mutable reference to immutable variable `a`
--> references.sp:133:10:14
|
133 | b := &mut a
| ^^^^ - this variable is immutable
help: consider changing variable `a` to be mutable
--> references.sp:132:5:5
|
132 | mut a := 42
| +++
In the code above, we tried to take a mutable reference to the immutable variable a, which is not allowed.
To fix the error, make the variable a mutable by declaring it as mut a := 42, as the compiler's hint suggests.
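Now, using the * operator we already know, we can change the value of the variable a through b with the line *b = 43.
To write a value into the memory referenced by b, we use the * operator, which in this case stands to the left of the = assignment operator. Taken piece by piece, the line reads as follows: b holds the address of a; *b denotes the memory at that address rather than the address itself; = 43 stores the number 43 in that memory. As a result, a now equals 43.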
If you know pointers from languages like C or C++, this code should look familiar. However, because Spawn references are safe and do not support pointer arithmetic (which we will talk about a little later), you can write the same line more simply as b = 43.
Here we do not dereference b manually. The compiler knows that 43, or any other number, is not a valid memory address. Since the value referenced by b has type i32, the same type as 43, the compiler automatically dereferences b and writes 43 to memory.
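C has no shortcut of this kind. Writing through a pointer always needs an explicit *. A C compiler diagnoses b = 43 as an attempt to turn an integer into a pointer. The closest C analogue of the immutable/mutable distinction is the const qualifier. Here is a C sketch using two hypothetical helpers, read_only and write_through.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: read-only access, roughly like an immutable reference.
   Writing *b = 43; in this function would not compile, because b is const int *. */
int read_only(const int *b) {
    assert(b != NULL);
    return *b;
}

/* Hypothetical helper: writable access, roughly like &mut. */
void write_through(int *b, int value) {
    assert(b != NULL);
    *b = value;   /* * on the left of =: store into the memory b points to */
}
```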
When one reference is assigned to another, the reference on the left is not dereferenced, since the address held by one reference is copied directly into the other:
mut a := 42
mut b := &mut a
c := &mut a
b = c // now b and c point to the same memory address
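Pointer assignment in C works the same way: only the address is copied. The C sketch below uses a hypothetical function, repoint_demo. In the Spawn snippet, b and c both refer to a. Here the two pointers start with different targets, so the effect of the copy is visible.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical demo: b = c copies the address held in c into b; nothing is dereferenced. */
int repoint_demo(int *x, int *y) {
    assert(x != NULL && y != NULL);
    int *b = x;   /* b points at the first int */
    int *c = y;   /* c points at the second int */
    b = c;        /* address copy: b now points where c points */
    *b = 7;       /* so this write changes *y, not *x */
    return *c;    /* reading through c sees the 7 written through b */
}
```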
Let's practice! Create a function inc that increases the value of a variable by 1:
Since we want to change the value, we must pass a mutable reference to the
function. To do this, we use the &mut operator when calling the inc
function. Now, inside the inc function, we can change the value of the
variable a as if it were a variable that simply stores a number.
As a result, after calling the inc function, the value of the a variable
will increase by 1 and the number 43 will be output.
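For comparison, a C version of inc might look like the sketch below. The caller still passes an address with &, as the Spawn call passes &mut a. C has no auto-dereference, though, so the function body must write *a explicitly.

```c
#include <assert.h>
#include <stddef.h>

/* C counterpart of inc (illustrative): increments the int the caller passed by address. */
void inc(int *a) {
    assert(a != NULL);
    *a += 1;   /* explicit dereference; Spawn lets you omit it */
}
```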
Reference safety
If you come from languages like C or C++, you know that working with pointers can be dangerous. The risks range from dereferencing null pointers or invalid pointers produced by pointer arithmetic to dangling pointers and memory leaks.
In Spawn, references are safe by default. A reference cannot store a null address, pointer arithmetic is prohibited, and the only way to get a reference is through the & or &mut operators. Escape analysis is a technique that determines whether a reference may outlive the object it refers to. Thanks to it, the compiler can ensure that no reference is left dangling: an object that would be outlived by a reference to it is allocated on the heap instead of the stack. The garbage collector ensures that there are no memory leaks.
Let's look at an example with a dangling pointer, which is so easy to create in C:
In this small example, the get_int function returns a pointer to the local variable a.
The variable a lives in the stack frame of get_int, and that frame is released when the function returns. We cannot guarantee that this memory location will not be reused for other data at some later point in the program, so dereferencing such a pointer may produce unexpected results. Such pointers are called dangling.
In Spawn, this situation is impossible. Let's look at the same example written in Spawn:
Here, as in the C example, we return a reference to the local variable a. However, the compiler determines that the variable a must live longer than the function get_int, and therefore allocates a on the heap rather than on the stack. As a result, after get_int returns, a remains in a valid memory area that cannot be accidentally overwritten by other data.
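Here is a C sketch that makes the contrast concrete. The dangling get_int described above appears only as a comment. The working version allocates a on the heap by hand. That is roughly what the Spawn compiler arranges automatically when escape analysis finds that a outlives the call. In C the caller must also free the memory, whereas in Spawn the garbage collector reclaims it.

```c
#include <stdlib.h>

/* The dangling version described in the text (comment only; do not use):
 *
 *   int *get_int(void) {
 *       int a = 42;
 *       return &a;   // a lives in get_int's stack frame, released on return
 *   }
 */

/* Manual fix (illustrative): put a on the heap so it outlives the call.
   The caller owns the result and must free() it. */
int *get_int(void) {
    int *a = malloc(sizeof *a);
    if (a == NULL) {
        return NULL;   /* allocation failed */
    }
    *a = 42;
    return a;
}
```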
Taking a reference
As mentioned earlier, the & and &mut operators are used to get a reference to a variable. However, we cannot take a reference to an arbitrary expression, only to expressions that have an address in memory.
For example, you cannot take a reference to a literal:
This code does not compile because a literal does not have an address in memory, so there is nothing to reference. To decide whether an expression can be referenced, ask whether it has a specific address in memory.
For example, variables always have an address in memory, but the expression a + 2 does not, since its result is not stored anywhere. Struct fields also have an address in memory: the struct itself has an address, and a field is located at a fixed offset relative to it.
If you need a reference to the result of an expression that has no address, assign the expression to a variable and take the variable's address:
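The same rules hold in C, which makes it a convenient way to illustrate them. The names struct Point, field_y, and sum_via_temp in the sketch below are hypothetical. A field's address is the struct's address plus the field's offset, and the test checks this with offsetof. Storing an expression's result in a temporary variable gives that result an address.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical struct used only for this illustration. */
struct Point {
    int x;
    int y;
};

/* Not valid C either, for the same reason as in Spawn (nothing to take the address of):
 *   int *p = &42;
 *   int *q = &(a + 2);
 */

/* A field has an address: the struct's address plus the field's offset. */
int *field_y(struct Point *pt) {
    assert(pt != NULL);
    return &pt->y;
}

/* Workaround from the text: store the expression in a variable, then take its address. */
int sum_via_temp(int a) {
    int tmp = a + 2;   /* tmp is an ordinary variable, so it has an address */
    int *r = &tmp;
    return *r;
}
```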
Auto-dereference
As shown earlier, when assigning a value to referenced memory, the * operator
can be omitted. In the absence of an explicit dereference operator, the compiler
automatically inserts one into the code when necessary. This is called
auto-dereference.
The main use of auto-dereference is accessing struct fields and methods through a reference:
Here we do not dereference b to get the structure and then read its name field. The compiler automatically dereferences b and reads the name field from the structure.
Thanks to the reference safety guarantees described above, auto-dereference is safe and cannot lead to invalid memory access.
Auto-dereference exists to make working with references as easy as possible. With references you almost never have to think about how they work under the hood, since assigning or reading a value happens without explicit dereferencing. This gives you the main advantages of C pointers without their main dangers.
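C shows what auto-dereference saves you from writing. In the C sketch below, the programmer always writes the dereference, either as (*b).name or with the -> shorthand. Only the explicit check keeps b from being NULL. The struct User type and both helpers are hypothetical stand-ins for the article's example with a name field.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical struct standing in for the article's type with a name field. */
struct User {
    const char *name;
};

/* Hypothetical helper: dereference spelled out in full. */
const char *name_explicit(const struct User *b) {
    assert(b != NULL);
    return (*b).name;   /* dereference b, then select the field */
}

/* Hypothetical helper: same operation with C's shorthand. */
const char *name_arrow(const struct User *b) {
    assert(b != NULL);
    return b->name;     /* b->name is shorthand for (*b).name */
}
```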
Nullable references
Null pointers are often used to signal missing data, but Spawn references cannot be null. To indicate the absence of data, Spawn uses a special type, Option, which we will look at in more detail in Chapter X.
For now, remember that this type is written as ?T, where T is the type of the value stored inside the Option. Thus, ?&i32 denotes a reference to an i32 that may instead be none, the special Option value that denotes the absence of data.
Let's create a simple singly linked list that will store integers:
In this example, the next field stores a reference to the next element in the
list. Since the last element of the list does not have a next element, we use
the special value none to indicate the absence of data.
Now we can iterate over the list and output all its elements:
Here we use a for loop to traverse the list. In the first line, we create a
temporary variable node that will store a reference to the current list
element. Since there is no next value at the end of the list, node must be
able to store none, so we explicitly cast a reference to Node to
type ?&Node using the as operator.
Now in the loop, we simply output the current node and move on to the next one.
When we reach the end of the list, node will store none and the loop will
end.
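For comparison, here is the same list in C. There, next is a plain pointer and NULL plays the role of none. In this sketch struct Node mirrors the article's list and print_list is a hypothetical name. The node != NULL test is entirely the programmer's responsibility in C. Code that forgets it still compiles, and dereferencing NULL at run time is undefined behavior.

```c
#include <stddef.h>
#include <stdio.h>

/* C counterpart of the list (illustrative): NULL marks the end, like none. */
struct Node {
    int value;
    struct Node *next;
};

/* Hypothetical helper: prints every value and returns how many nodes were visited. */
int print_list(const struct Node *head) {
    int count = 0;
    for (const struct Node *node = head; node != NULL; node = node->next) {
        printf("%d\n", node->value);
        count++;
    }
    return count;
}
```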
Despite the apparent overhead of the Option type, ?&Node is meant to be represented in memory as an ordinary pointer, with none represented as a null pointer (this optimization is not yet implemented). Thus the code is safe on the Spawn side, and once compiled to machine code it can be as efficient as the equivalent C code.
Unlike a pointer in C, a ?&i32 cannot be dereferenced while it holds none: before a reference can be dereferenced, it must first be extracted from the Option. If the Option holds none at that point, the program panics and exits with an error.
If you read the example code and the paragraph above carefully, you might wonder: if node is ?&Node, how can we access the value field without explicitly extracting the reference from the Option? The answer is that the compiler can figure out when an Option is definitely not none. In that case it lets you use the value implicitly, without calling the unwrap method. In the example above, the loop condition has already checked that node is not none. Inside the loop body, node is therefore guaranteed not to be none, since otherwise the loop would have ended.
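In C, the check that Spawn performs when extracting a reference from an Option has to be written by hand. Below is a sketch of that behavior using a hypothetical helper, unwrap_node. It terminates the program on a NULL pointer instead of dereferencing it. The panic message is invented for illustration and is not Spawn's actual output.

```c
#include <stdio.h>
#include <stdlib.h>

struct Node {
    int value;
    struct Node *next;
};

/* Hypothetical helper imitating an unwrap: abort on NULL ("none"), otherwise return the pointer.
   The message below is illustrative only. */
const struct Node *unwrap_node(const struct Node *node) {
    if (node == NULL) {
        fprintf(stderr, "panic: unwrap of none\n");
        abort();
    }
    return node;
}
```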
In the next article, we'll look at pointers, a more powerful but much less safe way of working with data.