Data structures in Go: Linked lists
Table of Contents
Data structures and algorithms are the bread and butter of computer science. Although sometimes they appear scary to people, most of them have a simple explanation. Also, when explained well with a problem algorithms can be interesting and fun to learn and apply.
This post is aimed at people that are not comfortable with linked lists, or folks that want to see and learn how to build one with Golang. We will see how we can implement them with Go via a (somewhat) practical example, instead of the plain theory and code examples.
But first, let’s touch on a bit of theory.
Linked lists #
Linked lists are one of the simpler data structures out there. Wikipedia’s article on linked lists states that:
In computer science, a linked list is a linear collection of data elements, in which linear order is not given by their physical placement in memory. Instead, each element points to the next. It is a data structure consisting of a group of nodes which together represent a sequence. Under the simplest form, each node is composed of data and a reference (in other words, a link) to the next node in the sequence.
While all this might look like too much or confusing, let’s break it down. A linear data structure is the one where it’s elements form a sequence of some sort. Simple as that. Now, why the physical placement in memory doesn’t matter? Well, when you have arrays the amount of memory the array takes is fixed, in the sense that if you have an array of 5 items, the language will grab only 5 memory addresses in the memory, one after another. Because these addresses create a sequence, the array knows in what memory range its values will be stored and thus the physical placement in-memory of these values create a sequence.
With linked lists, it’s a bit different. In the definition you will notice that “each element points to the next”, using “data and a reference (in other words, a link) to the next node”. This means that each node of the linked list stores two things: a value and a reference to the next node in the list. Simple as that.
Streams of data #
Everything that humans perceive around them is some sort of information or rather data that our senses and our minds know how to process and convert it into a useful piece of information. Whether we look, or smell, or touch we process data and find meaning from that data. Or, when we browse our social media networks we always resort to a feed of data, ordered chronologically and with no end in sight.
So, how can we use linked lists to model such a news feed? Let’s first quickly take a glance at a simple Tweet for example:
just setting up my twttr
— jack (@jack) March 21, 2006
For the purposes of our example social network, let’s get inspired by Twitter
and create a Post type, that has a body, a publishDate and a link to the
next post:
type Post struct {
body string
publishDate int64 // Unix timestamp
next *Post // link to the next Post
}Next, how can we model a feed of posts? Well, if we know that feeds consist of posts that are linked to another post, then we can try to create a type like that:
type Feed struct {
length int // we'll use it later
start *Post
}The Feed struct will have a beginning (or start) which will point to the
first Post in the feed and a length property which will store the size of
the Feed at any given moment.
So, let’s say we want to create a Feed with two posts, the first step is to
create an Append function on the Feed type:
func (f *Feed) Append(newPost *Post) {
if f.length == 0 {
f.start = newPost
} else {
currentPost := f.start
for currentPost.next != nil {
currentPost = currentPost.next
}
currentPost.next = newPost
}
f.length++
}Then we can invoke it, twice:
func main() {
f := &Feed{}
p1 := Post{
body: "Lorem ipsum",
}
f.Append(&p1)
fmt.Printf("Length: %v\n", f.length)
fmt.Printf("First: %v\n", f.start)
p2 := Post{
body: "Dolor sit amet",
}
f.Append(&p2)
fmt.Printf("Length: %v\n", f.length)
fmt.Printf("First: %v\n", f.start)
fmt.Printf("Second: %v\n", f.start.next)
}So, what does this code do? First, the main function - it creates a pointer
to a Feed struct, two Post structs with some dummy content and it invokes
twice the Append function on the Feed, which results in it having a length
of 2. We inspect the two values that the Feed has by accessing the start
of the Feed (which is, in fact, a Post) and the next item after the
start, which is the second Post.
When we run the program, the output will look something like:
Length: 1
First: &{Lorem ipsum 1257894000 <nil>}
Length: 2
First: &{Lorem ipsum 1257894000 0x10444280}
Second: &{Dolor sit amet 1257894000 <nil>}You can see, after we add the first Post to the Feed, the length is 1 and
the first Post has a body and a publishte (as a Unix timestamp), while
it’s next value is nil (because there’s no more Posts in the Feed).
Then, we add the second Post to the Feed and when we inspect the two
Posts we see that the first one has the same content as before, but with a
pointer to the next Post in the list. The second also has a body and a
publishDate but with no pointer to the next Post in the list. Also, the
length of the Feed increases as we add more Posts to it.
Let’s go back to the Append function now and deconstruct it so we understand
better how to work with linked lists. First, the function creates a pointer to
a Post value, with the body argument as body of the Post and the
publishDate is set to the Unix timestamp representation of the current time.
Then, we check if the length of the Feed is 0 - this means that it has no
Posts and the first one that is added should be set as the starting Post,
conveniently named start.
But, if the length of the Feed is more than 0, then our algorithm takes a
different turn. It will start with the start of the Feed and it will walk
through all the Posts until it finds one that doesn’t have a pointer to a
next one. Then, it will attach the new Post as the next value on the last
Post of the list.
Optimising Append #
Imagine we have a user that scrolls through their Feed, like on any other
social network. Since posts are chronologically ordered, based on the
publishDate, the Feed will grow more and more as the user scrolls through
it and more Posts are attached to the Feed. Given the approach, we took in
the Append function, as the Feed gets longer and longer the Append
function will become more and more expensive. Why? Because we have to traverse
the whole Feed to add a Post at the end of it. If you have heard about
Big-O notation, this algorithm has an O(n) time complexity, which means
that it will always traverse the whole Feed before it adds a Post to it.
As you can imagine, this can be quite inefficient, especially if the Feed
grows to be quite long. How can we improve the Append function and decrease
the asymptotic
complexity
of it?
See, since our Feed data structure is just a list of Posts, to traverse it
we have to know the beginning of the list (called start) that’s a pointer of
type Post. Because in our example Append always adds a Post to the end of
the Feed, we could drastically improve the performance of the algorithm if
Feed knows not just of its starting element, but also about it’s ending
element. Of course, there’s always a tradeoff with optimisations, and the
tradeoff here is that the data structure will consume a bit more memory (for
the new attribute on the Feed structure).
Extending our Feed data structure is quite easy:
type Feed struct {
length int
start *Post
end *Post
}But our Append algorithm will have to be tweaked to work with the new
structure of a Feed. This is the version of Append using the end
attribute of Post:
func (f *Feed) Append(newPost *Post) {
if f.length == 0 {
f.start = newPost
f.end = newPost
} else {
lastPost := f.end
lastPost.next = newPost
f.end = newPost
}
f.length++
}That looks a bit simpler, doesn’t it? Let me give you some good news:
- The code is simpler and shorter now, and
- We drastically improved the time complexity of the function
If you look now at the algorithm, it does two things: if the Feed is empty it
will set the new Post as the beginning and the end of the Feed, otherwise
it will set the new Post as the end item and it will attach it to the
previous Post in the list. On top of how simple it is, this will algorithm
now has a big-O complexity of O(1), also known as “constant time”. That means
that Append will perform the same regardless of the length of the Feed
structure.
Pretty simple, right? But let’s imagine that the Feed is actually a list of
Posts on our profile. Hence they are ours, we should be able to delete them.
I mean, what kind of a social network doesn’t allow it’s users to (at least)
delete their posts?
Removing a Post #
As we established in the previous section, we want the users of our Feeds to
be able to delete their posts. So, how can we model that? If our Feeds were
an array, we would just remove the item and be done with it, right?
Well, this is actually where linked lists shine. When arrays sizes change, the runtime has to capture a new memory block to store the items of the array in. Linked lists due to their design, each item having a pointer to the next node in the list, can be scattered throughout the memory space meaning adding/removing nodes from the list is cheap from a space perspective. When one wants to remove a node from a linked list only the neighbours of the removed node need to be linked, and that is. Garbage collected languages (like Go) make this even easier since we don’t have to worry about releasing the allocated memory - the GC will kick in and remove all unused objects.
To make our lives a bit easier for this example, let’s put a constraint that
each of the Posts on a Feedwill have a unique publishDate.This means a
publisher can create one Post per second on their Feed. Taking that into
effect, this is how we can easily remove a Post from a Feed:
func (f *Feed) Remove(publishDate int64) {
if f.length == 0 {
panic(errors.New("Feed is empty"))
}
var previousPost *Post
currentPost := f.start
for currentPost.publishDate != publishDate {
if currentPost.next == nil {
panic(errors.New("No such Post found."))
}
previousPost = currentPost
currentPost = currentPost.next
}
previousPost.next = currentPost.next
f.length--
}The Remove function will take a publishDate of a Post as an argument by
which it will detect what Post needs to be deleted (or unlinked). The
function is rather small. If it detects that the start item of the Feed is
to be removed it will just reassign the start of the Feed with the second
Post in the Feed. Otherwise, it jumps through each of the Posts in the
Feed until it runs into a Post that has a matching publishDate to the one
passed as the function argument. When it finds one, it will just connect the
previous and the next Post in the Feed with each other, effectively
dropping the middle (matching) one from the Feed.
There’s one edge case that we need to make sure that we cover in our Remove
function - what if the Feed doesn’t have a Post with the specified
publishDate? To keep it simple, the function checks for the absence of the
next Post in the Feed before it jumps to it. If the next is nil the
function panics, telling us that it couldn’t find a Post with such
publishDate.
Inserting a Post #
Now that we got appending and removing out of the way, let’s take a look at (a
bit of) a hypothetical case. Imagine that the source that produces the Posts
sends them to our application in a non-chronological order. This means that the
Posts need to be put in the correct order in the Feed, based on the
publishDate. This is how that would look like:
func (f *Feed) Insert(newPost *Post) {
if f.length == 0 {
f.start = newPost
} else {
var previousPost *Post
currentPost := f.start
for currentPost.publishDate < newPost.publishDate {
previousPost = currentPost
currentPost = previousPost.next
}
previousPost.next = newPost
newPost.next = currentPost
}
f.length++
}In essence, this is a very similar algorithm to the one in the Remove
function, because although both of them do a very different thing (adding v.s.
removing a Post in the Feed), they are both based on a search algorithm.
That means that both of the functions actually traverse the whole Feed,
searching for a Post that matches the publishDate with the one received
in the argument of the function. The only difference is that Insert will
actually put the new Post in the place where the dates match, while Remove
will remove the Post from the Feed.
Additionally, this means that both of these functions carry the same time
complexity, which is O(n). This means that in a worst-case scenario, the
functions will have to traverse the whole Feed to get to the item where the
new post needs to be inserted (or the removed).
What if we used arrays? #
If you are asking that yourself, let me say right up front - you have a point.
True, we could store all of the Posts in an array (or a slice in Go), easily
push items onto it and also even have random access with an O(1) complexity.
Due to the nature of arrays, whose values have to be stored in memory one right after another, reading is really fast and cheap. Once you have something stored in an array, retrieving it as easy as accessing it by its 0-based index. When it comes to inserting an item, whether in the middle or at the end, then arrays become less efficient compared to lists. That is because if the array doesn’t have more memory reserved for the new item(s), it will have to reserve it and use it. But, if the next memory addresses are not free, it will have to “move” to a new memory address where there would be space for all of the items in it (new and old).
Looking at all of the examples and discussion we had so far, we can create a table with time complexity for each of the algorithms we created, and compare them with the same algorithms for arrays:
╔═══════════╦═════════╦═════════════╗
║ Action ║ Array ║ Linked list ║
╠═══════════╬═════════╬═════════════╣
║ Access ║ O(1) ║ O(n) ║
║ Search ║ O(n) ║ O(n) ║
║ Prepend ║ O(1) ║ O(1) ║
║ Append ║ O(n) ║ O(1) ║
║ Delete ║ O(n) ║ O(n) ║
╚═══════════╩═════════╩═════════════╝As you can see, when you are faced with a certain problem, picking the correct
data structure can really make or break the products you create. For
ever-growing Feeds, where insertion of Posts is paramount, linked lists do
a much better job because insertions are really cheap. But, if we had a
different problem on our hands that requires frequent deletions or lots of
retrieval/searching, then we would have to pick the correct data structure for
the problem we are dealing with.
You can see the whole implementation of the Feed and play with it
here. Also, Go has it’s own linked
list implementation, with some nice functions already built in. You can see
it’s documentation here.
EDIT 20.02.2018 10:00 CET: Previous verion of the article wrongly stated that
the big-O complexity of the delete function is O(1). Changed to O(n) after
pointed out by Bart de Goede.
EDIT 22.02.2018 23:00 CET: Previous verion of the article wrongly stated that
the big-O complexity of searching in an array is O(1) instead of O(N). This
was a mixup of Access and Search on my side. This is fixed by adding
separate rows in the table for Search and Access and their respective time
complexities.
EDIT 25.02.2018 17:00 CET: Previous version of the article had a buggy
implementation of the Insert function, which was pointed out by Albert
Shirima in the comments.