Barebones model of Spotify's 'Recently Played' screen using a Least Recently Used (LRU) cache in Golang
Table of Contents
One of the most used caching schemes is Least Recently Used (LRU). Caches that use this eviction strategy will remove the least recently used item when their size reaches their capacity. That means that such caches have to keep a track of the order of access of the cached items, so it would evict the correct items.
Before we go too much into details about the implementation of an LRU cache, let’s first see what are the possible applications of this caching scheme.
Applications #
LRU cache applications are diverse. The LRU cache eviction scheme is one of the most widespread and used scheme. Due to its nature, it often does the trick for conventional problems in commercial programming. For example, if one has a fixed amount of memory, the most natural way to cache is to keep recently accessed items.
So, what would be one potential application of a LRU cache? Let’s see an example.
Open your Spotify application (if you have one). Don’t have one? Here’s a screenshot of mine:
You can see in the screenshot above that I have chosen the “Recently played” view. To no one’s surprise, it shows all my recently played media. If youβve never thought about the implementation of such a screen, let me break the magic down for you. This view is a GUI that lives on top of some sort of an LRU cache.
Curious how we would model such a “Recently played” view? Let’s take a deeper look.
Establishing basic types #
Let’s define couple of types that would represent the Playlist
s and Song
s
found on Spotify:
type Song struct {
duration int64 // seconds
title string
artist *Artist
}
No surprises here - a Song
has a duration
, a title
and an artist
. Let’s
look at a Playlist
:
type Playlist struct {
title string
descrption string
duration int64 // seconds
publishedAt int32 // Unix timestamp
songs []*Song
}
A Playlist
has a title
, description
, duration
(which is the sum of the
durations of all songs in it). Also, a publishedAt
which is the date of the
publishing of the playlist. Finally it has songs
- a slice of Song
pointers
representing the songs in the playlist.
Such structure of the Playlist
struct
will allow users to find a playlist
and “just play it”.
Recently played #
The “Recently played” screen in Spotify has 30 items only, where each of the items can be a playlist, an artist or an album. For simplicity of our example here, we will agree that we will only keep track of the recently played playlists. This means that we’ll ignore the artists or albums in our version of the “Recently Played” screen.
A typical implementation of a LRU cache uses a combination of two simple data structures: a hash table and a linked list. Why? Well, as I mentioned in another article on caching, caches have to be fast along two dimensions:
- Ensuring that as many of the requests for files go to it (cache hit), not over the network or to main memory (cache miss);
- The overhead of using it should be small: testing membership and deciding
when to replace a
Playlist
should be as fast as possible.
On the first point, we have agreed that LRU will do the job because of the
nature of the “Recently played” screen. This means that we have to keep track
of what Playlist
s are recently played. A data structure that allows us to do
this is a linked list. Since Go already provides us a doubly-linked list
implementation, there is no need to reinvent the wheel. We will use the
container/list
package 1.
On the second point, the data structure that comes to mind for a scalable
membership tests is a hash table. 2 As you might know, hash tables have
unique keys for every value. If our program can create a unique key for a
Playlist
, it will be easy for us to check if the Playlist
is in the cache
or not.
Knowing this, let’s define the RecentlyPlayed
struct and its constructor
function:
type RecentlyPlayed struct {
capacity int
size int
cache map[int]*list.Element
lru *list.List
}
Things are pretty self explanatory here. capacity
will hold the maximum
capacity of the cache and size
will be the current size of the recently
played screen. We will use the cache
hash table to retrieve the elements from
cache. The lru
is a linked list (from the container/list
package) that we
will use to keep track of the recently played playlists.
Let’s first create a small function that will spawn up a new RecentlyPlayed
struct with some sane defaults:
func NewRecentlyPlayed(capacity int) *RecentlyPlayed {
rp := RecentlyPlayed{
capacity: capacity,
size: 0,
lru: list.New(),
cache: make(map[string]*list.Element),
}
return &rp
}
This function sets the defaults to the struct and returns a pointer to it. Before we continue, let’s make a quick stop. Let’s take a moment to understand how each of these two data structures will work together.
First, cache
is going to be a map
that will have a key of type string
and
a value of type *list.Element
. The string
will be a hash of the Playlist
that will be stored there, while the *list.Element
will be a pointer to the
element in the doubly-linked list. By having this pointer, we can get the
Playlist
stored inside it (it’s Value
). Also, it will be easy to reposition
it in the lru
list when we play the Playlist.
Introducing the player #
To have a “Recently played” list of playlists, we obviously need to put it in
some sort of a player, that will have a Play
function.
Here’s a quick sketch of it, so we can move to the meat of our implementation:
type Player struct {
playProgress int
RecentlyPlayedList *RecentlyPlayed
}
For the purpose of this example, we can keep the Player
quite simple - only
the progress of the song currently playing (playProgress
) and the recently
played list of playlists (RecentlyPlayedList
).
We will also add a small constructor function for the Player
:
func NewPlayer() *Player {
p := Player{
RecentlyPlayedList: NewRecentlyPlayed(30),
}
return &p
}
Now we need to add the Play
function and understand what are the functions
related to RecentlyPlayed
that Play
will have to invoke.
Playing a Playlist
#
Obviously, there’a already an entity that will have to play the Playlist
- the
Player
. Let’s add a function Play
to our Player
:
func (player *Player) Play(playlist *Playlist) {
if cached, playlist := player.RecentlyPlayedList.Get(playlist.hash()); cached {
// Play from cache...
} else {
// Fetch over the network and start playing...
player.RecentlyPlayedList.Set(playlist)
}
}
Let’s dissect the Play
function:
-
The
player.RecentlyPlayedList
struct is of type*RecentlyPlayed
. It has a functionGet
which will return abool
and a playlist which our player can then play. -
If it doesnβt find a cached playlist it will fetch/start streaming the playlist over the network. Then, it will cache it using the
Set
function of theRecentlyPlayedLists
struct.
So, what is the behaviour and the internals of these two new methods, Get
and
Set
?
Get
ting a Playlist
from cache #
To retrieve a Playlist
from cache is actually a cheap task, from time
perspective. Our cache is hash table-backed which has a O(1)
time complexity
for the access operation. The trick is once we access an item in cache
we
also have to move it to the beginning of the lru
list.
In other words, we need to promote it as the most recently used item in the
RecentlyPlayedList
. Given that we use a linked list-backed cache, we have
to take our item from the list and move it to the front.
Let’s see how that would work in our context:
func (rp *RecentlyPlayed) Get(key string) (*Playlist, bool) {
if elem, present := rp.cache[key]; present {
rp.lru.MoveToFront(elem)
return elem.Value.(*Playlist), true
} else {
return nil, false
}
}
In case you were expecting some magic here β sorry to disappoint, but this is pretty simple. Its a three step function:
- Checks if it finds the item in the cache. If not it returns a
nil
. - Otherwise, it moves the element to the beginning of the LRU list, and
- It returns the value of the
*list.Element
, which is the pointer to thePlaylist
that our player will play
That’s it, pretty simple.
Hash table key for a Playlist
#
Before we move on to Set
, let’s quickly discuss the implementation of the
hash
function.
func (playlist *Playlist) hash() string {
hash := sha1.New()
s := fmt.Sprintf(
"%d-%s-%s-%d",
playlist.duration,
playlist.title,
playlist.description,
playlist.publishedAt,
)
hash.Write([]byte(s))
sum := hash.Sum(nil)
return fmt.Sprintf("%x", sum)
}
As you can see, it’s a simple one - it’s goal is just to generate a
reproducible hash for a Playlist
. It basically concatenates the attributes of
the Playlist
struct and then applies a SHA1
hashing sum on it. As the last
step, it returns the hash in a hexidecimal format.
Set
ting a Playlist
to cache #
Now that we have the hash
function out of the way, let’s look at the code of
the Set
function. After, we can discuss the steps this function takes to add
a Playlist
to the cache:
func (rp *RecentlyPlayed) Set(playlist *Playlist) {
key := playlist.hash()
if elem, present := rp.cache[key]; present {
rp.lru.MoveToFront(elem)
} else {
elem := rp.lru.PushFront(playlist)
rp.size++
}
rp.cache[key] = elem
}
The Set function takes a *Playlist
as an argument. It is then hashed using
the hash
method that implemented by RecentlyPlayed
. It returns a unique key
based on some attributes of the Playlist
. The key is then used when the
Playlist
is added in the hash table caches
.
But before we add it to the hash table cache
, we will check if there’s
already a value in the cache
hash table with the same key
. If so, we will
only move the *Playlist
to the front of the lru
linked list.
If not, we push the *Playlist
to the front of the lru
linked list, as the
most recently used item. This returns a *list.Element
, which for its Value
expects an interface{}
(which is the Go way to say “any type”). The
*list.Element
will wrap the *Playlist
as its Value
.
This means that any time we access the elem.Value
we will have to cast it to
its proper type, since *list.Element
does not know the type of its Value
(remember, it accepts any type).
After the playlist is added to the front of the lru
list, the Set
function
will increment the RecentlyPlayed
’s size
due to the new item added to the
list.
Finally, the function will cache the *list.Element
in the cache
hash table
- which we will use to retrieve the playlist with a
O(1)
time complexity.
Eviction #
Now that we know how the Get
and Set
functions work, we need to take one
more thing into consideration. That is Spotifyβs limitation of the size of the
recently played screen. It means that once the number of playlists reaches a
threshold, it removes the least recently played playlist. This will make the
room to add a new one to the list.
This is the eviction algorithm that we have to write for our LRU cache, which
powers the recently played screen. In our implementation we’ll call the
function increment
:
func (rp *RecentlyPlayed) increment(element *list.Element) {
rp.lru.MoveToFront(element)
if rp.size == rp.capacity {
lruItem := rp.lru.Back()
rp.lru.Remove(lruItem)
rp.size--
}
}
Every time we want to increment the usage of a certain *list.Element
, the
function will take these two steps:
- Move the accessed element to be beginning of the
lru
linked list, and - When the
size
of theRecentlyPlayed
struct has reached itscapacity
, remove the last item in thelru
linked list
This function will allow us to handle the eviction of the least recently used
item in the list of playlists. Now, we can revisit our Set
and Get
functions and drop this function in:
func (rp *RecentlyPlayed) Set(playlist *Playlist) {
key := playlist.hash()
if elem, present := rp.cache[key]; present {
rp.increment(elem) // <- the change
} else {
elem := rp.lru.PushFront(playlist)
rp.size++
}
rp.cache[key] = elem
}
func (rp *RecentlyPlayed) Get(key string) (*Playlist, bool) {
if elem, present := rp.cache[key]; present {
rp.increment(elem) // <- the change
return elem.Value.(*Playlist), true
} else {
return nil, false
}
}
Now that we have the increment
function, we will use it in the Set
and
Get
functions. By doing this, we will update the list of recently used items
every time we play a Playlist
.
This change will unlock two things for our recently played screen:
- It will change the order of the cached playlists, based on the how recently the playlists are played
- It will remove any playlists once
size
exceedscapacity
In closing #
Now that we have a Go-powered sketch of Spotify’s recently played screen, let’s do a quick recap.
To shine some light on the shortcomings:
- This is a barebones model - it lacks any mechanisms to stream/download the playlists via the internet. We agreed that although interesting, this would be hard to cover in this article.
- The model does not take into account any restarting of the application. This means if we stop and start the program the cached playlists will be gone. This is because our implementation does not store the data on disk (only in memory).
- We do not have any graphical user interface to interact with the application - only a couple of functions that we can invoke.
While all the above is a shortcoming of our implementation, it still paints the picture of how we could write such a program using Go. The combination of a linked list & a hash table works nice for solving the problem at hand. From time complexity perspective, it scales well. And, for our tiny scenario, the hit-to-miss ratio should is optimal.
If you would like to read more of my rambling about caching algorithms, you can also read “When and Why to use a Least Frequently Used (LFU) cache with an implementation in Golang” 3.
-
Package that implements a doubly linked list: https://golang.org/pkg/container/list/ ↩︎
-
Or a bloom filter - good article on the topic: https://bart.degoe.de/bloom-filters-bit-arrays-recommendations-caches-bitcoin/ ↩︎
-
Article I published on this very blog: https://ieftimov.com/when-why-least-frequently-used-cache-implementation-golang ↩︎