One of the most used caching schemes is Least Recently Used (LRU). Caches that use this eviction strategy remove the least recently used item once their size reaches their capacity. That means such caches have to keep track of the order in which the cached items were accessed, so that they evict the correct items.

Before we go into too much detail about the implementation of an LRU cache, let’s first look at the possible applications of this caching scheme.

Applications

LRU cache applications are diverse. The LRU eviction scheme is one of the most widespread and widely used. Due to its nature, it often does the trick for conventional problems in commercial programming. For example, if one has a fixed amount of memory, the most natural way to cache is to keep the most recently accessed items.

So, what would be one potential application of an LRU cache? Let’s see an example.

Open your Spotify application (if you have one). Don’t have one? Here’s a screenshot of mine:

Let's keep my music taste out of the discussion, please.

You can see in the screenshot above that I have chosen the “Recently played” view. To no one’s surprise, it shows all my recently played media. If you’ve never thought about the implementation of such a screen, let me break the magic down for you. This view is a GUI that lives on top of some sort of LRU cache.

Curious how we would model such a “Recently played” view? Let’s take a deeper look.

Establishing basic types

Let’s define a couple of types that will represent the Playlists and Songs found on Spotify:

type Song struct {
	duration  int64 // seconds
	title     string
	artist    *Artist
}

No surprises here - a Song has a duration, a title and an artist. Let’s look at a Playlist:

type Playlist struct {
	title       string
	description string
	duration    int64 // seconds
	publishedAt int32 // Unix timestamp
	songs       []*Song
}

A Playlist has a title, a description and a duration (the sum of the durations of all songs in it). It also has publishedAt, the Unix timestamp at which the playlist was published. Finally, it has songs - a slice of Song pointers representing the songs in the playlist.

This structure of the Playlist struct will allow users to find a playlist and “just play it”.

Recently played

The “Recently played” screen in Spotify holds only 30 items, where each item can be a playlist, an artist or an album. To keep our example simple, we will only keep track of the recently played playlists. This means that we’ll ignore artists and albums in our version of the “Recently played” screen.

A typical implementation of a LRU cache uses a combination of two simple data structures: a hash table and a linked list. Why? Well, as I mentioned in another article on caching, caches have to be fast along two dimensions:

  1. Ensuring that as many requests as possible are served from it (cache hits), rather than going over the network or to main memory (cache misses);
  2. The overhead of using it should be small: testing membership and deciding when to replace a Playlist should be as fast as possible.

On the first point, we have agreed that LRU will do the job because of the nature of the “Recently played” screen. This means that we have to keep track of which Playlists were recently played. A data structure that allows us to do this is a linked list. Since Go already provides a doubly linked list implementation, there is no need to reinvent the wheel. We will use the container/list package 1.

On the second point, the data structure that comes to mind for scalable membership tests is a hash table. 2 As you might know, hash tables have a unique key for every value. If our program can create a unique key for a Playlist, it will be easy to check whether the Playlist is in the cache or not.

Knowing this, let’s define the RecentlyPlayed struct and its constructor function:

type RecentlyPlayed struct {
	capacity int
	size     int
	cache    map[string]*list.Element
	lru      *list.List
}

Things are pretty self-explanatory here. capacity holds the maximum capacity of the cache, and size is the current size of the recently played list. We will use the cache hash table to retrieve elements from the cache. The lru is a linked list (from the container/list package) that we will use to keep track of the recently played playlists.

Let’s first create a small function that will spawn up a new RecentlyPlayed struct with some sane defaults:

func NewRecentlyPlayed(capacity int) *RecentlyPlayed {
	rp := RecentlyPlayed{
		capacity: capacity,
		size:     0,
		lru:      list.New(),
		cache:    make(map[string]*list.Element),
	}

	return &rp
}

This function sets the defaults to the struct and returns a pointer to it. Before we continue, let’s make a quick stop. Let’s take a moment to understand how each of these two data structures will work together.

First, cache is going to be a map with keys of type string and values of type *list.Element. The string will be a hash of the Playlist stored there, while the *list.Element will be a pointer to the element in the doubly linked list. By having this pointer, we can get the Playlist stored inside it (its Value). It will also be easy to reposition the element in the lru list when we play the Playlist.

Introducing the player

To have a “Recently played” list of playlists, we obviously need some sort of player that has a Play function.

Here’s a quick sketch of it, so we can move to the meat of our implementation:

type Player struct {
	playProgress       int
	RecentlyPlayedList *RecentlyPlayed
}

For the purpose of this example, we can keep the Player quite simple - only the progress of the song currently playing (playProgress) and the recently played list of playlists (RecentlyPlayedList).

We will also add a small constructor function for the Player:

func NewPlayer() *Player {
	p := Player{
		RecentlyPlayedList: NewRecentlyPlayed(30),
	}
	return &p
}

Now we need to add the Play function and understand which functions of RecentlyPlayed Play will have to invoke.

Playing a Playlist

Obviously, there’s already an entity that will have to play the Playlist - the Player. Let’s add a Play function to our Player:

func (player *Player) Play(playlist *Playlist) {
	if cached, found := player.RecentlyPlayedList.Get(playlist.hash()); found {
		// Play the cached playlist...
		_ = cached
	} else {
		// Fetch over the network and start playing...
		player.RecentlyPlayedList.Set(playlist)
	}
}
Before we continue...

If you thought that I’d show you how we’ll actually play a media file or stream it over the network I am sorry to disappoint. That’s a tad out of the scope of this article.

That being said, I am always looking for topic ideas to write on, although I have a huge list already, so if that’s something you’d like to read on drop a comment below.

Let’s dissect the Play function:

  1. The player.RecentlyPlayedList struct is of type *RecentlyPlayed. It has a Get function which returns the cached playlist (if any) and a bool indicating whether it was found, so our player can play it.

  2. If it doesn’t find a cached playlist, it will fetch and start streaming the playlist over the network. Then, it will cache it using the Set function of the RecentlyPlayed struct.

So, what is the behaviour and the internals of these two new methods, Get and Set?

Getting a Playlist from cache

Retrieving a Playlist from the cache is actually a cheap task, time-wise. Our cache is hash table-backed, which gives us O(1) time complexity for the access operation. The trick is that once we access an item in the cache, we also have to move it to the beginning of the lru list.

In other words, we need to promote it as the most recently used item in the RecentlyPlayedList. Given that we use a linked list-backed cache, we have to take our item from the list and move it to the front.

Let’s see how that would work in our context:

func (rp *RecentlyPlayed) Get(key string) (*Playlist, bool) {
	if elem, present := rp.cache[key]; present {
		rp.lru.MoveToFront(elem)
		return elem.Value.(*Playlist), true
	} else {
		return nil, false
	}
}

In case you were expecting some magic here – sorry to disappoint, but this is pretty simple. It’s a three-step function:

  1. It checks whether the item is in the cache. If not, it returns nil.
  2. Otherwise, it moves the element to the beginning of the LRU list, and
  3. It returns the value of the *list.Element, which is the pointer to the Playlist that our player will play.

That’s it, pretty simple.

Hash table key for a Playlist

Before we move on to Set, let’s quickly discuss the implementation of the hash function.

func (playlist *Playlist) hash() string {
	hash := sha1.New()
	s := fmt.Sprintf(
		"%d-%s-%s-%d",
		playlist.duration,
		playlist.title,
		playlist.description,
		playlist.publishedAt,
	)
	hash.Write([]byte(s))
	sum := hash.Sum(nil)
	return fmt.Sprintf("%x", sum)
}

As you can see, it’s a simple one - its goal is just to generate a reproducible hash for a Playlist. It concatenates the attributes of the Playlist struct and then applies a SHA-1 hash to the result. As the last step, it returns the hash in hexadecimal format.

Setting a Playlist to cache

Now that we have the hash function out of the way, let’s look at the code of the Set function. After, we can discuss the steps this function takes to add a Playlist to the cache:

func (rp *RecentlyPlayed) Set(playlist *Playlist) {
	key := playlist.hash()
	if elem, present := rp.cache[key]; present {
		rp.lru.MoveToFront(elem)
		return
	}

	elem := rp.lru.PushFront(playlist)
	rp.size++
	rp.cache[key] = elem
}

The Set function takes a *Playlist as an argument. The playlist is then hashed using the hash method implemented on Playlist, which returns a unique key based on some of the Playlist’s attributes. The key is then used when the Playlist is added to the cache hash table.

But before we add it to the hash table cache, we will check if there’s already a value in the cache hash table with the same key. If so, we will only move the *Playlist to the front of the lru linked list.

If not, we push the *Playlist to the front of the lru linked list, as the most recently used item. This returns a *list.Element, which for its Value expects an interface{} (which is the Go way to say “any type”). The *list.Element will wrap the *Playlist as its Value.

This means that any time we access the elem.Value we will have to cast it to its proper type, since *list.Element does not know the type of its Value (remember, it accepts any type).

After the playlist is added to the front of the lru list, the Set function will increment the RecentlyPlayed’s size due to the new item added to the list.

Finally, the function will cache the *list.Element in the cache hash table - which we will use to retrieve the playlist with O(1) time complexity.

Eviction

Now that we know how the Get and Set functions work, we need to take one more thing into consideration. That is Spotify’s limit on the size of the recently played screen: once the number of playlists exceeds the threshold, the least recently played playlist is removed. This makes room to add a new one to the list.

This is the eviction algorithm that we have to write for our LRU cache, which powers the recently played screen. In our implementation we’ll call the function increment:

func (rp *RecentlyPlayed) increment(element *list.Element) {
	rp.lru.MoveToFront(element)
	if rp.size > rp.capacity {
		lruItem := rp.lru.Back()
		// Evict the least recently played playlist from the list
		// and drop its entry from the cache hash table, so the
		// map doesn't keep pointing at a removed element.
		delete(rp.cache, lruItem.Value.(*Playlist).hash())
		rp.lru.Remove(lruItem)
		rp.size--
	}
}

Every time we want to increment the usage of a certain *list.Element, the function will take these two steps:

  1. Move the accessed element to the beginning of the lru linked list, and
  2. When the size of the RecentlyPlayed struct exceeds its capacity, remove the last item in the lru linked list (together with its entry in the cache hash table)

This function will allow us to handle the eviction of the least recently used item in the list of playlists. Now, we can revisit our Set and Get functions and drop this function in:

func (rp *RecentlyPlayed) Set(playlist *Playlist) {
	key := playlist.hash()
	if elem, present := rp.cache[key]; present {
		rp.increment(elem) // <- the change
		return
	}

	elem := rp.lru.PushFront(playlist)
	rp.size++
	rp.cache[key] = elem
	rp.increment(elem) // <- evicts the least recently played playlist, if needed
}

func (rp *RecentlyPlayed) Get(key string) (*Playlist, bool) {
	if elem, present := rp.cache[key]; present {
		rp.increment(elem) // <- the change
		return elem.Value.(*Playlist), true
	} else {
		return nil, false
	}
}

Now that we have the increment function, we will use it in the Set and Get functions. By doing this, we will update the list of recently used items every time we play a Playlist.

This change will unlock two things for our recently played screen:

  1. It will change the order of the cached playlists, based on how recently the playlists were played, and
  2. It will remove the least recently played playlist once size exceeds capacity.
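To see the whole scheme in action, here’s a condensed, self-contained variant of the cache above - keyed by plain strings instead of Playlist hashes, and with a capacity of 2 so eviction kicks in quickly:

```go
package main

import (
	"container/list"
	"fmt"
)

type lruCache struct {
	capacity int
	size     int
	cache    map[string]*list.Element
	lru      *list.List
}

func newLRUCache(capacity int) *lruCache {
	return &lruCache{
		capacity: capacity,
		cache:    make(map[string]*list.Element),
		lru:      list.New(),
	}
}

func (c *lruCache) Set(key string) {
	if elem, present := c.cache[key]; present {
		c.lru.MoveToFront(elem)
		return
	}
	c.cache[key] = c.lru.PushFront(key)
	c.size++
	if c.size > c.capacity {
		// Evict the least recently played item from both structures.
		back := c.lru.Back()
		delete(c.cache, back.Value.(string))
		c.lru.Remove(back)
		c.size--
	}
}

func main() {
	c := newLRUCache(2)
	c.Set("Jazz Vibes")
	c.Set("Lo-Fi Beats")
	c.Set("Jazz Vibes")    // already cached: promoted to the front
	c.Set("Rock Classics") // evicts "Lo-Fi Beats"

	// Print the list from most to least recently played.
	for e := c.lru.Front(); e != nil; e = e.Next() {
		fmt.Println(e.Value)
	}
	// Output:
	// Rock Classics
	// Jazz Vibes
}
```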

In closing

Now that we have a Go-powered sketch of Spotify’s recently played screen, let’s do a quick recap.

To shine some light on the shortcomings:

  1. This is a barebones model - it lacks any mechanisms to stream/download the playlists via the internet. We agreed that although interesting, this would be hard to cover in this article.
  2. The model does not take into account any restarting of the application. This means that if we stop and start the program, the cached playlists will be gone, because our implementation stores the data only in memory, not on disk.
  3. We do not have any graphical user interface to interact with the application - only a couple of functions that we can invoke.

While all of the above are shortcomings of our implementation, it still paints a picture of how we could write such a program in Go. The combination of a linked list and a hash table works nicely for solving the problem at hand. From a time-complexity perspective, it scales well. And, for our tiny scenario, the hit-to-miss ratio should be optimal.

If you would like to read more of my rambling about caching algorithms, you can also read “When and Why to use a Least Frequently Used (LFU) cache with an implementation in Golang” 3.