wolfgang ziegler


„make stuff and blog about it“

BucketList<T>

March 21, 2021

TL;DR: BucketList<T> is a data structure that performs exceptionally well for a certain usage scenario:

  • n (list size) is large.
  • Fast index-based access: list.GetItemAt(i).
  • Fast index-based removal: list.RemoveAt(i).

In my use case, this reduced the runtime from 7 hours to 17 seconds!

An Interesting Problem

In December of last year I took part in solving the Advent of Code programming puzzles for the first time. Since then I became a bit obsessed with that and started solving also the older events going back until 2015.

Day 19 of 2016 presented an interesting challenge that took me a while to solve (with a reasonable runtime) and led to the solution based on this BucketList<T> data structure.

The basic idea behind this problem is that we have a:

  • Circular data structure.
  • Maintain a current element.
  • Find the element opposite of the current element.
  • Advance the current element by one.
  • Repeat until the list has only one element left.

A poor solution

My first approach was using a LinkedList which performed OK for small list sizes. But for large lists (e.g. 3.001.330 was my puzzle input) the performance was dismal.

List size: 3001330.
END after 26696.1142087 seconds.

That's a runtime of more than 7 hours šŸ˜¬.

The problem was the linear search for locating the opposite element in the list that caused an O(nĀ²) complexity.

Using a simple List<T> (instead of LinkedList<T>) performed equally bad because in that case the remove operation causes a runtime complexity of O(nĀ²).

To sum it up:

Data Structure Lookup Removal
List<T> O(1) O(NĀ²)
LinkedList<T> O(NĀ²) O(1)

A better (by a lot) solution

So I needed a data structure with linear or better performance for:

  • Indexing
  • Removal

My solution was a custom data structure I called BucketList (ha ha) which:

  • maintains the elements in a list of lists
  • that have the same maximum size (hence, buckets).

Hand-drawn illustration of the BucketList datastructure

This way, when performing an index-based lookup I have nearly constantĀ¹ O(1) access to elements and removal is only limited by the List<T> performance of a single bucket.

Ā¹ Actually O(k) where k amounts to the number of buckets which should be negligibly low compared to n.

To sum it up:

Data Structure Lookup Remove
BucketList<T> O(k) O(n)

k: number of buckets
n: bucket size (list size N / k)

This brought the runtime down dramatically!

List size: 3001330.
END after 17.0722168 seconds.

17 seconds ... way better šŸ¤©.

Here's the full source code for BucketList.

Note: A bucket size of 25.000 worked best in that specific scenario but your mileage may vary, of course, and depends on the problem input size.

Caveats

  • I only used BucketList<T> for this specific problem so far.
  • The runtime performance depends on the selection of a bucket size for a given list size.
  • The InsertAt case is not yet considered.
  • I'm sure that this data structure exists already with a different name. However, I came up with this on my own when solving my problem, so give me that ;-).