Announcements

Office Hours Schedule
April 6, 2023

We will be holding our office hours starting next week:

  • Keith: Wednesdays, 3PM - 5PM in Durand 317.
  • Kevin: Fridays, 2PM - 4PM in Huang Basement.

Welcome to CS166!
April 3, 2023

Welcome to CS166, a course in the design, analysis, and implementation of data structures. We've got an exciting quarter ahead of us - the data structures we'll investigate are some of the most beautiful constructs I've ever come across - and I hope you're able to join us.

CS166 has two prerequisites - CS107 and CS161. From CS107, we'll assume that you're comfortable working from the command-line; designing, testing, and debugging nontrivial programs; manipulating pointers and arrays; using bitwise operators; and reasoning about the memory hierarchy. From CS161, we'll assume you're comfortable designing and analyzing nontrivial algorithms; using O, o, Θ, Ω, and ω notation; solving recurrences; working through standard graph and sequence algorithms; and structuring proofs of correctness.

We'll update this site with more information as we get closer to the start of the quarter. In the meantime, feel free to email me at htiek@cs.stanford.edu if you have any questions about the class!


Schedule and Readings

This syllabus is still under construction and is subject to change as we fine-tune the course. Stay tuned for more information and updates!

Fusion Trees, Part II
June 6

The sardine tree we developed in our last lecture gives a fast ordered dictionary data structure for small keys. By harnessing its key insight - B-tree lookups can be sped up by improving rank calculations at each node - and combining it with some insights about integers and Patricia tries, we can build the fusion tree, which works for any integers that fit into a machine word.
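
To make the rank idea concrete, here is a minimal sketch of the classic parallel-comparison trick, assuming 7-bit keys packed one per byte and a GCC/Clang-style popcount builtin; the field layout and constants are illustrative choices, not the lecture's exact construction. One subtraction compares a query against eight keys simultaneously.

    #include <cstdint>

    // Pack eight 7-bit keys into a 64-bit word, one per byte, with each
    // byte's high ("test") bit set to 1.
    uint64_t packKeys(const uint8_t keys[8]) {
        uint64_t word = 0;
        for (int i = 0; i < 8; i++) {
            word |= uint64_t(0x80 | keys[i]) << (8 * i);
        }
        return word;
    }

    // Rank of q among the packed keys (number of keys strictly less than
    // q), for q < 128. Subtracting q from every field in parallel leaves
    // each test bit set exactly when that key is >= q; a popcount then
    // tallies all eight comparisons at once.
    int parallelRank(uint64_t packed, uint8_t q) {
        uint64_t spread = uint64_t(q) * 0x0101010101010101ULL; // q in every byte
        uint64_t diff   = packed - spread;                     // 8 comparisons at once
        int geqCount    = __builtin_popcountll(diff & 0x8080808080808080ULL);
        return 8 - geqCount;                                   // keys < q
    }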

Slides:

Holm Forests
May 30

Efficiently maintaining connectivity information in a fully dynamic graph is a challenge. Today, we'll see one data structure that elegantly solves the problem, giving excellent amortized time bounds.

Slides:

Readings:

Fusion Trees, Part I
June 1

Memory inside a computer is split apart into individual machine words, with primitive operations like addition, subtraction, and multiplication taking constant time. This means that, if we can pack multiple pieces of data into a single machine word, we can essentially get a limited form of parallel computation. Through a variety of very clever tricks, we can use this to speed up searching and sorting.

This first lecture explores word-level parallelism through a data structure for very small integers and a technique for computing most-significant bits in time O(1). It will lay the groundwork we'll use next time when exploring fusion trees.
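
For a sense of what a fast MSB computation buys you, here is a hedged sketch that finds the most-significant set bit using a fixed sequence of word operations (bit smearing plus a popcount); the lecture's O(1) technique is cleverer, but this conveys the flavor for 64-bit words.

    #include <cstdint>

    // Index of the most-significant set bit of x (requires x > 0), using
    // a constant number of word operations for 64-bit words: smear the
    // top bit down through all lower positions, then count the ones.
    int msb(uint64_t x) {
        x |= x >> 1;  x |= x >> 2;  x |= x >> 4;
        x |= x >> 8;  x |= x >> 16; x |= x >> 32;
        return __builtin_popcountll(x) - 1;
    }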

Slides:

Readings:

Disjoint Set Forests
May 23

Checking if two nodes are connected in a graph (whether there's a path between them) is a very easy problem to solve if the graph is given to you in advance. But what if the graph can change by having edges added dynamically? This problem requires a more clever solution, one that's trivial to code up but whose analysis is one of the most surprising of the quarter.
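
The "trivial to code up" part really is trivial. Here's a minimal sketch with path compression and union by rank; the analysis (not shown!) is where the inverse-Ackermann bound comes from.

    #include <utility>
    #include <vector>

    // Disjoint-set forest with path compression and union by rank.
    struct DisjointSets {
        std::vector<int> parent, rank_;
        explicit DisjointSets(int n) : parent(n), rank_(n, 0) {
            for (int i = 0; i < n; i++) parent[i] = i;
        }
        int find(int x) {                       // path compression
            if (parent[x] != x) parent[x] = find(parent[x]);
            return parent[x];
        }
        void unite(int x, int y) {              // union by rank
            x = find(x); y = find(y);
            if (x == y) return;
            if (rank_[x] < rank_[y]) std::swap(x, y);
            parent[y] = x;
            if (rank_[x] == rank_[y]) rank_[x]++;
        }
        bool connected(int x, int y) { return find(x) == find(y); }
    };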

Slides:

Readings:

Euler Tour Trees
May 25

The dynamic connectivity problem is the following: maintain connectivity information about a graph as edges are added and deleted. (Contrast this with what disjoint-set forests do, where edges can only be added.) In the restricted case of maintaining a forest, there's a surprisingly elegant and versatile data structure for this problem: the Euler tour tree, based on a creative perspective for representing a tree as a sequence.
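
To make "a tree as a sequence" concrete, here is a hedged sketch that linearizes a rooted tree into its Euler tour; the real data structure stores this sequence in a balanced BST so it can be split and concatenated, which is what makes links and cuts fast.

    #include <vector>

    // Record the Euler tour of a rooted tree: each node appears once when
    // first entered and again after each child's subtree is finished.
    void eulerTour(int u, int parent,
                   const std::vector<std::vector<int>>& adj,
                   std::vector<int>& tour) {
        tour.push_back(u);
        for (int v : adj[u]) {
            if (v == parent) continue;
            eulerTour(v, u, adj, tour);
            tour.push_back(u);       // return to u after visiting v's subtree
        }
    }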

Slides:

Readings:

Better-than-Balanced BSTs
May 16

We've been operating under the assumption that a balanced BST that has worst-case O(log n) lookups is, in some sense, an "optimal" binary search tree. In one sense (worst-case efficiency) these trees are optimal. However, there are other perspectives we can take on what "optimal" means, and they counsel toward other choices of tree structures - weight-balanced trees, finger search trees, and Iacono's working set structure.

Slides:

Readings:

Assignments:

Splay Trees
May 18

Balanced binary search trees give worst-case O(log n) times on each tree operation. If we're trying to guarantee worst-case efficiency, this is as good as it gets. But what if we're trying to guarantee average-case efficiency? In that case, it's possible to improve upon the bounds given by balanced BSTs. In this lecture, we'll explore statically optimal BSTs and an amazing data structure called the splay tree that very well may be the best possible binary search tree.
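
For a taste of how little machinery is involved, here is a hedged sketch of the classic top-down splay step, essentially Sleator's formulation; insertion and deletion would be layered on top of it, and the amortized analysis is the real story.

    // A bare-bones BST node for the sketch below.
    struct Node { int key; Node *left, *right; };

    // Top-down splay: restructure the tree rooted at t so that the node
    // with the given key (or the last node on the search path) is the root.
    Node* splay(Node* t, int key) {
        if (t == nullptr) return nullptr;
        Node header{0, nullptr, nullptr};
        Node *l = &header, *r = &header;       // roots of the side pieces
        for (;;) {
            if (key < t->key) {
                if (t->left == nullptr) break;
                if (key < t->left->key) {          // zig-zig: rotate right
                    Node* y = t->left;
                    t->left = y->right; y->right = t; t = y;
                    if (t->left == nullptr) break;
                }
                r->left = t; r = t; t = t->left;   // link into right piece
            } else if (key > t->key) {
                if (t->right == nullptr) break;
                if (key > t->right->key) {         // zig-zig: rotate left
                    Node* y = t->right;
                    t->right = y->left; y->left = t; t = y;
                    if (t->right == nullptr) break;
                }
                l->right = t; l = t; t = t->right; // link into left piece
            } else break;
        }
        l->right = t->left;                        // reassemble the pieces
        r->left  = t->right;
        t->left  = header.right;
        t->right = header.left;
        return t;
    }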

Slides:

Readings:

Hashing and Sketching, Part II
May 9

We've now seen how to build an estimator: make a simple data structure that gives a good chance of success, then run many independent copies of it in parallel. This idea can be extended to build frequency estimators with other properties, as well as to build estimators for the number of distinct items we've seen.
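
One concrete instance of this recipe is the count-min sketch: several small counter arrays, each with its own hash function, queried by taking the minimum across them. A hedged sketch follows; the mixing hash and constants are illustrative choices, not a prescribed family.

    #include <algorithm>
    #include <cstdint>
    #include <vector>

    // Count-min sketch: d independent rows of w counters. Each update
    // increments one counter per row; a query takes the minimum over the
    // rows, which overestimates the true count, but rarely by much.
    struct CountMinSketch {
        int d, w;
        std::vector<std::vector<uint64_t>> rows;
        std::vector<uint64_t> seeds;
        CountMinSketch(int d_, int w_)
            : d(d_), w(w_), rows(d_, std::vector<uint64_t>(w_, 0)) {
            for (int i = 0; i < d; i++)
                seeds.push_back(0x9E3779B97F4A7C15ULL * (i + 1));
        }
        size_t bucket(int row, uint64_t x) const {   // simple mixing hash
            uint64_t h = (x + seeds[row]) * 0xBF58476D1CE4E5B9ULL;
            h ^= h >> 31;
            return h % w;
        }
        void add(uint64_t x) {
            for (int i = 0; i < d; i++) rows[i][bucket(i, x)]++;
        }
        uint64_t estimate(uint64_t x) const {
            uint64_t best = UINT64_MAX;
            for (int i = 0; i < d; i++)
                best = std::min(best, rows[i][bucket(i, x)]);
            return best;
        }
    };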

Slides:

Readings:

Cuckoo Hashing
May 11

Most hash tables give expected O(1) lookups. Can we make hash tables with no collisions at all, and if so, can we do it efficiently? Amazingly, the answer is yes. There are many schemes for achieving this, one of which, cuckoo hashing, is surprisingly simple to implement. The analysis, on the other hand, draws deeply on the theory of random graphs.
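
The "simple to implement" claim checks out. Here is a hedged sketch of cuckoo insertion with two tables; the hash functions, displacement limit, and give-up behavior are simplified assumptions (a real table would rehash rather than report failure).

    #include <cstdint>
    #include <optional>
    #include <utility>
    #include <vector>

    // Cuckoo hash table: every key lives in one of exactly two possible
    // slots, so a lookup probes at most two locations. Inserting into an
    // occupied slot evicts the resident, which is then reinserted into
    // its other slot, possibly evicting again.
    struct CuckooTable {
        std::vector<std::optional<uint64_t>> t1, t2;
        size_t n;
        explicit CuckooTable(size_t n_) : t1(n_), t2(n_), n(n_) {}
        size_t h1(uint64_t x) const { return (x * 0x9E3779B97F4A7C15ULL) % n; }
        size_t h2(uint64_t x) const { return ((x ^ (x >> 32)) * 0xBF58476D1CE4E5B9ULL) % n; }
        bool contains(uint64_t x) const {
            return t1[h1(x)] == x || t2[h2(x)] == x;
        }
        // Returns false if too many displacements occur; a real
        // implementation would pick new hash functions and rehash.
        bool insert(uint64_t x) {
            if (contains(x)) return true;
            for (int kicks = 0; kicks < 32; kicks++) {
                auto& slot1 = t1[h1(x)];
                if (!slot1) { slot1 = x; return true; }
                std::swap(x, *slot1);          // evict table 1's resident
                auto& slot2 = t2[h2(x)];
                if (!slot2) { slot2 = x; return true; }
                std::swap(x, *slot2);          // evict table 2's resident
            }
            return false;
        }
    };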

Slides:

Readings:

Fibonacci Heaps
May 2

Fibonacci heaps are a type of priority queue designed to make a particular operation called decrease-key run quickly, at least in an amortized sense. Today we'll see why this operation is so important, as well as how to make the operation run quickly by pulling together many ideas we've seen in the course of our exploration of amortization.
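
To see why decrease-key matters so much, count the operations in Dijkstra's algorithm on a graph with n nodes and m edges: n extract-mins plus up to m decrease-keys. With a binary heap, each operation costs O(log n), for O((n + m) log n) total. With a Fibonacci heap's O(1) amortized decrease-key, the total drops to O(m + n log n), a genuine improvement on dense graphs.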

Slides:

Readings:

Hashing and Sketching, Part I
May 4

How can Google keep track of frequent search queries without storing all the queries it gets in memory? How can you identify frequently-occurring tweets without storing every tweet in RAM? As long as you're willing to trade off accuracy for space, you can get excellent approximations.

Slides:

Readings:

Assignments:

Amortized Analysis
April 25

In many cases, we only care about the total time required to process a set of data. In those cases, we can design data structures that make some operations more expensive in order to lower the aggregate cost of all the operations. How do you analyze these structures?
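
The canonical warm-up is a dynamic array that doubles when full: a single append can cost Θ(n), yet any sequence of n appends costs O(n) in total. A minimal sketch (error handling omitted):

    #include <cstdlib>
    #include <cstring>

    // Dynamic array of ints that doubles its capacity when full. Over n
    // appends, the copies total at most 1 + 2 + 4 + ... + n < 2n, so each
    // append is O(1) amortized despite the occasional O(n) spike.
    struct IntArray {
        int* data = nullptr;
        size_t size = 0, capacity = 0;
        void append(int x) {
            if (size == capacity) {
                capacity = (capacity == 0) ? 1 : 2 * capacity;
                int* bigger = static_cast<int*>(std::malloc(capacity * sizeof(int)));
                if (data != nullptr) {
                    std::memcpy(bigger, data, size * sizeof(int)); // the Θ(size) step
                    std::free(data);
                }
                data = bigger;
            }
            data[size++] = x;
        }
    };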

Slides:

Readings:

  • CLRS: Chapter 17

Assignments:

Binomial Heaps
April 27 (Prerecorded)

Binomial heaps are a simple and flexible priority queue structure that supports efficient melding of priority queues. The intuition behind binomial heaps is particularly elegant, and they'll serve as a building block toward the more complex Fibonacci heap data structure that we'll talk about on Tuesday.
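
The core primitive is tiny: link two heap-ordered binomial trees of the same order by making the larger root a child of the smaller, so that melding two heaps proceeds just like binary addition, with trees as carries. A hedged sketch of the link step (the node layout is one common representation):

    #include <utility>

    // A binomial tree node: a node's children form a linked list.
    struct BTree {
        int key;
        int order;       // a tree of order k has exactly 2^k nodes
        BTree* child;    // most recently attached child
        BTree* sibling;  // next child of this node's parent
    };

    // Link two trees of equal order into one tree of order k+1, keeping
    // the heap property: the smaller key stays on top.
    BTree* link(BTree* a, BTree* b) {
        if (b->key < a->key) std::swap(a, b);
        b->sibling = a->child;   // b becomes a's newest child
        a->child = b;
        a->order++;
        return a;
    }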

Slides:

Readings:

Tries and Suffix Trees
April 18

To kick off our discussion of string data structures, we'll be exploring tries, Patricia tries, and, most importantly, suffix trees. These data structures provide fast solutions to a number of algorithmic problems and are much more versatile than they might initially seem. What makes them so useful? What properties of strings do they capture? And what intuitions can we build from them?
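
As a baseline for this lecture, here is a hedged sketch of a plain trie over lowercase letters; Patricia tries and suffix trees compress the one-letter-per-node paths this version stores explicitly.

    #include <array>
    #include <string>

    // A plain trie over 'a'..'z'. Each root-to-marked-node path spells
    // one stored word; words with shared prefixes share nodes.
    struct TrieNode {
        std::array<TrieNode*, 26> next{};
        bool isWord = false;
    };

    void insert(TrieNode* root, const std::string& word) {
        TrieNode* cur = root;
        for (char c : word) {
            int i = c - 'a';
            if (cur->next[i] == nullptr) cur->next[i] = new TrieNode;
            cur = cur->next[i];
        }
        cur->isWord = true;
    }

    bool contains(const TrieNode* root, const std::string& word) {
        const TrieNode* cur = root;
        for (char c : word) {
            cur = cur->next[c - 'a'];
            if (cur == nullptr) return false;
        }
        return cur->isWord;
    }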

Slides:

Assignments:

Suffix and LCP Arrays
April 20

What makes suffix trees so useful as a data structure? Surprisingly, much of their utility and flexibility can be attributed purely to two facts: they keep the suffixes sorted, and they expose the branching words in the string. By representing this information in a different way, we can get much of the benefit of suffix trees without the huge space cost.
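
"Keep the suffixes sorted" can be prototyped in a few lines: sort the suffix start positions by comparing suffixes directly. This naive version costs O(n² log n) in the worst case, and the constructions covered in lecture do far better, but the output array is the same. A hedged sketch:

    #include <algorithm>
    #include <numeric>
    #include <string>
    #include <vector>

    // Naive suffix array: the indices 0..n-1 sorted by the suffixes of s
    // that start at them.
    std::vector<int> suffixArray(const std::string& s) {
        std::vector<int> sa(s.size());
        std::iota(sa.begin(), sa.end(), 0);
        std::sort(sa.begin(), sa.end(), [&](int a, int b) {
            return s.compare(a, std::string::npos, s, b, std::string::npos) < 0;
        });
        return sa;
    }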

Slides:

Readings:

Balanced Trees, Part I
April 11

Balanced search trees are among the most versatile and flexible data structures. They're used extensively in theory and in practice. What sorts of balanced trees exist? How would you design them? And what can you do with them?

Slides:

Assignments:

Readings:

Balanced Trees, Part II
April 13 (Prerecorded)

Our last lecture concluded with a view of red/black trees as isometries of 2-3-4 trees. How far does this connection go? How can we use it to derive the rules for red/black trees? And now that we've got red/black trees, what else can we do with them?

Slides:

Readings:

  • CLRS: Chapter 14

Range Minimum Queries, Part I
April 4

The range minimum query problem is the following: given an array, preprocess it so that you can efficiently determine the smallest value in a variety of subranges. RMQ has tons of applications throughout computer science and is an excellent proving ground for a number of advanced algorithmic techniques.
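
One solution from this lecture's family of tradeoffs is the sparse table: precompute the minimum of every window whose length is a power of two, then answer any query by combining two overlapping windows. This gives ⟨O(n log n), O(1)⟩. A hedged sketch, assuming a GCC/Clang-style count-leading-zeros builtin:

    #include <algorithm>
    #include <vector>

    // Sparse table for RMQ: table[k][i] holds min(a[i .. i + 2^k - 1]).
    struct SparseTable {
        std::vector<std::vector<int>> table;
        explicit SparseTable(const std::vector<int>& a) {
            int n = a.size();
            table.push_back(a);
            for (int k = 1; (1 << k) <= n; k++) {
                int len = 1 << k;
                std::vector<int> row(n - len + 1);
                for (int i = 0; i + len <= n; i++)
                    row[i] = std::min(table[k - 1][i], table[k - 1][i + len / 2]);
                table.push_back(row);
            }
        }
        // Minimum of a[lo..hi], inclusive: cover the range with two
        // (possibly overlapping) power-of-two windows.
        int query(int lo, int hi) const {
            int k = 31 - __builtin_clz(hi - lo + 1);   // floor(log2(length))
            return std::min(table[k][lo], table[k][hi - (1 << k) + 1]);
        }
    };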

Slides:

Readings:

Assignments:

Range Minimum Queries, Part II
April 6

Our last lecture took us very, very close to an ⟨O(n), O(1)⟩-time solution to RMQ. Using a new data structure called a Cartesian tree in conjunction with a technique called the Method of Four Russians, we can adapt our approach to end up with a linear-preprocessing-time, constant-query-time solution to RMQ. In doing so, we'll see a number of clever techniques that will appear time and time again in data structure design.
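
The Cartesian tree at the heart of this solution can be built in linear time with a single stack pass. Here is a hedged sketch that returns, for each index, its parent in the tree (the array's minimum ends up at the root, and an in-order walk recovers the original order):

    #include <vector>

    // Build the Cartesian tree of a in O(n): parent[i] is the parent of
    // index i, or -1 for the root. The tree is min-heap-ordered by value.
    // Each index is pushed and popped at most once, hence linear time.
    std::vector<int> cartesianParents(const std::vector<int>& a) {
        int n = a.size();
        std::vector<int> parent(n, -1), stack;
        for (int i = 0; i < n; i++) {
            int lastPopped = -1;
            while (!stack.empty() && a[stack.back()] > a[i]) {
                lastPopped = stack.back();
                stack.pop_back();
            }
            if (lastPopped != -1) parent[lastPopped] = i; // popped run becomes i's left subtree
            if (!stack.empty()) parent[i] = stack.back(); // i is the stack top's right child
            stack.push_back(i);
        }
        return parent;
    }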

Slides:

Readings: