Data Structures

Author: Jacob Park

Arrays

Use

  • Constant Random Access.
  • Fixed Size.

See Also

Advice

  • Array problems often have simple brute-force solutions that have $O(n)$ space complexity, but subtler solutions use the array itself to reduce space complexity to $O(1)$.
  • Filling an array from the front is slow, so see if it's possible to write values from the back (see the sketch after this list).
  • Instead of deleting an entry (which requires moving all entries to its right), consider overwriting it with a tombstone.
  • Be comfortable with writing code that operates on subarrays.
  • Don't worry about preserving the integrity of the array (sortedness, etc.) until it's time to return.
  • A one-pass range-query algorithm can be generalized with start, current, and end indices, where each index is updated according to the cases that qualify the range.
  • A several-pass algorithm, although slower than a one-pass algorithm, still has $O(n)$ time complexity.
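
A minimal sketch of the write-from-the-back advice above, assuming a hypothetical `BackwardMerge.mergeInto` helper: it merges a sorted array `b` into a sorted array `a` that holds `m` valid elements followed by at least `b.length` spare slots, so no auxiliary array is needed.

```java
// Sketch: merge sorted b into sorted a in place, writing from the back.
// a holds m valid elements followed by at least b.length unused slots.
// Time: O(m + n); extra space: O(1).
class BackwardMerge {
    static void mergeInto(int[] a, int m, int[] b) {
        int i = m - 1;             // last valid element of a
        int j = b.length - 1;      // last element of b
        int k = m + b.length - 1;  // last write position in a
        while (j >= 0) {
            if (i >= 0 && a[i] > b[j]) {
                a[k--] = a[i--];   // the larger tail element moves to the back
            } else {
                a[k--] = b[j--];
            }
        }
    }
}
```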

Square Root Optimization - "Poor Man's Logarithm"

Background

  • In the study of time complexity, $O(\log{n}) \ll O(\sqrt{n}) \ll O(n)$.
  • Although $O(\log{n}) \ll O(\sqrt{n})$ for sufficiently large $n$, in practice $O(\log{n}) \approx O(\sqrt{n})$ for $n \le 10^6$ because of the physical overhead of the conditional, recurrence-tree execution that lower-bounds an $O(\log{n})$ process.

Optimization

  1. Partition an array of $n$ elements into blocks of size $\sqrt{n}$ and maintain an auxiliary data structure that summarizes the elements inside each block.
  2. Execute an $O(\sqrt{n})$ process by sequentially traversing the auxiliary data structure (sketched below).
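
A minimal sketch of this decomposition for range-sum queries, assuming a hypothetical `SqrtSum` class; a sum is used as the per-block summary, but minimum, maximum, or frequency summaries follow the same pattern.

```java
// Sketch: sqrt decomposition with per-block sums.
// Build: O(n). Point update: O(1). Range-sum query [l, r]: O(sqrt(n)).
class SqrtSum {
    private final int[] values;
    private final long[] blockSums;
    private final int blockSize;

    SqrtSum(int[] values) {
        this.values = values.clone();
        this.blockSize = Math.max(1, (int) Math.sqrt(values.length));
        this.blockSums = new long[(values.length + blockSize - 1) / blockSize];
        for (int i = 0; i < values.length; i++) {
            blockSums[i / blockSize] += values[i];
        }
    }

    void update(int index, int value) {
        blockSums[index / blockSize] += value - values[index];
        values[index] = value;
    }

    long query(int l, int r) { // inclusive range [l, r]
        long sum = 0;
        int i = l;
        while (i <= r) {
            if (i % blockSize == 0 && i + blockSize - 1 <= r) {
                sum += blockSums[i / blockSize]; // whole block lies inside [l, r]
                i += blockSize;
            } else {
                sum += values[i++];              // partial block: add element by element
            }
        }
        return sum;
    }
}
```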

Doubly Linked Lists

Use

  • Constant Splicing.
  • Dynamic Size.

Operations

  • Insert(x,l) ($O(1)$): Insert item x into doubly linked list l.
  • Delete(x,l) ($O(1)$): Remove item x from doubly linked list l.
  • Search(x,l) ($O(n)$): Return a node with item x if one exists in doubly linked list l.
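
A minimal sketch of these operations, assuming a hypothetical `IntList` class: a circular doubly linked list with a sentinel node, so Insert and Delete are $O(1)$ splices with no empty-list special cases.

```java
// Sketch: circular doubly linked list with a sentinel node, so insert and
// delete never special-case an empty list. Insert/Delete: O(1); Search: O(n).
class IntList {
    static class Node {
        int item;
        Node prev, next;
        Node(int item) { this.item = item; }
    }

    private final Node sentinel = new Node(0);

    IntList() { sentinel.prev = sentinel; sentinel.next = sentinel; }

    void insertFront(int item) {          // splice a new node after the sentinel
        Node node = new Node(item);
        node.prev = sentinel;
        node.next = sentinel.next;
        sentinel.next.prev = node;
        sentinel.next = node;
    }

    void delete(Node node) {              // unlink a node in O(1)
        node.prev.next = node.next;
        node.next.prev = node.prev;
    }

    Node search(int item) {               // linear scan; null if absent
        for (Node n = sentinel.next; n != sentinel; n = n.next) {
            if (n.item == item) return n;
        }
        return null;
    }
}
```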

See Also

Advice

  • List problems often have simple brute-force solutions that have $O(n)$ space complexity, but subtler solutions use the list itself to reduce space complexity to $O(1)$.
  • Consider using a sentinel to reduce edge cases such as checking empty lists.
  • Algorithms operating on lists often benefit from using two iterators, one ahead of the other, or one advancing quicker than the other (see the sketch after this list).
  • A $k$-ary tree can be represented as a left-child right-sibling binary tree, a doubly chained tree.
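
A minimal sketch of the two-iterator advice above, assuming a hypothetical singly linked `Node` class: a slow pointer and a fast pointer that advances twice as quickly locate the middle of a list in one pass.

```java
// Sketch: two iterators, one advancing twice as fast as the other.
// When fast reaches the end, slow is at the middle node. Time: O(n); space: O(1).
class MiddleOfList {
    static class Node {
        int item;
        Node next;
        Node(int item, Node next) { this.item = item; this.next = next; }
    }

    static Node middle(Node head) {
        Node slow = head;
        Node fast = head;
        while (fast != null && fast.next != null) {
            slow = slow.next;
            fast = fast.next.next;
        }
        return slow; // for even lengths, this is the second of the two middle nodes
    }
}
```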

Stacks (LIFO)

Use

  • Position-Based Retrieval.
  • Order Does Not Matter.
  • Nested (Recursive) Structure.

Operations

  • Push(x,s) ($O(1)$): Insert item x at the top of stack s.
  • Pop(s) ($O(1)$): Return and remove the top item of stack s.

See Also

Advice

  • Stacks are ideal for implementing non-trivial parsing (see the sketch after this list).
  • A Push is ideal for transitioning from a parent context to a child context.
  • A Pop is ideal for transitioning from a child context to a parent context.
  • When performing a depth-first search using a stack, the search may never halt on an infinite-depth search space. It finds some path to the goal, not necessarily the shortest, but it only requires polynomial space.
  • To prevent infinite recursion using a stack, consider depth-limiting or iterative deepening the recursion.
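
A minimal sketch of the parsing advice above, assuming a hypothetical `BalancedBrackets` class built on `java.util.ArrayDeque`: a Push enters a nested (child) context and a Pop returns to the enclosing (parent) context.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Sketch: stack-based bracket matching. Push on an opener (enter a child
// context); pop on a closer (return to the parent context). Time: O(n).
class BalancedBrackets {
    static boolean isBalanced(String s) {
        Deque<Character> stack = new ArrayDeque<>();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '(': case '[': case '{':
                    stack.push(c);
                    break;
                case ')':
                    if (stack.isEmpty() || stack.pop() != '(') return false;
                    break;
                case ']':
                    if (stack.isEmpty() || stack.pop() != '[') return false;
                    break;
                case '}':
                    if (stack.isEmpty() || stack.pop() != '{') return false;
                    break;
                default:
                    break; // ignore non-bracket characters
            }
        }
        return stack.isEmpty();
    }
}
```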

Queues (FIFO)

Use

  • Position-Based Retrieval.
  • Order Does Matter.
  • Linear Structure.

Operations

  • Enqueue(x,q) ($O(1)$): Insert item x at the back of queue q.
  • Dequeue(q) ($O(1)$): Return and remove the front item from queue q.

See Also

Advice

  • Queues are ideal for implementing non-trivial traversals (see the sketch after this list).
  • Enqueue and Dequeue are ideal for circularly transitioning through sibling contexts.
  • When performing a breadth-first search using a queue, the search halts even on an infinite-depth search space, provided the branching factor is finite. It finds the shortest path to the goal, but it has exponential space complexity.
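
A minimal sketch of a queue-based breadth-first traversal, assuming an adjacency-list graph and a hypothetical `shortestPathLength` helper; it returns the shortest edge count from `source` to `target`, or -1 if the target is unreachable.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;

// Sketch: breadth-first search with a FIFO queue. Nodes are visited level by
// level, so the first time target is dequeued, its distance is minimal.
// Time: O(V + E); space: O(V).
class BreadthFirstSearch {
    static int shortestPathLength(List<List<Integer>> adjacency, int source, int target) {
        int[] distance = new int[adjacency.size()];
        Arrays.fill(distance, -1);
        Queue<Integer> queue = new ArrayDeque<>();
        distance[source] = 0;
        queue.add(source);                 // enqueue the source
        while (!queue.isEmpty()) {
            int node = queue.remove();     // dequeue the oldest frontier node
            if (node == target) return distance[node];
            for (int neighbour : adjacency.get(node)) {
                if (distance[neighbour] == -1) {
                    distance[neighbour] = distance[node] + 1;
                    queue.add(neighbour);
                }
            }
        }
        return -1;                         // target unreachable
    }
}
```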

Dictionaries

Use

  • Content-Based Retrieval.
  • Hash-Based: Unordered with $O(1)$.
    • Caching.
    • Lookup/Symbol Table.
  • Tree-Based: Ordered with $O(\log(n))$.
    • Sorted Traversal.
    • Ceiling Query: Find an item with the least key greater than or equal to the given key.
    • Floor Query: Find an item with the greatest key less than or equal to the given key.
    • Minimum Query: Find an item with the smallest key.
    • Maximum Query: Find an item with the largest key.
    • Higher Neighbour Query: Find an item with the least key strictly greater than the given key.
    • Lower Neighbour Query: Find an item with the greatest key strictly less than the given key.

Operations

  • Insert(x,d) (Hash-Based: $O(1)$; Tree-Based: $O(\log(n))$): Insert item x into dictionary d.
  • Delete(x,d) (Hash-Based: $O(1)$; Tree-Based: $O(\log(n))$): Remove item x (or the item pointed to by x) from dictionary d.
  • Search(k,d) (Hash-Based: $O(1)$; Tree-Based: $O(\log(n))$): Return an item with key k if one exists in dictionary d.
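
A minimal sketch contrasting the two flavours with `java.util.HashMap` and `java.util.TreeMap`; the `TreeMap` calls correspond to the ceiling, floor, minimum, maximum, higher-neighbour, and lower-neighbour queries listed under Use.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Sketch: hash-based dictionary for O(1) expected lookups; tree-based dictionary
// for O(log n) lookups plus ordered (navigation) queries.
class DictionaryDemo {
    public static void main(String[] args) {
        Map<String, Integer> hash = new HashMap<>();     // unordered, O(1) expected
        hash.put("alpha", 1);
        hash.put("beta", 2);
        System.out.println(hash.get("alpha"));           // 1

        TreeMap<Integer, String> tree = new TreeMap<>(); // ordered, O(log n)
        tree.put(10, "ten");
        tree.put(20, "twenty");
        tree.put(30, "thirty");
        System.out.println(tree.ceilingKey(15));  // 20: least key >= 15
        System.out.println(tree.floorKey(15));    // 10: greatest key <= 15
        System.out.println(tree.firstKey());      // 10: minimum key
        System.out.println(tree.lastKey());       // 30: maximum key
        System.out.println(tree.higherKey(20));   // 30: least key > 20
        System.out.println(tree.lowerKey(20));    // 10: greatest key < 20
    }
}
```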

See Also

Advice

  • A hash-based dictionary is ideal for partitioning using prefixes, suffixes, fingerprints, and signatures.
  • A tree-based dictionary is ideal for classifying using comparators as decision processes.
  • If you need a multi-map, consider your access patterns before having the values of your dictionary be contained in a list, a set, or a tree.
  • Some problems need a combination of a hash-based dictionary and a tree-based dictionary. Specifically, a hash-based dictionary can index into the entries of a tree-based dictionary.
  • Consider using a sentinel to reduce edge cases.
  • When representing ranges in a tree-based dictionary, consider having each entry be a partition, and have negative and positive infinities as sentinels.
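
A minimal sketch of the last bullet above, assuming hypothetical letter-grade cutoffs: each `TreeMap` entry labels the range starting at its key, and an `Integer.MIN_VALUE` key acts as the negative-infinity sentinel so `floorEntry` never returns null.

```java
import java.util.TreeMap;

// Sketch: each entry (k, v) labels the half-open range starting at k and ending
// just before the next key. A MIN_VALUE sentinel guarantees floorEntry() always
// finds an entry, removing the below-lowest-cutoff edge case.
class GradeRanges {
    public static void main(String[] args) {
        TreeMap<Integer, String> cutoffs = new TreeMap<>();
        cutoffs.put(Integer.MIN_VALUE, "F"); // sentinel: covers everything below 60
        cutoffs.put(60, "D");
        cutoffs.put(70, "C");
        cutoffs.put(80, "B");
        cutoffs.put(90, "A");

        System.out.println(cutoffs.floorEntry(85).getValue()); // B
        System.out.println(cutoffs.floorEntry(42).getValue()); // F (sentinel range)
    }
}
```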

Priority Queues

Use

  • Constant Minimum/Maximum Query.
  • $O(n \log(n))$ In-Place Heap Sort.

Operations

  • Insert(x,p) ($O(\log(n))$): Insert item x into priority queue p.
  • Maximum(p) ($O(1)$): Return the item with the largest key in priority queue p.
  • ExtractMax(p) ($O(\log(n))$): Return and remove the item with the largest key in priority queue p.
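
A minimal sketch of these operations with `java.util.PriorityQueue`; Java's priority queue is a min-heap by default, so a reversed comparator is assumed here to expose Maximum and ExtractMax.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

// Sketch: max-oriented priority queue. Insert (offer): O(log n);
// Maximum (peek): O(1); ExtractMax (poll): O(log n).
class PriorityQueueDemo {
    public static void main(String[] args) {
        PriorityQueue<Integer> maxHeap = new PriorityQueue<>(Comparator.reverseOrder());
        maxHeap.offer(3);
        maxHeap.offer(7);
        maxHeap.offer(5);
        System.out.println(maxHeap.peek()); // 7: largest key, not removed
        System.out.println(maxHeap.poll()); // 7: largest key, removed
        System.out.println(maxHeap.poll()); // 5
    }
}
```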

See Also

Advice

  • N/A.

Sets

Use

  • Filtering Duplicates.
  • Testing Membership.
  • Hash-Based: Unordered with $O(1)$.
  • Tree-Based: Ordered with $O(\log(n))$.

Operations

  • Insert(x,S) (Hash-Based: $O(1)$; Tree-Based: $O(\log(n))$): Insert element x into set S.
  • Delete(x,S) (Hash-Based: $O(1)$; Tree-Based: $O(\log(n))$): Delete element x from set S.
  • Member(x,S) (Hash-Based: $O(1)$; Tree-Based: $O(\log(n))$): Is item x an element of set S?
  • Union(A,B): Construct the set of all elements in set A or in set B.
  • Intersection(A,B): Construct the set of all elements in both set A and set B.
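
A minimal sketch of these operations with `java.util.HashSet` and `java.util.TreeSet`; union and intersection are expressed with the collection methods `addAll` and `retainAll` applied to copies of the inputs.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

// Sketch: hash-based set (unordered, O(1) expected) versus tree-based set
// (ordered, O(log n)); union and intersection are built on copies of the inputs.
class SetDemo {
    public static void main(String[] args) {
        Set<Integer> a = new HashSet<>(Set.of(1, 2, 3));
        Set<Integer> b = new TreeSet<>(Set.of(2, 3, 4)); // iterates in sorted order

        System.out.println(a.contains(2));  // true: Member(2, A)

        Set<Integer> union = new HashSet<>(a);
        union.addAll(b);                    // {1, 2, 3, 4}

        Set<Integer> intersection = new HashSet<>(a);
        intersection.retainAll(b);          // {2, 3}

        System.out.println(union);
        System.out.println(intersection);
    }
}
```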

See Also

Advice

  • N/A.

Hash Codes

Use

Operations

  • Hash(x) ($O(1)$): Map an arbitrary element x to a fixed-size value.
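
A minimal sketch of a hash function in Java, assuming a hypothetical `Point` value class: `equals` and `hashCode` are overridden together (via `java.util.Objects.hash`) so equal points hash to the same fixed-size value and can serve as keys in hash-based dictionaries and sets.

```java
import java.util.Objects;

// Sketch: equals() and hashCode() overridden together so equal Points map to
// the same fixed-size hash value, as required by hash-based containers.
final class Point {
    private final int x;
    private final int y;

    Point(int x, int y) { this.x = x; this.y = y; }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof Point)) return false;
        Point p = (Point) other;
        return x == p.x && y == p.y;
    }

    @Override
    public int hashCode() {
        return Objects.hash(x, y); // O(1) mapping of the fields to an int
    }
}
```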

See Also

Advice

  • N/A.