Data Structures¶
Author: Jacob Park
Arrays¶
Use¶
- Constant Random Access.
- Fixed Size.
Advice¶
- Array problems often have simple brute-force solutions that have $O(n)$ space complexity, but subtler solutions use the array itself to reduce space complexity to $O(1)$.
- Filling an array from the front is slow, so see if it's possible to write values from the back.
- Instead of deleting an entry (which requires moving all entries to its right), consider overwriting it with a tombstone.
- Be comfortable with writing code that operates on subarrays.
- Don't worry about preserving the integrity of the array (sortedness, etc.) until it's time to return.
- A one-pass, range-query algorithm can be generalized with start, current, and end indices, advancing each index according to the cases that qualify the range.
- A several-pass algorithm, although slower than a one-pass algorithm, still has $O(n)$ time complexity as long as the number of passes is constant.
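The write-from-the-back advice is easiest to see in the classic problem of merging a sorted array into another sorted array's trailing free space; a minimal sketch (the function name and signature are illustrative):

```python
def merge_into(a, m, b):
    """Merge sorted b into sorted a[:m] in place, assuming a has
    len(b) unused slots at its end. Writing from the back avoids
    shifting elements, giving O(1) extra space."""
    i, j, k = m - 1, len(b) - 1, len(a) - 1
    while j >= 0:
        if i >= 0 and a[i] > b[j]:
            a[k] = a[i]
            i -= 1
        else:
            a[k] = b[j]
            j -= 1
        k -= 1
    return a
```

Filling from the front would force every later element of `a` to shift right on each write; from the back, every slot is written exactly once.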
Square Root Optimization - "Poor Man's Logarithm"¶
Background¶
- In the study of time complexity, $O(\log{n}) \ll O(\sqrt{n}) \ll O(n)$.
- Although $O(\log{n}) \ll O(\sqrt{n})$ for a sufficiently large $n$, for $n \le 10^6$, in practice, $O(\log{n}) \approx O(\sqrt{n})$ because of the constant-factor overhead (branching and pointer chasing) of the recursion tree that underlies an $O(\log{n})$ process.
Optimization¶
- Partition an array of $n$ elements into blocks of size $\sqrt{n}$ and maintain an auxiliary data structure that summarizes the elements inside each block.
- Execute an $O(\sqrt{n})$ process by sequentially traversing the auxiliary data structure.
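A sketch of the decomposition for range-sum queries (the class and method names are illustrative):

```python
import math

class SqrtSums:
    """Square root decomposition: partition the array into blocks of
    size ~sqrt(n) and keep each block's sum as the auxiliary summary.
    A range query then touches O(sqrt(n)) blocks and elements."""

    def __init__(self, items):
        self.items = list(items)
        self.size = max(1, math.isqrt(len(self.items)))
        self.blocks = [sum(self.items[i:i + self.size])
                       for i in range(0, len(self.items), self.size)]

    def update(self, i, value):
        # Repair only the one block summary that covers index i.
        self.blocks[i // self.size] += value - self.items[i]
        self.items[i] = value

    def query(self, lo, hi):
        """Sum of items[lo:hi]."""
        total, i = 0, lo
        while i < hi:
            if i % self.size == 0 and i + self.size <= hi:
                # Aligned on a block fully inside the range: skip it whole.
                total += self.blocks[i // self.size]
                i += self.size
            else:
                total += self.items[i]
                i += 1
        return total
```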
Doubly Linked Lists¶
Use¶
- Constant Splicing.
- Dynamic Size.
Operations¶
Insert(x,l) ($O(1)$): Insert item x into doubly linked list l.
Delete(x,l) ($O(1)$): Remove item x from doubly linked list l.
Search(x,l) ($O(n)$): Return a node with item x if one exists in doubly linked list l.
Advice¶
- List problems often have simple brute-force solutions that have $O(n)$ space complexity, but subtler solutions use the list itself to reduce space complexity to $O(1)$.
- Consider using a sentinel to reduce edge cases such as checking empty lists.
- Algorithms operating on lists often benefit from using two iterators, one ahead of the other, or one advancing quicker than the other.
- A $k$-ary tree can be represented as a left-child right-sibling binary tree, a doubly chained tree.
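The two-iterator advice in action: a fast/slow pointer pair that finds the middle of a list in one pass (the `Node` class is a hypothetical minimal implementation):

```python
class Node:
    def __init__(self, value, next=None):
        self.value = value
        self.next = next

def middle(head):
    """Two iterators, one advancing twice as fast as the other:
    when `fast` reaches the end, `slow` is at the middle node."""
    slow = fast = head
    while fast is not None and fast.next is not None:
        slow = slow.next
        fast = fast.next.next
    return slow
```

The same fast/slow pattern detects cycles (Floyd's algorithm), and the one-ahead-of-the-other variant deletes the $k$-th node from the end in a single pass.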
Stacks (LIFO)¶
Use¶
- Position-Based Retrieval.
- Order Does Not Matter.
- Nested (Recursive) Structure.
Operations¶
Push(x,s) ($O(1)$): Insert item x at the top of stack s.
Pop(s) ($O(1)$): Return and remove the top item of stack s.
Advice¶
- Stacks are ideal for implementing non-trivial parsing.
- A Push is ideal for transitioning from a parent context to a child context.
- A Pop is ideal for transitioning from a child context to a parent context.
- When performing a depth-first search using a stack, it may never halt on a graph of infinite depth. It finds some path to the goal (not necessarily the shortest), but its space complexity is only linear in the search depth.
- To prevent infinite recursion using a stack, consider depth-limiting or iterative deepening the recursion.
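A minimal example of stack-based parsing, where a Push enters a child context on an opening bracket and a Pop returns to the parent on the matching closer:

```python
PAIRS = {')': '(', ']': '[', '}': '{'}

def balanced(text):
    """Stack-based parse of nested brackets: each opener pushes a
    child context; each closer must pop its matching opener."""
    stack = []
    for ch in text:
        if ch in '([{':
            stack.append(ch)            # enter child context
        elif ch in PAIRS:
            if not stack or stack.pop() != PAIRS[ch]:
                return False            # closer with no matching opener
    return not stack                    # any leftover opener is unclosed
```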
Queues (FIFO)¶
Use¶
- Position-Based Retrieval.
- Order Does Matter.
- Linear Structure.
Operations¶
Enqueue(x,q) ($O(1)$): Insert item x at the back of queue q.
Dequeue(q) ($O(1)$): Return and remove the front item from queue q.
Advice¶
- Queues are ideal for implementing non-trivial traversals.
- Enqueue and Dequeue are ideal for circularly transitioning through sibling contexts.
- When performing a breadth-first search using a queue, it halts whenever a goal exists at finite depth, even on a graph of infinite depth. It finds the shortest path to the goal, but it has exponential space complexity in the search depth.
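A sketch of breadth-first search with a FIFO queue, returning a shortest path; encoding the graph as an adjacency dictionary is an assumption for illustration:

```python
from collections import deque

def shortest_path(graph, start, goal):
    """BFS with a FIFO queue: the first time the goal is dequeued,
    the discovered path has the fewest edges."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for neighbor in graph.get(path[-1], []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None  # goal unreachable
```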
Dictionaries¶
Use¶
- Content-Based Retrieval.
- Hash-Based: Unordered with $O(1)$.
  - Caching.
  - Lookup/Symbol Table.
- Tree-Based: Ordered with $O(\log(n))$.
  - Sorted Traversal.
  - Ceiling Query: Find an item with the least key greater than or equal to the given key.
  - Floor Query: Find an item with the greatest key less than or equal to the given key.
  - Minimum Query: Find an item with the smallest key.
  - Maximum Query: Find an item with the largest key.
  - Higher Neighbour Query: Find an item with the least key strictly greater than the given key.
  - Lower Neighbour Query: Find an item with the greatest key strictly less than the given key.
Operations¶
Insert(x,d) (Hash-Based: $O(1)$; Tree-Based: $O(\log(n))$): Insert item x into dictionary d.
Delete(x,d) (Hash-Based: $O(1)$; Tree-Based: $O(\log(n))$): Remove item x (or the item pointed to by x) from dictionary d.
Search(k,d) (Hash-Based: $O(1)$; Tree-Based: $O(\log(n))$): Return an item with key k if one exists in dictionary d.
Advice¶
- A hash-based dictionary is ideal for partitioning using prefixes, suffixes, fingerprints, and signatures.
- A tree-based dictionary is ideal for classifying using comparators as decision processes.
- If you need a multi-map, consider your access patterns before having the values of your dictionary be contained in a list, a set, or a tree.
- Some problems need a combination of a hash-based dictionary and a tree-based dictionary. Specifically, a hash-based dictionary can index into the entries of a tree-based dictionary.
- Consider using a sentinel to reduce edge cases.
- When representing ranges in a tree-based dictionary, consider having each entry be a partition, and have negative and positive infinities as sentinels.
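Python's standard library has no tree-based dictionary; as a stand-in, a sorted key list with `bisect` supports the ordered queries above, with positive and negative infinities as sentinels (the function names are illustrative):

```python
import bisect
import math

def ceiling(keys, k):
    """Least key >= k in a sorted list (the Ceiling query);
    the +inf sentinel removes the 'no such key' edge case."""
    i = bisect.bisect_left(keys, k)
    return keys[i] if i < len(keys) else math.inf

def floor(keys, k):
    """Greatest key <= k (the Floor query); -inf sentinel."""
    i = bisect.bisect_right(keys, k)
    return keys[i - 1] if i > 0 else -math.inf
```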
Priority Queues¶
Use¶
- Constant Minimum/Maximum Query.
- $O(n \log(n))$ In-Place Heap Sort.
Operations¶
Insert(x,p) ($O(\log(n))$): Insert item x into priority queue p.
Maximum(p) ($O(1)$): Return the item with the largest key in priority queue p.
ExtractMax(p) ($O(\log(n))$): Return and remove the item with the largest key in priority queue p.
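A sketch of the max-oriented operations on top of Python's `heapq`; since `heapq` is a min-heap, keys are negated:

```python
import heapq

# heapq maintains a min-heap invariant, so storing -x makes the
# smallest stored value correspond to the largest key.
def insert(p, x):
    heapq.heappush(p, -x)        # O(log n)

def maximum(p):
    return -p[0]                 # O(1): the root is the extreme key

def extract_max(p):
    return -heapq.heappop(p)     # O(log n)
```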
Sets¶
Use¶
- Filtering Duplicates.
- Testing Membership.
- Hash-Based: Unordered with $O(1)$.
- Tree-Based: Ordered with $O(\log(n))$.
Operations¶
Insert(x,S) (Hash-Based: $O(1)$; Tree-Based: $O(\log(n))$): Insert element x into set S.
Delete(x,S) (Hash-Based: $O(1)$; Tree-Based: $O(\log(n))$): Delete element x from set S.
Member(x,S) (Hash-Based: $O(1)$; Tree-Based: $O(\log(n))$): Is item x an element of set S?
Union(A,B): Construct the set of all elements in set A or in set B.
Intersection(A,B): Construct the set of all elements in both set A and set B.
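Python's hash-based `set` exposes these operations directly; a quick demonstration:

```python
a, b = {1, 2, 3}, {2, 3, 4}

assert 2 in a                  # Member: O(1) expected
assert a | b == {1, 2, 3, 4}   # Union: all elements in a or in b
assert a & b == {2, 3}         # Intersection: elements in both
a.add(5)                       # Insert
a.discard(5)                   # Delete
```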
Hash Codes¶
Use¶
Operations¶
Hash(x) ($O(1)$): Map an arbitrary element x to a fixed-size value.
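One common construction of Hash(x) for strings is a polynomial rolling hash; a sketch (the base and modulus are illustrative choices, not any specific library's parameters):

```python
def polynomial_hash(s, base=31, mod=2**61 - 1):
    """Polynomial rolling hash: treat the string as the coefficients
    of a polynomial evaluated at `base`, reduced modulo a large
    prime, mapping arbitrary-length input to a fixed-size value."""
    h = 0
    for ch in s:
        h = (h * base + ord(ch)) % mod
    return h
```

Because the construction is positional, anagrams hash differently, which makes it useful for the prefix/suffix fingerprinting advice under Dictionaries.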