Data Structures

Author: Jacob Park

Arrays

Use

  • Constant Random Access.
  • Fixed Size.

See Also

Advice

  • Array problems often have simple brute-force solutions that have $O(n)$ space complexity, but subtler solutions use the array itself to reduce space complexity to $O(1)$.
  • Filling an array from the front is slow, so see if it's possible to write values from the back (see the sketch after this list).
  • Instead of deleting an entry (which requires moving all entries to its right), consider overwriting it with a tombstone.
  • Be comfortable with writing code that operates on subarrays.
  • Don't worry about preserving the integrity of the array (sortedness, etc.) until it's time to return.
  • A one-pass range-query algorithm can be generalized with start, current, and end indices, where each index is updated according to the cases that qualify the range.
  • A several-pass algorithm, although slower than a one-pass algorithm, still has $O(n)$ time complexity.
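
A minimal sketch of the write-from-the-back advice above, assuming a hypothetical `BackwardMerge.mergeInto` helper: it merges a sorted array `b` into a sorted array `a` that holds `m` valid elements followed by at least `b.length` spare slots, so no auxiliary array is needed.

```java
// Sketch: merge sorted b into sorted a in place, writing from the back.
// a holds m valid elements followed by at least b.length unused slots.
// Time: O(m + n); extra space: O(1).
class BackwardMerge {
    static void mergeInto(int[] a, int m, int[] b) {
        int i = m - 1;             // last valid element of a
        int j = b.length - 1;      // last element of b
        int k = m + b.length - 1;  // last write position in a
        while (j >= 0) {
            if (i >= 0 && a[i] > b[j]) {
                a[k--] = a[i--];   // the larger tail element moves to the back
            } else {
                a[k--] = b[j--];
            }
        }
    }
}
```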

Square Root Optimization - "Poor Man's Logarithm"

Background

  • In the study of time complexity, $O(\log{n}) \ll O(\sqrt{n}) \ll O(n)$.
  • Although $O(\log{n}) \ll O(\sqrt{n})$ for sufficiently large $n$, in practice $O(\log{n}) \approx O(\sqrt{n})$ for $n \le 10^6$ because of the physical overhead of the conditional, recurrence-tree execution that lower-bounds an $O(\log{n})$ process.

Optimization

  1. Partition an array of $n$ elements into blocks of size $\sqrt{n}$ and maintain an auxiliary data structure that summarizes the elements inside each block.
  2. Execute an $O(\sqrt{n})$ process by sequentially traversing the auxiliary data structure (sketched below).
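
A minimal sketch of this decomposition for range-sum queries, assuming a hypothetical `SqrtSum` class; a sum is used as the per-block summary, but minimum, maximum, or frequency summaries follow the same pattern.

```java
// Sketch: sqrt decomposition with per-block sums.
// Build: O(n). Point update: O(1). Range-sum query [l, r]: O(sqrt(n)).
class SqrtSum {
    private final int[] values;
    private final long[] blockSums;
    private final int blockSize;

    SqrtSum(int[] values) {
        this.values = values.clone();
        this.blockSize = Math.max(1, (int) Math.sqrt(values.length));
        this.blockSums = new long[(values.length + blockSize - 1) / blockSize];
        for (int i = 0; i < values.length; i++) {
            blockSums[i / blockSize] += values[i];
        }
    }

    void update(int index, int value) {
        blockSums[index / blockSize] += value - values[index];
        values[index] = value;
    }

    long query(int l, int r) { // inclusive range [l, r]
        long sum = 0;
        int i = l;
        while (i <= r) {
            if (i % blockSize == 0 && i + blockSize - 1 <= r) {
                sum += blockSums[i / blockSize]; // whole block lies inside [l, r]
                i += blockSize;
            } else {
                sum += values[i++];              // partial block: add element by element
            }
        }
        return sum;
    }
}
```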

Doubly Linked Lists

Use

  • Constant Splicing.
  • Dynamic Size.

Operations

  • Insert(x,l) ($O(1)$): Insert item x into doubly linked list l.
  • Delete(x,l) ($O(1)$): Remove item x from doubly linked list l.
  • Search(x,l) ($O(n)$): Return a node with item x if one exists in doubly linked list l.
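
A minimal sketch of these operations, assuming a hypothetical `IntList` class: a circular doubly linked list with a sentinel node, so Insert and Delete are $O(1)$ splices with no empty-list special cases.

```java
// Sketch: circular doubly linked list with a sentinel node, so insert and
// delete never special-case an empty list. Insert/Delete: O(1); Search: O(n).
class IntList {
    static class Node {
        int item;
        Node prev, next;
        Node(int item) { this.item = item; }
    }

    private final Node sentinel = new Node(0);

    IntList() { sentinel.prev = sentinel; sentinel.next = sentinel; }

    void insertFront(int item) {          // splice a new node after the sentinel
        Node node = new Node(item);
        node.prev = sentinel;
        node.next = sentinel.next;
        sentinel.next.prev = node;
        sentinel.next = node;
    }

    void delete(Node node) {              // unlink a node in O(1)
        node.prev.next = node.next;
        node.next.prev = node.prev;
    }

    Node search(int item) {               // linear scan; null if absent
        for (Node n = sentinel.next; n != sentinel; n = n.next) {
            if (n.item == item) return n;
        }
        return null;
    }
}
```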

See Also

Advice

  • List problems often have simple brute-force solutions that have $O(n)$ space complexity, but subtler solutions use the list itself to reduce space complexity to $O(1)$.
  • Consider using a sentinel to reduce edge cases such as checking empty lists.
  • Algorithms operating on lists often benefit from using two iterators, one ahead of the other, or one advancing quicker than the other (see the sketch after this list).
  • A $k$-ary tree can be represented as a left-child right-sibling binary tree, a doubly chained tree.
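
A minimal sketch of the two-iterator advice above, assuming a hypothetical singly linked `Node` class: a slow pointer and a fast pointer that advances twice as quickly locate the middle of a list in one pass.

```java
// Sketch: two iterators, one advancing twice as fast as the other.
// When fast reaches the end, slow is at the middle node. Time: O(n); space: O(1).
class MiddleOfList {
    static class Node {
        int item;
        Node next;
        Node(int item, Node next) { this.item = item; this.next = next; }
    }

    static Node middle(Node head) {
        Node slow = head;
        Node fast = head;
        while (fast != null && fast.next != null) {
            slow = slow.next;
            fast = fast.next.next;
        }
        return slow; // for even lengths, this is the second of the two middle nodes
    }
}
```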

Stacks (LIFO)

Use

  • Position-Based Retrieval.
  • Order Does Not Matter.
  • Nested (Recursive) Structure.

Operations

  • Push(x,s) ($O(1)$): Insert item x at the top of stack s.
  • Pop(s) ($O(1)$): Return and remove the top item of stack s.

See Also

Advice

  • Stacks are ideal for implementing non-trivial parsing (see the sketch after this list).
  • A Push is ideal for transitioning from a parent context to a child context.
  • A Pop is ideal for transitioning from a child context to a parent context.
  • When performing a depth-first search using a stack, the search may never halt on an infinite-depth search space. It finds some path to the goal, not necessarily the shortest, but it only requires polynomial space.
  • To prevent infinite recursion using a stack, consider depth-limiting or iterative deepening the recursion.
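
A minimal sketch of the parsing advice above, assuming a hypothetical `BalancedBrackets` class built on `java.util.ArrayDeque`: a Push enters a nested (child) context and a Pop returns to the enclosing (parent) context.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Sketch: stack-based bracket matching. Push on an opener (enter a child
// context); pop on a closer (return to the parent context). Time: O(n).
class BalancedBrackets {
    static boolean isBalanced(String s) {
        Deque<Character> stack = new ArrayDeque<>();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '(': case '[': case '{':
                    stack.push(c);
                    break;
                case ')':
                    if (stack.isEmpty() || stack.pop() != '(') return false;
                    break;
                case ']':
                    if (stack.isEmpty() || stack.pop() != '[') return false;
                    break;
                case '}':
                    if (stack.isEmpty() || stack.pop() != '{') return false;
                    break;
                default:
                    break; // ignore non-bracket characters
            }
        }
        return stack.isEmpty();
    }
}
```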

Queues (FIFO)

Use

  • Position-Based Retrieval.
  • Order Does Matter.
  • Linear Structure.

Operations

  • Enqueue(x,q) ($O(1)$): Insert item x at the back of queue q.
  • Dequeue(q) ($O(1)$): Return and remove the front item from queue q.

See Also

Advice

  • Queues are ideal for implementing non-trivial traversals (see the sketch after this list).
  • Enqueue and Dequeue are ideal for circularly transitioning through sibling contexts.
  • When performing a breadth-first search using a queue, the search halts even on an infinite-depth search space, provided the branching factor is finite. It finds the shortest path to the goal, but it has exponential space complexity.
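
A minimal sketch of a queue-based breadth-first traversal, assuming an adjacency-list graph and a hypothetical `shortestPathLength` helper; it returns the shortest edge count from `source` to `target`, or -1 if the target is unreachable.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;

// Sketch: breadth-first search with a FIFO queue. Nodes are visited level by
// level, so the first time target is dequeued, its distance is minimal.
// Time: O(V + E); space: O(V).
class BreadthFirstSearch {
    static int shortestPathLength(List<List<Integer>> adjacency, int source, int target) {
        int[] distance = new int[adjacency.size()];
        Arrays.fill(distance, -1);
        Queue<Integer> queue = new ArrayDeque<>();
        distance[source] = 0;
        queue.add(source);                 // enqueue the source
        while (!queue.isEmpty()) {
            int node = queue.remove();     // dequeue the oldest frontier node
            if (node == target) return distance[node];
            for (int neighbour : adjacency.get(node)) {
                if (distance[neighbour] == -1) {
                    distance[neighbour] = distance[node] + 1;
                    queue.add(neighbour);
                }
            }
        }
        return -1;                         // target unreachable
    }
}
```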

Dictionaries

Use

  • Content-Based Retrieval.
  • Hash-Based: Unordered with $O(1)$.
    • Caching.
    • Lookup/Symbol Table.
  • Tree-Based: Ordered with $O(\log(n))$.
    • Sorted Traversal.
    • Ceiling Query: Find an item with the least key greater than or equal to the given key.
    • Floor Query: Find an item with the greatest key less than or equal to the given key.
    • Minimum Query: Find an item with the smallest key.
    • Maximum Query: Find an item with the largest key.
    • Higher Neighbour Query: Find an item with the least key strictly greater than the given key.
    • Lower Neighbour Query: Find an item with the greatest key strictly less than the given key.

Operations

  • Insert(x,d) (Hash-Based: $O(1)$; Tree-Based: $O(\log(n))$): Insert item x into dictionary d.
  • Delete(x,d) (Hash-Based: $O(1)$; Tree-Based: $O(\log(n))$): Remove item x (or the item pointed to by x) from dictionary d.
  • Search(k,d) (Hash-Based: $O(1)$; Tree-Based: $O(\log(n))$): Return an item with key k if one exists in dictionary d.
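
A minimal sketch contrasting the two flavours with `java.util.HashMap` and `java.util.TreeMap`; the `TreeMap` calls correspond to the ceiling, floor, minimum, maximum, higher-neighbour, and lower-neighbour queries listed under Use.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Sketch: hash-based dictionary for O(1) expected lookups; tree-based dictionary
// for O(log n) lookups plus ordered (navigation) queries.
class DictionaryDemo {
    public static void main(String[] args) {
        Map<String, Integer> hash = new HashMap<>();     // unordered, O(1) expected
        hash.put("alpha", 1);
        hash.put("beta", 2);
        System.out.println(hash.get("alpha"));           // 1

        TreeMap<Integer, String> tree = new TreeMap<>(); // ordered, O(log n)
        tree.put(10, "ten");
        tree.put(20, "twenty");
        tree.put(30, "thirty");
        System.out.println(tree.ceilingKey(15));  // 20: least key >= 15
        System.out.println(tree.floorKey(15));    // 10: greatest key <= 15
        System.out.println(tree.firstKey());      // 10: minimum key
        System.out.println(tree.lastKey());       // 30: maximum key
        System.out.println(tree.higherKey(20));   // 30: least key > 20
        System.out.println(tree.lowerKey(20));    // 10: greatest key < 20
    }
}
```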

See Also

Advice

  • A hash-based dictionary is ideal for partitioning using prefixes, suffixes, fingerprints, and signatures.
  • A tree-based dictionary is ideal for classifying using comparators as decision processes.
  • If you need a multi-map, consider your access patterns before having the values of your dictionary be contained in a list, a set, or a tree.
  • Some problems need a combination of a hash-based dictionary and a tree-based dictionary. Specifically, a hash-based dictionary can index into the entries of a tree-based dictionary.
  • Consider using a sentinel to reduce edge cases.
  • When representing ranges in a tree-based dictionary, consider having each entry be a partition, and have negative and positive infinities as sentinels.
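
A minimal sketch of the last bullet above, assuming hypothetical letter-grade cutoffs: each `TreeMap` entry labels the range starting at its key, and an `Integer.MIN_VALUE` key acts as the negative-infinity sentinel so `floorEntry` never returns null.

```java
import java.util.TreeMap;

// Sketch: each entry (k, v) labels the half-open range starting at k and ending
// just before the next key. A MIN_VALUE sentinel guarantees floorEntry() always
// finds an entry, removing the below-lowest-cutoff edge case.
class GradeRanges {
    public static void main(String[] args) {
        TreeMap<Integer, String> cutoffs = new TreeMap<>();
        cutoffs.put(Integer.MIN_VALUE, "F"); // sentinel: covers everything below 60
        cutoffs.put(60, "D");
        cutoffs.put(70, "C");
        cutoffs.put(80, "B");
        cutoffs.put(90, "A");

        System.out.println(cutoffs.floorEntry(85).getValue()); // B
        System.out.println(cutoffs.floorEntry(42).getValue()); // F (sentinel range)
    }
}
```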

Priority Queues

Use

  • Constant Minimum/Maximum Query.
  • $O(n \log(n))$ In-Place Heap Sort.

Operations

  • Insert(x,p) ($O(\log(n))$): Insert item x into priority queue p.
  • Maximum(p) ($O(1)$): Return the item with the largest key in priority queue p.
  • ExtractMax(p) ($O(\log(n))$): Return and remove the item with the largest key in priority queue p.
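
A minimal sketch of these operations with `java.util.PriorityQueue`; Java's priority queue is a min-heap by default, so a reversed comparator is assumed here to expose Maximum and ExtractMax.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

// Sketch: max-oriented priority queue. Insert (offer): O(log n);
// Maximum (peek): O(1); ExtractMax (poll): O(log n).
class PriorityQueueDemo {
    public static void main(String[] args) {
        PriorityQueue<Integer> maxHeap = new PriorityQueue<>(Comparator.reverseOrder());
        maxHeap.offer(3);
        maxHeap.offer(7);
        maxHeap.offer(5);
        System.out.println(maxHeap.peek()); // 7: largest key, not removed
        System.out.println(maxHeap.poll()); // 7: largest key, removed
        System.out.println(maxHeap.poll()); // 5
    }
}
```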

See Also

Advice

  • N/A.

Sets

Use

  • Filtering Duplicates.
  • Testing Membership.
  • Hash-Based: Unordered with $O(1)$.
  • Tree-Based: Ordered with $O(\log(n))$.

Operations

  • Insert(x,S) (Hash-Based: $O(1)$; Tree-Based: $O(\log(n))$): Insert element x into set S.
  • Delete(x,S) (Hash-Based: $O(1)$; Tree-Based: $O(\log(n))$): Delete element x from set S.
  • Member(x,S) (Hash-Based: $O(1)$; Tree-Based: $O(\log(n))$): Is item x an element of set S?
  • Union(A,B): Construct the set of all elements in set A or in set B.
  • Intersection(A,B): Construct the set of all elements in both set A and set B.
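
A minimal sketch of these operations with `java.util.HashSet` and `java.util.TreeSet`; union and intersection are expressed with the collection methods `addAll` and `retainAll` applied to copies of the inputs.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

// Sketch: hash-based set (unordered, O(1) expected) versus tree-based set
// (ordered, O(log n)); union and intersection are built on copies of the inputs.
class SetDemo {
    public static void main(String[] args) {
        Set<Integer> a = new HashSet<>(Set.of(1, 2, 3));
        Set<Integer> b = new TreeSet<>(Set.of(2, 3, 4)); // iterates in sorted order

        System.out.println(a.contains(2));  // true: Member(2, A)

        Set<Integer> union = new HashSet<>(a);
        union.addAll(b);                    // {1, 2, 3, 4}

        Set<Integer> intersection = new HashSet<>(a);
        intersection.retainAll(b);          // {2, 3}

        System.out.println(union);
        System.out.println(intersection);
    }
}
```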

See Also

Advice

  • N/A.

Hash Codes

Use

Operations

  • Hash(x) ($O(1)$): Map an arbitrary element x to a fixed-size value.
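
A minimal sketch of a hash function in Java, assuming a hypothetical `Point` value class: `equals` and `hashCode` are overridden together (via `java.util.Objects.hash`) so equal points hash to the same fixed-size value and can serve as keys in hash-based dictionaries and sets.

```java
import java.util.Objects;

// Sketch: equals() and hashCode() overridden together so equal Points map to
// the same fixed-size hash value, as required by hash-based containers.
final class Point {
    private final int x;
    private final int y;

    Point(int x, int y) { this.x = x; this.y = y; }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof Point)) return false;
        Point p = (Point) other;
        return x == p.x && y == p.y;
    }

    @Override
    public int hashCode() {
        return Objects.hash(x, y); // O(1) mapping of the fields to an int
    }
}
```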

See Also

Advice

  • N/A.