ECE 457A - Cooperative and Adaptive Algorithms

Introduction

Intelligence

  • Intelligence: The ability to acquire and apply knowledge and skills.
  • Artificial Intelligence: The science of creating intelligent machines, including intelligent computer programs.

Rational Thinking & Rational Behavior

  • Rational System: A logical system that optimizes a given set of criteria.
  • Rational Thinking: A logical system that achieves goals via logical inferencing.
  • Rational Behavior: A logical system that perceives its environment and acts to achieve goals according to some set of beliefs.

Agents

  • Agent: Senses its environment and acts on collected information.
  • Rational Agent: An agent that acts in a way that is expected to maximize performance on the basis of perceptual history and built-in knowledge.

Types of Agents

  • Simple Reflex Agents: Follow a lookup-table approach; need a fully observable environment.
  • Model-Based Reflex Agents: Add state information to handle partially observable environments.
  • Goal-Based Agents: Add concept of goals to help choose actions.
  • Utility-Based Agents: Add utility to decide "good" or "bad" when faced with conflicting goals.
  • Learning Agents: Add ability to learn from experience to improve performance.

Environments

  • Fully vs. Partially Observable:
    • Fully Observable: Sensors can detect all aspects relevant to the choice of an action.
    • Partially Observable: Missing Information or Inaccurate Sensors.
  • Deterministic vs. Stochastic:
    • Deterministic: The next state is completely determined by the current state and the action executed by the agent.
    • Stochastic: Randomness/Noise.
  • Episodic vs. Sequential:
    • Episodic: The choice of an action in each episode does not depend on previous episodes.
    • Sequential: An agent is required to "think ahead".
  • Static vs. Dynamic: N/A.
  • Discrete vs. Continuous: N/A.
  • Single Agent vs. Multi Agent: N/A.

Cooperative and Adaptive Algorithms

  • Cooperative: Solve Joint Problems.
  • Adaptive: Change Behavior While Running.

Search Problem Formulation

Characteristics of Search Problems

  • Large, Non-Polynomial Search Space Size
  • Large, Non-Polynomial Constraints Size

Well-Structured vs. Ill-Structured Problems

  • Well-Structured Problems: Problems for which the existing state and desired state are clearly identified, and the methods to reach the desired state are fairly obvious.
  • Ill-Structured Problems: Problems for which the existing state and the desired state are unclear and, hence, methods of reaching the desired state cannot be readily found.
    1. Start & Improve Guess
    2. Search Alternatives
    3. Forward Search from Problem to Answer
    4. Backward Search from Goal to Problem Situation

Optimization Problems

  • Optimization Problem: Finding the best solutions from a set of solutions subject to a set of constraints.

Types of Optimization Algorithms

  • Exact Algorithms:
    • Find Optimal Solution
    • High Computational Cost
  • Approximate Algorithms:
    • Find Near-Optimal Solution
    • Low Computational Cost

Approximate Algorithms

  • Heuristics: A solution strategy or set of rules, developed by trial and error, that produces acceptable (optimal or sub-optimal) solutions to complex problems in a reasonably practical time.
  • Constructive Methods: A solution is constructed by iteratively introducing a new component.
  • Local Search Methods: An initial solution is improved by iteratively applying actions.

Goal and Problem Formulation

  • Requirements for Search: Goal Formulation + Problem Formulation
  • Closed World Assumption: All necessary information about a problem domain is available in each percept so that each state is a complete description of the world.

Problem Formulation Template

  • State Space: Complete/Partial Configuration of Problem
    • Required: Each State = UNIQUE
  • Initial State: Beginning Search State
  • Goal State: Ending Search State
  • Action Set: Set of Possible State Transitions
  • Cost: Comparison Function between Solutions

Extra Terminologies

  • State: Any Possible Agent/Problem Configuration
  • Transition Model: Action Description

Graph Search Algorithms

Search Tree Terminology

  • Node: Search Problem State
  • Edge: Search Problem Action
  • Fringe: Frontier/Leaves of Search Tree
  • Branching Factor ($b$): The maximum number of child nodes extending from a parent node.
  • Maximum Depth ($m$): The number of edges in the path from the root node to the deepest leaf node.
  • Optimal Goal Depth ($d$): The number of edges in the shortest path from the root node to an optimal goal node.

Properties of Search Algorithms

  • Completeness: Guarantee Find A Goal Node
  • Optimality: Guarantee Find Best Goal Node
  • Time Complexity: # of Nodes Generated
  • Space Complexity: # of Nodes Stored
  • Fringe = Queue-Like Data Structure
  1. Choose Node
  2. Test Node
  3. Expand Node
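
A minimal sketch of this loop, assuming hypothetical `is_goal` and `successors` callables supplied by the problem formulation; the queue discipline of the fringe is what distinguishes one strategy from another:

```python
from collections import deque

def tree_search(start, is_goal, successors, lifo=False):
    """Generic choose/test/expand loop. A FIFO fringe gives breadth-first
    behavior; a LIFO fringe gives depth-first behavior."""
    fringe = deque([start])
    while fringe:
        node = fringe.pop() if lifo else fringe.popleft()  # 1. choose node
        if is_goal(node):                                  # 2. test node
            return node
        fringe.extend(successors(node))                    # 3. expand node
    return None  # fringe exhausted without finding a goal
```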

Search Strategies

  • Uninformed Strategies: No knowledge of the direction of goal nodes.
    • Breadth-First
    • Depth-First
    • Depth-Limited
    • Uniform-Cost
    • Depth-First Iterative Deepening
    • Bidirectional
  • Informed Strategies: Domain knowledge of the direction of goal nodes.
    • Hill Climbing
    • Best-First
    • Greedy Search
    • Beam Search
    • A
    • A*

Uninformed Search Strategies

Breadth-First Search

  • Expand the shallowest unexpanded node, storing the fringe to be expanded in a FIFO queue.
  • Completeness:
    • Yes - If $b$ is Finite
  • Optimality:
    • Yes - If Cost = Depth
  • Time Complexity: $O(b^{d + 1}) \approx O(b^d)$
  • Space Complexity: $O(b^{d + 1}) \approx O(b^d)$

Uniform-Cost Search

  • Expand the lowest-cost unexpanded node, storing the fringe to be expanded in a minimum priority queue.
  • Required: No Zero/Negative-Cost Edges
  • Let $C^*$ be the path cost to the goal.
  • Let $\epsilon$ be the minimum cost of all other actions.
  • Completeness:
    • Yes - If $b$ is Finite
  • Optimality: Yes
  • Time Complexity: $O(b^{\frac{C^*}{\epsilon} + 1}) \approx O(b^d)$
  • Space Complexity: $O(b^{\frac{C^*}{\epsilon} + 1}) \approx O(b^d)$

Depth-First Search

  • Expand the deepest unexpanded node, storing the fringe to be expanded in a LIFO stack.
  • Completeness:
    • No - If Search Space with Infinite-Depth/Loops
  • Optimality: No
  • Time Complexity: $O(b^m)$
  • Space Complexity: $O(bm)$

Depth-Limited Search

  • Execute DFS with a maximum search depth as a restriction.
    • Prevents Infinite-Depth Problem
    • Prevents Loops Problem
  • Let $l$ be the maximum search depth.
  • Completeness:
    • Yes - If Solution's Depth $d \le l$
  • Optimality: No
  • Time Complexity: $O(b^l)$
  • Space Complexity: $O(bl)$

Depth-First Iterative Deepening

  • Iteratively execute DLS with an increasing maximum search depth $l$ until a solution is found.
  • Completeness:
    • Yes - If $b$ is Finite
  • Optimality:
    • Yes - If Cost = Depth
  • Time Complexity: $O(b^d)$
  • Space Complexity: $O(bd)$

Breadth-First vs. Depth-First Strategies

Breadth-First Strategies

  • High Memory Requirement
  • Never Stuck on Infinite Depths
  • Find Shortest Path to Goal

Depth-First Strategies

  • Low Memory Requirement
  • Stuck on Infinite Depths
  • Find Any Path to Goal

Avoiding Repeated States

  • In increasing order of computational cost:
  1. Do not return to the state you just came from.
  2. Do not create paths with cycles in them.
  3. Do not generate any state that was ever created before.

Informed Search Strategies

Overview

  • Apply domain knowledge in a problem to search the "most promising" branches first.
  • Potentially, find solutions faster or cheaper than uninformed search algorithms.

Heuristics

  • A heuristic function $h(n)$ can be used to estimate the "goodness" of node $n$.
    • $\forall n, h(n) \ge 0$
    • $h(n) = 0$ $\implies$ $n$ is a goal node.
    • $h(n) = \infty$ $\implies$ $n$ is a dead end that does not lead to a goal.
  • Admissible/Optimistic: If a heuristic function never overestimates the cost of reaching the goal.

Strong vs. Weak Methods

  • Strong Methods: Specific Approach to Some Problems
  • Weak Methods: General Approach to Many Problems

Examples of Weak Methods

  • Means-End Analysis: A strategy where a representation is formed for the current and goal state, and actions are analyzed that shrink the difference between the two.
  • Space Splitting: A strategy where possible solutions to a problem are listed, and then classes of these solutions are ruled out to shrink the search space.
  • Subgoaling: A strategy where a large problem is split into independent smaller ones.

Best-First Search

  • ~Uniform-Cost Search with a priority queue ordered by an evaluation function $f(n)$.
    • Expands the node that minimizes $f(n)$.
    • Greedy Search: the special case where $f(n) = h(n)$.
  • Completeness:
    • No - Stuck in Loops
  • Optimality:
    • No
  • Time Complexity: $O(b^m)$
  • Space Complexity: $O(b^m)$

Beam Search

  • ~Breadth-First Search + Reduced Memory Requirements
    • Expands only the best $\beta$ (beam width) nodes per level.
  • Admissible:
    • No
  • Completeness:
    • No
  • Optimality:
    • No
  • Time Complexity: $O(\beta b)$
  • Space Complexity: $O(\beta b)$

A Search/A* Search

  • A Search: Best-First Search with $f(n) = g(n) + h(n)$
    • $g(n)$ is the cost from the start to $n$.
    • $h(n)$ is the cost from $n$ to the goal.
  • A* Search: Constraint $h(n) \le h^{\ast}(n)$
    • $h^{\ast}(n)$ is the actual minimal path cost from $n$ to the goal.

Properties of A Search

  • Admissible:
    • No
  • Completeness:
    • No - If $h(n) \to \infty$
  • Optimality:
    • No

Properties of A* Search

  • Admissible:
    • Yes - If $h(n) \le h^{\ast}(n)$
  • Completeness:
    • Yes - If $b$ is finite and only fixed positive costs.
  • Optimality:
    • Yes
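
A minimal A* sketch with the fringe kept in a binary heap; it assumes hashable states, a `successors` callable yielding `(state, step_cost)` pairs, and an admissible heuristic `h`:

```python
import heapq
import itertools

def a_star(start, is_goal, successors, h):
    """A* search: expand the open node minimizing f(n) = g(n) + h(n)."""
    counter = itertools.count()          # tie-breaker so states never compare
    open_heap = [(h(start), next(counter), 0.0, start)]
    best_g = {start: 0.0}                # cheapest g found per state
    while open_heap:
        f, _, g, state = heapq.heappop(open_heap)
        if is_goal(state):
            return state, g
        for nxt, cost in successors(state):
            g_next = g + cost
            if g_next < best_g.get(nxt, float("inf")):
                best_g[nxt] = g_next
                heapq.heappush(open_heap,
                               (g_next + h(nxt), next(counter), g_next, nxt))
    return None, float("inf")            # no goal reachable
```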

Hill Climbing

  • Improvement of Depth-First Search
  • ~Beam Search with $\beta = 1$
  • ~Greedy Search with No Backtracking
  • Not Complete at Local Minima, Plateaus, Ridges
  1. Start with an arbitrary solution.
  2. Attempt to improve the solution by changing a single element at a time.
  3. Sort the successors of a node according to their heuristic values, then add them to the list to be expanded.
  4. Make changes until no further improvements can be found.
  • If there is a successor $s$ for node $n$ such that:
    • $h(s) < h(n)$ and
    • $h(s) \le h(t)$ for all successors $t$ of $n$.
  • True $\implies$ Advance from $n$ to $s$.
  • False $\implies$ Halt at $n$.
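
A sketch of the steepest-descent rule above, assuming `h` is to be minimized and a hypothetical `successors` callable returns the candidate states:

```python
def hill_climb(start, successors, h):
    """Advance to the best successor s only while h(s) < h(current);
    halts at local minima, plateaus, and ridges (no backtracking)."""
    current = start
    while True:
        succs = list(successors(current))
        if not succs:
            return current
        best = min(succs, key=h)
        if h(best) >= h(current):   # no strictly better neighbor: halt
            return current
        current = best              # advance from n to s
```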

Heuristics

  • Perfect Heuristic: If $h(n) = h^{\ast}(n)$, then only nodes on the optimal solution are expanded.
  • Null Heuristic: If $h(n) = 0$, then A* behaves like uniform cost search.
  • Better Heuristic: If $h_{1}(n) < h_{2}(n) \le h^{\ast}(n)$ for all $n$, then $h_{2}$ is a better heuristic than $h_{1}$.

Game Playing

Overview

  • Games involve playing against an opponent, where the search problem involves finding a good move, waiting for the opponent's response, and then repeating.
  • Time is typically limited in each search.

Problem Formulation of Games

  • Initial State: Initial Position + Whose Move It Is
  • Operators: Legal Player Moves
  • Goal (Terminal Test): Is Game Over?
  • Utility (Payoff): Measures Outcome/Desirability

Types of Games

  • Perfect Information: Each player has complete information on the opponent's state and available choices.
  • Imperfect Information: Each player does not have complete information on the opponent's state and available choices.

Max Min Strategy

  • With perfect information and two players, a game tree can be expanded to describe all possible moves of the player and the opponent in the game.
  • Zero Sum Games: Player Win $\implies$ Opponent Loss
  • Minimax Principle: Minimize the maximum losses that occur.

Minimax Algorithm

  • Important Note: Bottom-Up
  1. Generate the game tree labeling each level with alternating $\text{MAX}(player)$ and $\text{MIN}(opponent)$ labels.
  2. Apply the utility function to each terminal state (leaf) to get its minimax value.
  3. Propagate these minimax values upward to determine the utility of the nodes one level higher in the search tree.
    • For a $\text{MAX}(player)$ level, select the maximum minimax value of its successors.
    • For a $\text{MIN}(opponent)$ level, select the minimum minimax value of its successors.
  4. From the root node, select the move which leads to the highest minimax value.
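
A recursive sketch of the procedure, assuming hypothetical `children` (returning the list of successor states) and `utility` callables; values propagate bottom-up from the leaves:

```python
def minimax(state, depth, is_max, children, utility):
    """Bottom-up minimax value of a game state; `utility` scores
    terminal or depth-cutoff states."""
    kids = children(state)
    if depth == 0 or not kids:      # terminal test or depth cutoff
        return utility(state)
    values = [minimax(k, depth - 1, not is_max, children, utility)
              for k in kids]
    return max(values) if is_max else min(values)
```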

Limited Depth

  • For complicated games, a limited depth of the game tree should be explored.
  • An evaluation function $f(n)$ is used to measure the "goodness" of a game state.

Properties of Minimax Algorithm

  • Completeness:
    • Yes - If game tree is finite
  • Optimality:
    • Yes - If opponent is optimal
  • Time Complexity: $O(b^{d})$
  • Space Complexity: $O(bd)$

$\alpha$-$\beta$ Pruning

  • Branch and Bound: Reduce # of Generated/Evaluated Nodes
    • Avoid processing subtrees that cannot affect the result.
  • Alpha ($\alpha$): The best value for $\text{MAX}$ seen so far.
    • Used in $\text{MIN}$ nodes
    • Assigned in $\text{MAX}$ nodes
    • Never Decreases
  • Beta ($\beta$): The best value for $\text{MIN}$ seen so far.
    • Used in $\text{MAX}$ nodes
    • Assigned in $\text{MIN}$ nodes
    • Never Increases
  • Alpha Cutoff (Lower Bound): When the value of a $\text{MIN}$ position is less than or equal to the alpha-value of its parent, stop generating further successors.
  • Beta Cutoff (Upper Bound): When the value of a $\text{MAX}$ position is greater than or equal to the beta-value of its parent, stop generating further successors.

$\alpha$-$\beta$ Minimax Algorithm Revisions

  1. Search is discontinued below any $\text{MIN}$ node with $\beta \le \alpha$ of one of its ancestors.
    • Set the final value of the node to this $\beta$ value.
  2. Search is discontinued below any $\text{MAX}$ node with $\alpha \ge \beta$ of one of its ancestors.
    • Set the final value of the node to this $\alpha$ value.
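
The same recursion extended with $\alpha$-$\beta$ pruning, under the same assumed `children` and `utility` callables as the minimax sketch:

```python
def alphabeta(state, depth, alpha, beta, is_max, children, utility):
    """Minimax with alpha-beta pruning: alpha never decreases (set at
    MAX nodes), beta never increases (set at MIN nodes)."""
    kids = children(state)
    if depth == 0 or not kids:
        return utility(state)
    if is_max:
        value = float("-inf")
        for k in kids:
            value = max(value, alphabeta(k, depth - 1, alpha, beta,
                                         False, children, utility))
            alpha = max(alpha, value)
            if alpha >= beta:       # beta cutoff
                break
        return value
    value = float("inf")
    for k in kids:
        value = min(value, alphabeta(k, depth - 1, alpha, beta,
                                     True, children, utility))
        beta = min(beta, value)
        if beta <= alpha:           # alpha cutoff
            break
    return value
```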

Metaheuristics

Overview

Overview of Metaheuristic Methods

  • Metaheuristics: High-level heuristics designed to select other heuristics to solve a problem by exploring and exploiting its search space.
    • Approximate Solutions
    • Nondeterministic Solutions

Properties

  • Mechanisms to avoid getting trapped in confined areas of the search space.
  • Not problem-specific, though they may use domain-specific knowledge in the form of heuristics controlled by the upper-level strategy.
  • Search history to guide the search.
  • Hybrid search models where the search identifies neighborhoods where a goal may lie, and then the search is intensified in that area.

Population-Based Methods

  • Population-based methods are metaheuristic approaches that apply multiple agents to a search space and can handle multiple simultaneous solutions.

Trajectory Methods

  • Trajectory methods are metaheuristic variants of local search that apply memory structure to avoid getting stuck at local minima, and implement an explorative strategy that tries to avoid revisiting nodes.

Simulated Annealing

Physical Annealing Analogy

  • Physical annealing involves heating a substance (e.g. a metal) and then letting it cool to increase its ductility and reduce hardness.
  • The goal is to make the molecules in a cooled substance arrange themselves in a low-energy structure; the properties of this structure are influenced by the temperatures reached and the rate of cooling.
  • A sequence of cooling times and temperatures is referred to as an annealing or cooling schedule.

Simulated Annealing Algorithm

  • Let $s = s_{0}$ be a current solution initialized to $s_{0}$.
  • Let $t = t_{0}$ be a current temperature initialized to $t_{0}$.
  • Let $\alpha$ be a temperature reduction function.
  1. Repeat,
    1. Repeat,
      1. Select a solution $s_{i}$ from the neighborhood $N(s)$.
      2. Calculate the change in cost $\Delta C$.
      3. If $\Delta C < 0$, then accept the new solution: $s = s_{i}$.
      4. Else, generate a random number $x \in (0, 1)$.
      5. If $x < \exp(\frac{-\Delta C}{t})$, then accept the new solution: $s = s_{i}$.
    2. Until the maximum number of iterations for $t$ is reached.
    3. Decrease $t$ using $\alpha$.
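
A direct transcription of the loop into Python, assuming a hypothetical `neighbor` sampler for $N(s)$ and a `cost` function to be minimized; the geometric rule stands in for the decrement function $\alpha$:

```python
import math
import random

def simulated_annealing(s0, cost, neighbor, t0=100.0, alpha=0.95,
                        iters_per_t=100, t_final=1e-3):
    """Accept all improving moves; accept worsening moves with
    probability exp(-delta / t), cooling geometrically."""
    s, t = s0, t0
    best = s0
    while t > t_final:
        for _ in range(iters_per_t):
            s_i = neighbor(s)                     # pick s_i from N(s)
            delta = cost(s_i) - cost(s)           # change in cost
            if delta < 0 or random.random() < math.exp(-delta / t):
                s = s_i                           # accept the new solution
                if cost(s) < cost(best):
                    best = s
        t *= alpha                                # geometric decrement rule
    return best
```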

Strategy

Simulated Annealing Strategy

  • Simulated annealing always accepts better solutions.
  • Simulated annealing randomly accepts worse solutions.
  • At higher temperatures, explore parameter space.
  • At lower temperatures, restrict exploration.
$$\text{ Low Temperature } \wedge \text{ High Change in Cost } \implies \text{ Low Acceptance Probability }$$

Annealing Schedule

  • Annealing Schedule: Adjusts Temperature
    • Initial Temperature
    • Final Temperature
    • Temperature Decrement Rule
    • Temperature Iterations

Initial Temperature

  • The initial temperature should be high enough to allow exploration to any part of the search space.
  • If the initial temperature is too hot, simulated annealing would behave too randomly.
  • The maximum change of a cost function should be considered when setting the initial temperature.
  • General Rule: Set the initial temperature to accept around $60\%$ of worse solutions.

Final Temperature

  • The final temperature should be quite low but does not necessarily have to reach zero.
  • A search using simulated annealing can be stopped once no better moves are being found and no worse moves are being accepted.

Temperature Decrement Rule

  • Linear: $t = t - \alpha$
  • Geometric: $t = t \times \alpha$
  • Slow Decrease: $t = \frac{t}{1 + \beta t}$

Temperature Iterations

  • Enough iterations should be allowed at every temperature for the system to be stable at that temperature.
  • If the search space is very large, a large number of iterations may be required.
  • If the slow decrease rule is used, one iteration per temperature should be used.

Convergence

  • Simulated annealing is guaranteed to eventually converge to a solution at a constant temperature, assuming some sequence of moves leads to the goal state.
  • When temperature is not constant, convergence can still be guaranteed but only under conditions that result in very slow temperature reduction and an exponential increase in the number of iterations at each temperature.

Advantages and Disadvantages

Advantages

  • Easy
  • Widely Applicable

Disadvantages

  • Time Complexity
  • Many Tunable Parameters

Adaptation

  • Adaptation refers to adapting the critical parameters of the simulated annealing algorithm.

Initial Temperature

  • Finding the right temperature is very problem-specific, and different search algorithms can be applied in finding this temperature.

Cooling Schedule

  • Some cooling schedules require that only the cooling rate $\alpha$ be specified; the remaining parameters are determined automatically, using a linear random combination of previously accepted states and parameters to estimate new steps and parameters.

Probability of Acceptance

  • Some attempted adaptations concerning the probability of acceptance include using a lookup table for the relevant calculations (to decrease computation time) or using a different, non-exponential probability formula.

Cost Function

  • Cost functions that return similar values for many different states tend to not lead to effective search.
  • As an alternative, a cost function can have a penalty term associated with certain types of states, and weighting of these penalty terms can vary dynamically.

Cooperation

  • Cooperative simulated annealing involves multiple concurrent runs of a simulated annealing search algorithm on a search space.
  • Potential solutions are produced by somehow combining the value of a run with the value of a random previous run of the algorithm.
  • This exchange of information from other solutions is known as cooperative transition, and is a concept borrowed from genetic algorithms.

Tabu Search

Overview

  • Tabu Search: A general trajectory-based metaheuristic strategy for controlling inner heuristics.
    • Combination of Local Search Strategy + Search Experience Model $\implies$ Escape Local Minima + Explorative Strategy

Local Search Strategy

  1. Start with an initial feasible solution.
  2. Repeat,
    1. Generate a neighboring solution by applying a series of local modifications.
    2. If the new solution is better, then replace the current solution.

Local Search Strategy Challenges

  1. It can be costly to consider all possible local modifications.
  2. It can get stuck in a local optimum.

Basic Ideas

  • Penalize moves that take a solution back to a previously visited (tabu) state.
  • Accept non-improving solutions in order to escape from local optima and eventually find a better solution.

Use of Memory

  • Short-Term Memory (Recency of Occurrence): Prevent the search from revisiting recently visited solutions.
    • Tabu List: A short-term memory structure that stores recent moves applied to the current solution, or their attributes.
    • Tabu Tenure: The number of iterations $T$ for which a certain move or its attributes are kept in the list.
    • Complete solutions are rarely used because of the space requirement.
  • Long-Term Memory (Frequency of Occurrence): Prevent the search from revisiting frequently visited solutions.

Neighborhood

  • When selecting a new state, consider neighbors that are not on the tabu list. $$N(s) - T(s)$$
  • The neighborhood structure $N(s)$ can be reduced and modified based on history and knowledge.

Termination Conditions

  • Sample Conditions:
    • No feasible solution in the neighborhood of current solution.
    • Reached the maximum number of iterations allowed.
    • The number of iterations since the last improvement is larger than a specified number.
    • Evidence shows that an optimum solution has been obtained.

Candidates and Aspiration

  • A candidate list stores the potential solutions in a neighborhood to be examined.
    • Isolate regions of a neighborhood with desirable features, aiming to find a solution more efficiently.
    • At times, it can be desirable to include a move in a candidate list even if it is tabu in order to prevent stagnation.
    • Aspiration Criteria: Approaches for Canceling Tabus

Tabu Search Algorithm

  • Let $s = s_{0}$ be a current solution initialized to $s_{0}$.
  • Let $N(s)$ be the neighborhood of the current solution.
  • Let $T(s)$ be the tabu list of the current solution.
  • Let $A(s)$ be the aspiration list of the current solution.
  1. Repeat,
    1. Select the best solution $s'$ from $N^{\ast}(s) = N(s) - T(s) + A(s)$.
    2. Memorize $s'$ if it improves the best known solution.
    3. Let $s = s'$.
    4. Update $T(s)$ and $A(s)$.
  2. Until termination criteria are met.
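
A simplified sketch of this loop that, for brevity, stores complete solutions in the tabu list (the notes point out that moves or attributes are more common in practice) and applies the default aspiration criterion:

```python
from collections import deque

def tabu_search(s0, cost, neighbors, tenure=7, max_iters=1000):
    """Best-admissible-neighbor descent with a fixed-tenure tabu list
    and the default aspiration criterion (better than best-so-far)."""
    s, best = s0, s0
    tabu = deque(maxlen=tenure)             # short-term memory T(s)
    for _ in range(max_iters):
        admissible = [n for n in neighbors(s)
                      if n not in tabu or cost(n) < cost(best)]  # N - T + A
        if not admissible:                  # no feasible neighbor: stop
            break
        s = min(admissible, key=cost)       # best solution s' in N*(s)
        tabu.append(s)                      # update T(s)
        if cost(s) < cost(best):
            best = s                        # memorize improving solutions
    return best
```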

Selecting Tabu Restrictions

  • Sample Restrictions:
    • Not picking a move that involves the same exchange of positions of a tabu move.
    • Not picking a move that results in positions that previously appeared in a tabu move.

Selecting Tabu Tenure

  • Sample Strategies:
    • Statically assigning $T$ to be a constant: $\sqrt{n}$ where $n$ is the problem size.
    • Dynamically letting $T$ vary between a $T_{min}$ and a $T_{max}$.
      • Advantage: Better at limiting cycles.
      • Disadvantage: May still permit cycles longer than the tabu tenure.

Selecting Aspiration Criteria

  • Sample Strategies:
    • By default, where a tabu move becomes admissible if it yields a better solution than any found so far.
    • By objective, where a tabu move becomes admissible if it yields a solution better than an aspiration value.
    • By search direction, where a tabu move becomes admissible if the direction of the search remains constant.

Intensification

  • Intensification: The process of exploiting a small portion of the search space (e.g. penalizing solutions far from the current solution).
  • General Idea: Locally optimize a best known solution while trying to preserve the general components of that solution (based on short-term memory).

Diversification

  • Diversification: The process of forcing the search into unexplored areas (e.g. penalizing solutions close to the current solution).
  • General Problem: Tabu search can miss some good solutions in unexplored search space areas.

Diversification Strategies

  • Restart Diversification: Where components rarely appearing in solutions are forced into new solutions.
  • Continuous Diversification: Where the evaluation of possible moves is biased by a term related to component frequency.

Adaptation

  • Adaptation refers to a series of techniques for varying the tabu tenure.
    • If the tabu tenure is too small, then cycles in the search are likely.
    • If the tabu tenure is too big, then many moves could be prevented at each iteration.

Adaptation Techniques

  • Randomly select a new tenure from a pre-computed range every predetermined number of iterations.
  • Set the tenure to one if a best-so-far solution is found, decreasing it in an improving phase, and increasing it in a worsening phase.

Cooperation

  • Cooperative tabu search involves multiple concurrent runs of a tabu search algorithm on a search space.
  • Synchronous Communication: Search agents exchange information every fixed number of iterations.
  • Asynchronous Communication: Search agents relay their best-so-far results to a central memory.
  • Forced Diversification: A search agent can replace its own best-so-far solution with an incoming solution.
  • Conditional Import: A search agent can replace its own best-so-far solution with an incoming solution only if the incoming solution is better.

Observations on Tabu Search Cooperation

  • Increasing the number of search agents improves the solution up to a certain point.
  • Increasing the number of synchronization messages increases the computation time due to message passing overhead.
  • Conditional imports are almost always preferable to forced diversification.

Swarm Intelligence

Overview

  • Swarm Intelligence: The collective behavior of decentralized, self-organized systems.

Interactions

  • Swarm: A group of agents that communicate with each other by acting on their local environment.
  • Complex problem-solving behavior may emerge not as a result of any particular individual, but rather as a result of their interactions.
  • Interactions between individuals may be direct (physical contact) or indirect (via local change to the environment, stigmergy).

Properties

  • Flexibility: System performance is adaptive to internal or external changes.
  • Robustness: System can perform even if some individuals fail.
  • Decentralization: Control is distributed among individuals rather than allocated to some master.
  • Self-Organization: Global behaviors emerge as a result of local interactions.

Models of Behavior

  • Swarm: Group with little parallel alignment.
  • Torus: Group in which individuals rotate around an empty core in one direction.
  • Dynamic Parallel Group: Group where individuals are polarized and move as a coherent group, but can still move throughout the group such that the group density and form fluctuates.
  • Highly Parallel Group: Group similar to the dynamic parallel group but with minimal fluctuations.

Swarm Intelligence Problem Solving

  • Proximity: The swarm should be able to carry out simple space and time computations.
  • Quality: The swarm should be able to respond to quality factors in its environment.
  • Diverse Response: The swarm should not commit to excessively narrow channels of exploration.
  • Stability: The swarm should not rapidly alter its behavior in response to all environmental changes.
  • Adaptability: The swarm should be willing to change its behavior when it is worth the computational price.

Ant Colony Optimization

Background

  • Ant Colony Optimization: A search technique based on the swarm intelligence of ants.
  • Ants interact with one another through stigmergy, meaning that individual ants can make changes to their environment to be picked up by other ants.
  • Ants do this by forming trails with a substance known as pheromone. Ants tend to follow paths with higher pheromone concentrations. Pheromone also evaporates over time, so more recent pheromone deposits have a greater influence than older ones.
  • Ants can use this mechanism to find the shortest path to a destination. If multiple paths are available, ants will pick their initial paths randomly. Ants that take the shorter path will return faster, so their trail will have more unevaporated pheromone. This makes other ants more likely to pick this trail, and eventually, they converge on this shortest path.

ACO Algorithm

  • Let $G = (N, E)$ be a graph where $N$ is a set of nodes and $E$ is a set of edges.
  • Let $d_{i, j}$ be the length of each edge $(i, j)$.
  • Let $\tau_{i, j}$ be the amount of pheromones of each edge $(i, j)$.

Initialization

  • Initialize all edges with a small amount of pheromones.
  • Initialize the source node with a group of $m$ ants.

Transition Rule

  • At each node $i$, with adjacent nodes $N_{i}$, an ant can move to an adjacent node $j$ with the following probability. $$\frac{\frac{\tau_{i, j}^{\alpha}}{d_{i, j}^{\beta}}}{\sum_{n \in N_{i}} \frac{\tau_{i, n}^{\alpha}}{d_{i, n}^{\beta}}}$$
  • $\alpha$ controls the influence of the pheromone level (the colony's collective experience), while $\beta$ controls the influence of the edge length (the local heuristic desirability).

Pheromone Evaporation and Update

  • In each step, the amount of pheromone on the trail is evaporated (i.e. $\tau = \tau \cdot (1 - p)$).
  • In each step, the amount of pheromone on the trail is increased if ants choose the trail (i.e. $\tau = \tau + \Delta\tau$).
    • Ant Density Model: Where a constant value is added.
    • Ant Quantity Model: Where a constant is divided by the edge length.
    • Online Delayed Model: Where an ant first builds a solution, then traces its path backwards and adds pheromone based on the solution quality.
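
A toy sketch combining the transition rule with evaporation and an online delayed (quality-based) update; `nodes`, the pairwise `dist` dictionary, and all parameter defaults are assumptions for illustration:

```python
import random

def aco_shortest_path(nodes, dist, source, target, m=10, iters=100,
                      alpha=1.0, beta=2.0, rho=0.1, q=1.0):
    """Toy ACO on a complete graph: `dist` maps ordered node pairs to
    lengths; rho is the evaporation rate p, q scales deposits."""
    tau = {(i, j): 0.01 for i in nodes for j in nodes if i != j}
    best, best_len = None, float("inf")
    for _ in range(iters):
        paths = []
        for _ in range(m):                          # each ant builds a path
            path, cur = [source], source
            while cur != target:
                options = [j for j in nodes if j != cur and j not in path]
                if not options:
                    break                           # dead end: abandon ant
                weights = [tau[(cur, j)] ** alpha / dist[(cur, j)] ** beta
                           for j in options]        # transition rule
                cur = random.choices(options, weights=weights)[0]
                path.append(cur)
            if path[-1] == target:
                paths.append(path)
        for edge in tau:                            # evaporation
            tau[edge] *= (1.0 - rho)
        for path in paths:                          # delayed, quality-based
            length = sum(dist[(a, b)] for a, b in zip(path, path[1:]))
            for a, b in zip(path, path[1:]):
                tau[(a, b)] += q / length
            if length < best_len:
                best, best_len = path, length
    return best, best_len
```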

Termination Criteria

  • Sample Criteria:
    • Maximum Number of Iterations Reached
    • Good Enough Solution Reached
    • Stagnation Occurs

Tunable Parameters

  • Number of Ants
  • Maximum Number of Iterations
  • Initial Pheromone
  • Pheromone Decay Parameters (e.g. $p$)

Ant Colony System Algorithm

  • The ant colony system (ACS) algorithm features the following extensions on ACO:
    • The transition rules sometimes just choose the best path (i.e. acts greedily) instead of applying probabilistic selection.
    • Pheromone update is only based on the best solution (i.e. highest increase in $\tau$ either globally or in iteration).

Max-Min Ant System Algorithm

  • The max-min ant system algorithm is an extension that restricts pheromone values within a range.
    • The max and min pheromone values can be adjusted to favor exploration over exploitation during early phases of the search.

Adaptation

ACSGA-TSP

  • Have a genetic algorithm running on top of ACS to attempt to optimize its parameter values.
  • Each ant has certain parameters (e.g. $p$, $\beta$) encoded into a chromosome.
  • New generations are formed by crossing over the best-performing ants.

Near Parameter Free ACS

  • Apply an ant approach to optimize ant parameters.

Cooperation

  • Heterogeneous cooperation approaches involve ants in different colonies having different behavior.
    • This can be used, for instance, to optimize for different criteria of a solution.
  • Homogeneous cooperation approaches involve ants in different colonies having similar behavior.
    • Exchange of information can take place in a similar way to that of genetic algorithms.
    • Homogeneous cooperation implementations can be fine-grained, where each process holds a single ant, or coarse-grained, where each process holds a colony.
    • An effective version of homogeneous cooperation is a circular exchange of locally best solutions.

Advantages and Disadvantages

Advantages

  • Memory of entire colony retained.
  • Poor solutions rarely converged on due to many combinations of path selection.
  • Effective handling of dynamic environments.

Disadvantages

  • Theoretical analysis is limited.
  • Many parameters to tune.
  • Convergence may take long.

Particle Swarm Optimization

Background

  • Goal: Simulate Collective Behavior
    • Individuals have no knowledge of the global behavior of the group.
    • Individuals have the ability to move together based on social interaction between neighbors.
  • Separation Behavior: Each agent tries to move away from its nearby mates if they are too close.
  • Alignment Behavior: Each agent steers towards the average heading of its nearby mates.
  • Cohesion Behavior: Each agent tries to go towards the average position of its nearby mates.
  • Roost: An attractor for agents.
    • Represented using the agent's previous best position and the neighborhood's best position.
    • By adjusting the positions of the swarm in proportion to the distance from the best positions, the agents converge to the goal.

Overview

  • Particle Swarm Optimization: A stochastic optimization approach that manipulates a number of candidate solutions at once.
  • Particle: An individual solution.
  • Swarm: A whole population of solutions.

PSO Algorithm

  • Let $x_{i}$ be a particle's current position.
  • Let $v_{i}$ be a particle's current velocity.
  • Let $pbest_{i}$ be a particle's best position.
  • Let $Nbest$ be a particle's neighborhood's best position.
    • If the neighborhood is the whole swarm, the best position is called the global best: $gbest_{i}$ or $p_{g}$.
    • If the neighborhood is restricted to few particles, the best position is called the local best: $lbest_{i}$ or $p_{l}$.

Motion Equations

$$ \begin{aligned} v_{t + 1}^{id} &= w \ast v_{t}^{id} + c_{1}r_{1}^{id} (pbest_{t}^{id} - x_{t}^{id}) + c_{2}r_{2}^{id} (Nbest_{t}^{id} - x_{t}^{id}) \\ x_{t + 1}^{id} &= x_{t}^{id} + v_{t + 1}^{id} \end{aligned} $$
  • $w$ is the inertia weight.
  • $c_{1}$ and $c_{2}$ are the acceleration coefficients.
  • $r_{1}$ and $r_{2}$ are randomly generated numbers in $[0, 1]$.
    • Generated for each dimension and not for each particle.
  • $t$ is the iteration number.
  • $i$ and $d$ are the particle number and the dimension.

Interpretation of Motion Equations

  • Inertia: $w \ast v_{t}^{id}$
    • A particle cannot suddenly change its direction of movement.
  • Cognitive Component: $c_{1}r_{1}^{id} (pbest_{t}^{id} - x_{t}^{id})$
    • $c_{1}$ makes a particle trust its own experience.
  • Social Component: $c_{2}r_{2}^{id} (Nbest_{t}^{id} - x_{t}^{id})$
    • $c_{2}$ makes a particle trust the swarm experience.
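
A minimal global-best PSO sketch of these equations, with `f` an assumed objective to be minimized over the box $[lo, hi]^{dim}$ and $r_{1}, r_{2}$ drawn per dimension:

```python
import random

def pso(f, dim, n_particles=30, iters=200, w=0.7, c1=1.5, c2=1.5,
        lo=-10.0, hi=10.0):
    """Global-best PSO minimizing f with synchronous updates."""
    x = [[random.uniform(lo, hi) for _ in range(dim)]
         for _ in range(n_particles)]
    v = [[0.0] * dim for _ in range(n_particles)]
    pbest = [xi[:] for xi in x]
    gbest = min(pbest, key=f)[:]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = random.random(), random.random()
                v[i][d] = (w * v[i][d]                          # inertia
                           + c1 * r1 * (pbest[i][d] - x[i][d])  # cognitive
                           + c2 * r2 * (gbest[d] - x[i][d]))    # social
                x[i][d] += v[i][d]
            if f(x[i]) < f(pbest[i]):
                pbest[i] = x[i][:]          # update personal best
        gbest = min(pbest, key=f)[:]        # synchronous Nbest update
    return gbest, f(gbest)
```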

Synchronous Update

  1. Initialize the swarm.
  2. While the termination criteria are not met,
    1. For each particle,
      1. Update the particle's velocity.
      2. Update the particle's position.
      3. Update the particle's personal best.
    2. Update the $Nbest$.

Asynchronous Update

  1. Initialize the swarm.
  2. While the termination criteria are not met,
    1. For each particle,
      1. Update the particle's velocity.
      2. Update the particle's position.
      3. Update the particle's personal best.
      4. Update the $Nbest$.

Termination Criteria

  • Sample Conditions:
    • A max number of iterations has been reached.
    • A max number of function evaluations have been reached.
    • An acceptable solution has been found.
    • No improvement over a number of iterations.

Neighborhoods

Global Best Model

  • Each particle is influenced by all the other particles.
  • The fastest propagation of information.
  • Easily stuck in local minima.

Local Best Model

  • Each particle is influenced only by particles in its own neighbourhood.
  • The slowest propagation of information.
  • Does not get easily stuck in local minima.

Initialization

  • Initialize $x_{i} = \text{random}([\alpha, \beta])$.
  • Initialize $v_{i} = 0$ or $v_{i} = \epsilon$.
  • Initialize $pbest_{i} = x_{i}$.
  • Initialize $c_{1} = c_{2}$.
    • If $c_{1} = 0$, social-only model; particles are all attracted to $Nbest$.
    • If $c_{2} = 0$, cognition-only model; particles are independent hill climbers.
    • Small $c$ promotes smooth trajectories.
    • Large $c$ promotes abrupt trajectories.
  • Initialize $w$ to balance exploration and exploitation.
    • Small $w$ promotes exploitation.
    • Large $w$ promotes exploration.

Convergence

  • Studies of PSO have established which parameter settings guarantee convergence of a deterministic PSO algorithm.

Binary PSO

  • Each position is either a one or a zero.
  • Each velocity is the probability that one element will be a one or a zero.
    • The sigmoid function $sig(v_{t}^{id}) = \frac{1}{1 + e^{-v_{t}^{id}}}$ ensures the velocities represent probabilities.
    • If $r < sig(v_{t + 1}^{id})$, then $x_{t + 1}^{id} = 1$, else $x_{t + 1}^{id} = 0$.
      • Where $r$ is a randomly generated number in $[0, 1]$.
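
The update rule can be sketched as:

```python
import math
import random

def binary_pso_position(v):
    """Map a velocity vector to a new binary position: each bit is 1
    with probability sigmoid(v_d)."""
    return [1 if random.random() < 1.0 / (1.0 + math.exp(-vd)) else 0
            for vd in v]
```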

Permutation PSO

  • Each position is a permutation.
  • Each velocity is the set of swaps to be performed on a particle.

Adding a Velocity to a Position

  • Apply the sequence of swaps defined by the velocity to the position vector.

Subtracting Two Positions

  • Subtracting two positions should produce a velocity.
  • Produces the sequence of swaps that could transform one position to the other.

Multiplying a Velocity by a Constant

  • Change the length of the velocity vector (number of swaps) according to the constant $c$:
    • If $c = 0$, the length is set to zero.
    • If $c < 1$, the velocity is truncated.
    • If $c > 1$, the velocity is augmented.
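
Sketches of the three operations, treating a velocity as a list of index swaps (an assumed encoding consistent with the description above; tiling the swap list is one way to augment it):

```python
def subtract_positions(p_from, p_to):
    """Velocity = the sequence of swaps transforming p_from into p_to."""
    p = list(p_from)
    swaps = []
    for i in range(len(p)):
        if p[i] != p_to[i]:
            j = p.index(p_to[i])        # locate the needed element
            p[i], p[j] = p[j], p[i]
            swaps.append((i, j))
    return swaps

def add_velocity(position, swaps):
    """Apply a sequence of swaps to a position (a permutation)."""
    p = list(position)
    for i, j in swaps:
        p[i], p[j] = p[j], p[i]
    return p

def scale_velocity(swaps, c):
    """Truncate (c < 1) or tile (c > 1) the swap sequence."""
    n = int(len(swaps) * c)
    return [swaps[k % len(swaps)] for k in range(n)] if swaps else []
```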

Permutation PSO (Alternative)

  • Space Transformation: Continuous Domain to Permutation
  • Great Value Priority: All the elements in the position vector and their indices are sorted in descending order.
    • The sorted indices are treated as a permutation.
    • If $x_{i}$ has the highest value in the position vector, $i$ comes first in the permutation vector.

Adaptation

  • Tribe: A group of connected particles.
    • All tribes communicate to decide the global optimum amongst all the different solutions.
  • Neutral Particle: A particle whose $pbest$ did not improve in the last iteration.
  • Good Particle: A particle whose $pbest$ improved in the last iteration.
  • Excellent Particle: A particle whose $pbest$ improved in the last two iterations.
  • A tribe is marked as good depending on the value of $G$ and $T$. $$ Tribe = \begin{cases} \text{Good} & \text{Uniform}(0, 1) < \frac{G}{T} \\ \text{Bad} & \text{otherwise} \end{cases} $$
    • Where $G$ is the number of good particles in the tribe.
    • Where $T$ is the number of particles in the tribe.
  • A good tribe deletes its worst particle to conserve the number of performed function evaluations.
  • A bad tribe generates a new random particle, and all the new random particles form a new tribe.
  • Each new random particle gets connected to the tribe that generated it through its best particle.

Cooperation

Concurrent PSO

  • Two different swarms are updated in parallel, both using different algorithms.
  • The swarms exchange their $gbest$ values every pre-determined number of iterations.
  • Both swarms track the better $gbest$.

Cooperative PSO

  • Have different swarms optimizing different variables of the problem / different dimensions of the solution.
  • The fitness of any particle is determined by its value and the value of the best particles in all the other swarms.
  • Important Note: Assumes that the problem variables are independent.

Hybrid Cooperative PSO

  • Have one swarm use the normal PSO algorithm.
  • Have another swarm use the concurrent PSO algorithm.
  • Each swarm is updated for one iteration only.
  • When the PSO swarm gets updated, the $gbest$ values are sent to the CPSO swarm.
    • The CPSO swarm uses the elements of the received $gbest$ to update random particles of its sub-swarms.
  • When the CPSO swarm gets updated, it sends its context vector to the PSO swarm.
    • The PSO swarm uses the received context vector to replace a randomly chosen particle.

Genetic Algorithms

Overview of Evolutionary Algorithms

  • Evolutionary Algorithms: Population-based meta-heuristic methods whose behavior is inspired by biological evolution.
  • A population of individuals compete for limited resources, with the "fitter" individuals being used as seeds to form future generations.
  • Over time, the population rises in overall fitness due to the principles of natural selection.
  • Evolutionary algorithms are stochastic, with the variation operators (crossover and mutation) propagating changes to new generations.

Active Information

  • Conservation of Information Theorems: Any search algorithm performs on average as well as random search without replacement unless it takes advantage of problem-specific information about the search target or the search-space structure.
  • Three measures of information that can increase the effectiveness of a search can be categorized as:
    • Endogenous Information: Measures the difficulty of finding a target through random search.
    • Exogenous Information: Measures the difficulty of finding a target once problem-specific information is applied.
    • Active Information: The difference between exogenous and endogenous information.
      • i.e., Measures the contribution of problem-specific information in solving a problem.

Overview of Genetic Algorithms

  • Genetic Algorithms: A class of evolutionary algorithms that operate by maintaining a population of candidate solutions and iteratively applying a set of stochastic operators, namely selection, reproduction, and mutation.
  • Inspired by Darwin's theory of natural selection, the population eventually moves towards fitter solutions.

Simple Genetic Algorithms

General Scheme of Simple Genetic Algorithms

  • Representation: Binary Strings
  • Recombination: $1$-Point, $N$-Point, or Uniform
  • Mutation: Bitwise Bit-Flipping with Fixed Probability
  • Parent Selection: Fitness-Proportionate
  • Survivor Selection: All Children Replace Parents
  • Speciality: Emphasis on Crossover

Algorithm

  1. Initialize the population with random candidates.
  2. Evaluate all individuals.
  3. Repeat,
    1. Select parents,
    2. Apply crossover.
    3. Mutate offspring.
    4. Replace current generation.
  4. Until termination criteria are met.
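
A compact sketch of the whole loop on bit lists, assuming a positive `fitness` function; selection is fitness-proportionate, recombination is $1$-point, and replacement is fully generational:

```python
import random

def sga(pop_size, length, fitness, p_c=0.8, p_m=None, generations=100):
    """Simple GA: fitness-proportionate selection, 1-point crossover
    with probability p_c, bitwise mutation with probability p_m,
    full generational replacement."""
    p_m = p_m if p_m is not None else 1.0 / length
    pop = [[random.randint(0, 1) for _ in range(length)]
           for _ in range(pop_size)]
    for _ in range(generations):
        weights = [fitness(ind) for ind in pop]       # assumed positive
        new_pop = []
        while len(new_pop) < pop_size:
            a, b = random.choices(pop, weights=weights, k=2)  # selection
            if random.random() < p_c:                 # 1-point crossover
                k = random.randint(1, length - 1)
                a, b = a[:k] + b[k:], b[:k] + a[k:]
            for child in (a, b):                      # bitwise mutation
                new_pop.append([g ^ 1 if random.random() < p_m else g
                                for g in child])
        pop = new_pop[:pop_size]                      # replace generation
    return max(pop, key=fitness)
```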

Termination Criteria

  • Sample Criteria:
    • A specified number of generations (or fitness evaluations).
    • A minimum threshold reached.
    • No improvement in the best individual for a specified number of generations.
    • Memory/time constraints.

SGA Representation

  • Chromosomes/Genotype: Representation of candidate solutions as binary strings.
    • Individual Gene: An individual binary digit.
    • Ordering of Genes: The most important factor for the performance of an SGA.
  • Gray Coding: Small differences in the underlying solution result in small changes in the binary representation.

SGA Selection

  • Fitness Proportional Selection Algorithms: Select parents with a probability proportional to their fitness.
    • Advantage: Probabilistically, $N$ parents can be selected from a population of $N$ solutions $\implies$ A balance between exploration and exploitation.
    • Disadvantage 1: No guarantee on the distribution of selected parents, since each selection is done independently.
    • Disadvantage 2: Premature convergence with one highly fit member dominating a population.
    • Disadvantage 3: Lack of selection pressure at the end of runs with similar fitness.
      • Consider ranked selection or tournament selection.
  • Roulette Wheel Technique: Implement FPS by assigning each individual a part of the roulette wheel proportional to their probability, and "spin" the wheel $N$ times to select $N$ individuals.
    • i.e., Multinomial Sampling.
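
A sketch of the roulette wheel as multinomial sampling, assuming non-negative fitness values:

```python
import random

def roulette_wheel(population, fitness, n):
    """Select n parents with probability proportional to fitness."""
    weights = [fitness(ind) for ind in population]
    return random.choices(population, weights=weights, k=n)
```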

SGA Crossover

  • Crossover: A recombination is applied between two parents with a probability of $P_{c}$, which is typically in the range $(0.6, 0.9)$.
  • If no crossover occurs, the two parents are copied to the two offspring unmodified.

$1$-Point Crossover

  • A random point is chosen on the two parents, and the two children are formed by exchanging the tails on either side of this point.

$N$-Point Crossover

  • A generalization of $1$-point crossovers where $N$ points are chosen on the two parents, and the two children are formed by combining alternating partitions between points.

Uniform Crossover

  • Each gene has an independent $0.5$ chance of undergoing recombination, which makes inheritance independent of position.
  • Prevents transmitting co-adapted genes.
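
Sketches of $1$-point and uniform crossover on two equal-length parent sequences:

```python
import random

def one_point_crossover(p1, p2):
    """Exchange the tails of two parents at a random point."""
    k = random.randint(1, len(p1) - 1)
    return p1[:k] + p2[k:], p2[:k] + p1[k:]

def uniform_crossover(p1, p2):
    """Each gene independently has a 0.5 chance of being swapped."""
    c1, c2 = list(p1), list(p2)
    for i in range(len(c1)):
        if random.random() < 0.5:
            c1[i], c2[i] = c2[i], c1[i]
    return c1, c2
```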

SGA Mutation

  • After recombination, each gene can be altered with a probability of $P_{m}$, typically between $(\frac{1}{\text{Population Size}}, \frac{1}{\text{Chromosome Length}})$.
  • Crossovers tend to result in large changes in a population, while mutation results in small ones.
  • Thus, crossovers are considered to be explorative, making a big jump to a possibly unexplored area between two parent solutions.
  • Thus, mutations are exploitative, introducing small amounts of new information near an existing solution.

SGA Population Models

  • Generational Genetic Algorithm Model (GGA): Individuals survive for exactly one generation before they are replaced by offspring.
  • Steady-State Genetic Algorithm Model (SSGA): Part of a population is replaced by offspring.
  • Generational Gap: The proportion of the population replaced between successive generations is known as a generational gap.
    • GGA: $1$.
    • SSGA: $\frac{1}{\text{Population Size}}$.
  • Survivor Selection: The process of selecting individuals from parents and offsprings to make up the next generation.
    • e.g., Age-Based $\implies$ Delete Oldest.
    • e.g., Fitness-Based $\implies$ Delete Worst.

Real-Valued Genetic Algorithms

Crossover

Single Arithmetic Crossover (Single Random)

  • For a single randomly chosen parent gene pair $x$ and $y$, one child's gene becomes $\alpha x + (1 - \alpha) y$, and the reverse for the other child.

Simple Arithmetic Crossover (Random After $k$)

  • For each parent gene pair $x$ and $y$ after a certain gene pair $k$, one child's gene becomes $\alpha x + (1 - \alpha) y$, and the reverse for the other child.

Whole Arithmetic Crossover (All Random)

  • For each parent gene pair $x$ and $y$, one child's gene becomes $\alpha x + (1 - \alpha) y$, and the reverse for the other child.

Mutation

  • Assign a uniform random value between some lower and upper bound to a gene.
  • Some variants of this technique also add some noise (e.g. from a Gaussian distribution) to this number.

Permutation Genetic Algorithms

  • Permutation Problems: Arranging Elements.
    • Adjacency of Elements.
    • Overall Order of Elements.

Crossover

Partially Mapped Crossover (PMX)

  1. Copy a random segment from parent $P_{1}$.
  2. Starting from the first crossover point, look for elements in parent $P_{2}$ that have not been copied.
  3. For each of these elements $i$, look in the offspring to see which element $j$ has been copied in its place.
  4. Place $i$ in the position occupied by $j$ in $P_{2}$. If the place occupied by $j$ in $P_{2}$ has already been filled in by $k$, put $i$ in the position occupied by $k$ in $P_{2}$.
  5. The rest of the offspring can be filled in from $P_{2}$.

Order 1 Crossover

  1. Choose arbitrary part from $P_{1}$.
  2. Copy this part to the first child.
  3. Starting from the right of the cut point of the copied part, copy the elements in the order of $P_{2}$ that are not yet in the child, wrapping around if needed.
  4. Analogous for the second child, with parent roles reversed.

Cycle Crossover

  1. Form a cycle of genes from $P_{1}$ by the following,
    1. Start with the first gene from $P_{1}$.
    2. Go to the position in $P_{1}$ that has the value of the corresponding gene in $P_{2}$.
    3. Add this gene to the cycle.
    4. Repeat this cycle formation until the first gene of $P_{1}$ is reached.
  2. Put the genes of the cycle in the positions from $P_{1}$.
  3. Repeat steps above for the second parent.

Mutation

Insert Mutation

  • Pick two elements, and move one to immediately follow the other, shifting any elements if necessary (i.e. similar to insertion sort).

Swap Mutation

  • Pick two elements and swap their order.

Inversion Mutation

  • Pick two genes and reverse the subsequence between them.

Scramble Mutation

  • Pick two genes and find a random permutation of the genes between them.
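
Sketches of the four operators on a permutation stored as a list:

```python
import random

def insert_mutation(perm):
    """Pick two positions and move the second element to follow the first."""
    p = list(perm)
    i, j = sorted(random.sample(range(len(p)), 2))
    p.insert(i + 1, p.pop(j))
    return p

def swap_mutation(perm):
    """Pick two positions and swap their elements."""
    p = list(perm)
    i, j = random.sample(range(len(p)), 2)
    p[i], p[j] = p[j], p[i]
    return p

def inversion_mutation(perm):
    """Reverse the subsequence between two random positions."""
    p = list(perm)
    i, j = sorted(random.sample(range(len(p)), 2))
    p[i:j + 1] = reversed(p[i:j + 1])
    return p

def scramble_mutation(perm):
    """Randomly permute the subsequence between two random positions."""
    p = list(perm)
    i, j = sorted(random.sample(range(len(p)), 2))
    middle = p[i:j + 1]
    random.shuffle(middle)
    p[i:j + 1] = middle
    return p
```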

Adaptation

Parameter Tuning

  • Parameter Tuning: Finding suitable values for different algorithm parameters.

Parameter Control

  • Deterministic: Parameters are controlled as a function of the generation number; search progress is not taken into account.
  • Adaptive: Parameters are controlled using the current state of the search combined with some heuristics.
  • Self-Adaptive: Parameters are incorporated into chromosomes and are controlled through similar processes as the problem population (i.e. selection, crossover, and mutation).

Parallel Genetic Algorithms

Master-Slave Genetic Algorithms (Global Parallel GAs)

  • In master-slave GAs, selection and mating are performed by a single master processor.
  • However, fitness evaluation is distributed among several slave processors.

Fine-Grained Genetic Algorithms

  • In fine-grained GAs, selection and mating occur in local (not necessarily disjoint) neighborhoods.
  • This technique can be effective for handling computations on massively parallel computers.

Coarse-Grained Genetic Algorithms (Distributed/Multi-Deme GAs)

  • Coarse-grained genetic algorithms feature multiple populations evolving in parallel.
  • Selection and mating are limited to individuals within the same population, and different populations may periodically exchange individuals.

Coarse-Grained Factors
  • Topology: Controls which populations are connected to one another. Individuals may only migrate between connected populations.
  • Migration Policy: Determines how migrating individuals are sent and received.
    • e.g., Exchange a random individual in one population for a random individual in another population.
    • e.g., Exchange the best individual in one generation for the worst individual in another.
  • Migration Frequency: Controls when communication between populations occurs.
    • Synchronous: Migrations occur every predetermined number of generations.
    • Asynchronous: Migrations are triggered by a certain event.
  • Migration Rate: The number of individuals migrating from one population to another at every communication step.
    • Low: Populations act almost completely independently.
    • High: Fast convergence to sub-optimal solutions.

Cooperative Genetic Algorithms

  • Cooperative GAs tend to contain multiple populations where the fitness of an individual in one population depends on the fitness of individuals in other populations.
  • A commonly applied strategy is to have different populations optimize different variables of a problem, where the fitness of an individual is determined by its value and the value of the best individuals in other populations.
  • This strategy tends to perform best when the different variables are independent of one another.

Advantages and Disadvantages

Advantages

  • Handle Highly Multimodal Problems
  • Applicable to Discrete/Continuous Problems
  • Support High-Dimensional Problems
  • Support Non-Linear Parameter Dependencies

Disadvantages

  • Premature Convergence
  • Poor Choice of Genetic Operators
  • Biased/Incomplete Representations