Authors: Kishan G. Mehrotra, Chilukuri K. Mohan, and HuaMing Huang
ISBN: 978-3-319-67526-8
Given a data set $\mathcal{D}$, suppose an outlier detection algorithm identifies $m > 0$ potential anomalies, of which $m_t \leq m$ are known to be true outliers. Then precision, which measures the proportion of true outliers among the top $m$ most suspicious instances, is:
$$Pr = \frac{m_t}{m}$$
and equals $1.0$ if all the points identified by the algorithm are true outliers.
If $\mathcal{D}$ contains $d_t \geq m_t$ true outliers, then recall is defined as:
$$Re = \frac{m_t}{d_t}$$
which equals $1.0$ if all true outliers are discovered by the algorithm.
If $R_i$ denotes the rank of the $i$th true outlier in the sorted list of most suspicious objects, then the RankPower is given by:
$$RP = \frac{m_t (m_t + 1)}{2 \sum_{i=1}^{m_t} R_i}$$
which takes its maximum value of $1$ when the $m_t$ true outliers found occupy the top $m_t$ positions; in particular, when all $d_t$ true outliers are in the top $d_t$ positions.
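The three measures above can be computed directly from a ranked list. The following is a minimal sketch, not a reference implementation: the function name `evaluate_ranking`, the input format (a list of object ids sorted from most to least suspicious), and the convention of returning `0.0` when no true outlier is found are all assumptions made for illustration.

```python
def evaluate_ranking(ranked_ids, true_outliers, m):
    """Return (precision, recall, rank_power) for the top-m items of a ranking.

    ranked_ids: object ids sorted from most to least suspicious (assumed format).
    true_outliers: set of ids known to be true outliers; its size is d_t.
    m: number of top-ranked items flagged as potential anomalies.
    """
    if m <= 0:
        raise ValueError("m must be positive")
    top_m = ranked_ids[:m]
    # 1-based ranks R_i of the true outliers found among the top m items.
    ranks = [r for r, obj in enumerate(top_m, start=1) if obj in true_outliers]
    m_t = len(ranks)
    d_t = len(true_outliers)
    precision = m_t / m
    # Assumed convention: report 0.0 when a ratio would be undefined.
    recall = m_t / d_t if d_t else 0.0
    rank_power = m_t * (m_t + 1) / (2 * sum(ranks)) if m_t else 0.0
    return precision, recall, rank_power


ranking = ["a", "b", "c", "d", "e", "f"]  # toy data, most suspicious first
truth = {"a", "c", "f"}
prec, rec, rp = evaluate_ranking(ranking, truth, m=4)
print(prec, rec, rp)
```

In this toy example the true outliers found in the top $4$ sit at ranks $1$ and $3$, so $m_t = 2$, $Pr = 2/4$, $Re = 2/3$, and $RP = (2 \cdot 3)/(2 \cdot 4) = 0.75$.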
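The Mahalanobis Distance between two points $p$ and $q$ is:
$$d_M(p, q) = \sqrt{(p - q)^{T} S^{-1} (p - q)}$$
where $S$ is the covariance matrix measuring the mutual correlations between dimensions for all points in the data set $\mathcal{D}$.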
The Euclidean Distance is the Mahalanobis Distance where $S$ is the identity matrix.
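The relationship between the two distances can be checked numerically. The sketch below assumes the standard definition $\sqrt{(p - q)^{T} S^{-1} (p - q)}$ of the Mahalanobis Distance and, to stay dependency-free, takes the already inverted covariance matrix as input; the names `mahalanobis`, `euclidean`, and `s_inv` are illustrative, not taken from the source.

```python
import math


def mahalanobis(p, q, s_inv):
    """Mahalanobis distance sqrt((p - q)^T S^{-1} (p - q)).

    s_inv is the inverse of the covariance matrix S, given as a list of rows.
    Estimating S from data and inverting it is deliberately left out here.
    """
    diff = [a - b for a, b in zip(p, q)]
    n = len(diff)
    # Row vector times matrix: tmp[j] = sum_i diff[i] * s_inv[i][j]
    tmp = [sum(diff[i] * s_inv[i][j] for i in range(n)) for j in range(n)]
    return math.sqrt(sum(t * d for t, d in zip(tmp, diff)))


def euclidean(p, q):
    """Ordinary Euclidean distance between two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


p, q = [1.0, 2.0], [4.0, 6.0]
identity = [[1.0, 0.0], [0.0, 1.0]]
# With S equal to the identity matrix, both distances coincide.
print(mahalanobis(p, q, identity), euclidean(p, q))
```

With a non-identity `s_inv`, each dimension is rescaled (and correlated dimensions are decorrelated) before the distance is taken, which is why the two measures generally differ.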
If $l = 2$, the Minkowski Distance is equal to the Euclidean Distance; if $l = 1$, it is equal to the Manhattan Distance.
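As an illustration, here is a minimal sketch of the Minkowski Distance of order $l$, assuming the standard definition $\left(\sum_i |p_i - q_i|^l\right)^{1/l}$. The function name `minkowski` and the sample points are hypothetical. Under this definition, $l = 2$ reproduces the Euclidean Distance and $l = 1$ gives the Manhattan (city-block) Distance.

```python
import math


def minkowski(p, q, l):
    """Minkowski distance of order l >= 1 between points p and q."""
    if l < 1:
        raise ValueError("the order l must be at least 1")
    return sum(abs(a - b) ** l for a, b in zip(p, q)) ** (1.0 / l)


p, q = [1.0, 2.0], [4.0, 6.0]
manhattan = minkowski(p, q, 1)   # sum of absolute coordinate differences
euclid = minkowski(p, q, 2)      # square root of the sum of squared differences
print(manhattan, euclid)
```

For these points the coordinate differences are $3$ and $4$, so the order-1 distance is $3 + 4 = 7$ and the order-2 distance is $\sqrt{9 + 16} = 5$.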
The Cosine Similarity ranges from $-1$ to $1$, where $1$ implies that $p$ and $q$ point in the same direction and $-1$ implies that $p$ points in exactly the opposite direction to $q$.
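The range of the Cosine Similarity can be seen in a small sketch, assuming the standard definition $\frac{p \cdot q}{\lVert p \rVert \, \lVert q \rVert}$. The function name `cosine_similarity` and the choice to raise an error for zero vectors are assumptions of this example.

```python
import math


def cosine_similarity(p, q):
    """Cosine of the angle between vectors p and q."""
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    if norm_p == 0 or norm_q == 0:
        # Assumed convention: the measure is undefined for a zero vector.
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_p * norm_q)


same = cosine_similarity([1.0, 2.0], [2.0, 4.0])         # same direction
opposite = cosine_similarity([1.0, 2.0], [-1.0, -2.0])   # opposite direction
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])   # perpendicular
print(same, opposite, orthogonal)
```

Up to floating-point rounding, the three values are $1$, $-1$, and $0$. Because only direction matters, `[1.0, 2.0]` and `[2.0, 4.0]` are maximally similar even though their lengths differ.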
where $A$ and $B$ are two datasets.