\chapter{Conclusion}
\label{c:conclusion}
In the previous chapters, we have introduced a foundational set of features and properties for comparing anomaly detection approaches for streaming systems, including the types of anomalies an approach can identify, whether it explicitly accounts for concept drift and noise in the underlying data stream, and how it has been evaluated and documented. Furthermore, we have introduced and categorized several approaches from past and recent research, including popular methods such as DenStream and MCOD. Finally, we used these features to align the approaches along 11 dimensions (see \cref{tab:approach_overview}), providing a big-picture overview and allowing for easier comparison.

Beyond the foundational features we have described, many problems, often more challenging ones, cannot be quantified as easily and have been left for further analysis. We conclude our work with a short motivation for some of these problems.
\paragraph{Applicability in Practical Environments.}
In today's high-velocity, high-volume streaming context, it is often practically infeasible to process a stream in a centralized fashion (i.e., on a single processing unit). However, a large share of data mining and anomaly detection algorithms (e.g., kNN and SVM) has been developed for and evaluated on single-machine processing \citep{hayes_contextual_2015}. What today's context requires are systems whose computations are efficiently parallelizable, so that they can run in distributed processing environments such as Apache Flink \citep{toliopoulos_continuous_2019}. Incorporating parallelism (e.g., by applying data parallelism paradigms like MapReduce) is necessary because, according to \citet{pimentel_review_2014}, ``Efficient novelty detection techniques should be scalable to large and high-dimensional datasets.'' Furthermore, emerging patterns such as distributing data collection and analysis to low-power devices or moving computation and persistence closer to where they are used (i.e., \emph{edge computing}) should be accounted for as well.
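To make the idea of data parallelism more concrete, consider the following minimal Python sketch. It is an illustration only and is not taken from any of the surveyed approaches: the partitioning, the z-score-based scoring function, and the threshold are hypothetical placeholders. Each partition of a batched stream is scored independently (the ``map'' step) before the partial results are combined (the ``reduce'' step).
\begin{verbatim}
# Illustration only: data-parallel anomaly scoring in a MapReduce-like
# pattern. The scoring function and threshold are hypothetical.
from multiprocessing import Pool

THRESHOLD = 1.5  # hypothetical z-score threshold

def score_partition(partition):
    """Score one partition independently (the 'map' step)."""
    mean = sum(partition) / len(partition)
    var = sum((x - mean) ** 2 for x in partition) / len(partition)
    std = var ** 0.5 or 1.0  # avoid division by zero
    return [x for x in partition if abs(x - mean) / std > THRESHOLD]

if __name__ == "__main__":
    # Partition the (batched) stream, e.g., by key or range.
    partitions = [[1.0, 1.2, 0.9, 8.5], [2.1, 2.0, 2.2, 1.9]]
    with Pool() as pool:
        partial_results = pool.map(score_partition, partitions)  # map
    anomalies = [x for part in partial_results for x in part]    # reduce
    print(anomalies)  # [8.5]
\end{verbatim}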
\paragraph{Computational Complexity and Performance.}
Another important factor in stream processing systems is processing speed, which is largely determined by an algorithm's computational complexity \citep{ahmad_unsupervised_2017}. Modern computing environments require a paradigm shift towards less expensive approximate algorithms that can potentially provide similar prediction performance, even though they perform worse during training \citep{hayes_contextual_2015}. Online stream processing systems depend on low-latency algorithms, meaning that the testing phase of such algorithms must be rapid. High-complexity computations during testing can quickly become bottlenecks and slow processing down to unusable levels \citep{ahmad_unsupervised_2017}.
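As an illustration of how inexpensive the testing phase can be kept, the following Python sketch maintains exponentially weighted estimates of mean and variance, so each incoming event is scored in constant time. It is a sketch under assumed parameters (the smoothing factor, threshold, and warm-up length are hypothetical) and is not one of the surveyed approaches.
\begin{verbatim}
# Illustration only: an O(1)-per-event detector based on exponentially
# weighted moving estimates. All parameters below are hypothetical.
class EwmaDetector:
    def __init__(self, alpha=0.05, threshold=3.0, warmup=5):
        self.alpha = alpha          # smoothing factor
        self.threshold = threshold  # z-score threshold
        self.warmup = warmup        # events to observe before flagging
        self.mean = None
        self.var = 0.0
        self.count = 0

    def update(self, x):
        """Score x in constant time, then fold it into the estimates."""
        if self.mean is None:
            self.mean, self.count = x, 1
            return False
        self.count += 1
        std = self.var ** 0.5
        is_anomaly = (self.count > self.warmup and std > 0.0
                      and abs(x - self.mean) / std > self.threshold)
        diff = x - self.mean
        self.mean += self.alpha * diff
        self.var = (1 - self.alpha) * (self.var + self.alpha * diff * diff)
        return is_anomaly

detector = EwmaDetector()
stream = [1.0, 1.1, 0.9, 1.0, 1.2, 1.0, 0.9, 1.1, 5.0, 1.0]
print([x for x in stream if detector.update(x)])  # [5.0] -- only the spike
\end{verbatim}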
\paragraph{Unclear or Unrealistic Assumptions.}
Many anomaly detection algorithms are based on a set of statistical assumptions about the data and the generative process. If these assumptions do not hold (they often do not) or are not documented, the algorithms might not work as well as expected. One frequently made assumption is that the distribution generating a data stream has a particular shape and is subject to specific constraints (e.g., a normal distribution). Other common assumptions concern the size or dimensionality of the dataset to be processed: some algorithms work better on large, high-dimensional datasets, while others suffer from the ``curse of dimensionality''. In general, the fewer assumptions an algorithm is based on, the more generalizable it is to real-world problems and the better it tends to work \citep{ahmad_unsupervised_2017}.
\paragraph{Transferability.}
Transferability describes how well an algorithm can be transferred to other use cases and domains. It is important when working in highly complex environments, but is rarely considered explicitly in research \citep{kanarachos_detecting_2017}. Many anomaly detection approaches are tightly coupled to a specific context, have been tuned with significant domain knowledge, or do not expose the parameters necessary to apply them to a different context or to different data. This makes it harder to exploit research performed in other application areas, even if it could otherwise be very useful.
\paragraph{Commercial Compromise.}
While we have focused on approaches that detect or classify anomalies as autonomously as possible, the anomaly detection problem is often also one of finding a suitable commercial compromise between cost and accuracy. Since it is impossible to catch every single anomaly even with large investments, it often makes sense to build systems with several layers, including manual processes. For example, once a fraud-detection system detects a highly probable anomaly, it might assign this anomaly to a human for immediate investigation \citep{bolton_statistical_2002}. Anomaly detection approaches in research should account for this by providing appropriate means of parametrization (i.e., allowing the system to be tuned with respect to the tradeoff between accuracy and cost).
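The following Python sketch illustrates one possible form such parametrization could take; the two thresholds and routing targets are hypothetical and simply make the cost/accuracy knobs explicit.
\begin{verbatim}
# Illustration only: routing scored events across system layers.
# Both thresholds are hypothetical, tunable cost/accuracy knobs.
def route(anomaly_score, investigate_threshold=0.9, flag_threshold=0.6):
    if anomaly_score >= investigate_threshold:
        return "assign_to_human_analyst"    # highly probable anomaly
    if anomaly_score >= flag_threshold:
        return "flag_for_automated_review"  # cheaper, less accurate layer
    return "pass_through"

print([route(s) for s in (0.95, 0.7, 0.2)])
# ['assign_to_human_analyst', 'flag_for_automated_review', 'pass_through']
\end{verbatim}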