06 Aug 2019
Welcome to the Tuesday edition of my KDD 25 Roundup. Yesterday we looked at adversarial attacks on machine learning models, and how to defend against them, with some focus on graph models. Today’s themes are timeseries data and real-time detection systems.
This talk by Microsoft Research introduced a cloud-based timeseries anomaly detection service and some of the new mathematics which support it. Falling under Microsoft Cognitive Services, the timeseries anomaly detection feature builds on Apache Flink and Kafka for data movement, and exposes an experimentation environment at the end of the pipeline for developing new analytics.
Their actual approach to unsupervised (maybe pseudo-unsupervised? You’ll see what I mean) anomaly detection, interestingly, draws on advances in the visual salience field. The algorithm samples random anomalies from the normal distribution and injects them as labeled anomalies into unlabeled, normative data, then draws a spectral residual map (SRM) over the data. The SRM in tandem with the labels from the synthetic data injection enable a CNN to learn how to detect anomalies.
Like any good timeseries presentation, this one made a point of emphasising the “real-time” execution of the presented algorithms. What made this talk so compelling was that they actually presented the latency constraints they were working with: nanosecond-order responses with at least 80% accuracy. The actual task of interest, held to these performance requirements, was the suggestion of maintenance and diagnostic actions to mobile phone users.
The team experimented with rules-based methods, a multi-layer perceptron, and an LSTM to classify the high-level activity described by a slice of logs, with the LSTM coming out on top in terms of raw performance. One of the main points the team drove home was the importance in applied ML of user interface, in addition to model performance, describing techniques like 2-stage classification to help models learn the user context.
This talk introduces the real-time attention based look-alike model (RALM). The abstract formulates the problem well:
Although the performance of recommendations is greatly improved [by deep learning], the “Matthew effect” becomes increasingly evident. While the head contents get more and more popular, many competitive long-tail contents are difficult to achieve timely exposure because of lacking behavior features.
RALM’s contribution is its exploration of a recently popular phenomena in deep learning: attention mechanics. Attention can be built into the architecture of a neural network, allowing it to dynamically focus (or “pay attention to”) different regions of the input vector depending on that vector’s values.
The authors show how leveraging this innovation in user representation learning can vastly improve the relevance of recommendations without deprioritizing that long-tail content.
Put mildly, Walmart has a lot of items in its online catalogue. It should be no surprise, then, that pricing is managed, by and large, by machines. These machines read all sorts of intelligence streams, such as internal cost data, and competitor pricing, and fold them together with a little business logic to predict a good price. Apparently they do pretty well, but as this presentation showed, they aren’t perfect.
This talk explained the daunting applied ML task of finding anomalies in these price predictions (as a proxy for spotting prediction errors). The talk was particularly interesting a) because of the immediate business ramifications (you can imagine how a pricing error in either direction could have a huge effect) and b) because of the scale of the engineering problem.
The naïve baseline was to simply watch for spikes or dips in price with respect to historical pricing data for the given item. This approach results in a high false positive rate and doesn’t account for seasonality, sales, or anything else that would make timeseries anomaly detection interesting.
The first step in a more intelligent direction was to engineer a rich feature space including raw, timeseries, binary, and hierarchical data. With these features, the team trained a handful of anomaly scorers: isolation forest, AutoEncoder, and Gaussian Naïve Bayes. With a specialty model trained for each category and subcategory of product, the team deployed the ensemble to flag and prioritize anomalies for a team of human reviewers.
Like graphs, timeseries models offer a promising way to learn about data in a way that matches (or at least, is a closer match to) the real-world objects and processes that data describes. Exciting advances in the underlying mathematics and large-scale engineering endeavors to prove these concepts will certainly drive the domain to even greater heights and popularity.