KDD Daily Roundup 01

05 Aug 2019

As I write this, I’m tucked away in the north-western corner of North America in Anchorage, Alaska, attending the 25th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. I’m here to soak up some of the cutting-edge data science coming out of some of the top research institutions in the world (commercial, academic, and government alike), and to enjoy the tranquil majesty of the Last Frontier.

This post is a quick rundown of some of the good bits from the Monday (Aug. 5) sessions I attended.

Attacking Graph Models

Graphs (the data structure of interconnected entities, not the visualization) are a hot area of study in data mining because of their ability to represent data within a structural context (something harder to do with tabular data). This is particularly true in cyber security, where the frequent domain of interest, namely the computer network, is represented naturally as a graph.

Of course, the part of cyber security that makes it interesting is that there’s always someone actively working against your interests. You introduce a graph-based intrusion detection scheme, baddies introduce an attack on your graph model. A specially tailored attack on a model or the data it uses can deteriorate the overall quality of the model, but it can also do more interesting things like make targeted adjustments to outputs (e.g. having a particular collection of Twitter bots follow a human to trick the platform’s spam detection model into falsely thinking the human victim is a bot).

These sorts of data attacks (or “perturbations”) were the subject of Dr. Stephan Günnemann’s presentation on Adversarial Robustness of Machine Learning Models for Graphs. In this talk, Dr. Günnemann introduced a principled approach for injecting “unnoticeable” perturbations into graph data which significantly degrade the performance of all tested graph models (thus “poisoning” the model). One of the most interesting findings was that while a graph model typically outperforms a non-graph baseline, a poisoned graph model performs much worse than that baseline. In other words, it’s better to use a “dumb” non-graph model than a poisoned graph model (point being that since you can’t know whether a graph model has been poisoned, you may be better off without it).
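To make the idea of a graph perturbation concrete, here is a minimal sketch that toggles a budgeted number of edges in an adjacency matrix. The random flips are purely illustrative (and the function name and budget parameter are my own); the attacks in the talk choose their flips in a principled way to maximize damage while staying unnoticeable.

```python
import numpy as np

def flip_edges(adj, budget, seed=None):
    """Toggle `budget` randomly chosen edges of an undirected graph.

    `adj` is a dense 0/1 adjacency matrix. Random flips are only a
    stand-in for the principled, "unnoticeable" perturbations described
    in the talk; the point is that a poisoning attack touches the data,
    not the model.
    """
    rng = np.random.default_rng(seed)
    adj = adj.copy()
    n = adj.shape[0]
    # candidate node pairs in the upper triangle (keeps the graph undirected)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    for idx in rng.choice(len(pairs), size=budget, replace=False):
        i, j = pairs[idx]
        adj[i, j] = adj[j, i] = 1 - adj[i, j]
    return adj
```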

Günnemann teased further work in securing graph models against such poisoning, based on certification and other techniques.

SHIELD Appraisal

One of the best-studied areas of defense against model attacks is defense for computer vision (CV) models (indeed, the canonical example of an attack comes from CV: a model predicts that an image contains a panda, but after human-imperceptible perturbations are added, its prediction changes to a gibbon).
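That panda-to-gibbon example comes from the Fast Gradient Sign Method (FGSM) line of work; the talk itself isn’t about FGSM, but a quick sketch makes “human-imperceptible perturbation” concrete. The model, inputs, and epsilon value below are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def fgsm_perturb(model, x, y, epsilon=0.007):
    """Nudge each pixel by +/- epsilon in the direction that increases the
    loss. The result usually looks identical to a human but can flip the
    model's prediction (panda -> gibbon in the canonical example).
    """
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)   # loss w.r.t. the true labels
    loss.backward()
    x_adv = x + epsilon * x.grad.sign()   # one signed-gradient step
    return x_adv.clamp(0, 1).detach()     # keep pixels in a valid range
```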

This talk, by Nilaksh Das, reviewed SHIELD: an ensemble of defenses against these attacks. SHIELD’s innovation is its separation of the defense task from the inference task; the decoupling seems to allow it to protect a broader collection of models. Code and results can be found on GitHub.
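SHIELD’s defenses are built around JPEG compression applied to the input before it reaches the classifier, which is what makes the defense separable from inference. The single fixed-quality re-compression below is only a simplified illustration of that idea (the real system uses an ensemble of randomized compression levels; see the repo for details).

```python
import io
import numpy as np
from PIL import Image

def jpeg_squeeze(img, quality=75):
    """Re-encode an image through JPEG before handing it to the model.

    `img` is an (H, W, 3) uint8 array. The lossy round-trip tends to wash
    out small adversarial perturbations while leaving the semantic content
    intact, and it requires no change to the downstream classifier.
    """
    buf = io.BytesIO()
    Image.fromarray(img).save(buf, format="JPEG", quality=quality)
    buf.seek(0)
    return np.asarray(Image.open(buf))
```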

KiloGrams

This presentation, given by Booz Allen’s own Dr. Edward Raff, discussed recent innovations in malware detection. Raff presented a highly accurate top-k n-gram estimation algorithm with time complexity proportional to a chosen budget parameter L rather than the gram-length n. This enables the use of very large values of n (grams up to 1024 bytes in size were tested, hence KiloGrams).

The given example use case was in malware detection, where the raw individual bytes of a potentially malicious executable form the grams of the language model. The need for security analysts to work with longer, more meaningful grams motivated the creation of KiloGrams. Indeed, larger n-grams often contain familiar data, such as an image or recognizable function.
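A rough sketch of how a budgeted top-k n-gram count can work: count hash buckets first (memory proportional to the number of buckets, not to the number of distinct grams), then make a second pass that only materializes grams landing in the heaviest buckets. This is my simplification of the general idea, not the published KiloGrams algorithm, and the parameter names are mine.

```python
from collections import Counter

def top_k_ngrams(samples, n=8, k=1000, num_buckets=2**20):
    """Approximate top-k byte n-grams over a corpus of binaries.

    `samples` is a sequence of bytes objects (it is iterated twice).
    Memory use is governed by `num_buckets` and `k`, not by the number
    of distinct n-grams, which is what makes very large n feasible.
    """
    # Pass 1: count hash buckets instead of the n-grams themselves.
    bucket_counts = Counter()
    for data in samples:
        for i in range(len(data) - n + 1):
            bucket_counts[hash(data[i:i + n]) % num_buckets] += 1
    top_buckets = {b for b, _ in bucket_counts.most_common(k)}

    # Pass 2: keep exact counts only for grams that hash into a top bucket.
    grams = Counter()
    for data in samples:
        for i in range(len(data) - n + 1):
            gram = data[i:i + n]
            if hash(gram) % num_buckets in top_buckets:
                grams[gram] += 1
    return grams.most_common(k)
```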

Outlier Detection for Mining Social Misbehavior

Continuing the discussion on graph based methods was Dr. Neil Shah’s talk, Outlier Detection for Mining Social Misbehavior. Shah drew on past experience in academia and industry (currently Snap) to share lessons learned in the space.

In particular, Shah introduced a twist on a well known graph-based anomaly detection technique: matrix factorization. In short, factorizing the adjacency matrix of a graph can reveal community structures and anomalies alike. The problem is that the anomalies it reveals are mainly large-scale structural anomalies; some information is necessarily lost in approximate factorizations, so smaller, perhaps “one-off” anomalous events can slip through the cracks.

Shah’s contributions leverage this fact to detect these “stealth” anomalies. Shah, in essence, proposes the use of projection as a signal, that is, raising the alarm when an observation does not appear in the post-factorization, reduced-dimensionality space. The approach seems very promising, particularly as a complement to other methods.
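One simple way to operationalize “projection as a signal” (a sketch under assumptions, not Shah’s exact method): factor the adjacency matrix with a plain truncated SVD, score each node by how badly the low-rank reconstruction misses it, and treat the worst-reconstructed nodes as candidate stealth anomalies.

```python
import numpy as np

def projection_outlier_scores(adj, rank=10):
    """Per-node residuals of a rank-`rank` factorization of the adjacency
    matrix. Nodes that the community-level factorization fails to represent
    (large residual) are the ones that "slip through the cracks" and are
    worth a closer look.
    """
    adj = adj.astype(float)
    U, S, Vt = np.linalg.svd(adj, full_matrices=False)
    low_rank = U[:, :rank] @ np.diag(S[:rank]) @ Vt[:rank, :]
    return np.linalg.norm(adj - low_rank, axis=1)  # one score per node
```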

Extracting Knowledge for Adversarial Detection and Defense in Deep Learning

This was another talk in the CV defense space. The defense proposed here was to augment inference with a secondary “component” model to verify the prediction of the main model. The component model would identify sub-components of the object of interest (e.g. it could pick out wheels and a seat from a photo of a bike) and compare them to an expected list of components for the main model’s predicted object. A mismatch indicates an error or an attack by perturbation.
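As a toy illustration of that verification step (the part lists, labels, and threshold below are made up for this sketch, not taken from the talk):

```python
# Hypothetical expected-parts table; the talk's actual component
# vocabulary and matching rule are not reproduced here.
EXPECTED_PARTS = {
    "bicycle": {"wheel", "seat", "handlebar", "frame"},
    "car": {"wheel", "door", "windshield", "headlight"},
}

def verify_prediction(predicted_label, detected_parts, min_overlap=0.5):
    """Accept the main model's prediction only if the component model saw
    enough of the parts that label implies. A mismatch suggests either a
    plain misclassification or an adversarial perturbation.
    """
    expected = EXPECTED_PARTS.get(predicted_label, set())
    if not expected:
        return True  # no part list for this class, nothing to check
    overlap = len(expected & detected_parts) / len(expected)
    return overlap >= min_overlap
```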

The outlined approach was unique, but the authors didn’t provide a compelling defense for the case where the perturbations target the component model itself.

Reducing False Alarms in Cyber Attack Detection by Using Support Vector Data Description

This talk introduced a surprisingly simple anomaly detection technique for cyber security:

  1. In a target feature space, algorithmically determine the ideal number of clusters, K (if I were to attempt this overall approach, I would try out parameter-free clustering to skip this step).
  2. Do some good old-fashioned K-means.
  3. Train one Support Vector Machine (SVM) for each cluster. That is, each of the K SVMs is trained to individually infer in-ness or out-ness for a single cluster (this per-cluster boundary is the Support Vector Data Description of the title).

Under this regime, a new observation is said to be an anomaly if all the models classify it as out-group.
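Here is a minimal sketch of that pipeline, using scikit-learn’s one-class SVM as a stand-in for the paper’s Support Vector Data Description models (the two are closely related boundary methods; the parameters here are my assumptions, not the authors’ settings).

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import OneClassSVM

def fit_detectors(X, k):
    """Steps 2-3: cluster the benign training data, then fit one boundary
    model per cluster on that cluster's members."""
    labels = KMeans(n_clusters=k, n_init=10).fit_predict(X)
    return [
        OneClassSVM(kernel="rbf", gamma="scale").fit(X[labels == c])
        for c in range(k)
    ]

def is_anomaly(x, detectors):
    """Flag an observation only if every cluster's model rejects it."""
    x = np.asarray(x).reshape(1, -1)
    return all(d.predict(x)[0] == -1 for d in detectors)
```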

SVMs (with kernel tricks) can learn fairly complex decision boundaries, which makes them good candidates for detection models in cyber security.

The experimental results, frankly, were uncompelling, showing false-positive rates that would be unacceptable in a production system. However, I suspect that a more information-rich feature space could remedy this. Perhaps performing the clustering in a custom embedding space could reveal more salient spatial relationships between the cyber events of interest and improve the system’s overall precision.

In cyber, we often conceptualize “defense” in terms of literal IT systems: securing networks against intrusion, patching software vulnerabilities against exploitation, etc. As the prevalence of machine learning within cyber security grows, so too will the importance of including model defense in that conceptualization. These techniques are paving the way for the adoption of secure, robust models.
