26 Oct 2018
Convolutional Neural Networks (CNNs) are used for learning incrementally complex abstractions over raw input data. For instance, in image recognition, an early layer might learn to identify curved lines, a middle layer may learn to flag mouths, and close to the output, a layer might learn to differentiate between a smile and a frown. (For more details, please see this fantastic blog post by Ujjwal Karn, esp. §Visualizing Convolutional Neural Networks.)
The core operation here is a convolution (which gives its name to the network) which, for the image analysis example, you can think of as a spotlight which scans over the image, processing only on what’s illuminated.
Architecture of a typical CNN. Source
The key here is that a convolution maintains the spacial relationship between
the inputs. But how do we represent a graph in a spacially consistent way?
|V| ways to order the indices of an adjacency matrix, and they’re
all correct. Also, compared to something like an image, a graph is a highly
irregular (non-continuous, non-differentiable) space with breaks and
self-loops, that doesn’t play nice with CNN math.
Several key approaches have been proposed in the last few years to solve these
issues. The first generation of solutions amount to the creation of an N
dimensional spectral embedding
of the graph, and performing N dimensional convolutions over that with the
existing techniques. Unfortunately, this is very expensive (
Two key papers in the last 2 years have sought to answer this, Defferrard et al (NIPS 2016) and Kipf and Welling (ICLR 2017). The first introduces a fast spectral filtering technique, and the second, which we’ll be exploring, introduces a new forward propagation rule for GCNs.
Architecture of a typical GCN. Source
I’ll spare you the math, but Kipf and Welling’s propagation rule solve the spacial invariance issue by symmetrically normalizing the adjacency matrix against the graph’s degree matrix in the computation. By adding the representation of the graph structure into the mix (as a normalized adjacency matrix) you allow each layer to learn not just the features of individual nodes, but the features of nodes and their neighbors (the deeper the network, the larger the neighborhood). The network, by default, produces Z, an |V| x F feature matrix where F is the number of output features per node. Graph-level outputs can be produced by introducing pooling.
The paper goes into detail to prove that this operation is in fact a close approximation of the familiar (expensive) spectral graph convolution.
As part of my work with the Cyber AI team, I sought to teach a GCN to identify anomalous activity at the network level. The idea my team came up with was that if a GCN could learn the “typical” structure of the network and perform a classification task on it, then a drop in classification performance could indicate a significant structural change in the network and, by extension, an anomalous event or breach.
The implementation I started and rolled with did so by learning to classify the nodes belonging to the main core of the graph (in short, a k-core of a graph is a maximal subgraph that contains nodes of degree k or more, and we define the main core to be the k-core with the largest k. In a sufficiently large graph, there aren’t usually ties).
My work automated the process on training a GCN on the network data in one window of time and evaluating that GCN on the network data of another slice (again, the hypothesis being that a dip in model performance could be an indicator of compromise).
Ultimately, I found no evidence to support the above stated hypothesis, that this approach could be used as intrusion detection. However, being able to identify and drill down into structural changes in the network can still support situational awareness (for instance, as a network health monitor, and to detect when weekends and holidays roll around 😉).
The initial implementation from our NVIDIA partners sought to produce graph-level insight, but perhaps this could be a more productive model at the node-level; removing or replacing the pooling operation in the GCN architecture we used could provide more interesting node-level insights.
My GPG Identity
Previous post: Fast DataFrame Processing with Vectorized Functions