This is the blog of Will Badart.

26 Oct 2018

- #data science
- #deep learning

Convolutional Neural Networks (CNNs) learn increasingly complex abstractions
over raw input data. For instance, in image recognition, an early layer might
learn to identify curved lines, a middle layer might learn to flag mouths, and
a layer close to the output might learn to differentiate between a smile and a
frown. (For more detail, please see this fantastic blog post by Ujjwal Karn,
esp. *§Visualizing Convolutional Neural Networks*.)

The core operation here is the *convolution* (which gives the network its
name). For the image analysis example, you can think of it as a spotlight that
scans over the image, processing only what's illuminated.
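To make the spotlight analogy concrete, here is a minimal (deliberately naive) 2D convolution in plain NumPy. The tiny image and edge-detecting kernel are made up purely for illustration:

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide a kernel over an image, taking a weighted sum at each position
    (valid padding: output shrinks by kernel_size - 1 in each dimension)."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # The "spotlight": only the patch under the kernel contributes.
            patch = image[i:i + kh, j:j + kw]
            out[i, j] = np.sum(patch * kernel)
    return out

# A vertical-edge-detecting kernel applied to a tiny two-tone image.
image = np.array([[0., 0., 1., 1.],
                  [0., 0., 1., 1.],
                  [0., 0., 1., 1.],
                  [0., 0., 1., 1.]])
kernel = np.array([[1., -1.],
                   [1., -1.]])
print(convolve2d(image, kernel))  # non-zero only where the edge passes
```

Real CNN layers are implemented far more efficiently than this double loop, but the key property is visible here: each output value depends only on a local patch of the input, so spatial relationships are preserved.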

*Architecture of a typical CNN. Source*

The key here is that a convolution maintains the spatial relationship between
the inputs. But how do we represent a **graph** in a spatially consistent way?
There are `|V|!` ways to order the indices of an adjacency matrix, and they're
all equally correct. Also, compared to something like an image, a graph is a
highly irregular (non-continuous, non-differentiable) space with breaks and
self-loops that doesn't play nicely with CNN math.
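To see the ordering problem concretely, here is a small sketch: relabeling the nodes of even a three-node path graph yields several distinct, equally valid adjacency matrices (the graph's own symmetry makes some of the 3! = 6 orderings coincide):

```python
import numpy as np
from itertools import permutations

# Adjacency matrix of a path graph: 0 - 1 - 2.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])

# Every permutation P of the node labels yields P A P^T: a different-looking
# matrix for the same graph. With |V| nodes there are |V|! such orderings.
reorderings = set()
for perm in permutations(range(3)):
    P = np.eye(3, dtype=int)[list(perm)]  # permutation matrix
    reorderings.add((P @ A @ P.T).tobytes())

print(len(reorderings))  # distinct matrices for one 3-node graph
```

An image has one canonical pixel ordering; a graph has none, so a vanilla convolution has nothing consistent to scan over.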

Several key approaches have been proposed in the last few years to address
these issues. The first generation of solutions amounts to creating an
*N*-dimensional spectral embedding of the graph and performing *N*-dimensional
convolutions over it with existing techniques. Unfortunately, this is very
expensive: `O(|V|^3)`.

Two key papers in the last two years have sought to answer this: Defferrard et al. (NIPS 2016) and Kipf and Welling (ICLR 2017). The first introduces a fast spectral filtering technique; the second, which we'll be exploring, introduces a new forward propagation rule for GCNs.

*Architecture of a typical GCN. Source*

I'll spare you the math, but Kipf and Welling's propagation rule solves the
spatial invariance issue by symmetrically normalizing the adjacency matrix
against the graph's degree matrix in the computation. By adding a
representation of the graph structure into the mix (as a normalized adjacency
matrix), you allow each layer to learn not just the features of individual
nodes, but the features of nodes and their neighbors (the deeper the network,
the larger the neighborhood). The network, by default, produces *Z*, a
*|V| × F* feature matrix, where *F* is the number of output features per node.
Graph-level outputs can be produced by introducing pooling.
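As a sketch of that propagation rule, here is a single GCN layer in NumPy: self-loops are added to the adjacency matrix, which is then symmetrically normalized by the degree matrix before mixing node features. The toy graph, one-hot features, and random weights are made up for illustration:

```python
import numpy as np

def gcn_layer(A, X, W):
    """One GCN propagation step (after Kipf & Welling):
    H = ReLU(D^{-1/2} (A + I) D^{-1/2} X W)."""
    A_hat = A + np.eye(A.shape[0])            # add self-loops
    d = A_hat.sum(axis=1)                     # degrees of the augmented graph
    D_inv_sqrt = np.diag(d ** -0.5)
    A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt  # symmetric normalization
    return np.maximum(A_norm @ X @ W, 0)      # ReLU

# Toy path graph 0 - 1 - 2 with one-hot node features, mapping to F=2 outputs.
rng = np.random.default_rng(0)
A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
X = np.eye(3)
W = rng.standard_normal((3, 2))
Z = gcn_layer(A, X, W)
print(Z.shape)  # (|V|, F) = (3, 2)
```

Because `A_norm @ X` averages each node's features with its neighbors', stacking two such layers lets information flow two hops, three layers three hops, and so on: exactly the growing-neighborhood behavior described above.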

The paper goes into detail to prove that this operation is in fact a close approximation of the familiar (expensive) spectral graph convolution.

As part of my work with the Cyber AI team, I sought to teach a GCN to identify anomalous activity at the network level. The idea my team came up with was that if a GCN could learn the “typical” structure of the network and perform a classification task on it, then a drop in classification performance could indicate a significant structural change in the network and, by extension, an anomalous event or breach.

The implementation I started with and rolled forward did so by learning to
classify the nodes belonging to the main *core* of the graph. (In short, a
*k*-core of a graph is a maximal subgraph whose nodes all have degree *k* or
more, and we define the *main* core to be the *k*-core with the largest *k*.
In a sufficiently large graph, there usually aren't ties.)
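To illustrate the definition (this is a toy sketch, not the implementation used in the project), here is a main-core computation by iterative peeling on a hypothetical four-node graph:

```python
def k_core(adj, k):
    """Maximal subgraph where every node has degree >= k, found by
    repeatedly peeling off nodes of insufficient degree."""
    nodes = set(adj)
    changed = True
    while changed:
        changed = False
        for n in list(nodes):
            if sum(1 for m in adj[n] if m in nodes) < k:
                nodes.discard(n)
                changed = True
    return nodes

def main_core(adj):
    """The k-core with the largest k for which the core is non-empty."""
    k, core = 0, set(adj)
    while True:
        nxt = k_core(adj, k + 1)
        if not nxt:
            return k, core
        k, core = k + 1, nxt

# A triangle (0, 1, 2) with a pendant node 3: the main core is the triangle.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(main_core(adj))  # (2, {0, 1, 2})
```

The pendant node falls out at *k* = 2, and no node of the triangle survives *k* = 3, so the triangle is the main (2-)core.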

My work automated the process of training a GCN on the network data in one window of time and evaluating that GCN on the network data of another slice (again, the hypothesis being that a dip in model performance could be an indicator of compromise).

Ultimately, I found no evidence to support the hypothesis stated above, that this approach could be used for intrusion detection. However, being able to identify and drill down into structural changes in the network can still support situational awareness (for instance, as a network health monitor, and to detect when weekends and holidays roll around 😉).

The initial implementation from our NVIDIA partners sought to produce graph-level insight, but perhaps this could be a more productive model at the node-level; removing or replacing the pooling operation in the GCN architecture we used could provide more interesting node-level insights.
