In this paper, accepted at NeurIPS 2020, we introduce the Neural Power Unit (NPU), which extrapolates better than conventional networks and helps to build more transparent models.

Schematic of the internals of the Neural Power Unit.

Conventional neural networks can approximate simple arithmetic operations, but they fail to extrapolate beyond the range of numbers seen during training. Neural arithmetic units address this poor generalization by assuming that the underlying problem is composed of arithmetic operations. As a toy example, consider training a standard network of Dense layers to learn one of the functions below. Within the training range the Dense network learns the task perfectly, but it cannot generalize beyond this range. In contrast, our Neural Power Unit (NPU) easily extrapolates the first two functions under the assumption that they are compositions of arithmetic operations. Unfortunately, the NPU fails on the third, periodic function, but we are working on that. ;)

Training classic (Dense) networks with values from the training range (between the gray bars) results in poor extrapolation. With the NPU we can extrapolate better. Periodic functions are still problematic but we will address this in future work.
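To make the failure mode concrete, here is a minimal, self-contained sketch that trains a Dense network on a multiplication task inside a fixed range and evaluates it far outside of it. It uses PyTorch, and the architecture, ranges, and training budget are our own illustrative choices, not the exact setup from the paper:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy target: f(x1, x2) = x1 * x2, one of the arithmetic tasks discussed above.
def target(x):
    return (x[:, 0] * x[:, 1]).unsqueeze(1)

# A plain Dense (MLP) baseline.
model = nn.Sequential(nn.Linear(2, 64), nn.ReLU(),
                      nn.Linear(64, 64), nn.ReLU(),
                      nn.Linear(64, 1))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# Training data comes only from the range [-2, 2].
x_train = torch.rand(4096, 2) * 4 - 2
for step in range(3000):
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(x_train), target(x_train))
    loss.backward()
    opt.step()

# Inside the training range the fit is good; far outside it degrades badly.
x_in = torch.rand(1000, 2) * 4 - 2    # samples in [-2, 2]
x_out = torch.rand(1000, 2) * 4 + 4   # samples in [4, 8]
with torch.no_grad():
    print("interpolation MSE:", nn.functional.mse_loss(model(x_in), target(x_in)).item())
    print("extrapolation MSE:", nn.functional.mse_loss(model(x_out), target(x_out)).item())
```

The exact numbers depend on the seed and architecture, but the extrapolation error should come out much larger than the in-range error.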

Previous arithmetic units were either limited to operating on positive numbers (the Neural Arithmetic Logic Unit, NALU) or could only represent a small subset of arithmetic operations (the Neural Multiplication Unit, NMU). We introduce the NPU, which operates on the full domain of real numbers and can learn arbitrary power functions in a single layer. The NPU thus fixes the shortcomings of existing arithmetic units and extends their expressivity. We achieve this by using complex arithmetic inside the unit, without requiring the rest of the network to be converted to complex numbers.
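The core idea can be sketched in a few lines. For a complex weight w = w_re + i·w_im and an input written in polar form x = r·e^{ikπ} (k = 1 for negative x, 0 otherwise), the real part of x^w is exp(w_re·log r − w_im·kπ)·cos(w_im·log r + w_re·kπ). The NumPy sketch below applies this identity to a whole layer. It is a simplification of the layer described in the paper (which, among other details, adds a relevance gate on the inputs), and the function names and example weights are ours:

```python
import numpy as np

def npu_forward(x, W_re, W_im, eps=1e-7):
    """Sketch of an NPU-style forward pass for one layer.

    Computes Re[ prod_i x_i ** (W_re[j, i] + 1j * W_im[j, i]) ] for every
    output j, using the polar form x_i = r_i * exp(1j * k_i) with k_i = pi
    for negative inputs and 0 otherwise.
    """
    r = np.abs(x) + eps          # magnitudes (eps avoids log(0))
    k = (x < 0) * np.pi          # phase: pi for negative inputs, 0 otherwise
    log_r = np.log(r)
    # real part of exp((W_re + i W_im) @ (log r + i k))
    magnitude = np.exp(W_re @ log_r - W_im @ k)
    phase = W_im @ log_r + W_re @ k
    return magnitude * np.cos(phase)

def realnpu_forward(x, W, eps=1e-7):
    """RealNPU: the NPU restricted to real weights (W_im = 0)."""
    r = np.abs(x) + eps
    k = (x < 0) * np.pi
    return np.exp(W @ np.log(r)) * np.cos(W @ k)

# Example: a row with exponents (1, -1) computes x0 / x1, and a row
# (0.5, 0) computes sqrt(x0) -- including the correct sign of the result.
x = np.array([4.0, -2.0])
W = np.array([[1.0, -1.0],
              [0.5, 0.0]])
print(realnpu_forward(x, W))   # approx [-2.0, 2.0]
```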

Extrapolating Beyond The Training Range

In the figure below we compare a classic Dense network to the different arithmetic layers, including our NPU, on the task f(x,y) = (x + y, xy, x/y, √y)ᵀ.

Logarithmic error of different models learning the function f(x,y). Each model is trained with examples in the range of [0.01,2]. Bright colors indicate low error. Only our NPU performs well on all four tasks.

Each layer is trained on examples from the range [0.01, 2]. The Dense layer learns the task in this 2D range (bright regions in the heatmaps) but extrapolates poorly (dark regions). The same is true for the NALU. The NMU learns addition and multiplication perfectly but fails at division and the square root. Only the NPU is capable of learning all four tasks at the same time.
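To see why the NPU can, in principle, cover all of these operations at once, here is a small sketch with hand-picked (not learned) weights for the simplified RealNPU from above; addition is left to a subsequent linear/NAU layer:

```python
import numpy as np

def realnpu(x, W, eps=1e-7):
    # Simplified RealNPU forward pass (see the sketch above).
    r, k = np.abs(x) + eps, (x < 0) * np.pi
    return np.exp(W @ np.log(r)) * np.cos(W @ k)

# Hand-set exponents expressing three of the four target outputs exactly:
# (1, 1) -> x*y,  (1, -1) -> x/y,  (0, 0.5) -> sqrt(y).
W = np.array([[1.0,  1.0],
              [1.0, -1.0],
              [0.0,  0.5]])
xy = np.array([1.5, 0.25])
print(realnpu(xy, W))   # ~[0.375, 6.0, 0.5]
# x + y is handled by an addition unit (NAU) stacked on top of the inputs.
```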

Better Interpretability by Building More Transparent Models

Current neural network architectures are often perceived as black-box models that are difficult to explain or interpret. This becomes highly problematic when ML models are involved in high-stakes decisions, e.g. in criminal justice, healthcare, or control systems. With the NPU, we hope to contribute to the broad topic of interpretable machine learning, with a focus on scientific applications. As an example, we demonstrate the NPU's ability to identify a model that can be interpreted as an SIR (Susceptible, Infected, Recovered) model with fractional powers, a variant that has been used to fit the COVID-19 outbreak in various countries. The fractional SIR (fSIR) model is defined by a differential equation

Differential equation of the fractional SIR model.

and contains fractional powers of its variables. The numerical solution of this equation is shown in the plot below, along with predictions from a network containing a restricted variant of the NPU (the RealNPU). These predictions are obtained purely by learning from data points of the fSIR model, without any knowledge of the generating equation.

State trajectories of the fractional SIR model along with predictions from the RealNPU.
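For context, trajectories like these can be generated by numerically integrating the fSIR equations. The sketch below uses SciPy and an assumed form of the interaction term, (S·I) raised to a fractional power κ; the exact exponent structure and the constants used in the paper may differ, and the values of β, γ, κ below are purely illustrative:

```python
import numpy as np
from scipy.integrate import solve_ivp

# Illustrative constants; not the values used in the paper.
beta, gamma, kappa = 0.3, 0.1, 0.5

def fsir(t, u):
    # Assumed fractional-SIR form: interaction term (S*I)**kappa.
    S, I, R = u
    interaction = beta * (S * I) ** kappa
    return [-interaction, interaction - gamma * I, gamma * I]

# Integrate and sample the trajectories -- this kind of data (states only,
# no equations) is what the RealNPU model is trained on.
sol = solve_ivp(fsir, t_span=(0.0, 50.0), y0=[0.99, 0.01, 0.0],
                t_eval=np.linspace(0.0, 50.0, 200))
S, I, R = sol.y
print(S[-1], I[-1], R[-1])
```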

By examining the parameters of the model that produced the predictions above, we can recover an expression that is very close to the original fSIR equations:

Reading from right to left, we can see two fractional powers in the first row of the NPU (parameters 0.62 and 0.57). They correspond to the fractional powers of the interaction term between I and S. The second layer (NAU, Neural Addition Unit) looks very similar to the parameter matrix of the fSIR equations. For example, the parameters in the left column (-0.12 and 0.11) approximately represent the β-parameter (once with negative and once with positive sign).

This model is of course not exact and just a proof of concept. We caution anyone attempting equation discovery in the real world to validate their models much more thoroughly. However, this simple approach already shows what is possible with neural arithmetic layers: they not only help with extrapolation on tasks containing arithmetic operations, but also help to build transparent models.
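To make the read-out explicit: for positive inputs, a RealNPU layer with exponent matrix W followed by a NAU with weight matrix A represents functions of the form (generic notation, not the paper's exact symbols)

$$ \hat{y}_j = \sum_k A_{jk} \prod_i x_i^{W_{ki}}, $$

so with inputs (S, I, R), an NPU row with entries close to (0.62, 0.57, 0) reads as a product of S and I raised to powers of roughly 0.6, and the NAU column (-0.12, 0.11) feeds this term with opposite signs into the equations for Ṡ and İ, mirroring the ∓β structure of the SIR model.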

Cite as

Heim, N., Pevný, T. and Šmídl, V., 2020. Neural Power Units. arXiv preprint arXiv:2006.01681.