Tuesday, 4 February, 2020
Ship Surveillance & Tracking with TensorFlow 2.0: The Basic Solution
The diversity of conditions and scales in satellite images makes object detection one step harder
Given the exponential growth of images, and in particular of optical, infrared and SAR satellite images, business opportunities are growing faster than the number of data scientists who know how to handle them.
For companies used to consuming structured data, this means:
With the increase of satellites in orbit, daily pictures of most of the world are now widely available. Data sources include both free (USGS, Landviewer, ESA’s Copernicus, etc.) and commercial (Planet, Orbital Insight, Descartes Labs, URSA, etc.) entities. The competition is fierce and seems to yield very fast innovation, with companies such as Capella Space now pitching hourly updates by launching constellations of small satellites weighing only a few kg. These satellites increasingly cover more than the visible optical spectrum, notably Synthetic Aperture Radar (SAR), which offers the benefit of working independently of cloud cover and daylight. Along with the growth of other image sources, the ability to interpret image data is a key to untapped commercial opportunities.
The monitoring of human activity (fishing, drilling, exploration, cargo and passenger transport, tourism) is increasing for both governmental and commercial purposes, particularly at sea. For ship tracking in particular, satellite images offer a rich complement to baseline cooperative tracking systems such as AIS, LRIT and VMS.
Here is how we conduct this pre-processing on the fly with Keras’ ImageDataGenerator class, with the labeling done with flow_from_dataframe, all feeding later on into the fit / fit_generator API:
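A minimal sketch of such a pipeline is below; the dataframe columns, directory layout and augmentation choices are illustrative assumptions rather than the exact configuration:

```python
import pandas as pd
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Assumed label file: one row per image, with a file name and a string label
# ("ship" / "no_ship"), so that class_mode="binary" can infer the two classes.
df = pd.read_csv("train_labels.csv")

# Pre-processing and augmentation happen on the fly, one batch at a time
datagen = ImageDataGenerator(
    rescale=1.0 / 255,      # scale pixel values to [0, 1]
    horizontal_flip=True,   # ships have no preferred orientation
    vertical_flip=True,
    validation_split=0.2,   # hold out a cross-validation subset
)

# flow_from_dataframe handles the labelling from the dataframe columns
train_generator = datagen.flow_from_dataframe(
    dataframe=df,
    directory="train_images/",
    x_col="filename",
    y_col="label",
    target_size=(256, 256),
    class_mode="binary",
    batch_size=64,
    subset="training",
)

validation_generator = datagen.flow_from_dataframe(
    dataframe=df,
    directory="train_images/",
    x_col="filename",
    y_col="label",
    target_size=(256, 256),
    class_mode="binary",
    batch_size=64,
    subset="validation",
)
```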
Starting simple and iterating:
To optimise this model we need:
[Left] Illustration of a neural network optimisation problem: the goal here is to find the minimum while being blindfolded. [Right] Exploring the learning rate landscape: we want to pick the learning rates with the largest loss gradient, and stay away from divergence. Method: start at a very low LR, say 1E-7, and at each batch increase the LR slowly, up to a high LR, say 1E-2 or even 10. This is a very approximate method, as both the weights and the data change as the learning rate increases. However, it is computationally cheaper than running multiple simulations in parallel, so hopefully a reasonable stop-gap for picking a safe learning rate.
Here is the Tensorflow 2.0 implementation:
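As a minimal sketch, this learning rate range test can be expressed as a Keras callback that multiplies the learning rate at every batch and records the loss; the bounds and number of steps below are illustrative assumptions:

```python
import tensorflow as tf


class LRRangeTest(tf.keras.callbacks.Callback):
    """Geometrically increase the learning rate every batch and record the loss."""

    def __init__(self, start_lr=1e-7, end_lr=10.0, num_batches=1000):
        super().__init__()
        self.start_lr = start_lr
        # Multiplicative factor taking the LR from start_lr to end_lr in num_batches steps
        self.factor = (end_lr / start_lr) ** (1.0 / num_batches)
        self.lrs, self.losses = [], []

    def on_train_begin(self, logs=None):
        tf.keras.backend.set_value(self.model.optimizer.lr, self.start_lr)

    def on_train_batch_end(self, batch, logs=None):
        lr = float(tf.keras.backend.get_value(self.model.optimizer.lr))
        self.lrs.append(lr)
        self.losses.append(logs["loss"])
        tf.keras.backend.set_value(self.model.optimizer.lr, lr * self.factor)
```

After one short training pass with this callback, plotting the recorded losses against the learning rates on a log scale shows the divergence point; a safe learning rate sits a little below it.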
Keras’ architecture summary of the network described earlier, following a classic LeNet design (LeCun) of a series of convolutional and max pooling layers, followed by dense layers. Note that most of the weights are in the dense layer, despite limiting the image size to 256 x 256 pixels. More recent architectures have moved away from this design, and are now fully convolutional, avoiding this concentration of weights on a single layer, which, all things being equal, tends to overfit and yield lower performance.
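For illustration, here is a minimal Keras sketch of a model in that style; the filter counts, dense-layer size and dropout rate are assumptions, not the exact architecture behind the summary above:

```python
from tensorflow.keras import layers, models

# LeNet-style stack: alternating convolution and max-pooling layers,
# flattened into dense layers (where most of the weights end up).
model = models.Sequential([
    layers.Conv2D(16, (3, 3), activation="relu", input_shape=(256, 256, 3)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(32, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.5),                    # regularisation, discussed below
    layers.Dense(1, activation="sigmoid"),  # ship / no-ship probability
])

model.summary()  # prints an architecture table like the one described above
```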
Here is the basic Tensorflow / Keras code to train the model, with the parameters used:
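A minimal sketch of that training step, continuing from the generators and model above; the optimiser, learning rate and pseudo-epoch split are illustrative assumptions:

```python
from tensorflow.keras.optimizers import Adam

model.compile(
    optimizer=Adam(learning_rate=1e-3),  # picked from the learning rate range test
    loss="binary_crossentropy",
    metrics=["accuracy"],
)

# "Pseudo-epochs": each epoch only covers a fraction of the training set,
# so 25 epochs of one tenth of the data amount to roughly 2.5 full passes.
history = model.fit_generator(
    train_generator,
    steps_per_epoch=len(train_generator) // 10,
    validation_data=validation_generator,
    validation_steps=len(validation_generator),
    epochs=25,
)
```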
By showing this simple neural network our training set about 2.5 times, we reached 94.8% cross-validation accuracy. For reference, a naive classifier that always predicts the majority class would, given the class imbalance, reach 77.5%.
Note that the model is still learning, and that we aren't yet observing signs of overfitting. This is by design, due to the relatively small size of the model compared to the dataset (200k images, plus 600k from augmentation), and to the dropout layer.
Examples of misclassifications are shown below. Some cases are hard to resolve even for the human eye, while a minority still look like easy wins. It is also interesting that a fraction of the labels seem wrong: another reason to avoid overfitting.
A tested improvement to this model is to add batch normalisation between the weights and the activation: it allows the model to reach 93% accuracy in only 10 pseudo-epochs, compared to 25 pseudo-epochs without. The plateau seems the same, close to 95% (we only trained for 40 pseudo-epochs).
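A minimal sketch of where the batch normalisation layer sits, using one convolutional block as an example (the block itself is an assumption, for illustration):

```python
from tensorflow.keras import layers

# Before: convolution with a fused activation
#   layers.Conv2D(32, (3, 3), activation="relu")

# After: batch normalisation sits between the weights and the activation
conv_block = [
    layers.Conv2D(32, (3, 3), use_bias=False),  # the bias is redundant with the BN shift
    layers.BatchNormalization(),
    layers.Activation("relu"),
]
```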
Further improvements could involve giving the model more capacity, as it may currently underfit slightly, and exploiting higher-resolution images.
An issue with this architecture is that the input image size is fixed. That can be solved through a fully convolutional architecture, or through resizing and cropping the inputs to a fixed size.
In the second part of this post, we explore the former architecture, along with a deeper network.
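As a quick illustration of why a fully convolutional head lifts the fixed-size constraint (a sketch only; part 2 covers the actual architecture):

```python
from tensorflow.keras import layers, models

# No layer depends on the input height and width: global average pooling
# collapses whatever spatial grid the convolutions produce into a fixed-size vector.
fully_conv = models.Sequential([
    layers.Conv2D(32, (3, 3), activation="relu", input_shape=(None, None, 3)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.GlobalAveragePooling2D(),
    layers.Dense(1, activation="sigmoid"),
])
```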
Visualisation of the model’s attention, with localisation properties emerging even though ship locations were never labeled! Explanation in part 2 of this post.
Author: Romain Guion, Head of Signal Processing and Enrichment
For more practical tips on Data Science and Tech - follow the VorTECHsa team on Medium