date Jul 2, 2020
authors Pete Warden
reading time 6 mins

Deep learning

But what has always struck me is the way a system of simple elements can come together to create a subtle and complex thing, and deep learning really takes this to new heights… There’s some magic in this idea: simple algorithms running on tiny computers made from sand, metal and plastic can embody a fragment of human understanding.


… but the most exciting was the work that the OK Google team were doing. They were running neural networks that were just 14 KB in size!… The team had to use the DSPs for this job because the main CPU was powered off to conserve battery, and these specialised chips use only a few mW of power.

New technology…

there was a whole new class of products emerging, with the key characteristics that they used ML to make sense of noisy sensor data, could run using a battery or energy harvesting for years, and cost only a dollar or two. One term I heard repeatedly was “peel-and-stick sensors”, for devices that required no battery changes and could be applied anywhere in an environment and forgotten.

Practical application:

This is where the idea of TinyML comes in. Long conversations with colleagues across the industry and academia have led to the tough consensus that if you can run a neural network model at an energy cost of below 1mW, it makes a lot of entirely new applications possible… it means a device running on a coin cell battery has a lifetime of a year.

With or without a power source

This is usually not a problem in automotive or robotics applications, since the mechanical parts demand a large power source themselves, but it does make it touch to use these platforms for the kinds of products I’m most interested in, which need to operate without a wired power supply.


By contrast, the cheapest 32-bit microcontrollers cost much less than a dollar each. This low price has made it possible for manufacturers to replace traditional analog or electromechanical control circuits with software-defined alternatives for everything from toys to washing machines.

Recent leaps and changes

A big step forward came when Arduino introduced a user-friendly integrated development environment (IDE) along with standard0zed hardware. Since then, 32-bit CPUs have become the standard, largely thanks to Arm’s Cortex-M series of chips.

Unchanging skills

We also aim to focus on skills like debugging, model creation, and developing an understanding of how deep learning works, which will remain useful even as the infrastructure you’re using changes.

What is machine learning?

Fundamentally, machine learning is a technique for using computers to predict things based on past observations. We collect data about our factory machine’s performance and then create a computer program that analyses that data and uses it to predict future states.

When to use ML?

in many cases, it can be difficult to know the exact combination of factors that predicts a given state… there might be various different combinations of production rate, temperature and vibration level that might indicate a problem but are not immediately obvious from looking at the data.

ML and complexity

To create a machine learning program, a programmer feeds into a special kind of algorithm and lets the algorithm discover the rules. This means that as programmers, we can create programs that make predictions based on complex data without having to understand all of the complexity ourselves.


Key terms:

The machine learning algorithm builds a model of the system based on the data we provide, through a process we call training. The model is a type of computer program. We run the data through this model to make predictions, in a process called inference.


  1. Decide on a goal
  2. Collect a dataset
  3. Design a model architecture
  4. Train the model
  5. Convert the model
  6. Run inference
  7. Evaluate and troubleshoot


Classification is a machine learning task that takes a set of input data and returns the probability that this data fits each of a set of known classes. In our example, we might have 2 classes: “normal”, meaning that our machine is operating without issue, and “abnormal”, meaning that our machine is showing signs that it might soon break down.

Selecting data

You should always try to combine your domain expertise with experimentation when deciding whether to include data. You can also use statistical techniques to try to identify which data is significant. If you’re still unsure about including a certain data source, you can always train 2 models and see which one works best!

Labeling data

The process of associating data with classes is called labeling, and the “normal” and “abnormal” classes are our labels.

Supervised learning

This type of training, in which you instruct the algorithm what the data means during training, is called supervised learning. The resulting Classification model will be able to process incoming data and produict to which class it is likely to belong.


Deep learning models accept input and generate output in the form of tensors. For the purposes of this book, a sensor is essentially a list that can contain either numbers of other tensors; you can think of it as similar it an array.


  • Vector
  • Matrix
  • Higher-dimensional tensors
  • Scalar

Generating features from data:

  • Windowing: Since the time series data each have different intervals, it we pass in only the data available at a given moment, it might not include all of the types of data we have available… One solution to this problem might be to choose a window in time, and combine all of the data in this window into a single set of values.
  • Normalisation: One way to doing this is to calculate the mean of each feature across the dataset and subtract it from the values.

When to stop training

We generally stop training when a model’s performance stops improving. At the point that it begins to make accurate predictions, it is said to have converged.

Loss and accuracy

The loss metric gives us a numerical estimate of how far a model is from producing the expected answers, and the accuracy metric tells us the percentage of the time that it chooses the correct prediction. A perfect model would have a loss of 0.0 and an accuracy of 100%, but real models are rarely perfect.


A neural network learns to fit its behaviour to the patterns it recognises in data. If a model is correctly fit, it will produce the correct output for a given set of inputs.


When a model is underfit, it has not yet been able to learn a strong enough representation of these patterns to be able to make good predictions.


When a model is overfit, it has learned its training data too well. The model is able to exactly predict the minutiae os its training data, but it is not able to generalize its learning to data it has not previously seen.

Training, validation, testing

To understand when this is happening, we need to validate the model using new data that wasn’t used in training. It’s common to split a dataset into 3 parts: 60% training, 20% validation and 20% test.


  1. Hello world with Sine graph data