Running Artificial Neural Nets on an AVR

#1

Ever since I started working with microcontrollers, I've been on a continuous quest to push the limits of what you can do with a small, 8-bit CPU running at 20MHz: a Mandelbrot renderer, a full web/file server, a 3D raycasting engine, an N-body simulation. But I have the feeling I won't be topping this one anytime soon. I've basically succeeded in creating a bit of C code to train and run large Artificial Neural Networks....that runs on my AtMega1284p.

In this post, I'm going to talk about the hardware required, give a (hopefully) brief explanation of how the C code works, and lastly train a Neural Net from scratch on my AtMega1284p to recognize images of handwritten digits, to prove that this does indeed work. (I feel like I should explain what Artificial Neural Networks even are, but I'd be here all day if I tried. Instead, I'm just going to say that if you have any questions after reading this post, feel free to ask them and I'll do my best to answer them.)

Anyways, let's get into it.

 

The Hardware

Now, it is obvious that the 16KB of internal SRAM in my AtMega1284p isn't going to be nearly enough for running Neural Nets. In fact, getting enough memory was such a big problem that it kept me from even starting this project for a long time. 32KB SRAM ICs just won't cut it, and chaining 40 of them together obviously wasn't a good idea either. Luckily, a few weeks ago, while buying components for another project, I stumbled across the single IC that made this project possible: the Lyontek LY68L6400SLIT. A single 8-pin IC that provides 8 Megabytes of fast SRAM (up to about 144 Mbit/s read/write) through a simple SPI port. And best of all, it only costs 80 cents apiece. So that was my memory problem solved. The only obstacle was that the IC comes in an SMD package and runs at 3.3 Volts, but a combination of an SMD-to-THT adapter and a simple level converter took care of that pretty easily too. One tip though: don't be like me and think that resistor voltage dividers are good enough level converters. They work fine as long as the IC is the only device on the SPI bus, but once I added another device, things started getting weird. Let's just say that my hardware setup still isn't 100% stable to this day.
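For anyone who wants to try one of these chips, here's a minimal sketch of what a read transaction looks like. The opcodes (0x03 for read, 0x02 for write) and the 24-bit, MSB-first address should match the standard serial-RAM command set in the datasheet (do check before trusting this), and spi_xfer(), cs_low() and cs_high() are placeholders for whatever SPI and chip-select routines you already have:

#include <stdint.h>

extern uint8_t spi_xfer(uint8_t b);       /* your single-byte SPI transfer */
extern void cs_low(void), cs_high(void);  /* your chip-select control */

#define CMD_READ 0x03                     /* standard serial-RAM read opcode */

void sram_read(uint32_t addr, uint8_t *buf, uint16_t len)
{
    cs_low();
    spi_xfer(CMD_READ);
    spi_xfer((uint8_t)(addr >> 16));      /* 24-bit address, MSB first */
    spi_xfer((uint8_t)(addr >> 8));
    spi_xfer((uint8_t)addr);
    while (len--)
        *buf++ = spi_xfer(0xFF);          /* clock out dummy, read data byte */
    cs_high();
}

A matching sram_write() is the same transaction with opcode 0x02 and the data bytes clocked out instead of in.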

 

Next, I needed a place to store my datasets; after all, I was going to be training a Neural Net. An SD card was the obvious choice. Not much to say here.

 

The Software

This is where things get interesting. Now, this wasn't my first time writing code to train a Neural Net, and I know the maths behind it like the back of my hand, but the fact that I needed to store literally everything in external memory made things a lot more difficult. I also ended up restricting myself to implementing only simple "Dense" layers. Theoretically, one could implement Convolutional layers as well, but those give me a headache even under normal circumstances, and they're not required for what I wanted to accomplish for now, so I skipped them. That left me with just the Dense layers. Luckily, they boil down to a matrix-vector multiplication, a vector-vector addition and the application of a simple mathematical function to the result, and those operations are conveniently easy to implement memory caches for. With the AtMega1284p's 16KB of internal SRAM, I was able to cache entire vectors and rows of matrices, which provides an insane speedup at the cost of 70% of the MCU's SRAM. That leaves just enough for....well, everything else.
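To give a feel for the shape of this, here's a rough sketch of a dense-layer forward pass that caches one weight-matrix row in internal SRAM at a time. This is illustrative rather than my actual code: sram_read() is the external-memory fetch from the hardware section, the dimensions are examples, and the ReLU at the end is just one possible choice for that "simple mathematical function":

extern void sram_read(uint32_t addr, uint8_t *buf, uint16_t len);

static float row_cache[784];   /* one weight-matrix row in internal SRAM */

/* out = f(W*x + b), with W and b living in external SRAM */
void dense_forward(uint32_t w_addr, uint32_t b_addr,
                   const float *x, float *out,
                   uint16_t n_in, uint16_t n_out)
{
    for (uint16_t r = 0; r < n_out; r++) {
        /* cache row r of the weight matrix */
        sram_read(w_addr + (uint32_t)r * n_in * sizeof(float),
                  (uint8_t *)row_cache, n_in * sizeof(float));
        float acc;
        sram_read(b_addr + (uint32_t)r * sizeof(float),
                  (uint8_t *)&acc, sizeof(float));   /* start from the bias */
        for (uint16_t c = 0; c < n_in; c++)
            acc += row_cache[c] * x[c];              /* dot product */
        out[r] = (acc > 0.0f) ? acc : 0.0f;          /* ReLU, as an example */
    }
}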
I also had to implement backpropagation, which basically means writing a bit more code that computes the derivatives of the needed operations, but by that point I was getting familiar enough with the SRAM IC to finish this step quite easily.
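In case it helps anyone picture it: for a dense layer, the derivatives work out to dW[r][c] = delta[r] * x[c], db[r] = delta[r], and dx = W-transposed times delta, where delta is the error signal coming back from the layer above. Here's a sketch of the last (and trickiest) of the three, reusing the same row cache as above; again purely illustrative:

/* dx[c] = sum over r of W[r][c] * delta[r], i.e. W^T * delta,
 * fetching one row of W from external SRAM at a time */
void dense_backward_input(uint32_t w_addr, const float *delta,
                          float *dx, uint16_t n_in, uint16_t n_out)
{
    for (uint16_t c = 0; c < n_in; c++)
        dx[c] = 0.0f;
    for (uint16_t r = 0; r < n_out; r++) {
        sram_read(w_addr + (uint32_t)r * n_in * sizeof(float),
                  (uint8_t *)row_cache, n_in * sizeof(float));
        for (uint16_t c = 0; c < n_in; c++)
            dx[c] += row_cache[c] * delta[r];
    }
}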

Then I added a few data structures to keep track of network parameters in the external SRAM and to actually assemble entire Neural Nets out of the Dense layers I had just implemented (at this point, I'd just like to mention how ridiculously weird it is to have to use 32-bit pointers for memory addressing on an 8-bit CPU).
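Something along these lines; the exact field layout here is simplified rather than copied from my code, but it shows the idea: the "pointers" into the external chip are plain 32-bit integers, not real C pointers.

typedef struct {
    uint16_t n_in, n_out;    /* layer dimensions */
    uint32_t w_addr;         /* weight matrix: byte address in external SRAM */
    uint32_t b_addr;         /* bias vector:   byte address in external SRAM */
} dense_layer_t;

typedef struct {
    uint8_t       n_layers;
    dense_layer_t layers[4]; /* small fixed maximum, kept in internal SRAM */
} network_t;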

Lastly, I needed some code to actually train a Neural Network using Gradient Descent, given a dataset and a cost function. This is where I had the most trouble. Let's just say that debugging something this complex on a microcontroller is almost impossible. I still don't know whether my code works 100% as intended, but it works well enough, so I guess I'll be fine with that.
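In outline, it's plain stochastic gradient descent. Something like this sketch, where load_sample(), forward(), cost(), backward() and apply_gradients() are stand-ins for the real routines and the buffer sizes are just the MNIST ones:

extern void  load_sample(uint32_t i, float *in, float *target); /* from SD card */
extern void  forward(network_t *net, const float *in, float *out);
extern float cost(const float *out, const float *target);  /* e.g. squared error */
extern void  backward(network_t *net, const float *out, const float *target);
extern void  apply_gradients(network_t *net, float lr);

static float in[784], out[10], target[10];

void train(network_t *net, uint32_t n_samples, uint16_t epochs, float lr)
{
    for (uint16_t e = 0; e < epochs; e++) {
        for (uint32_t i = 0; i < n_samples; i++) {
            load_sample(i, in, target);      /* fetch one training sample */
            forward(net, in, out);           /* run the net */
            float err = cost(out, target);   /* handy for progress reports */
            (void)err;
            backward(net, out, target);      /* compute all the derivatives */
            apply_gradients(net, lr);        /* one gradient-descent step */
        }
    }
}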

The final result is a long piece of code that is easy to use on the surface but an absolute MESS underneath. But I was finally ready to test it.

 

The Final Test

To test what I had just created, I implemented the "Hello, World!" of Neural Networks: a net that solves the MNIST dataset. MNIST is a collection of 10000 grayscale, 28-by-28-pixel images of handwritten digits. To solve the dataset, you have to train an Artificial Neural Network to take any of these images as its input and output the digit the image represents. The net is given the 10000 images plus the correct answers and has to use them to learn how to solve the problem. Afterwards, the net is tested on images it hasn't seen before, and its accuracy is measured as the number of correctly identified images divided by the total number of images.

My setup was a relatively small net using only two Dense layers with 25 hidden units in between. On my PC, this trains to 80%+ accuracy in about half a second. In comparison, and as a big surprise to no one, the AtMega1284p is abysmally slow. I only had enough time to let it run for about 20 - 30 minutes, and in that time it completed about 1% of the work my PC did in half a second. But when I tested the now partially-trained net from the microcontroller, it showed an accuracy of 20%, which may not sound like much, but the highest accuracy I ever managed to achieve using a random number generator to classify the images was 5%. So....it worked! I had successfully trained an Artificial Neural Network on my AtMega1284p.

 

In Conclusion

Now, at this point you may be wondering how all of this could be useful. Well, even though training neural networks on an AtMega is quite slow, running them isn't: the net from my test classified the 100 images in the test set in just a few seconds. This means it would be entirely possible to import a pre-trained net and use it for some purpose. For example, it should be perfectly possible to connect some kind of camera to an SMD version of the microcontroller, load a Neural Net trained for image recognition, and create the world's smallest and cheapest live image recognition system. I do believe there's a lot of potential here, and I will be trying to actually create something useful with this in the near future.

 

The Code

Is available in this zip file. It's still really messy, and I wouldn't trust anything inside trainer.c at ALL (I'm 100% sure it'll break with anything that isn't my test network). I've also included the dataset used for the final test (data.dat).
https://drive.google.com/open?id...

#2

I got one of these recently:

 

https://m5stack.com/products/esp...

 

 

#3

That's real dedication to achieve an incredible result!   Who needs a 700MHz chip with 3Gb and wasteful code?  Of course, then it might be too straightforward & less "fun".  The challenge is the adventure.

When in the dark remember-the future looks brighter than ever.   I look forward to being able to predict the future!

#4

Nice project.

 

Perhaps you could add a paragraph with a description of the algorithm the system uses to "learn".

 

JC

#5

TheGhastModding wrote:
I only had enough time to let it run for about 20 - 30 minutes, and in that time, it completed about 1% as much work as my PC did in half a second.
So 2,000 - 3,000 minutes for a full training?  Hey that's only 2 days ;-)

 

Still, I think it's an incredible result.  Well done!

"Experience is what enables you to recognise a mistake the second time you make it."

"Good judgement comes from experience.  Experience comes from bad judgement."

"Wisdom is always wont to arrive late, and to be a little approximate on first possession."

"When you hear hoofbeats, think horses, not unicorns."

"Fast.  Cheap.  Good.  Pick two."

"We see a lot of arses on handlebars around here." - [J Ekdahl]

 

#6

I'm with JoeyMorin on this.  Only two days?  And it's 'fire and forget'?  So just set it up, let it rip, and take time off.  Go to the art museum or something.  Make drinks with little umbrellas in them and hang a hammock to drink them in.  Have an LED or something illuminate when it's done.  It's not really the length of time that's the problem - it's how long you have to babysit the darn thing.  S.

 

PS - The days of 32k SRAM chips have passed.  See, say, one of these:

https://www.digikey.com/product-...

granted, though, it'll need a LOT of pins to address, and isn't USD$0.80 - More like $7.00  S.

#7

TheGhastModding, your project is fascinating.  Think I'll have a look at Neural Networks when I run out of projects.

 

Thanks for your post 

#8

Just a comment regarding the serial RAM chip - it is DRAM, not SRAM! And you can only hold CE low for a maximum of 8us, which makes it a bit tricky with the AVR, as the SPI data rate is slow. Violate this and the memory becomes flaky.

#9

Kartman wrote:
And you only hold CE low for a maximum of 8us which makes it a bit tricky with the AVR as the spi data rate is slow. Violate this and the memory becomes flaky.

That....explains a lot, actually. Especially since I was only running my AtMega at 16MHz, as I didn't have any faster oscillators on hand at the time. I'm guessing that going up to 20MHz should help with my memory problems, then. Thanks for the tip, I would probably never have caught that myself.
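For anyone else who runs into this: the obvious workaround would be to split long transfers into short bursts, so /CE is released again before the 8us limit is up. A sketch (the burst length here is illustrative; the real maximum has to be worked out from your SPI clock plus the command-and-address overhead):

#define BURST_BYTES 4   /* example: opcode + 3 address bytes + 4 data bytes
                         * is 8 bytes on the wire, i.e. ~8us at SCK = 8MHz */

void sram_read_chunked(uint32_t addr, uint8_t *buf, uint16_t len)
{
    while (len > 0) {
        uint16_t n = (len > BURST_BYTES) ? BURST_BYTES : len;
        sram_read(addr, buf, n);   /* /CE goes high again after each burst */
        addr += n;
        buf  += n;
        len  -= n;
    }
}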

 

Scroungre wrote:

I'm with JoeyMorin on this.  Only two days?  And it's 'fire and forget'?  So just set it up, let it rip, and take time off.

I may just do that if I can actually improve the memory stability. Also, it wouldn't really take two days. Reaching 80%+ accuracy happens much more quickly than that. When I ran the program on my PC, I just let it do its thing for half a second, but past that 80% mark it also hits the beginning of diminishing returns in terms of work done versus accuracy gained. So training beyond that point with such a small network is actually just pointless.

 

DocJC wrote:

Perhaps you could add a paragraph with a description of the algorithm the system uses to "learn".

Going straight to the hard questions, I see. I guess the easiest way to explain it is that the training algorithm goes through the samples in the dataset one by one, asks the network to classify each sample, compares the network's result to the correct result by calculating the difference (called the "error") between the two, and then adjusts the Neural Net's parameters in a way that reduces that error. It does so by computing the derivative of the error with respect to each of the network's parameters, which tells it exactly how to change them. Do this enough times and, eventually, the net becomes better at classifying images.
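Or, stripped down to a sketch of just the update step (not literally my code): every parameter moves a small step against its own derivative, with the step size set by a learning rate.

/* one gradient-descent step: nudge each parameter against its derivative */
void gradient_step(float *params, const float *grads,
                   uint32_t n_params, float lr)   /* e.g. lr = 0.01f */
{
    for (uint32_t p = 0; p < n_params; p++)
        params[p] -= lr * grads[p];
}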

#10

What you have is a PID loop stability problem*. 

 

It's not unlike a servo motor (an industrial one, not one of those RC gizmos).  You have an error.  You need to correct it, but not too fast, and not too far, and if you're off by a bit at the end you need to push a little harder.

 

Figuring out the co-efficients is up to you.

 

S.

 

* Proportional, Integral, and Derivative.  And I'd say that at the moment, it's woefully overdamped.  Takes effin' forever to settle.  S.
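For reference, the textbook form of such a control loop, as a generic sketch (nothing to do with the OP's code):

typedef struct { float kp, ki, kd, integral, prev_err; } pidctl_t;

/* one PID update: error = setpoint - measured, dt = time step */
float pid_step(pidctl_t *c, float error, float dt)
{
    c->integral += error * dt;                  /* I: accumulated error */
    float deriv  = (error - c->prev_err) / dt;  /* D: rate of change */
    c->prev_err  = error;
    return c->kp * error + c->ki * c->integral + c->kd * deriv;
}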

#11

Scroungre wrote:
See, say, one of these:
versus MRAM.

Mechatronics (an artificial arm, training for production-line automation to replace one's arm) and automotive are some application areas for ANNs, though the electrical power will fail eventually (loss of off-line power, defective battery backup, a severe load dump shorting the voltage regulator's pass transistor, a lightning surge resulting in a brown-out).

MRAM has ECC built in, though it's significantly more expensive (4x) than SRAM:

MR2A08A | Everspin (4Mb MRAM, 8b wide)

MR2A08ACYS35 Everspin Technologies | Mouser

 

XMEGAs have a 16MB data space; the XMEGA A1U has an EBI.

 

"Dare to be naïve." - Buckminster Fuller

#12

I was just mentioning bigger SRAM chips.

 

The idea that you can just cheerfully swap SRAM (volatile) for MRAM (or any other non-volatile memory) in mechanical systems just fills me with horror.

 

If you are depending on the non-volatile stuff to tell you where you are on power restoration, I ask you to consider what happens if the mechanical bits get pushed around while the power is off?

 

And then you turn the power back on.

 

Do they (the mechanical bits) snap back to where they used to be?  Do they move slowly to where they used to be?  Crashing through all kinds of structure on the way?  What trajectory are they going to use?  What are you going to use for a 'fail-safe' position here, when you don't know where in the sequence the power failed?  And that the mechanical parts have been moved by outside influences?

 

Power will fail, sooner or later.  Depending on non-volatile memories to tell you where the machinery is will fail a lot faster, and very very badly.  Please don't do that.  Thanks.  S.