Originally published on Medium.
On screen, the grey-skinned velociraptor twitches and spasms, trying to hurl itself to its feet under the command to rise. A thousand parallel iterations live in their simple digital worlds, trying to find a path to standing up. One manages, briefly, then collapses again. Hundreds of the iterations are abandoned, and the ones that came closest are spawned again into a thousand dreams of a standing robotic dinosaur. Eventually, one succeeds in lunging to its feet and staying erect. The neural net that learned to stand survives. The rest disappear into the ether.
This article continues the exploration that David Clement, Principal at Wavesine and Co-Founder of Senbionic, and I are making into machine learning via Plastic Dinosaur, a robotic velociraptor guided by neural nets. It’s a fictional exercise to introduce and play with concepts of robotics and machine learning, and to explore aspects of where machine learning is today. The first piece was about PD’s physical body. This piece is about his brains, what makes him tick and also occasionally lunge at small dogs with his rubber teeth.
A digression into neural networks & machine learning
What the heck is a neural network? All it is is a deeply trivial approximation of the neurons in our brains and their connectedness to other neurons around them, along with the sensors and actuators outside of the neural net itself. While we think we know how we think, we don’t. Our brain is a black box filled with goo and neurons into which we pour a lot of sensor data and out of which comes a bunch of shouting to various things in our body that do things.
Our brain’s neural network is incredibly flat. It has precious little structure. It evolved over millions of years to do remarkable things from five or six components and a lot of goo and interconnected goo and neurons.
Our manufactured neural networks, on the other hand, have developed hierarchical structures for sight and meaning. The Turing Award was recently given to three deep learning researchers in this space for their decades of effort: Yoshua Bengio, Geoffrey Hinton, and Yann LeCun. Two of them are Canadian, but they all work in academia and industry across the US and Canada. RetinaNet, a reusable tiered image recognition neural net that enables us to train new, highly accurate visual identification systems very quickly, grew out of their work. ELMo, a neural net that understands core concepts of language including idiom and context, is another reusable component upon which new solutions can be built quickly.
And modern neural network training and execution processors are graphics processing units on steroids. The same massively parallel supercomputers that render our computer games and the movies we love to watch also are used in both learning and execution of neural net technologies.
That’s an important distinction in this. With our current state of the art, learning is very distinct from doing. A neural network is trained, then it operates, it’s retrained with more inputs and it operates again. Our brains are squishier and more dynamic. Training and operation are much less distinct in our gooey neurons.
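To make the split between learning and doing concrete, here is a minimal sketch using a single artificial neuron (a perceptron) rather than a real deep network. The function names `train` and `operate`, the toy AND-gate data, and the integer learning rate are all assumptions made for illustration. The point is only that the weights change inside `train` and are read, never written, inside `operate`.

```python
def train(samples, epochs=50, lr=1):
    """Learning phase: the weights are adjusted after every mistake."""
    w = [0, 0]
    b = 0
    for _ in range(epochs):
        for (x1, x2), label in samples:
            pred = 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0
            err = label - pred  # -1, 0, or +1
            w[0] += lr * err * x1
            w[1] += lr * err * x2
            b += lr * err
    # Return an immutable snapshot: this is the "trained net" that ships.
    return (tuple(w), b)


def operate(model, x):
    """Doing phase: the frozen weights are only read, never updated."""
    (w1, w2), b = model
    return 1 if w1 * x[0] + w2 * x[1] + b > 0 else 0


# Toy task (an assumption for illustration): output 1 only when both inputs are 1.
samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
model = train(samples)
outputs = [operate(model, x) for x, _ in samples]
print(outputs)
```

However many times `operate` is called, the model stays exactly as `train` left it. Getting different behavior means going back to `train` with more data.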
Let’s take the example of Tesla’s cars again. They all have sensors and they all have neural nets in their trunks (or someplace). They are all getting new, custom built chips for their onboard systems by the way. But the cars are dumb. They just shove sensor inputs into the chip and listen to the shouting that comes out. If a Tesla car were disconnected from the internet and the learning Cloud that Tesla has built, it would make exactly the same mistakes under the same conditions no matter how many times its driver corrected it. However, all that sensor data and the corrections that the drivers make get fed back to the Cloud, the neural net is retrained, and then the results are downloaded to the cars. So the cars get smarter, but not individually.
What does this mean in real life? Well, when the Tesla Autopilot Beta was turned on, for the first week a lot of drivers found that their Teslas really wanted to take off ramps from freeways. They kept correcting the cars by turning the wheel back. And after a week of that, Teslas didn’t try to take off ramps. The individual cars didn’t learn, the Cloud learned and downloaded new neural nets to the systems on the car.
This collaboration between offline and realtime systems to jointly improve performance is a key element of the state of the art of neural networks. Tesla’s implementation resembles a federated learning system in that there are multiple instantiations of the neural net that does things and a single instance of the learning neural net that gets trained. Strictly speaking, federated learning trains on the devices themselves and shares only model updates, whereas the setup described here sends data back to be trained on centrally.
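The off-ramp story can be sketched in a few lines. This is not Tesla’s system. It is a toy in which the "neural net" is just a lookup table from situation to action, and the `Car` and `Cloud` classes, the situation and action names, and the majority-vote "retraining" are all hypothetical stand-ins. What it does show is the shape of the loop: cars log corrections, one central place retrains, and only cars that download the result change their behavior.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass, field


@dataclass
class Car:
    """Runs a frozen copy of the policy; it never learns on its own."""
    policy: dict
    corrections: list = field(default_factory=list)

    def drive(self, situation, driver_action):
        action = self.policy.get(situation, "keep_lane")
        if action != driver_action:
            # The driver overrode the car. The car only logs the event.
            self.corrections.append((situation, driver_action))
        return action


class Cloud:
    """The single place where retraining happens."""

    def __init__(self, policy):
        self.policy = dict(policy)

    def retrain(self, cars):
        # Stand-in for retraining: adopt the most common driver correction.
        votes = defaultdict(Counter)
        for car in cars:
            for situation, driver_action in car.corrections:
                votes[situation][driver_action] += 1
            car.corrections.clear()
        for situation, counter in votes.items():
            self.policy[situation] = counter.most_common(1)[0][0]

    def publish(self, cars):
        for car in cars:
            car.policy = dict(self.policy)


initial = {"off_ramp_ahead": "take_ramp"}
cloud = Cloud(initial)
fleet = [Car(dict(initial)) for _ in range(3)]
offline_car = Car(dict(initial))  # never talks to the cloud

# Week one: every driver has to steer back onto the freeway.
for car in fleet + [offline_car]:
    car.drive("off_ramp_ahead", driver_action="keep_lane")

cloud.retrain(fleet)
cloud.publish(fleet)

fleet_action = fleet[0].drive("off_ramp_ahead", driver_action="keep_lane")
offline_action = offline_car.drive("off_ramp_ahead", driver_action="keep_lane")
print(fleet_action, offline_action)
```

The connected car stops heading for the ramp, while the disconnected one repeats the same mistake no matter how often it is corrected.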
PD’s nervous system & brain
What does this have to do with our new friend, Plastic Dinosaur? Well, it doesn’t mean we’re going to wire up a bunch of transistors and play with them until we get emergent but simple behaviors as Brooks did with the original subsumption robots. Time has moved on in multiple ways.
Even in 2001, David had introduced me to 3D NURBS modeling using Rhino, and we were doing moderately sophisticated robotics 3D modeling. We had the vision of creating an actuated model with virtual sensors and setting it loose in a 3D simulation powered by GPU cards. We foresaw edge detection as a mechanism for dealing with physical encounters, massive parallelization of simulation environments, and evolutionary algorithmic improvement of the robots in the virtual before instantiation in the physical. None of this was a particular insight in these spaces, but they were concepts that we integrated into our thinking.
Neural nets weren’t on our radar at all. This was going to be coded, parameterized robotics. Lots of hard physical and software engineering to make things work and not fall over.
But now they exist. So how are we envisioning using them? Our initial thought was approximating the way that the human nervous system and brain are structured.
The human nervous system is pretty interesting as a subsumption device. Without any input from the brain, the nervous system has reflexes. It will react to stimulus without any conscious thought, and incredibly quickly. Touch flame, jerk back, then be conscious of the pain. Get hit with one of those little doctor’s hammers in the right place on your knee, and you kick out a bit, something that’s both involuntary and something you can’t stop by thinking about it.
And that’s the layer of the human system that feels everything directly and just does stuff like keep our hearts beating. It’s the autonomic nervous system including the spinal cord that controls and monitors all of our organs.
By the way, neither David nor I are neurobiologists, so mistakes we make in human biology are a combination of our failures of understanding and simplifications for the sake of advancing our Plastic Dinosaur thought exercise.
Above the autonomic nervous system sits the cerebellum. It’s the part that listens to all of the systems that figure out what sensors are saying and tells the parts of the body that do things to do them. It’s the part that helps us stand, walk, and run smoothly. It’s the part that controls our lips, tongue, throat, and lungs so that we talk.
For our purposes, some of the stuff is a lot simpler. Human bodies are incredibly complex compared to mechanical bodies, with absurd amounts of evolutionary kludges. Our eyes alone are miracles of bad design compared to octopus eyes, because they evolved separately and octopuses won the evolutionary lottery. If we were going to redesign humans from scratch, we’d start with octopus eyes. But with mechanical systems and sensors, it’s just a lot simpler. We have simpler, faster communication. We have simpler, much more straightforward controls. The end devices are pretty smart in many cases. Computers can react a lot faster than human brains can.
All of which is to say that for our thought experiment, we’re collapsing the autonomic nervous system and cerebellum into one neural net with some distributed intelligence in the components. Uh oh, more octopus comparisons, but not to the point of the tail just doing its own thing, right? What could go wrong?
After that, the fictional piece intrudes itself again. Conceptually, we wanted to make a responsive dinosaur that started to have emergent behaviors that would be informative and interesting. So we decided that it would be useful to have the equivalent of the amygdala. That’s the part of our brain that is really emotional, has the fight-or-flight instincts, and makes a lot of our decisions for us. It deals with fear and emergency responses. A lot of our instinctual behavior comes from the amygdala, a not-conscious component of our brain that does a lot more of our thinking than we realize.
So that was going to be the next neural net. Fight or flight. Fear. Strong emotions. Rash decisions. And a lot of survivability. This is the part of the architecture that refuses to step in deep holes, and even shies away from them. This is the part that sees rapidly moving objects and flinches back. This is the part that says, “Hey, I don’t recognize that so let’s be careful.” Combined with a physically durable frame with a padded skin and a low center of gravity due to the placement of the battery, survivability is high. Subsumption imperatives met.
And then there’s the rest of the brain. For the purposes of this, we assumed it was a complex neural net that had pattern matching, curiosity, and higher-order goal seeking. It’s the part that when it was hungry, aka the battery was running low, would look around, see its induction pad and yell, “Hey, go over there” to the other neural nets which would take that under advisement. It’s the part that sees something moving and says “Let’s move toward that interesting thing.” It’s the part that recognizes things and decides on higher order goals.
This is a massive simplification, but it’s a useful one.
So we have a stack of neural nets approximating the autonomic nervous system+cerebellum, the amygdala and the rest of the brain. We’ll call them cerebellumnet, amygdalanet and curiousnet for convenience, for now. We have a lot of sensors sending a lot of messages. We have a physical structure these all sit on that has motors that allow it to move its legs and tail and head, and it’s wrapped in a skin that has sensors of its own.
That’s a complete architecture, but it’s still just a modernist piece of weird-looking furniture with teeth. How do we make it get up and go?
Well, this is where we get to the concept of dreaming, which we asserted that PD could do. This, conceptually, maps onto the learning phase of machine learning, and it’s the phase in which the learning neural net is trained. It’s cyclical. Train it, try it, train it, try it, ad infinitum.
There are a couple of challenges to overcome. The first is that all simulations are imperfect. The second is that real world experiments are very slow. So we collapsed this a bit.
First off, the physical dinosaur is a pre-baked object, more than not. It has a complete body with actuators and sensors. It’s engineered to be something that works based on the experiences we have and mechanical components that exist. The first step is that engineering stage, and instantiation in the physical and fixing the basics of the physical. That’s a strong connection to a constrained reality.
The second is that it’s replicated, as best as possible, in simulation, and that simulated physicality is the only thing that the neural nets have any control over or receive any inputs from. The design models become as-built models, so that they have high fidelity.
That faithful replication allows the following, at least in theory (and yes, practice is very hard and this is a thought experiment). We run massively parallelized learning sessions on the virtual instantiation before decanting the result into the physical instantiation. A thousand or million virtual attempts to 1–10 physical attempts.
And we start with the first neural net. All we want it to do is stand up, its default stance. All we want to operate are the autonomic and cerebellum components. And so, in the virtual, we run a lot of simulations with just cerebellumnet. It has all the sensors telling it where it is and isn’t. We reward it for achieving and maintaining an upright stance, something pretty trivial to do automatically. We reward it for staying relatively still. We reward it for expending as little energy as possible while staying still. And we let that play out until cerebellumnet figures out which sensors to pay attention to and which to ignore to achieve that basic goal.
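The three rewards described above (upright, still, frugal) could be folded into a single score along the following lines. This is a sketch under assumptions: the sensor readings passed in, the target height, and the weights `w_still` and `w_energy` are all invented for illustration, and a real reward would need a lot of tuning.

```python
import math


def standing_reward(torso_pitch_rad, torso_height_m, joint_speeds, motor_powers_w,
                    target_height_m=1.0, w_still=0.1, w_energy=0.01):
    """Score one simulated moment: higher is better."""
    # Upright: 1.0 when the torso is vertical, falling to 0.0 when it is horizontal.
    upright = max(math.cos(torso_pitch_rad), 0.0)
    # Height: fraction of the target standing height reached, capped at 1.0.
    height = max(min(torso_height_m / target_height_m, 1.0), 0.0)
    posture = upright * height
    # Stillness: penalize joints that are moving.
    stillness_penalty = w_still * sum(abs(s) for s in joint_speeds)
    # Energy: penalize power drawn by the motors.
    energy_penalty = w_energy * sum(abs(p) for p in motor_powers_w)
    return posture - stillness_penalty - energy_penalty


# A quiet, upright stance versus thrashing on the floor (made-up readings).
standing = standing_reward(0.0, 1.0, [0.0, 0.0], [5.0, 5.0])
thrashing = standing_reward(math.pi / 2, 0.2, [2.0, 2.0], [50.0, 50.0])
print(standing, thrashing)
```

Multiplying uprightness by height means PD gets no credit for a vertical torso while crouched on the ground, and the two penalties nudge it toward a stance it can hold without burning battery.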
Because it’s a neural net trained through parallelized learning, it’s going to experiment until it figures out some way to stand up. Because we’re doing this virtually on high-powered computers in an evolutionary model, we’ll keep the cerebellumnets that work and throw away the ones that don’t. In theory, in a period of a few days or months, we would have a virtual, standing robot.
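The keep-the-winners, discard-the-rest loop can be sketched as a very plain evolutionary algorithm. Everything specific here is an assumption: each "cerebellumnet" is reduced to a list of three numbers, `fitness` is a made-up stand-in for "how well did this net stand in simulation", and the population size, survivor count, and mutation noise are arbitrary.

```python
import random


def fitness(weights):
    # Stand-in for a simulation run: closer to a hidden target scores higher.
    target = [0.5, -0.2, 0.8]
    return -sum((w - t) ** 2 for w, t in zip(weights, target))


def evolve(pop_size=1000, keep=100, generations=20, noise=0.1, seed=0):
    rng = random.Random(seed)
    # A thousand random nets, each in its own simple digital world.
    population = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        # Keep the nets that came closest and abandon the rest.
        survivors = population[:keep]
        # Respawn the population as slightly mutated copies of the survivors.
        children = [
            [w + rng.gauss(0, noise) for w in rng.choice(survivors)]
            for _ in range(pop_size - keep)
        ]
        population = survivors + children
    best = max(population, key=fitness)
    history.append(fitness(best))
    return best, history


best, history = evolve()
print(history[0], history[-1])
```

Because the survivors are carried over unchanged, the best score can never get worse from one generation to the next. In a real setup, each call to `fitness` would be a full physics simulation, which is where the massive parallelism earns its keep.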
Then we instantiate that into the physical object and see what happens. We figure out mismatches between the physical and virtual. We debug. We iterate. Then we go back to the virtual. And then back to the physical. Etc. Until we have a robot that stands up by itself and stands still with minimal power usage.
When it can stand by itself, we start giving it new goals and rewarding it for achieving them. Moving toward something would be the obvious thing. Moving more quickly. Getting over basic obstacles. Not falling over when it’s pushed. All of this is stuff that can be done with cerebellumnet. This is all safe environment stuff. It’s only good for padded rooms without pits or live wires.
When we have a robot that can walk around with basic externalized commands that we feed it, and get over humps and the like, then it’s time to add the amygdalanet.
Once again, lots of iterations in the virtual. Moving objects, holes, fire, extremes of sensor inputs. And we reward it for running away from things, or for biting them. This is the fight or flight layer, after all, and it’s the one that’s making basic decisions of friend, foe, or food constantly. We wouldn’t really be training it for snapping behavior, except that it’s a dinosaur and in the fictional context it’s being used to explore emergent behavior of predation as part of the context, and maybe turned into the next Disney animatronics attraction. Yes, Jurassic Park with robots is part of the fictional world in which PD was created. That’s going to go well, as you can imagine.
And once again, instantiation in the physical. Our little plastic friend is now running competing goals. Externally, we’re telling it to go toward something, but there’s a big hole in the way and its amygdalanet is yelling stop. It’s walking toward its goal and something swings out at it so its amygdalanet yells duck.
And back to the virtual iterations and back to the physical. Until Plastic Dinosaur can go where we tell it and avoid hurting itself in complex environments. Neural net-based subsumption approaches. But only with externally supplied goals.
And so we add the last neural net, the curious one that creates goals and recognizes things. Once again, lots of time in the virtual. It’s interacting with the other two neural nets, integrating their inputs with the more sophisticated ability to recognize things it has. It’s the part that doesn’t just react, but will say, “Hmmm, I’m hungry, where’s that charging stump. Hey, there it is, let’s go over there.” It’s the part that will see something it doesn’t recognize and override the amygdala neural net and say “Let’s go over there and look closer.” And the amygdalanet will say, “That’s close enough!” and they’ll go back and forth and circle around the thing until they figure out whether it’s dangerous or not.
It’s the part that develops stereotypes and applies them to the world. It’s going to be bigoted about everything early on, but over time it will figure out that not all large, white objects moving through the air near it are dangerous and to be avoided. Eventually it will learn to catch balls as opposed to considering them enemies.
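The back-and-forth between curiousnet and amygdalanet can be sketched as simple arbitration: each net shouts a proposal with an urgency, and the loudest one wins. In this sketch the nets are hand-written rules rather than trained networks, and the sensor keys, thresholds, action names, and urgency values are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Proposal:
    source: str
    action: str
    urgency: float  # 0.0 (mild suggestion) to 1.0 (emergency)


def amygdalanet(sensors) -> Optional[Proposal]:
    # Fight-or-flight layer: holes and fast-moving things trump everything.
    if sensors.get("hole_ahead") or sensors.get("fast_object"):
        return Proposal("amygdalanet", "stop", 0.9)
    # "That's close enough!"
    if sensors.get("unknown_object_distance_m", float("inf")) < 1.0:
        return Proposal("amygdalanet", "back_off", 0.6)
    return None


def curiousnet(sensors) -> Optional[Proposal]:
    # "Hmmm, I'm hungry, where's that charging stump."
    if sensors.get("battery", 1.0) < 0.2:
        return Proposal("curiousnet", "go_to_charger", 0.7)
    # "Let's go over there and look closer."
    if "unknown_object_distance_m" in sensors:
        return Proposal("curiousnet", "approach_object", 0.4)
    return None


def arbitrate(sensors) -> Proposal:
    proposals = [p for p in (amygdalanet(sensors), curiousnet(sensors)) if p]
    if not proposals:
        # Nobody is shouting, so cerebellumnet keeps its default stance.
        return Proposal("cerebellumnet", "stand", 0.0)
    return max(proposals, key=lambda p: p.urgency)


far = arbitrate({"unknown_object_distance_m": 3.0})
near = arbitrate({"unknown_object_distance_m": 0.5})
print(far.action, near.action)
```

With these made-up numbers, PD approaches an unknown object until it gets within a metre, backs off, then approaches again, which is the circling behavior described above. A low battery outranks mild caution but not a hole in the floor.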
Every iteration in the virtual is dreaming. Each neural net in the set dreams as well, so they stumble forward in some semblance of unison, improving slowly. It’s not consciously experienced but nothing is conscious in this layered model.
When we have a basic working robot that survives and is curious, then comes the next part. Let it loose in complex environments and have it try things. It will fail a lot of the time. But failure is a recognizable condition. “I’m trying to get to that object, but there’s a hole with a flat surface across part of it.” When it ‘dreams’, that hole and the object and flat surface are automatically used to craft a parallelized virtual version from the sensory inputs and experiment forward rapidly to success. And then the resulting improved neural net is instantiated back into the physical, and Plastic Dinosaur has learned a new skill, how to walk across a bridge.
Similarly, door levers and knobs. The curiousnet wants to explore things and sees doors open and close. It learns to recognize them as temporary and that opening is an achievable action. But it can’t work the lever. So it dreams massively parallel dreams of trying a lot of different things. And when it wakes up, the newly downloaded neural net knows how to work a door lever. But not a door knob. So it dreams about that too, and wakes up with a pretty good idea of how to do that. Or not. It might need to dream a few more times.
But is Plastic Dinosaur conscious?
We have a mobile bipedal robot with a lot of sensors that is curious and can learn new things. We can teach it by giving it new external goals and letting it fail, improvise virtually and succeed, much more rapidly.
But there’s still no consciousness there. As another part of the fictional story that emerges from PD’s creation, eventually with a bunch of iterations of dreaming and learning, he acquires consciousness as an emergent property of the stack of neural nets instantiated in a heavily sensor-laden mobile body. That’s more of the direction of any fictional story about his life and conflicts that might spin off of this.
But David and I have been using Plastic Dinosaur to explore machine learning limitations, patterns, and an emergent language about them. We talk about attention and attention as a commodity. We talk about compact models vs stereotypes vs axioms vs patterns. We talk about neural nets stabilizing vs diverging. We talk about unexpected or unplanned outcomes of the different neural nets yelling sometimes conflicting things, with one likely possibility that some small dogs will be in for rude surprises when they run in front of PD and he ‘instinctively’ snaps them up in his rubber teeth, and then has his curiousnet catch up so he can let the dog go again. We talk about how he might forget some skills as neural nets grow, or wake up with new skills that completely replace old ones.
The third article in the series features training Plastic Dinosaur to play goalie. Following articles will deal with further concepts and discussions, laying out the emergent language using Plastic Dinosaur as a framework for the discussions as much as is reasonable. After all, he’s cuddly and cute, or at least as cuddly and cute as a robotic AI Swedish furniture dinosaur with rubber teeth can be.