Multicores and Sensors – which will make robots that jump?
There are two ways to look at solving the problems in robotics. One way is to imagine that robots need really good processing of data. In this form, a robot has a few sensors and a huge back-end processing system to crunch the sensor data down to a particular understanding of its environment. Due to the high cost of sensors, this has been the main approach to robots since the 1960s. A typical research robot has a single sensor (often vision via a camera) and nothing else. Elegant software and fast processors develop models of the world and a plan for action. In this case, there’s room for hope in the recent announcement that Intel has abandoned increasing processor speeds and is now concentrating on ‘multicore’ processors – single chips with several CPUs on them. By 2010 Intel might be fitting dozens of Pentium IVs into a single chip, creating an ideal system for massive number-crunching.
The other approach – which I believe is more likely to lead to robots that jump – is a sensor-rich approach. In this form, a robot has a huge number of sensors. Sensors are redundant (there are many touch sensors) and of different types (vision, hearing, smell, touch, inertial, magnetic, pressure, temperature, GPS, you name it). The robot does not have a huge back-end processing system for the data, but does organize all the data into a combined map.
Perception in each kind of robot is different. In the case of few sensors, the robot creates a single “sensor world” with defined objects out of data from that sensor. Thus, a robot using only visual data would create a map of objects perceived by light. An ultrasound system might create an auditory map in the same way.
In the second kind, there is a more complex map. Instead of identifying objects within lidar-world or ultrasound-world, the system overlays all the sensor data along the primary categories of perception (according to Kant, that is) of space and time. Objects may not be well-defined in this map. However, the set of attributes (the values for each sensor) is well-defined at any particular point in space and time.
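As a rough illustration of such an overlay, here is a minimal Python sketch of a sparse space/time grid in which each cell holds one value per sensor type. Everything in it is an assumption made for illustration: the class name AttributeMap, the sensor names, the cell size and the time step do not come from any particular robot or library.

```python
from collections import defaultdict


class AttributeMap:
    """Sparse space/time grid; each cell holds one value per sensor type.

    Hypothetical sketch: cell_size (metres) and time_step (seconds) are
    arbitrary illustrative defaults.
    """

    def __init__(self, cell_size=0.5, time_step=1.0):
        self.cell_size = cell_size
        self.time_step = time_step
        # (ix, iy, it) -> {sensor name: latest value}
        self.cells = defaultdict(dict)

    def key(self, x, y, t):
        """Quantize a continuous position and time into a grid cell index."""
        return (
            int(x // self.cell_size),
            int(y // self.cell_size),
            int(t // self.time_step),
        )

    def add_reading(self, sensor, x, y, t, value):
        """Overlay one reading; no object needs to be identified first."""
        self.cells[self.key(x, y, t)][sensor] = value

    def attributes_at(self, x, y, t):
        """Return a copy of every attribute known for that point."""
        return dict(self.cells.get(self.key(x, y, t), {}))


# Two different sensors report on roughly the same place and moment.
world = AttributeMap()
world.add_reading("vision_rockness", 1.2, 3.4, 0.0, 0.6)
world.add_reading("temperature_c", 1.3, 3.3, 0.2, 41.0)
print(world.attributes_at(1.25, 3.45, 0.5))
```

The point of the design is that adding a new sensor type only adds another key to each cell; nothing has to decide what the "object" at that location is.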
Traditionally, robot designers have been leery of systems using a lot of sensor data, since it appears to require exponentially greater computing. In this model, defining an object in terms of multiple attributes derived from many sensors is just too hard. It is much easier (according to this line of thought) to restrict sensation and coax as much as possible out of smaller amounts of sensory data.
While there may be situations where a low-sensor robot is useful, in most cases it seems unlikely. The reason is that biological systems never take this approach. No animal – or plant (plants do a surprising amount of computing) – has ever evolved with a tiny number of sensors and a large brain. In contrast, the opposite is always true – animals with tiny brains always have comparatively rich sensation. Even a bacterial cell has thousands of sensors, and hundreds of unique types. An animal with a brain the size of a thumbtack, e.g., a squid, has an eye comparable to our own and additional senses for temperature, pressure, touch, electric fields, and more. Since evolution is a random process, it would be expected to pick the simpler of two solutions – and the rich sensor/small brain model wins every time.
What does this mean for robots? As mentioned before, there may be cases where a robot can get by with limited sensation. Tasks performed in restricted, simple environments should probably emphasize processing. The extreme example of this is a virtual robot, where (unfortunately) much robot research occurs. In a virtual robot, say a game character with “artificial intelligence”, sensory input is incredibly limited. There may be a list of objects nearby, and a primitive “physics” governing motion in the environment. Since the world is simple, robots can compete based on their smarts.
In contrast, the real world – even relatively simple environments like a highway or hospital corridor – is hugely more complex. The environment varies along a huge number of parameters, and the root cause of variation is buried at the atomic level. There’s no escape from crunching huge amounts of sensory data to navigate in the real world.
However, most of this crunching is not high-level. What a sensor-rich robot needs instead is a huge number of low-level, parallel processors converting primary sensor data into a useful form. The form needed is a map of the data in space and time. This form of low-level processing might be performed by DSPs and other analog/digital chips. For example, researchers at U. Penn have created a mostly analog artificial retina which does the basic processing of images at incredible rates. Instead of dedicating a single computer to crunch visual data, you use one of these sensory chips to make things work.
At higher levels, the sensory data does require more elaborate computing, but I would maintain that the increase in power needed is linear rather than exponential. The low-level processing extracts space/time information for the particular sensory data. A little higher up, additional parallel processing extracts a few useful and elementary “features” for the data. The goal of the high-level routines is simply to overlay the mix of sensory features into a common space/time world.
Such a system can react to its virtual world model in two ways. Similar to a low-sensor system, it can extract “objects” based on sensory data and place them in the space/time model. This kind of processing might continue to use one or a few sensors. A second kind of processing would be to measure unusual combinations of information from each sensor in the space/time model. For example, focusing on visual appearance alone would tend to result in a 3D model of rocks near the robot, with numerous false positives, as is seen in current-day robots. Attribute analysis using many sensor types could catch these problems. A medium “rockness” value in the 3D map would mean something unique if it was paired with high temperature or rapid movement. In a shape-detecting map a bush might appear similar to a rock – but pair that shape with color and with internal versus translational movement, and one could find “tree-like” regions without having to perfectly find the shape of the tree object. I suspect that with a huge number of unconventional sensors (e.g., a Theremin-style electrical sense) most objects could be recognized by attribute superposition.
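To make attribute superposition a little more concrete, here is a small Python sketch that labels a single map cell from whatever attributes happen to be overlaid on it. The attribute names (rockness, greenness, internal_motion and so on), the thresholds and the labels are all invented for illustration; a real system would have to learn or tune them.

```python
def label_cell(attrs):
    """Label one map cell from its overlaid attributes.

    Hypothetical sketch: attribute names and thresholds are made up.
    Missing sensors simply fall back to a neutral default.
    """
    rockness = attrs.get("rockness", 0.0)
    if rockness < 0.4:
        # Shape alone gives no hint of a solid, rock-sized thing here.
        return "unknown"
    if attrs.get("translational_motion", 0.0) > 0.5:
        # Rock-shaped but travelling: not a rock.
        return "moving object"
    if attrs.get("temperature_c", 20.0) > 35.0:
        # Rock-shaped but warm: something else is going on.
        return "warm object"
    if attrs.get("greenness", 0.0) > 0.6 and attrs.get("internal_motion", 0.0) > 0.3:
        # Rock-like outline, green, and rustling in place.
        return "tree-like"
    return "rock"


# The same medium "rockness" value means different things
# depending on what the other sensors say.
print(label_cell({"rockness": 0.6}))
print(label_cell({"rockness": 0.6, "temperature_c": 40.0}))
print(label_cell({"rockness": 0.6, "greenness": 0.8, "internal_motion": 0.5}))
```

Note that no step in this sketch recovers the exact shape of anything. Each extra sensor just adds one more cheap test on the same cell.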
One problem with this is determining what unique combination of attributes signals the presence of a particular object. It might be very hard to figure this out a priori, so training would be required. One can imagine taking the robot out for a stroll in the park, and “telling” it when it was near a particular object. A neural-net type system might be enough to create a high-level feature detector using this method. This contrasts with single-sensor programming, where one might try to figure out a priori the sensory data for particular objects and hard-code them in.
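The stroll-in-the-park idea can be sketched with the simplest possible "neural-net type" learner, a single perceptron. This is a stand-in chosen for brevity, not a claim about what a real robot would need. The two features (greenness, internal motion) and the four labelled samples are made up; the label 1 means the robot was told a tree was nearby.

```python
def train_detector(samples, epochs=20, lr=0.1):
    """Train a single perceptron on (features, label) pairs, label in {0, 1}."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in samples:
            activation = bias + sum(w * f for w, f in zip(weights, features))
            predicted = 1 if activation > 0 else 0
            error = label - predicted  # -1, 0 or +1
            # Classic perceptron rule: nudge weights only on mistakes.
            bias += lr * error
            weights = [w + lr * error * f for w, f in zip(weights, features)]
    return weights, bias


def detect(weights, bias, features):
    """Return True if the learned detector fires for these attributes."""
    return bias + sum(w * f for w, f in zip(weights, features)) > 0


# Hypothetical training walk: (greenness, internal_motion) -> tree nearby?
samples = [
    ((0.9, 0.8), 1),
    ((0.8, 0.7), 1),
    ((0.1, 0.0), 0),
    ((0.2, 0.1), 0),
]
weights, bias = train_detector(samples)

print(detect(weights, bias, (0.85, 0.75)))  # green and rustling
print(detect(weights, bias, (0.15, 0.05)))  # grey and still
```

A perceptron can only separate classes with a straight line through attribute space, so combinations like "warm or moving, but not both" would need a network with hidden units. The training loop from the robot's point of view stays the same: walk, get told, adjust.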
To date, the company that seems to be thinking this way most closely is SEEGRID, a company co-founded by Hans Moravec to commercialize long-standing research. SEEGRID software is supposed to allow information from multiple sensors to be fused into a single space-time map. This map in turn can be used to reliably control navigation. At present, SEEGRID software is just hitting the range in which several commercial PCs could run it in near real-time in a robot. This is too big for hobby robots or humanoids but fine for that other class of robots that jump – cars. A robo-truck using hundreds of sensors, dozens of DSP processors, and several high-end PC systems might be enough to try attribute-based perception.