One of the problems with 3D occupancy grids is that they
can take up a lot of storage space, whether in memory or on
disk. Imagine a space the size of an average house divided
into small 1cm cubes: that's quite a lot of cubes to keep
track of.
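To put a rough number on "quite a lot", here is a minimal sketch of the arithmetic. The house dimensions (a 10m x 10m footprint with 2.5m ceilings) and the one byte per cube are assumptions made purely for illustration, and dense_grid_cells is a hypothetical helper name.

```python
def dense_grid_cells(width_m, depth_m, height_m, cell_m=0.01):
    """Number of cubic cells needed to tile a box-shaped volume."""
    return (round(width_m / cell_m)
            * round(depth_m / cell_m)
            * round(height_m / cell_m))

# Assumed figures: 10m x 10m footprint, 2.5m ceilings, 1cm cubes.
cells = dense_grid_cells(10.0, 10.0, 2.5)

# Assumed: a single byte of occupancy information per cube.
dense_bytes = cells * 1

print(cells, "cubes,", round(dense_bytes / (1024 * 1024)), "MB")
```

Under those assumptions the dense grid comes to 250 million cubes, and storing a probability or colour per cube would multiply the byte count further.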
Much of the space inside homes is actually empty, or rather
filled with air, but from the robot's point of view knowing
about probably empty space is just as important as (maybe
even more important than!) knowing about what is occupied,
and thereby a potential obstacle. Some savings can be made by
not storing information about terra incognita (areas of the
map which have so far not been explored), but assuming that
we want the robot to have a good understanding of an entire
house, this still leaves us with quite a heap of data.
At this point the unimaginative can simply appeal to Gordon
Moore and his famous "law". The capacity of storage devices,
such as hard disk drives, keeps increasing, and it does
look as if even the smallest storage devices around today
would be able to handle the number of cubes that we would
like to deal with. Even so, loading from and saving to the
storage device is still going to be
relatively slow, and the robot needs to be able to access
the data more or less in real time if it's going to be
useful. We could also be lazy and just load the whole lot
into a large amount of RAM, but ideally it would be good if
low cost devices could be used, such as netbooks, which only
have modest memory and local storage capacity. This would
help robotics to continue becoming more economical and
therefore marketable.
So what to do? Since the occupancy data in this case is
being produced from stereo vision, a way to get better
storage economy might be to only store a random sample of
the stereo disparities observed from a dense disparity
image. If we know the location and pose from which the
observation was originally made, based upon the results of
SLAM, then a local 3D occupancy grid can be regenerated
dynamically from a fairly small amount of data as the robot
moves around the house. This means that storage access times
are going to be much shorter, and potentially a lot of
stereo disparity data could be buffered in memory.
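As a rough illustration of the idea, the sketch below samples disparities from a dense disparity image and later turns the stored samples back into 3D points in the camera frame. It assumes a rectified pinhole stereo model (depth = focal length x baseline / disparity). The function names and the calibration parameters focal_px, baseline_m, cx and cy are hypothetical, and the final step of transforming the points by the SLAM pose and updating the grid is left out.

```python
import random

def sample_disparities(disparity, n_samples=300, rng=None):
    """Randomly pick up to n_samples valid (x, y, disparity) triples.

    disparity is a list of rows; values <= 0 are treated as invalid
    (an assumed convention for pixels with no stereo match).
    """
    rng = rng or random.Random()
    valid = [(float(x), float(y), float(d))
             for y, row in enumerate(disparity)
             for x, d in enumerate(row)
             if d > 0]
    return rng.sample(valid, min(n_samples, len(valid)))

def to_camera_points(samples, focal_px, baseline_m, cx, cy):
    """Reproject stored samples to 3D points in the camera frame.

    Uses the standard rectified stereo relations:
    Z = f * B / d, X = (x - cx) * Z / f, Y = (y - cy) * Z / f
    """
    points = []
    for x, y, d in samples:
        z = focal_px * baseline_m / d
        points.append(((x - cx) * z / focal_px,
                       (y - cy) * z / focal_px,
                       z))
    return points
```

Only the sampled triples and the pose need to be kept. The points, and from them the local grid, can be recomputed whenever the robot returns to that part of the house.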
Some back of an envelope calculations go as follows:
If we randomly sample 300 stereo disparities from a dense
disparity image, and represent the image coordinates and
disparity as floating point values (sub-pixel accuracy),
this translates into
300 stereo features x 3 values (x, y, disparity) x 4 bytes per value
= 3600 bytes per observation, or about 3.5K
If we also want to store colour information, so that
coloured 3D occupancy grids can be produced, this increases
to 4500 bytes or 4.4K (an extra 900 bytes, or 3 bytes per
feature). There is also the robot's pose information to
store, but this is only a small number of bytes, so it
doesn't make a big overall difference. This seems
quite tractable. Potentially the robot could make several
thousand observations as it maps the house, and this only
translates into a few tens of megabytes, which is well within
the limitations of what a netbook could handle. Even if the
number of observations rises into the tens of thousands this
still looks feasible.
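The envelope sums above can be checked with a few lines of Python. This is only a sketch: the 3 bytes of colour per feature is inferred from the 4500 byte figure, the pose bytes are ignored, and observation_bytes and map_megabytes are hypothetical helper names.

```python
BYTES_PER_FLOAT = 4   # 32 bit floating point values
COLOUR_BYTES = 3      # assumed: one byte each for red, green and blue

def observation_bytes(features=300, with_colour=False):
    """Bytes needed to store one sampled stereo observation."""
    size = features * 3 * BYTES_PER_FLOAT   # x, y and disparity
    if with_colour:
        size += features * COLOUR_BYTES
    return size

def map_megabytes(n_observations, features=300, with_colour=True):
    """Total storage for a whole map, in megabytes (pose data excluded)."""
    total = n_observations * observation_bytes(features, with_colour)
    return total / (1024 * 1024)

for n in (1000, 5000, 20000):
    print(n, "observations:", round(map_megabytes(n), 1), "MB")
```

Changing the features argument shows how quickly the budget grows if more disparities are kept per observation.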