One of the problems with 3D occupancy grids is that they can occupy a lot of memory or disk storage. Imagine a space the size of an average house divided into small 1cm cubes - that's quite a lot of cubes to keep track of.
Much of the space inside homes is actually empty, or rather filled with air, but from the robot's point of view knowing about probably empty space is just as important (maybe even more important!) as knowing about what is occupied, and thereby a potential obstacle. Some savings can be made by not storing information about terra incognita - areas of the map which have so far not been explored - but assuming that we want the robot to have a good understanding of an entire house this still leaves us with quite a heap of data.
At this point the unimaginative can simply appeal to Gordon Moore and his famous "law". The capacity of storage devices, such as hard disk drives, is always increasing, and it does look as if even the smallest storage devices around today would be able to handle the number of cubes we would like to deal with. Even so, loading from and saving to the storage device is still going to be relatively slow, and the robot needs to be able to access the data more or less in real time if it's going to be useful. We could also be lazy and just load the whole lot into a large amount of RAM, but ideally it would be good if low cost devices could be used, such as netbooks, which only have modest memory and local storage capacity. This would help robotics to continue becoming more economical, and therefore marketable.
So what to do? Since the occupancy data in this case is being produced from stereo vision a way to get better storage economy might be to only store a random sample of the stereo disparities observed from a dense disparity image. If we know the location and pose from which the observation was originally made, based upon the results of SLAM, then a local 3D occupancy grid can be regenerated dynamically from a fairly small amount of data as the robot moves around the house. This means that storage access times are going to be much shorter, and potentially a lot of stereo disparity data could be buffered in memory.
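As a rough sketch of what the sampling step might look like, here is one way to draw a random subset of valid disparities from a dense disparity image using NumPy. The function name, the choice of marking invalid pixels with non-positive disparity, and the 300-sample default are all assumptions for illustration, not a description of any particular implementation:

```python
import numpy as np

def sample_disparities(disparity, n_samples=300, rng=None):
    """Randomly sample (x, y, disparity) triples from a dense disparity
    image, skipping invalid pixels (assumed here to have disparity <= 0)."""
    rng = np.random.default_rng() if rng is None else rng
    ys, xs = np.nonzero(disparity > 0)  # coordinates of valid pixels
    idx = rng.choice(len(xs), size=min(n_samples, len(xs)), replace=False)
    # float32 keeps sub-pixel disparity accuracy at 4 bytes per value
    return np.column_stack(
        [xs[idx], ys[idx], disparity[ys[idx], xs[idx]]]
    ).astype(np.float32)

# Example with a synthetic 240x320 disparity image
disp = np.random.uniform(0.0, 64.0, size=(240, 320)).astype(np.float32)
samples = sample_disparities(disp, 300)
print(samples.shape, samples.nbytes)  # (300, 3) and 3600 bytes
```

Given the camera calibration and the pose recovered from SLAM, each stored (x, y, disparity) triple can later be reprojected into a 3D point to repopulate a local occupancy grid on demand.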
Some back of the envelope calculations go as follows:
If we randomly sample 300 stereo disparities from a dense disparity image, and represent the image coordinates and disparity as floating point values (sub-pixel accuracy), this translates into
300 stereo features x 3 values (x,y,disparity) x 4 bytes per value
= 3600 bytes per observation, or 3.5K
If we also want to store colour information, so that coloured 3D occupancy grids can be produced, this increases to 4500 bytes or 4.4K. There is also the robot's pose information to store, but this is only a small number of bytes, so doesn't make a big difference overall. This seems quite tractable. Potentially the robot could make several thousand observations as it maps the house, and that only translates into a few tens of megabytes, which is well within the limitations of what a netbook could handle. Even if the number of observations rises into the tens of thousands this still looks feasible.
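The arithmetic above can be checked in a few lines. The three bytes of colour per sample (one byte each for R, G, B) is an assumption consistent with the 4500-byte figure, and the observation counts are just the illustrative ones from the text:

```python
# Back-of-the-envelope storage estimate for sampled stereo observations
SAMPLES = 300          # disparities kept per observation
BYTES_PER_VALUE = 4    # float32: x, y, disparity at sub-pixel accuracy
COLOUR_BYTES = 3       # assumed: one byte each for R, G, B per sample

plain = SAMPLES * 3 * BYTES_PER_VALUE        # 3600 bytes per observation
with_colour = plain + SAMPLES * COLOUR_BYTES  # 4500 bytes per observation

for n_obs in (5_000, 50_000):
    total_mb = n_obs * with_colour / (1024 * 1024)
    print(f"{n_obs} observations: about {total_mb:.1f} MB")
```

Even the larger figure is comfortably within the local storage of a netbook, which is the point of the exercise.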