Memory Leak Blamed for Princeton's Urban Challenge Loss
Posted 18 Nov 2007 at 23:41 UTC (updated 20 Nov 2007 at 16:53 UTC) by steve
The Princeton Autonomous Vehicle
Engineering (PAVE) team chose Microsoft's proprietary Robotics Studio and the
C# programming language as the development platform for their robot. As
it turns out, this choice may be partially responsible for their failure
in the qualifying rounds. The robot exhibited a slowly deteriorating
response time, which ended in complete software failure after about 40
minutes. As a workaround, they adopted a strategy of having a timer
reboot their computers every 40 minutes. However, the problem turned out
to be related to garbage collection of memory used to store information
about obstacles, which meant the more obstacles the robot saw, the
faster the software failed. In the qualifying rounds, there were many
more obstacles than expected, leading to a crash after 28 minutes.
Analysis of the code performed later with a proprietary .NET code
profiler revealed that under certain conditions, the C# garbage collector
wasn't freeing memory as the programmers expected. For more see PAVE team
member Bryan
Cattle's more detailed description of the problem. A discussion of
the incident can also be found in a recent Slashdot
posting. Correction: The events mentioned in the referenced
article occurred in the 2005 DARPA Grand Challenge, not the 2007 Urban
Challenge as suggested by the article's release date. See comments below from
Microsoft's Tandy Trower for further details.
Are they sure this was a problem with the runtime itself and not the user's code? A quick read of the /. postings reveals a debate about this.
The article sort of looks like a promo piece for ANTS.
Or both?, posted 19 Nov 2007 at 19:54 UTC by steve »
(Master)
I've seen similar disasters with Java development. I think the
fundamental problem is that some programmers rely too much on garbage
collectors doing the "right thing", for definitions of "right thing"
held by the programmer and not the language designers. :-)
Maybe the programmers left objects they no longer needed still referenced somewhere, and the whole thing just bogged down after memory ran out. I've written programs that do that :)
If the garbage collector could just read our minds!
R
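The situation described above, where objects that are no longer needed remain referenced somewhere, is easy to reproduce. The following is a minimal sketch in Python rather than C#, chosen only for brevity; the `Obstacle` class and the `seen_obstacles` list are hypothetical names for illustration, not anything taken from PAVE's code. It uses a weak reference to observe whether the collector is allowed to reclaim an object.

```python
import gc
import weakref


class Obstacle:
    """Hypothetical stand-in for a per-obstacle record."""

    def __init__(self, obstacle_id):
        self.obstacle_id = obstacle_id


def is_alive(ref):
    """Return True if the weakly referenced object has not been collected."""
    gc.collect()  # force a collection so the answer does not depend on timing
    return ref() is not None


# A long-lived container that quietly keeps every obstacle reachable.
seen_obstacles = []

obstacle = Obstacle(1)
seen_obstacles.append(obstacle)
probe = weakref.ref(obstacle)  # observes the object without keeping it alive

del obstacle                        # the local name is gone...
still_referenced = is_alive(probe)  # ...but the list entry keeps the object alive

seen_obstacles.clear()              # drop the last strong reference
after_release = is_alive(probe)     # now the collector is free to reclaim it
```

A garbage collector only reclaims objects that nothing reachable refers to, so a single forgotten entry in a long-lived list is enough to keep an object in memory indefinitely.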
Stress test, posted 19 Nov 2007 at 20:47 UTC by motters »
(Master)
I've also run into garbage collection problems with C# in a few cases, but generally it works well. The bottom line is that whenever you develop real-time systems, you always need to benchmark the code and see what the timings are and how they change. Also, for a robot like this, there's no substitute for doing "stress testing" just by running the system for long periods of time to see if anything breaks.
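As a rough illustration of the benchmarking advice above, the sketch below times each iteration of a control-loop step and compares late timings against early ones. It is written in Python for brevity; `time_iterations`, `slowdown_ratio`, and `leaky_step` are hypothetical helpers invented for this example, not part of any robotics framework.

```python
import time
from statistics import mean


def time_iterations(step, iterations):
    """Run step() repeatedly and return each call's duration in seconds."""
    durations = []
    for _ in range(iterations):
        start = time.perf_counter()
        step()
        durations.append(time.perf_counter() - start)
    return durations


def slowdown_ratio(durations, window):
    """Ratio of the mean of the last `window` samples to the first `window`."""
    if window <= 0 or len(durations) < 2 * window:
        raise ValueError("need at least 2 * window samples")
    early = mean(durations[:window])
    late = mean(durations[-window:])
    if early == 0:
        # Timer resolution too coarse to measure the early samples.
        return float("inf") if late > 0 else 1.0
    return late / early


# A deliberately leaky step: it retains something on every call and then
# does work proportional to everything retained so far.
history = []


def leaky_step():
    history.append(object())
    sum(1 for _ in history)


samples = time_iterations(leaky_step, 2000)
ratio = slowdown_ratio(samples, window=200)  # well above 1 suggests degradation
```

A ratio that keeps climbing as the run gets longer is the kind of signal a long stress test is meant to surface before the robot is in the field.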
I received this comment in an email from Tandy Trower of Microsoft this
morning, offering corrections to some of the facts of the story. Most
importantly, while this story was published this month, the events
described actually took place in the 2005 Grand Challenge, not
the 2007 Urban Challenge. This date precedes the release of Microsoft's
Robotics Studio and PAVE's use of the software:
I wanted to offer some corrections to your recent post on robots.net
where you suggested that PAVE’s failure at the DARPA Challenge might
have been due to their selection and use of Microsoft Robotics Studio.
Perhaps you have also received this from other sources.
First, the reference you used from Slashdot, which was in turn derived
from the article posted on the Code Project website, was about
Princeton’s participation in the 2005 DARPA Grand Challenge, not the
2007 event, which you can see if you take a closer look. Microsoft
Robotics Studio was only used by the PAVE team in the 2007 event, and
wasn’t even announced or previewed in 2005. So the conclusion and
linkage to our robotics SDK is incorrect.
Second, the article written by Bryan Cattle cited on Code Project I
believe was intended to talk about how the ANTS profiler tool could have
been used to identify a problem, not a C# or CLR GC issue.
Third, it is important to know that you can still leak in a managed
memory environment, if you don’t remove references to resources.
As an aside, Microsoft Robotics Studio actually does more than any
system we are aware of, in that it will automatically discard messages
accumulating on DSS ports, if a certain limit is exceeded.
Even the Slashdot post you derived your post from never mentions any
reference to Microsoft Robotics Studio nor does the Code Project
article. I am certain if you contact Bryan Cattle or the other folks at
PAVE they would also confirm that your inclusion of Microsoft Robotics
Studio in your post was in error. Again the article references only
their 2005 participation, not 2007. While they indeed did not make it into
the finals at the 2007 event, we have no information that suggests that
it had anything to do with their use of our SDK.
Tandy Trower
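Trower's third point, and his aside about discarding messages once a limit is exceeded, can be illustrated together. The sketch below is in Python for brevity and makes no claim about how DSS ports are actually implemented; `ObstacleLog` and `max_entries` are hypothetical names. It contrasts a store that keeps every record with one that drops the oldest records once a cap is reached.

```python
from collections import deque


class ObstacleLog:
    """Hypothetical store for obstacle records.

    max_entries=None keeps everything; an integer caps the store size.
    """

    def __init__(self, max_entries=None):
        # deque(maxlen=N) discards the oldest entry when a new one
        # is appended to a full deque.
        self._entries = deque(maxlen=max_entries)
        self.discarded = 0

    def record(self, obstacle):
        cap = self._entries.maxlen
        if cap is not None and len(self._entries) == cap:
            self.discarded += 1  # count what the cap is about to push out
        self._entries.append(obstacle)

    def __len__(self):
        return len(self._entries)


unbounded = ObstacleLog()
bounded = ObstacleLog(max_entries=100)

for i in range(1000):
    unbounded.record({"id": i})
    bounded.record({"id": i})

unbounded_size = len(unbounded)  # grows with every obstacle seen
bounded_size = len(bounded)      # never exceeds the cap
```

The unbounded store is not a collector bug: every record is still reachable, so none can be freed. The capped store trades completeness for a fixed memory ceiling, which is only acceptable when old records are genuinely disposable.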