Older blog entries for rudybrian (starting at number 28)

It's been a while since the last post, so I guess it's time ;)

We have had to push back the Zaza Phase IV deployment plans until we can resolve a voice intelligibility problem introduced with the new speech system. I had originally been using the MBROLA 'us1' voice with Festival, but both the quality and pitch of the voice were too low. I made several attempts to improve recognition accuracy by pitch-shifting the waveform and applying high-pass filters to block out some of the resonant frequencies of the enclosure, but they seemed to have no effect on intelligibility. I recently switched to the OGI CSLU 'tll' voice, which isn't quite as good as the M$ SAPI4/5 'Mary' voice, but is a marked improvement over the MBROLA voice. Initial tests this past weekend showed a remarkable improvement in recognition accuracy. Some of the informational text monologs for the exhibit locations were taken directly from the museum's webpage, and should probably be re-worded to improve pronunciation with the TTS engine. It might also be beneficial to introduce support for Sable in the near future.

Zaza had another major hardware failure on Saturday. Since her hardware has seen quite a bit of use, the belts that link all of the wheels to the motors are worn and due for replacement. It also looks, or rather sounds, like one of the motors is in desperate need of new brushes. The amount of odometric error has been growing progressively worse and has begun to negatively influence the localizer. The 'MCP' has begun 'limping' the drive motors after most motion commands with a 'TERR' (translate error). This needs to be fixed before we can run the robot again. Hopefully RWI will be forthcoming with the info we need to service the robot...

17 Jun 2002 (updated 17 Jun 2002 at 19:01 UTC)

Whooboy... The Zaza Phase IV deployment target is edging ever closer, and quite a bit has been done to wrap things up, but there is more to do. I have made major updates to reaction, poslibtcx and poslib and a bunch of minor updates to the map applet, face, and voice system since the last post.

Last weekend's test run went very well. All of the software modules performed flawlessly. The only problem observed was an ACCESS.bus lockup at the end of the run.

Three weeks ago the DC/DC converter that supplies the -12V reference voltage for both of the motherboards on the robot failed and needed to be replaced. Unfortunately the RocketPort serial board needs this voltage to drive the serial interfaces for three of the robot's subsystems. A drop-in replacement part wasn't available, so a TI PT4224 had to be installed in its place. The new part is substantially more efficient, but it is larger and has a different form factor, so an adapter board had to be made. Removing the old part was a real exercise. The power distribution board needed to be removed from the robot, which took about an hour and a half. Then the old part needed to be delicately removed with a Dremel tool and a large pair of Vise-Grips ;) The new part was installed and tested operational less than a week after the old part failed, but I might have stressed the ACCESS.bus cables a bit re-installing the power distribution board, causing the overly sensitive bus to be even more flaky. I'll need to fix this before next weekend's run.

I'll be announcing a limited public run of the new web-based tourguide functionality (Phase IV) for TRCY club members this week for next Saturday's run. If you aren't already a member, join and get in on the fun!

Last Saturday's Phase IV test went pretty much as planned. We didn't have any collisions or close calls during the run. During the test another demo activity converged in the 'atrium area' on the lower level of the museum blocking Zaza's only route to her next goal. The large number of people and the unmapped obstacles currently in the area from the installation of the 'Play' exhibit in the temporary exhibit space prevented Zaza's localizer from getting a revised position for an extended period of time. She began searching for an obstacle that looked familiar but the visitors were so engaged in the demo that they would not let her pass. The 'reaction' module was running, and the robot began verbalizing her dislike of being blocked to the demo audience, to the dismay of the presenter ;) To keep Zaza from further interfering with the demo, we manually joysticked her out of the area. The remainder of the test was fairly uneventful. I was finally able to get the high-level people detection code working about half-way through the run, and we used it for the remainder of the test. If it can do a better job of detecting people, I'll write a new version of reaction to support it for Phase III and IV operation modes.

I made a few architectural improvements to the voice/face system over the last few days. The voiceServer now maintains a 'stack' of the last n cues in shared memory to provide slower asynchronous clients a loss-free way to get speech cue data. This should eliminate the possibility of losing cues that are sent too quickly to be spoken in real time. The change required updating the clients and applet, but it was worth it.
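The last-n cue 'stack' described above can be sketched as a small bounded buffer. This is only an illustration of the idea, assuming hypothetical names; the real voiceServer keeps its buffer in shared memory for the Perl clients.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/**
 * Minimal sketch of a bounded cue buffer: keeps the last N speech cues so
 * slow asynchronous clients can catch up without cues being dropped.
 * Class and method names are hypothetical, not the actual voiceServer API.
 */
class CueBuffer {
    private final Deque<String> cues = new ArrayDeque<>();
    private final int capacity;

    CueBuffer(int capacity) { this.capacity = capacity; }

    /** Add a new cue, evicting the oldest once capacity is exceeded. */
    synchronized void push(String cue) {
        cues.addLast(cue);
        if (cues.size() > capacity) cues.removeFirst();  // oldest cue ages out
    }

    /** Snapshot of the retained cues, oldest first, for a polling client. */
    synchronized List<String> snapshot() {
        return new ArrayList<>(cues);
    }
}
```

A client that polls every few seconds still sees every cue that arrived within the window, which is the loss-free property the change was after.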

Just for grins, I ran the code I have written since acquiring the robot through SLOCCount. Quite amusing, this obsession is ;)

The Tech Museum's web folks now have the 'official' Zaza website online.


I finally had a chance to update the Zaza Info and Technical sections on my site this week. Quite a bit has happened in the last five months!

I'm looking forward to tomorrow's Phase IV test and hoping everything goes smoothly ;)

Made a fair amount of progress today.

I solved a fairly serious problem where Zaza would occasionally crash into obstacles after visitors 'herded' her into them. It ends up we weren't using a critical component of the high-level collision avoidance code, and hadn't noticed until now. Oops ;)

I fixed a bug in the voiceClient Perl code that caused the Java face applet to get 'shy' and stop talking when it received a bad speech request from a client.

Last week I submitted a few pages I have been working on for the Tech's website about Zaza. Hopefully the Tech's webfolks will have it up in a week or so.

The FFmpeg team is nearly ready to release another official version that will fix some of the problems with live streaming. It's been nearly nine months since the last release, so it's been a long time in coming. This will allow me to replace the JPEG-based 'webcam' scripts and applet/ActiveX control with a true video streaming system using MPEG4/H263 compression and MP3 audio.

23 Apr 2002 (updated 23 Apr 2002 at 21:15 UTC)

The Zaza project is picking up momentum.

After three months of software development work I was finally able to switch Zaza's second onboard computer over to Linux. There were some quirks to getting the machine working properly, though. The SMP kernel/hardware combination had some trouble with two of the network cards we tried, but I eventually found one that worked properly. We couldn't get the composite NTSC video out port on the video card to sync when running under X, so a dedicated external VGA to NTSC converter had to be added. Last Saturday we did the first public run with the new face/voice setup made possible by the upgrade. Things went pretty well, all things considered.

We have been doing software tests of the Phase IV code on Saturday afternoons after the regular public demo. Slow progress is being made on the 'tour' behavior code, but we should have something usable by mid-summer.

Lots of updates on Zaza's code development since last month:

I think I finally resolved the bug in the zazacam applet that caused it to slowly consume virtual memory. Apparently Java likes to cache images in memory indefinitely if the getImage() function is used. Garbage collection and flush()ing manually don't seem to help.

I fixed a bug accidentally introduced into the zazamap applet during the re-write for 0.60 that prevented the graphics canvas from repainting automatically after a position update. The applet still consumes far too much of the CPU in auto-track and higher zoom modes, so I still have some work to do.

As mentioned in the last entry, the zazaface applet now works with the faceServer and performs fairly well. I expected both the scripts and waveforms to be cached on the client side, but it appears that only the waveform is. Performance is still acceptable. I also updated the viseme image handling routine to auto-scale to the graphics canvas. I still need to tweak the sections of the code related to running in application mode, and correct the polling delay so that it is consistent between platforms.
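The auto-scaling mentioned above boils down to an aspect-preserving fit of the viseme image into the canvas. A minimal sketch of that computation, with illustrative names (not the actual applet code):

```java
/**
 * Sketch of viseme-image auto-scaling: fit an image into the graphics
 * canvas while preserving its aspect ratio. Names are illustrative.
 */
class VisemeScale {
    /** Returns {width, height} of the image scaled to fit the canvas. */
    static int[] fit(int imgW, int imgH, int canvasW, int canvasH) {
        // Use the smaller of the two scale factors so neither axis overflows.
        double scale = Math.min((double) canvasW / imgW,
                                (double) canvasH / imgH);
        return new int[] { (int) Math.round(imgW * scale),
                           (int) Math.round(imgH * scale) };
    }
}
```

The scaled size is then passed to the usual drawImage(img, x, y, w, h, observer) overload once per repaint.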

Before yesterday's Phase IV test run I found and fixed one longstanding bug in poslibtcx's goal arrival code that caused unpredictable behavior. For some reason GCC isn't issuing warnings about variables that are declared twice ;)

I started researching possibilities for the new video distribution system that switching zaza2 to Linux enables. MPEG4IP has some great tools for live streaming, but until the licensing fee issue is resolved and MPEG4 clients are standardized, it isn't an option. FFmpeg worked quite well in tests with an earlier version, but FFServer in the current version is broken, so we won't be able to use it either. I guess we are still stuck with MJPEG until a better option becomes available.

20 Feb 2002 (updated 20 Feb 2002 at 22:35 UTC)

Last week I finished a reference implementation of Zaza's new face/voice server in Perl. I debated using either Perl or Java for the server-side components, as each has aspects that could be of some use. I finally settled on Perl because it would take less time to put something together that would be stable.

The 'faceServer' application is designed to be scalable to the available computing resources both onboard and offboard the robot. It connects to a Festival server using the Festival::Client::Async module. Since Festival's rendering of speech can be rather slow, the Festival server can be located on a higher-speed offboard computer. Speech 'cues' are cached on the local file system for better performance with pre-rendered cues. The Java face applet is notified when new cues have been sent to the server via cue ID numbers (10-digit CRC32s of the cue string). The client downloads the waveform and 'script' data and begins playback as soon as it has both files. Since the filename of each cue is unique, cues are cached by the applet, so no download is required if the cue has already been 'performed' by the applet.
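The cue-ID scheme above (a 10-digit, zero-padded CRC32 of the cue string used as a cache key) can be sketched in a few lines. The method name is hypothetical; the real server computes this in Perl.

```java
import java.util.zip.CRC32;

/**
 * Sketch of the cue-ID scheme: a 10-digit, zero-padded CRC32 of the cue
 * string, usable as a unique cache filename. Illustrative only.
 */
class CueId {
    static String idFor(String cue) {
        CRC32 crc = new CRC32();
        crc.update(cue.getBytes());                     // checksum the raw cue text
        return String.format("%010d", crc.getValue());  // zero-pad to 10 digits
    }
}
```

Because the ID is a pure function of the cue text, a client that already holds a file named for that ID knows it can replay it without re-downloading.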

I still need to spend some time with the applet, but expect to have it operational by Friday afternoon.

As always, the project code can be found here.

The Tech Museum is hosting an 'Engineering WOW Weekend' this Saturday and Sunday. Folks from the HBRC and SFRSA will be there showing off their robotic creations. Zaza will have her first chance to interact with another robot since May of last year. It should be fun ;)

I cleaned up Zaza's goal arrival code a bit on the morning of the 15th and ran for most of the afternoon until her batteries ran out. The new code correctly identifies goal arrival, and initiates a verbal announcement of this.

On the 18th I added a few new 'virtual' security barriers to the planner's map of the lower level of the museum, which should help to prevent collisions with a few 'invisible' obstacles in the Explorations gallery.

Work on the new control interface continues. I completed the dual-machine process monitoring Perl CGI, map applet hooks and CSS frame definitions last Friday. The new startup/shutdown CGI will take some time, but I should be able to complete it this Friday.

Over the last five months I have been searching for a way to replace Zaza's Windows-based face application with something that would run under Linux. Zaza2, the second computer onboard Zaza, has been getting progressively less reliable, and has hamstrung our ability to enhance Zaza's 'personality' by adding voice recognition, face tracking, or other vision applications. I tried Wine and several 'virtual machine' applications to run the application as-is. Unfortunately, either the sound output was horrible, or rendering of the mouth was done too slowly. I ran across a project attempting to synchronize the output from Festival with Ken Perlin's Java face. It doesn't appear that he was successful, but it gave me a few ideas. On the 16th I began working on a proof-of-concept to see if I could synchronize the output from Festival with the 13 'visemes' used by the MS SAPI SDK's 'talking microphone' with a Java applet/application. By the evening of the 17th I had a working demo. The applet uses a 'phoneme script' and WAV file produced by Festival to synchronize the audio playback with the appropriate 'viseme' for the phoneme being spoken. The results are encouraging, and prove the viability of this approach. My only lingering concern is performance. The architecture of the new face system will need to address this.
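The synchronization idea above can be sketched as a timed lookup: each line of the phoneme script gives a phoneme and its start/end times, and the applet maps the phoneme active at the current playback position to a viseme frame. The phoneme names and viseme indices below are placeholders, not the real SAPI table.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Sketch of phoneme-to-viseme synchronization: given a Festival "phoneme
 * script" (phoneme plus start/end time in seconds) and the current audio
 * playback position, pick the viseme frame to display. Placeholder data.
 */
class VisemeSync {
    static class Segment {
        final String phoneme;
        final double start, end;
        Segment(String p, double s, double e) { phoneme = p; start = s; end = e; }
    }

    // Hypothetical mapping from a few phonemes to viseme frame numbers.
    static final Map<String, Integer> VISEME = new HashMap<>();
    static {
        VISEME.put("pau", 0);  // silence -> closed mouth
        VISEME.put("aa", 1);
        VISEME.put("m", 2);
    }

    /** Returns the viseme frame for playback time t, or 0 (rest) if none matches. */
    static int visemeAt(Segment[] script, double t) {
        for (Segment s : script)
            if (t >= s.start && t < s.end)
                return VISEME.getOrDefault(s.phoneme, 0);
        return 0;
    }
}
```

Polling this lookup on a timer while the WAV plays is enough for lip-sync at the frame rates a mouth animation needs.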

I finally had a chance to clean up the map image rendering routines in Zaza's map applet yesterday. Rather than rendering the map directly to the on-screen canvas after each position update, the canvas is double-buffered for a small speed increase. This should help to localize the source of the reported memory leak in one of the applets.
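The double-buffering change above amounts to rendering the map into an offscreen image once per position update and then blitting that image in a single call. A minimal sketch, using a plain BufferedImage so it also runs headless; the real applet draws to an AWT canvas, and these names are illustrative:

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;

/**
 * Sketch of double-buffered map rendering: draw everything into an
 * offscreen buffer, then the caller copies it to the visible canvas in
 * one drawImage() call instead of redrawing each primitive on-screen.
 */
class MapBuffer {
    /** Draws a trivial "map" (background plus robot marker) into an offscreen buffer. */
    static BufferedImage render(int w, int h, int robotX, int robotY) {
        BufferedImage buf = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = buf.createGraphics();
        g.setColor(Color.WHITE);
        g.fillRect(0, 0, w, h);            // clear the backbuffer, not the screen
        g.setColor(Color.RED);
        g.fillRect(robotX, robotY, 2, 2);  // robot position marker
        g.dispose();
        return buf;  // caller blits this with a single drawImage()
    }
}
```

Besides the small speed gain, drawing offscreen removes the flicker of incremental on-screen painting.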

I spent last Friday working on automating the startup of the Phase III/IV applications and servers. Since two machines are involved, and the startup of some of the servers can be quite lengthy I will be moving the CGI back-end of the control interface to the offboard server. Fortunately this machine has a modern Perl version, so I can clean up the Apache CGI interface.

poslibtcx still has some bugs relating to arrival at goals. Occasionally goal arrival is not registered, and the planner gets out of sync with poslib. I need to update the 'reached goal' routine to correct this, and only allow a single execution at arrival. After these issues have been stabilized, I will stub in routines to stop, turn to the goal (or audience), and deliver an informational monolog.
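The "single execution at arrival" fix described above is essentially a latch: arrival fires once when the robot enters the goal radius and is only re-armed by a new goal. A sketch of the idea, with illustrative names and threshold, not poslibtcx's actual code:

```java
/**
 * Sketch of a single-fire goal-arrival check: checkArrival() returns true
 * exactly once per goal, so the arrival announcement (and planner
 * notification) cannot run twice for the same goal.
 */
class GoalMonitor {
    private final double radius;  // arrival threshold, e.g. in meters
    private double goalX, goalY;
    private boolean fired = true; // latched until a goal is set

    GoalMonitor(double radius) { this.radius = radius; }

    void setGoal(double x, double y) {
        goalX = x;
        goalY = y;
        fired = false;            // new goal re-arms the latch
    }

    /** True exactly once per goal, the first time we are within radius. */
    boolean checkArrival(double x, double y) {
        if (fired) return false;
        if (Math.hypot(x - goalX, y - goalY) <= radius) {
            fired = true;
            return true;          // caller stops, turns to the audience, speaks
        }
        return false;
    }
}
```

With the latch in place, the stop/turn/monolog routines can hang off the single true return without any risk of double execution.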

