CREW Proposal
General Project Description

The goal of this project is to integrate several techniques and algorithms to create a robot capable of giving tours of our Science Building. The robot will also be designed to provide directions from its current location to any room in the building. While such a robot may be viewed as a novelty, it will serve to generate more interest in the college's Computer Science program, and for us, as computer science students, the project presents several challenges. It will also raise public awareness of the state of the art in robotics and will be an attempt to make people feel comfortable around robots. The robot, besides being mobile, will interact with people using voice generation and recognition and computer vision.

Our science building has a complex layout that is confusing even to its regular inhabitants. Students unfamiliar with the building often get lost and arrive late for classes and meetings. Our robot will be capable of providing directions on demand.

We realize that creating a perfect tour guide robot is a huge task involving several research-level issues that remain unsolved. For this reason, when we say that the robot is going to interact with a person using voice and computer vision, we do not mean that we will design the necessary devices or software ourselves; instead, we plan to integrate available techniques and algorithms.

A few robots created by research teams have been designed to give museum tours. Rhino gave tours in the Deutsches Museum Bonn for six days [1]. Another example is Minerva, an interactive tour-guide robot that gave tours at the Smithsonian's National Museum of American History for two weeks [2]. Although tour-giving has been done before, it is neither widespread nor common. It will be a challenge for us, because the people who have succeeded at this task were members of research teams, not undergraduate students like us.
Given our background and preparation, we feel that this project will appropriately challenge us and give us an opportunity to apply our computer science knowledge.

Specific Questions to be Addressed

Solving the task described above will require us to learn and integrate several techniques and algorithms dealing with mapping, localization, path planning, collision avoidance, task planning, and user interaction.

In order for the robot to give tours and directions, it must first be equipped with a map/floor plan of the building. Maps come in two forms: static and dynamic. A static map does not change (e.g., a simple floor plan), whereas a dynamic map incorporates changes in the positions of obstacles. Commonly used mapping techniques include occupancy grids, texture maps, and traditional graphs.

Localization is the robot's ability to know, at any time, its position in the building. It is important for the robot to be able to estimate its position so that it knows where to move to reach the next place on the tour. Localizing requires reconciling the internal map representation with the robot's perception, and relies heavily on sensor readings and motor values. Both RHINO and MINERVA used versions of the Markov localization algorithm, which uses probabilities to determine the most likely location of the robot [1,2].

How will the robot get from one room to another while giving a tour? When asked to take a visitor to a specific room, which route should it choose? These questions are usually addressed by a path-planning algorithm. Path plans are either static (once defined, they will not change) or dynamic (responding to changing obstructions), and the algorithms range from graph search to search over occupancy grids.
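To make the Markov localization idea concrete, here is a minimal one-dimensional sketch in Python (the project's language). It is illustrative only: the corridor, its landmarks, and the probability values are our own toy assumptions, and RHINO and MINERVA used far richer two-dimensional versions of the same belief-update scheme.

```python
# Toy 1-D Markov localization: the corridor is a ring of cells, some
# containing a "door" landmark. The belief is a probability per cell.

def normalize(belief):
    total = sum(belief)
    return [b / total for b in belief]

def motion_update(belief, p_correct=0.8):
    """Shift belief one cell to the right; some mass stays put (slippage)."""
    n = len(belief)
    new = [0.0] * n
    for i, b in enumerate(belief):
        new[(i + 1) % n] += b * p_correct   # moved as commanded
        new[i] += b * (1 - p_correct)       # wheels slipped, stayed put
    return new

def sensor_update(belief, world, reading, p_hit=0.9):
    """Weight each cell by how well it explains the sensor reading."""
    new = [b * (p_hit if world[i] == reading else 1 - p_hit)
           for i, b in enumerate(belief)]
    return normalize(new)

world = ['door', 'wall', 'wall', 'door', 'wall']  # hypothetical landmark map
belief = [1 / len(world)] * len(world)            # start fully uncertain

belief = sensor_update(belief, world, 'door')     # robot sees a door
belief = motion_update(belief)                    # moves one cell right
belief = sensor_update(belief, world, 'wall')     # sees a wall
belief = motion_update(belief)                    # moves again
belief = sensor_update(belief, world, 'wall')     # sees another wall

print(belief.index(max(belief)))  # → 2: door-wall-wall only fits cells 0→1→2
```

Note how the second wall reading resolves the ambiguity: after the first two readings, starting at either door (cell 0 or cell 3) is equally plausible, but only the path starting at cell 0 ends on a wall again.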
The team that developed MINERVA used a coastal planner, which generated a path between two exhibits that minimized the chances of getting lost: the path stayed close enough to walls that the robot was never left in open spaces where sensor data would no longer be useful [2].

As the robot moves around the building we must ensure that it does not bump into objects or people; therefore, the robot must automatically stop or turn away from an obstacle as soon as it detects one within a specified distance.

The robot also needs to coordinate the various activities it can perform, involving either motion or interaction. To do this, a task planner takes commands or readings from the microphone, camera, path planner, obstacle avoidance, etc., and translates them into motor or speech values. RHINO used GOLOG, a language for specifying complex actions, and GOLEX, which translates GOLOG actions into basic commands for the robot.

Interacting through speech inherently involves natural language understanding. However, that is a large project in itself, so for our purposes we will restrict the interaction to a small, well-defined grammar. For example: visitor: "Help"

To be able to start the tour process, the robot must first learn to recognize people. Given this ability, the robot will be able to stand at the front door and ask incoming people whether they would like a tour or need to be shown a specific room. Using a camera, the robot will also recognize people it has seen previously that day, respond with "Hello again" or something similar, and avoid asking the same person for a tour twice. Once the robot has been adapted to do this, it will also be made to recognize an obstacle as human and will perhaps ask the person to move out of the way. The techniques for this were employed by Elektro, an hors d'oeuvres-serving robot led by one of our advisers at the AAAI robot competition [6].
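A small, well-defined grammar can be handled with simple keyword matching rather than full natural language understanding. The sketch below shows the flavor of what we have in mind; the phrases, the action tokens, and the example room number are all hypothetical placeholders, not the final command set.

```python
# Hypothetical sketch of a restricted command grammar: each known
# phrase maps to an action token the task planner can dispatch on.

COMMANDS = {
    'help':       'HELP',        # explain what the robot can do
    'tour':       'START_TOUR',  # begin the standard building tour
    'take me to': 'GOTO_ROOM',   # navigate to a requested room
    'stop':       'STOP',        # halt motion immediately
}

def parse_utterance(text):
    """Map a recognized utterance onto a known command.

    Returns the action token, or None if the utterance falls outside the
    grammar, in which case the robot would ask the visitor to rephrase.
    """
    text = text.lower()
    for phrase, action in COMMANDS.items():
        if phrase in text:
            return action
    return None

print(parse_utterance('Could you take me to room 243?'))  # → GOTO_ROOM
print(parse_utterance('What time is it?'))                # → None
```

Anything the grammar does not cover is rejected rather than guessed at, which keeps the speech-recognition problem tractable.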
Plan of Work

We will use Elektro, a B21R robot from iRobot (see attached picture). Elektro is equipped with laser sensors, which provide accurate information about the environment (distances to objects, object shapes), sonar sensors, and dual color cameras. The software we will use to control the behavior of the robot is Pyro, which is being developed by a team including both Professor Blank and Professor Kumar [3]. Pyro stands for Python Robotics; all robot behaviors and functions are programmed in the Python programming language.

The robot will be given a map of the building (in the form of an occupancy grid), which it will then adapt into its own representation of the environment. It will build on this map by collecting data while displaying a basic innate behavior to learn its environment. In real time, the robot will constantly renew the map as obstacles move, using sensor readings and camera images to detect changes in the occupancy grid; thus it will use a dynamic mapping procedure.

Having a map of the building is not enough: the robot also needs to know where it is on the map in order to decide its next move. The robot will use its sensors (sonar, IR, and camera) to find specific features of a place and, together with its motor data (distance and rotation), localize itself using probabilistic measures combined with landmark recognition.

Once the robot has localized itself, it will find a way to get to the next target using path planning. Once an occupancy grid has been created and the robot has localized itself, we will integrate a program currently being developed by the Pyro team: given a grid, a starting location, and a finishing location, it finds a path of grid squares from start to finish. The path planning will be dynamic and will be updated in real time to incorporate new obstacles. This will also enable the robot to give directions to people as it gives a tour.
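A path of grid squares from start to finish can be found with a standard graph search over the occupancy grid. The sketch below uses breadth-first search, which returns a shortest path in steps; it is our own illustrative version, and the actual planner being developed by the Pyro team may use a different algorithm.

```python
from collections import deque

# Breadth-first search over an occupancy grid: 0 = free cell, 1 = occupied.

def plan_path(grid, start, goal):
    """Return a list of (row, col) cells from start to goal, or None."""
    rows, cols = len(grid), len(grid[0])
    frontier = deque([start])
    came_from = {start: None}          # also serves as the visited set
    while frontier:
        cell = frontier.popleft()
        if cell == goal:               # reconstruct path by walking back
            path = []
            while cell is not None:
                path.append(cell)
                cell = came_from[cell]
            return path[::-1]
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] == 0 and (nr, nc) not in came_from):
                came_from[(nr, nc)] = cell
                frontier.append((nr, nc))
    return None  # goal unreachable

grid = [[0, 0, 0],
        [1, 1, 0],   # a wall forces a detour through the right column
        [0, 0, 0]]
print(plan_path(grid, (0, 0), (2, 0)))  # path routes around the wall
```

Dynamic replanning then amounts to re-running the search whenever the occupancy grid changes under the robot.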
The robot will be trained to avoid obstacles using a neural network. Once trained, it will be able to navigate through the environment without harming others or itself. This will be especially important since, at certain times of day, there are large crowds in the hallways. In addition to the automatic motor responses associated with avoidance (stopping, turning out of the way), if the robot recognizes the obstacle as a person, it will interact with them ("Excuse me!"). We have already written and tested neural-network obstacle avoidance on several robots as part of a term project in the Developmental Robotics course and in the subsequent summer research program in 2003.

Task planning will be built into the robot's "brain" using behavior-based control. In the brain, the inputs and commands from the various other parts of the program (e.g., path planning, voice commands, etc.) will be translated into changes in the map, a current location, motor values, and predicted sensor readings.

Using the camera, images will be processed for landmark detection, both for the robot's localization and for the recognition of people. Using color histograms, shape recognition, and movement, the robot will be able to detect whether it is looking at a human and to recognize a previously seen person (identifying them by a mixture of clothing color and voice recognition). We will achieve this by integrating and building upon the program developed by Professor Blank and fellow researchers [6].

The robot will also use a microphone to pick up sound from the surrounding area. By integrating voice recognition software, we will be able to filter out background noise to an extent, take voice commands, and respond. We propose to use the Sphinx speech recognition software from CMU [7]. For speech generation, we plan to use the Festival Speech Synthesis System from the University of Edinburgh [8].
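The clothing-color part of person recognition can be sketched with color histograms: reduce an image to a coarse distribution over color bins and compare distributions with the histogram-intersection measure. This is a minimal illustration under our own assumptions; images here are plain lists of (r, g, b) pixels, whereas the real system would work on camera frames through the program we are building on [6].

```python
# Coarse RGB color histograms plus histogram intersection, as a toy
# stand-in for matching a person's clothing color between sightings.

def color_histogram(pixels, bins=4):
    """Quantize each channel into `bins` ranges; normalize counts to sum to 1."""
    hist = {}
    step = 256 // bins
    for r, g, b in pixels:
        key = (r // step, g // step, b // step)
        hist[key] = hist.get(key, 0) + 1
    n = len(pixels)
    return {k: v / n for k, v in hist.items()}

def similarity(h1, h2):
    """Histogram intersection: 1.0 for identical histograms, 0.0 for disjoint."""
    return sum(min(h1.get(k, 0), h2.get(k, 0)) for k in h1)

# A mostly-red shirt seen in the morning, and a similar one seen later:
seen = color_histogram([(200, 30, 30)] * 90 + [(20, 20, 200)] * 10)
now  = color_histogram([(210, 40, 25)] * 85 + [(25, 15, 200)] * 15)

print(similarity(seen, now))  # close to 1.0 → likely the same person
```

Coarse bins make the comparison tolerant of small lighting changes, which is why the slightly different reds above still land in the same bin.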
Expected Outcomes

So far, based on our coursework in AI, Robotics, Summer Research, and Developmental Robotics, we have acquired substantial hands-on experience working with Elektro (among other robots). We are well accustomed to programming all kinds of robot behaviors and learning experiments using the Pyro software. This summer, we will be doing research on mapping algorithms that use occupancy grids. In the work we are proposing, we will learn about path planning using occupancy grids, voice recognition and generation, and the integration of all the working components through Pyro.

We plan to release all of our programs as open-source software. We will also produce a DVD movie recording our work on the project as it progresses through the year, resulting ultimately in a series of clips that demonstrate all the behaviors of the tour guide robot. We plan to write about our results in senior theses and will also submit articles for publicity (campus and national magazines and newspapers) and publications to student research venues (NCUR, SIGCSE, AI Magazine, etc.).

References

1. W. Burgard, A.B. Cremers, D. Fox, G. Lakemeyer, D. Hähnel, D. Schulz, W. Steiner, and S. Thrun. The museum tour-guide robot RHINO. In Proceedings of the 14. Fachgespräch Autonome Mobile Systeme (AMS '98), Springer Verlag, 1998.
2. S. Thrun, M. Bennewitz, W. Burgard, A.B. Cremers, F. Dellaert, D. Fox, D. Hähnel, C. Rosenberg, N. Roy, J. Schulte, and D. Schulz. MINERVA: A second generation mobile tour-guide robot. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), 1999.
3. D. Blank, L. Meeden, and D. Kumar. Python Robotics: An Environment for Exploring Robotics Beyond LEGOs. In Proceedings of the Thirty-Fourth SIGCSE Technical Symposium on Computer Science Education, Reno, Nevada, ACM Press, February 2003.
4. J. Howell and B.R. Donald. Practical Mobile Robot Self-Localization.
In Proceedings of the 2000 IEEE International Conference on Robotics and Automation, San Francisco, CA, April 24-28, 2000.
5. L. Meeden, B. Maxwell, N. Saka Addo, L. Brown, P. Dickson, J. Ng, S. Olshfski, E. Silk, and J. Wales. Alfred: The Robot Waiter Who Remembers You. In Autonomous Robots, 2001.
6. D. Blank, G. Beavers, W. Arensman, C. Caloianu, T. Fujiwara, S. McCaul, and C. Shaw. A Robot Team that Can Search, Rescue, and Serve Cookies: Experiments in Multi-modal Person Identification and Multi-robot Sound Localization. In Proceedings of the Twelfth Annual Midwest Artificial Intelligence and Cognitive Science Society Meetings, 2001.
7. Sphinx speech recognition software, CMU. http://www.speech.cs.cmu.edu/sphinx/
8. Festival Speech Synthesis System, University of Edinburgh. http://www.cstr.ed.ac.uk/projects/festival/

Student Activity and Responsibility

The students will be responsible for carrying out the research necessary to complete the project. We will integrate the different parts of the project and test the results, making sure that everything works as proposed. We will program in Python and use Pyro (which is Python based).

Faculty Activity and Responsibility

Our faculty sponsors will continue the development of Pyro and make sure that the features needed for the project are part of the program. They will provide the lab and the robots necessary for the project, and will also give assistance on how the robots work.

Budget

The following items will need to be purchased: speakers and a microphone, DVD discs, blank CDs, digital video tape, and books.
Computer Science Dept, Bryn Mawr College, 101 N Merion Ave, Bryn Mawr, PA, 19010