CS 330: Algorithms: Design & Practice

Homework 3: The Soundex Phonetic Encoding Algorithm
Due: In class on Thursday, February 18

Page 16 of Bentley's book makes a passing reference to the Soundex Indexing system. In this assignment you will do some independent research on the Sondex algorithm: find out what it is, implement it in Python.

Implement the algorithm as a function that, given a name (string) encodes and returns the desired length soundex encoding. That is:

def soundex(name, len=4):
      """Returns the Soundex encoding of name. The returned code will be len chars long.""" 

Write a short program that inputs names, and outputs the Soundex encoding. Show the output of your program on at least 3-5 different sets of homophones. You may use variations on your own name as examples.

Do some background research on the use of Soundex, its applications, and its evolution.

What to hand in

1. Write a short history of the Soundex algorithm (at most 5 pages). Please write the document as a research paper, in the style of Bentley's Chapters. Use proper citations and attributions. Format the document using 1-inch margins all around, single-spaced, in 12pt Times Roman (or equivalent) font. In your write up, besides the history and its evolution, include a description of the algorithm, sample encodings, and sample inputs/outputs and encodings from your program, followed by some analysis.

2. Your implementation in Python and a printout of all sample output. Include this as an appendix to your report.

Notes:

Please, do not consult any implementations available on the web or from a text for this one. A good starting place is the Wikipedia page: http://en.wikipedia.org/wiki/Soundex where the algorithm is described. You are to design your own program to implement the algorithm, based on the description. Document any assumptions that you had to make during the implementation. Later, we will learn about some other phonetic encodings. Please restrict your research and writing to the Soundex algorithm itself.

Presentation:

No presentations for this exercise.


Back to CS330 Materials