Machine Learning for the (CS) Masses

Curriculum Descant
From ACM Intelligence Magazine, Volume 12, Number 3, Summer 2001. ACM Press.
Formerly considered a somewhat esoteric subfield of computer science, machine learning is now seeing broad use in computer science applications, for example, in search engines, computer games, adaptive user interfaces, personalized assistants, web bots, and scientific applications. However, few colleges and universities require a course in machine learning as part of an undergraduate major. It is time for us as computer science educators to recast an introduction to machine learning concepts as a staple of a computer science education.

Of course, there are many possible flavors of machine learning that might be emphasized, and in a one-course introduction to the field, one must chart a course consistent with the educational environment and the instructor's background. For example, one might focus a course on a particular approach to machine learning, such as neural nets, or on a particular form of machine learning, such as data mining. Regardless of the specific focus of the course, it is imperative that the course entail a significant hands-on component, working with actual systems. This component is essential for giving computer science students concrete exposure to working with programs that adapt, rather than the abstract knowledge that such a thing is possible.

In a course focused on the data mining form of machine learning, students can work with a variety of systems. They can ask the question "is system A better than system B?", learn to refine that question by adding "in this context" or "on this type of problem", and learn some simple statistical methods (such as the use of confidence intervals and t-tests) that are part of answering the question. Statistical approaches are often used within machine learning systems as well as to evaluate and compare the performance of different systems, so this provides a particularly nice context for introducing students to statistics.
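The "is system A better than system B?" comparison can be made concrete with a short sketch. The code below runs a paired t-test and builds a 95% confidence interval over hypothetical per-fold accuracies for two systems; the numbers, the use of ten folds, and the variable names are all invented for illustration, and only the standard library is used.

```python
import math
from statistics import mean, stdev

# Hypothetical cross-validation accuracies for two learners on the
# SAME ten folds (illustrative numbers, not real results).
system_a = [0.81, 0.79, 0.84, 0.80, 0.78, 0.83, 0.82, 0.80, 0.79, 0.81]
system_b = [0.78, 0.77, 0.80, 0.79, 0.75, 0.81, 0.79, 0.77, 0.76, 0.78]

# Pairing by fold removes fold-to-fold variance from the comparison.
diffs = [a - b for a, b in zip(system_a, system_b)]
n = len(diffs)
d_bar = mean(diffs)                # mean paired difference
se = stdev(diffs) / math.sqrt(n)   # standard error of the mean difference

t_stat = d_bar / se
# Critical value of Student's t for 95% confidence, df = n - 1 = 9.
t_crit = 2.262
ci = (d_bar - t_crit * se, d_bar + t_crit * se)

print(f"t = {t_stat:.2f}, 95% CI for the difference: "
      f"({ci[0]:.3f}, {ci[1]:.3f})")
# If the interval excludes 0 (equivalently |t| > t_crit), the difference
# is significant at the 5% level -- on these folds and this data only.
```

This is exactly the discipline the column asks for: students report an interval and a significance level instead of declaring a winner from two averages.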
In addition to hands-on experience with data mining (and as time permits), students may do additional reading on other forms of machine learning to broaden their view of what the field includes, without becoming overwhelmed with implementation details. Useful data sets can often be obtained from faculty members in other departments on campus. Faculty members in other departments are likely to use other approaches (such as logistic regression) when analyzing their data, and some will welcome having a student try an entirely different approach to gain insight on the problem. The students, in turn, are able to see their computer science talents as useful to another discipline's questions.

Another reason the course is attractive is that it gives students the experience of using and modifying large software systems. Several machine learning systems are available for download, for example, at the University of California, Irvine (UCI) Machine Learning Repository [2], the Carnegie Mellon AI Repository [3], and the AAAI Education Repository [1]. Students can be exposed to a culture where data sets and code are shared, and can be expected to get oriented to a large system and make changes or extensions to a specific part of such a system.

An undergraduate course in machine learning can be offered with minimal prerequisites, for example, to students who have completed the data structures course, which enables a broad range of students to take it. Not even an AI course need be a prerequisite, though of course key topics such as search, heuristics, and representation must be introduced. While a graduate-level machine learning course is often structured around reading scores of journal papers that describe different systems (perhaps working with three or four systems), an undergraduate-level course can eschew the "breadth" of machine learning and adopt a specific focus, such as the focus on the "data mining" form of machine learning described here.
One possible structure for such a course proceeds as follows. At the beginning of the course, lectures focus on an overview of machine learning while students work on writing a "data preprocessor" that reads data in from a file and collects simple statistics on it, for example, the percentage of positive and negative examples. Next, students read about decision trees and continue their programming work to write their own decision tree program. Third, students do experiments with their system, evaluating the effects of varying the metric used to choose splits in the tree and of running on different data sets. For this project, students should be taught and expected to use confidence intervals and paired t-tests, and not to blithely state that one system is "best" based on simple average performance. Fourth, students can work with an alternate form of data mining, for example, developing a genetic algorithm or neural net approach to data mining by refining "off the shelf" code to work with their problem. (Throughout most of the semester, students can work with publicly available data sets, such as those available at the UCI repository.)

As final projects, students should choose a problem, ideally a real research problem from faculty members in other departments, and either experiment with different variations on a single machine learning system or compare two or three different systems. (Data sets from the UCI repository can be used as a fallback.) Students should be expected at this point to be able to pose a research question (perhaps with some guidance), conduct their experiments according to the methodology described above (including the use of statistics), and present their findings as a research paper. It may be useful to pose this as "preliminary research" for a larger project, perhaps to investigate which of two or three approaches seems to hold the most promise for a specific data set.
Throughout the semester, short labs can be used to give students hands-on experience with systems written by others, such as Cobweb, Autoclass, and neural nets. (As you may guess, the course described here is one that I have designed and delivered. Online materials are available.)

An undergraduate course in machine learning provides an opportunity to introduce students to research, the experimental paradigm, and the use of statistical methods. Students can conduct actual, if modest, original research in the context of the class, which can lead to larger projects pursued as independent study. The course can also provide experience in working on a large software system and can expose students to a culture where data and code are shared. And, of course, an undergraduate course in machine learning introduces students to a vital and increasingly important area of computer science, providing them with skills, factual knowledge, and insights that are likely to serve them in many ways in their futures.

References

[1] The AAAI Education Repository.
[2] Blake, C. L. & Merz, C. J. (1998). The UCI Repository of Machine Learning Databases (also contains software). Irvine, CA: University of California, Department of Information and Computer Science.
[3] The Carnegie Mellon University (CMU) Artificial Intelligence Repository (particularly the "learning" subtree, which includes Genesis, Classweb, and source code and notes for a neural networks course).
About Curriculum Descant

Curriculum Descant has been a regular column in ACM's Intelligence magazine (formerly published as ACM SIGART's Bulletin). The column is edited by Deepak Kumar. The column features short essays on any topic relating to the teaching of AI from anyone willing to contribute. If you would like to contribute an essay, please contact Deepak Kumar.