Predicting the Future: AI Approaches to Time-Series Problems
Andrea Danyluk
andrea@cs.williams.edu, 413-597-2178.
The Workshop on AI Approaches to Time-Series Problems, jointly sponsored by
the Fifteenth National Conference on Artificial Intelligence (AAAI-98) and
the International Conference on Machine Learning (ICML-98), was held in
Madison, Wisconsin, on July 27, 1998. The organizing committee consisted
of Andrea Danyluk of Williams College, and Tom Fawcett and Foster Provost,
both of Bell Atlantic Science and Technology. There were approximately 30
attendees.
The goal of the workshop was to bring together AI researchers who study
time-series problems, along with practitioners and researchers from related
fields. These problems are of particular interest because of the large
number of high-profile applications today that include historical time
series (e.g., prediction of market trends, crisis monitoring). The focus
was primarily on machine learning and data mining approaches, but
perspectives on statistical time-series analysis and on state-space
analysis (e.g., work on hidden Markov models) were also included. These
communities arguably overlap significantly (and indeed, much work on
state-space analysis, for example, has fallen under the heading of machine
learning). Our goal was to make researchers and practitioners not only
aware of approaches similar to their own, but also aware of methods from
other communities that might be applied effectively to their problems.
Time-series problems include segmentation and labeling of time series
as well as prediction. Classical time-series prediction involves fitting
a function to a numeric time series in order to predict future values (e.g.,
the prediction of a stock's price given its past performance). Time-series
prediction may also be required for problems involving categorical, rather
than numeric, data. For instance, one might be interested in predicting
the next action of a computer user, given a history of user actions.
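
As a rough illustration of the two prediction settings just described,
the following Python sketch fits a simple function to a numeric series
and, separately, predicts the next item in a categorical sequence. It is
not taken from any workshop paper. The choice of a first-order linear
model for the numeric case, the choice of a first-order successor count
for the categorical case, and all function names are assumptions made
for this example.

```python
from collections import Counter, defaultdict


def fit_ar1(series):
    """Least-squares fit of x[t+1] ~ a * x[t] + b; returns (a, b)."""
    if len(series) < 2:
        raise ValueError("need at least two observations")
    xs, ys = series[:-1], series[1:]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:
        # No variation in the inputs: fall back to predicting the mean.
        return 0.0, mean_y
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = cov_xy / var_x
    return a, mean_y - a * mean_x


def predict_next_value(series):
    """Numeric case: apply the fitted function to the latest value."""
    a, b = fit_ar1(series)
    return a * series[-1] + b


def predict_next_action(history):
    """Categorical case: most frequent successor of the last action."""
    if not history:
        return None
    successors = defaultdict(Counter)
    for current, following in zip(history, history[1:]):
        successors[current][following] += 1
    candidates = successors.get(history[-1])
    if not candidates:
        return None  # the last action was never followed by anything
    return candidates.most_common(1)[0][0]
```

For instance, given the hypothetical history ls, cd, ls, cd, ls, vi, ls,
predict_next_action returns cd, because cd followed ls more often than
vi did. Real user-modeling and market-prediction systems would use far
richer models; the sketch only shows the shape of the two problems.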
In many situations, the problem is not to predict future values of a time
series but rather to label the series. For example, cardiologists examine
electrocardiograms in order to diagnose arrhythmias. Problems of
segmentation and labeling often occur together. The problem may be to
assign a label to an entire time series, or to segment the series into
subsequences and label them individually. The latter problem is more
difficult because there is temporal information to consider both
within each subsequence itself and among the various subsequences. In
the extreme case, the subsequences may be single atomic events, where
temporal information exists among the events, but there is no temporal
information within the event itself.
The labeling problems we have described assume that there are fixed boundaries
that define the pieces of the time series that are to be labeled. The
boundaries, however, may not be fixed. For instance, consider the problem
of identifying and labeling the stages of a multi-stage flight plan, given
a pilot's command sequence. Here the boundaries of the subsequences vary
from one flight plan to the next. The problem
becomes not only labeling the stages, but identifying the boundaries
themselves.
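
A minimal Python sketch of segmentation with variable boundaries follows.
It assumes, purely for illustration, that each command can be mapped to a
stage by a lookup table; the table, the command names, and the stage names
are all invented here. In a realistic setting the per-command labels would
come from a learned model rather than a table, and would be noisy.

```python
from itertools import groupby

# Hypothetical command-to-stage table, invented for illustration only.
STAGE_OF = {
    "throttle_up": "takeoff",
    "gear_up": "takeoff",
    "hold_altitude": "cruise",
    "adjust_heading": "cruise",
    "flaps_down": "landing",
    "gear_down": "landing",
}


def segment_stages(commands):
    """Return (stage, start, end) triples, with end exclusive.

    Boundaries are not given in advance: they fall wherever the
    per-command stage label changes.
    """
    labels = [STAGE_OF.get(c, "unknown") for c in commands]
    segments = []
    start = 0
    for stage, run in groupby(labels):
        length = len(list(run))
        segments.append((stage, start, start + length))
        start += length
    return segments
```

Two command sequences of different lengths yield segments with different
boundaries, which is the point of the example: the output identifies both
the stage labels and where each stage begins and ends.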
In other cases the identification of the boundary may be paramount. For
example, consider the problem of detecting credit card fraud. Time-series
information exists in the form of a stream of credit card transactions for
an account, and the problem is to decide at any given time whether the
account has been defrauded. In real time this would involve examining
(sub)sequences of account activity, and determining, as soon as possible,
whether fraudulent activity exists. Here the problem is not only to label
accurately, but to identify as closely as possible the point where
fraudulent behavior begins, so that usage can be stopped and losses minimized.
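
One standard statistical way to frame this kind of online boundary
detection is a one-sided cumulative sum (CUSUM) test, sketched below in
Python. This is offered only as an illustration and is not the method of
any workshop paper. The use of transaction amounts as the monitored
signal, and the parameters baseline, slack, and threshold, are
assumptions for this example; deployed fraud detectors rely on much
richer account profiles.

```python
def detect_onset(amounts, baseline, slack, threshold):
    """One-sided CUSUM over a stream of transaction amounts.

    Returns (onset, alarm) as indices into the stream, or None if no
    alarm is raised. `alarm` is the transaction at which the cumulative
    excess over `baseline + slack` first exceeds `threshold`; `onset`
    is the estimated start of the run that caused the alarm.
    """
    score = 0.0
    onset = None
    for i, amount in enumerate(amounts):
        if score == 0.0:
            onset = i  # a new upward run could start here
        # Accumulate evidence of unusually high spending, never below zero.
        score = max(0.0, score + (amount - baseline - slack))
        if score > threshold:
            return onset, i
    return None
```

The gap between onset and alarm reflects the trade-off described above:
a lower threshold raises the alarm sooner, which limits losses, but also
makes false alarms on legitimate activity more likely.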
The working notes contain 16 papers, seven of which were selected for
presentation at the workshop. In keeping with the goal of making the
workshop a resource for researchers and practitioners in this area,
the working notes include a small selection of relevant papers that
also appear elsewhere. The papers presented at the workshop span the
range of problems described above.
In addition to presentations of papers in the proceedings, there
were two invited talks. Leslie Kaelbling gave a brief tutorial on
reinforcement learning and how it might be applied to problems in time
series. Padhraic Smyth discussed the importance of not dismissing
well-understood techniques from statistics, at least as a first method
of attack when addressing new problems.
Discussions were held throughout the day, and a number of common issues
arose. One recurring issue is the difficulty of obtaining data. While
time-series problems are becoming increasingly important and prominent,
much of the data of interest (for instance, financial or marketing data)
are difficult to obtain due to their sensitive nature. User-profiling data
are similarly difficult to share because of privacy concerns. A goal of
the researchers at the workshop is to overcome some of these
difficulties and to establish a common repository of time-series data. With
these data commonly available, researchers can test new algorithms
more easily, and the field can benefit from replication of experiments and
comparison of results.