Predicting the Future: AI Approaches to Time-Series Problems Andrea Danyluk andrea@cs.williams.edu, 413-597-2178. The Workshop on AI Approaches to Time-Series Problems, jointly sponsored by the Fifteenth National Conference on Artificial Intelligence (AAAI-98) and the International Conference on Machine Learning (ICML-98), was held in Madison, Wisconsin, on July 27, 1998. The organizing committee consisted of Andrea Danyluk of Williams College, and Tom Fawcett and Foster Provost, both of Bell Atlantic Science and Technology. There were approximately 30 attendees. The goal of the workshop was to bring together AI researchers who study time-series problems, along with practitioners and researchers from related fields. These problems are of particular interest because of the large number of high-profile applications today that include historical time series (e.g., prediction of market trends, crisis monitoring). The focus was primarily on machine learning and data mining approaches, but perspectives on statistical time-series analysis and on state-space analysis (e.g., work on hidden Markov models) were also included. These communities arguably overlap significantly (and indeed, much work on state-space analysis, for example, has fallen under the heading of machine learning). Our goal was to make researchers and practitioners not only aware of approaches similar to their own, but also aware of methods from other communities that might be applied effectively to their problems. Time-series problems include segmentation and labeling of time series as well as prediction. Classical time-series prediction involves fitting a function to a numeric time series in order to predict future values (e.g, the prediction of a stock's price given its past performance). Time-series prediction may also be required for problems involving categorical, rather than numeric, data. For instance, one might be interested in predicting the next action of a computer user, given a history of user actions. In many situations, the problem is not to predict future values of a time series but rather to label the series. For example, cardiologists examine electrocardiograms in order to diagnose arrhythmias. Problems of segmentation and labeling often occur together. The problem may be to assign a label to an entire time series, or to segment the series into subsequences and label them individually. The latter problem is more difficult because there is temporal information to consider both within each subsequence itself and among the various subsequences. In the extreme case, the subsequences may be single atomic events, where temporal information exists among the events, but there is no temporal information within the event itself. The labeling problems we have described assume that there are fixed boundaries that define the pieces of the time series that are to be labeled. The boundaries, however, may not be fixed. For instance, consider the problem of identifying and labeling the stages of a multi-stage flight plan, given a pilot's command sequence. Here the boundaries of the subsequences will not be fixed for all flight plans, but will be variable. The problem becomes not only labeling the stages, but identifying the boundaries themselves. In other cases the identification of the boundary may be paramount. For example, consider the problem of detecting credit card fraud. Time-series information exists in the form of a stream of credit card transactions for an account, and the problem is to decide at any given time whether the account has been defrauded. In real time this would involve examining (sub)sequences of account activity, and determining, as soon as possible, whether fraudulent activity exists. Here the problem is not only to label accurately, but to identify as closely as possible the point where fraudulent behavior begins, so that usage can be stopped and losses minimized. The working notes contain 16 papers, seven of which were selected for presentation at the workshop. In keeping with the goal of making the workshop a resource for researchers and practitioners in this area, the working notes included a small selection of relevant papers that also appear elsewhere. The papers presented at the workshop span the range of problems described above. In addition to presentations of papers in the proceedings, there were two invited talks. Leslie Kaelbling gave a brief tutorial on reinforcement learning and how it might be applied to problems in time series. Padhraic Smyth discussed the importance of not dismissing well-understood techniques from statistics, at least as a first method of attack when addressing new problems. Discussions were held throughout the day, and a number of common issues arose. One recurring issue is the difficulty of obtaining data. While time-series problems are becoming increasingly important and prominent, much of the data of interest (for instance, financial or marketing data) are difficult to obtain due their sensitive nature. User-profiling data are similarly difficult to share because of privacy concerns. A goal of the researchers at the workshop is to try to overcome some of these difficulties and to establish a common repository of time-series data. By making these data commonly available, researchers can test new algorithms more easily, and the field can benefit from replication of experiments and comparison of results.