Nello Cristianini, Royal Holloway, University of London, nello@cs.rhul.ac.uk
The introduction of Support Vector Machines (SVMs) in the 1990s created
considerable interest in the entire family of kernel-based learning
methods (KMs). In the past few years, many researchers have
investigated the theory, algorithmics, and applications
of this class of learning machines.
The main reason for interest in KMs
is their flexibility and remarkable resistance to overfitting, but their
simplicity and theoretical elegance appeal to theoreticians and
practitioners alike.
Their modular design (a general-purpose learning module fitted with a
problem-specific kernel function) makes them easy to analyse and use, and
highly flexible.
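This modularity is easy to see in code. What follows is a minimal sketch
(in Python with NumPy; the kernel perceptron shown is a standard textbook
illustration, not code from the tutorial): the learning module touches
the data only through the kernel function, so swapping linear_kernel for
rbf_kernel changes the hypothesis space without changing a line of the
training algorithm.

    import numpy as np

    def linear_kernel(x, z):
        # Plain inner product: recovers the ordinary perceptron.
        return np.dot(x, z)

    def rbf_kernel(x, z, gamma=1.0):
        # Gaussian (RBF) kernel: an implicit, infinite-dimensional
        # feature space, obtained without ever computing features.
        return np.exp(-gamma * np.sum((x - z) ** 2))

    def train_kernel_perceptron(X, y, kernel, epochs=20):
        # General-purpose learning module: the kernel perceptron in dual
        # form. X is an (n, d) array, y holds labels in {-1, +1};
        # returns one dual coefficient per training example.
        n = len(X)
        alpha = np.zeros(n)
        gram = np.array([[kernel(xi, xj) for xj in X] for xi in X])
        for _ in range(epochs):
            for i in range(n):
                # Dual-form prediction: sum_j alpha_j * y_j * K(x_j, x_i).
                if y[i] * ((alpha * y) @ gram[:, i]) <= 0:
                    alpha[i] += 1.0
        return alpha

    # XOR is not linearly separable, yet the identical learner fits it
    # once the RBF kernel is plugged in.
    X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
    y = np.array([-1., 1., 1., -1.])
    alpha = train_kernel_perceptron(X, y, rbf_kernel)
    scores = [(alpha * y) @ np.array([rbf_kernel(xj, x) for xj in X])
              for x in X]
    print(np.sign(scores))  # [-1.  1.  1. -1.]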
Kernel methods have been applied to diverse fields, delivering
state-of-the-art performance in applications ranging from the
analysis of DNA microarray data to text categorization, and from
handwritten digit recognition to protein homology detection.
For these and other reasons, KMs are now part of the standard toolbox of
machine learning practitioners. Books, workshops, special issues, and
websites have been devoted to this class of learning methods in recent
years, testifying to their growing uptake.
This tutorial will introduce the basic ideas necessary to
appreciate this new approach, and provide pointers to relevant literature
and online software.
Peter A. Flach, University of Bristol, Peter.Flach@bristol.ac.uk
Knowledge Representation is at the heart of all the core methods in
machine learning. Traditionally, ML has assumed the attribute-value
format, where each instance is described by a vector of values for a
fixed sequence of pre-defined attributes. However, this fixed vector
format can be too restrictive for some domains. Considerable recent
research has been directed toward less rigid approaches, such as Koller's
relational probabilistic models, Dietterich's multiple instance problem,
and inductive logic programming (ILP). This tutorial reviews such
extensions of the attribute-value format from a common perspective, by
concentrating on individual-centred representations. Such representations
are applicable to any domain where there is a clear notion of individual,
including molecules, weather forecasts, or chess endgame positions. The
tutorial will introduce individual-centred representations and consider
various formalisms for dealing with them, such as declarative programming
languages or relational databases. This perspective clarifies the
relation between attribute-value learning and approaches such as ILP,
which use the full power of first-order logic; it also suggests several
new research problems such as first-order neural networks and first-order
support vector machines.
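To make the contrast concrete, here is a small sketch (in Python; the
ethanol-like molecule and the attribute names are invented for
illustration) of one individual, a molecule, described first in the fixed
attribute-value format and then in an individual-centred, relational
format:

    # Attribute-value format: one fixed-length vector per instance.
    # (weight, num_carbon, num_oxygen are hypothetical pre-defined
    # attributes.)
    molecule_av = {"weight": 46.07, "num_carbon": 2, "num_oxygen": 1}

    # Individual-centred format: the individual is the molecule,
    # described by a variable number of atom and bond facts
    # (hydrogens omitted for brevity).
    molecule_rel = {
        "atoms": [("a1", "c"), ("a2", "c"), ("a3", "o")],  # (id, element)
        "bonds": [("a1", "a2", 1), ("a2", "a3", 1)],  # (atom, atom, order)
    }

    def has_carbon_oxygen_bond(mol):
        # A relational feature: does some bond connect a carbon to an
        # oxygen? Such queries range over the parts of an individual,
        # which a fixed attribute vector cannot express directly.
        element = dict(mol["atoms"])
        return any({element[a], element[b]} == {"c", "o"}
                   for a, b, _ in mol["bonds"])

    print(has_carbon_oxygen_bond(molecule_rel))  # True

Evaluating relational queries like this one and recording their truth
values as new boolean attributes (propositionalisation) is one standard
bridge between the two formats.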
David D. Lewis, dave@daviddlewis.com
Text classification in the research literature focuses on comparing the
effectiveness of supervised learning approaches in classical train/test
experiments. Text classification in operational settings is subject to a
host of priorities and constraints, and effectiveness can be fairly far
down the list. In this tutorial, I will first review text classification
as an abstract machine learning problem, but will then devote the bulk of
the effort to practical problems that arise in applying text
classification. Examples will be drawn from applications such as
knowledge management, customer service automation, web directories,
vertical portals, spam and porn filtering, and content analysis. I will
end by discussing some machine learning research directions suggested by
the practical challenges of text classification.
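As a point of reference for the first part, a classical train/test
experiment can be compressed into a few lines. The sketch below (in
Python with scikit-learn; the four-document spam corpus is invented for
illustration) fits a bag-of-words naive Bayes classifier on a training
split and scores it on a held-out test split:

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.metrics import f1_score

    train_docs = ["cheap meds online now", "meeting agenda attached",
                  "win a free prize now", "quarterly report attached"]
    train_labels = [1, 0, 1, 0]   # 1 = spam, 0 = legitimate
    test_docs = ["free meds now", "agenda for the quarterly meeting"]
    test_labels = [1, 0]

    vectorizer = CountVectorizer()
    X_train = vectorizer.fit_transform(train_docs)  # vocabulary from train
    X_test = vectorizer.transform(test_docs)        # reused on test

    model = MultinomialNB().fit(X_train, train_labels)
    print("F1:", f1_score(test_labels, model.predict(X_test)))

The operational priorities and constraints the tutorial emphasizes all
lie outside this tidy experimental loop.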