ICASSP 2009 Final Submission 9/29/08
SPARSE BOOSTING
Zhen James Xiang and Peter J. Ramadge
Dept. of Electrical Engineering, Princeton University, Princeton NJ
ABSTRACT

We propose a boosting algorithm that seeks to minimize the AdaBoost exponential loss of a composite classifier using only a sparse set of base classifiers. The proposed algorithm is computationally efficient and in test examples produces composite classifiers that are sparser and generalize better than those produced by AdaBoost. The algorithm can be viewed as a coordinate descent method for the l1-regularized AdaBoost exponential loss function.

Index Terms— Pattern classification, Algorithms, Signal representations, Optimization methods.

1. INTRODUCTION AND OUTLINE

AdaBoost [1] is used as a pattern classification algorithm in a wide variety of signal processing applications [2]. AdaBoost constructs a powerful composite classifier as a weighted linear combination of a large set of base classifiers. Although the discriminant power of a single base classifier is usually weak, the composite classifier can achieve an acceptable classification accuracy.

In addition to classification accuracy, sparsity of the composite classifier is a desirable attribute. By this we mean that relatively few base classifiers are assigned a nonzero weight in the linear combination. A sparse classifier is easier to store, process, and interpret and, most importantly, is less prone to overfitting. Through empirical studies, AdaBoost is known to be generally resistant to overfitting, even for a large number of iterations [3, 4]. However, we will show that there are additional gains to be made by adding a sparsity mechanism directly into the boosting algorithm.

Boosting algorithms can be interpreted as iterative gradient descent procedures that minimize a loss function on the training data [5, 6, 7]. In many cases, these schemes add at most one new weight per iteration. Hence early stopping of the boosting process is a simple method to ensure sparsity [8]. It has been shown that early stopping of AdaBoost approximately minimizes an exponential loss function subject to an l1 constraint on the coefficient vector [9]. This suggests that sparsity can be ensured by imposing l1-regularization on the optimization of the loss function. This is in accord with results in compressed sensing, where l1-regularization has proved an effective approximation to an l0 sparsity constraint [10].
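For concreteness, one standard form of such an l1-regularized exponential loss, written in the notation introduced in Section 2 below and with a regularization parameter λ > 0 (λ and the symbol L_λ are our own notation for this sketch, not necessarily the paper's exact formulation), is

    L(α) = Σ_{i=1}^m exp( -y_i Σ_{j=1}^{|H|} α_j h_j(x_i) ),    L_λ(α) = L(α) + λ Σ_{j=1}^{|H|} α_j,

where, since the coefficients satisfy α_j ≥ 0, the penalty Σ_j α_j equals the l1 norm of the coefficient vector α.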
The idea of l1-regularized loss minimization has been explored in [9]. However, direct solution of the convex l1-regularized loss problem is computationally too expensive in many real applications. This has led to proposals for indirect methods of solution. To this end, ε-boosting seeks to solve the regularized problem by iteratively adding a small weight to one base classifier each iteration [5]. However, ε-boosting is too inefficient for practical application. Other work has examined combining AdaBoost with smaller l1-regularized loss optimization problems from the perspective of maximizing the margin on the training examples [11]. Despite this work, there remains a need for an efficient boosting algorithm that directly incorporates a sparsity mechanism.
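To make the ε-boosting step concrete, the sketch below (our own illustration in Python, not the implementation of [5]; variable names and the stopping rule are our choices) adds, at each iteration, a small fixed weight ε to the base classifier with the largest edge under the current exponential-loss example weights.

import numpy as np

def epsilon_boosting(X, y, base_classifiers, eps=0.01, n_iter=1000):
    """Minimal sketch of eps-boosting: forward stagewise fitting with tiny steps.

    base_classifiers: list of functions h_j mapping an (m, p) array of
    instances to labels in {-1, +1}.  Returns the coefficient vector alpha.
    """
    H = np.column_stack([h(X) for h in base_classifiers])  # (m, |H|) outputs
    alpha = np.zeros(H.shape[1])
    for _ in range(n_iter):
        w = np.exp(-y * (H @ alpha))        # exponential-loss example weights
        w /= w.sum()                        # normalize to a distribution
        edges = (w * y) @ H                 # edge(h_j, w) for every j
        j = int(np.argmax(edges))           # steepest-descent coordinate
        if edges[j] <= 0:                   # no base classifier beats chance
            break
        alpha[j] += eps                     # add a small fixed weight
    return alpha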
Our main contribution is to propose a new, computationally efficient boosting algorithm, called RBoost (short for Regularized Boost), that iteratively solves the l1-regularized loss minimization problem. RBoost works in a similar fashion to AdaBoost except that it incorporates a simple, intuitive mechanism for l1-regularization. Moreover, with one simple change, RBoost reverts to AdaBoost. Our empirical studies show that RBoost achieves better generalization than AdaBoost with sparser composite classifiers.

In Section 2, we introduce the basic notation of the pattern classification problem and review the AdaBoost algorithm. We then present and analyze the RBoost algorithm in Section 3. The performance of RBoost is examined experimentally in Section 4, followed by conclusions in Section 5.
2. PRELIMINARIES

In a typical binary classification problem, one is given training data S = {(x_i, y_i)}_{i=1}^m (where x_i ∈ R^p are instances and y_i ∈ {-1, +1} are corresponding labels) and a set of base classifiers H = {h_j}_{j=1}^{|H|}. A base classifier h_j ∈ H is a mapping from instances to possible labels, h_j : R^p → {-1, +1}. The performance of any classifier h under probability distribution w = {w_i}_{i=1}^m is measured by its edge, edge(h, w) = Σ_{i=1}^m w_i y_i h(x_i), which is related to the error probability P_err(h) with respect to w on the training set by edge(h, w) = 1 - 2 P_err(h).
that sparsity can be ensured by imposing l -regularization Let h (x) = sign(|H| α h (x)) denote the composite
1 c j =1 j j
on the optimization of the loss function. This is in accord classifier. We assume that h ∈ H ? -h ∈ H and require
with results in compressed sensing, where l -regularization αj ≥ 0, j = 1, . . . , |H|. Using this notation, AdaBoost can
1
has proved an effective approximation to an l0 sparsity con- be specified as follows: |
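The algorithm box itself is not reproduced in this excerpt; for reference, the sketch below implements standard AdaBoost in the notation above (our own variable names, and not necessarily the exact presentation used in the paper), together with the composite classifier h_c.

import numpy as np

def adaboost(X, y, base_classifiers, n_iter=50):
    """Sketch of standard AdaBoost: pick the base classifier with the largest
    edge, give it weight 0.5*log((1+edge)/(1-edge)), and re-weight the examples."""
    m = len(y)
    H = np.column_stack([h(X) for h in base_classifiers])  # (m, |H|) outputs
    alpha = np.zeros(H.shape[1])
    w = np.full(m, 1.0 / m)                  # start from the uniform distribution
    for _ in range(n_iter):
        edges = (w * y) @ H                  # edge(h_j, w) for every j
        j = int(np.argmax(edges))
        e = edges[j]
        if e <= 0 or e >= 1:                 # no useful, or a perfect, classifier
            break
        step = 0.5 * np.log((1 + e) / (1 - e))   # since edge = 1 - 2*P_err
        alpha[j] += step
        w *= np.exp(-step * y * H[:, j])     # emphasize misclassified examples
        w /= w.sum()                         # renormalize to a distribution
    return alpha

def h_c(X, base_classifiers, alpha):
    """Composite classifier h_c(x) = sign(sum_j alpha_j * h_j(x))."""
    H = np.column_stack([h(X) for h in base_classifiers])
    return np.sign(H @ alpha)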