ICML 2013 Workshop on Machine Learning For System Identification

Schedule

Thursday, June 20

8:20 - 8:30	Welcome
8:30 - 9:30	Lennart Ljung (Linköping University) Some Classical and Some New Ideas in System Identification Abstract: This presentation gives an overview of the state of the art in system identification. The focus is on classical parameter estimation in carefully selected model structures. The most common structures will be reviewed, as well as their basic classical asymptotic properties in terms of bias and variance. The advantages and disadvantages of common identification practice will be discussed. Recent developments inspired by machine learning will then be presented, together with how they relate to the classical approaches through the introduction of well-known but carefully tuned regularization techniques.
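As a concrete instance of classical parameter estimation in a simple model structure, the following minimal Python/NumPy sketch fits a first-order ARX model y(t) + a·y(t-1) = b·u(t-1) + e(t) by linear least squares. The model order, the true values a = -0.7 and b = 0.5, and the helper names simulate_arx and fit_arx are illustrative assumptions, not material from the talk.

```python
import numpy as np


def simulate_arx(n_samples, a=-0.7, b=0.5, noise_std=0.1, seed=0):
    """Simulate y(t) + a*y(t-1) = b*u(t-1) + e(t) with white input and white noise."""
    rng = np.random.default_rng(seed)
    u = rng.standard_normal(n_samples)
    e = noise_std * rng.standard_normal(n_samples)
    y = np.zeros(n_samples)
    for t in range(1, n_samples):
        y[t] = -a * y[t - 1] + b * u[t - 1] + e[t]
    return u, y


def fit_arx(u, y):
    """Least-squares estimate of theta = [a, b] from the linear regression form."""
    # Each row of the regressor matrix is phi(t) = [-y(t-1), u(t-1)].
    phi = np.column_stack([-y[:-1], u[:-1]])
    theta, *_ = np.linalg.lstsq(phi, y[1:], rcond=None)
    return theta


u, y = simulate_arx(2000)
a_hat, b_hat = fit_arx(u, y)
print(f"a_hat = {a_hat:.3f}, b_hat = {b_hat:.3f}")
```

Because e(t) is white in this simulation, the least-squares estimate is consistent; with colored equation noise the same estimator is in general biased.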
9:30 - 10:15	Lieven Vandenberghe (University of California, Los Angeles) Graphical models of autoregressive time series Abstract: In a graphical model of a Gaussian random variable, the sparsity pattern of the inverse covariance matrix determines the conditional independence relations and the topology of the graph. A popular method for estimating the sparse inverse covariance matrix is based on a penalized maximum likelihood formulation with a 1-norm penalty. This requires the solution of a convex optimization problem, for which several efficient algorithms have recently been proposed. In this talk we will discuss extensions of this sparse inverse covariance problem to autoregressive models of multivariate time series. We will present maximum likelihood formulations with convex penalties and constraints, and discuss first-order algorithms for solving them.
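The link between zeros of the inverse covariance (precision) matrix and conditional independence can be checked numerically. The Python/NumPy sketch below assumes a hypothetical four-variable Gaussian chain x0 - x1 - x2 - x3 with a tridiagonal precision matrix; it does not implement the 1-norm penalized estimator or the autoregressive extensions discussed in the talk.

```python
import numpy as np

# Hypothetical precision matrix of a chain graph x0 - x1 - x2 - x3:
# a zero entry means "no edge", i.e. conditional independence given the rest.
K = np.array([
    [2.0, -0.8, 0.0, 0.0],
    [-0.8, 2.0, -0.8, 0.0],
    [0.0, -0.8, 2.0, -0.8],
    [0.0, 0.0, -0.8, 2.0],
])
Sigma = np.linalg.inv(K)  # the covariance itself is dense

# Conditional covariance of (x0, x3) given (x1, x2) via the Schur complement.
A, B = [0, 3], [1, 2]
cond_cov = Sigma[np.ix_(A, A)] - Sigma[np.ix_(A, B)] @ np.linalg.solve(
    Sigma[np.ix_(B, B)], Sigma[np.ix_(B, A)]
)

# Partial correlations can be read directly off the precision matrix.
d = np.sqrt(np.diag(K))
partial_corr = -K / np.outer(d, d)
np.fill_diagonal(partial_corr, 1.0)

print("marginal cov(x0, x3):   ", round(Sigma[0, 3], 4))
print("conditional cov(x0, x3):", round(cond_cov[0, 1], 12))
```

x0 and x3 are marginally correlated but conditionally uncorrelated given x1 and x2, exactly as the zero in K[0, 3] predicts; estimating a sparse K from data is the job of the penalized maximum likelihood formulation.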
10:15 - 10:30	Coffee break
10:30 - 11:15	Giuseppe De Nicolao (University of Pavia) Bayesian linear system identification with stable spline priors Abstract: The issue of linear system identification is revisited as a Bayesian learning problem related to the reconstruction of an unknown function. As such, the estimator is specified by the choice of the prior distribution for the unknown impulse response, regarded as the realization of a stochastic process. Under Gaussianity assumptions, the prior is specified by an autocovariance function, whose choice is the key ingredient for developing a successful system identification scheme. Classical choices used in the field of Gaussian processes do not prove adequate, as they are not tailored to the specific features of linear system identification. The introduction of the so-called stable spline prior brings definite advantages, as it guarantees asymptotic stability in addition to standard smoothness properties. The probabilistic framework offers a further advantage: the tuning of the hyperparameters can be restated as the problem of maximizing a marginal likelihood, overcoming some shortcomings of the classical model order selection procedures widely employed in parametric identification of linear dynamic systems.
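A minimal Python/NumPy sketch of the idea, under several assumptions: it uses the first-order stable spline kernel (also known as the TC kernel) K(i, j) = c·λ^max(i, j), an impulse response truncated to an assumed length of 50, a known noise variance, and a crude grid search in place of a proper optimizer for the marginal likelihood. The helper names and the simulated system are hypothetical.

```python
import numpy as np


def tc_kernel(n, c, lam):
    """First-order stable spline (TC) kernel: K[i, j] = c * lam**max(i, j), i, j = 1..n."""
    idx = np.arange(1, n + 1)
    return c * lam ** np.maximum.outer(idx, idx)


def regressors(u, n):
    """Phi[t, k-1] = u[t-k] (zero initial conditions), so that y = Phi @ g + noise."""
    N = len(u)
    phi = np.zeros((N, n))
    for k in range(1, n + 1):
        phi[k:, k - 1] = u[: N - k]
    return phi


def neg_log_marglik(phi, y, K, sigma2):
    """-log p(y | hyperparameters), up to a constant, for the prior g ~ N(0, K)."""
    S = phi @ K @ phi.T + sigma2 * np.eye(len(y))
    _, logdet = np.linalg.slogdet(S)
    return 0.5 * (logdet + y @ np.linalg.solve(S, y))


def posterior_mean(phi, y, K, sigma2):
    """Regularized (posterior mean) impulse response estimate."""
    S = phi @ K @ phi.T + sigma2 * np.eye(len(y))
    return K @ phi.T @ np.linalg.solve(S, y)


rng = np.random.default_rng(0)
n, N, sigma2 = 50, 100, 0.01
g_true = 0.8 ** np.arange(1, n + 1)  # hypothetical stable impulse response
u = rng.standard_normal(N)
phi = regressors(u, n)
y = phi @ g_true + np.sqrt(sigma2) * rng.standard_normal(N)

# Empirical Bayes: choose (c, lam) maximizing the marginal likelihood on a grid.
grid = [(c, lam) for c in (0.1, 1.0, 10.0) for lam in np.linspace(0.5, 0.98, 25)]
c_hat, lam_hat = min(grid, key=lambda h: neg_log_marglik(phi, y, tc_kernel(n, *h), sigma2))
g_hat = posterior_mean(phi, y, tc_kernel(n, c_hat, lam_hat), sigma2)
rel_err = np.linalg.norm(g_hat - g_true) / np.linalg.norm(g_true)
print(f"c = {c_hat}, lambda = {lam_hat:.2f}, relative error = {rel_err:.3f}")
```

The exponential decay encoded by λ^max(i, j) is what builds stability into the prior, while the marginal likelihood replaces a discrete model-order search with a continuous hyperparameter choice.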
11:15 - 12:00	Alessandro Chiuso (University of Padova) Smoothness priors, Shrinkage and Sparsity in System Identification: Bayesian procedures from a classical perspective Abstract:
12:00 - 14:00	Lunch break
14:00 - 14:45	Byron Boots (University of Washington) Spectral Approaches to Learning Dynamical Systems Abstract: In this talk I will give an overview of spectral algorithms for learning compact, accurate, predictive models of partially observable dynamical systems directly from sequences of observations. I will discuss several related approaches, with a focus on spectral methods for classical models like Kalman filters and hidden Markov models. I will also briefly discuss variations of these algorithms, including batch and online algorithms, and kernel-based algorithms for learning models in high- and infinite-dimensional feature spaces. All of these approaches share a common framework: the model's belief space is represented as predictions of observable quantities, and an eigendecomposition is applied as a key step for learning the model parameters. Unlike the popular EM algorithm, spectral learning algorithms are statistically consistent, computationally efficient, and easy to implement using established matrix-algebra techniques.
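The following Python/NumPy sketch illustrates the spectral step on the simplest case, a discrete hidden Markov model, in the style of the observable-operator algorithm of Hsu, Kakade and Zhang. To keep it checkable, it uses exact low-order moments P1, P21 and P3x1 computed from a known, hypothetical two-state, three-symbol HMM; in practice these moments would be estimated from observed sequences. It is a generic illustration, not the specific algorithms of the talk.

```python
import numpy as np

# Hypothetical HMM: T[i, j] = Pr[h' = i | h = j], O[x, i] = Pr[x | h = i].
T = np.array([[0.7, 0.4], [0.3, 0.6]])
O = np.array([[0.6, 0.1], [0.3, 0.3], [0.1, 0.6]])
pi = np.array([0.5, 0.5])
n_obs, m = O.shape


def forward_prob(seq):
    """Exact Pr[x_1, ..., x_t] from the true parameters (forward recursion)."""
    state = pi.copy()
    for x in seq:
        state = T @ (O[x] * state)
    return state.sum()


# Low-order moments of x_1, x_2, x_3 (exact here, estimated from data in practice).
P1 = O @ pi
P21 = O @ T @ np.diag(pi) @ O.T  # P21[i, j] = Pr[x2 = i, x1 = j]
P3x1 = np.array([O @ T @ np.diag(O[x]) @ T @ np.diag(pi) @ O.T for x in range(n_obs)])

# Spectral step: project onto the top-m left singular vectors of P21.
U = np.linalg.svd(P21)[0][:, :m]
b1 = U.T @ P1
binf = np.linalg.pinv(P21.T @ U) @ P1
UP21_pinv = np.linalg.pinv(U.T @ P21)
B = [U.T @ P3x1[x] @ UP21_pinv for x in range(n_obs)]


def spectral_prob(seq):
    """Pr[x_1, ..., x_t] = binf^T B_{x_t} ... B_{x_1} b1, built from observable moments only."""
    state = b1
    for x in seq:
        state = B[x] @ state
    return binf @ state


for seq in [(0,), (2, 1), (0, 1, 2, 1)]:
    print(seq, round(forward_prob(seq), 6), round(spectral_prob(seq), 6))
```

The learned operators B_x equal the true transition-emission operators only up to an unknown similarity transform, which is why they never require the hidden states to be identified explicitly.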
14:45 - 15:30	Mario Sznaier (Northeastern University) Hankel Based Maximum Margin Classifiers: A Connection Between Machine Learning and Nonlinear Systems Identification Abstract: Finding low dimensional parsimonious non-linear representations of high dimensional correlated data is a classical problem in machine learning, and a large number of solutions are available. However, while these methods have proved very efficient in handling static data, most do not exploit dynamical information, encapsulated in the temporal ordering of the data. Thus, the resulting embeddings may not be suitable for problems such as tracking, anomaly detection or time-series classification, which critically hinge on capturing the underlying temporal dynamics. Alternatively, from a control perspective, the problem of finding low dimensional embeddings that respect the temporal dynamics can be recast as a Wiener system (the cascade of a linear system and a static nonlinearity) identification problem. Here the linear dynamics account for the temporal evolution in the embedding manifold, while the nonlinearity models the mapping from this manifold to the original (high dimensional) data. Identification of Wiener systems has been the subject of recent intense research in the control community, leading to a large number of approaches, which can be roughly classified into statistical and set membership approaches. A salient feature of these approaches is that the dimension of the state space of the system is assumed to be known. However, in the case of interest here (embedding of dynamic data) this information is not available a priori and must also be identified from the experimental data, a situation that cannot be handled by existing techniques. This talk presents a rapprochement between systems identification and machine learning techniques.
Our goal is, starting from experimental input-output data, to find an embedding manifold such that the data can indeed be explained as a trajectory of a Wiener system, and to identify its linear and non-linear portions, as well as the dimensions of both the embedding manifold and the dynamics there. A salient feature of the proposed approach (common in machine learning, but to the best of our knowledge hitherto not used in the identification community) is its ability to use both positive and negative samples, that is, experimental data generated both by the system to be identified and by other systems. This situation is commonly encountered in applications such as activity classification, where sample clips of different activities are available, or in tracking, where a segmentation separating the target of interest from other targets and the background is often known. The main result of the talk shows that in this context, the problem of jointly finding the embedding manifold and the linear dynamics can be recast into a convex optimization over a semi-algebraic set (a set defined by a collection of polynomial inequalities). In turn, the use of recent results from polynomial optimization allows this problem to be relaxed to a tractable convex optimization. Further, as in kernel-based methods, the proposed algorithm uses only information about the covariance matrix of the observed data (as opposed to the data itself). Thus, it can comfortably handle cases such as those arising in computer vision applications, where the dimension of the output space is very large (since each data point is a frame from a video sequence with thousands of pixels). These results will be illustrated with both academic examples and practical ones involving human activity classification from video clips.
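The role of Hankel matrices in revealing an unknown state dimension can be illustrated on the linear part alone: for noise-free outputs of an n-th order autonomous linear system, a sufficiently large Hankel matrix of the output has rank n. The Python/NumPy sketch below assumes a hypothetical third-order system and a relative singular-value threshold; it does not implement the classifier, the semi-algebraic formulation, or the polynomial-optimization relaxations of the talk.

```python
import numpy as np


def hankel(y, n_rows):
    """Hankel matrix H[i, j] = y[i + j] with n_rows rows."""
    n_cols = len(y) - n_rows + 1
    return np.array([y[i:i + n_cols] for i in range(n_rows)])


def numerical_rank(M, rel_tol=1e-8):
    """Number of singular values above rel_tol times the largest one."""
    s = np.linalg.svd(M, compute_uv=False)
    return int(np.sum(s > rel_tol * s[0]))


# Hypothetical third-order system x(t+1) = A x(t), y(t) = C x(t).
A = np.diag([0.9, 0.5, -0.7])
C = np.array([1.0, 1.0, 1.0])
x = np.array([1.0, 1.0, 1.0])
outputs = []
for _ in range(20):
    outputs.append(C @ x)
    x = A @ x
y = np.array(outputs)

print("estimated state dimension:", numerical_rank(hankel(y, 6)))
```

With measurement noise the Hankel matrix becomes full rank, so the threshold, or a convex surrogate for rank, has to absorb the noise; this is one reason the dimension must be identified jointly with the model rather than read off directly.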
15:30 - 17:00	Poster Session
17:00 - 17:45	Thomas Schön (Linköping University) Nonlinear system identification enabled via sequential Monte Carlo Abstract: Sequential Monte Carlo (SMC) methods are computational methods primarily used to deal with the state inference problem in nonlinear state space models. Particle filters and particle smoothers are the most popular SMC methods. These methods make it possible to address nonlinear system identification (both maximum likelihood and Bayesian solutions) in a systematic way. As we will see, it is not a matter of directly applying the SMC algorithms; rather, there are several ways in which they enter as a natural part of the solution. The use of SMC for nonlinear system identification is a relatively recent development, and the aim here is first to provide a brief overview of how SMC can be used to solve challenging nonlinear system identification problems by sketching both maximum likelihood and Bayesian solutions. We will then introduce a powerful recent class of algorithms, collectively referred to as Particle Markov Chain Monte Carlo (PMCMC), targeting the Bayesian problem. PMCMC provides a systematic way of combining SMC and MCMC, where SMC is used to construct the high-dimensional proposal density for the MCMC sampler. The first results emerged in 2010, and since then we have witnessed steadily increasing activity in this area. We focus on our new PMCMC method "Particle Gibbs with ancestor sampling" and show its use in computing the posterior distribution for a general Wiener model (i.e., identifying a Bayesian Wiener model).
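One way SMC enters maximum likelihood and PMCMC schemes is through its estimate of the likelihood, which is unbiased for the likelihood itself (its logarithm is slightly biased). A minimal Python/NumPy sketch of a bootstrap particle filter is given below for a hypothetical scalar linear Gaussian model, chosen only so that the estimate can be compared with the exact Kalman filter log-likelihood; it is not the Particle Gibbs with ancestor sampling method of the talk.

```python
import numpy as np


def simulate(T, a, q, r, seed=0):
    """Simulate x(t+1) = a x(t) + v(t), y(t) = x(t) + e(t), v ~ N(0, q), e ~ N(0, r)."""
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, np.sqrt(q / (1 - a**2)))  # stationary initial state
    y = np.empty(T)
    for t in range(T):
        y[t] = x + rng.normal(0.0, np.sqrt(r))
        x = a * x + rng.normal(0.0, np.sqrt(q))
    return y


def bootstrap_pf_loglik(y, a, q, r, n_particles=2000, seed=1):
    """Bootstrap particle filter estimate of log p(y(1), ..., y(T))."""
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, np.sqrt(q / (1 - a**2)), n_particles)
    loglik = 0.0
    for t, yt in enumerate(y):
        if t > 0:
            x = a * x + rng.normal(0.0, np.sqrt(q), n_particles)  # propagate with the dynamics
        logw = -0.5 * (np.log(2 * np.pi * r) + (yt - x) ** 2 / r)  # measurement log-density
        m = logw.max()
        w = np.exp(logw - m)
        loglik += m + np.log(w.mean())  # log of the average (unnormalized) weight
        x = x[rng.choice(n_particles, n_particles, p=w / w.sum())]  # multinomial resampling
    return loglik


def kalman_loglik(y, a, q, r):
    """Exact log-likelihood of the same model via the Kalman filter."""
    m, P = 0.0, q / (1 - a**2)
    loglik = 0.0
    for t, yt in enumerate(y):
        if t > 0:
            m, P = a * m, a * a * P + q  # time update
        S = P + r
        loglik += -0.5 * (np.log(2 * np.pi * S) + (yt - m) ** 2 / S)
        K = P / S
        m, P = m + K * (yt - m), (1 - K) * P  # measurement update
    return loglik


a, q, r = 0.8, 1.0, 1.0
y = simulate(50, a, q, r)
ll_pf = bootstrap_pf_loglik(y, a, q, r)
ll_kf = kalman_loglik(y, a, q, r)
print(f"particle filter: {ll_pf:.3f}, Kalman filter: {ll_kf:.3f}")
```

For a nonlinear model only the propagation and measurement-density lines change, whereas the Kalman filter would no longer be exact; that is what makes the particle estimate useful inside identification algorithms.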

Friday, June 21

8:30 - 9:15	Håkan Hjalmarsson (KTH - Royal Institute of Technology) Identification of structured linear regression models: Model structure selection and computation Abstract: In this talk we discuss identification of structured models. While the concepts are applicable in a very broad context, for reasons of clarity and conciseness of the exposition we focus on identification of structured linear regression models. We show that many novel methods, such as l1- and nuclear-norm-based estimation, fit into the presented framework. We also introduce a new type of model structure, quadratically constrained linear regression models, which leads to estimation methods closely related to kernel-based regularization. A likelihood perspective is taken on the model structure selection problem, and it is compared with classical methods such as Stein’s unbiased risk estimate and information-based criteria (e.g., AIC). Finally, we address computational issues, in particular relaxation techniques ensuring convexity.
9:15 - 10:00	Marco Signoretto (KU Leuven) Tensor Estimation Problems with Multilinear Spectral Penalties Abstract: Recently, there has been increasing interest in the cross-fertilization of ideas coming from (convex) optimization, kernel methods and tensor-based techniques. In the first part of this talk we will present a framework for learning when feature observations are multidimensional arrays (tensors). The approach leads to transductive as well as inductive problem formulations that we illustrate by means of applications. In either case the technique exploits precise low multilinear rank assumptions on unknown tensors; regularization is based on composite spectral penalties and connects to the method of Multilinear Singular Value Decomposition (MLSVD). As a by-product of using a tensor-based formalism, the approach allows one to tackle the multi-task case in a natural way. In the second part of the talk we discuss how these ideas can be generalized from the finite dimensional setting to the functional setting. Specifically, we discuss the estimation of tensors from data in the unifying framework of reproducing kernel Hilbert spaces (RKHSs). When the functions in the RKHS are defined on the Cartesian product of finite discrete sets, the formulation specializes into existing matrix and tensor completion problems. Additionally, the approach leads to the extension of kernel-based formalisms based on operator estimation. We elaborate on a novel representer theorem and highlight its advantages with respect to alternative learning techniques.
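The multilinear rank targeted by such spectral penalties is the tuple of ranks of the mode-k unfoldings of a tensor, and the MLSVD (also called HOSVD) computes orthonormal factors from the SVDs of those unfoldings. A minimal Python/NumPy sketch, assuming a hypothetical 5 x 6 x 7 tensor of multilinear rank (2, 3, 2); the helper names are illustrative, and the regularized estimation problems of the talk are not implemented.

```python
import numpy as np


def unfold(T, mode):
    """Mode-k unfolding: the mode-k fibers of T become the columns of a matrix."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)


def mode_product(T, M, mode):
    """Multiply tensor T along `mode` by matrix M (M's rows replace that dimension)."""
    return np.moveaxis(np.tensordot(M, T, axes=(1, mode)), 0, mode)


def mlsvd(T, ranks):
    """Truncated MLSVD: factors from the SVD of each unfolding, core by projection."""
    U = [np.linalg.svd(unfold(T, k), full_matrices=False)[0][:, :r]
         for k, r in enumerate(ranks)]
    core = T
    for k, Uk in enumerate(U):
        core = mode_product(core, Uk.T, k)
    return core, U


# Hypothetical tensor with multilinear rank (2, 3, 2): a small core times random factors.
rng = np.random.default_rng(0)
G = rng.standard_normal((2, 3, 2))
factors = [rng.standard_normal((5, 2)), rng.standard_normal((6, 3)), rng.standard_normal((7, 2))]
X = G
for k, F in enumerate(factors):
    X = mode_product(X, F, k)

print("multilinear rank:", [int(np.linalg.matrix_rank(unfold(X, k))) for k in range(3)])

core, U = mlsvd(X, (2, 3, 2))
X_rec = core
for k, Uk in enumerate(U):
    X_rec = mode_product(X_rec, Uk, k)
print("relative reconstruction error:", np.linalg.norm(X - X_rec) / np.linalg.norm(X))
```

Spectral penalties of the kind mentioned in the abstract act on the singular values of these unfoldings, which is how a low multilinear rank is encouraged without fixing the ranks in advance.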
10:00 - 10:30	Coffee break
10:30 - 11:15	Cristian Rojas (KTH - Royal Institute of Technology) On the consistency of the fused lasso Abstract: In this talk, we will discuss the use of the l1 heuristic for segmenting time series with respect to changes in the mean or variance. This technique, known as the fused lasso or total variation denoising, has also been used successfully for other tasks, such as filtering images, segmenting ARX models, and estimating threshold policies in Markov decision processes. Given the wide range of applications of the fused lasso, a relevant question is when it works, i.e., when the fused lasso recovers the true change points of the underlying signal (e.g., of the mean or variance). We consider this question in the sense of approximate support set recovery consistency, as the number of samples tends to infinity. An important finding of our analysis is that the true change points cannot be found exactly, but at best within an arbitrarily small neighborhood, under specific conditions described in the talk.
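For concreteness, the fused lasso for a change in the mean solves min_x 0.5·||y - x||² + λ·Σ_t |x(t+1) - x(t)|. The Python/NumPy sketch below solves it by projected gradient on the dual problem, one simple solver among many; the signal, noise level, λ and iteration count are hypothetical choices, and the sketch says nothing about the consistency analysis itself.

```python
import numpy as np


def diff_transpose(z):
    """Apply D^T, where D is the first-difference operator (D x)[t] = x[t+1] - x[t]."""
    return np.concatenate(([0.0], z)) - np.concatenate((z, [0.0]))


def fused_lasso_1d(y, lam, n_iter=20000):
    """Solve min_x 0.5*||y - x||^2 + lam * sum_t |x[t+1] - x[t]| via its dual.

    Dual: min_{|z| <= lam} 0.5*||y - D^T z||^2, with primal solution x = y - D^T z.
    A step of 1/4 is safe because ||D D^T|| <= 4.
    """
    z = np.zeros(len(y) - 1)
    for _ in range(n_iter):
        x = y - diff_transpose(z)
        z = np.clip(z + 0.25 * np.diff(x), -lam, lam)  # gradient step, then box projection
    return y - diff_transpose(z)


rng = np.random.default_rng(0)
truth = np.concatenate([np.zeros(50), 2.0 * np.ones(50)])  # one change in the mean at t = 50
y = truth + 0.3 * rng.standard_normal(100)

x_hat = fused_lasso_1d(y, lam=1.0)
jump = int(np.argmax(np.abs(np.diff(x_hat)))) + 1
print("largest estimated jump at t =", jump)
```

The estimate is piecewise constant, and its change points are where the dual variables sit on the boundary |z| = λ; in finite samples they may land near, rather than exactly at, the true change point.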
11:15 - 12:00	Marc Deisenroth (TU Darmstadt) Bayesian Machine Learning for Autonomous Systems and Robots Abstract: Technical, biological, and economic systems often produce large quantities of data. However, the measured signals are often noisy, and the latent processes that generate the data are not fully known. Making predictions or decisions based on such noisy signals and uncertain functional relationships is a challenging task. There are two ways of addressing this problem: one can either technically reduce uncertainty with additional infrastructure or develop algorithms that are robust to uncertainty by adaptation. Reducing uncertainty is expensive in terms of engineering and often no longer possible for the complex systems frequently encountered today. We will show that Bayesian machine learning can be used to automatically extract important information and latent relationships from noisy data. I will largely focus on Bayesian machine learning in control and robotics, but the developed algorithms are applicable to a wide range of areas, such as sensorimotor control, neuroscience, and economics. We will present fully data-driven machine learning approaches to sensing and acting in robotics and control. In Bayesian system identification, a posterior probability distribution over plausible dynamics models provides robustness to model errors and is crucial for meaningful predictions and decision making. As a distribution over dynamics models, we use nonparametric Gaussian processes (GPs). Building upon system identification, we will present a unifying view of Bayesian latent-state estimation (filtering and smoothing). This unifying perspective allows us both to re-derive common filters (e.g., the Kalman filter) and to devise novel algorithms for smoothing with Gaussian processes. Based on Bayesian inference, we will present a novel framework for autonomously learning robot controllers.
Our framework achieves an unprecedented speed of learning compared to state-of-the-art methods.
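Gaussian-process dynamics models of this kind rest on standard GP regression. The Python/NumPy sketch below learns a hypothetical scalar one-step map x(t+1) = sin(x(t)) + noise from randomly sampled transitions, using a squared-exponential kernel whose hyperparameters are fixed by assumption rather than learned. It shows how the posterior variance grows away from the data, which is the model uncertainty that the abstract argues is crucial for prediction and decision making; it is not the controller-learning framework or the filtering and smoothing algorithms of the talk.

```python
import numpy as np


def se_kernel(a, b, length_scale=1.0, signal_std=1.0):
    """Squared-exponential covariance between two sets of scalar inputs."""
    d = a[:, None] - b[None, :]
    return signal_std**2 * np.exp(-0.5 * (d / length_scale) ** 2)


def gp_posterior(x_train, y_train, x_test, noise_std=0.1, length_scale=1.0, signal_std=1.0):
    """Posterior mean and variance of the latent function at x_test."""
    K = se_kernel(x_train, x_train, length_scale, signal_std) + noise_std**2 * np.eye(len(x_train))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    K_star = se_kernel(x_test, x_train, length_scale, signal_std)
    mean = K_star @ alpha
    v = np.linalg.solve(L, K_star.T)
    var = signal_std**2 - np.sum(v**2, axis=0)
    return mean, var


rng = np.random.default_rng(0)
x_t = rng.uniform(-3.0, 3.0, 30)                      # hypothetical sampled states
x_next = np.sin(x_t) + 0.1 * rng.standard_normal(30)  # noisy successor states

mean, var = gp_posterior(x_t, x_next, np.array([0.5, 10.0]))
print("near the data: mean %.3f, var %.4f" % (mean[0], var[0]))
print("far from data: mean %.3f, var %.4f" % (mean[1], var[1]))
```

Far from the training states the posterior reverts to the prior (mean near 0, variance near signal_std²), so a planner or controller using this model can tell where its predictions are unreliable.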
12:00 - 14:00	Lunch break
14:00 - 14:45	Necmiye Ozay (California Institute of Technology) Dynamics-based information extraction via hybrid system identification Abstract: This talk addresses the problem of robust identification and (in)validation of hybrid models from noisy data. In particular, for static data we look for geometric invariants, and for dynamic data we try to infer the underlying switched affine dynamical system that could interpolate the data within a given noise bound. We define suitable a priori model sets and objective functions that seek "simple" models which can capture the information sparsely encoded in the data set. Although this leads to nonconvex problems that are generically hard to solve, we show that computationally tractable relaxations (and in some cases exact solutions) can be obtained by exploiting a combination of elements from convex analysis and the classical theory of moments. In the second part of the talk, we will illustrate the application of these results to some nontrivial computer vision problems, such as video and image segmentation, where the goal is to detect changes, for instance in scenes, activities, or texture.
14:45 - 15:30	Henrik Ohlsson (University of California, Berkeley) Rank Minimization and Sparsity for Identification Abstract: Rank minimization and sparsity are two of the hottest topics in machine learning and system identification. This talk presents two novel applications of these techniques. First, we consider blind identification with piecewise constant inputs. We show that this problem can be posed as a rank minimization problem and motivate our work with the application of energy disaggregation. Second, we consider a particular case of nonlinear system identification, namely the problem of finding the linear system in a Wiener system with a known non-injective nonlinearity. We show how this problem can be posed as a rank minimization problem and derive conditions for perfect recovery. We further make connections to recent developments in nonlinear compressive sensing.
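Convex approaches to rank minimization commonly replace the rank by the nuclear norm (the sum of singular values), whose proximal operator is singular value thresholding. A minimal Python/NumPy sketch of that generic building block, applied to a hypothetical noisy rank-2 matrix; the threshold tau is an assumed tuning choice, and the specific formulations for blind identification and Wiener systems in the talk are not reproduced.

```python
import numpy as np


def singular_value_threshold(M, tau):
    """Proximal operator of tau*||X||_*: argmin_X 0.5*||X - M||_F^2 + tau*||X||_*."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    s_shrunk = np.maximum(s - tau, 0.0)  # soft-threshold the singular values
    return (U * s_shrunk) @ Vt           # scale column j of U by s_shrunk[j]


rng = np.random.default_rng(0)
low_rank = rng.standard_normal((20, 2)) @ rng.standard_normal((2, 20))
noisy = low_rank + 0.01 * rng.standard_normal((20, 20))

denoised = singular_value_threshold(noisy, tau=0.5)
print("rank before:", np.linalg.matrix_rank(noisy))
print("rank after: ", np.linalg.matrix_rank(denoised))
```

The same shrinkage also biases the retained singular values downward by tau, which is one reason exact-recovery conditions, rather than the relaxation alone, are needed to guarantee the correct answer.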
15:30 - 16:00	Coffee break
16:00 - 16:45	Tianshi Chen (Linköping University) System identification with sparse multiple kernel-based regularization method Abstract: In this talk, the recently introduced kernel-based regularization method for linear system identification will be considered. Instead of single kernels, multiple kernels are used, which are conic combinations of certain fixed kernels suitable for impulse response estimation. With multiple kernels, it is possible to capture impulse responses of systems with diverse and complicated dynamics better than with well-tuned single kernels. Moreover, multiple kernels equip the non-convex marginal likelihood maximization problem with two features. First, it is a difference of convex programming problem, a local minimum of which can be found efficiently using sequential convex optimization techniques such as majorization minimization algorithms. By exploiting the structure of each convex optimization problem, it is further possible to develop an efficient and accurate implementation of the proposed algorithm. Second, it favors sparse kernel weights and thus a sparse multiple kernel. This feature enables the multiple kernel-based regularization method to handle various structure detection problems in system identification.