Current and Past Research
Estimation under uncertainty using probabilistic modeling [1, 2, 3]
To produce reliable estimates under uncertainty, the model, measurement, and parameter uncertainties must all be properly modeled. In the standard Bayesian system ID framework, measurement uncertainty is modeled as a noise term in the system output and parameter uncertainty as a posterior distribution over the parameters, but model uncertainty is neglected. To correct for this omission, my research accounts for model uncertainty by including a process noise term in the dynamics. A recursive Bayesian filtering procedure can then be used to evaluate the resulting likelihood efficiently.
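As a minimal sketch of this idea, consider the linear-Gaussian special case, where the filter-based likelihood can be evaluated exactly with a Kalman filter; the function and variable names below are illustrative, and a nonlinear model would replace the exact filter with an approximate one.

```python
import numpy as np

def kalman_log_likelihood(y, A, H, Q, R, m0, P0):
    """Marginal log-likelihood p(y_1:T | A, Q, R) of a linear-Gaussian
    state-space model, evaluated recursively with a Kalman filter.
    Process noise covariance Q captures model uncertainty; measurement
    noise covariance R captures sensor uncertainty."""
    m, P = m0, P0
    log_lik = 0.0
    for yk in y:
        # Predict: propagate mean/covariance through the dynamics, inflate by Q
        m = A @ m
        P = A @ P @ A.T + Q
        # Innovation: compare the predicted output with the measurement
        v = yk - H @ m
        S = H @ P @ H.T + R
        log_lik += -0.5 * (v @ np.linalg.solve(S, v)
                           + np.linalg.slogdet(2 * np.pi * S)[1])
        # Update: condition the state estimate on the measurement
        K = P @ H.T @ np.linalg.inv(S)
        m = m + K @ v
        P = (np.eye(len(m)) - K @ H) @ P
    return log_lik
```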
One benefit of this approach is that it is general enough to contain many other system ID objectives as special cases. By pinpointing the assumptions needed to derive other objectives from this general framework, one can identify the types of problems for which a given objective is best and worst suited. As examples, I have shown in [1] the assumptions needed to arrive at well-known algorithms such as dynamic mode decomposition (DMD) and the sparse identification of nonlinear dynamics algorithm, better known as SINDy. I then numerically validated these theoretical observations by showing that these algorithms' estimates incur greater error relative to the general approach when their corresponding assumptions are broken. An example result showing the posterior mean from the general framework significantly outperforming DMD on a simple linear system is given below. In a subsequent work [2], I also compared this Bayesian algorithm to a machine learning approach that had achieved state-of-the-art performance on a benchmark dataset containing 100,000 training data points. After the dataset was reduced to only 1,000 points and corrupted with noise, the Bayesian approach yielded 8.7 times lower mean squared error than the comparison method. This showed that even state-of-the-art machine learning methods can be greatly improved for small and noisy datasets by proper modeling of uncertainty. Overall, these results demonstrate the robustness of the probabilistic approach under uncertainty and illustrate how the framework can be used to offer a new perspective on the optimality of existing objective functions.
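For reference, the deterministic least-squares form that exact DMD reduces to is sketched below; the precise assumptions under which the general probabilistic framework recovers this estimator are those detailed in [1], and the snapshot-matrix names here are illustrative.

```python
import numpy as np

def dmd_operator(X, X_next):
    """Exact-DMD estimate of the linear operator A: the least-squares
    solution to X_next ~ A X, where the columns of X are state snapshots
    and the columns of X_next are the same snapshots advanced one step."""
    return X_next @ np.linalg.pinv(X)
```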
In addition to improvements in estimation accuracy, the use of a stochastic dynamics model for system ID delivers the following three unique benefits.
Novel measure of model complexity for regularization
Explicit regularization in parametric system ID often amounts to introducing an additive term penalizing some notion of the size of the parameter vector. For black-box models, e.g., neural networks, such terms are difficult to interpret due to the complex interactions between parameters and the non-uniqueness of parameter estimates. By using a stochastic dynamics model, I discovered that a regularization term arises naturally in the likelihood without the need for ad hoc approaches. This regularization term amounts to the determinant of the output covariance of the model. One benefit of this term is that it is more interpretable than parameter regularization because the output of a system typically carries physical meaning, unlike black-box parameters. The presence of this term penalizes dynamics models that map nearby points farther apart, yielding a likelihood that favors dynamics with the simplest behavior needed to fit the data. The regularizing effect of this term was validated in numerical experiments in [2], which showed 3.87 times lower root mean squared testing error compared to a likelihood without this term, even though the latter achieved lower training error. Therefore, the determinant of the output covariance can be interpreted as an effective measure of model complexity for regularization.
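A minimal sketch of how this penalty appears, assuming a Gaussian output density whose covariance Sigma_y is produced by propagating the stochastic model (for instance, the innovation covariance S in the filter sketch above):

```python
import numpy as np

def gaussian_output_log_likelihood(y, y_pred, Sigma_y):
    """Gaussian log-density of the measured output. The log-determinant of
    the output covariance acts as a built-in complexity penalty: dynamics
    that spread nearby points farther apart inflate Sigma_y and are
    penalized, even if the residual r is small."""
    r = y - y_pred
    _, logdet = np.linalg.slogdet(2 * np.pi * Sigma_y)
    return -0.5 * (r @ np.linalg.solve(Sigma_y, r) + logdet)
```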
Precise tuning of likelihood smoothness
It is well-known in nonlinear system ID that there is a tradeoff between the smoothness of the objective function and the extent to which the objective assesses the accuracy of the long-term model behavior. Since objective function smoothness influences the success of optimization, it is common practice to tune the smoothness by introducing a truncation length hyperparameter. This hyperparameter specifies the maximum number of timesteps to consecutively evaluate the model before restarting evaluation at a different initial condition. The performance of this approach, however, strongly depends on careful selection of the truncation length. In [2], I showed that the process noise covariance has a smoothing effect similar to that of this hyperparameter, but with four main advantages: (1) the covariance is a continuous parameter that allows for more precise tuning than the discrete truncation length; (2) the covariance can account for correlations between the components of the state for greater tuning flexibility; (3) the covariance enters directly into the likelihood and can therefore be tuned automatically during optimization or sampling; and (4) use of the covariance does not require the estimation of additional initial conditions.
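For contrast, here is a minimal sketch of the truncation-based objective described above, assuming a one-step model f(x, theta) and using illustrative names; in the probabilistic framework, the analogous smoothing knob is the continuous process noise covariance entering the filter-based likelihood sketched earlier.

```python
import numpy as np

def truncated_simulation_error(f, theta, x_data, L):
    """Multiple-shooting-style objective: simulate at most L consecutive
    steps from each restart point before re-anchoring on the data. The
    integer truncation length L controls the objective's smoothness."""
    err = 0.0
    for start in range(0, len(x_data) - 1, L):
        x = x_data[start]
        for k in range(start, min(start + L, len(x_data) - 1)):
            x = f(x, theta)                        # one-step model prediction
            err += np.sum((x - x_data[k + 1]) ** 2)
    return err
```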
Data-efficient quantification of model uncertainty
In many estimation approaches, it is impossible to know how good the estimated model is without testing it on a set of validation data. If the available dataset is small, however, it is preferable to utilize the full dataset for training rather than reserve a subset for validation. With a stochastic dynamics model, one can get a sense of the model quality without the need for a validation dataset by interpreting the process noise covariance as a quantification of the model uncertainty. In [3], I showed through numerical experimentation that comparing the estimated process noise variances of two models accurately ranked the quality of those models with respect to prediction error.
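One possible realization of this idea, reusing the kalman_log_likelihood sketch above and assuming an isotropic process noise covariance q*I fitted by maximum likelihood; two candidate models could then be ranked by their fitted values of q.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def fitted_process_noise_variance(y, A, H, R, m0, P0):
    """Fit a scalar process-noise variance q by maximizing the filter-based
    likelihood; a larger fitted q indicates larger model-form error."""
    n = len(m0)
    neg_log_lik = lambda log_q: -kalman_log_likelihood(
        y, A, H, np.exp(log_q) * np.eye(n), R, m0, P0)
    res = minimize_scalar(neg_log_lik, bounds=(-20.0, 5.0), method="bounded")
    return np.exp(res.x)
```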
Reducing data requirements through physics-informed modeling [3, 4]
When modeling physical systems, there is often more information known about the structure and characteristics of the system than is contained in the data. Typically, this additional information comes from physics knowledge such as conservation laws. Finding ways to encode this knowledge into the system ID framework is important for three main reasons: models that follow the laws of physics (1) are more physically meaningful and interpretable, (2) generalize better outside the training data, and (3) require fewer data to train. Other works have validated these benefits numerically by comparing point estimates of physics-informed and non-physics-informed models. In [3, 4], I provided a fresh perspective on physics-informed modeling through the lens of uncertainty. Specifically, I showed that using a structure-preserving integrator to evaluate the model during training not only yields more accurate estimates, as others have shown, but also reduces the uncertainty of model predictions. The reduction in uncertainty was seen both quantitatively, through an order-of-magnitude smaller process noise covariance, and qualitatively, in a narrower spread of the outputs produced by the parameter posterior. These experiments serve as numerical validation for the intuition that encoding physical knowledge into system ID improves model certainty, and they provide greater insight into the ways in which physics-informed modeling affects the estimation procedure.
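The specific structure-preserving integrators used in [3, 4] are not reproduced here; as a generic illustration of the idea, the sketch below contrasts an explicit Euler step with a symplectic (semi-implicit) Euler step for a separable Hamiltonian system, the latter preserving the geometric structure that prevents long-horizon energy drift.

```python
def explicit_euler_step(q, p, dH_dq, dH_dp, dt):
    """Explicit Euler: does not preserve the symplectic structure, so
    energy drifts over long rollouts."""
    return q + dt * dH_dp(q, p), p - dt * dH_dq(q, p)

def symplectic_euler_step(q, p, dH_dq, dH_dp, dt):
    """Semi-implicit (symplectic) Euler for a separable Hamiltonian
    H(q, p) = T(p) + V(q): update p using the current q, then update q
    using the new p."""
    p_new = p - dt * dH_dq(q, p)
    q_new = q + dt * dH_dp(q, p_new)
    return q_new, p_new

# Example: harmonic oscillator with H = 0.5 * (q**2 + p**2)
dH_dq = lambda q, p: q
dH_dp = lambda q, p: p
q, p = 1.0, 0.0
for _ in range(1000):
    q, p = symplectic_euler_step(q, p, dH_dq, dH_dp, dt=0.01)
```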
Future Research
Fast posterior sampling with generative models
Quantifying the uncertainty in dynamical systems often involves drawing parameter samples from the posterior distribution and then feeding those samples into the dynamics to generate samples of the system output. A significant barrier to this approach, however, is the difficulty of efficient posterior sampling. One approach is to use Markov chain Monte Carlo (MCMC) sampling, but this requires evaluating the posterior for each sample and can be slow to converge for distributions containing complex correlations. A growing and exciting area of research that has arisen to address these issues is generative modeling. Generative models draw on machine learning techniques to estimate functions that transform samples from one distribution, such as a Gaussian, into samples from a target distribution, e.g., the posterior. Although these methods have a substantial upfront training cost, once training is complete, sampling can be as simple as drawing a Gaussian sample and evaluating a neural network. In this research project, I aim to drastically reduce the cost of sampling the posterior by adapting state-of-the-art techniques from generative modeling, such as score-based models and generative adversarial networks, to the Bayesian system ID framework.
This research project will begin with a literature review on generative modeling techniques and their applications to system ID. Then, I will seek to develop a novel methodology for combining the Bayesian system ID approach with generative modeling in a way that addresses gaps identified in the literature. Once this method is developed, it can be tested on benchmark examples against the standard sampling approach of MCMC. To assess the performance of the novel method, I will (1) record the computation time of each method and (2) evaluate the closeness of the samples of the generative model to those of MCMC using a Monte Carlo approximation of the Kullback-Leibler divergence. These two metrics quantify the tradeoff between the expense and the accuracy of the two approaches. Once these experiments yield good results, the next step will be to investigate data assimilation approaches for efficiently updating the generative model with new data. This method will be compared to an MCMC approach in which the target distribution is updated with the new data after the chain has converged to the original posterior. Similar performance metrics can be evaluated for comparison.
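One way the Kullback-Leibler metric might be approximated, assuming kernel density estimates are adequate stand-ins for the unknown densities (a nearest-neighbor estimator could be substituted); the sample arrays follow the (dimension, number of samples) convention expected by scipy's gaussian_kde.

```python
import numpy as np
from scipy.stats import gaussian_kde

def mc_kl_estimate(mcmc_samples, generative_samples):
    """Monte Carlo estimate of KL(p || q), with p the MCMC reference and
    q the generative model, using KDE surrogates for both densities."""
    kde_p = gaussian_kde(mcmc_samples)
    kde_q = gaussian_kde(generative_samples)
    # E_p[log p - log q], averaged over the MCMC samples
    log_ratio = kde_p.logpdf(mcmc_samples) - kde_q.logpdf(mcmc_samples)
    return float(np.mean(log_ratio))
```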
Correcting reduced-order models with partially-observed experimental data
Many models of high-dimensional systems are prohibitively expensive for real-time applications and control schemes. Reduced-order modeling is often used to address this issue by creating cost-effective, low-dimensional models of high-dimensional systems. This modeling approach requires two steps: (1) identify a low-dimensional subspace that captures most of the system behavior, and (2) estimate a model of the reduced-order dynamics. To perform these estimation steps, many reduced-order modeling methods require high-resolution, full-field, and noiseless data, usually generated through numerical simulation of the full-order model (FOM). In practice, however, this setup is unrealistic: the FOM is almost always partially unknown, inaccurate, or uncertain. The aim of this research project is to correct the reduced-order model (ROM) learned from imperfect simulation data using experimental data collected directly from the system. This will require corrections to both the subspace ID and ROM estimation steps, as well as uncertainty quantification (UQ) of each correction. The outcome of this project has the potential to reduce the burden of full-order modeling and analysis by using inexpensive data to improve ROM accuracy.
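A minimal sketch of the two-step pipeline this project would correct, assuming a snapshot matrix whose columns are full-order states and a linear discrete-time reduced model for illustration; in the project, both steps would additionally carry UQ and be updated with partially-observed experimental data.

```python
import numpy as np

def pod_basis(snapshots, r):
    """Step (1): identify an r-dimensional subspace via proper orthogonal
    decomposition, i.e., the leading left singular vectors of the
    snapshot matrix (columns are full-order states)."""
    U, _, _ = np.linalg.svd(snapshots, full_matrices=False)
    return U[:, :r]

def fit_reduced_operator(snapshots, V):
    """Step (2): estimate a discrete-time linear reduced operator by
    least squares on the projected snapshots."""
    Z = V.T @ snapshots                  # reduced coordinates
    Z_k, Z_k1 = Z[:, :-1], Z[:, 1:]
    return Z_k1 @ np.linalg.pinv(Z_k)
```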
This project will begin with a literature review of methods that quantify uncertainty in steps (1) and/or (2) of the reduced-order modeling procedure and methods for updating a ROM using partially-observed experimental data. From this review, I seek to identify common modeling assumptions that are needed for state-of-the-art approaches but may be restrictive for real-world problems. Then, I will develop a more general modeling framework that can potentially be applied to a wider range of problems. My next step will be to create a computational algorithm that relies on Bayesian system ID to perform efficient estimation under this new modeling framework. To test the algorithm, I will seek out realistic examples that break the modeling assumptions of comparison methods, and I will assess whether the novel algorithm is better suited for these problems. For this assessment, the mean squared error of the model predictions will be used as a performance metric. Lastly, an analysis of the computational complexity of the algorithm will be included.