Department of Mathematics

RTG Seminars on Data Science

We invite speakers to present original research in Data Science.

2023-2024 Academic Year

Organized by: Wuchen Li

This page will be updated as new seminars are scheduled. Make sure to check back each week for information on upcoming seminars.

We will try to offer a virtual option via Zoom, as well as the regular in person option. The Zoom details are listed below:

Zoom Link:

Meeting ID: 942 9769 4178

Passcode: 488494

When: February 2nd 2024 from 3:40pm-4:40pm

Where: LeConte 440

Speaker: Yuehaw Khoo (University of Chicago)

Abstract: Tensor-network ansatz has long been employed to solve the high-dimensional Schrödinger equation, demonstrating linear complexity scaling with respect to dimensionality. Recently, this ansatz has found applications in various machine learning scenarios, including supervised learning and generative modeling, where the data originates from a random process. In this talk, we present a new perspective on randomized linear algebra, showcasing its usage in estimating a density as a tensor-network from i.i.d. samples of a distribution, without the curse of dimensionality, and without the use of optimization techniques. Moreover, we illustrate how this concept can combine the strengths of particle and tensor-network methods for solving high-dimensional PDEs, resulting in enhanced flexibility for both approaches.

When: November 10th from 3:40pm-4:40pm

Where: LeConte 440

Speaker: Sangmin Park (Carnegie Mellon University)

Abstract: We study the space of probability measures equipped with the 2-sliced Wasserstein distance SW2, a projection-based variant of the Wasserstein distance with increasing popularity in statistics and machine learning due to computational efficiency especially in high dimensions. Using the language of the Radon transform, we examine the metric differential structure of the sliced Wasserstein space and the induced length space, and deduce that SW2 (and the associated length metric) behave very differently near absolutely continuous and discrete measures. We apply this discrepancy to demonstrate the lack of stability of gradient flows in the sliced Wasserstein (length) space. If time permits, we will also discuss the empirical estimation rate of absolutely continuous measures in the sliced Wasserstein length. This is a joint work with Dejan Slepcev.
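As a rough illustration of why SW2 is cheap to compute, the sketch below estimates it between two equal-size point clouds by averaging one-dimensional Wasserstein distances over random directions. This is a minimal Monte Carlo sketch and not the speaker's method; the function name sliced_wasserstein2, the number of projections, and the seed are assumptions chosen for illustration.

```python
import numpy as np


def sliced_wasserstein2(x, y, n_projections=200, seed=0):
    """Monte Carlo estimate of SW2 between two equal-size point clouds.

    x, y: arrays of shape (n, d), read as uniform empirical measures.
    Assumption: both clouds have the same number of points, so each
    one-dimensional optimal transport problem is solved by sorting.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.ndim != 2 or x.shape != y.shape:
        raise ValueError("x and y must both have shape (n, d)")

    rng = np.random.default_rng(seed)
    d = x.shape[1]

    # Draw directions uniformly on the unit sphere.
    theta = rng.normal(size=(n_projections, d))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)

    # Project onto each direction; columns are the 1D projected samples.
    px = np.sort(x @ theta.T, axis=0)
    py = np.sort(y @ theta.T, axis=0)

    # In 1D, W2^2 between equal-size empirical measures is the mean squared
    # difference of sorted samples; SW2^2 averages this over directions.
    return float(np.sqrt(np.mean((px - py) ** 2)))
```

Each projection costs one sort, so the total cost grows as O(n log n) per direction regardless of the ambient dimension d.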

When: October 6th from 3:40pm-4:40pm

Where: LeConte 440

Speaker: Jiajia Yu (Duke University)

Abstract: Mean-field games study the Nash Equilibrium in a non-cooperative game with infinitely many agents. Most existing works study solving the Nash Equilibrium with given cost functions. However, it is not always straightforward to obtain these cost functions. On the contrary, it is often possible to observe the Nash Equilibrium in real-world scenarios. In this talk, I will discuss a bilevel optimization approach for solving inverse mean-field game problems, i.e., identifying the cost functions that drive the observed Nash Equilibrium. With the bilevel formulation, we retain the essential characteristics of convex objective and linear constraint in the forward problem. This formulation permits us to solve the problem using a gradient-based optimization algorithm with a nice convergence guarantee. We focus on inverse mean-field games with unknown obstacles and unknown metrics and establish the numerical stability of these two inverse problems. In addition, we prove and numerically verify the unique identifiability for the inverse problem with unknown obstacles. This is a joint work with Quan Xiao (RPI), Rongjie Lai (Purdue) and Tianyi Chen (RPI).

When: September 29th from 3:40pm-4:40pm

Where: LeConte 440

Speaker: Qi Feng (Florida State University)

Abstract: In this talk, I will discuss long-time dynamical behaviors of Langevin dynamics, including Langevin dynamics on Lie groups and mean-field underdamped Langevin dynamics. We provide unified Hessian matrix conditions for different drift and diffusion coefficients. This matrix condition is derived from the dissipation of a selected Lyapunov functional, namely the auxiliary Fisher information functional. We verify the proposed matrix conditions in various examples. I will also talk about the application in distribution sampling and optimization. This talk is based on several joint works with Erhan Bayraktar and Wuchen Li.

When: September 22nd from 3:40pm-4:40pm

Where: LeConte 440 & Zoom (if possible, see link above)

Speaker: Guosheng Fu (University of Notre Dame)

Abstract: We design and compute first-order implicit-in-time variational schemes with high-order spatial discretization for initial value gradient flows in generalized optimal transport metric spaces. We first review some examples of gradient flows in generalized optimal transport spaces from the Onsager principle. We then use a one-step time relaxation optimization problem for time-implicit schemes, namely generalized Jordan-Kinderlehrer-Otto schemes. Their minimizing systems satisfy implicit-in-time schemes for initial value gradient flows with first-order time accuracy. We adopt the first-order optimization scheme ALG2 (Augmented Lagrangian method) and high-order finite element methods in spatial discretization to compute the one-step optimization problem. This allows us to derive the implicit-in-time update of initial value gradient flows iteratively. We remark that the iteration in ALG2 has a simple-to-implement point-wise update based on optimal transport and Onsager's activation functions. The proposed method is unconditionally stable for convex cases. Numerical examples are presented to demonstrate the effectiveness of the methods in two-dimensional PDEs, including Wasserstein gradient flows, the Fisher-Kolmogorov-Petrovskii-Piskunov equation, and two- and four-species reversible reaction-diffusion systems. This is joint work with Stanley Osher from UCLA and Wuchen Li from the University of South Carolina.


When: September 1st from 2:30pm to 3:30pm

Where: LeConte 440

Speaker: Tianyi Lin (MIT)

Abstract: Reliable and multi-agent machine learning has seen tremendous achievements in recent years; yet, the translation from minimization models to min-max optimization models and/or variational inequality models (two of the basic formulations for reliable and multi-agent machine learning) is not straightforward. In fact, finding an optimal solution of either nonconvex-nonconcave min-max optimization models or nonmonotone variational inequality models is computationally intractable in general. Fortunately, there exist special structures in many application problems, allowing us to define reasonable optimality criteria and develop simple and provably efficient algorithmic schemes. In this talk, I will present the results on structure-driven algorithm design in reliable and multi-agent machine learning. More specifically, I explain why the nonconvex-concave min-max formulations make sense for reliable machine learning and show how to analyze the simple and widely used two-timescale gradient descent ascent by exploiting such special structure. I also show how a simple and intuitive adaptive scheme leads to a class of optimal second-order variational inequality methods. Finally, I discuss two future research directions for reliable and multi-agent machine learning with potential for significant practical impacts: reliable multi-agent learning and reliable topic modeling.

Notes: This is a joint talk with the ACM seminar.

Previous Seminars

Abstract: In this talk, we construct a new Markov chain Monte Carlo method on finite states with optimal choices of acceptance-rejection ratio functions. We prove that the constructed continuous time Markov jumping process has a global in-time convergence rate in L1 distance. The convergence rate is no less than one-half and is independent of the target distribution. For example, our method recovers the Metropolis-Hastings (MH) algorithm on a two-point state. And it forms a new algorithm for sampling general target distributions. Numerical examples are presented to demonstrate the effectiveness of the proposed algorithm. This is based on a joint work with Linyuan Lu.
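For readers unfamiliar with the baseline mentioned above, the following is a minimal sketch of the classical discrete-time Metropolis-Hastings algorithm on a finite state space. It is not the new method from the talk. The function name metropolis_hastings, the uniform proposal over the other states, and the seed are assumptions made for illustration; the target is assumed to have strictly positive entries and at least two states.

```python
import random


def metropolis_hastings(target, n_steps, seed=0):
    """Classical Metropolis-Hastings on states 0..n-1 (illustrative baseline).

    target: list of strictly positive weights (need not be normalized).
    Returns the empirical visit frequency of each state.
    """
    n = len(target)
    if n < 2 or any(w <= 0 for w in target):
        raise ValueError("need at least two states with positive weights")

    rng = random.Random(seed)
    state = 0
    counts = [0] * n
    for _ in range(n_steps):
        # Symmetric proposal: pick one of the other states uniformly.
        proposal = rng.randrange(n - 1)
        if proposal >= state:
            proposal += 1
        # Accept with probability min(1, target[proposal] / target[state]).
        if rng.random() < min(1.0, target[proposal] / target[state]):
            state = proposal
        counts[state] += 1
    return [c / n_steps for c in counts]
```

On a two-point state space the proposal always suggests the other state, so the chain is determined entirely by the acceptance ratio.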

In this talk, I will discuss a family of traffic flow models. The classical Lighthill-Whitham-Richards model is known to have a finite time shock formation for all generic initial data, which represents the creation of traffic jams. I will introduce a family of nonlocal traffic flow models, with look-ahead interactions. These models can be derived from discrete cellular automata models. We show an intriguing phenomenon that the nonlocal slowdown interactions prevent traffic jams, under suitable settings. This talk is based on joint works with Thomas Hamori, Yongki Lee and Yi Sun.

Abstract: Approximating high-dimensional functions is challenging due to the curse of dimensionality. In this talk, we will discuss the Dimension Reduction via Learning Level Sets for function approximations. The approach contains two major components: one is the pseudo-reversible neural network module that effectively transforms high-dimensional input variables to low-dimensional active variables, the other is the synthesized regression module for approximating function values based on the transformed data in the low-dimensional space. This is a joint work with Prof. Lili Ju and our graduate student Mr. Yuankai Teng, and Dr. Anthony Gruber (Sandia) and Dr. Guannan Zhang (ORNL). 

Abstract: Anomalously diffusive transport, which exhibits power-law decaying behavior, occurs in many applications along with many other power-law processes. In this talk we will go over related modeling and analysis issues in comparison to normal Fickian diffusive transport, which exhibits exponentially decaying behavior. We will show why fractional calculus, in which the order of differentiation may be a function of space, time, the unknown variable, or even a distribution, provides a more appropriate modeling tool for these problems than conventional integer-order models do.

Abstract:  Networks in ecology can take many forms, describing interactions between species, dispersal pathways between different habitat patches in space, or associations between different classes of species (e.g., host and parasite species). In this talk, we will explore the different uses and issues present in the analysis of ecological networks and the prediction of potentially missing links in networks. In doing so, we will identify some frontiers in which graph theory may be applied to ecological networks using existing data, model simulations, and laboratory experiments.

Abstract: This talk is about the intrinsic obstructions encountered when approximating or recovering functions of a large number of variables, commonly subsumed under the term “Curse of Dimensionality”. Problems of this type are ubiquitous in Uncertainty Quantification and machine learning. In particular, we highlight the role of deep neural networks  (DNNs) in this context. A new sparsity notion, namely compositional dimension sparsity, is introduced, which is shown to favor efficient approximation by DNNs. It is also indicated that this notion is suited for function classes comprised of solutions to operator equations. This is quantified for solution manifolds of parametric families of transport equations. We focus on this scenario because (i)  it cannot be treated well by currently known concepts and (ii) it has interesting ramifications for related more general settings.

Abstract: Solid tumors are heterogeneous in composition. Cancer stem cells (CSCs) are a highly tumorigenic cell type found in developmentally diverse tumors that are believed to be resistant to standard chemotherapeutic drugs and responsible for tumor recurrence. Thus, understanding the tumor growth kinetics is critical for development of novel strategies for cancer treatment. In this talk, I shall introduce mathematical modeling to study HER2 signaling for the dynamical interaction between CSCs and non-stem cancer cells. Our findings reveal that two negative feedback loops are critical in controlling the balance between the population of CSCs and that of non-stem cancer cells. Furthermore, the model with negative feedback suggests that over-expression of the oncogene HER2 leads to an increase of CSCs by regulating the division mode or proliferation rate of CSCs.

We derive mean-field information Hessian matrices on finite graphs. Here “information” refers to entropy functions on the probability simplex, and “mean-field” means nonlinear weight functions of probabilities supported on graphs. These two concepts define a mean-field optimal transport type metric. In this metric space, we first derive Hessian matrices of energies on graphs, including linear energies, interaction energies, and entropies. We name their smallest eigenvalues mean-field Ricci curvature bounds on graphs. We next provide examples on two-point spaces and graph products. Finally, we present several applications of the proposed matrices. For example, we prove discrete Costa's entropy power inequalities on a two-point space.


This talk is about the problem of learning an unknown function f from given data about f. The learning problem is to give an approximation f^ to f that predicts the values of f away from the data. There are numerous settings for this learning problem depending on:

(i) what additional information we have about f (known as a model class assumption);

(ii) how we measure the accuracy of how well f^ predicts f;

(iii) what is known about the data and data sites;

(iv) whether the data observations are polluted by noise.

A mathematical description of the optimal performance possible (the smallest possible error of recovery) is known in the presence of a model class assumption. Under standard model class assumptions, we show that a near optimal f^ can be found by solving a certain discrete over-parameterized optimization problem with a penalty term. Here, near optimal means that the error is bounded by a fixed constant times the optimal error. This explains the advantage of over-parameterization which is commonly used in modern machine learning. The main results of this talk prove that over-parameterized learning with an appropriate loss function gives a near optimal approximation f^ of the function f from which the data is collected. Quantitative bounds are given for how much over-parameterization needs to be employed and how the penalization needs to be scaled in order to guarantee a near optimal recovery of f.  An extension of these results to the case where the data is polluted by additive deterministic noise is also given.

This is a joint research project with Andrea Bonito, Ronald DeVore, and Guergana Petrova from Texas A&M University.

State Estimation or Data Assimilation is about estimating “physical states” of interest from two sources of partial information: data produced by external sensors and a (typically incomplete or uncalibrated) background model, given in terms of a partial differential equation. In this talk we focus on states that ideally satisfy a parabolic equation with known right-hand side but unknown initial values. Additional partial information is given in terms of data that represent the unknown state in a subdomain of the whole space-time cylinder up to a fixed time horizon. Recovering the state from this information is known to be a (mildly) ill-posed problem. Earlier contributions employ mesh-dependent regularizations in a fully discrete setting, bypassing a continuous problem formulation. Other contributions, closer to the approach discussed in this talk, consider a regularized least squares formulation first on an infinite-dimensional level. The essential difference in the present talk is that the least squares formulation exploits the “natural mapping properties” of the underlying forward problem. The main consequences delineating our results from previous work are:

(i) no excess regularity assumptions are needed, thereby mitigating the level of ill-posedness;

(ii) one obtains stronger a priori estimates that are uniform with respect to the regularization parameter;

(iii) error estimates no longer require consistent data;

(iv) one obtains rigorous computable a posteriori bounds that provide stopping criteria for iterative solvers and allow one to estimate data inconsistency and model bias.

The price is to deal with dual norms and their efficient evaluation. We sketch the main concepts and illustrate the results by numerical experiments.


We present a systematic framework for Nesterov's accelerated gradient flows and Newton flows in the spaces of probabilities embedded with general information metrics. Two metrics are considered: the Fisher-Rao metric and the Wasserstein-2 metric. For the Wasserstein-2 metric case, we prove the convergence properties of the accelerated gradient flows and introduce their formulations in Gaussian families. Furthermore, we propose a practical discrete-time algorithm in particle implementations with an adaptive restart technique. We then formulate a novel bandwidth selection method, which learns the Wasserstein-2 gradient direction from Brownian-motion samples. Experimental results, including Bayesian inference, show the strength of the current approach compared with the state of the art. Finally, we discuss some further connections between inverse problems and data/neural network optimization techniques.


We present some concepts for the construction of nonlinear reduced deep neural network models for parameter dependent families of PDEs. The proposed methodology is based on combining stable variational formulations for the PDE models and regression concepts in machine learning. Central objectives concern:
- avoiding the Curse of Dimensionality in high parameter dimensionality regimes;
- rigorous accuracy quantification of resulting estimators, based on contriving variationally correct training risks that avoid variational crimes, often encountered with Physics Informed Neural Network (PINN) formulations.
We highlight the role of optimization strategies involving dynamic network expansion, currently in progress in our group at UofSC, and high-dimensional sparsity concepts.

Abstract: It has been observed that many real-world networks, such as the Internet, social networks, biological networks, and collaboration graphs, have so-called power law degree distributions. A graph is called a power law graph if the fraction of vertices with degree k is approximately proportional to k^{-b} for some constant b. The classical Erdős-Rényi random graph model G(n,p) is not suitable for modeling these power law graphs. Many random graph models have been developed. Among these models, we directly generalize G(n,p) into “random graphs with given expected degree sequences”. We consider several graph properties, such as the size and volume of the giant component, the average distance and the diameter, and the spectra. Some theoretical results will be compared to real data.
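To make the generalization of G(n,p) concrete, here is a minimal sketch of a random graph with a given expected degree sequence: each pair of vertices i, j is joined independently with probability w_i * w_j / sum(w). The function names, the cap of the probability at 1, the omission of self-loops, and the particular power law weight formula are assumptions made for illustration and are not taken from the talk.

```python
import random


def expected_degree_graph(weights, seed=0):
    """Sample a simple graph where edge {i, j} appears independently with
    probability w_i * w_j / sum(w), capped at 1 (illustrative sketch).

    Self-loops are omitted here, so expected degrees are only approximately
    equal to the weights.
    """
    total = float(sum(weights))
    if total == 0.0:
        return []
    rng = random.Random(seed)
    n = len(weights)
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            p = min(1.0, weights[i] * weights[j] / total)
            if rng.random() < p:
                edges.append((i, j))
    return edges


def power_law_weights(n, b, scale=1.0):
    """Hypothetical weight choice w_i = scale * (i + 1)^(-1 / (b - 1)),
    intended to give a degree distribution with exponent roughly b (b > 1)."""
    return [scale * (i + 1) ** (-1.0 / (b - 1.0)) for i in range(n)]
```

Setting every weight to n*p gives edge probability (n*p)^2 / (n * n*p) = p for each pair, which recovers G(n,p).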
