### 2023-2024 Academic Year

**Organized by:** Wuchen Li (wuchen@mailbox.sc.edu)

This page will be updated as new seminars are scheduled. Make sure to check back each week for information on upcoming seminars.

We will try to offer a virtual option via Zoom, as well as the regular in-person option. The Zoom details are listed below:

**Zoom Link**: https://zoom.us/j/94297694178?pwd=cUs0dTZDeXhjVnN3S1ZIcVJ1RU1sUT09

**Meeting ID**: 942 9769 4178

**Passcode**: 488494

**When:** February 2nd, 2024, from 3:40pm-4:40pm

**Where:** LeConte 440

**Speaker:** Yuehaw Khoo (University of Chicago)

**Abstract:** The tensor-network ansatz has long been employed to solve the high-dimensional Schrödinger
equation, demonstrating linear complexity scaling with respect to dimensionality.
Recently, this ansatz has found applications in various machine learning scenarios,
including supervised learning and generative modeling, where the data originates from
a random process. In this talk, we present a new perspective on randomized linear
algebra, showcasing its usage in estimating a density as a tensor-network from i.i.d.
samples of a distribution, without the curse of dimensionality, and without the use
of optimization techniques. Moreover, we illustrate how this concept can combine the
strengths of particle and tensor-network methods for solving high-dimensional PDEs,
resulting in enhanced flexibility for both approaches.

**When:** November 10th from 3:40pm-4:40pm

**Where:** LeConte 440

**Speaker:** Sangmin Park (Carnegie Mellon University)

**Abstract:** We study the space of probability measures equipped with the 2-sliced Wasserstein
distance SW2, a projection-based variant of the Wasserstein distance that is increasingly
popular in statistics and machine learning due to its computational efficiency, especially
in high dimensions. Using the language of the Radon transform, we examine the metric
differential structure of the sliced Wasserstein space and the induced length space,
and deduce that SW2 (and the associated length metric) behave very differently near
absolutely continuous and discrete measures. We apply this discrepancy to demonstrate
the lack of stability of gradient flows in the sliced Wasserstein (length) space.
If time permits, we will also discuss the empirical estimation rate of absolutely
continuous measures in the sliced Wasserstein length. This is a joint work with Dejan
Slepcev.
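
For readers unfamiliar with SW2, one standard definition is recalled below. It is not taken from the abstract, and normalization conventions vary between authors. Here σ is assumed to be the uniform probability measure on the unit sphere, and P_θ denotes projection onto the direction θ.

```latex
SW_2^2(\mu,\nu)
  \;=\; \int_{\mathbb{S}^{d-1}}
        W_2^2\big( (P_\theta)_{\#}\mu,\; (P_\theta)_{\#}\nu \big)\, d\sigma(\theta),
\qquad
P_\theta(x) = \langle x, \theta \rangle .
```

Each integrand is a one-dimensional Wasserstein distance, which is where the computational efficiency mentioned in the abstract comes from.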

**When:** October 6th from 3:40pm-4:40pm

**Where:** LeConte 440

**Speaker:** Jiajia Yu (Duke University)

**Abstract:** Mean-field games study the Nash Equilibrium in a non-cooperative game with infinitely
many agents. Most existing works study solving the Nash Equilibrium with given cost
functions. However, it is not always straightforward to obtain these cost functions.
On the contrary, it is often possible to observe the Nash Equilibrium in real-world
scenarios. In this talk, I will discuss a bilevel optimization approach for solving
inverse mean-field game problems, i.e., identifying the cost functions that drive
the observed Nash Equilibrium. With the bilevel formulation, we retain the essential
characteristics of convex objective and linear constraint in the forward problem.
This formulation permits us to solve the problem using a gradient-based optimization
algorithm with a nice convergence guarantee. We focus on inverse mean-field games
with unknown obstacles and unknown metrics and establish the numerical stability of
these two inverse problems. In addition, we prove and numerically verify the unique
identifiability for the inverse problem with unknown obstacles. This is a joint work
with Quan Xiao (RPI), Rongjie Lai (Purdue) and Tianyi Chen (RPI).

**When:** September 29th from 3:40pm-4:40pm

**Where:** LeConte 440

**Speaker:** Qi Feng (Florida State University)

**Abstract:** In this talk, I will discuss long-time dynamical behaviors of Langevin dynamics, including
Langevin dynamics on Lie groups and mean-field underdamped Langevin dynamics. We provide
unified Hessian matrix conditions for different drift and diffusion coefficients.
This matrix condition is derived from the dissipation of a selected Lyapunov functional,
namely the auxiliary Fisher information functional. We verify the proposed matrix
conditions in various examples. I will also talk about the application in distribution
sampling and optimization. This talk is based on several joint works with Erhan Bayraktar
and Wuchen Li.

**When:** September 22nd from 3:40pm-4:40pm

**Where:** LeConte 440 & Zoom (if possible, see link above)

**Speaker:** Guosheng Fu (University of Notre Dame)

**Abstract:** We design and compute first-order implicit-in-time variational schemes with high-order
spatial discretization for initial value gradient flows in generalized optimal transport
metric spaces. We first review some examples of gradient flows in generalized optimal
transport spaces from the Onsager principle. We then use a one-step time relaxation
optimization problem for time-implicit schemes, namely generalized Jordan-Kinderlehrer-Otto
schemes. Their minimizing systems satisfy implicit-in-time schemes for initial value
gradient flows with first-order time accuracy. We adopt the first-order optimization
scheme ALG2 (Augmented Lagrangian method) and high-order finite element methods in
spatial discretization to compute the one-step optimization problem. This allows us
to derive the implicit-in-time update of initial value gradient flows iteratively.
We remark that the iteration in ALG2 has a simple-to-implement point-wise update based
on optimal transport and Onsager's activation functions. The proposed method is unconditionally
stable for convex cases. Numerical examples are presented to demonstrate the effectiveness
of the methods in two-dimensional PDEs, including Wasserstein gradient flows, the Fisher-Kolmogorov-Petrovskii-Piskunov
equation, and two- and four-species reversible reaction-diffusion systems. This is
a joint work with Stanley Osher from UCLA and Wuchen Li from the University of South Carolina.

**When:** September 1st from 2:30pm to 3:30pm

**Where:** LeConte 440

**Speaker:** Tianyi Lin (MIT)

**Abstract:** Reliable and multi-agent machine learning has seen tremendous achievements in recent
years; yet, the translation from minimization models to min-max optimization models
and/or variational inequality models --- two of the basic formulations for reliable
and multi-agent machine learning --- is not straightforward. In fact, finding an optimal
solution of either nonconvex-nonconcave min-max optimization models or nonmonotone
variational inequality models is computationally intractable in general. Fortunately,
there exist special structures in many application problems, allowing us to define
a reasonable optimality criterion and develop simple and provably efficient algorithmic
schemes. In this talk, I will present the results on structure-driven algorithm design
in reliable and multi-agent machine learning. More specifically, I explain why the
nonconvex-concave min-max formulations make sense for reliable machine learning and
show how to analyze the simple and widely used two-timescale gradient descent ascent
by exploiting such special structure. I also show how a simple and intuitive adaptive
scheme leads to a class of optimal second-order variational inequality methods. Finally,
I discuss two future research directions for reliable and multi-agent machine learning
with potential for significant practical impacts: reliable multi-agent learning and
reliable topic modeling.

**Notes:** This is a joint talk with the ACM seminar.

### Previous Seminars

Abstract: In this talk, we construct a new Markov chain Monte Carlo method on finite
states with optimal choices of acceptance-rejection ratio functions. We prove that
the constructed continuous-time Markov jump process has a global-in-time convergence
rate in L1 distance. The convergence rate is no less than one-half and is independent
of the target distribution. For example, our method recovers the Metropolis-Hastings
(MH) algorithm on a two-point state space, and it yields a new algorithm for sampling general
target distributions. Numerical examples are presented to demonstrate the effectiveness
of the proposed algorithm. This is based on a joint work with Linyuan Lu.

Abstract: In this talk, I will discuss a family of traffic flow models. The classical Lighthill-Whitham-Richards
model is known to have a finite time shock formation for all generic initial data,
which represents the creation of traffic jams. I will introduce a family of nonlocal
traffic flow models, with look-ahead interactions. These models can be derived from
discrete cellular automata models.

We show an intriguing phenomenon that the nonlocal slowdown interactions prevent traffic
jams, under suitable settings. This talk is based on joint works with Thomas Hamori,
Yongki Lee and Yi Sun.

Abstract: Approximating high-dimensional functions is challenging due to the curse of dimensionality. In this talk, we will discuss Dimension Reduction via Learning Level Sets for function approximation. The approach contains two major components: one is the pseudo-reversible neural network module that effectively transforms high-dimensional input variables to low-dimensional active variables; the other is the synthesized regression module for approximating function values based on the transformed data in the low-dimensional space. This is a joint work with Prof. Lili Ju, our graduate student Mr. Yuankai Teng, Dr. Anthony Gruber (Sandia), and Dr. Guannan Zhang (ORNL).

Abstract: Anomalously diffusive transport, which exhibits power-law decaying behavior, occurs in many applications along with many other power-law processes. In this talk we will go over related modeling and analysis issues in comparison to normal Fickian diffusive transport, which exhibits exponentially decaying behavior. We will show why fractional calculus, in which the order of differentiation may be a function of space, time, the unknown variable, or even a distribution, provides a more appropriate modeling tool for these problems than conventional integer-order models do.

Abstract: Networks in ecology can take many forms, describing interactions between species, dispersal pathways between different habitat patches in space, or associations between different classes of species (e.g., host and parasite species). In this talk, we will explore the different uses and issues present in the analysis of ecological networks and the prediction of potentially missing links in networks. In doing so, we will identify some frontiers in which graph theory may be applied to ecological networks using existing data, model simulations, and laboratory experiments.

Abstract: This talk is about the intrinsic obstructions encountered when approximating or recovering functions of a large number of variables, commonly subsumed under the term “Curse of Dimensionality”. Problems of this type are ubiquitous in Uncertainty Quantification and machine learning. In particular, we highlight the role of deep neural networks (DNNs) in this context. A new sparsity notion, namely compositional dimension sparsity, is introduced, which is shown to favor efficient approximation by DNNs. It is also indicated that this notion is suited for function classes comprised of solutions to operator equations. This is quantified for solution manifolds of parametric families of transport equations. We focus on this scenario because (i) it cannot be treated well by currently known concepts and (ii) it has interesting ramifications for related more general settings.

Abstract: Solid tumors are heterogeneous in composition. Cancer stem cells (CSCs) are a highly tumorigenic cell type found in developmentally diverse tumors that are believed to be resistant to standard chemotherapeutic drugs and responsible for tumor recurrence. Thus, understanding the tumor growth kinetics is critical for the development of novel strategies for cancer treatment. In this talk, I shall introduce mathematical modeling to study HER2 signaling for the dynamical interaction between CSCs and non-stem cancer cells. Our findings reveal that two negative feedback loops are critical in controlling the balance between the population of CSCs and that of non-stem cancer cells. Furthermore, the model with negative feedback suggests that over-expression of the oncogene HER2 leads to an increase of CSCs by regulating the division mode or proliferation rate of CSCs.

Abstract: We derive mean-field information Hessian matrices on finite graphs. Here “information” refers to entropy functions on the probability simplex, and “mean-field” means nonlinear weight functions of probabilities supported on graphs. These two concepts define a mean-field optimal transport type metric. In this metric space, we first derive Hessian matrices of energies on graphs, including linear energies, interaction energies, and entropies. We name their smallest eigenvalues mean-field Ricci curvature bounds on graphs. We next provide examples on two-point spaces and graph products. Finally, we present several applications of the proposed matrices. For example, we prove discrete Costa's entropy power inequalities on a two-point space.

Abstract: This talk is about the problem of learning an unknown function f from given data about
f. The learning problem is to give an approximation f̂ to f that predicts the values of f away
from the data. There are numerous settings for this learning problem depending on:

(i) what additional information we have about f (known as a model class assumption);

(ii) how we measure the accuracy of how well f̂ predicts f;

(iii) what is known about the data and data sites;

(iv) whether the data observations are polluted by noise.

A mathematical description of the optimal performance possible (the smallest possible
error of recovery) is known in the presence of a model class assumption. Under standard
model class assumptions, we show that a near optimal f̂ can be found by solving a certain
discrete over-parameterized optimization problem with a penalty term. Here, near optimal
means that the error is bounded by a fixed constant times the optimal error. This explains
the advantage of over-parameterization, which is commonly used in modern machine learning.
The main results of this talk prove that over-parameterized learning with an appropriate
loss function gives a near optimal approximation f̂ of the function f from which the data
is collected. Quantitative bounds are given for how much over-parameterization needs to be
employed and how the penalization needs to be scaled in order to guarantee a near optimal
recovery of f. An extension of these results to the case where the data is polluted by
additive deterministic noise is also given.

This is a joint research project with Andrea Bonito, Ronald DeVore, and Guergana Petrova from Texas A&M University.

Abstract: State estimation, or data assimilation, is about estimating “physical states” of
interest from two sources of partial information: data produced by external sensors
and a (typically incomplete or uncalibrated) background model, given in terms
of a partial differential equation. In this talk we focus on states that ideally
satisfy a parabolic equation with known right-hand side but unknown initial values.
Additional partial information is given in terms of data that represent the unknown
state in a subdomain of the whole space-time cylinder up to a fixed time horizon.
Recovering the state from this information is known to be a (mildly) ill-posed problem.
Earlier contributions employ mesh-dependent regularizations in a fully discrete setting,
bypassing a continuous problem formulation. Other contributions, closer to the approach
discussed in this talk, consider a regularized least squares formulation first on an
infinite-dimensional level. The essential difference in the present talk is that the
least squares formulation exploits the “natural mapping properties” of the underlying
forward problem. The main consequences delineating our results from previous work are:

(i) no excess regularity assumptions are needed, thereby mitigating the level of ill-posedness;

(ii) one obtains stronger a priori estimates that are uniform with respect to the
regularization parameter;

(iii) error estimates no longer require consistent data;

(iv) one obtains rigorous computable a posteriori bounds that provide stopping criteria
for iterative solvers and allow one to estimate data inconsistency and model bias.

The price is to deal with dual norms and their efficient evaluation. We sketch the
main concepts and illustrate the results by numerical experiments.

Abstract: We present a systematic framework for Nesterov's accelerated gradient flows and Newton flows in the spaces of probabilities embedded with general information metrics. Two metrics are considered: the Fisher-Rao metric and the Wasserstein-2 metric. For the Wasserstein-2 metric case, we prove the convergence properties of the accelerated gradient flows and introduce their formulations in Gaussian families. Furthermore, we propose a practical discrete-time algorithm in particle implementations with an adaptive restart technique. We also formulate a novel bandwidth selection method, which learns the Wasserstein-2 gradient direction from Brownian-motion samples. Experimental results, including Bayesian inference, show the strength of the current approach compared with the state-of-the-art. Finally, we discuss some further connections between inverse problems and data/neural network optimization techniques.

Abstract: It was observed that many real-world networks, such as the Internet, social networks, biological networks, and collaboration graphs, have so-called power law degree distributions. A graph is called a power law graph if the fraction of vertices with degree k is approximately proportional to k^{-b} for some constant b. The classical Erdős-Rényi random graph model G(n,p) is not suitable for modeling these power law graphs. Many random graph models have been developed. Among these models, we directly generalize G(n,p) into “random graphs with given expected degree sequences”. We consider several graph properties, such as the size and volume of the giant component, the average distance and the diameter, and the spectra. Some theoretical results will be compared to real data.
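
The generalization of G(n,p) described in this abstract can be illustrated with a minimal sketch. It assumes the common formulation in which vertices i and j are joined independently with probability w_i w_j / Σ w_k, capped at 1. The function names, the omission of self-loops, and the particular weight sequence are illustrative assumptions, not details taken from the talk.

```python
import random


def power_law_weights(n, b, scale=1.0):
    """Hypothetical helper: weights w_i = scale * (i + 1)^(-1/(b - 1)).

    With this choice the number of vertices whose weight is at least w
    grows like w^(-(b - 1)), so the weight density decays like w^(-b).
    """
    if b <= 1:
        raise ValueError("b must exceed 1")
    return [scale * (i + 1) ** (-1.0 / (b - 1.0)) for i in range(n)]


def expected_degree_graph(weights, seed=None):
    """Sample edges (i, j), i < j, each present independently with
    probability min(1, w_i * w_j / sum(w)).

    Self-loops are omitted for simplicity (an assumption of this sketch),
    so the expected degree of vertex i is only approximately w_i.
    """
    rng = random.Random(seed)
    total = sum(weights)
    if total <= 0:
        return []  # no positive weight: no edges can appear
    n = len(weights)
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            p = min(1.0, weights[i] * weights[j] / total)
            if rng.random() < p:
                edges.append((i, j))
    return edges


if __name__ == "__main__":
    w = power_law_weights(400, b=3.0, scale=20.0)
    edges = expected_degree_graph(w, seed=0)
    print(len(edges), "edges sampled")
```

When every weight equals np, the edge probability reduces to p, which recovers G(n,p) as a special case.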