Awesome
LinSATNet
This is the official implementation of our ICML 2023 paper "LinSATNet: The Positive Linear Satisfiability Neural Networks".
With LinSATNet, you can enforce the satisfiability of general positive linear constraints to the output of neural networks.
LinSATNet supports sparse constraints starting v0.1.1
! The forward and backward should be identical to the dense
version, but the sparse version is expected to be more efficient in time and GPU memory. Upgrade by
pip install --upgrade linsatnet
The LinSAT layer is fully differentiable, and the gradients are exactly computed. Our implementation now supports PyTorch.
You can install it by
pip install linsatnet
And get started by
from LinSATNet import linsat_layer
Table of contents
- LinSATNet
A Quick Example
There is a quick example if you run LinSATNet/linsat.py
directly. In this
example, the doubly-stochastic constraint is enforced for 3x3 variables.
To run the example, first clone the repo:
git clone https://github.com/Thinklab-SJTU/LinSATNet.git
Go into the repo, and run the example code:
cd LinSATNet
python LinSATNet/linsat.py
In this example, we try to enforce doubly-stochastic constraint to a 3x3 matrix. The doubly-stochastic constraint means that all rows and columns of the matrix should sum to 1.
The 3x3 matrix is flattened into a vector, and the following positive linear constraints are considered (for $\mathbf{E}\mathbf{x}=\mathbf{f}$):
E = torch.tensor(
[[1, 1, 1, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 1, 1, 1, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 1, 1, 1],
[1, 0, 0, 1, 0, 0, 1, 0, 0],
[0, 1, 0, 0, 1, 0, 0, 1, 0],
[0, 0, 1, 0, 0, 1, 0, 0, 1]], dtype=torch.float32
)
f = torch.tensor([1, 1, 1, 1, 1, 1], dtype=torch.float32)
We randomly init w
and regard it as the output of some neural networks:
w = torch.rand(9) # w could be the output of neural network
w = w.requires_grad_(True)
We also have a "ground-truth target" for the output of linsat_layer
, which
is a diagonal matrix in this example:
x_gt = torch.tensor(
[1, 0, 0,
0, 1, 0,
0, 0, 1], dtype=torch.float32
)
The forward/backward passes of LinSAT follow the standard PyTorch style and are readily integrated into existing deep learning pipelines.
The forward pass:
linsat_outp = linsat_layer(w, E=E, f=f, tau=0.1, max_iter=10, dummy_val=0)
The backward pass:
loss = ((linsat_outp - x_gt) ** 2).sum()
loss.backward()
You can also set E
as a sparse matrix to improve the time & memory efficiency
(especially for large-sized input). Here is a dumb example (consider to construct
E in sparse for the best efficiency):
linsat_outp = linsat_layer(w, E=E.to_sparse(), f=f, tau=0.1, max_iter=10, dummy_val=0)
We can also do gradient-based optimization over w
to make the output of
linsat_layer
closer to x_gt
. This is what happens when you train a
neural network.
niters = 10
opt = torch.optim.SGD([w], lr=0.1, momentum=0.9)
for i in range(niters):
x = linsat_layer(w, E=E, f=f, tau=0.1, max_iter=10, dummy_val=0)
cv = torch.matmul(E, x.t()).t() - f.unsqueeze(0)
loss = ((x - x_gt) ** 2).sum()
loss.backward()
opt.step()
opt.zero_grad()
print(f'{i}/{niters}\n'
f' underlying obj={torch.sum(w * x)},\n'
f' loss={loss},\n'
f' sum(constraint violation)={torch.sum(cv[cv > 0])},\n'
f' x={x},\n'
f' constraint violation={cv}')
And you are likely to see the loss decreasing during the training steps.
API Reference
To use LinSATNet in your own project, make sure you have the package installed:
pip install linsatnet
and import the pacakge at the beginning of your code:
from LinSATNet import linsat_layer, init_constraints
The linsat_layer
function
LinSATNet.linsat_layer(x, A=None, b=None, C=None, d=None, E=None, f=None, constr_dict=None, tau=0.05, max_iter=100, dummy_val=0, mode='v2', grouped=True, no_warning=False) [source]
LinSAT layer enforces positive linear constraints to the input x
and
projects it with the constraints
$$\mathbf{A} \mathbf{x} <= \mathbf{b}, \mathbf{C} \mathbf{x} >= \mathbf{d}, \mathbf{E} \mathbf{x} = \mathbf{f}$$
and all elements in $\mathbf{A}, \mathbf{b}, \mathbf{C}, \mathbf{d}, \mathbf{E}, \mathbf{f}$ must be non-negative.
Parameters:
x
: PyTorch tensor of size ($n_v$), it can optionally have a batch size ($b \times n_v$)A
,C
,E
: PyTorch tensor of size ($n_c \times n_v$), constraint matrix on the left hand sideb
,d
,f
: PyTorch tensor of size ($n_c$), constraint vector on the right hand sideconstr_dict
: a dictionary with initialized constraint information, which is the output of the functionLinSATNet.init_constraints
. Specifying this variable could avoid re-initializing the constraints for the same constraints and improve the efficiencytau
: (default=0.05
) parameter to control the discreteness of the projection. Smaller value leads to more discrete (harder) results, larger value leads to more continuous (softer) results.max_iter
: (default=100
) max number of iterationsdummy_val
: (default=0
) the value of dummy variables appended to the input vectormode
: (default='v2'
) LinSAT kernel implementation.v1
is the one came with the ICML paper,v2
is the improved version with (usually) better efficiencygrouped
: (default=True
) group non-overlapping constraints in one operation for better efficiencyno_warning
: (default=False
) turn off warning message
return: PyTorch tensor of size ($n_v$) or ($b \times n_v$), the projected variables
Notations:
- $b$ means the batch size.
- $n_c$ means the number of constraints ($\mathbf{A}$, $\mathbf{C}$, $\mathbf{E}$ may have different $n_c$)
- $n_v$ means the number of variables
Some practical notes
- You must ensure that your input constraints have a non-empty feasible space.
Otherwise,
linsat_layer
will not converge. It is also worth noting thatx
is in the range of[0, 1]
, and you may add a multiplier to scale it. - You may tune the value of
tau
for your specific tasks. Monitor the output of LinSAT so that the "smoothness" of the output meets your task. Reasonable choices oftau
may range from1e-4
to100
in our experience. - Be careful of potential numerical issues. Sometimes
A x <= 1
does not work, butA x <= 0.999
works. - The input vector
x
may have a batch dimension, but the constraints can not have a batch dimension. The constraints should be consistent for all data in one batch. - Input constraints as sparse tensors can usually help you save GPU memory. When
working with sparse constraints,
A
,C
,E
should betorch.sparse_coo_tensor
, andb
,d
,f
should be dense tensors.
How it works?
Here we introduce the mechanism inside LinSAT. It works by extending the Sinkhorn algorithm to multiple sets of marginals (to our best knowledge, we are the first to study Sinkhorn with multi-sets of marginals). The positive linear constraints are then enforced by transforming the constraints into marginals. For more details and formal proofs, please refer to our paper.
Classic Sinkhorn with single-set marginals
Let's start with the classic Sinkhorn algorithm. Given non-negative score matrix
$\mathbf{S}\in\mathbb{R}_{\geq 0}^{m\times n}
$ and a set of marginal
distributions on rows $\mathbf{v}\in \mathbb{R}_{\geq 0}^m
$ and columns
$\mathbf{u} \in \mathbb{R}_{\geq 0}^n
$, where
$$\sum_{i=1}^m v_i = \sum_{j=1}^n u_j = h,$$
the Sinkhorn algorithm outputs a normalized matrix
$\mathbf{\Gamma}\in[0,1]^{m\times n}
$ so that
$$\sum_{i=1}^m \Gamma_{i,j}u_{j}=u_j, \sum_{j=1}^n \Gamma_{i,j}u_{j}=v_i.$$
Conceptually, $\Gamma_{i,j}
$ means the proportion of $u_j
$ moved to $v_i
$.
If you are seeing the math formulas not rendered correctly, it is an issue of github. Please refer to our main paper for better view.
The algorithm steps are:
Initialize $\Gamma_{i,j}=\frac{s_{i,j}}{\sum_{i=1}^m s_{i,j}}
$
$\quad
$repeat:
$\qquad{\Gamma}_{i,j}^{\prime} = \frac{{\Gamma}_{i,j}v_{i}}{\sum_{j=1}^n {\Gamma}_{i,j}u_{j}}
$; $\triangleright
$ normalize w.r.t. $\mathbf{v}
$
$\qquad{\Gamma}_{i,j} = \frac{{\Gamma}_{i,j}^{\prime}u_{j}}{\sum_{i=1}^m {\Gamma}_{i,j}^{\prime}u_{j}}
$; $\triangleright
$ normalize w.r.t. $\mathbf{u}
$
$\quad
$until convergence.
Note that the above formulation is modified from the conventional Sinkhorn formulation. $
\Gamma_{i,j}u_j
$ is equivalent to the elements in the "transport" matrix in papers such as (Cuturi 2013). We prefer this new formulation as it generalizes smoothly to Sinkhorn with multi-set marginals in the following.To make a clearer comparison, the transportation matrix in (Cuturi 2013) is $
\mathbf{P}\in\mathbb{R}_{\geq 0}^{m\times n}
$, and the constraints are $$\sum_{i=1}^m P_{i,j}=u_{j},\quad \sum_{j=1}^n P_{i,j}=v_{i}$$ $P_{i,j}
$ means the exact mass moved from $u_{j}
$ to $v_{i}
$.The algorithm steps are:
Initialize $
\Gamma_{i,j}=\frac{s_{i,j}}{\sum_{i=1}^m s_{i,j}}
$$
\quad
$repeat:$
\qquad{P}_{i,j}^{\prime} = \frac{P_{i,j}v_{i}}{\sum_{j=1}^n {P}_{i,j}}
$; $\triangleright
$ normalize w.r.t. $\mathbf{v}
$$
\qquad{P}_{i,j} = \frac{{P}_{i,j}^{\prime}u_j}{\sum_{i=1}^m {P}_{i,j}^{\prime}}
$; $\triangleright
$ normalize w.r.t. $\mathbf{u}
$$
\quad
$until convergence.
Extended Sinkhorn with multi-set marginals
We discover that the Sinkhorn algorithm can generalize to multiple sets of marginals.
Recall that $\Gamma_{i,j}\in[0,1]
$ means the proportion of $u_i
$ moved to
$v_j
$. Interestingly, it yields the same formulation if we simply replace
$\mathbf{u},\mathbf{v}
$ by another set of marginal distributions, suggesting
the potential of extending the Sinkhorn algorithm to multiple sets of marginal
distributions. Denote that there are $k$ sets of marginal distributions that are
jointly enforced to fit more complicated real-world scenarios. The sets of
marginal distributions are
$\mathbf{u}_\eta\in \mathbb{R}_{\geq 0}^n, \mathbf{v}_\eta\in \mathbb{R}_{\geq 0}^m
$,
and we have:
$$\forall \eta\in {1, \cdots,k}: \sum_{i=1}^m v_{\eta,i}=\sum_{j=1}^n u_{\eta,j}=h_\eta.$$
It assumes the existence of a normalized $\mathbf{Z} \in [0,1]^{m\times n}
$ s.t.
$$\forall \eta\in {1,\cdots, k}: \sum_{i=1}^m z_{i,j} u_{\eta,j}=u_{\eta,j}, \sum_{j=1}^n z_{i,j} u_{\eta,j}=v_{\eta,i}$$
i.e., the multiple sets of marginal distributions have a non-empty feasible
region (you may understand the meaning of "non-empty feasible region" after
reading the next section about how to handle positive linear constraints).
Multiple sets of marginal distributions could be jointly enforced by traversing
the Sinkhorn iterations over $k$ sets of marginal distributions.
The algorithm steps are:
Initialize $\Gamma_{i,j}=\frac{s_{i,j}}{\sum_{i=1}^m s_{i,j}}
$
$\quad
$repeat:
$\qquad
$for $\eta=1
$ to $k$ do
$\quad\qquad{\Gamma}_{i,j}^{\prime} = \frac{{\Gamma}_{i,j}v_{\eta,i}}{\sum_{j=1}^n {\Gamma}_{i,j}u_{\eta,j}}
$; $\triangleright
$ normalize w.r.t. $\mathbf{v}_\eta
$
$\quad\qquad{\Gamma}_{i,j} = \frac{{\Gamma}_{i,j}^{\prime}u_{\eta,j}}{\sum_{i=1}^m {\Gamma}_{i,j}^{\prime}u_{\eta,j}}
$; $\triangleright
$ normalize w.r.t. $\mathbf{u}_\eta
$
$\qquad
$end for
$\quad
$until convergence.
In our paper, we prove that the Sinkhorn algorithm for multi-set marginals shares the same convergence pattern with the classic Sinkhorn, and its underlying formulation is also similar to the classic Sinkhorn.
Transforming positive linear constraints into marginals
Then we show how to transform the positive linear constraints into marginals, which are handled by our proposed multi-set Sinkhorn.
Encoding neural network's output
For an $l$-length vector denoted as $\mathbf{y}
$ (which can be the output of a
neural network, also it is the input to linsat_layer
), the following matrix
is built
$\mathbf{W} = {y}_1 \quad {y}_2 \quad ... \quad {y}_l \quad \beta
$
$\qquad \ \ \beta \ \quad \beta \ \quad ... \quad \ \beta \quad \ \beta
$
where $\mathbf{W}
$ is of size $2 \times (l+1)
$, and $\beta
$ is the dummy
variable, the default is $\beta=0
$. $\mathbf{y}
$ is put at the upper-left
region of $\mathbf{W}
$. The entropic regularizer is then enforced to control
discreteness and handle potential negative inputs:
$$\mathbf{S} = \exp \left(\frac{\mathbf{W}}{\tau}\right).$$
The score matrix $\mathbf{S}
$ is taken as the input of Sinkhorn for multi-set
marginals.
From linear constraints to marginals
- Packing constraint $
\mathbf{A}\mathbf{x}\leq \mathbf{b}
$. Assuming that there is only one constraint, we rewrite the constraint as $$\sum_{i=1}^l a_ix_i \leq b.$$ Following the "transportation" view of Sinkhorn, the output $\mathbf{x}
$ moves at most $b
$ unit of mass from $a_1, a_2, \cdots, a_l
$, and the dummy dimension allows the inequality by moving mass from the dummy dimension. It is also ensured that the sum of $\mathbf{u}_p
$ equals the sum of $\mathbf{v}_p
$. The marginal distributions are defined as
\mathbf{u}_p = \underbrace{\left[a_1 \quad a_2 \quad ...\quad a_l \quad b\right]}_{l \text{ dims}+1 \text{ dummy dim}}, \quad \mathbf{v}_p^\top = \left[b \quad \sum_{i=1}^l a_i \right]
- Covering constraint $
\mathbf{C}\mathbf{x}\geq \mathbf{d}
$. Assuming that there is only one constraint, we rewrite the constraint as $$\sum_{i=1}^l c_ix_i\geq d.$$ We introduce the multiplier $$\gamma=\left\lfloor\sum_{i=1}^lc_i / d \right\rfloor$$ because we always have $$\sum_{i=1}^l c_i \geq d$$ (else the constraint is infeasible), and we cannot reach the feasible solution where all elements in $\mathbf{x}
$ are 1s without this multiplier. Our formulation ensures that at least $d
$ unit of mass is moved from $c_1, c_2, \cdots, c_l
$ by $\mathbf{x}
$, thus representing the covering constraint of "greater than". It is also ensured that the sum of $\mathbf{u}_c
$ equals the sum of $\mathbf{v}_c
$. The marginal distributions are defined as
\mathbf{u}_c = \underbrace{\left[c_1 \quad c_2 \quad ...\quad c_l \quad \gamma d\right]}_{l \text{ dims} + 1 \text{ dummy dim}}, \quad
\mathbf{v}_c^\top = \left[ (\gamma+1) d \quad \sum_{i=1}^l c_i - d \right]
- Equality constraint $
\mathbf{E}\mathbf{x}= \mathbf{f}
$. Representing the equality constraint is more straightforward. Assuming that there is only one constraint, we rewrite the constraint as $$\sum_{i=1}^l e_ix_i= f.$$ The output $\mathbf{x}
$ moves $e_1, e_2, \cdots, e_l
$ to $f
$, and we need no dummy element in $\mathbf{u}_e
$ because it is an equality constraint. It is also ensured that the sum of $\mathbf{u}_e
$ equals the sum of $\mathbf{v}_e
$. The marginal distributions are defined as
\mathbf{u}_e = \underbrace{\left[e_1 \quad e_2 \quad ...\quad e_l \quad 0\right]}_{l \text{ dims} + \text{dummy dim}=0}, \quad
\mathbf{v}_e^\top = \left[f \quad \sum_{i=1}^l e_i - f \right]
After encoding all constraints and stacking them as multiple sets of marginals, we can call the Sinkhorn algorithm for multi-set marginals to enforce the constraints.
More Complicated Use Cases (appeared in our paper)
I. Neural Solver for Traveling Salesman Problem with Extra Constraints
The Traveling Salesman Problem (TSP) is a classic NP-hard problem. The standard TSP aims at finding a cycle visiting all cities with minimal length, and developing neural solvers for TSP receives increasing interests. Beyond standard TSP, here we develop a neural solver for TSP with extra constraints using LinSAT layer.
Contributing author: Yunhao Zhang
To run the TSP experiment, please follow the code and instructions in
TSP_exp/
.
II. Partial Graph Matching with Outliers on Both Sides
Standard graph matching (GM) assumes an outlier-free setting namely bijective mapping. One-shot GM neural networks (Wang et al., 2022) effectively enforce the satisfiability of one-to-one matching constraint by single-set Sinkhorn. Partial GM refers to the realistic case with outliers on both sides so that only a partial set of nodes are matched. There lacks a principled approach to enforce matching constraints for partial GM. The main challenge for existing GM networks is that they cannot discard outliers because the single-set Sinkhorn is outlier-agnostic and tends to match as many nodes as possible. The only exception is BBGM (Rolinek et al., 2020) which incorporates a traditional solver that can reject outliers, yet its performance still has room for improvement.
Contributing author: Ziao Guo
To run the GM experiment, please follow the code and instructions in ThinkMatch/LinSAT.
III. Portfolio Allocation
Predictive portfolio allocation is the process of selecting the best asset allocation based on predictions of future financial markets. The goal is to design an allocation plan to best trade-off between the return and the potential risk (i.e. the volatility). In an allocation plan, each asset is assigned a non-negative weight and all weights should sum to 1. Existing learning-based methods (Zhang et al., 2020), (Butler et al., 2021) only consider the sum-to-one constraint without introducing personal preference or expert knowledge. In contrast, we achieve such flexibility for the target portfolio via positive linear constraints: a mix of covering and equality constraints, which is widely considered for its real-world demand.
Contributing author: Tianyi Chen
To run the portfolio experiment, please follow the code and instructions in
portfolio_exp/
.
Citation
If you find our paper/code useful in your research, please cite
@inproceedings{WangICML23,
title={{LinSATNet}: The Positive Linear Satisfiability Neural Networks},
author={Wang, Runzhong and Zhang, Yunhao and Guo, Ziao and Chen, Tianyi and Yang, Xiaokang and Yan, Junchi},
booktitle={International Conference on Machine Learning (ICML)},
year={2023}
}