---
title: "Sparse derivatives with sparsediff"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Sparse derivatives with sparsediff}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
```

## What sparsediff is

**sparsediff** is a thin R interface to the
[SparseDiffEngine](https://github.com/SparseDifferentiation/SparseDiffEngine)
C library --- the sparse Jacobian and Hessian differentiation backend used by
CVXPY for its Disciplined Nonlinear Programming (DNLP) extension. It is the R
analog of the Python `sparsediffpy` package and wraps the same C library.

You build a nonlinear **expression graph** out of the `sd_*` atom constructors,
assemble the pieces into a **problem**, and then evaluate, at any primal point,

* the objective value and its dense **gradient**,
* the constraint vector and its sparse **Jacobian** (in COO form), and
* the lower-triangular **Lagrangian Hessian** (in COO form).

This is a low-level backend: the intended user is a higher-level modelling layer
such as **CVXR**, not an analyst writing models by hand. The walk-through below
shows the moving parts.

```{r setup}
library(sparsediff)
engine_version()
```

## Building an expression graph

Leaves are created with `sd_variable()` (a slice of the differentiation vector)
and `sd_parameter()` (fixed data). Everything else is an atom applied to other
expressions. Each constructor returns an **expression handle** --- an external
pointer into the engine's graph that frees itself when garbage collected.

Here is `f(x) = sum(exp(x))` for a length-3 variable, with a single linear
constraint `g(x) = sum(x)`:

```{r build}
n   <- 3L
x   <- sd_variable(d1 = n, d2 = 1L, var_id = 0L, n_vars = n)  # 0-based var offset
obj <- sd_sum(sd_exp(x), axis = -1L)                          # sum(exp(x))  (axis -1 = all)
g1  <- sd_sum(x, axis = -1L)                                  # sum(x)
```

`var_id` is the 0-based, column-major offset of the variable's first entry in
the flat primal vector; `n_vars` is the total number of variables. Reduction
`axis` follows the engine convention: `-1` (all entries), `0` (down rows),
`1` (across columns).

## Assembling a problem and its derivative oracle

```{r problem}
prob <- sd_problem(objective = obj, constraints = list(g1), verbose = FALSE)

# initialise the (structural) derivative layout once
sd_init_derivatives(prob)
sd_init_jacobian_coo(prob)
sd_init_hessian_coo(prob)
```

Sparsity patterns are structural and fixed after these `sd_init_*` calls, so you
query them once and reuse them; only the *values* are recomputed at each point.

## Evaluating value and derivatives

Evaluation is ordered: a **forward pass** populates the node values at a primal
point `u`, after which the gradient, Jacobian values and Hessian values can be
read.

```{r evaluate}
u <- c(0, 0.5, 1)

sd_objective_forward(prob, u)   # value of sum(exp(x))
sd_gradient(prob)               # gradient = exp(u)

sd_constraint_forward(prob, u)  # value of sum(x)
```

The constraint Jacobian comes back in COO form (0-based row/column indices) plus
the matching nonzero values:

```{r jacobian}
sd_jacobian_sparsity(prob)      # rows / cols / nrow / ncol
sd_jacobian_values(prob)        # d/dx sum(x) = (1, 1, 1)
```

The Lagrangian Hessian is the lower triangle of
`obj_w * Hess(f) + sum_i w[i] * Hess(g_i)`. Here the objective weight is 1 and
the constraint is linear (zero Hessian), so the result is `diag(exp(u))`:

```{r hessian}
sd_hessian_sparsity(prob)
sd_hessian_values(prob, obj_w = 1, w = 0)
```

## Parameters and fast re-evaluation

A `sd_parameter()` is fixed data that can be **updated between evaluations**
without rebuilding the graph --- the basis for CVXPY/CVXR's Disciplined
Parametrized Programming (DPP) fast path. Register the parameters once, then push
new values with `sd_update_params()`:

```{r parameters}
p    <- sd_parameter(d1 = 1L, d2 = 1L, param_id = 0L, n_vars = n, values = 2)
obj2 <- sd_sum(sd_scalar_mult(p, sd_exp(x)), axis = -1L)      # theta * sum(exp(x))
prob2 <- sd_problem(obj2, constraints = list(), verbose = FALSE)
sd_register_params(prob2, list(p))
sd_init_derivatives(prob2)

sd_objective_forward(prob2, u)        # theta = 2
sd_update_params(prob2, 3)            # theta -> 3
sd_objective_forward(prob2, u)        # re-evaluated, no rebuild
```

## Where to go next

The full atom catalogue is grouped in the reference index: leaves, elementwise,
affine/shape, bivariate, product-reduction, and parameter/constant-matrix atoms.
For modelling at a higher level, see **CVXR**, which uses sparsediff as the
derivative backend for its DNLP solver path.