A footprint analysis answers the question: “How much of an environmental pressure (land, water, emissions…) caused in country A is ultimately driven by consumption in country B?”
Modern supply chains are global. Spain might import soybeans from Brazil, feed them to pigs, and export ham to France. A footprint analysis traces these flows backwards through the entire supply chain, so that the environmental cost of growing those soybeans is attributed to the French consumer who ate the ham — not to the Brazilian farmer who grew the beans.
This is done by building a Multi-Regional Input-Output (MRIO) model, a mathematical framework that captures all inter-industry and inter-country linkages in one system of equations.
The whep package implements this in three steps:
build_io_model() — Assemble the MRIO
tables from supply-use data, bilateral trade, and commodity balance
sheets.compute_leontief_inverse() — Compute
the Leontief inverse, which captures both direct and indirect supply
chain linkages.compute_footprint() — Multiply the
Leontief inverse by environmental extension data to trace pressures
through to final consumption.The methodology follows the FABIO model (Bruckner et al., 2019), which builds physical MRIO tables for the global food system.
Before diving into the code, here are the key concepts:
A supply table records how much of each product each process produces. A use table records how much of each product each process consumes as input. Together they describe the technology of the economy.
In whep, each process belongs to one of four process groups, which together cover all transformations in the food system:
| Process group | What it does | Inputs | Outputs |
|---|---|---|---|
crop_production |
Grow crops | Seed (from CBS) | Harvested crop + residues (straw, etc.) |
husbandry |
Raise livestock | Feed (crops, grass, etc.) | Live animals (heads) + non-slaughter products (milk, eggs, wool…) |
slaughtering |
Slaughter animals | Live animals (heads) | Meat, offals, fats, hides and skins |
processing |
Derive secondary products | Primary commodities (tonnes) | Processed products (oil, flour, sugar, etc.) |
Each process is uniquely identified by a
(proc_group, proc_cbs_code) pair.
Most flows are measured in tonnes, but live animals
are measured in heads (number of animals). This matches
the FABIO convention. Different rows of the Z matrix can
have different units because each row’s entries are only compared to
that row’s total output X[i]. The technical coefficients
\(A_{ij} = Z_{ij} / X_j\) are
dimensionless ratios, so the system remains mathematically
consistent.
For example, a slaughtering process for cattle:
The cattle heads and beef tonnes live in different rows of
Z, each with its own unit.
A bilateral trade matrix records, for each product, how many tonnes (or heads, for live animals) flow from each exporting country to each importing country. This is what connects domestic economies into a global system.
A commodity balance sheet is an accounting identity for each product in each country:
\[ \text{Production} + \text{Imports} = \text{Exports} + \text{Food} + \text{Feed} + \text{Other uses} + \Delta\text{Stocks} \]
The right-hand categories (food, feed, other uses) form the final demand — the end of the supply chain where products are consumed rather than being used as intermediate inputs.
For live animals (which FAO does not include in its CBS data), whep constructs synthetic CBS entries from primary production data:
processing (i.e. they enter the
slaughtering process).other_uses.Z is the core of the MRIO model. It is a square matrix
where each row and column represents a (country, product) pair.
Entry \(Z_{ij}\) gives the physical
flow from sector \(i\) to sector \(j\).
For example, if row 5 is (Spain, wheat) and column 12 is (Spain, flour), then \(Z_{5,12}\) is the amount of Spanish wheat used by Spanish flour mills.
The model constructs Z using the industry
technology assumption:
\[ Z = U_{MR} \times T \]
where:
Y is the final demand matrix. Rows are (country,
product) pairs (same as Z), and columns are
(country, demand category) pairs. Entry \(Y_{ij}\) tells us how much of product \(i\) goes to final demand category \(j\).
The demand categories are derived from the CBS columns present in the
data, typically: food, other_uses, and
stock_addition.
Like Z, the final demand is expanded with trade shares
so that, for example, food consumption of wheat in France can partly
originate from French wheat and partly from imported wheat.
X is a vector where \(X_i\) is the total output of sector \(i\). By construction:
\[ X = Z \cdot \mathbf{1} + Y \cdot \mathbf{1} \]
meaning total output equals intermediate deliveries plus final demand.
After the initial construction of Z and Y,
the model applies several corrections following the FABIO
methodology:
Leftover redistribution: When CBS reports
processing or seed use but the supply-use
table does not fully account for it, the unmatched quantity is
redistributed to the food category in Y. This
prevents demand from vanishing.
Loss endogenization (optional, via
endogenize_losses = TRUE): If the CBS contains a
losses column, losses are moved from Y to the
diagonal of Z. This treats losses as self-use within each
sector (the commodity is “consumed” by its own production process)
rather than as final demand.
Diagonal rebalancing: When \(\text{diag}(Z)_i \geq X_i\) (the diagonal accounts for all or more than total output, usually due to reporting errors), 80% of the diagonal value is moved to final demand and 20% is kept on the diagonal, proportionally distributed across the sector’s demand categories.
Negative output fixing: Any sectors with \(X_i < 0\) (which can arise from trade
data inconsistencies) are corrected by adjusting negative entries in
Y.
The Leontief inverse \(L = (I - A)^{-1}\) is the key analytical tool. The technical coefficients matrix \(A\) is defined as \(A_{ij} = Z_{ij} / X_j\), representing how much input from sector \(i\) is needed per unit of output from sector \(j\).
Before inversion, two corrections are applied:
The Leontief inverse captures the total requirements — both direct and indirect — needed to deliver one unit of final demand. Entry \(L_{ij}\) tells us: “To deliver one extra unit of product \(j\) to final consumers, how much total output does sector \(i\) need to produce (including all the intermediate steps)?”
This is what makes footprint tracing possible. A consumer buying bread in France triggers not just French bakeries, but also French mills, French wheat farmers, Brazilian soybean exporters (if feed is involved), and so on. The Leontief inverse encodes all of these ripple effects in a single matrix.
An environmental extension is any pressure you want to trace through the supply chain. Common examples:
You provide these as a vector \(e\) with one value per (country, product) sector.
The footprint matrix is computed as:
\[ FP = \hat{f} \cdot L \times Y \]
where \(\hat{f}\) is a diagonal matrix of extension intensities \(f_i = e_i / X_i\) (pressure per unit of output). Each entry \(FP_{ij}\) tells us: “How much environmental pressure occurring in sector \(i\) is driven by final demand category \(j\)?”
When fd_labels is provided, the footprint uses the
FABIO diagonal approach: for each final demand column
\(j\), the function computes \(MP \cdot \text{diag}(y_j) \cdot G\), where
\(G\) groups sectors by item. This
decomposes the footprint by both the target item being consumed and the
demand category (food, other uses, etc.).
The resulting tidy tibble decomposes this by origin country/product and target country/product, giving you a full bilateral footprint map.
The simplest call uses default arguments — the function fetches all input data internally:
This downloads supply-use tables, bilateral trade, and CBS data, then builds the IO model for all available years. To restrict to specific years:
If you want to inspect or pre-filter the input data, you can build them separately and pass them in:
supply_use <- build_supply_use()
bilateral_trade <- get_bilateral_trade()
cbs <- get_wide_cbs()
io <- build_io_model(supply_use, bilateral_trade, cbs, years = 2010)To endogenize losses (move them from final demand to the diagonal of
Z):
The result is a tibble with one row per year. Each row contains:
Z: The inter-industry flow matrix (sparse).Y: The final demand matrix (sparse).X: The total output vector.labels: A tibble mapping row/column indices to
area_code and item_cbs_code.fd_labels: A tibble mapping each Y column to its
area_code (consuming country) and fd_col
(demand category).You can inspect a single year:
# Extract matrices for one year
z_mat <- io$Z[[1]]
y_mat <- io$Y[[1]]
x_vec <- io$X[[1]]
labels <- io$labels[[1]]
fd_labels <- io$fd_labels[[1]]
selected_year <- io$year[[1]]
# Dimensions: (n_countries * n_products) x (n_countries * n_products)
dim(z_mat)
# Label mapping
labels
# Final demand column labels
fd_labelsFor small systems (< 5000 sectors), you can pre-compute L:
This inverts the \((I - A)\) matrix. Column sums of \(A\) are internally capped at 1 to prevent singularity, and negative entries are zeroed.
For systems larger than ~5000 sectors (e.g. 192 countries x 125+
products = 24 000 sectors), computing the explicit Leontief inverse
requires too much memory (e.g. 4.8 GiB for 25 000 sectors). In those
cases, pass z_mat directly to
compute_footprint(), which solves \((I - A)x = Y\) without ever materialising
the full L.
Now provide your environmental extension vector. This must have one
value per sector, in the same order as the labels. For land use, you can
use get_land_fp_production():
land_use_data <- get_land_fp_production() |>
dplyr::filter(year == selected_year) |>
dplyr::select(area_code, item_cbs_code, hectares = impact_u)
# Match it to the IO model's label order
extensions <- labels |>
dplyr::left_join(
land_use_data,
by = c("area_code", "item_cbs_code")
) |>
dplyr::mutate(
hectares = tidyr::replace_na(hectares, 0)
) |>
dplyr::pull(hectares)If you pre-computed L:
Alternatively, pass the Z matrix and compute_footprint()
will derive the Leontief inverse for you:
When fd_labels is provided (recommended), the result
includes target_item (the consumed product) and
target_fd (the demand category like "food" or
"other_uses"). This uses the FABIO diagonal decomposition
approach.
When fd_labels is omitted, columns of Y are treated as
individual sectors. This is appropriate only for square Y matrices and
does not produce a target_fd column.
The result is a tidy tibble with columns:
| Column | Meaning |
|---|---|
origin_area |
Country where the pressure physically occurs |
origin_item |
Product causing the pressure |
target_area |
Country whose consumption drives the pressure |
target_item |
Product being consumed |
target_fd |
Demand category (only with fd_labels) |
value |
Footprint in extension units (e.g. hectares) |
Because the output is a tidy tibble, you can use standard dplyr operations to answer questions:
Which country’s consumption drives the most land use?
footprint |>
dplyr::summarise(
total_land = sum(value), .by = target_area
) |>
dplyr::arrange(dplyr::desc(total_land))How much of Spain’s land use is driven by foreign consumption?
spain_code <- 203L
footprint |>
dplyr::filter(origin_area == spain_code) |>
dplyr::mutate(
destination = ifelse(
target_area == spain_code, "domestic", "exported"
)
) |>
dplyr::summarise(
total_land = sum(value), .by = destination
)What are the top bilateral flows?
footprint |>
dplyr::summarise(
total = sum(value),
.by = c(origin_area, target_area)
) |>
dplyr::arrange(dplyr::desc(total)) |>
head(20)Which products drive the most land use in food consumption?
A useful sanity check: the total footprint should equal the total extensions. All pressure must be attributed somewhere.
# These two numbers should be equal (up to floating-point tolerance)
sum(footprint$value)
sum(extensions)Small differences can arise from negative-output corrections or column-sum capping in the technical coefficients matrix.
Since build_io_model() returns one row per year, you can
loop over them:
io <- build_io_model(years = 2010:2013)
land <- get_land_fp_production()
all_footprints <- purrr::pmap_dfr(
list(
io$Z, io$X, io$Y, io$labels,
io$fd_labels, io$year
),
function(z, x, y, lab, fdl, yr) {
ext <- lab |>
dplyr::left_join(
land |>
dplyr::filter(year == yr) |>
dplyr::select(area_code, item_cbs_code, impact_u),
by = c("area_code", "item_cbs_code")
) |>
dplyr::mutate(impact_u = tidyr::replace_na(impact_u, 0)) |>
dplyr::pull(impact_u)
compute_footprint(
z_mat = z, x_vec = x, y_mat = y,
extensions = ext, labels = lab,
fd_labels = fdl
) |>
dplyr::mutate(year = yr)
}
)Since each year is independent, you can parallelise using
furrr:
library(furrr)
future::plan(multisession, workers = 4)
all_footprints <- furrr::future_pmap_dfr(
list(
io$Z, io$X, io$Y, io$labels,
io$fd_labels, io$year
),
function(z, x, y, lab, fdl, yr) {
ext <- lab |>
dplyr::left_join(
land |>
dplyr::filter(year == yr) |>
dplyr::select(area_code, item_cbs_code, impact_u),
by = c("area_code", "item_cbs_code")
) |>
dplyr::mutate(impact_u = tidyr::replace_na(impact_u, 0)) |>
dplyr::pull(impact_u)
compute_footprint(
z_mat = z, x_vec = x, y_mat = y,
extensions = ext, labels = lab,
fd_labels = fdl
) |>
dplyr::mutate(year = yr)
}
)| Step | Function | Input | Output |
|---|---|---|---|
| 1 | build_io_model() |
Supply-use, trade, CBS (all optional — fetched by default) | Z, Y, X, labels, fd_labels per year |
| 2 | compute_leontief_inverse() |
Z, X | Leontief inverse L (small systems only) |
| 3 | compute_footprint() |
L or Z, X, Y, extensions, labels, fd_labels | Tidy bilateral footprint |
Bruckner, M., Wood, R., Moran, D., Kuschnig, N., Wieland, H., Maus, V., & Borner, J. (2019). FABIO — The Construction of the Food and Agriculture Biomass Input-Output Model. Environmental Science & Technology, 53(19), 11302–11312. https://doi.org/10.1021/acs.est.8b03704