WBENC-Certified MWBE Princeton, NJ · Hyderabad · Pune · Bangalore
Case Study

On-Demand Value at Risk at Citi

A distributed, massively parallel processing platform for Citi's quantitative risk team, computing portfolio Value at Risk on demand for any given day, comparing VaR across days, and analysing portfolio behaviour between scenarios. Built to outperform legacy grid computing on performance, scale, and total cost of ownership.

Client
Citi
Industry
Banking and Financial Services
Practice
airisDATA
Solution areas
  • Quantitative Risk
  • Distributed Computing
  • Spark and Hadoop
  • Map-Reduce Architecture
  • Risk Scenario Analysis
  • On-demand: Recompute portfolio VaR for any given day
  • Map-Reduce: Embarrassingly parallel processing using Spark, Hadoop, Parquet
  • Multi-dimensional: Aggregate by portfolio, instrument, client, account, trade, desk, book, legal entity, counterparty
  • Cloud-native: Designed to replace legacy grid computing at a lower total cost of ownership

The Business Problem

Value at Risk is one of the most data-intensive computations in modern banking. Computing VaR for a portfolio at a given confidence level and time horizon requires running valuations across millions of market scenarios, against thousands of trades, across multiple time intervals. The combinatorial math is enormous. A single day's VaR computation for a major trading book can involve billions of valuation operations.
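The scale follows directly from the definition: VaR at a given confidence level is an empirical quantile of the simulated portfolio P&L distribution, so every additional scenario, trade, or time interval multiplies the valuation count. A minimal sketch of the quantile step, in plain Python with an illustrative P&L distribution standing in for real portfolio valuations:

```python
import random

def empirical_var(pnl_scenarios, confidence=0.99):
    """VaR at the given confidence level, taken as an empirical quantile
    of simulated portfolio P&L: the loss not expected to be exceeded in
    more than (1 - confidence) of scenarios."""
    losses = sorted(-pnl for pnl in pnl_scenarios)  # losses as positive numbers
    index = min(int(confidence * len(losses)), len(losses) - 1)
    return losses[index]

# Illustrative only: 100k simulated one-day P&L draws for a toy portfolio.
random.seed(7)
pnl = [random.gauss(0, 1_000_000) for _ in range(100_000)]
print(f"99% one-day VaR: {empirical_var(pnl):,.0f}")
```

The quantile itself is cheap; the expense is producing the P&L inputs, which is why the valuation step dominates the platform design.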

Traditional VaR platforms at large banks are built on grid computing infrastructure that was state-of-the-art when it was deployed, often more than a decade ago. The architectural assumptions of those platforms (fixed compute capacity, batch processing windows, single-day result granularity, limited scenario flexibility) are increasingly out of step with how risk teams want to use the data.

For Citi, the limitations of the existing VaR platform were creating three concrete problems:

  • Limited recomputation flexibility. The legacy platform produced VaR results on a fixed daily cadence. Recomputing VaR for a historical day, or for a variant of a scenario, required jobs that took hours and competed for grid capacity with the daily run.
  • No easy comparison across days or scenarios. Risk analysts wanted to compare portfolio VaR between two days, or between two scenarios on the same day, to understand what was driving changes in risk. The existing platform made this kind of comparative analysis cumbersome.
  • Total cost of ownership. The legacy grid infrastructure was expensive to operate and difficult to scale. As risk computation demands grew, the cost trajectory was unsustainable.

Citi engaged airisDATA to build an On-Demand VaR platform: a distributed, massively parallel processing system designed to compute portfolio VaR on demand for any given day, compare VaR across any two days, and analyse portfolio behaviour between scenarios. The success criteria were explicit. The new platform had to outperform existing grid computing solutions in performance, scale, and total cost of ownership.

The Solution

The On-Demand VaR platform is structured as a Map-Reduce architecture on Spark and Hadoop, with Parquet as the storage format. The design takes advantage of the fact that VaR computation is an embarrassingly parallel processing problem: the valuations across scenarios and trades are largely independent and can be parallelised aggressively.
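"Embarrassingly parallel" here means each (scenario, trade) valuation depends only on its own inputs, so the grid of work partitions cleanly across executors with no cross-task coordination. A plain-Python sketch of that property, with a toy linear valuation standing in for the real pricing models (on the platform itself, this map would be a Spark transformation over the scenario × trade grid):

```python
from itertools import product
from concurrent.futures import ThreadPoolExecutor

def value_trade(scenario, trade):
    # Toy valuation: notional scaled by the scenario's price shock.
    # It reads only (scenario, trade), so every cell is independent.
    return trade["notional"] * scenario["price_factor"]

scenarios = [{"id": s, "price_factor": 1.0 + 0.01 * s} for s in range(1000)]
trades = [{"id": t, "notional": 100.0} for t in range(50)]

# Every cell of the scenario x trade grid can run concurrently.
with ThreadPoolExecutor(max_workers=8) as pool:
    valuations = dict(pool.map(
        lambda st: ((st[0]["id"], st[1]["id"]), value_trade(*st)),
        product(scenarios, trades),
    ))
```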

Process Flow

The platform processes risk in three modules.

  • Risk Scenario Module. Generates the scenarios that drive the VaR computation. Sources include market scenarios (defined parametrically or empirically), shock scenarios for stress testing, and Monte Carlo simulations at configurable confidence levels. Each scenario is fully specified across the relevant market data dimensions: credit spreads, volumes, rates, equity prices, FX, and other risk factors.
  • Valuation Module. For each scenario, computes pricing and risk per trade. The valuation engine handles the standard risk computations (CVA, RWA) and produces per-trade results that can be aggregated along multiple dimensions downstream. Reference data and market data feed the valuation through the platform's data layer.
  • Aggregation Module. Aggregates per-trade valuations along the dimensions that matter to the consuming analyst: by portfolio, by instrument, by client, by account, by trade, by desk, by book, by legal entity, by counterparty. The aggregation layer is where the analytical flexibility lives. The same underlying valuations can be sliced different ways without re-running the computation.
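The three-module flow can be sketched end to end. This is an illustrative reduction, not the production code: a single equity-shock factor stands in for the full scenario specification, a delta-one valuation stands in for the pricing engine, and two trade attributes stand in for the full set of aggregation dimensions.

```python
import random
from collections import defaultdict

def risk_scenario_module(n, seed=0):
    """Generate n market scenarios (here a single equity-shock factor;
    the real module spans spreads, rates, FX, vols, and other factors)."""
    rng = random.Random(seed)
    return [{"id": i, "equity_shock": rng.gauss(0.0, 0.02)} for i in range(n)]

def valuation_module(scenarios, trades):
    """Per-scenario, per-trade P&L. Toy delta-one pricing stands in for
    the real pricing and risk computations."""
    return [
        {"scenario": s["id"], "trade": t, "pnl": t["notional"] * s["equity_shock"]}
        for s in scenarios
        for t in trades
    ]

def aggregation_module(valuations, dimension):
    """Roll per-trade results up along any trade attribute (desk, book,
    counterparty, ...) without re-running the valuations."""
    totals = defaultdict(float)
    for v in valuations:
        totals[(v["trade"][dimension], v["scenario"])] += v["pnl"]
    return dict(totals)

trades = [
    {"id": "T1", "desk": "rates", "counterparty": "CP-A", "notional": 1e6},
    {"id": "T2", "desk": "equities", "counterparty": "CP-A", "notional": 2e6},
]
valuations = valuation_module(risk_scenario_module(500), trades)
by_desk = aggregation_module(valuations, "desk")          # slice one way...
by_cpty = aggregation_module(valuations, "counterparty")  # ...or another, same valuations
```

The last two lines are the point of the aggregation layer: both slices reuse the same stored valuations, so adding a new analytical view costs a group-by, not a recomputation.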

Map-Reduce Pattern

The Map step runs valuation per scenario in parallel, distributing the work across the Spark cluster. The Reduce step aggregates valuations by criteria, producing the consumable risk metrics. Storage and caching of scenario valuations means that subsequent queries against the same scenario set do not require re-computation. This is the pattern that delivers the on-demand recomputation capability.
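The caching behaviour is the crux: because the Map output (per-scenario valuations) is persisted, a second query over the same scenario set only re-runs the cheap Reduce. A minimal sketch, using an in-process cache as a stand-in for the platform's Parquet-backed valuation store and toy deterministic valuations in place of real pricing:

```python
from functools import lru_cache

MAP_RUNS = {"count": 0}  # instrumentation to show the cache working

@lru_cache(maxsize=None)
def map_valuations(date):
    """Map step: value every scenario for the date. Cached, so repeat
    queries for the same date skip the expensive valuation entirely."""
    MAP_RUNS["count"] += 1
    # Toy deterministic per-scenario P&L keyed off the date string.
    return tuple((s, hash((date, s)) % 1000 - 500) for s in range(10_000))

def reduce_var(date, confidence=0.99):
    """Reduce step: aggregate cached valuations into the VaR quantile."""
    losses = sorted(-pnl for _, pnl in map_valuations(date))
    return losses[min(int(confidence * len(losses)), len(losses) - 1)]

v99 = reduce_var("2024-03-01")
v95 = reduce_var("2024-03-01", confidence=0.95)  # new query, no re-valuation
```

The second query returns a different risk metric from the same cached Map output, which is exactly the on-demand property: one expensive valuation pass, many cheap questions.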

What "On-Demand" Actually Means

The platform delivers three specific capabilities that the legacy grid did not:

  1. Recompute portfolio VaR for any given day. Risk analysts can specify a date and get VaR results for that date without waiting for a batch job. This includes historical dates where the underlying market data is preserved.
  2. Compare VaR across any two days. The platform can compute VaR for two dates and surface the deltas, helping analysts understand what changed in the portfolio risk profile between dates. The change can be decomposed by dimension (which counterparties drove the change, which instruments, which desks).
  3. Analyse portfolio behaviour between scenarios. For a given portfolio on a given day, the platform can run multiple scenarios and let the analyst compare portfolio behaviour. This is the foundation for stress testing, what-if analysis, and scenario-based capital planning work.
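Capability 2 reduces to a grouped difference over the two days' aggregated results. A sketch of the delta decomposition, with hypothetical per-desk VaR contributions assumed as inputs:

```python
def decompose_var_delta(var_by_dim_t0, var_by_dim_t1):
    """Explain a day-over-day VaR move dimension by dimension: returns
    (dimension value, delta) pairs, largest absolute driver first."""
    dims = set(var_by_dim_t0) | set(var_by_dim_t1)
    deltas = {d: var_by_dim_t1.get(d, 0.0) - var_by_dim_t0.get(d, 0.0) for d in dims}
    return sorted(deltas.items(), key=lambda kv: abs(kv[1]), reverse=True)

# Hypothetical per-desk VaR contributions for two dates.
monday = {"rates": 4.1e6, "equities": 2.3e6, "credit": 1.9e6}
tuesday = {"rates": 4.0e6, "equities": 3.4e6, "credit": 1.8e6}

drivers = decompose_var_delta(monday, tuesday)
# drivers[0] is the top driver: the equities desk added 1.1m of VaR.
```

The same function applies unchanged to any of the platform's dimensions (counterparty, book, legal entity, ...), which is what makes the "VaR moved, here is why" workflow fast.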

Architecture Highlights

  • Distributed processing: Spark for the parallel valuation work, Hadoop for the underlying distributed file system
  • Storage format: Parquet (columnar, compressed, optimised for the analytical query patterns the platform serves)
  • Risk inputs: Reference data, market data (credit spreads, volumes, rates), scenario definitions
  • Time dimension: Risk computed across T0 through Tn time intervals, allowing risk-over-time analysis across the future horizon
  • Aggregation flexibility: Per-trade valuations aggregated on demand by any combination of portfolio, instrument, client, account, trade, desk, book, legal entity, counterparty

The combination of distributed computation, columnar storage, and cached scenario valuations is what produces the performance and TCO improvement over legacy grid computing. Spark workloads scale elastically with the underlying cluster, and Parquet's columnar layout makes the aggregation queries (which typically read a small subset of columns across many rows) substantially cheaper than the row-oriented storage in the legacy platform.

The Results

The On-Demand VaR platform delivered against all three success criteria.

  • Performance. The Map-Reduce architecture parallelised the valuation work aggressively, producing VaR results for trading book portfolios faster than the legacy grid. The combination of distributed computation, columnar storage, and scenario caching meant that subsequent queries against the same scenario set ran near-instantly rather than re-computing from scratch.
  • Scale. The platform handles the combinatorial math of millions of scenarios across thousands of trades across multiple time intervals without the rigid capacity ceiling of the legacy grid. Scaling up is a question of adding compute to the Spark cluster, not provisioning new dedicated grid hardware.
  • Total cost of ownership. Distributed cloud-native infrastructure replaced expensive dedicated grid computing capacity, reducing the underlying infrastructure cost while improving capacity flexibility. The shift away from fixed grid hardware also eliminated the procurement and maintenance overhead associated with the legacy platform.

Beyond the headline performance and TCO results, the platform unlocked analytical capabilities that were not practical on the legacy infrastructure:

  • On-demand historical recomputation. Risk analysts could investigate historical days and scenarios without waiting for batch jobs, accelerating risk research and post-mortem analysis.
  • Day-over-day comparisons. The ability to compare VaR across two days with delta decomposition by dimension gave risk teams a much faster path from "VaR moved" to "here is exactly what drove the move."
  • Scenario flexibility. The risk scenario module supported a much wider range of scenario definitions, including bespoke stress scenarios that risk teams could define on demand rather than commissioning the IT team to add to the batch grid.

The platform pattern (Map-Reduce on Spark for embarrassingly parallel risk computation, with cached scenario valuations and flexible aggregation) is reusable for adjacent quantitative risk use cases including CVA computation, exposure-at-default modelling, and scenario-based capital planning.

Why This Pattern Matters

The economics of cloud-native distributed computing for quant risk have been clear for several years. Legacy grid platforms still in production at large banks represent some of the most attractive modernisation candidates in capital markets IT, both for the cost reduction and for the analytical flexibility that the modern platforms unlock.

The harder problem is the engineering execution. Quant risk platforms have to be precise (every regulator and internal auditor will check the numbers), integrated with extensive upstream and downstream systems, and operationally robust at the cadence trading desks require. The platforms that successfully replace legacy grids do so on the back of teams that combine quant risk domain knowledge with distributed systems engineering. That combination is what airisDATA brings to this kind of work.

About airisDATA

airisDATA is the AI and data engineering practice of Innovative Information Technologies. Founded in 2015 and based in Princeton, NJ with delivery teams in Hyderabad and Pune, airisDATA has shipped production data and AI systems inside tier-1 banks for more than a decade. The On-Demand VaR engagement at Citi sits alongside production deliveries at UBS (Finance Data Hub), Credit Suisse (the original Finance Data Hub architecture, plus automated trade reconciliation, contract review, regulatory data quality, and treasury forecasting), BNP Paribas, and Deutsche Bank.

A quant risk modernisation or distributed computing initiative on your roadmap?

Whether the brief is VaR, CVA, exposure modelling, or another quantitative risk platform, we have working production references and the engineering team to build it.