Project Goal
This project aims to use Micron CXL-enabled memory devices in the ingestion and data-processing chain of the L1 Scouting system at CMS. The devices are intended to provide coherent, seamless access to buffered data from multiple processors and compute accelerators, together with low-latency, short-term storage for both raw and processed data at scale.
Background
Compute Express Link (CXL) is a new alternate protocol that runs over the standard PCIe physical layer and dynamically multiplexes I/O, cache and memory protocols. It is designed to enable a new generation of heterogeneous and disaggregated computing, with efficient resource sharing, shared memory pools, improved movement of operands and results between accelerators and target devices, and significantly reduced latency. CMS intends to benefit from the capabilities of this technology in the online processing solution for the L1 Scouting data, and in doing so will pave the way for its utilisation in the wider community.
Progress
In 2023, our focus shifted from the prior Micron-CMS openlab project, which centred on deep learning inference acceleration and was outlined in the previous year’s report, to a new initiative concentrating on CXL-enabled memory. This transition prompted the design of a novel architecture for L1 Scouting online processing, in which the RAM-disk is either replaced or augmented by a CXL-enabled “memory lake.” The first step was a new demonstrator configuration built on Micron CXL memory modules. In September, two DRAM-based CXL 2.0 memory devices, each with 128 GB of memory, were installed in an AMD Genoa testbed server at CMS Point 5. Subsequent efforts focused on configuring and using these modules, culminating in a comprehensive series of throughput and performance measurements. These evaluations employed standard and custom tools and covered several memory tiering management configurations, including NUMA balancing and transparent page placement. In November 2023, key members of the CERN team met Micron experts at the SuperComputing conference in Denver, USA. The teams used this opportunity to exchange technical details and results and to plan for the upcoming year.
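As an illustration of the kind of setup step that precedes such tiering measurements, the sketch below lists NUMA nodes that have no CPUs attached. It rests on the assumption that the CXL memory devices are exposed by the Linux kernel as CPU-less NUMA nodes, which is a common configuration but is not confirmed by this report. The function name `cpuless_numa_nodes` is hypothetical and is not part of the tooling used in the project.

```python
import re
from pathlib import Path


def cpuless_numa_nodes(sysfs_root="/sys/devices/system/node"):
    """Return the sorted IDs of NUMA nodes whose cpulist is empty.

    Assumption: CXL-attached memory shows up as such a CPU-less node.
    """
    nodes = []
    for entry in Path(sysfs_root).glob("node[0-9]*"):
        match = re.fullmatch(r"node(\d+)", entry.name)
        if not match:
            continue  # skip anything that is not exactly node<N>
        cpulist_file = entry / "cpulist"
        if not cpulist_file.is_file():
            continue  # layout differs from what this sketch expects
        # An empty cpulist means no CPU is local to this node.
        if cpulist_file.read_text().strip() == "":
            nodes.append(int(match.group(1)))
    return sorted(nodes)


if __name__ == "__main__":
    # On a machine without CPU-less nodes (or without sysfs) this prints [].
    print("CPU-less NUMA nodes:", cpuless_numa_nodes())
```

A benchmark process could then be bound to one of the reported nodes, for example with `numactl --membind=<node>`, to compare its throughput against local DRAM.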
Next Steps
The next line of investigation is to test and understand coherent memory sharing with dedicated accelerators, such as GPUs. Over the next 12 months, we will acquire a larger “memory lake” system, featuring expanded capacity and a CXL switch interconnect. This will bring us closer to the ultimate goal of integrating CXL memory sharing into the CMS L1 Scouting system.
Project Coordinator: Emilio Meschi
Technical Team: Thomas Owen James, Giovanna Lazzari Miotto
Collaboration Liaisons from Micron: Jason Adlard, Tony Brewer, Glen Edwards, Patrick Estep, Andrey Kudryavtsev
In partnership with: Micron, CMS Experiment