Much of the work in stream processing goes into mundane transformation tasks like data scrubbing, normalization and filtering. Yet performing these relatively simple tasks means standing up multiple distributed systems that add complexity and take time to learn. Worse yet, once you're done, you end up ping-ponging data back and forth between storage and compute just to remove a field from a JSON object or perform some simple validation. To the data engineer, it can feel like an endless game of system whack-a-mole before you can even start the interesting work of understanding the data.
Fortunately, help has arrived in the form of WebAssembly (Wasm), which lets users write transformation modules in any language that compiles to Wasm and run fast data transformations on topics. By shipping these computations to the storage engine, rather than shipping the data out to a separate compute layer, developers can codify business rules such as GDPR compliance or schema normalization and run them at near-native speed.
In this talk, Tristan Stevens, Director of Customer Success at Redpanda, gives an overview of a Wasm-based data transformation architecture and shows how it can both simplify real-time applications and data pipelines and improve their performance.