Anagha Jamthe, Gwen A. Jacobs, Michael Packard, Richard Cardone, Julia Looney, Joe Stubbs, Maytal Dahan, Sean B. Cleveland, Smruti Padhy, Jack A. Smith, Christian Garcia, and Steve Terry
The Tapis framework, an NSF-funded project, is an open-source, scalable API platform that enables researchers to perform distributed computational experiments securely and achieve faster scientific results with increased reproducibility. The Tapis Streams API supports scientific use cases that require working with real-time sensor data. The Streams service, built on top of the CHORDS time-series data service, allows storing, processing, annotating, querying, and archiving time-series data. This paper focuses on new Tapis Streams API functionality that enables researchers to design and execute real-time, data-driven, event-driven workflows for their research. We describe the architecture and the design choices made to achieve this new capability. Specifically, we demonstrate the integration of the Streams API with Kapacitor, the native data processing engine for the InfluxDB time-series database, and with Abaco, an NSF-funded web service and distributed computing platform providing functions-as-a-service (FaaS). The Streams API includes a wrapper interface for the Kapacitor alerting system, through which alerts can be defined and enabled. Finally, simulation results from a water-quality use case show that the new capabilities of the Streams API can support event-driven workflows on real-time streaming data.
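The event-driven pattern summarized above (an alert defined on a data stream that triggers a function when a condition holds) can be illustrated with a minimal, self-contained Python sketch. It does not use the Tapis, Kapacitor, or Abaco APIs; the names `Measurement`, `ThresholdAlert`, and the `turbidity` variable are hypothetical and chosen only for illustration, and the simple greater-than threshold is an assumed alert condition.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Measurement:
    """One time-series sample (hypothetical structure, for illustration only)."""
    variable: str
    value: float
    timestamp: str


@dataclass
class ThresholdAlert:
    """Fires a callback whenever a watched variable exceeds a threshold.

    The callback stands in for a function executed by a FaaS platform;
    here it is just a local Python callable.
    """
    variable: str
    threshold: float
    on_trigger: Callable[[Measurement], None]

    def process(self, stream: Iterable[Measurement]) -> int:
        """Scan the stream, invoke the callback on each violation, return the count."""
        fired = 0
        for m in stream:
            if m.variable == self.variable and m.value > self.threshold:
                self.on_trigger(m)
                fired += 1
        return fired


def run_example() -> list:
    """Feed a few made-up samples through an alert and collect the triggered ones."""
    triggered = []
    alert = ThresholdAlert("turbidity", 5.0, triggered.append)
    samples = [
        Measurement("turbidity", 3.2, "2021-01-01T00:00:00Z"),
        Measurement("turbidity", 7.8, "2021-01-01T00:05:00Z"),
        Measurement("temperature", 9.1, "2021-01-01T00:05:00Z"),
    ]
    alert.process(samples)
    return triggered
```

In this sketch only the second sample triggers the callback: the first is below the threshold and the third belongs to a different variable. In a deployed system the callback would be replaced by a request to a function-execution service rather than a local call.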