Failure Transparency in Stateful Dataflow Systems
Failure transparency enables users to reason about distributed systems at a higher level of abstraction, where complex failure-handling logic is hidden. This is especially true for stateful dataflow systems, which are the backbone of many cloud applications. In particular, this paper focuses on proving failure transparency in Apache Flink, a popular stateful dataflow system. Even though failure transparency is a critical aspect of Apache Flink, to date it has not been formally proven. Showing that the failure transparency mechanism is correct, however, is challenging due to the complexity of the mechanism itself. Nevertheless, this complexity can be effectively hidden behind a failure-transparent programming interface. To show that Apache Flink is failure transparent, we model it in small-step operational semantics, as an implementation model that includes failures and recovery and as a higher-level failure-free model. Next, we provide a novel definition of failure transparency based on observational explainability, a concept which relates executions according to their observations. Finally, we provide a formal proof of failure transparency for the implementation model; i.e., we prove that the failure-free model correctly abstracts from the failure-related details of the implementation model. We also show liveness of the implementation model under a fair execution assumption. These results are a first step towards a verified stack for stateful dataflow systems.
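The following Python sketch illustrates, on a toy scale, what it means for a failure-free model to explain the observations of an implementation that crashes and recovers. It is not the paper's model: the single running-sum operator, the names `Snapshot`, `run_failure_free`, and `run_with_failures`, the random crash injection, and the choice to release outputs to the observer only after an epoch's checkpoint completes (similar in spirit to a transactional sink) are all assumptions made for illustration. With `fail_prob < 1`, crashes eventually stop interrupting an epoch, which loosely mirrors the fair-execution assumption behind liveness.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    offset: int  # number of input events consumed when the checkpoint completed
    state: int   # operator state at that point (here: a running sum)


def run_failure_free(inputs):
    """Failure-free reference model: a running-sum operator (illustrative)."""
    total, out = 0, []
    for x in inputs:
        total += x
        out.append(total)
    return out


def run_with_failures(inputs, epoch_len, fail_prob, seed=0):
    """Hypothetical implementation-style model: periodic checkpoints plus rollback.

    Assumption: outputs are buffered per epoch and released to the observer only
    after the epoch's checkpoint completes, so a crash never exposes output that
    a rollback would later re-produce.
    """
    rng = random.Random(seed)
    snap = Snapshot(offset=0, state=0)          # last completed checkpoint
    offset, state, pending = snap.offset, snap.state, []
    observed = []                               # everything an external observer sees
    while offset < len(inputs):
        if rng.random() < fail_prob:
            # Crash: lose volatile state and uncommitted output, restore the snapshot.
            offset, state, pending = snap.offset, snap.state, []
            continue
        state += inputs[offset]
        offset += 1
        pending.append(state)
        if offset % epoch_len == 0 or offset == len(inputs):
            # Epoch boundary: persist a snapshot, then release the buffered output.
            snap = Snapshot(offset, state)
            observed.extend(pending)
            pending = []
    return observed


if __name__ == "__main__":
    data = list(range(1, 21))
    for seed in range(5):
        observed = run_with_failures(data, epoch_len=4, fail_prob=0.3, seed=seed)
        assert observed == run_failure_free(data)
    print("observations match the failure-free model")
```

Releasing output before the checkpoint completes would break this correspondence: after a rollback, the observer could see a duplicated prefix of running sums that the failure-free model can never produce.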
Slides: Failure Transparency in Stateful Dataflow Systems.pdf (5.88 MiB)
Tue 17 Sep (displayed time zone: Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna)
10:30 - 12:00
10:30 | 15m Talk | Defining Name Accessibility using Scope Graphs | Technical Papers | Link to publication, Pre-print
10:45 | 15m Talk | Rose: Composable Autodiff for the Interactive Web | Technical Papers | Sam Estep (Carnegie Mellon University), Wode Ni (Carnegie Mellon University), Raven Rothkopf (Barnard College), Joshua Sunshine (Carnegie Mellon University)
11:00 | 15m Talk | Failure Transparency in Stateful Dataflow Systems | Technical Papers | Aleksey Veresov (KTH Royal Institute of Technology), Jonas Spenger (KTH Royal Institute of Technology), Paris Carbone (KTH Royal Institute of Technology), Philipp Haller (KTH Royal Institute of Technology) | DOI, Pre-print, Media attached, File attached
11:15 | 15m Talk | Fair join pattern matching for actors | Technical Papers | Philipp Haller (KTH Royal Institute of Technology), Ayman Hussein (Technical University of Denmark), Hernan Melgratti (University of Buenos Aires, Argentina), Alceste Scalas (Technical University of Denmark), Emilio Tuosto (Gran Sasso Science Institute, L'Aquila, Italy) | DOI
11:30 | 15m Talk | Constrictor: Immutability as a Design Concept | Technical Papers | DOI, Pre-print
11:45 | 15m Talk | A Language-Based Version Control System for Python | Technical Papers | Luís Carvalho (NOVA School of Science and Technology), João Costa Seco (NOVA-LINCS; Nova University of Lisbon)