The urgency for IT leaders to bring real-time analytics to their organizations is stronger than ever. For these organizations, the ability to start with fresh data and combine streaming, transactional, and analytical workloads in a single system can revolutionize their operations.
When moving from batch to real time, data architects should carefully consider what type of streaming semantics will optimize their workload. The table below highlights the nuances among different types of streaming semantics.
Understanding Streaming Semantics
At Most Once
With at most once semantics, a message is pulled once, but is not guaranteed to be received. At most once semantics conserves bandwidth and does not produce duplicates, but leaves us with a possibility of missing data.
At Least Once
At least once semantics pulls messages one or multiple times to guarantee every message is received. However, doing so consumes an unnecessary amount of network bandwidth and increases the likelihood of duplicate records being entered into the system, that ultimately need to be consolidated.
With exactly-once semantics, messages are pulled one or more times, processed only once, and delivery is guaranteed. Exactly-once semantics is ideal for operational applications, as it guarantees no duplicates or missing data. Many enterprise applications, like those used for credit card processing, require exactly-once semantics.
Exactly-Once Semantics for Apache Kafka
We recently launched MemSQL Pipelines, making it easier than ever to achieve real-time data ingestion at scale. MemSQL Pipelines ensures exactly-once semantics when streaming from message brokers like Apache Kafka.
The webcast recording below details how to use MemSQL Pipelines for real-time data ingest at scale and exactly-once semantics for Apache Kafka.
Watch now to learn how to:
Quickly build real-time data pipelines using a new SQL command: CREATE PIPELINE
Enable exactly-once semantics for Apache Kafka
Execute user-defined transformations written in any programming language
Enable rapid parallel data loading between MemSQL partitions and Kafka brokers