Designing Data-Intensive Applications by Martin Kleppmann

  • Rating: 5/5
  • Amazon
  • Ordering: version vectors (for detecting concurrent writes), logical clocks or sequence numbers, total order broadcast (a version-vector comparison is sketched after this list)
  • Pass the version seen on write into the read, and have the server wait until that version has been applied before serving the read (like ZooKeeper's sync). Or add the read to the stream itself (version-wait sketch below)
  • Error handling: two-phase commit, compensating transactions (sketched after this list); in XA distributed transactions spanning e.g. JMS and JDBC, the coordinator is a single point of failure, and participants can hold locks while waiting for a crashed coordinator to recover
  • Have update queries use DB arithmetic or explicit row-level locks (to prevent lost updates and write skew), and run reads in transactions: otherwise separate reads may see inconsistent data, whereas snapshot isolation / repeatable read gives a consistent snapshot, and Postgres's repeatable read also detects lost updates and aborts. A queue/stream might make more sense, though (atomic-update sketch below).
  • Distributed transactions use atomic commit to ensure that changes take effect exactly once, while log-based systems are often based on deterministic retry and idempotence. Transactions within a single storage or stream processing system are feasible, but when data crosses the boundary between different technologies, an asynchronous event log with idempotent writes is a much more robust and practicable approach (idempotent-consumer sketch after this list).
  • The effect of this write changes the precondition of the decision of step 2. In other words, if you were to repeat the SELECT query from step 1 after committing the write, you would get a different result, because the write changed the set of rows matching the search condition (a row-locking sketch follows this list)
  • Thus, in situations where messages may be expensive to process and you want to parallelize processing on a message-by-message basis, and where message ordering is not so important, the JMS/AMQP style of message broker is preferable. On the other hand, in situations with high message throughput, where each message is fast to process and where message ordering is important, the log-based approach works very well.
  • Any validation of a command needs to happen synchronously, before it becomes an event—for example, by using a serializable transaction that atomically validates the command and publishes the event. Alternatively, the user request to reserve a seat could be split into two events: first a tentative reservation, and then a separate confirmation event once the reservation has been validated (two-event sketch below)
  • To adjust for incorrect device clocks, one approach is to log three timestamps:
    1. The time at which the event occurred, according to the device clock
    2. The time at which the event was sent to the server, according to the device clock
    3. The time at which the event was received by the server, according to the server clock
    By subtracting the second timestamp from the third, you can estimate the offset between the device clock and the server clock (assuming the network delay is negligible compared to the required timestamp accuracy). You can then apply that offset to the event timestamp, and thus estimate the true time at which the event actually occurred (assuming the device clock offset did not change between the time the event occurred and the time it was sent to the server). The arithmetic is sketched after this list.
  • Transferring money from account A to account B without a multi-partition atomic commit (a toy version of this pipeline is sketched after this list):
    1. The request to transfer money from account A to account B is given a unique request ID by the client, and appended to a log partition based on the request ID.
    2. A stream processor reads the log of requests. For each request message it emits two messages to output streams: a debit instruction to the payer account A (partitioned by A), and a credit instruction to the payee account B (partitioned by B). The original request ID is included in those emitted messages.
    3. Further processors consume the streams of credit and debit instructions, deduplicate by request ID (e.g., idempotent operations), and apply the changes to the account balances.
    Steps 1 and 2 are necessary because if the client directly sent the credit and debit instructions, it would require an atomic commit across those two partitions to ensure that either both or neither happen. To avoid the need for a distributed transaction, we first durably log the request as a single message, and then derive the credit and debit instructions from that first message. Single-object writes are atomic in almost all data systems (see "Single-object writes"), and so the request either appears in the log or it doesn't, without any need for a multi-partition atomic commit. If the stream processor in step 2 crashes, it resumes processing from its last checkpoint. In doing so, it does not skip any request messages, but it may process requests multiple times and produce duplicate credit and debit instructions. However, since it is deterministic, it will just produce the same instructions again, and the processors in step 3 can easily deduplicate them using the end-to-end request ID.
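
A few of the mechanisms in these notes are easier to see in code. The sketches below are minimal Python illustrations rather than anything from the book; table names, event shapes, and helper functions are invented. First, the version-vector comparison behind "detecting concurrent writes": two versions are concurrent when neither vector dominates the other.

```python
# Minimal version-vector comparison: detects whether one write happened
# before another or whether the two are concurrent (illustrative only).

def compare(vv_a: dict, vv_b: dict) -> str:
    """Compare two version vectors mapping replica id -> counter."""
    keys = set(vv_a) | set(vv_b)
    a_ge = all(vv_a.get(k, 0) >= vv_b.get(k, 0) for k in keys)
    b_ge = all(vv_b.get(k, 0) >= vv_a.get(k, 0) for k in keys)
    if a_ge and b_ge:
        return "equal"
    if a_ge:
        return "a happened after b"
    if b_ge:
        return "b happened after a"
    return "concurrent"          # neither dominates: siblings must be merged

# Example: replicas r1 and r2 each accepted a write the other hasn't seen yet.
print(compare({"r1": 2, "r2": 1}, {"r1": 1, "r2": 2}))  # -> concurrent
print(compare({"r1": 2, "r2": 2}, {"r1": 1, "r2": 2}))  # -> a happened after b
```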
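
For the read-after-write note (wait until a version has been applied before serving the read), a toy replica that blocks reads until it has caught up to the version the client saw; the condition-variable wait here stands in for whatever replication machinery a real system would use.

```python
import threading

class Replica:
    """Toy replica: serves a read only once it has caught up to the
    version the client saw on write (read-after-write via version wait)."""

    def __init__(self):
        self.applied_version = 0
        self.data = {}
        self._cv = threading.Condition()

    def apply(self, version: int, key: str, value: str) -> None:
        """Called by replication when a write arrives, in version order."""
        with self._cv:
            self.data[key] = value
            self.applied_version = version
            self._cv.notify_all()

    def read(self, key: str, min_version: int, timeout: float = 5.0):
        """Block until at least min_version has been applied, then read."""
        with self._cv:
            self._cv.wait_for(lambda: self.applied_version >= min_version,
                              timeout=timeout)
            if self.applied_version < min_version:
                raise TimeoutError("replica has not caught up yet")
            return self.data.get(key)

replica = Replica()
# Simulate replication delivering version 42 a little later.
threading.Timer(0.1, replica.apply, args=(42, "user:1", "new name")).start()
print(replica.read("user:1", min_version=42))  # waits, then -> "new name"
```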
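
For the compensating-transactions bullet, a sketch of the idea with hypothetical debit/credit/refund operations: if the second step of a logical transfer fails, the first step is undone by a new compensating write instead of relying on an atomic commit across systems.

```python
# Compensating-transaction sketch: instead of an atomic commit spanning two
# systems, undo the first step explicitly if the second step fails.
# debit(), credit() and refund() are hypothetical single-system operations.

class TransferFailed(Exception):
    pass

accounts = {"A": 100, "B": 0}

def debit(account: str, amount: int) -> None:
    if accounts[account] < amount:
        raise TransferFailed(f"insufficient funds in {account}")
    accounts[account] -= amount

def credit(account: str, amount: int) -> None:
    if account not in accounts:
        raise TransferFailed(f"unknown account {account}")
    accounts[account] += amount

def refund(account: str, amount: int) -> None:
    accounts[account] += amount          # the compensating action

def transfer(src: str, dst: str, amount: int) -> None:
    debit(src, amount)
    try:
        credit(dst, amount)
    except TransferFailed:
        refund(src, amount)              # compensate: put the money back
        raise

transfer("A", "B", 30)
print(accounts)                          # {'A': 70, 'B': 30}
try:
    transfer("A", "nope", 10)
except TransferFailed:
    print(accounts)                      # compensated: still {'A': 70, 'B': 30}
```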
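
The atomic-update point can be shown with SQL issued through a driver such as psycopg2; the accounts table and column names are invented, and a real schema would differ.

```python
# Lost-update-safe increment: let the database do the arithmetic atomically
# instead of read-modify-write in application code. Table/column names are
# made up; connection parameters are omitted and psycopg2 is assumed installed.
import psycopg2

def add_to_balance(conn, account_id: int, delta: int) -> None:
    with conn:                     # commits on success, rolls back on error
        with conn.cursor() as cur:
            # Single atomic statement: no separate SELECT, so no lost update.
            cur.execute(
                "UPDATE accounts SET balance = balance + %s WHERE id = %s",
                (delta, account_id),
            )

# conn = psycopg2.connect("dbname=bank")
# add_to_balance(conn, account_id=1, delta=-30)
```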
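
The idempotent-writes point, sketched as a consumer that records processed message IDs and silently skips redeliveries; the message format and the in-memory stores are stand-ins for real state.

```python
# Idempotent consumer sketch: deterministic retry plus deduplication by a
# message id makes redelivery safe without a distributed transaction.

processed_ids = set()      # in practice this lives in the same store and is
balances = {"A": 0}        # updated in the same local transaction as the data

def handle(message: dict) -> None:
    """Apply a message's effect exactly once, even if delivered many times."""
    if message["id"] in processed_ids:
        return                               # duplicate delivery: ignore
    balances[message["account"]] += message["amount"]
    processed_ids.add(message["id"])         # record id together with the write

# The broker may redeliver after a crash; the effect is still applied once.
for msg in [{"id": "req-1", "account": "A", "amount": 50},
            {"id": "req-1", "account": "A", "amount": 50}]:
    handle(msg)
print(balances)                              # {'A': 50}
```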
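
For the write-skew/phantom quote, one mitigation is to lock the rows the precondition reads with SELECT ... FOR UPDATE (the book's on-call doctors scenario; the schema and connection here are invented). Note this only helps when the check reads existing rows; a check for the absence of rows needs materialized conflicts or serializable isolation.

```python
# Write-skew sketch: lock the rows the precondition depends on so a
# concurrent transaction cannot invalidate the check between the SELECT
# and the UPDATE. Table and column names are made up.
import psycopg2

def go_off_call(conn, doctor_id: int, shift_id: int) -> bool:
    with conn:
        with conn.cursor() as cur:
            # Lock all currently on-call rows for this shift.
            cur.execute(
                """SELECT doctor_id FROM shifts
                   WHERE shift_id = %s AND on_call = true
                   FOR UPDATE""",
                (shift_id,),
            )
            on_call = [row[0] for row in cur.fetchall()]
            if len(on_call) <= 1:            # precondition: keep one on call
                return False
            cur.execute(
                "UPDATE shifts SET on_call = false "
                "WHERE shift_id = %s AND doctor_id = %s",
                (shift_id, doctor_id),
            )
            return True
```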
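
The command-validation note, sketched as the two-event variant: a tentative reservation event followed by a confirmation or rejection once validation has run. Event names and the in-memory log are invented.

```python
# Two-event reservation sketch: a command becomes a tentative event first,
# and a separate confirmed/rejected event once it has been validated.
import uuid

event_log = []            # stand-in for a durable, ordered log
seats_taken = set()

def request_seat(seat: str) -> str:
    """Step 1: record the tentative reservation as an event."""
    reservation_id = str(uuid.uuid4())
    event_log.append({"type": "seat_requested", "id": reservation_id, "seat": seat})
    return reservation_id

def validate(reservation_id: str, seat: str) -> None:
    """Step 2: validate, then emit the outcome as a second event."""
    if seat in seats_taken:
        event_log.append({"type": "seat_rejected", "id": reservation_id, "seat": seat})
    else:
        seats_taken.add(seat)
        event_log.append({"type": "seat_confirmed", "id": reservation_id, "seat": seat})

rid = request_seat("12A")
validate(rid, "12A")
print([e["type"] for e in event_log])   # ['seat_requested', 'seat_confirmed']
```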
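
The three-timestamp clock correction is just a couple of lines of arithmetic; the numbers in the example are made up.

```python
def corrected_event_time(event_time_device: float,
                         send_time_device: float,
                         receive_time_server: float) -> float:
    """Estimate the true event time by applying the estimated device-clock
    offset (assumes negligible network delay and a stable offset)."""
    offset = receive_time_server - send_time_device   # server minus device
    return event_time_device + offset

# Device clock is 30 s slow: an event "at" 1000.0 actually happened near 1030.0.
print(corrected_event_time(1000.0, 1005.0, 1035.0))   # -> 1030.0
```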
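
Finally, the money-transfer pipeline, condensed into an end-to-end toy: one durable request message fans out into a debit and a credit instruction carrying the same request ID, and the appliers deduplicate by that ID, so reprocessing after a crash changes nothing. All data structures are in-memory stand-ins for partitioned logs.

```python
# Toy version of the transfer pipeline: a single logged request is fanned out
# into a debit and a credit instruction (same request id), and the appliers
# deduplicate by that id, so reprocessing after a crash is harmless.

request_log = [{"request_id": "tx-1", "from": "A", "to": "B", "amount": 30}]
balances = {"A": 100, "B": 0}
applied = set()                       # (request_id, kind) pairs already applied

def fan_out(request: dict) -> list:
    """Step 2: derive debit and credit instructions from one logged request."""
    return [
        {"request_id": request["request_id"], "kind": "debit",
         "account": request["from"], "amount": request["amount"]},
        {"request_id": request["request_id"], "kind": "credit",
         "account": request["to"], "amount": request["amount"]},
    ]

def apply_instruction(instr: dict) -> None:
    """Step 3: apply an instruction exactly once, keyed by (request id, kind)."""
    key = (instr["request_id"], instr["kind"])
    if key in applied:
        return                        # duplicate from a reprocessed request
    delta = -instr["amount"] if instr["kind"] == "debit" else instr["amount"]
    balances[instr["account"]] += delta
    applied.add(key)

# Simulate a crash/restart by running the fan-out twice over the same log.
for _ in range(2):
    for request in request_log:
        for instruction in fan_out(request):
            apply_instruction(instruction)

print(balances)                        # {'A': 70, 'B': 30} despite reprocessing
```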
