Wednesday, February 10, 2016

Write-ahead logging



http://infolab.stanford.edu/~ullman/dscb.html
http://work.tinou.com/2012/09/write-ahead-log.html
Enter the log. The log records information about transactions so we can restore our system to a consistent state. The first log approach, the undo log, reverses the changes of incomplete transactions. In our example, upon recovery, changes to A are undone so A is once again 10 and (A == B == 10). The log is of course written to nonvolatile storage.
An undo log looks something like this,
<START T>
<T, A, 10>
<T, B, 10>
FLUSH LOG
OUTPUT(A)
OUTPUT(B)
<COMMIT T>
When we update A we log a record indicating its before value, 10. Likewise, when we change B from 10 to 20 we record its before value, 10. Before outputting A and B to disk we must flush the log (the undo records, not the data). Only after output(A) and output(B) succeed can we record <commit T>.
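The ordering above can be sketched in a few lines of Python. This is a toy model (not code from either linked source): memory, disk, and the log are plain dicts and lists, and flush() stands in for forcing buffered log records to nonvolatile storage.

```python
# Toy model of undo-log write ordering: log before values, flush the log,
# output the data, and only then record <COMMIT T>.

def flush(log_buffer, durable_log):
    durable_log[:] = log_buffer            # force all buffered records to "disk"

def run_transaction(txn, updates, memory, disk, log_buffer, durable_log):
    log_buffer.append(("START", txn))
    for element, new_value in updates.items():
        # Record the before value so the change can be undone later.
        log_buffer.append(("UPDATE", txn, element, memory[element]))
        memory[element] = new_value        # change the in-memory copy
    flush(log_buffer, durable_log)         # undo records must reach disk first
    for element in updates:
        disk[element] = memory[element]    # output(X)
    log_buffer.append(("COMMIT", txn))     # only after all outputs succeed
    flush(log_buffer, durable_log)

memory = {"A": 10, "B": 10}
disk = {"A": 10, "B": 10}
log_buffer, durable_log = [], []
run_transaction("T", {"A": 20, "B": 20}, memory, disk, log_buffer, durable_log)
```

After the run, the durable log holds the before values (<T, A, 10>, <T, B, 10>) ahead of <COMMIT T>, which is exactly what recovery relies on.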
With the undo log in place, how do we recover from failure? We read the undo log from the end (most recently written record) to the start and find incomplete transactions. Transactions with a <commit> record can be ignored, because we know <commit> is recorded only after output has succeeded. If there is no <commit> record for a transaction we cannot be certain its output succeeded, so we use the undo information to revert the changes: <T, B, 10> sets B back to 10 and <T, A, 10> sets A back to 10. The undo records are idempotent, so if there is a crash during recovery we simply recover as usual, setting B to 10 even if B is already 10, and A to 10 even if A is already 10. Finally, the undo log records <abort T> to indicate we aborted the transaction.
<START T>
<T, A, 10>
<T, B, 10>
<ABORT T>
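The backward scan can be sketched as follows. Again a toy model, assuming records shaped like ("START", T), ("UPDATE", T, X, before_value), ("COMMIT", T), and ("ABORT", T):

```python
# Undo-log recovery: scan newest-to-oldest, revert every update belonging
# to a transaction with no COMMIT (or ABORT) record, then log the aborts.

def undo_recover(log, disk):
    finished = set()       # transactions with a COMMIT or ABORT record
    incomplete = set()
    for record in reversed(log):           # newest record first
        kind, txn = record[0], record[1]
        if kind in ("COMMIT", "ABORT"):
            finished.add(txn)
        elif kind == "UPDATE" and txn not in finished:
            element, before_value = record[2], record[3]
            disk[element] = before_value   # idempotent: safe to repeat on re-crash
            incomplete.add(txn)
    for txn in incomplete:
        log.append(("ABORT", txn))         # note that we aborted the transaction
    return disk

# Crash happened after output(A) and output(B) but before <commit T>:
log = [("START", "T"), ("UPDATE", "T", "A", 10), ("UPDATE", "T", "B", 10)]
disk = {"A": 20, "B": 20}
undo_recover(log, disk)                    # A and B revert to 10
```

Because the writes are idempotent, running undo_recover again on the same log and disk leaves the state unchanged, which is why a crash during recovery is harmless.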
The undo log is great, but it has one annoying performance issue. Before we can record <commit T> in the undo log we must do output(A) and output(B), incurring disk I/O. With many transactions we keep having to do output just to maintain the integrity of the undo log. We may want to buffer the output until a convenient time.

Enter the redo log. Instead of undoing a change, we record information (the new value v) so we can rerun transactions, reapplying changes if necessary. Before doing any output we must record the <commit> record: we write ahead. The write-ahead logging rule:
Before modifying any database element X on disk, it is necessary that all log records pertaining to this modification of X, including both the update record <T, X, v> and the <COMMIT T> record, must appear on disk. 
So our redo log will look something like this,
<START T>
<T, A, 20>
<T, B, 20>
<COMMIT T>
FLUSH LOG
OUTPUT(A)
OUTPUT(B)
We record the new values (20 and 20), then commit, then flush the log. Output can happen only after the <commit T> record has been written and the log flushed. This solves the problem of buffering our output: we still incur disk I/O for the log flush, but log writes are sequential appends, so they can be much faster than random writes to scattered blocks on the disk.
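In the same toy model as before, the redo ordering flips the relationship between commit and output: the commit record becomes durable first, and output can be deferred.

```python
# Toy model of redo-log write ordering: log new values, record <COMMIT T>,
# flush the log, and perform output whenever convenient afterwards.

def flush(log_buffer, durable_log):
    durable_log[:] = log_buffer            # force buffered records to "disk"

def run_transaction(txn, updates, memory, log_buffer, durable_log):
    log_buffer.append(("START", txn))
    for element, new_value in updates.items():
        log_buffer.append(("UPDATE", txn, element, new_value))  # new value, for redo
        memory[element] = new_value
    log_buffer.append(("COMMIT", txn))
    flush(log_buffer, durable_log)         # write-ahead: commit is durable first

def output_when_convenient(elements, memory, disk):
    for element in elements:               # buffered output, done at our leisure
        disk[element] = memory[element]

memory, disk = {"A": 10, "B": 10}, {"A": 10, "B": 10}
log_buffer, durable_log = [], []
run_transaction("T", {"A": 20, "B": 20}, memory, log_buffer, durable_log)
output_when_convenient({"A", "B"}, memory, disk)
```

The transaction is considered committed the moment the flush returns; if we crash before output_when_convenient runs, recovery replays the logged new values.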
To recover with a redo log we begin at the head of the log and scan forward (the opposite of the undo log). An incomplete transaction (no <commit T> record) can be ignored, aside from adding an <abort T> record, because we know no output was ever done. However, if we see a <commit T> record we don't know whether the output succeeded, so we just redo the change, even if it's redundant: A is set to 20 and B is set to 20, even if those values already reached the disk before the crash. Like undo records, redo records are idempotent, so repeated application is fine.
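The forward scan can be sketched in the same toy model (record shapes are my assumption, as above):

```python
# Redo-log recovery: reapply every update of a committed transaction,
# and ignore (but abort) transactions that never reached <COMMIT>.

def redo_recover(log, disk):
    committed = {rec[1] for rec in log if rec[0] == "COMMIT"}
    started = {rec[1] for rec in log if rec[0] == "START"}
    aborted = {rec[1] for rec in log if rec[0] == "ABORT"}
    for rec in log:                        # scan forward from the head
        if rec[0] == "UPDATE" and rec[1] in committed:
            element, new_value = rec[2], rec[3]
            disk[element] = new_value      # redo, even if already applied
    for txn in started - committed - aborted:
        log.append(("ABORT", txn))         # no output happened; just note the abort
    return disk

# Crash happened after <commit T> was flushed but before any output:
log = [("START", "T"), ("UPDATE", "T", "A", 20),
       ("UPDATE", "T", "B", 20), ("COMMIT", "T")]
disk = {"A": 10, "B": 10}
redo_recover(log, disk)                    # A and B are redone to 20
```

Note the mirror image of undo recovery: there we reverted uncommitted work, here we reapply committed work, and in both cases idempotence makes a crash mid-recovery safe.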
https://en.wikipedia.org/wiki/Write-ahead_logging
Write-ahead logging (WAL) is a family of techniques for providing atomicity and durability (two of the ACID properties) in database systems.
In a system using WAL, all modifications are written to a log before they are applied. Usually both redo and undo information is stored in the log.
The purpose of this can be illustrated by an example. Imagine a program that is in the middle of performing some operation when the machine it is running on loses power. Upon restart, that program might well need to know whether the operation it was performing succeeded, half-succeeded, or failed. If a write-ahead log is used, the program can check this log and compare what it was supposed to be doing when it unexpectedly lost power to what was actually done. On the basis of this comparison, the program could decide to undo what it had started, complete what it had started, or keep things as they are.
WAL allows updates of a database to be done in-place. Another way to implement atomic updates is with shadow paging, which is not in-place. The main advantage of doing updates in-place is that it reduces the need to modify indexes and block lists.
http://www.redbook.io/archive/redbook3/aries/tsld008.htm

http://www.postgresql.org/docs/9.1/static/wal-intro.html
WAL's central concept is that changes to data files (where tables and indexes reside) must be written only after those changes have been logged, that is, after log records describing the changes have been flushed to permanent storage. If we follow this procedure, we do not need to flush data pages to disk on every transaction commit, because we know that in the event of a crash we will be able to recover the database using the log: any changes that have not been applied to the data pages can be redone from the log records. (This is roll-forward recovery, also known as REDO.)

Because WAL restores database file contents after a crash, journaled file systems are not necessary for reliable storage of the data files or WAL files. In fact, journaling overhead can reduce performance, especially if journaling causes file system data to be flushed to disk. Fortunately, data flushing during journaling can often be disabled with a file system mount option, e.g. data=writeback on a Linux ext3 file system. Journaled file systems do improve boot speed after a crash.


Using WAL results in a significantly reduced number of disk writes, because only the log file needs to be flushed to disk to guarantee that a transaction is committed, rather than every data file changed by the transaction. The log file is written sequentially, and so the cost of syncing the log is much less than the cost of flushing the data pages. This is especially true for servers handling many small transactions touching different parts of the data store. Furthermore, when the server is processing many small concurrent transactions, one fsync of the log file may suffice to commit many transactions.
WAL also makes it possible to support on-line backup and point-in-time recovery, as described in Section 24.3. By archiving the WAL data we can support reverting to any time instant covered by the available WAL data: we simply install a prior physical backup of the database, and replay the WAL log just as far as the desired time. What's more, the physical backup doesn't have to be an instantaneous snapshot of the database state — if it is made over some period of time, then replaying the WAL log for that period will fix any internal inconsistencies.
