Designing Data-Intensive Applications
Atomicity
If an error occurs halfway through a sequence of writes, the transaction should be aborted, and the writes made up to that point should be discarded.
Isolation
Concurrently running transactions shouldn’t interfere with each other. For example, if one transaction makes several writes, then another transaction should see either all or none of those writes, but not some subset.
compare-and-set, atomic increment
the whole point of aborts is to enable safe retries.
when retrying: limit the number of retries, use exponential backoff, and handle overload-related errors differently from other errors (if possible).
serializable isolation means that the database guarantees that transactions have the same effect as if they ran serially, i.e. one at a time, without any concurrency.
READ COMMITTED
The most basic level of transaction isolation is read committed.
When reading from the database, you will only see data that has been committed (no dirty reads).
When writing to the database, you will only overwrite data that has been committed (no dirty writes).
Implementing read committed
Most commonly, databases prevent dirty writes by using row-level locks: when a transaction wants to modify a particular object (row or document), it must first acquire a lock on that object. It must then hold that lock until the transaction is committed or aborted. Only one transaction can hold the lock for any given object; if another transaction wants to write to the same object, it must wait until the first transaction is committed or aborted before it can acquire the lock and continue.
How do we prevent dirty reads?
One option would be to use the same lock, and to require any transaction that wants to read an object to briefly acquire the lock, and then release it again immediately after reading. This would ensure that a read couldn’t happen while an object has a dirty, uncommitted value (because during that time the lock would be held by the transaction that has made the write).
However, the approach of requiring read locks does not work well in practice, because one long-running write transaction can force many read-only transactions to wait until the long-running transaction has completed
for every object that is written, the database remembers both the old committed value, and also the new value set by the transaction that currently holds the write lock. While the transaction is ongoing, any other transactions that read the object are simply given the old value. Only when the new value is committed, transactions switch over to reading the new value.
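A minimal two-session sketch of this behaviour under read committed; the accounts table and balance values are made up for illustration:
-- Session A
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
START TRANSACTION;
UPDATE accounts SET balance = balance - 100 WHERE id = 1; -- takes the row lock, not yet committed
-- Session B, running concurrently
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
START TRANSACTION;
SELECT balance FROM accounts WHERE id = 1; -- returns the old committed value (no dirty read)
UPDATE accounts SET balance = balance + 50 WHERE id = 1; -- blocks until Session A commits or aborts (no dirty write)
-- Session A
COMMIT; -- Session B's UPDATE can now proceed, and new reads see the new balance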
SNAPSHOT ISOLATION AND REPEATABLE READ
Snapshot isolation is usually implemented with multiversion concurrency control (MVCC).
The idea is that each transaction reads from a consistent snapshot of the database, that is, all the data that was committed in the database at a particular point in time. Even if the data is subsequently changed by another transaction, each transaction sees the old data from the time when that transaction started.
Snapshot isolation is a boon for long-running, read-only queries such as backups and analytics. It is very hard to reason about the meaning of a query if the data on which it operates is changing at the same time as the query is executing.
When a transaction can see a consistent snapshot of the database, frozen at a particular point in time, it is much easier to understand.
readers never block writers, and writers never block readers.
In order to prevent dirty reads, the database needs to keep at most two versions of an object: the committed version, and the overwritten-but-not-yet-committed version.
For snapshot isolation, the database must potentially keep several different committed versions of an object, because various in-progress transactions may need to see the state of the database at different points in time. Hence snapshot isolation is also known as a multiversion technique.
When a transaction is started, it is given a unique, always-increasing transaction ID. Whenever a transaction writes anything to the database, the data it writes is tagged with the transaction ID of the writer.
Each row in a table has a created_by field, containing the ID of the transaction that inserted this row into the table.
Moreover, each row has a deleted_by field, which is initially empty. If a transaction deletes a row, the row isn’t actually deleted from the database, but it is marked for deletion by setting the deleted_by field to the ID of the transaction that requested the deletion.
At some later time, when it is certain that no transaction can any longer access the deleted data, a garbage collection process in the database removes any rows marked for deletion, and frees their space.
An update is internally translated into a delete and a create.
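A rough sketch of the resulting row versions for one update, using the created_by / deleted_by fields described above; the IDs and values are made up, and real engines keep this bookkeeping in hidden system columns:
-- Transaction 12 inserted the row:
--   id = 1, balance = 500, created_by = 12, deleted_by = NULL
-- Transaction 13 then runs:
UPDATE accounts SET balance = 400 WHERE id = 1;
-- which is stored internally as a delete plus a create:
--   id = 1, balance = 500, created_by = 12, deleted_by = 13    (old version, marked deleted)
--   id = 1, balance = 400, created_by = 13, deleted_by = NULL  (new version)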
When a transaction reads from the database, transaction IDs are used to decide which objects a transaction can see, and which are invisible.
By carefully defining visibility rules, the database can present a consistent snapshot of the database to the application.
This works as follows:
At the start of each transaction, the database makes a list of all the other transactions which are in progress (not yet committed or aborted) at that time. Any writes made by one of those transactions are ignored, even if the transaction subsequently commits.
Any writes made by aborted transactions are ignored.
Any writes made by transactions with a later transaction ID (i.e. which started after the current transaction started) are ignored, regardless of whether that transaction has committed.
All other writes are visible to the application’s queries.
an object is visible if:
At the time when the reader’s transaction started, the transaction which created the object had already committed, and
The object is not marked for deletion—or if it is, the transaction which requested deletion had not yet committed at the time when the reader’s transaction started.
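Expressed as pseudo-SQL over the created_by / deleted_by fields, the rules look roughly like the query below; accounts_versions and the helper committed_before_reader(txid), meaning "that transaction had already committed when the reader's transaction started", are made up for illustration:
SELECT * FROM accounts_versions AS v
WHERE committed_before_reader(v.created_by)            -- creator committed before the reader started
  AND (v.deleted_by IS NULL                            -- not marked for deletion, or
       OR NOT committed_before_reader(v.deleted_by));  -- deleter had not yet committed when the reader started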
A long-running transaction may continue using a snapshot for a long time, continuing to read values which (from other transactions’ point of view) have long been overwritten or deleted.
By never updating values in place, but instead creating a new version every time a value is changed, the database can provide a consistent snapshot while incurring only a small overhead.
Indexes and snapshot isolation
How do indexes work in a multiversion database?
One option is to have the index simply point to all versions of an object, and an index query needs to filter out any object versions that are not visible to the current transaction.
When garbage collection removes old object versions that are no longer visible to any transaction, the corresponding index entries can also be removed.
In practice, many implementation details determine the performance of multiversion concurrency control.
PostgreSQL has optimizations for avoiding index updates if different versions of the same object can fit on the same page
http://www.developer.com/db/article.php/3706251/Database-Isolation-Levels.htm
Dirty Reads
One database transaction can see changes made by another database transaction that have not yet been committed. Those changes may be rolled back later, so the reading transaction has effectively read a row that never logically existed.
Nonrepeatable Read
This situation occurs when a transaction runs the same query multiple times and the results are not the same each time.
It occurs when one transaction attempts to access the same data twice and a second transaction modifies the data between the first transaction's read attempts.
Transaction A reads a row of data.
Transaction B modifies this row and commits.
Transaction A re-reads the same row and gets back different data values (see the sketch below).
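A sketch of the scenario in SQL, assuming a hypothetical accounts table and READ COMMITTED isolation:
-- Transaction A
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
START TRANSACTION;
SELECT balance FROM accounts WHERE id = 1; -- returns 500
-- Transaction B
START TRANSACTION;
UPDATE accounts SET balance = 400 WHERE id = 1;
COMMIT;
-- Transaction A, still in the same transaction
SELECT balance FROM accounts WHERE id = 1; -- now returns 400: a non-repeatable read
COMMIT;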
Phantom read
This means that if one transaction executes a query at time T1 and re-executes it at time T2, additional rows may have been added to the database, which may affect its results.
This differs from a nonrepeatable read in that with a phantom read, data you already read hasn't been changed, but instead, more data satisfies your query criteria than before.
It occurs when one transaction accesses a range of data more than once and a second transaction inserts or deletes rows that fall within that range between the first transaction's read attempts. This can cause "phantom" rows to appear or disappear from the first transaction's perspective.
Transaction A reads two rows based on a Query A where clause.
Transaction B inserts a new row that happens to fall under Transaction A Query A's where clause.
Transaction A runs Query A again and now gets back three rows (see the sketch below).
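The same style of sketch for a phantom, again assuming a hypothetical accounts table and READ COMMITTED isolation:
-- Transaction A
START TRANSACTION;
SELECT * FROM accounts WHERE balance > 100; -- returns two rows
-- Transaction B
START TRANSACTION;
INSERT INTO accounts (id, balance) VALUES (3, 900); -- falls under Transaction A's WHERE clause
COMMIT;
-- Transaction A, still in the same transaction
SELECT * FROM accounts WHERE balance > 100; -- now returns three rows: a phantom row appeared
COMMIT;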
The ANSI/ISO SQL standard defines four levels of transaction isolation, with different possible outcomes for the same transaction scenario.
READ UNCOMMITTED
This is the lowest isolation level: one database transaction can see changes made by another database transaction that have not yet been committed.
Read Committed
One database transaction only sees changes that other database transactions have committed.
Repeatable Read
The Repeatable Read isolation level only sees data committed before the transaction began; it never sees either uncommitted data or changes committed during transaction execution by concurrent transactions. (However, the query does see the effects of previous updates executed within its own transaction, even though they are not yet committed.)
Serializable
The highest isolation level. This level emulates serial transaction execution, as if transactions had been executed one after another, serially, rather than concurrently.
All database transactions are completely isolated from each other. This is the safest level, but the extra coordination can cause a significant performance hit.
ACID Properties of database transaction
http://javarevisited.blogspot.com/2011/11/database-transaction-tutorial-example.html
A stands for Atomicity. An atom was once considered the smallest particle, one that cannot be broken into further pieces; likewise, a database transaction has to be atomic, meaning either all steps of the transaction complete or none of them do.
C stands for Consistency. A transaction must leave the database in a consistent state whether it commits or rolls back.
I stands for Isolation
Two database transactions running at the same time should not affect each other, and each should have a consistent view of the database. This is achieved by using isolation levels in the database.
D stands for Durability
Data has to be persisted successfully in the database once the transaction completes successfully, and it must survive power outages and other failures.
MySQL
Differences between READ-COMMITTED and REPEATABLE-READ transaction isolation levels - MySQL Performance Blog
https://dev.mysql.com/doc/refman/5.7/en/set-transaction.html
START TRANSACTION;
REPEATABLE READ
This is the default isolation level for InnoDB. All consistent reads within the same transaction read the snapshot established by the first read. This convention means that if you issue several plain (nonlocking) SELECT statements within the same transaction, these SELECT statements are consistent also with respect to each other.
For locking reads (SELECT with FOR UPDATE or LOCK IN SHARE MODE), UPDATE, and DELETE statements, locking depends on whether the statement uses a unique index with a unique search condition, or a range-type search condition. For a unique index with a unique search condition, InnoDB locks only the index record found, not the gap before it. For other search conditions, InnoDB locks the index range scanned, using gap locks or next-key locks to block insertions by other sessions into the gaps covered by the range.
SERIALIZABLE
This level is like REPEATABLE READ, but InnoDB implicitly converts all plain SELECT statements to SELECT ... LOCK IN SHARE MODE if autocommit is disabled. If autocommit is enabled, the SELECT is its own transaction.
Read Committed is the default isolation level in PostgreSQL and SQL Server.
Repeatable Read is the default isolation level in MySQL (InnoDB).
SELECT @@TX_ISOLATION FROM DUAL; -- Viewing the current transaction isolation level.
SET TRANSACTION ISOLATION LEVEL levelName; -- Changing the current transaction isolation level.
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
Extra locking (not gap locking)
It is important to remember that InnoDB actually locks index entries, not rows. During the execution of a statement InnoDB must lock every entry in the index that it traverses to find the rows it is modifying. It must do this to prevent deadlocks and maintain the isolation level.
In REPEATABLE READ every lock acquired during a transaction is held for the duration of the transaction.
In READ COMMITTED the locks that did not match the scan are released after the STATEMENT completes.
In all transaction isolation levels InnoDB creates locks over every index entry scanned. The difference between the levels is that once the statement completes in READ COMMITTED mode, the locks are released for the entries that did not match the scan.
This means that in READ COMMITTED other transactions are free to update rows that they would not have been able to update (in REPEATABLE READ) once the UPDATE statement completes.
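A sketch of that difference, assuming a hypothetical orders table with no index on status (so the UPDATE must scan and lock many index entries):
-- Session A
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ; -- compare with READ COMMITTED
START TRANSACTION;
UPDATE orders SET status = 'shipped' WHERE status = 'paid'; -- scans the table, locking every entry it examines
-- Session B, while Session A is still open
UPDATE orders SET status = 'cancelled' WHERE id = 42;
-- Under REPEATABLE READ this can block even if order 42 was not 'paid',
-- because Session A holds the locks on every scanned entry until it commits.
-- Under READ COMMITTED the locks on non-matching rows are released once
-- Session A's statement completes, so this update can proceed.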
Consistent read views
In REPEATABLE READ, a ‘read view’ (trx_no does not see trx_id >= ABC, sees < ABB) is created at the start of the transaction, and this read view (a consistent snapshot, in Oracle terms) is held open for the duration of the transaction.
This is called MVCC (multiversion concurrency control) and it is accomplished using row versioning and UNDO information.
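The same two-session scenario as in the non-repeatable read sketch above, but under REPEATABLE READ the read view keeps the result stable:
-- Session A
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
START TRANSACTION;
SELECT balance FROM accounts WHERE id = 1; -- returns 500; the first read establishes the read view
-- Session B
START TRANSACTION;
UPDATE accounts SET balance = 400 WHERE id = 1;
COMMIT;
-- Session A, still in the same transaction
SELECT balance FROM accounts WHERE id = 1; -- still returns 500, served from the read view
COMMIT;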
In REPEATABLE READ InnoDB also creates gap locks for range scans.
SELECT * FROM some_table WHERE id > 100 FOR UPDATE;
The above locking read will create a gap lock that prevents any rows with id > 100 from being inserted into the table until the transaction rolls back or commits.
Non-repeatable reads (read committed)
In READ COMMITTED, a read view is created at the start of each statement. In READ COMMITTED the read view for the transaction lasts only as long as each statement execution. As a result, consecutive executions of the same statement may show different results.
This is called the ‘phantom row’ problem.
In addition, in READ COMMITTED gap locks are never created. Since there is no gap lock, the example SELECT ... FOR UPDATE above will not prevent insertions of new rows into the table by other transactions. Thus, locking rows with SELECT ... FOR UPDATE (i.e. WHERE id > 100) and subsequently updating rows with WHERE id > 100 (even in the same transaction) may result in more rows being updated than were earlier locked. This is because new rows may have been inserted into the table between the statements, since there was no gap lock created for the SELECT ... FOR UPDATE.
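A sketch of that pitfall, reusing the hypothetical some_table from above plus an assumed flag column:
-- Session A, READ COMMITTED
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
START TRANSACTION;
SELECT * FROM some_table WHERE id > 100 FOR UPDATE; -- locks the matching rows, but creates no gap lock
-- Session B
START TRANSACTION;
INSERT INTO some_table (id) VALUES (150); -- not blocked, because there is no gap lock
COMMIT;
-- Session A, still in the same transaction
UPDATE some_table SET flag = 1 WHERE id > 100; -- may now update more rows than were locked above
COMMIT;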