https://stackoverflow.com/questions/21057314/mysql-transactions-and-how-does-rollback-work
The rollback segment is inside the tablespace on disk.
https://oracle2mysql.wordpress.com/2015/10/05/two-very-common-reasons-for-replication-lag/
http://dimovelev.blogspot.com/2015/02/performance-pitfalls-hibernate-query.html
https://stackoverflow.com/questions/31557076/spring-hibernate-query-plan-cache-memory-usage
https://www.theserverside.com/news/1365244/Why-Prepared-Statements-are-important-and-how-to-use-them-properly
https://dba.stackexchange.com/questions/68801/how-does-the-alter-command-work-internally-when-adding-a-column
To elaborate on this. In this specific case the table is locked for read and writes. A new temp table is created, the data is copied into the new table with the additional column, the table is renamed and the old one is dropped, finally the table is unlocked.
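As a rough sketch of the statement that triggers this behavior (hypothetical table and column names; the ALGORITHM clause assumes MySQL 5.6 or later):
-- Hypothetical table and column; ALGORITHM=COPY explicitly requests the
-- copy-into-a-new-table-and-rename path described above.
ALTER TABLE orders ADD COLUMN notes VARCHAR(255) NULL, ALGORITHM=COPY;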
https://stackoverflow.com/questions/277625/should-i-get-rid-of-clustered-indexes-on-guid-columns
https://azure.microsoft.com/en-us/blog/uniqueidentifier-and-clustered-indexes/
http://www.itprotoday.com/microsoft-sql-server/clustered-indexes-based-upon-guids
https://www.clever-cloud.com/blog/engineering/2015/05/20/why-auto-increment-is-a-terrible-idea/
https://dev.mysql.com/doc/refman/5.7/en/innodb-index-types.html
https://dev.mysql.com/doc/refman/5.5/en/innodb-auto-increment-handling.html
https://www.flamingspork.com/blog/2006/05/10/how-auto_increment-is-implemented-in-ndb/
https://blog.jcole.us/2013/05/02/how-does-innodb-behave-without-a-primary-key/
https://stackoverflow.com/questions/840162/should-each-and-every-table-have-a-primary-key
If you start an explicit transaction, perform some updates, and then roll back, InnoDB restores the original state of data. It preserves the original data by storing it in an area of the database called the rollback segment. So if you roll back, it just re-copies those pages of data to replace the ones you changed.
This might take some time, so if you try to query data that was changed but rolled back, InnoDB automatically takes a detour to read the original data out of the rollback segment, until such time as it is re-merged into the tables.
Say for example you start a transaction, and UPDATE a billion rows. This copies many pages worth of the original rows to the rollback segment, and then fills the tables with changed data -- but the changed data is uncommitted. No one should be able to read uncommitted data, so anyone who queries the table will automatically get the original data from the rollback segment.
Then you rollback your transaction. Gradually over the next few minutes, InnoDB cleans up, and eventually it all comes back into sync. But anyone can continue to query the original data in the meantime.
If you had committed your transaction, then MySQL would just mark all the changed data as committed, and anyone subsequently reading the data wouldn't experience the slight overhead of reading from the rollback segment.
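A minimal sketch of the flow being described (hypothetical table name, two concurrent sessions):
-- Session A
START TRANSACTION;
UPDATE accounts SET balance = balance + 10;   -- original row versions are kept in the rollback segment
-- Session B, run concurrently: still sees the original, committed rows,
-- reconstructed from the rollback segment because A's change is uncommitted.
SELECT balance FROM accounts WHERE id = 1;
-- Session A changes its mind:
ROLLBACK;   -- InnoDB gradually restores the changed pages from the rollback segment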
https://oracle2mysql.wordpress.com/2015/10/05/two-very-common-reasons-for-replication-lag/
The most common reason I encounter is databases having InnoDB tables without explicit primary keys. Especially if you are using row-based replication (“RBR”), you want explicit primary keys on all your tables. Otherwise, MySQL will scan the entire table for each row that is updated. (See bug 53375.) Maybe I’m a relational purist, but why would you want to have tables without explicit primary keys, anyway? (On the other, less-purist, hand, for performance reasons, sometimes a short surrogate PK may be preferred to a lengthy logical one.)
The other common reason is that the slave is single-threaded, and single-threaded performance can’t keep up with the multi-threaded master. In this case, if multiple databases are being updated, enabling the multi-threaded slave can help. (See the manual for more.)
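Both fixes are mechanical; a hedged sketch (hypothetical table name, and assuming a 5.6+ slave where parallelism is per-database):
-- Give every replicated table an explicit primary key so RBR can locate rows directly:
ALTER TABLE events ADD COLUMN id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY;

-- On the slave, allow multiple applier threads:
STOP SLAVE SQL_THREAD;
SET GLOBAL slave_parallel_workers = 4;
START SLAVE SQL_THREAD;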
https://stackoverflow.com/questions/31557076/spring-hibernate-query-plan-cache-memory-usage
I've hit this issue as well. It basically boils down to having variable number of values in your IN clause and Hibernate trying to cache those query plans.
There are two great blog posts on this topic. The first:
Using Hibernate 4.2 and MySQL in a project with an in-clause query such as: select t from Thing t where t.id in (?)
Hibernate caches these parsed HQL queries. Specifically, the Hibernate SessionFactoryImpl has a QueryPlanCache with a queryPlanCache and a parameterMetadataCache. But this proved to be a problem when the number of parameters for the in-clause is large and varies.
These caches grow for every distinct query, so a query with 6000 parameters is not the same as one with 6001. The in-clause query is expanded to the number of parameters in the collection, and metadata is included in the query plan for each parameter in the query, including a generated name like x10_, x11_, etc.
Imagine 4000 different variations in the number of in-clause parameters, each with an average of 4000 parameters. The query metadata for each parameter quickly adds up in memory, filling up the heap, since it can't be garbage collected. This continues until all the different variations in parameter count are cached, or the JVM runs out of heap memory and starts throwing java.lang.OutOfMemoryError: Java heap space.
Avoiding in-clauses is an option, as is using a fixed collection size for the parameter (or at least a smaller one). For configuring the query plan cache max size, see the property hibernate.query.plan_cache_max_size, which defaults to 2048 (easily too large for queries with many parameters).
And second (also referenced from the first):
Hibernate internally uses a cache that maps HQL statements (as strings) to query plans. The cache consists of a bounded map limited by default to 2048 elements (configurable). All HQL queries are loaded through this cache. In case of a miss, the entry is automatically added to the cache. This makes it very susceptible to thrashing - a scenario in which we constantly put new entries into the cache without ever reusing them, thus preventing the cache from bringing any performance gains (it even adds some cache management overhead). To make things worse, it is hard to detect this situation by chance - you have to explicitly profile the cache in order to notice that you have a problem there. I will say a few words on how this could be done later on.
So the cache thrashing results from new queries being generated at high rates. This can be caused by a multitude of issues. The two most common that I have seen are bugs in Hibernate which cause parameters to be rendered in the JPQL statement instead of being passed as parameters, and the use of an "in" clause.
Due to some obscure bugs in Hibernate, there are situations when parameters are not handled correctly and are rendered into the JPQL query (as an example check out HHH-6280). If you have a query that is affected by such defects and it is executed at high rates, it will thrash your query plan cache because each JPQL query generated is almost unique (containing the IDs of your entities, for example).
The second issue lies in the way that Hibernate processes queries with an "in" clause (e.g. give me all person entities whose company id field is one of 1, 2, 10, 18). For each distinct number of parameters in the "in" clause, Hibernate will produce a different query - e.g. select x from Person x where x.company.id in (:id0_) for 1 parameter, select x from Person x where x.company.id in (:id0_, :id1_) for 2 parameters, and so on. All these queries are considered different as far as the query plan cache is concerned, resulting again in cache thrashing. You could probably work around this issue by writing a utility class that produces only a certain number of parameters - e.g. 1, 10, 100, 200, 500, 1000. If you, for example, pass 22 parameters, it will return a list of 100 elements with the 22 parameters included in it and the remaining 78 parameters set to an impossible value (e.g. -1 for IDs used for foreign keys). I agree that this is an ugly hack, but it could get the job done. As a result you will only have at most 6 unique queries in your cache and thus reduce thrashing.
So how do you find out that you have the issue? You could write some additional code and expose metrics with the number of entries in the cache, e.g. over JMX, tune logging and analyze the logs, etc. If you do not want to (or cannot) modify the application, you could just dump the heap and run this OQL query against it (e.g. using mat): SELECT l.query.toString() FROM INSTANCEOF org.hibernate.engine.query.spi.QueryPlanCache$HQLQueryPlanKey l. It will output all queries currently located in any query plan cache on your heap. It should be pretty easy to spot whether you are affected by any of the aforementioned problems.
As far as the performance impact goes, it is hard to say as it depends on too many factors. I have seen a very trivial query causing 10-20 ms of overhead spent in creating a new HQL query plan. In general, if there is a cache somewhere, there must be a good reason for that - a miss is probably expensive, so you should try to avoid misses as much as possible. Last but not least, your database will have to handle large amounts of unique SQL statements too - causing it to parse them and maybe create different execution plans for every one of them.
When a database receives a statement, the database engine first parses the statement and looks for syntax errors. Once the statement is parsed, the database needs to figure out the most efficient way to execute the statement. This can be computationally quite expensive. The database checks what indexes, if any, can help, or whether it should do a full read of all rows in a table. Databases use statistics on the data to figure out what is the best way. Once the query plan is created then it can be executed by the database engine.
It takes CPU power to do the access plan generation. Ideally, if we send the same statement to the database twice, then we'd like the database to reuse the access plan for the first statement. This uses less CPU than if it regenerated the plan a second time.
Statement Caches
Databases are tuned to take advantage of statement caching: they usually include some kind of statement cache that uses the statement text itself as a key, with the access plan stored alongside the corresponding statement. This allows the database engine to reuse the plans for statements that have been executed previously.
The entire statement is the key. For example, if the cached statement was "select a,b from t where c = 2" and we later sent the statement "select a,b from t where c = 3", the cache would not yield an access plan, because "c=3" is different from the cached "c=2".
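The snippet the next paragraph refers to (a loop that builds the SQL text by concatenating each value into the string) is not reproduced here; roughly, what the database receives from it is one brand-new statement per iteration:
-- What the database sees from such a loop: a distinct statement text each time,
-- so every one misses the statement cache and gets a fresh access plan.
select a,b from t where c = 0;
select a,b from t where c = 1;
select a,b from t where c = 2;
-- ... and so on for every value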
Here the cache won't be used. Each iteration of the loop sends a different SQL statement to the database. A new access plan is computed for each iteration and we're basically throwing CPU cycles away using this approach. However, look at the next snippet:
PreparedStatement ps = conn.prepareStatement("select a,b from t where c = ?");
for (int i = 0; i < 1000; ++i) {
    ps.setInt(1, i);                      // bind a different value to the same statement text
    ResultSet rs = ps.executeQuery();
    rs.close();
}
ps.close();
Here it will be much more efficient. The statement sent to the database is parameterized using the '?' marker in the SQL. This means every iteration sends the same statement text to the database with different parameters for the "c=?" part. This allows the database to reuse the access plans for the statement and makes the program execute more efficiently inside the database.
When the J2EE server gives your application a connection, it isn't giving you the actual connection; you're getting a wrapper. You can verify this by looking at the name of the class for the connection you are given: it won't be a database JDBC connection, it'll be a class created by your application server. Normally, if you called close on a connection, the JDBC driver would close the connection. We want the connection to be returned to the pool when close is called by a J2EE application. We do this by making a proxy JDBC connection class that looks like a real connection. It has a reference to the actual connection. When we invoke any method on the connection, the proxy forwards the call to the real connection. But when we call methods such as close, instead of calling close on the real connection, it simply returns the connection to the connection pool and then marks the proxy connection as invalid, so that if it is used again by the application we'll get an exception.
J2EE PreparedStatement Cache
The J2EE PreparedStatement cache is implemented using a cache inside the J2EE server connection pool manager. The J2EE server keeps a list of prepared statements for each database connection in the pool. When an application calls prepareStatement on a connection, the application server checks whether that statement was previously prepared. If it was, the PreparedStatement object will be in the cache and is returned to the application. If not, the call is passed to the JDBC driver and the query/PreparedStatement object is added to that connection's cache.
We need a cache per connection because that's the way JDBC drivers work: any PreparedStatements returned are specific to that connection.
https://dba.stackexchange.com/questions/68801/how-does-the-alter-command-work-internally-when-adding-a-column
"In most cases, ALTER TABLE makes a temporary copy of the original table."
https://stackoverflow.com/questions/277625/should-i-get-rid-of-clustered-indexes-on-guid-columns
A big reason for a clustered index is when you often want to retrieve rows for a range of values for a given column. Because the data is physically arranged in that order, the rows can be extracted very efficiently.
Something like a GUID, while excellent for a primary key, could be positively detrimental to performance, as there will be additional cost for inserts and no perceptible benefit on selects.
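For instance (hypothetical table and columns), a range predicate like the one below can be satisfied by reading physically contiguous pages when the table is clustered on that column:
-- Rows are stored in OrderDate order, so this range is a contiguous scan
-- of the clustered index rather than many scattered lookups.
SELECT OrderID, CustomerID, Total
FROM dbo.Orders
WHERE OrderDate >= '2015-01-01' AND OrderDate < '2015-02-01';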
https://azure.microsoft.com/en-us/blog/uniqueidentifier-and-clustered-indexes/
When you create a GUID in SQL Server using the NewId() command you are guaranteed that it will be unique across the whole universe. This means that if you have two databases (with the same schema), completely disconnected, adding rows to the same table, using a primary key of uniqueidentifier will ensure that the primary keys don’t conflict. In comparison, if the two databases had an identity integer column as the primary key for their tables, they would be very likely to insert the same primary key in both tables.
It isn’t a good idea to create a clustered index on a uniqueidentifier column and generate your GUIDs with NEWID(). The reason for this is that NEWID() generates GUIDs in non-sequential order and SQL Server orders a clustered index sequentially. It will work – SQL Server will let you build a clustered index around a uniqueidentifier column – however it will cause SQL Server to do unnecessary work and cause performance slowdowns. The reason is that inserting data into the middle of a clustered index (out of sequential order) forces SQL Server to make room for the data by rearranging the cluster.
So if it isn’t a good idea then why do people do it? Well, in SQL Server, if I assign a column as the primary key in SQL Server Management Studio it automatically generates a clustered index for you regardless of the data type of that column. Therefore, if you want a table with a uniqueidentifier data type as a primary key you need to change that index to a non-clustered index.
Non-clustered indexes don’t reorder the data as rows are inserted to the table, so they don’t have the performance impact of a clustered index on inserts of non-sequential data.
1. Add the column as data type datetime
2. I usually call it Date
3. Set the Default Value to GetDate().
4. Make it non-null.
5. Create your clustered index on it before you insert data into your table.
Adding a default value of GetDate() to the column automatically writes the date and time that the row was inserted. This ensures that the data for the row is inserted at the end of the table data – there is no rearranging of the cluster. Adding the data to the end ensures the best performance for inserts.
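Putting the steps above together, a minimal sketch (hypothetical table name) might look like:
CREATE TABLE dbo.Orders (
    ID     uniqueidentifier NOT NULL DEFAULT NEWID(),
    [Date] datetime         NOT NULL DEFAULT GETDATE(),  -- set automatically at insert time
    CONSTRAINT PK_Orders PRIMARY KEY NONCLUSTERED (ID)   -- keep the GUID key non-clustered
);
-- Cluster on the insert date so new rows always land at the end of the table:
CREATE CLUSTERED INDEX CX_Orders_Date ON dbo.Orders ([Date]);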
Another good choice for a clustered index is a column that reflects the ordering used by the majority of your select statements. For example, if you have a table called categories with an integer column called ordervalue, and you always query the table with SELECT … ORDER BY [ordervalue], then making ordervalue the clustered index makes sense. Even though inserting a non-sequential ordervalue into the cluster hinders insert performance, you get a benefit when you read the data, since the rows are read sequentially from the cluster; whether that trade-off is worth it depends on your workload characteristics.
Just drop the existing clustered index, then create a new one on the ID column:
DROP INDEX [Created] ON dbo.Realty;
GO
CREATE UNIQUE CLUSTERED INDEX CX_Realty ON dbo.Realty (ID);
GO
Of course, you'll want to do this during a maintenance window so you don't cause too much blocking. If you have Enterprise Edition you can do the CREATE INDEX operation online by adding WITH (ONLINE=ON) to the statement.
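A minimal sketch of the same CREATE INDEX statement with the online option added (Enterprise Edition only):
CREATE UNIQUE CLUSTERED INDEX CX_Realty ON dbo.Realty (ID) WITH (ONLINE = ON);
GO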
If your table is getting up to 1 TB in size and probably has LOTS of rows in it, I would strongly recommend NOT making the clustered index that much fatter!
First of all, dropping and recreating the clustered index will shuffle around ALL your data at least once - that alone will take ages.
Secondly, the big compound clustered index you're trying to create will significantly increase the size of all your non-clustered indices (since they contain the whole clustered index value on each leaf node, for the bookmark lookups).
When architecting your underlying table structure, I'd steer clear of using GUIDs. Consider INT or BIGINT columns based upon your expected scale (if you know for certain your scale is small, you can also consider SMALLINT). GUIDs come with a great deal of overhead and a performance hit just to increase the pool of values available for uniquely identifying rows in a table. In my opinion, you'd be better off adding a secondary column to deal with scale issues rather than relying on GUIDs. The space cost and index fragmentation alone are reasons to keep plenty of distance between your tables and GUIDs.
https://www.clever-cloud.com/blog/engineering/2015/05/20/why-auto-increment-is-a-terrible-idea/
Primary keys are often used as the sequence in which the data is actually stored. If the primary key is incremented, the data is simply appended. If the primary key is random, that would mean that existing data must be moved about to get the new row into the proper sequence. A basic (non-primary-key) index is typically much lighter in content and can be moved around faster with less overhead.
Use the smallest integer data type for the AUTO_INCREMENT column that is large enough to hold the maximum sequence value you will need. When the column reaches the upper limit of the data type, the next attempt to generate a sequence number fails. Use the UNSIGNED attribute if possible to allow a greater range. For example, if you use TINYINT, the maximum permissible sequence number is 127. For TINYINT UNSIGNED, the maximum is 255.
To start with an AUTO_INCREMENT value other than 1, set that value with CREATE TABLE or ALTER TABLE, like this:
mysql> ALTER TABLE tbl AUTO_INCREMENT = 100;
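For example, a sketch of a counter column sized deliberately (hypothetical table):
-- MEDIUMINT UNSIGNED tops out at 16,777,215; pick the smallest type that
-- comfortably covers the rows you expect, and use UNSIGNED for the extra range.
CREATE TABLE clicks (
    id  MEDIUMINT UNSIGNED NOT NULL AUTO_INCREMENT,
    url VARCHAR(255) NOT NULL,
    PRIMARY KEY (id)
) ENGINE=InnoDB;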
Every InnoDB table has a special index called the clustered index where the data for the rows is stored. Typically, the clustered index is synonymous with the primary key.
- When you define a PRIMARY KEY on your table, InnoDB uses it as the clustered index. Define a primary key for each table that you create. If there is no logical unique and non-null column or set of columns, add a new auto-increment column, whose values are filled in automatically.
- If you do not define a PRIMARY KEY for your table, MySQL locates the first UNIQUE index where all the key columns are NOT NULL and InnoDB uses it as the clustered index.
- If the table has no PRIMARY KEY or suitable UNIQUE index, InnoDB internally generates a hidden clustered index on a synthetic column containing row ID values. The rows are ordered by the ID that InnoDB assigns to the rows in such a table. The row ID is a 6-byte field that increases monotonically as new rows are inserted. Thus, the rows ordered by the row ID are physically in insertion order.
Accessing a row through the clustered index is fast because the index search leads directly to the page with all the row data. If a table is large, the clustered index architecture often saves a disk I/O operation when compared to storage organizations that store row data using a different page from the index record.
All indexes other than the clustered index are known as secondary indexes. In InnoDB, each record in a secondary index contains the primary key columns for the row, as well as the columns specified for the secondary index. InnoDB uses this primary key value to search for the row in the clustered index.
If the primary key is long, the secondary indexes use more space, so it is advantageous to have a short primary key.
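A minimal sketch of this layout (hypothetical table): the clustered index is built on the short INT primary key, and every entry in the secondary index silently carries that key value, which InnoDB uses to look the full row up in the clustered index.
CREATE TABLE users (
    id    INT UNSIGNED NOT NULL AUTO_INCREMENT,   -- short clustering key
    email VARCHAR(255) NOT NULL,
    name  VARCHAR(100) NOT NULL,
    PRIMARY KEY (id),
    KEY idx_email (email)                         -- each idx_email entry also stores id
) ENGINE=InnoDB;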
There are three possible settings for the innodb_autoinc_lock_mode configuration parameter. The settings are 0, 1, or 2, for “traditional”, “consecutive”, or “interleaved” lock mode, respectively.
- The traditional lock mode provides the same behavior that existed before the innodb_autoinc_lock_mode configuration parameter was introduced in MySQL 5.1. The traditional lock mode option is provided for backward compatibility, performance testing, and working around issues with “mixed-mode inserts”, due to possible differences in semantics. In this lock mode, all “INSERT-like” statements obtain a special table-level AUTO-INC lock for inserts into tables with AUTO_INCREMENT columns. This lock is normally held to the end of the statement (not to the end of the transaction) to ensure that auto-increment values are assigned in a predictable and repeatable order for a given sequence of INSERT statements, and to ensure that auto-increment values assigned by any given statement are consecutive.
- Suppose that there are two transactions running, each inserting rows into a table with an AUTO_INCREMENT column. One transaction is using an INSERT ... SELECT statement that inserts 1000 rows, and another is using a simple INSERT statement that inserts one row:
Tx1: INSERT INTO t1 (c2) SELECT 1000 rows from another table ...
Tx2: INSERT INTO t1 (c2) VALUES ('xxx');
InnoDB cannot tell in advance how many rows are retrieved from the SELECT in the INSERT statement in Tx1, and it assigns the auto-increment values one at a time as the statement proceeds. With a table-level lock, held to the end of the statement, only one INSERT statement referring to table t1 can execute at a time, and the generation of auto-increment numbers by different statements is not interleaved. The auto-increment values generated by the Tx1 INSERT ... SELECT statement are consecutive, and the (single) auto-increment value used by the INSERT statement in Tx2 is either smaller or larger than all those used for Tx1, depending on which statement executes first.
As long as the SQL statements execute in the same order when replayed from the binary log (when using statement-based replication, or in recovery scenarios), the results are the same as they were when Tx1 and Tx2 first ran. Thus, table-level locks held until the end of a statement make INSERT statements using auto-increment safe for use with statement-based replication. However, those table-level locks limit concurrency and scalability when multiple transactions are executing insert statements at the same time.
In the preceding example, if there were no table-level lock, the value of the auto-increment column used for the INSERT in Tx2 depends on precisely when the statement executes. If the INSERT of Tx2 executes while the INSERT of Tx1 is running (rather than before it starts or after it completes), the specific auto-increment values assigned by the two INSERT statements are nondeterministic, and may vary from run to run.
Under the consecutive lock mode, InnoDB can avoid using table-level AUTO-INC locks for “simple insert” statements where the number of rows is known in advance, and still preserve deterministic execution and safety for statement-based replication.
If you are not using the binary log to replay SQL statements as part of recovery or replication, the interleaved lock mode can be used to eliminate all use of table-level AUTO-INC locks for even greater concurrency and performance, at the cost of permitting gaps in auto-increment numbers assigned by a statement and potentially having the numbers assigned by concurrently executing statements interleaved.
innodb_autoinc_lock_mode = 1 (“consecutive” lock mode)
This is the default lock mode. In this mode, “bulk inserts” use the special AUTO-INC table-level lock and hold it until the end of the statement. This applies to all INSERT ... SELECT, REPLACE ... SELECT, and LOAD DATA statements. Only one statement holding the AUTO-INC lock can execute at a time. If the source table of the bulk insert operation is different from the target table, the AUTO-INC lock on the target table is taken after a shared lock is taken on the first row selected from the source table. If the source and target of the bulk insert operation are the same table, the AUTO-INC lock is taken after shared locks are taken on all selected rows.
“Simple inserts” (for which the number of rows to be inserted is known in advance) avoid table-level AUTO-INC locks by obtaining the required number of auto-increment values under the control of a mutex (a light-weight lock) that is only held for the duration of the allocation process, not until the statement completes. No table-level AUTO-INC lock is used unless an AUTO-INC lock is held by another transaction. If another transaction holds an AUTO-INC lock, a “simple insert” waits for the AUTO-INC lock, as if it were a “bulk insert”.
This lock mode ensures that, in the presence of INSERT statements where the number of rows is not known in advance (and where auto-increment numbers are assigned as the statement progresses), all auto-increment values assigned by any “INSERT-like” statement are consecutive, and operations are safe for statement-based replication.
Simply put, this lock mode significantly improves scalability while being safe for use with statement-based replication. Further, as with “traditional” lock mode, auto-increment numbers assigned by any given statement are consecutive. There is no change in semantics compared to “traditional” mode for any statement that uses auto-increment, with one important exception.
innodb_autoinc_lock_mode = 2 (“interleaved” lock mode)
In this lock mode, no “INSERT-like” statements use the table-level AUTO-INC lock, and multiple statements can execute at the same time. This is the fastest and most scalable lock mode, but it is not safe when using statement-based replication or recovery scenarios when SQL statements are replayed from the binary log.
In this lock mode, auto-increment values are guaranteed to be unique and monotonically increasing across all concurrently executing “INSERT-like” statements. However, because multiple statements can be generating numbers at the same time (that is, allocation of numbers is interleaved across statements), the values generated for the rows inserted by any given statement may not be consecutive.
If the only statements executing are “simple inserts” where the number of rows to be inserted is known ahead of time, there are no gaps in the numbers generated for a single statement, except for “mixed-mode inserts”. However, when “bulk inserts” are executed, there may be gaps in the auto-increment values assigned by any given statement.
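As a sketch of how you would check and change the mode (the variable is not dynamic, so changing it means editing the server configuration and restarting):
-- 0 = traditional, 1 = consecutive (the default described above), 2 = interleaved
SELECT @@innodb_autoinc_lock_mode;

-- To change it, set it at server startup, e.g. in my.cnf:
--   [mysqld]
--   innodb_autoinc_lock_mode = 2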
How auto_increment is implemented in NDB
https://www.flamingspork.com/blog/2006/05/10/how-auto_increment-is-implemented-in-ndb/
https://blog.jcole.us/2013/05/02/how-does-innodb-behave-without-a-primary-key/
In The physical structure of InnoDB index pages I described how “Everything is an index in InnoDB”. This means that InnoDB must always have a “cluster key” for each table, which is normally the PRIMARY KEY. The manual has this to say in Clustered and Secondary Indexes:
If the table has no PRIMARY KEY or suitable UNIQUE index, InnoDB internally generates a hidden clustered index on a synthetic column containing row ID values. The rows are ordered by the ID that InnoDB assigns to the rows in such a table. The row ID is a 6-byte field that increases monotonically as new rows are inserted. Thus, the rows ordered by the row ID are physically in insertion order.
I had previously assumed this meant that an invisible column would be used along with the same sequence generation code that is used to implement auto_increment (which itself has some scalability issues). However the reality is that they are completely different implementations.
How this is actually implemented is, as the manual says, if a table is declared with no PRIMARY KEY and no non-nullable UNIQUE KEY, InnoDB will automatically add a 6-byte (48-bit) integer column called ROW_ID to the table, and cluster the data based on that column. The column won’t be accessible to any queries nor usable for anything internally such as row-based replication.
What the manual doesn’t mention is that all tables using such ROW_ID columns share the same global sequence counter (the manual says “increases monotonically” and doesn’t clarify), which is part of the data dictionary. The maximum used value for all row IDs (well, technically the next ID to be used) is stored in the system tablespace (e.g. ibdata1) in page 7 (type SYS), within the data dictionary header (field DICT_HDR_ROW_ID).
This global sequence counter is protected by dict_sys->mutex, even for incrementing (as opposed to using atomic increment). The implementation is in include/dict0boot.ic (many blank lines deleted):
UNIV_INLINE
row_id_t
dict_sys_get_new_row_id(void)
/*=========================*/
{
    row_id_t    id;

    mutex_enter(&(dict_sys->mutex));

    id = dict_sys->row_id;

    if (0 == (id % DICT_HDR_ROW_ID_WRITE_MARGIN)) {
        dict_hdr_flush_row_id();
    }

    dict_sys->row_id++;

    mutex_exit(&(dict_sys->mutex));

    return(id);
}
(You may also notice that this code lacks any protection for overflowing the 48 bits allotted to row IDs. That is unnecessarily sloppy coding, but even at a continuous 1 million inserts per second [which is probably a bit optimistic ;)] it would take about 9 years to exhaust the ID space. I guess that’s okay.)
Ensuring non-conflicting IDs are generated
The counter is flushed to disk every 256th ID generated (the define DICT_HDR_ROW_ID_WRITE_MARGIN above), by modifying the value in the SYS data dictionary page, which is logged to the transaction log. On startup, InnoDB will increase the DICT_HDR_ROW_ID stored on disk by at least 256, and at most 511. This ensures that any IDs generated will have been less than the new starting value, and thus there will not be any conflicts.
Performance and contention implications
Given how much other code within InnoDB is protected by dict_sys->mutex, I think it’s fair to say any tables with an implicit clustered key (ROW_ID) could expect to experience random insert stalls during operations like dropping (unrelated) tables. Parallel insertion into multiple tables with implicit keys could be performance-constrained, as it will be serialized on both the shared mutex and cache contention for the shared counter variable. Additionally, every 256th value generated will cause a log write (and flush) for the SYS page modification, regardless of whether the transaction has committed yet (or ever will).
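If you want to see which tables are relying on this hidden key, a sketch (assuming MySQL 5.7's information_schema table names; the implicit clustered index shows up as GEN_CLUST_INDEX):
-- Lists tables whose clustered index is the implicit ROW_ID-based one.
SELECT t.NAME AS table_name
FROM information_schema.INNODB_SYS_INDEXES i
JOIN information_schema.INNODB_SYS_TABLES  t ON t.TABLE_ID = i.TABLE_ID
WHERE i.NAME = 'GEN_CLUST_INDEX';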
- You need your table to be joinable on something
- If you want your table to be clustered, you need some kind of a primary key.
- If your table design does not need a primary key, rethink your design: most probably, you are missing something. Why keep identical records?
In MySQL, the InnoDB storage engine always creates a primary key if you didn't specify it explicitly, thus making an extra column you don't have access to.
Always best to have a primary key. This way it meets first normal form and allows you to continue along the database normalization path.
A table should have a primary key so that you could identify each row uniquely with it.
Technically, you can have tables without a primary key, but you'll be breaking good database design rules.