I've been dealing with MySQL Cluster in one way or another since around 2005 or so (back in the MySQL 4.1 days) but it is still full of "funny" surprises. This post is a collection of different locking related issue i ran into during the previous weeks that i had not been aware of up to now (or simply may have forgotten over time)
== Unique hash indexes lock exclusively ==
This is the one that regular users are most likely to run into: in general row logs in MySQL Cluster distinguish between reads and writes so that writers can block other writers, but not readers and readers from other transactions always see the last committed row value (Cluster currently only supports the READ COMMITTED isolation level). As soon as you have a secondary unique index in addition to a primary key things are different though. Internally a unique index that is not the primary key is implemented as a unique hash index in Cluster (and optionally also as an additional ordered T-Tree), and for unique hash entries there only seems to be exclusive locking so that writers will block readers across transactions.
== Long running transactions can block starting nodes ==
Less likely to happen, but very annoying if it happens and you don't know about it:
A starting node needs to lock all rows for a very short period of time near the end of start phase 5. At this point it needs to wait for all current transaction to free the locks they are holding (by either COMMIT or ROLLBACK). This is usually not a problem as cluster transaction are not supposed to last very long anyway, but if there happens to be a long running transaction it can potentially block a node start forever. A stopping node will wait for pending transactions for 5 seconds and will then terminate these transactions the hard way, a starting node on the other hand will gracefully wait forever.
This is bad in two ways:
1) it is obviously bad availability wise as you may end up with a non-redundant configuration for extended periods of time and without any way to automatically identify and terminate the offending transaction(s)
2) currently the blocked starting node does not even tell what it is waiting for, it just silently sits and waits. So unless you know what is going on you are faced with a node that is simply stuck without doing anything (CPU, disc and network load next to zero)
The related bug report is http://bugs.mysql.com/65037 , proposed solutions are meaningful log messages in the short term and killing active transactions after a grace period in the long run.
== Weird error message on lock timeout ==
This last item is related to the non-standard INSERT ... ON DUPLICATE KEY UPDATE construct. What can happen here is that the INSERT part fails with a lock wait timeout as the key is write locked by another transaction. The expected error message in this case would simply be "Lock wait timeout exceed" but what you'll actually get is "Transaction already aborted". What seems to happen here is that the ON DUPLICATE KEY UPDATE part is triggered by any error in the INSERT phase and not only by duplicate key violation. So the INSERT part is tried tried, fails with "Lock wait timeout" (which implicitly rolls back the current transaction), next the UPDATE part is tried, fails with "Transaction already aborted" and overwrites the previous error ...
After the fun with InnoDB and MyISAM and massive partitioning it's time to move on to some other storage engines. So for the next round of fun i chose ARCHIVE:
mysql> create table t1 (id int primary key auto_increment, val int)
engine=archive partition by hash(id) partitions 1023;
Query OK, 0 rows affected (0.26 sec)
mysql> insert into t1 values(NULL,1);
ERROR 1194 (HY000): Table 't1' is marked as crashed
and should be repaired
mysql> drop table t1;
ERROR 2013 (HY000): Lost connection to MySQL server