Thursday, May 22, 2008

[GENERAL] deadlock debug methodology question

Hi All,
 
I have a deadlock situation, two transactions waiting on each other to complete. Based on the details below, would anyone have recommendations for me, please?
 
Regards,
 
A.
 
 
I am using:
rpm -qa|grep postgres
compat-postgresql-libs-4-2PGDG.rhel5_x86_64
postgresql-server-8.2.6-1PGDG.rhel5
postgresql-8.2.6-1PGDG.rhel5
postgresql-devel-8.2.6-1PGDG.rhel5
postgresql-libs-8.2.6-1PGDG.rhel5
 
I set 'deadlock_timeout = 1h' in order to have time to inspect pg_locks.

The locks are:
 
db0=# select * from pg_locks where not granted;
   locktype    | database | relation | page | tuple | transactionid | classid | objid | objsubid | transaction | pid  |   mode    | granted
---------------+----------+----------+------+-------+---------------+---------+-------+----------+-------------+------+-----------+---------
 transactionid |          |          |      |       |          1407 |         |       |          |        1404 | 8303 | ShareLock | f
 transactionid |          |          |      |       |          1404 |         |       |          |        1407 | 8277 | ShareLock | f
(2 rows)
 
Each transaction seems to be waiting on a row-level lock the other has acquired. The tuples are:
 
db0=# select * from pg_locks where locktype='tuple';
 locktype | database | relation | page | tuple | transactionid | classid | objid | objsubid | transaction | pid  |     mode      | granted
----------+----------+----------+------+-------+---------------+---------+-------+----------+-------------+------+---------------+---------
 tuple    |    16384 |    16576 |   38 |     6 |               |         |       |          |        1407 | 8277 | ExclusiveLock | t
 tuple    |    16384 |    16576 |   38 |     5 |               |         |       |          |        1404 | 8303 | ShareLock     | t
(2 rows)

The corresponding rows are:
 
db0=# select id from tt where ctid = '(38,6)';
 id 
-----
 600
(1 row)
db0=# select id from tt where ctid = '(38,5)';
 id 
-----
 611
(1 row)

Note that the id column is defined as 'id serial primary key'.
 
The two queries in effect in each transaction are found using:
 
select current_query from pg_stat_activity where procpid = 8303;
select current_query from pg_stat_activity where procpid = 8277;
 
Careful inspection of these (unfortunately complex) queries seems to indicate row-level locks are acquired in consistent order, assuming that any command of the type
 
update tt where ....
 
will always lock rows in a consistent order (can someone confirm that it is necessarily the case).

Therefore, it is not clear to me how this deadlock situation arises.

Does anyone have a recommendation?
 

No comments: