SQL FAQ: How can I select every row from a database table where a column value is not unique?
I’m working on an problem today where a Drupal article can have many revisions, and the way Drupal works is that a table named
node_revisions has a
nid field for “node id,” and a
vid field for “revision id.” If you have five revisions of an article (i.e., a blog post), there will be five records in this table, and each record will have the same
nid value and a unique
vid. If an article has no revisions, this table will have one entry with a unique
nid and unique
To find all of the articles that have more than one revision, I just need a SQL query to find all of the table entries where the
nid is not distinct. To find all of those articles I use this query:
SELECT * FROM node_revisions WHERE nid IN (SELECT nid FROM node_revisions GROUP BY nid HAVING COUNT(1) > 1) ORDER BY nid ASC
This query comes back as desired, showing all the rows where the
nid is not unique.
Be warned that this SQL query can take a long time to run. My database table has less than 10,000 rows, and this query takes about 25 seconds to return, and the computer I’m running it on has an SSD drive and it’s not slow. I don’t use SQL subqueries too often, so at the moment I don’t know how to optimize this query.
If you just want to test the query on your system to make sure it works, add a
LIMIT clause, like this:
SELECT nid, vid FROM node_revisions WHERE nid IN (SELECT nid FROM node_revisions GROUP BY nid HAVING COUNT(1) > 1) ORDER BY nid ASC LIMIT 20
On my computer this query returns almost immediately.