Something that’s been an enormous pain for years is that MySQL does not have aggregate functions for first() and last(), like every other SQL-based language. Why they don’t have it, I have no idea. It’s been requested and ignored for over 15 years, and doing a search for this online reveals many people frustrated with the lack of this feature. You’ll find hundreds of “solutions”, most of which either don’t work at all or are so convoluted that they’d be nearly impossible to implement.
To make this as abstract and possible, let’s say we have a table that looks like so:
MariaDB [test_db]> select * from test_table; +----+----------+---------+ | id | ordering | groupid | +----+----------+---------+ | 1 | 4 | 1 | | 2 | 1 | 1 | | 3 | 2 | 1 | | 4 | 4 | 2 | | 5 | 6 | 2 | | 6 | 1 | 2 | | 7 | 3 | 2 | | 8 | 8 | 3 | | 9 | 1 | 3 | | 10 | 5 | 3 | +----+----------+---------+
And we need to find the id of the greatest ordering for each groupid.
If MySQL were sane, we’d be able to do this with a relatively simple query similar to this:
-- Reminder: This does not work in MySQL! The correct solution is later in this post. SELECT last(id) FROM test_table ORDER BY ordering GROUP BY groupid
But, no, that’s not possible.
It’s easy to figure out what the solution should be with this query:
MariaDB [test_db]> SELECT * from test_table order by groupid,ordering; +----+----------+---------+ | id | ordering | groupid | +----+----------+---------+ | 2 | 1 | 1 | | 3 | 2 | 1 | | 1 | 4 | 1 | | 6 | 1 | 2 | | 7 | 3 | 2 | | 4 | 4 | 2 | | 5 | 6 | 2 | | 9 | 1 | 3 | | 10 | 5 | 3 | | 8 | 8 | 3 | +----+----------+---------+ 10 rows in set (0.00 sec)
We can look at these results and say, “Hey, it’s obvious that the ids I need are 1, 5, and 8.” But that’s not going to do us much good if we’re doing a more complex query.
I’m pretty sure that at one point I had a moderately elegant solution to this using subqueries and LIMIT 1
, but I haven’t been able to figure out what that is.
But, considering how aggravating this has been, I thought I’d post my most recent solution so I’d have a reference and will, hopefully, never have to figure it out all over again.
My solution is a join with a subquery (which I’m not crazy about, but it’s better than most of the other really convoluted solutions that I’ve found online).
SELECT tbl1.id, tbl1.groupid FROM test_table as tbl1 INNER JOIN (SELECT groupid,max(ordering) as maxorder FROM test_table GROUP BY groupid) as tbl2 ON tbl1.groupid=tbl2.groupid AND tbl1.ordering=tbl2.maxorder ;
Which outputs:
+----+---------+ | id | groupid | +----+---------+ | 1 | 1 | | 5 | 2 | | 8 | 3 | +----+---------+
One thing that really stinks about this is that it’s probably not going to be equally useful in all situations, but this will hopefully be adaptable to different needs. Say you wanted to modify all records except the most recent ones, you could use the above as a subquery in a WHERE
clause.
I’m not the only person that uses this solution to this problem, but I sure wish there were a way to do this without a subqueries.
Update: It’s never quite as easy as you’d expect.
It turns out that to actually use the above query, you need to do yet another subquery. I actually don’t really understand why it’s necessary, but I was able (with some searching) to figure out how to do it.
Let’s say we want to actually do something with this query. Let’s add another column with tinyint(1) called “myvalue” and set them all to false.
+----+----------+---------+---------+ | id | ordering | groupid | myvalue | +----+----------+---------+---------+ | 1 | 4 | 1 | 0 | | 2 | 1 | 1 | 0 | | 3 | 2 | 1 | 0 | | 4 | 4 | 2 | 0 | | 5 | 6 | 2 | 0 | | 6 | 1 | 2 | 0 | | 7 | 3 | 2 | 0 | | 8 | 8 | 3 | 0 | | 9 | 1 | 3 | 0 | | 10 | 5 | 3 | 0 | +----+----------+---------+---------+
Suppose we want to set all but the most recent value to true for each group id.
So, this does not work:
-- Reminder: This does not work in MySQL! The correct solution is later in this post. UPDATE test_table SET myvalue = 1 WHERE id NOT IN ( SELECT id FROM test_table as tbl1 INNER JOIN (SELECT groupid,max(ordering) as maxorder FROM test_table GROUP BY groupid) as tbl2 ON tbl1.groupid=tbl2.groupid AND tbl1.ordering=tbl2.maxorder ) ;
This results in the following error in MariaDB:
Table 'test_table' is specified twice, both as a target for 'UPDATE' and as a separate source for data
I think I got a different error in the AWS database based off MySQL, but the result is the same– This does not work.
Instead, we have to use yet another subquery. So, this does work:
UPDATE test_table SET myvalue = 1 WHERE id NOT IN ( -- Starting the added subquery here! SELECT id FROM ( SELECT id FROM test_table AS tbl1 INNER JOIN (SELECT groupid,max(ordering) AS maxorder FROM test_table GROUP BY groupid) AS tbl2 ON tbl1.groupid=tbl2.groupid AND tbl1.ordering=tbl2.maxorder ) as selecttbl ) ;
When we run SELECT * FROM test_table ORDER BY groupid,ordering;
, we can see that it was successful.
+----+----------+---------+---------+ | id | ordering | groupid | myvalue | +----+----------+---------+---------+ | 2 | 1 | 1 | 1 | | 3 | 2 | 1 | 1 | | 1 | 4 | 1 | 0 | | 6 | 1 | 2 | 1 | | 7 | 3 | 2 | 1 | | 4 | 4 | 2 | 1 | | 5 | 6 | 2 | 0 | | 9 | 1 | 3 | 1 | | 10 | 5 | 3 | 1 | | 8 | 8 | 3 | 0 | +----+----------+---------+---------+
Yikes.