Getting a list in comma delimited format with T-SQL
Sometime we want to get data in a list. A good example is getting a list of orders that’s coming up for delivery and has other orders that are past due. We would like to know what those upcoming orders are so we can ensure that the past due accounts are settled prior to actually delivering the orders. In a typical query, we can get list of orders like the following:
But we might want to get the list in a comma separated string like «10-100, 10-101, 10-102». There is no native SQL function for that. MySQL conveniently offers GROUP_CONCAT() which works as an aggregate function. When using the Access backend, the answer is to use VBA function to string up the records as a string. An example of such VBA function can be done here. But when using SQL Server, it is possible to approximate the functionality of MySQL’s GROUP_CONCAT() function.
Getting it all in one string
Since 2005, SQL Server has offered functionalities to work with XML data as easily as with SQL. It is possible to build a XML document with a SQL query such as this:
SELECT FirstName, LastName
FOR XML PATH('Person');
Note the last clause, FOR XML PATH. Running this query would output a single XML document formatted to this:
That’s good and all but we’re more interested in getting a single string concatenated from a set of rows. Let’s add a comma and leave the root element blank.
SELECT ', ' + FirstName + ' ' + LastName
FOR XML PATH('');
The output then becomes:
, John Smith, Sandra Johnson, Tonya Young
Much closer! Now you can see how you can use XML PATH to easily convert a set of rows into a single concatenated string. Of course, we don’t want the leading ‘, ‘ so we need to clean that up. The trick is that we need to do the string formatting after the XML PATH query, not within. Otherwise, any formatting may end up being part of the XML output which would look funny. We can do that by putting the above expression in as a subquery.
SELECT STUFF((SELECT ', ' + FirstName + ' ' + LastName
FOR XML PATH('')), 2, 1, '');
When you consider that T-SQL does not have a Mid() function where you don’t have to specify a length to extract the rest of string and its REPLACE() function does not allow you to restrict time it may replace as VBA’s function let you, it makes sense to use STUFF() which basically delete a portion of string, then insert some other string. To demonstrate, consider this string:
Suppose I want to change ‘world’ to say ‘planet’, I would need to have two things; the position where the word ‘world’ starts, which is 8 and the length of the word, which is 5.
SELECT STUFF('Hello, world!', 5, 8, 'planet');
So going back to the query previously shown:
SELECT STUFF((<sub query>, 2, 1, '');
It should be apparent that we’re taking out the first 2 characters, which is the leading comma and space that the XML PATH added to the first record with a zero-length, effectively chopping off the leading comma and giving you a proper list.
Now, let’s put it to work. As mentioned at the start, we want to list all past due orders per company and only list companies that have upcoming orders so that they can ensure the past due balance will be settled before the new order go out.
DATEDIFF(Day, GETDATE(), DeliveryDate) AS DaysToShip,
SELECT ', ' + x.ordernum
FROM dbo.vw_OpenInvoices AS x
WHERE x.CompanyID = o.CompanyID
AND x.PastDue > 30
ORDER BY OrderNum
FOR XML PATH('')
),1,2,'') AS PastDueOrders,
FROM vw_OpenInvoices AS o
WHERE DeliveryDate < DATEADD(Day, 21, GETDATE())
AND DeliveryDate > GetDate()
AND EXISTS (
FROM dbo.vw_OpenInvoices AS d
WHERE d.CompanyID = o.CompanyID
AND d.PastDue > 30
Note that we use a correlated subquery. This is necessary because the FOR XML PATH effects the entire statement; you cannot apply FOR XML PATH to only one table or column. So we can’t just join all tables together in a single query. Furthermore, Since FOR XML PATH will output only one record per a query, and we want to list strings for each different company, we put in a filter referencing the outer query, «x.CompanyID = o.CompanyID». Using the EXISTS clause, we can restrict to only companies that are in arrears.
Bonus: VBA has STUFF(), almost!
For lark, if you want to use something similar in VBA, you might be surprised to find that it’s built-in to VBA. It’s called Mid statement. Now you might be thinking, «But I use Mid function all time and it doesn’t let me replace anything. It only returns a portion of string!» That’s true but that’s the Mid function, not Mid statement. Let’s see an example in VBA:
Dim strTest As String
strTest = "Hello, World!"
Mid(strTest, 8, 5) = "Planet"
The output is «Hello, Plane!» Note how Mid() has an assignment, which makes it a statement and not a function since we’re essentially assigning a string variable to the result of Mid() and therefore altering the original string expression. Unlike T-SQL’s STUFF(), though, you cannot put in bigger word and see it expand accordingly. That’s why we only got «Plane», not «Planet», since the latter is one letter longer than the original «World» that it replaced.
As you can see,T-SQL offer plenty of functions for you to make best use of query and do so without any coding. What other novel uses have you found?