2006/11/30

INNER JOIN with a comma separated values CSV field

Scenario: Sometimes we use varchar fields to store little pieces of data that, extriclty talking, should be normalized and stored on another table that references the former main table. Think for instance in a list of items referenced by a bigger entity, using a 1..n relationship. In certain particular scenarios that list might be as short as 4 or 5 items, and sure less than 10. In that case, and if you plan that you will never need to do a join with that data and another table you are tempted to store de data as a list of items (varchar), and simplify things not adding another table to your already huge schema, despite the normalization rules. And time goes by and needs change and then you need to do that JOIN that you planned you will never need to do, and you have just a list of integers such:

K1  List
-----------------
1  '1, 5, 22, 31'
2  '4, 9, 48'
3  '5, 13, 22'

And you need to generate something like:

K1  Item
-----------------
1  1
1  5
1  22
1  31
2  4
2  9
2  48
3  5
3  13
3  22

Solution: I must confess that I was not able to find a good solution by myself and asked for help on microsoft.public.es.sqlserver. Miguel Egea showed me a solution based on Itzik Ben Gan's work that was worth making this blog entry. Pure Transact-SQL, and efficient:

IF NOT OBJECT_ID('tempdb..#t') IS NULL
DROP TABLE #t
GO

SELECT 1 K1, '1, 5, 22, 31' List INTO #t
UNION ALL
SELECT 2, '4, 9, 48'
UNION ALL
SELECT 3, '5, 13, 22'
GO

/* CommonTableExpression to retrieve the first 1000 numbers */
WITH cte AS
( SELECT 1 id
  UNION ALL
  SELECT id+1 from cte WHERE id<1000
)
  --SELECT *,
  --  SUBSTRING(List, id+1, PATINDEX('%,%', SUBSTRING(List, id+1, LEN(List)))-1)
  SELECT K1,
    CONVERT(int, SUBSTRING(List, id+1, PATINDEX('%,%', SUBSTRING(List, id+1, LEN(List)))-1)) Item
  FROM cte INNER JOIN (SELECT K1, ',' + List + ',' List FROM #t) t
    ON SUBSTRING(t.List, id, 1) = ',' AND LEN(List)>id
  ORDER BY K1, Item -- OPTIONAL
  OPTION(MAXRECURSION 0)

No comments: