Selecting and Displaying ‘Nodes’ in SQL from Delimited Strings

SQL July 8, 2019

This is a bit of a niche use case, but honestly, those are usually the ones that are the most powerful to know. Keep in mind this is for use in SFMC SQL Query Activities and is not optimized for use outside of this platform.

Ok. So you have a specific naming convention for your Emails. This includes specific ‘nodes’ of information inside of it, separated by an ‘_’. (For example, MyBU_SendType_EmailName_Audience_ProductionState – ‘1234_COMM_Newsletter -August-Customers_All_PROD’).

Now for tracking/analytics purposes, you wish to use this naming convention to create segments inside of your dashboards. To this end, you need to capture the 3rd ‘node’ in this delimited string. How the heck do you do that when the values and char index are constantly changing from one email to the next?

CAST('<x>' + REPLACE(EmailName,'_','</x><x>') + '</x>' AS XML).value('/x[N]','nvarchar(max)')

That little beauty above will take the EmailName field that you are calling in and turn it into a rowset via AS XML – by adding text nodes around each section via the delimiter in the string. It will then (via value()) pull the Nth row from that rowset and return that value.

For example:

CAST('<x>' + REPLACE(b.EmailName,'_','</x><x>') + '</x>' AS XML).value('/x[3]','nvarchar(max)')

would output: Newsletter-August-Customers

Which is great and exactly what we need. Now the plot thickens though…

Inside of that Node, we need to grab the last node and remainder in 2 separate fields based on a second delimited (-). The kicker, the node number will not always be the same. Could be 8 ‘nodes’, could be 2 ‘nodes’ or somewhere in-between. This means that the above solution will not be viable to be used here.

This is where things get very verbose. First to get the string outside the last ‘node’, use the following:

SUBSTRING(
 CAST('<x>' + REPLACE(EmailName,'_','</x><x>') + '</x>' AS XML).value('/x[3]','nvarchar(max)'),
 1,
 LEN(CAST('<x>' + REPLACE(EmailName,'_','</x><x>') + '</x>' AS XML).value('/x[3]','nvarchar(max)')) - CHARINDEX('-', REVERSE(CAST('<x>' + REPLACE(EmailName,'_','</x><x>') + '</x>' AS XML).value('/x[3]','nvarchar(max)')))
)

To break this out. First is obviously a SUBSTRING to cut down the original output. Inside this substring, we have 3 parts: SUBSTRING(1 – String to cut, 2 – starting point, 3 – char length of substring)

1 – String to cut:

CAST('<x>' + REPLACE(EmailName,'_','</x><x>') + '</x>' AS XML).value('/x[3]','nvarchar(max)')

This is pretty simple as we can use the above solution with AS XML and value() to gather the string.

2 – Starting point :

Even easier, this can simply be ‘1’ as it will start from the first character.

3 – Char length of Substring:

LEN( CAST('<x>' + REPLACE(EmailName,'_','</x><x>') + '</x>' AS XML).value('/x[3]','nvarchar(max)')) /* Length of the String */

 -   /* Minus */

CHARINDEX('-', REVERSE(CAST('<x>' + REPLACE(EmailName,'_','</x><x>') + '</x>' AS XML).value('/x[3]','nvarchar(max)'))) /* Length of the last node */

Now this is the tricky part. To gather the length that the Substring should continue on for requires a bit of math. Now, this in itself isn’t all that difficult, but add in that its a rowset of a rowset and you have yourself a bit of complexity inside that simplicity.

Basically the above is taking the full length of the String (node 3) and subtracting the length of the last node of the String (node 3 – 3).

To gather this, we first use the LEN() function to get the strings length.

Next we use ‘-‘ to denote the subtraction

Then we use CharIndex() and REVERSE() to get the index of the beginning of the last node, which will actually be the full length of the node as (via REVERSE) the string is now completely backwards.

This should now Output: Newsletter-August

Next we need to get only the value in the last node inside of a node.

This is very similar to how we gathered the remainder nodes above. You mostly just reverse the starting and length positions:

SUBSTRING(
 CAST('<x>' + REPLACE(b.DataExtension,'_','</x><x>') + '</x>' AS XML).value('/x[3]','nvarchar(max)'),
 LEN(CAST('<x>' + REPLACE(b.DataExtension,'_','</x><x>') + '</x>' AS XML).value('/x[3]','nvarchar(max)')) - CHARINDEX('-', REVERSE(CAST('<x>' + REPLACE(b.DataExtension,'_','</x><x>') + '</x>' AS XML).value('/x[3]','nvarchar(max)'))) + 2,
 LEN(CAST('<x>' + REPLACE(b.DataExtension,'_','</x><x>') + '</x>' AS XML).value('/x[3]','nvarchar(max)'))
)

As you can see above, it is utilizing another Substring() to accomplish the task.

1 – String to cut:

CAST('<x>' + REPLACE(EmailName,'_','</x><x>') + '</x>' AS XML).value('/x[3]','nvarchar(max)')

This is pretty simple as we can use the above solution with AS XML and value() to gather the string.

2 – Starting point :

LEN(CAST('<x>' + REPLACE(b.DataExtension,'_','</x><x>') + '</x>' AS XML).value('/x[3]','nvarchar(max)'))
 - 
CHARINDEX('-', REVERSE(CAST('<x>' + REPLACE(b.DataExtension,'_','</x><x>') + '</x>' AS XML).value('/x[3]','nvarchar(max)')))
 + 
2

This is where most of the complexity comes from as we need to find the place that the final delimiter takes place. It is almost a carbon copy of what was used inside the ‘Char Length’ section above. The major change is the ‘+ 2’ added at the end. This is to remove the delimiter from the string that is output

3 – Char length of Substring:

LEN(CAST('<x>' + REPLACE(b.DataExtension,'_','</x><x>') + '</x>' AS XML).value('/x[3]','nvarchar(max)'))

Although this may look confusing, it is just taking the length of the string (node 3) and setting it to the character length.

This should now Output: Customers

This is not an everyday use case, but it does allow for some pretty powerful segmentation and reporting when used.

No Comments Gortonington