try-catch-FAIL

Failure is inevitable

NAVIGATION - SEARCH

Working with Large JSON Blobs Sanely Using SQL Server

Have you ever wished that you could query a large JSON document? I wished that very thing recently while working with a 3rd party API. One particular API call was returning mountains of data, and I needed to get a list of distinct values of a particular property buried within the objects of this JSON response. I started trying to write a regex expression to do it, but quickly decided that one problem was enough.

My salvation came from an unexpected source: Microsoft SQL Server.

I was working on a data synchronization project, and one of my tasks was to produce a list of possible values for fields from one of the systems. I could get the data in question in JSON format via their API, but the result was something like this:

{
    "Id": 1111,
    "Type": "Human",
    "Fields": [
        {
            "Id": 1111,
            "Name": "TitleId",
            "Type": "Select",
            "Options": [
                {
                    "Id": 1,
                    "Name": "Mr. & Mrs."
                },
                {
                    "Id": 2,
                    "Name": "Other"
                },
                {
                    "Id": 3,
                    "Name": "Rev."
                },
                {
                    "Id": 4,
                    "Name": "Prof."
                },
                {
                    "Id": 5,
                    "Name": "Dr."
                },
                {
                    "Id": 6,
                    "Name": "Ms."
                },
                {
                    "Id": 7,
                    "Name": "Mrs."
                },
                {
                    "Id": 8,
                    "Name": "Mr."
                }
            ]
        },
        {
            "Id": 1112,
            "Name": "FirstName",
            "Type": "Text Input",
            "Required": true
        },
        {
            "Id": 1113,
            "Name": "MiddleName",
            "Type": "Text Input",
            "Required": false
        },
        {
            "Id": 1114,
            "Name": "LastName",
            "Type": "Text Input",
            "Required": true
        },
        {
            "Id": 1115,
            "Name": "Phone1",
            "Type": "Text Input",
            "Required": false
        },
        {
            "Id": 1116,
            "Name": "Phone2",
            "Type": "Text Input",
            "Required": false
        },
        {
            "Id": 1117,
            "Name": "IsMailingList",
            "Type": "Select Basic",
            "Required": false,
            "Options": [
                {
                    "Id": 1,
                    "Name": "Yes"
                },
                {
                    "Id": 2,
                    "Name": "No"
                }
            ]
        },
        {
            "Id": 1118,
            "Name": "BirthDate",
            "Type": "Date Input",
            "Required": false
        },
        {
            "Id": 1119,
            "Name": "Gender",
            "Type": "Select Basic",
            "Required": false,
            "Options": [
                {
                    "Id": 1,
                    "Name": "Male"
                },
                {
                    "Id": 2,
                    "Name": "Female"
                }
            ]
        },
        {
            "Id": 1120,
            "Name": "Veteran",
            "Type": "Select Basic",
            "Required": false,
            "Options": [
                {
                    "Id": 1,
                    "Name": "Yes"
                },
                {
                    "Id": 2,
                    "Name": "No"
                }
            ]
        },
        {
            "Id": 1121,
            "Name": "Needs",
            "Type": "Multiple Checkboxes",
            "Required": false,
            "Options": [
                {
                    "Id": 1,
                    "Name": "Food"
                },
                {
                    "Id": 2,
                    "Name": "Water"
                },
                {
                    "Id": 3,
                    "Name": "Air"
                },
                {
                    "Id": 4,
                    "Name": "Energy"
                }
            ]
        },
        {
            "Id": 1122,
            "Name": "Bed Preference",
            "Type": "Select Basic",
            "Required": false,
            "Options": [
                {
                    "Id": 1,
                    "Name": "Single"
                },
                {
                    "Id": 2,
                    "Name": "Double"
                },
                {
                    "Id": 3,
                    "Name": "Queen"
                },
                {
                    "Id": 4,
                    "Name": "King"
                },
                {
                    "Id": 5,
                    "Name": "Double Queen"
                },
                {
                    "Id": 6,
                    "Name": "Double King"
                }
            ]
        },
        {
            "Id": 1123,
            "Name": "CarPreferences",
            "Type": "Select Basic",
            "Required": false,
            "Options": [
                {
                    "Id": 1,
                    "Name": "Sedan"
                },
                {
                    "Id": 2,
                    "Name": "Coupe"
                },
                {
                    "Id": 3,
                    "Name": "Truck"
                },
                {
                    "Id": 4,
                    "Name": "Van"
                },
                {
                    "Id": 5,
                    "Name": "Hoverboard"
                },
                {
                    "Id": 6,
                    "Name": "Other"
                }
            ]
        }
    ]
}

That really doesn't do the actual data justice (the actual data set was about 100 times that size!), but it gives you an idea of the basic model:

public class Field {
    id: int,
    name: string,
    type: string,
    options: Option[]
}

public class Option {
    id: int,
    name: string
}

My task was to produce a spreadsheet that listed each field and it's valid options, if any.

It was late, and I was feeling especially lazy, so the brute-force method was not appealing at all.

Fortunately for lazy-me, I remembered that SQL Server can work with JSON data, and I just so happened to have SQL Server Management Studio open on one of my monitors already!

The first thing I did was get the fields portion of my JSON data into a VARCHAR(MAX) variable in a new query window...

DECLARE @Data VARCHAR(MAX) =
'
[
    {
        "Id": 1111,
        "Name": "TitleId",
        "Type": "Select",
        "Options": [
            {
                "Id": 1,
                "Name": "Mr. & Mrs."
            },
            {
                "Id": 2,
                "Name": "Other"
            },
            {
                "Id": 3,
                "Name": "Rev."
            },
            {
                "Id": 4,
                "Name": "Prof."
            },
            {
                "Id": 5,
                "Name": "Dr."
            },
            {
                "Id": 6,
                "Name": "Ms."
            },
            {
                "Id": 7,
                "Name": "Mrs."
            },
            {
                "Id": 8,
                "Name": "Mr."
            }
        ]
    },
    {
        "Id": 1112,
        "Name": "FirstName",
        "Type": "Text Input",
        "Required": true
    },
    {
        "Id": 1113,
        "Name": "MiddleName",
        "Type": "Text Input",
        "Required": false
    },
    {
        "Id": 1114,
        "Name": "LastName",
        "Type": "Text Input",
        "Required": true
    },
    {
        "Id": 1115,
        "Name": "Phone1",
        "Type": "Text Input",
        "Required": false
    },
    {
        "Id": 1116,
        "Name": "Phone2",
        "Type": "Text Input",
        "Required": false
    },
    {
        "Id": 1117,
        "Name": "IsMailingList",
        "Type": "Select Basic",
        "Required": false,
        "Options": [
            {
                "Id": 1,
                "Name": "Yes"
            },
            {
                "Id": 2,
                "Name": "No"
            }
        ]
    },
    {
        "Id": 1118,
        "Name": "BirthDate",
        "Type": "Date Input",
        "Required": false
    },
    {
        "Id": 1119,
        "Name": "Gender",
        "Type": "Select Basic",
        "Required": false,
        "Options": [
            {
                "Id": 1,
                "Name": "Male"
            },
            {
                "Id": 2,
                "Name": "Female"
            }
        ]
    },
    {
        "Id": 1120,
        "Name": "Veteran",
        "Type": "Select Basic",
        "Required": false,
        "Options": [
            {
                "Id": 1,
                "Name": "Yes"
            },
            {
                "Id": 2,
                "Name": "No"
            }
        ]
    },
    {
        "Id": 1121,
        "Name": "Needs",
        "Type": "Multiple Checkboxes",
        "Required": false,
        "Options": [
            {
                "Id": 1,
                "Name": "Food"
            },
            {
                "Id": 2,
                "Name": "Water"
            },
            {
                "Id": 3,
                "Name": "Air"
            },
            {
                "Id": 4,
                "Name": "Energy"
            }
        ]
    },
    {
        "Id": 1122,
        "Name": "Bed Preference",
        "Type": "Select Basic",
        "Required": false,
        "Options": [
            {
                "Id": 1,
                "Name": "Single"
            },
            {
                "Id": 2,
                "Name": "Double"
            },
            {
                "Id": 3,
                "Name": "Queen"
            },
            {
                "Id": 4,
                "Name": "King"
            },
            {
                "Id": 5,
                "Name": "Double Queen"
            },
            {
                "Id": 6,
                "Name": "Double King"
            }
        ]
    },
    {
        "Id": 1123,
        "Name": "CarPreferences",
        "Type": "Select Basic",
        "Required": false,
        "Options": [
            {
                "Id": 1,
                "Name": "Sedan"
            },
            {
                "Id": 2,
                "Name": "Coupe"
            },
            {
                "Id": 3,
                "Name": "Truck"
            },
            {
                "Id": 4,
                "Name": "Van"
            },
            {
                "Id": 5,
                "Name": "Hoverboard"
            },
            {
                "Id": 6,
                "Name": "Other"
            }
        ]
    }
]
'

I then used the OPENJSON function to convert my data into a queryable JSON rowset:

SELECT	
	*
FROM
	OPENJSON(@Data)

This gave me one row per field, which was a start, but it wasn't very useful yet. Here's what that looked like:


What I needed to do was parse the JSON up so that I could effectively query it.

OPENJSON allows you to describe the shape of the JSON data, and using that, I was able to convert it to something that was closer to what I wanted.

With my updated OPENJSON query...

SELECT	
	*
FROM
	OPENJSON(@Data)
		WITH(
			[id] INT 'strict $.Id',
			[Name] VARCHAR(MAX) '$.Name',
			[Options] NVARCHAR(MAX) '$.Options' AS JSON
		)

I was able to actually get some structured data out of it and into a rowset!


I now had the names of each field, but the options themselves were still trapped in a JSON blob. I needed to go further.

First, I parsed the fields themselves into a temp table...

SELECT	
	id ,
    Name ,
    Options
INTO #FieldsWithOptionsBlob
FROM
	OPENJSON(@Data)
		WITH(
			[id] INT 'strict $.Id',
			[Name] VARCHAR(MAX) '$.Name',
			[Options] NVARCHAR(MAX) '$.Options' AS JSON
		)

I then parsed apart the options and loaded them up into a temp table. This proved a bit tricky. I had to use the CROSS APPLY operator along with the OPENJSON function:

SELECT
	f.Id,
	opt.Name
INTO #OptionValues
FROM
	#FieldsWithOptionsBlob f
	CROSS APPLY OPENJSON(f.Options, '$') 
	WITH (
		[id] INT 'strict $.Id',
		[Name] VARCHAR(MAX) '$.Name'
	)
	AS opt

Sidenote: CROSS APPLY is neat. Read more about it at MSSQLTips.

This query gave me the following results:


I now had a row for each option and corresponding field ID. All that was left was to put it all back together!

For that, I followed this approach from Codepal and used a CTE, combined with the good ole FOR XML trick for making CSV columns:

;WITH cte AS
    ( 
    SELECT
		f.Id,
        FieldName=f.Name,
		OptionName=o.Name
    FROM
        #FieldsWithOptionsBlob f
		LEFT JOIN #OptionValues o
			ON o.Id = f.Id
    )
SELECT
	f.Id,
	f.Name,
	STUFF(o.Options, 1, 1, '')
FROM
	#FieldsWithOptionsBlob f
	CROSS APPLY
	(
		SELECT
			', ' + cte.OptionName
		FROM
			cte
		WHERE
			cte.Id = f.Id
		FOR XML PATH('')
	) o(Options)

And that gave me what I wanted:


I'll admit that it took me a bit to figure out the CROSS APPLY part, but stumbling through that sure beat manually trying to put this spreadsheet together! 😄

About Matt Honeycutt...

Matt Honeycutt is a software architect specializing in ASP.NET web applications, particularly ASP.NET MVC. He has over a decade of experience in building (and testing!) web applications. He’s an avid practitioner of Test-Driven Development, creating both the SpecsFor and SpecsFor.Mvc frameworks.

He's also an author for Pluralsight, where he publishes courses on everything from web applications to testing!

blog comments powered by Disqus