(scottish) sql bob blog

Some thoughts of your typical data shepherd / data groomer / data dance teacher sort of person.


sql bob
Twitter handle - @SQL_Bob

Yes I could well be dyslexic, deal with it ;-)


Import sessionize.com JSON data into SQL Server - Part 2
In the first blog post the JSON string from the sessionize.com API was placed into the table dbo.JsonData.  The first check is to see whether the data is valid JSON, using the ISJSON() function.  Running the query shown in Figure 1 shows that the data contained in the table dbo.JsonData is valid JSON.

Figure 1 – The function ISJSON() returns 1 showing the data in the field [Data] is valid JSON
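
If you want to sanity check the string before it ever reaches SQL Server, the same idea can be sketched in Python.  This is only an analogue of ISJSON(), not the function itself: is_json is a name made up for this example, and it assumes (as I understand the SQL Server 2016 behaviour) that only JSON objects and arrays count as valid.

```python
import json

def is_json(text):
    """Return 1 if text parses as a JSON object or array, else 0 (hypothetical helper)."""
    try:
        parsed = json.loads(text)
    except (TypeError, ValueError):
        # json.JSONDecodeError is a subclass of ValueError; TypeError covers None.
        return 0
    # Assumption: a bare scalar such as 42 is not treated as valid, only objects and arrays.
    return 1 if isinstance(parsed, (dict, list)) else 0

print(is_json('{"sessions": []}'))
print(is_json('{"sessions": ['))
```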

Now the JSON data is in the SQL database, let's see about doing something useful with it.  In SQL Server 2016 a number of new functions were added that allow the querying and manipulation of JSON data.  Having done some research, I found this blog post - https://visakhm.blogspot.com/2016/07/whats-new-in-sql-2016-native-json_13.html.  Using code in this blog post I was able to extract the data from the JSON string supplied by the API from the sessionize.com website.
 

Before querying the data I need to explain one concept which is crucial for extracting data from structured JSON: the path.  In the example in Figure 2 below the path of the ‘title’ key value pair is as follows:

sessions.0.title, which holds the key value pair 'title: “DAX Gotchas”' (see Figure 2).

Figure 2 – JSON data showing the sessions node and the first speaker node. 
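
To make the idea of a path a bit more concrete, here is a small Python sketch that walks a dotted path through nested JSON.  The sample document is a heavily trimmed, made-up stand-in for the real sessionize.com payload (only the id and title values come from this post), and follow_path is a hypothetical helper.  In TSQL the same location would be written as '$.sessions[0].title'.

```python
import json

# Hypothetical, trimmed stand-in for the sessionize.com payload.
sample = json.loads('''
{
  "sessions": [
    {"id": "117469", "title": "DAX Gotchas", "speakers": ["speaker-a"]}
  ]
}
''')

def follow_path(document, path):
    """Walk a dotted path such as 'sessions.0.title' through nested dicts and lists."""
    node = document
    for step in path.split("."):
        # A step into a list is treated as an array index; otherwise it is an object key.
        node = node[int(step)] if isinstance(node, list) else node[step]
    return node

print(follow_path(sample, "sessions.0.title"))  # DAX Gotchas
```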

In the JSON object returned from the sessionize.com API there is one node for each session, numbered 0 through to 29.  Within each node there are a number of Key : Value pairs, e.g. 'id : “117469”'.  The path, nodes and arrays (e.g. []speakers and []categoryItems) are what the TSQL is going to extract values from. 

Enough with all that waffling about JSON objects, let's write some proper TSQL.  In the next example we are going to use a function called OPENJSON().  This is only available in SQL Server 2016 or later, and the database must be at compatibility level 130 or higher.  In this example we are going to provide OPENJSON() with two arguments.  The first is @AllJson, which contains the JSON object and must be of datatype NVARCHAR().  The second is the path; the way I think about the path is that it specifies the node or array that I want to return from @AllJson.  The other function that we will use is JSON_VALUE().  This function also accepts two parameters.  The first is an expression, which is a variable or field name containing JSON data.  The other one is the path; the way I think about the path is that it specifies which part of the JSON data I want to return, in this case a single value rather than a whole node or array (yes I more or less said that already, just wanted to see if you are paying attention ;->).   

That’s a lot of words so let's look at some TSQL in Figure 3 below 

Figure 3 – The JSON data from the sessions node returned as a result set in SSMS

When we look at Figure 3 we will notice that the first row of the data is the same as the data shown in Figure 2.  In essence the FROM OPENJSON(@AllJson, '$.sessions') is returning a dataset which consists of three fields, namely Key, Value, and Type.  There is one row for each of the 30 session nodes, and the field Value contains the JSON object for that session.  Next the JSON_VALUE() function takes that JSON and extracts the value for one Key:Value pair.  This is done by specifying the Key of the 'Key:Value pair'.  So in the case of title, '$.title' is supplied for the path parameter.  Since there is only one 'Key:Value' pair where the Key = title, its value is returned from the JSON_VALUE() function, and shown in the field ‘SessionTitle’. 
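
If it helps to see the Key, Value and Type idea outside of SQL Server, below is a rough Python analogue.  It is not how OPENJSON() or JSON_VALUE() are implemented; open_json, json_value and json_type are names invented for this sketch, the second session title is made up, and only single top-level keys are supported.  The type codes follow the OPENJSON() documentation (0 null, 1 string, 2 number, 3 true/false, 4 array, 5 object).

```python
import json

def json_type(value):
    """Map a parsed JSON value to the OPENJSON type code."""
    if value is None:
        return 0
    if isinstance(value, str):
        return 1
    if isinstance(value, bool):  # checked before int because bool is a subclass of int
        return 3
    if isinstance(value, (int, float)):
        return 2
    if isinstance(value, list):
        return 4
    return 5  # dict, i.e. a JSON object

def open_json(text, key):
    """Rough analogue of OPENJSON(@json, '$.<key>'): one (key, value, type) row per element."""
    node = json.loads(text)[key]
    items = enumerate(node) if isinstance(node, list) else node.items()
    rows = []
    for k, v in items:
        # Nested objects and arrays are handed back as JSON text; scalars are passed through.
        value = json.dumps(v) if isinstance(v, (dict, list)) else v
        rows.append((str(k), value, json_type(v)))
    return rows

def json_value(text, key):
    """Rough analogue of JSON_VALUE(value, '$.<key>') for one top-level key; None if missing."""
    return json.loads(text).get(key)

all_json = json.dumps({"sessions": [
    {"id": "117469", "title": "DAX Gotchas"},
    {"id": "117615", "title": "Hypothetical second session"},
]})

for key, value, type_code in open_json(all_json, "sessions"):
    print(key, type_code, json_value(value, "title"))
```

Each row that comes back holds one whole session as JSON text, which is why a second call is needed to pull a single value such as the title out of it.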

Looking at Figure 2, there is a Key:Value pair in the []speakers array which links back to the session.  So the value of sessions[0].id is “117469”, and the corresponding lookup value, speakers[7].sessions, is also “117469”.  The two values and their locations in the JSON object are shown in Figure 4 below.


Figure 4 – Showing the lookup values for both sessions to speakers and vice versa. 

So we know that we want to get access to the data in the []speakers array, as this contains the list of speakerIDs for each session.  How is this done?  Well, I found an answer in this blog post - https://visakhm.blogspot.com/2016/07/whats-new-in-sql-2016-native-json_13.html.  Below in Figure 5 is the TSQL and result set. 

Figure 5 – Updated query to return the speakerID from the []speakers array.

All we have done in the query shown in Figure 5 is to add a CROSS APPLY with a simple select statement.  Now the speaker ID is returned.  Note that if there is more than one speakerID, such as in the case of sessionID 117615 (which has two awesome speakers), the query returns two rows, with a different speakerID on each, which is just what we wanted. 
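
The effect of the CROSS APPLY can be pictured as a loop inside a loop.  The Python sketch below is only an analogy under made-up data: the speaker ids and the second session title are hypothetical, and session_speaker_rows is a name invented for the example.

```python
# Hypothetical sample: the session ids come from this post, everything else is made up.
sessions = [
    {"id": "117469", "title": "DAX Gotchas", "speakers": ["speaker-a"]},
    {"id": "117615", "title": "Hypothetical two-speaker session",
     "speakers": ["speaker-b", "speaker-c"]},
    {"id": "000000", "title": "Hypothetical session with no speaker yet", "speakers": []},
]

def session_speaker_rows(sessions):
    """Return one (session id, title, speaker id) row per session/speaker combination."""
    rows = []
    for session in sessions:                    # outer rows, like OPENJSON(@AllJson, '$.sessions')
        for speaker_id in session["speakers"]:  # applied rows, like OPENJSON(s.Value, '$.speakers')
            rows.append((session["id"], session["title"], speaker_id))
    return rows

for row in session_speaker_rows(sessions):
    print(row)
```

Notice that the session with an empty speakers list produces no rows at all.  CROSS APPLY behaves the same way; OUTER APPLY would be the one to reach for if those sessions had to be kept.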

Next let's have a look at returning data for the speakers node.  Below in Figure 6 is the TSQL to return some data from the []speakers array. 

Figure 6 – TSQL query to return data from the []speakers array

Looking at the query inside the CROSS APPLY 

SELECT Value  FROM OPENJSON(s.Value, '$.links') 
WHERE Value LIKE '%Twitter%' 

There are a couple of things that are worth looking at.  First, it is possible to use a WHERE clause on the columns returned by the OPENJSON() function.  Second, the reason for using the WHERE clause is that the links node can contain more than one type of link.  During development some of the speakers had a LinkedIn profile, which they then removed 🙁. 
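
Here is the same filtering idea as a small Python sketch.  The shape of the links entries (title, url, linkType) is my assumption about the sessionize.com output, the speaker is made up, and twitter_links is a hypothetical helper.  The comparison is lower-cased because LIKE is case-insensitive under the usual default SQL Server collation, which is also an assumption about your server.

```python
import json

# Hypothetical speaker; the field names in "links" are assumed, not taken from the post.
speaker = {
    "fullName": "Hypothetical Speaker",
    "links": [
        {"title": "Twitter", "url": "https://twitter.com/example", "linkType": "Twitter"},
        {"title": "Blog", "url": "https://example.com/blog", "linkType": "Blog"},
    ],
}

def twitter_links(speaker):
    """Keep only the link entries whose JSON text mentions Twitter, like WHERE Value LIKE '%Twitter%'."""
    matches = []
    for link in speaker.get("links", []):
        link_text = json.dumps(link)  # each array element is treated as JSON text, as in the Value column
        if "twitter" in link_text.lower():
            matches.append(link_text)
    return matches

print(twitter_links(speaker))
```

A speaker with no Twitter link gives an empty list here.  In the TSQL version that means the CROSS APPLY returns no row for that speaker, which is worth keeping in mind if a speaker seems to have gone missing from the result set.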

So by now I am sure you are saying “show me the money”.  After some work I created a query which extracts the session, speaker and room information, then returns it as a single result set as shown in Figure 7 below.

Figure 7 – Result set with Session, Speaker and room details 
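
The logic behind a combined result set like this can be sketched in Python as two lookups and a loop.  This is not the query from Figure 7: the field names roomId, fullName and name are assumptions about the sessionize.com output, the speaker and room are made up, and session_cards is a hypothetical helper.

```python
# Hypothetical, trimmed payload; only the session id and title come from this post.
data = {
    "sessions": [{"id": "117469", "title": "DAX Gotchas",
                  "speakers": ["speaker-a"], "roomId": 1}],
    "speakers": [{"id": "speaker-a", "fullName": "Hypothetical Speaker"}],
    "rooms": [{"id": 1, "name": "Hypothetical Room"}],
}

def session_cards(data):
    """Return one row per session/speaker pair with the speaker and room names looked up."""
    speakers_by_id = {s["id"]: s for s in data["speakers"]}
    rooms_by_id = {r["id"]: r for r in data["rooms"]}
    cards = []
    for session in data["sessions"]:
        room = rooms_by_id.get(session.get("roomId"), {})
        for speaker_id in session["speakers"]:
            speaker = speakers_by_id.get(speaker_id, {})
            cards.append({
                "SessionTitle": session["title"],
                "SpeakerName": speaker.get("fullName"),
                "RoomName": room.get("name"),
            })
    return cards

print(session_cards(data))
```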


If you want to have a try yourself and play with the code, then you will find the Python code in this Azure Data Studio Notebook, which is here.

If you have not run the Python code to import the data, then I have created an Azure Data Studio notebook containing the code to create the database and other tasks.  The notebook can be found here.

 

Last, but very much not least, why did I spend so much effort to get all the data out of the sessionize API?  The end goal was to supply the data to SQL Server Report Builder (download from here https://www.microsoft.com/en-us/download/details.aspx?id=53613).  This standalone tool will allow you to build an SSRS report.  Using this tool I created a report which, when run, outputs pages that look like the one shown in Figure 8 below. 

Figure 8 – Data finally published on the SSRS report


Import sessionize.com JSON data into SQL Server - Part 1
TL;DR: in the next few blog posts I will be showing how to 

  • Export JSON data from a sessionize.com API URL
  • Insert the JSON data into a SQL Server 2016 database
  • Generate a dataset from the imported JSON data
  • Display the data on an SSRS report

I love Azure Data Studio; with the addition of the notebook feature it's just soooo much more wonderful.  For me this feature is something that makes the product even more amazing.  My first introduction to notebooks was Jupyter notebooks, when I did some courses on Python.  The courses required you to submit coursework using Jupyter notebooks.  So when the Azure Data Studio team announced that they would support notebooks I was very excited.  Even better, the notebooks can support SQL, Python, and several other languages.  So this was a great opportunity for me to look at using Python and SQL. 

During one of the discussions about organising Data Scotland it was suggested that we create some cards with the session details.  Each card would include the speaker's photograph, session title, room name, and Twitter handle.  The required information is already stored on the sessionize.com website.

So what am I going to show in the following blog posts?  

  • Call the sessionize API and grab the data returned as a JSON string 
  • Place the data into SQL Server, using some TSQL code with a bit of Python magic
  • Query the JSON data in SQL Server using TSQL
  • Return a recordset from the JSON data that can be read by SSRS Report Builder 

So that’s what I am going to show, next let’s look at what is required to do it.

Ingredients

  • SQL Server 2016 database instance 
  • SQL Server 2016 database (compatibility level 130) running on SQL Server 2016 
  • Azure Data Studio with Python installed

First download and install Azure Data Studio; you can download the program from here. 

Once you have installed Azure Data Studio, open the application.  Find ‘File’ in the menu bar and click it, then select ‘New Notebook’ from the menu (see Figure 1 below).

Figure 1 – File menu showing where to find the ‘New Notebook’ menu item. 


This will open a new notebook (yippee!!).  This might not sound very exciting yet, however it is!  When a new notebook opens the Kernel must be set.  The way that I think about this is that it sets the language which will be run in the notebook, and it will default to SQL.  What we want to run is Python v3.  From the list of Kernels available select ‘Python 3’; this will set the language that will be run in the notebook. 

Figure 2 – selecting the Kernel (programming language) that will be run in the notebook.

Once ‘Python 3’ has been selected, if Python is not already set up and installed, Azure Data Studio will prompt you to set up and configure Python for Notebooks.  A screen will open as we can see in Figure 3.  For this blog post I accepted the default location and clicked on the ‘install’ button.

Figure 3 – Install and configure python for use in Azure Data Studio

If everything has gone to plan, then you should see something that looks like Figure 4.

Figure 4 – installation of Python going as planned

Installing Python can take some time, so it might be a good idea to get a hot beverage or do something else till it has finished installing.  

Figure 5 – Installation of python is now completed successfully

In sessionize.com it is possible to create different APIs to output data; in this example the data is outputted as JSON.  It is possible to select different parts of the data to be outputted; in this example ‘All Data’ is selected.  Selecting the data from sessionize.com is beyond the scope of this blog post, though it is very easy to do. 

The last step is to get the URL to be called in the code; this can be seen in Figure 6 below.

Figure 6 - API / Embed screen in sessionize.com for Data Scotland 2019.

Figure 6a (yes, I forgot to include this till a later edit) shows the columns that are outputted from sessionize.com for the API endpoint used.


Figure 6a - Settings for Available API endpoint used in this blog post.

OK, enough setting up, let's write some code.  To get access to other libraries in Python, the command that is used is import <library name>.  In this example four libraries are imported.  If you run the code shown in Figure 7 you might get the error message shown.  

Figure 7 – Error message if the package for the library being imported is not installed.
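
If you would rather find out which packages are missing before running the whole script, a check along these lines might help.  It is only a sketch: missing_packages is a name made up for this example, and apart from pyodbc the entries in the list are assumed examples, not necessarily the four libraries used in the notebook.

```python
import importlib.util

def missing_packages(names):
    """Return the top-level package names that the current Python cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]

# 'pyodbc' is the package named in this post; 'json' and 'urllib' are assumed examples.
required = ["json", "urllib", "pyodbc"]
print("Packages still to install:", missing_packages(required))
```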

If you do see this error message then all you need to do is install the required package.  In Figure 7, at the top left-hand side, there is a button titled ‘Install Packages’.  Click on that button and the terminal window will open (see Figure 8).  The command that installs the library ‘pyodbc’ is ‘.\python.exe -m pip install pyodbc’; type the command into the terminal window and press enter.

Figure 8 – Entering the command to install the ‘pyodbc’ package in the terminal window.

Hopefully the ‘pyodbc’ package will install without any challenges.  If, like me, you are not so lucky and you get the error message shown in Figure 9, then this is quite easy to fix.

Figure 9 – Error message stating that PIP (Pip Installs Packages) needs to be upgraded to install the ‘pyodbc’ package

If you get the error message shown in Figure 9 then enter the following command at the prompt: ‘.\python.exe -m pip install --upgrade pip’.  If everything goes well you will see a message like the one shown in Figure 10.

Figure 10 – Successfully upgraded PIP to v 18.

Once the new version of PIP has been installed, restart Azure Data Studio.  Then open a notebook, select Python 3 as the kernel language, click on ‘Install Packages’ and install the ‘pyodbc’ library (see Figure 8).  Once ‘pyodbc’ has been installed, it is time to run the Python script.

The Python script will do the following:

1 - call the API and get the JSON string returned; this is loaded into a dict object, which is then cast to a string object

2 - open a connection to a SQL database and run a SQL script to create the table if it does not exist

3 - insert the JSON string into a field in the table
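
The three steps above can be sketched roughly as follows.  This is not the script from the notebook: to keep the example runnable anywhere it uses sqlite3 from the standard library as a stand-in for pyodbc and SQL Server, the function names are made up, and the table is a simplified version of dbo.JsonData with its [Data] field.  No real API URL is called here, since fetch_json_text is only defined, not run.

```python
import json
import sqlite3
import urllib.request

def fetch_json_text(url):
    """Step 1: call the API, load the response into a dict, then turn it back into a string."""
    with urllib.request.urlopen(url) as response:
        payload = json.loads(response.read().decode("utf-8"))
    # json.dumps (rather than str) keeps the double quotes, so the text is still valid JSON.
    return json.dumps(payload)

def store_json_text(connection, json_text):
    """Steps 2 and 3: create the table if it does not exist, then insert the JSON string."""
    cursor = connection.cursor()
    # SQLite syntax; on SQL Server the "create if missing" check is written differently.
    cursor.execute("CREATE TABLE IF NOT EXISTS JsonData (Data TEXT)")
    cursor.execute("INSERT INTO JsonData (Data) VALUES (?)", (json_text,))
    connection.commit()

# Stand-in connection and a made-up payload instead of a live sessionize.com call.
connection = sqlite3.connect(":memory:")
sample_text = json.dumps({"sessions": [{"id": "117469", "title": "DAX Gotchas"}]})
store_json_text(connection, sample_text)

stored = connection.execute("SELECT Data FROM JsonData").fetchone()[0]
print(json.loads(stored)["sessions"][0]["title"])  # DAX Gotchas
```

The point about json.dumps is the one to watch: casting a dict with str() produces single-quoted text, which would not pass the ISJSON() check once it is in the table.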

Below is the Python script that is used.  Much of the credit must go to the various websites which I have added references to in the script.  In Figure 11 we can see the script that is used.  All that you need to change is the URL for the sessionize.com API and the user credentials in the connection string.  Otherwise this is the script that I used.

Figure 11 - Python script in Azure Data Studio Notebook to import Json in SQL server 2016

The Azure Data Studio Notebook that is shown in Figure 11 can be downloaded from here.

In the next blog post we will look at how to work with the JSON data in SQL Server.


SQL and Json Part 2
In an earlier post I looked at FOR JSON PATH, which outputs a JSON string from SQL Server.  Now I am going to look at FOR JSON AUTO and some other options to see how these commands differ.  

As before let's start off with a select statement.   The query shown in Figure 1 is from AdventureWorks2012 sample database. 

Figure 1 – Simple select statement for the SalesOrderNumber SO463659

Next let's see how the output looks when FOR JSON PATH is added to the select statement (see Figure 2). 

Figure 2 – SELECT statement from Figure 1 showing the output when FOR JSON PATH is used 

Let's make one change to the SELECT statement in Figure 2, instead of using FOR JSON PATH use FOR JSON AUTO. 

Figure 3 – Output from Select statement with FOR JSON AUTO 

Looking at Figure 3, the JSON string outputted is different from the one shown in Figure 2.  With the select statement in Figure 3 all the records are from a single SalesOrderNumber and OrderDate.  Hence all the records from Sales.SalesOrderDetail (alias “D”) are in a child node underneath the parent record from Sales.SalesOrderHeader. 
 
Let's try a different SELECT query; this time select 3 records with different SalesOrderNumbers (see Figure 4). 

Figure 4 – results set for the three SalesOrderNumbers 

Now let's try the same query with ‘FOR JSON AUTO’; the query and output are shown in Figure 5. 

Figure 5 – results set for the three SalesOrderNumbers outputted as a JSON string 

Looking at Figure 5, each individual SalesOrderNumber is placed in its own node.  The corresponding SalesOrderDetail values are placed in a child node underneath the parent SalesOrderNumber. 

With FOR JSON AUTO the format of the JSON string is determined automatically by the structure of the SELECT statement, i.e. the order of the columns and the tables they come from.  With FOR JSON PATH, which we demonstrated in the previous blog post, the shape of the JSON output is controlled explicitly through the column names and aliases used in the SELECT statement.
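
To picture the difference in shape without a copy of AdventureWorks to hand, here is a small Python sketch.  It does not call SQL Server: the rows are made-up stand-ins for the joined header and detail records, flat_shape and nested_shape are hypothetical helpers, and the child array is named “D” after the table alias used above.

```python
import json
from itertools import groupby

# Hypothetical rows standing in for the joined SalesOrderHeader / SalesOrderDetail result set.
rows = [
    {"SalesOrderNumber": "SO-1", "OrderDate": "2011-05-31", "UnitPrice": 10.0, "OrderQty": 1},
    {"SalesOrderNumber": "SO-1", "OrderDate": "2011-05-31", "UnitPrice": 20.0, "OrderQty": 2},
    {"SalesOrderNumber": "SO-2", "OrderDate": "2011-06-01", "UnitPrice": 5.0, "OrderQty": 3},
]
PARENT_COLUMNS = ("SalesOrderNumber", "OrderDate")

def flat_shape(rows):
    """One JSON object per row, roughly what FOR JSON PATH gives when no dotted aliases are used."""
    return [dict(row) for row in rows]

def nested_shape(rows, child_alias="D"):
    """Parent columns once per order, detail columns in a child array, roughly like FOR JSON AUTO."""
    result = []

    def parent_key(row):
        return tuple(row[column] for column in PARENT_COLUMNS)

    # groupby only groups neighbouring rows, so the rows are assumed to arrive ordered by parent.
    for key, group in groupby(rows, key=parent_key):
        parent = dict(zip(PARENT_COLUMNS, key))
        parent[child_alias] = [
            {column: value for column, value in row.items() if column not in PARENT_COLUMNS}
            for row in group
        ]
        result.append(parent)
    return result

print(json.dumps(flat_shape(rows), indent=2))
print(json.dumps(nested_shape(rows), indent=2))
```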