Some thoughts of your typical data shepherd / data plumber / data dance teacher sort of person.
Yes, I could well be dyslexic, deal with it ;-)
My latest adventure was speaking at DataRelay in Leeds and Nottingham. As is the way with my submissions, the session selected was not the one that I expected to be chosen. It was titled “How to be awesome at finding your next job”.
Speaking is very much outside of my comfort zone and yet something that I really enjoy. Why do it then? Well, when I see someone in a session whose body language says they understand what I am trying to say, that feeling is just an absolute joy. As I have learned from others in the community, so I want to pass my knowledge on to others. That is one of my primary reasons for speaking. Somehow, somewhere, someone will be helped to do something. From the few brief conversations that I had during DataRelay, I might have achieved that goal. If you came to my session, thank you for coming along, hopefully you got something from it. As I said, if you want to chat about the session or ask questions then please do. When I do a session it’s because I am passionate about the topic and have so much to say. I can also listen if you need someone just to talk things over.
What did I get from DataRelay? More than I expected, though not in the ways I thought I might. It was possible for me to get to a few sessions. There was time to see how DataRelay does things as an event (very slick BTW). I got to know more about people in the community, made some new friends, and met some old ones. At least one session I attended used Slack / Microsoft Teams in a way I never dreamed of, yet the possibilities are amazing and mind blowing. Speaking to the sponsors was one of the goals I had. Thank you to the two gentlemen from Microsoft who took time to answer my questions. The time, effort and genuine enthusiasm is much appreciated. It’s nice to speak to a real person who is knowledgeable and passionate about their area of expertise. Over the two days I took quite a few notes which I am excited to share with my team back in Glasgow.
It will take me a little time to recover; the long train journey back from DataRelay will help. Like some of us in the community, it takes energy for me to be around lots of people. It's worth every single moment of the effort, especially when I see a room full of people to whom I can show just the one little nugget of information that makes a difference to something in their lives.
This post is being written as the train travels back to Sunny Glasgow. Tomorrow morning, I will be back in the office with my new company Eyecademy, who have supported and encouraged me to go and speak at DataRelay, which I appreciate more than anyone in Eyecademy might realise. Thank you for the encouragement and support.
What’s next? Next year I will be speaking at Scottish Summit on the 29th of February 2020 about Soft Skills for Success. Speaking in your home city is both exciting and challenging, so I will need to put some more work into the session to tailor it to the event and audience. There are a few other submissions to other conferences which I am waiting to hear about. Some other conferences have caught my eye, so I need to consider where and when I would like to speak. Hopefully once I know more, I’ll update the blog.
Figure 1 – The function ISJSON() returns 1 showing the data in the field [Data] is valid JSON
Now the JSON data is in the SQL database, let's see about doing something useful with it. In SQL Server 2016 there were a number of new functions added that allow the querying and manipulation of JSON data. Having done some research, I found this blog post - https://visakhm.blogspot.com/2016/07/whats-new-in-sql-2016-native-json_13.html. Using code in this blog post I was able to extract the data from the JSON string supplied by the API from the sessionize.com website.
Before querying the data I need to explain one concept which is crucial for extracting data from structured JSON. In the example in Figure 2 below, the path of the ‘title’ key/value pair is as follows:
sessions.0.title - this would have the key/value pair 'title: “DAX Gotchas”' - see Figure 2
Figure 2 – JSON data showing the sessions node and the first speaker node.
In the JSON object that was returned from the sessionize.com API there are a number of nodes, one for each session, numbered 0 through to 29. Within each node there are a number of Key:Value pairs, e.g. 'id: “117469”'. The paths, nodes and arrays (e.g. speakers and categoryItems) are what the TSQL is going to extract values from.
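As an aside, the same path idea can be sketched in plain Python. The snippet below uses a tiny hand-made stand-in for the sessionize.com response (not the real API output), just to show how sessions.0.title walks the structure:

```python
import json

# A tiny, hypothetical stand-in for the JSON returned by the sessionize.com API.
all_json = """
{
  "sessions": [
    {
      "id": "117469",
      "title": "DAX Gotchas",
      "speakers": ["a1b2c3"],
      "categoryItems": [12345]
    }
  ]
}
"""

data = json.loads(all_json)

# The path sessions.0.title walks: node "sessions" -> element 0 -> key "title".
title = data["sessions"][0]["title"]
print(title)  # DAX Gotchas
```

The dotted path and the chain of lookups are the same idea; TSQL just writes the path as a single string.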
Enough with all that waffling about JSON objects, let's write some proper TSQL. In the next example we are going to use a function called OPENJSON(), which is only available in SQL Server 2016 onwards. In this example we are going to provide OPENJSON() with two arguments: @AllJson, which contains the JSON object and must be of datatype NVARCHAR(), and the path. The way I think about the path is that it specifies the node or array that I want to return from @AllJson. The other function that we will use is JSON_VALUE(). This function also accepts two parameters: an expression, which is a variable or field name containing JSON data, and the path. The way I think about the path is that it specifies the node or array that I want to return from the JSON data (yes, I said that already, just wanted to see if you are paying attention ;->).
That’s a lot of words so let's look at some TSQL in Figure 3 below
Figure 3 – The JSON data from the sessions node returned as a result set in SSMS
When we look at Figure 3 we notice that the first row of the data is the same as the data shown in Figure 2. In essence, the FROM OPENJSON(@AllJson, ‘$.sessions’) is returning a dataset which consists of three fields, namely Key, Value, and Type. The Value field contains the JSON object for each of the 30 session nodes. Next, the JSON_VALUE() function takes that JSON and extracts the value for one key pair. This is done by specifying the Key of the Key:Value pair, so in the case of title the path ‘$.title’ is supplied for the path parameter. Since there is only one Key:Value pair where the Key = title, the value is returned from the JSON_VALUE() function, in the field ‘SessionTitle’.
Looking at Figure 2, there is a Key:Value pair in the speakers array which links a speaker back to a session. So where sessions.id.value is “117469”, the corresponding lookup value speakers.sessions.value is also “117469”. The two values and their locations in the JSON object are shown in Figure 4 below.
Figure 4 – Showing the lookup values for both sessions to speakers and vice versa.
So we know that we want to get access to the data in the speakers array as this contains the list of speakerID’s for each session. How is this done? Well I found an answer in this blog post - https://visakhm.blogspot.com/2016/07/whats-new-in-sql-2016-native-json_13.html. Below in Figure 5 is the TSQL and result set.
Figure 5 – Updated query to return the speakerID from the speakers array.
All we have done in the query shown in Figure 5 is add a CROSS APPLY with a simple SELECT statement. Now the speaker ID is returned. Note that if there is more than one speakerID, as in the case of sessionID 117615 (which has two awesome speakers), the query returns two rows, one for each speakerID, which is just what we wanted.
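For anyone more comfortable in Python, the effect of that CROSS APPLY can be sketched as a nested loop: one output row per session/speaker pair. The data here is a small hypothetical sample, not the real feed:

```python
# Hypothetical sample: one session with a single speaker, and one
# session with two speakers (like sessionID 117615 in the post).
sessions = [
    {"id": "117469", "title": "DAX Gotchas", "speakers": ["s1"]},
    {"id": "117615", "title": "Two Speaker Session", "speakers": ["s2", "s3"]},
]

rows = []
for session in sessions:
    # CROSS APPLY: expand the speakers array into one row per speaker.
    for speaker_id in session["speakers"]:
        rows.append((session["id"], session["title"], speaker_id))

for row in rows:
    print(row)
```

Two sessions in, three rows out: the session with two speakers appears twice, once per speakerID, exactly as the TSQL does.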
Next let's have a look at returning data from the speakers node. Below in Figure 6 is the TSQL to return some data from the speakers array.
Figure 6 – TSQL query to return data from the speakers array
Looking at the query inside the CROSS APPLY:
SELECT Value FROM OPENJSON(s.Value, '$.links')
WHERE Value LIKE '%Twitter%'
There are a couple of things that are worth looking at. First, it is possible to use a WHERE clause on the columns returned by the OPENJSON() function. The reason for using the WHERE clause is that the links node can contain more than one type of link. During development some of the speakers had a LinkedIn profile, which they then removed 🙁.
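The same filter reads naturally in Python terms too; this sketch uses made-up link data in the same shape as a speaker's links array:

```python
# Made-up links array, shaped like the one in each speaker node.
links = [
    {"title": "Twitter", "url": "https://twitter.com/example"},
    {"title": "Blog", "url": "https://example.com"},
]

# Equivalent of WHERE Value LIKE '%Twitter%': keep only the Twitter entry.
twitter_links = [link["url"] for link in links if "Twitter" in link["title"]]
print(twitter_links)
```

Whether a speaker has one link or five, only the Twitter one survives the filter, which is all the WHERE clause is doing.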
So by now I am sure you are saying “show me the money”. After some work I created a query which extracts the session, speaker and room information, then returns it as a single result set, as shown in Figure 7 below.
Figure 7 – Result set with Session, Speaker and room details
If you want to have a try yourself and play with the code then you will find:
The TSQL source code in this Azure Data Studio Notebook here
The Python code in this Azure Data Studio Notebook here
If you have not run the Python code to import the data, then I have created an Azure Data Studio notebook containing the code to create the database and other tasks. The notebook can be found here.
Last, but very much not least, why did I spend so much effort to get all the data out of the sessionize API? The end goal was to supply the data to SQL Server Report Builder (download from here https://www.microsoft.com/en-us/download/details.aspx?id=53613). This standalone tool will allow you to build an SSRS report. Using this tool I created a report which, when run, outputs pages that look like the one shown in Figure 8 below.
Figure 8- Data finally published on the SSRS report
So what am I going to show in the following blog posts?
So that’s what I am going to show; next let’s look at what is required to do it.
First, download and install Azure Data Studio; you can download the program from here.
Once you have installed Azure Data Studio, open the application. In the Azure Data Studio menu find ‘File’ and click it, then from the menu select ‘New Notebook’, see Figure 1 below.
This will open a new notebook (yippee!!). This might not sound very exciting yet, however it is! When a new notebook opens the Kernel must be set. The way that I think about this is that it sets the language which will be run in the notebook, and it will default to SQL. What we want to run is Python 3. From the list of Kernels available select ‘Python 3’; this will set the language that will be run in the notebook.
Figure 2 – selecting the Kernel (programming language) that will be run in the notebook.
Once ‘Python 3’ has been selected, if Python is not set up and installed then Azure Data Studio will prompt you to set up and configure Python for Notebooks. A screen will open, as we can see in Figure 3. For this blog post I accepted the default location and clicked on the ‘Install’ button.
Figure 3 – Install and configure python for use in Azure Data Studio
If everything has gone to plan, then you should see something that looks like Figure 4.
Figure 4 – installation of Python going as planned
Installing Python can take some time, so it might be a good idea to get a hot beverage or do something else till it is finished installing.
Figure 5 – Installation of python is now completed successfully
In sessionize.com it is possible to create different APIs to output data; in this example the data is output as JSON. It is possible to select different parts of the data to be output; in this example ‘All Data’ is selected. Setting up the API in sessionize.com is beyond the scope of this blog post, it is very easy to do though.
The last step is to get the URL to be called in the code; this can be seen in Figure 6 below.
Figure 6 - API / Embed screen in sessionize.com for Data Scotland 2019.
In Figure 6a (yes, I forgot to include this till a later edit) are the columns that are output from sessionize.com for the API endpoint used.
Figure 6a - Settings for the API endpoint used in this blog post.
Ok, enough setting up, let's write some code. To get access to other libraries in Python, the command that is used is import <library name>. In this example there are four libraries which are imported to be used. If you run the code shown in Figure 7 you might get the error message shown.
Figure 7 – Error message if the package for the library being imported is not installed.
If you do see this error message then all you need to do is install the required package. In Figure 7, at the top left-hand side, there is a button titled ‘Install Packages’. Click on that button and the terminal window will open (see Figure 8). The command that installs the ‘pyodbc’ library is ‘.\python.exe -m pip install pyodbc’; type the command into the terminal window and press enter.
Figure 8 – Entering the command to install the ‘pyodbc’ package in the terminal window.
Hopefully the ‘pyodbc’ package will install without any challenges. If, like me, you are not so lucky and you get the error message shown in Figure 9, then it is quite easy to fix.
Figure 9 – Error message stating PIP (Pip Installs Packages) requires to be upgraded to install ‘pyodbc’ package
If you get the error message shown in Figure 9 then enter the following command at the prompt: ‘.\python.exe -m pip install --upgrade pip’. If everything goes well you will see a message like the one shown in Figure 10.
Figure 10 – Successfully upgraded PIP to v 18.
Once the new version of PIP has been installed, restart Azure Data Studio. Then open a notebook, select Python 3 as the kernel language, click on ‘Install Packages’ and install the ‘pyodbc’ library (see Figure 8). Once ‘pyodbc’ has been installed, it is time to run the Python script.
The Python script will do the following:
1 - call the API and read the JSON string returned into a dict object, which is then cast to a string object
2 - open a connection to a SQL database and run a SQL script to create the table if it does not exist
3 - insert the JSON string into a field in the table
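Those three steps can be sketched as follows. To keep the sketch runnable anywhere, it hard-codes a tiny stand-in dict instead of calling the sessionize.com URL, and it uses sqlite3 purely as a stand-in for the real pyodbc connection to SQL Server; the table and field names here are illustrative, not the ones from my actual script:

```python
import json
import sqlite3

# Step 1 (stand-in): the real script calls the sessionize.com API and gets
# a dict back; here a tiny hypothetical dict is hard-coded instead.
api_response = {"sessions": [{"id": "117469", "title": "DAX Gotchas"}]}
json_string = json.dumps(api_response)  # cast the dict to a string

# Step 2: open a connection and create the table if it does not exist.
# The real script opens a pyodbc connection with a SQL Server connection
# string; sqlite3 is used here only so the sketch runs without a server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE IF NOT EXISTS JsonImport (Data TEXT)")

# Step 3: insert the JSON string into a field in the table.
conn.execute("INSERT INTO JsonImport (Data) VALUES (?)", (json_string,))
conn.commit()

# Round trip check: the stored string is still valid JSON.
stored = conn.execute("SELECT Data FROM JsonImport").fetchone()[0]
print(json.loads(stored)["sessions"][0]["title"])  # DAX Gotchas
```

Storing the whole response as one string, then letting TSQL pick it apart later, is the pattern the rest of this series relies on.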
Below is the Python script that is used. Much of the credit must go to the various websites which I have added references to in the script. In Figure 11 we can see the script that is used. All that is required to change is the URL for the sessionize.com API and the user credentials in the connection string. Otherwise this is the script that I used.
Figure 11 - Python script in Azure Data Studio Notebook to import Json in SQL server 2016
The Azure Data Studio Notebook that is shown in Figure 11 can be downloaded from here.
In the next blog post we will look at how to work with the JSON data in SQL Server.
Figure 2 – SELECT statement from Figure 1 showing the output when FOR JSON PATH is used
Let's make one change to the SELECT statement in Figure 2: instead of using FOR JSON PATH, use FOR JSON AUTO.
Figure 3 – Output from Select statement with FOR JSON AUTO
Looking at Figure 3, the JSON string output is different from the one shown in Figure 2. With the SELECT statement in Figure 3 all the records are for a single SalesOrderNumber and OrderDate. Hence all the records from Sales.SalesOrderDetail (alias “D”) are in a child node underneath the parent record from Sales.SalesOrderHeader.
Let's try a different SELECT query, this time selecting 3 records with different SalesOrderNumbers, see Figure 4.
Figure 4 – results set for the three SalesOrderNumbers
Now let's try the same query with ‘FOR JSON AUTO’; the query and output are shown in Figure 5.
Figure 5 – results set for the three SalesOrderNumbers outputted as a JSON string
Looking at Figure 5, each individual SalesOrderNumber is placed in its own node. The corresponding SalesOrderDetail values are placed in a child node underneath the parent SalesOrderNumber.
Using FOR JSON AUTO, the format of the JSON string is determined automatically by the structure of the SELECT statement, i.e. the tables and joins. Whereas with FOR JSON PATH, which we demonstrated in the previous blog post, the JSON string output is controlled by the fields and aliases presented in the SELECT statement.
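As a rough illustration of the two shapes, here is some hand-built sample data (not the actual AdventureWorks output): FOR JSON PATH gives a flat array of row objects, while FOR JSON AUTO nests the detail rows under their parent header:

```python
import json

# FOR JSON PATH style: one flat object per result-set row.
path_style = [
    {"SalesOrderNumber": "SO43659", "OrderQty": 1},
    {"SalesOrderNumber": "SO43659", "OrderQty": 3},
]

# FOR JSON AUTO style: the header appears once, with the detail
# rows grouped in a child node named after the alias ("D").
auto_style = [
    {
        "SalesOrderNumber": "SO43659",
        "D": [{"OrderQty": 1}, {"OrderQty": 3}],
    }
]

print(json.dumps(path_style))
print(json.dumps(auto_style))
```

Same data both times; the difference is purely whether the repetition of the header is flattened out or folded into a child array.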
Figure 1 – Three records returned from the temporary table.
There are three records returned; note that for the second record the ‘Colour’ field has a NULL value returned. In the next step I am going to add ‘FOR JSON PATH’ after the ‘FROM’ clause.
Figure 2 – Adding ‘FOR JSON PATH’ to the select statement; the result set is output as a JSON formatted string
When the SELECT statement with the ‘FOR JSON PATH’ command is run, the results are returned as a JSON formatted string. The query was run in Azure Data Studio, so to see the JSON string formatted all that was required was a single click with the left mouse button to open the results in a new window, as shown in Figure 2.
Note that for the second record, the ‘Colour’ field has no data returned, since the value for that record is NULL. Can we write a SELECT statement which includes ‘FOR JSON PATH’ and returns records with NULL values?
All we need to do is add ‘INCLUDE_NULL_VALUES’ to the query, see Figure 3.
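The default behaviour can be mimicked in Python with a made-up sample row: dropping keys whose value is None matches what FOR JSON PATH does by default, while keeping them is what INCLUDE_NULL_VALUES gives you:

```python
import json

# Made-up row: the second record from the example, with a NULL Colour.
row = {"Name": "Widget", "Colour": None}

# Default FOR JSON PATH behaviour: keys with NULL values are omitted.
default_style = {k: v for k, v in row.items() if v is not None}

# With INCLUDE_NULL_VALUES: NULLs come through as JSON null.
include_nulls = json.dumps(row)

print(json.dumps(default_style))
print(include_nulls)
```

So the option does not change the data, only whether a NULL field appears in the string at all.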
Figure 4 - Adding ‘WITHOUT_ARRAY_WRAPPER’ to the query and the square brackets are suppressed.
All the queries shown in the screenshots were run using Azure Data Studio which, if you click on the result set returned, opens it in a new window and formats the JSON.
That’s enough for just now there is more to come.
Method 1 – TSQL
SELECT ConnectionProperty('net_transport') AS 'net_transport'
, ConnectionProperty('protocol_type') AS 'protocol_type'
, ConnectionProperty('auth_scheme') AS 'auth_scheme'
, ConnectionProperty('local_net_address') AS 'local_net_address'
, ConnectionProperty('local_tcp_port') AS 'local_tcp_port'
, ConnectionProperty('client_net_address') AS 'client_net_address'
, ConnectionProperty('physical_net_transport') AS 'physical_net_transport'
Local_net_address was the field that I required.
This post was inspired by - https://www.sqlservercentral.com/forums/reply/1519373
Further information on ConnectionProperty can be found here.
Method 2 – DBA Tools
Test-DbaConnection <InstanceName> see -> https://docs.dbatools.io/#Test-DbaConnection
The output includes a property IPAddress which is what I wanted. If you have not looked at DBATools then you should, it is an amazing, open source project, and a genuine time saver.
Are there any other methods which I have not thought of? Please let me know.
The core application my current company has built allows people to apply for grants. We track the application through various stages; much like a loan application, there are many stages that the application might go through. The status of the application is not linear, so applications can go back and forth between the same statuses more than once. In Figure 1 below, an application can be given a status of ‘Under Review’ then get a status of ‘Submitted’ or ‘Pre-Submission’. So, an application might have the same status more than once with different dates.
Figure 1 – showing the workflow an application might take.
One question which our customers want answered is how long an application takes to move from one status to another. This question could be answered using the existing application database, but that method is not ideal as the queries are quite complex and slow. This is one of the reasons I decided to build a data warehouse as a proof of concept, to see if there was a quicker way to answer that question.
Calculating time difference
Given that an application can be allocated a status more than once during its lifecycle, this has an impact on the method used for calculating the time difference. The methodology used to get the time difference is defined below.
The fact table in the POC data warehouse stores the ApplicationID, the Status, and the datetime the application was allocated that status. If an application has been assigned the same status more than once, then there is a record for each time the application received that status. This has some advantages; the end user can easily understand the data as it is shown in the table. On the other hand, it presented some difficulties with calculating the time period between each status.
After various attempts the following calculated column was created, in this case calculating the days between a grant receiving Submitted status and Rejected status.
Figure 2 – First version of the calculated column DaysFromSubmittedToRejected
For both variables the logic used is very similar. For the variable FirstSubmittedDate the MIN() function is used to get the first date, and the FILTER statement only returns records with the same ApplicationID and Status = ‘Submitted’. Whereas for the variable LastRejectedDate the MAX() function is used to return the last date. Using variables means the logic can be re-used for other statuses, and the DATEDIFF() function is easier to read. When the results were displayed, they were not as expected.
Figure 3 – results from the first version of the DaysFromSubmittedToRejected function see Figure 2
The requirement was to return the number of days elapsed from Submitted to Rejected, with the value shown against the record for the last time the application received the Rejected status. So, another calculated column was created to flag whether the current record holds the LastRejectedDate, which is shown in Figure 4.
With this calculated column (LastRejectedDate) in place, the RETURN statement of the ‘DaysFromSubmittedToRejected’ column just requires a small change, which is as follows:
DaysFromSubmittedToRejected =
VAR FirstSubmittedDate =
    CALCULATE (
        MIN ( 'FactAppStatuses'[DateAddedAt] ),
        FILTER (
            ALL ( 'FactAppStatuses' ),
            'FactAppStatuses'[ApplicationID] = EARLIER ( 'FactAppStatuses'[ApplicationID] )
                && 'FactAppStatuses'[Status] = "Submitted"
        )
    )
VAR LastRejectedDate =
    CALCULATE (
        MAX ( 'FactAppStatuses'[DateAddedAt] ),
        FILTER (
            ALL ( 'FactAppStatuses' ),
            'FactAppStatuses'[ApplicationID] = EARLIER ( 'FactAppStatuses'[ApplicationID] )
                && 'FactAppStatuses'[Status] = "Rejected"
        )
    )
RETURN
    IF (
        'FactGrantStatuses'[LastRejectedDate] = "Y",
        DATEDIFF ( FirstSubmittedDate, LastRejectedDate, DAY ),
        0
    )
Figure 5 – Second version of the calculated column DaysFromSubmittedToRejected
When the calculated column is added to the PowerBI report shown in Figure 5 and the report is run, the results can be seen in Figure 6 below.
Figure 6 – results from the second version of the DaysFromSubmittedToRejected function (see Figure 5)
The added IF() statement uses the flag to check whether the current row contains the LastRejectedDate; otherwise it returns a zero. This returns the record set which is expected.
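For anyone who finds DAX hard to read, the same logic can be sketched in plain Python over a made-up fact table: take the first Submitted date and the last Rejected date per application, and report the day count only on the row that is the last Rejected record:

```python
from datetime import date

# Made-up fact table rows: (ApplicationID, Status, DateAddedAt).
fact = [
    (1, "Submitted", date(2019, 1, 1)),
    (1, "Under Review", date(2019, 1, 5)),
    (1, "Submitted", date(2019, 1, 10)),
    (1, "Rejected", date(2019, 1, 20)),
]

def days_submitted_to_rejected(rows, row):
    app_id, status, added = row
    same_app = [r for r in rows if r[0] == app_id]
    # FirstSubmittedDate: MIN(DateAddedAt) where Status = 'Submitted'.
    first_submitted = min(r[2] for r in same_app if r[1] == "Submitted")
    # LastRejectedDate: MAX(DateAddedAt) where Status = 'Rejected'.
    last_rejected = max(r[2] for r in same_app if r[1] == "Rejected")
    # Report the difference only on the last Rejected row, else zero.
    if status == "Rejected" and added == last_rejected:
        return (last_rejected - first_submitted).days
    return 0

results = [days_submitted_to_rejected(fact, r) for r in fact]
print(results)  # [0, 0, 0, 19]
```

Only the final Rejected row carries the elapsed-days value (first Submitted on the 1st, last Rejected on the 20th, so 19 days); every other row gets a zero, which mirrors the result set in Figure 6.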
The positive uses of this method are that the same calculation can be re-used for a variety of statuses only requiring a minor change to the formula. On the negative side the more calculated columns there are the longer it would take to generate a report.
No doubt there are other approaches that could be taken to improve this. The time taken to evaluate the calculated column will increase as the number of rows grows. On the other hand, the current method is significantly faster than doing the same calculations on the application database. Even better, with the use of a PowerBI report or a tabular data model, these figures could easily be used by an end user to build reports which provide insight into their data. All the better that our clients could do this for themselves, with relative ease.
Someone was reading one of my previous blog posts and noted a mistake. I was very thankful that the person had taken the time to let me know. Whilst pondering this, it got me thinking about managing my strengths and weaknesses. There is a saying: develop your strengths and manage your weaknesses. The first step is to identify your strengths and weaknesses. Let me list a few of them below;
Written English - yes, it's ironic that I choose to spend time writing a blog, given that my spelking and grammar are one of my weaknesses. Disciplining myself to write a blog forces me to use the tools that I have to help me manage that weakness. A spell checker is a heaven-sent tool. There is another tool called Grammarly, which is a combination of a spelling and grammar checker. This tool makes an excellent crutch :-)
Social skills - despite starting and helping to run the Glasgow SQL Server user group, my soft skills are not my strength. It would be fair to say social interactions are the things I find the most challenging. There is one T-shirt slogan I can identify with: 'I like to party (and by party I mean read books)'. One approach I take to manage this weakness is, quite literally, to read books, and so I have read quite a few books on soft/social skills. These have taught me a few strategies and techniques I use in everyday life.
Finding solutions - it's what I do; give me a challenge and I will find a solution. During an interview for a role, an issue was raised regarding getting reports emailed automatically to managers. The company had tried various solutions over a period of a couple of years, none of which worked. The solution took me over 6 months and involved some VBA, a spreadsheet, and an SSIS package, but I was able to deliver a solution which worked. The solution was not pretty or elegant; that said, it met the specification and, most importantly, was delivered within the constraints I had to work within. Many months after leaving the company it was a delightful pleasure to hear my solution was still in place and being used.
Taking time to recognize my weaknesses and strengths was the first step for me to improve. This blog post will have been read and then re-read, and passed by someone with better grammar and spelling than myself before publishing. This is my way of managing that weakness. On the other hand, my strengths will have been used in other ways. These strengths and weaknesses are a part of who I am. They can help me make a positive and unique contribution. Learning to work with them is what makes me just that little bit better each day.
The email telling me that I had been selected to speak was, to say the least, a surprise. My session “Soft Skills for Success” had been accepted for Data in Devon 2019. Yes, I am very excited about the opportunity to speak in Exeter. My last time in Exeter was a school holiday many years ago, so the opportunity to return to that beautiful part of the country is a wonderful surprise.
It might seem a little strange that I was surprised that my session was accepted. However, I have been submitting sessions for a while now, most of which have been unsuccessful. This pattern is much like the one I experienced when I decided to move careers into IT a couple of years ago and was applying for IT roles. One of the lessons I learned during that time is that you are going to fail more than you succeed, so it pays to keep trying and, most importantly, to learn from each failure and success. The current lesson I am learning is all about abstracts: how can I improve them and make them better? I know that if one strategy does not work, then I need to review it, make a small change, then try the changed strategy, till I find the one that works; that's what I am doing with my submission abstracts.
To be invited to speak is, to my mind, an honour and a privilege. Why? People who I see as peers and who I respect have reviewed and considered the abstract that I submitted, then invited me to speak. They are also putting their faith in me that on the day I will be there and will deliver a session which people will want to see / listen to / learn something from. When I get there on the day, yes, no doubt I will be nervous, worried, and will have checked everything at least twice. Before getting to the room on the day, I will have rehearsed the session, reviewed it and been through it more than once. Just now I am working on tweaks to the session just for Data in Devon; if you want to know more you will have to attend the session.
The topic “Soft Skills for Success” is something that I am passionate about. So, I will be giving it my all. Like many projects that I work on, what people see is the final, smallest part of the project. Beforehand there will be a lot of unseen hard work to make it look good. The prize is worth the price, which is to be invited to speak to my peers about a topic that I am passionate about.
Forgive me whilst I indulge in some nostalgia first. Whilst at SQL Bits 2016, I finally made the decision to start a SQL user group in Glasgow. If I remember correctly, Craig Porteous attended the infamous SQL Bits party and met a gentleman dressed as a Wookie (William Durkin). Now Glasgow has a growing SQL Server user group (thanks to a dedicated, hard working team). Not only that, we just had our second SQL conference! (Yes, SQL Bits has a lot to answer for ;->)
SQLGLA was a fantastic event and it has left me with some wonderful feelings. This blog post is part thank you to those involved, and part a way to express some of my feelings / experiences of the event.
Craig Porteous, Louise Paterson and I are the people who organised this year's event. I felt we were a team (this is a theme, by the way). Each of us contributed to the success of the event in a different way. The success of the event was not down to a single person, in my opinion. It was about us all coming together to make it a success. Craig did put a lot of work into making the event what it was.
There are few words to express my gratitude to our team of volunteers. As an organizing team, we have all volunteered before, so we knew what is expected of volunteers. There was a lovely mix of those who have done it before and a few who had never done any volunteering. What a team though; they all worked really hard. Speaking personally, I am so proud of them, they made this event for me. Not once did I have to worry about whether something was done or not. They just made it happen. Whilst I have said this before, I cannot thank them enough for being such an amazing team; they exceeded all my expectations.
Last year, thanks to William Durkin, we had such an amazing lineup, and yet this year's lineup was even better! A few days before the event I was looking through the list. It would be a fair comment to say I was blown away by the quality of those who chose to speak at our event. From the feedback that I have heard so far, the attendees also agree 🙂 Looking at the depth of experience and knowledge that was shared was, to me, awe inspiring. What's even better is they all came to Scotland to share their knowledge and expertise with us.
It’s a phrase that I've seen often on social media, and these are not just words; it's something that can be seen and experienced in real life. To experience it is very humbling and gratifying. A few weeks before the event I was contacted by Chris Taylor (@SQLGeordie), one of the speakers. The speakers had decided they wanted to get something for Craig & Karen Porteous, who are expecting their second child. It was my privilege and honor to present the gift on their behalf. Hopefully, Craig will forgive me one day for gate-crashing the closing presentation to present the gift, as Craig knew nothing about it.
To make the event something a little different, we wanted to find a design for the t-shirts that was rather special. Having worked with Emily Chappell over the years on various projects, her quirky designs and sense of style really work for me. So I suggested we should ask Emily for a design for the t-shirts. The first design was not what we wanted, so next we all got together in a coffee shop at lunchtime. We had a chat and did some brainstorming; Emily then sent back a design which we all loved (see picture below). Personally, I’ve never been to a technical conference where they sold the t-shirts :-)
First there are a few housekeeping duties to take care of, which we are just finishing off now. That will be followed by some discussions to see what the next step will be 🙂 All I can safely say is keep looking, there will be some exciting news soon....