The core application my current company has built allows people to apply for grants. We track each application through various stages; much like a loan application, there are many stages an application might go through. The status of an application is not linear, so an application can move back and forth between the same statuses more than once. In Figure 1 below, an application can be given a status of 'Under Review' and then return to a status of 'Submitted' or 'Pre-Submission'. So an application might have the same status more than once, with different dates.
Figure 1 – showing the workflow an application might take.
One question our customers want answered is how long an application takes to move from one status to another. This could be answered using the existing application database, but that method is not ideal as the queries are quite complex and slow. This is one of the reasons I decided to build a data warehouse as a proof of concept: to see if there was a quicker way to answer that question.
Calculating time difference
Given that an application can be allocated a status more than once during its lifecycle, this has an impact on the method used for calculating the time difference. The methodology used to get the time difference is described below.
The fact table in the POC data warehouse stores the ApplicationID, the Status, and the datetime the application was allocated that status. If an application has been assigned the same status more than once, there is a record for each time the application received that status. This has some advantages: the end user can easily understand the data as it is shown in the table. On the other hand, it presented some difficulties when calculating the time period between statuses.
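As a sketch, the fact table might look like the following DDL. The column names are taken from the DAX formulas later in the post; the schema, table name and datatypes are assumptions for illustration:

```sql
-- Hypothetical DDL for the fact table described above.
-- Column names match the DAX later in the post; datatypes are assumptions.
CREATE TABLE dbo.FactAppStatuses
(
    ApplicationID INT          NOT NULL, -- the grant application
    [Status]      VARCHAR(50)  NOT NULL, -- e.g. 'Pre-Submission', 'Submitted', 'Under Review', 'Rejected'
    DateAddedAt   DATETIME2    NOT NULL  -- when the application was allocated this status
);
-- One row per status change, so the same (ApplicationID, Status)
-- pair can appear many times with different dates.
```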
After various attempts the following calculated column was created, in this case calculating the days between a grant receiving 'Submitted' status and 'Rejected' status.
Figure 2 – First version of the calculated column DaysFromSubmittedToRejected
The logic used for both variables is very similar. For the variable FirstSubmittedDate, the MIN() function is used to get the first date, and the FILTER statement only returns records with the same ApplicationID and a Status of 'Submitted'. For the variable LastRejectedDate, the MAX() function is used to return the last date. Using variables means the logic can be re-used for other statuses, and the DATEDIFF() function is easier to read. However, when the results were displayed they were not as expected.
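The image for Figure 2 is not reproduced here, but based on the description above, the first version of the calculated column would look something like this sketch (table and column names taken from the DAX shown later in the post):

```dax
DaysFromSubmittedToRejected =
VAR FirstSubmittedDate =
    CALCULATE (
        MIN ( 'FactAppStatuses'[DateAddedAt] ),
        FILTER (
            ALL ( 'FactAppStatuses' ),
            'FactAppStatuses'[ApplicationID] = EARLIER ( 'FactAppStatuses'[ApplicationID] )
                && 'FactAppStatuses'[Status] = "Submitted"
        )
    )
VAR LastRejectedDate =
    CALCULATE (
        MAX ( 'FactAppStatuses'[DateAddedAt] ),
        FILTER (
            ALL ( 'FactAppStatuses' ),
            'FactAppStatuses'[ApplicationID] = EARLIER ( 'FactAppStatuses'[ApplicationID] )
                && 'FactAppStatuses'[Status] = "Rejected"
        )
    )
RETURN
    DATEDIFF ( FirstSubmittedDate, LastRejectedDate, DAY )
```

Because the RETURN has no condition, every row for the application gets the same day count, which is why the results were not as expected.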
Figure 3 – results from the first version of the DaysFromSubmittedToRejected function see Figure 2
The requirement was to return the number of days elapsed from Submitted to Rejected, with the value returned against the record for the last time the application received that status. So another calculated column was created to flag whether the current record is the LastRejectedDate, which is shown in Figure 4.
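Figure 4 is not reproduced here; as a sketch, the flag column might be defined along these lines (the "Y"/"N" values match the comparison used in the second version of the formula; the exact implementation shown in the figure may differ):

```dax
LastRejectedDate =
IF (
    'FactAppStatuses'[DateAddedAt]
        = CALCULATE (
            MAX ( 'FactAppStatuses'[DateAddedAt] ),
            FILTER (
                ALL ( 'FactAppStatuses' ),
                'FactAppStatuses'[ApplicationID] = EARLIER ( 'FactAppStatuses'[ApplicationID] )
                    && 'FactAppStatuses'[Status] = "Rejected"
            )
        ),
    "Y",
    "N"
)
```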
With this calculated column (LastRejectedDate) in place, the return statement of the 'DaysFromSubmittedToRejected' column just requires a small change, which is as follows:
DaysFromSubmittedToRejected =
VAR FirstSubmittedDate =
    CALCULATE (
        MIN ( 'FactAppStatuses'[DateAddedAt] ),
        FILTER (
            ALL ( 'FactAppStatuses' ),
            'FactAppStatuses'[ApplicationID] = EARLIER ( 'FactAppStatuses'[ApplicationID] )
                && 'FactAppStatuses'[Status] = "Submitted"
        )
    )
VAR LastRejectedDate =
    CALCULATE (
        MAX ( 'FactAppStatuses'[DateAddedAt] ),
        FILTER (
            ALL ( 'FactAppStatuses' ),
            'FactAppStatuses'[ApplicationID] = EARLIER ( 'FactAppStatuses'[ApplicationID] )
                && 'FactAppStatuses'[Status] = "Rejected"
        )
    )
RETURN
    IF (
        'FactAppStatuses'[LastRejectedDate] = "Y",
        DATEDIFF ( FirstSubmittedDate, LastRejectedDate, DAY ),
        0
    )
Figure 5 – Second version of the calculated column DaysFromSubmittedToRejected
When the calculated column shown in Figure 5 is added to the Power BI report and the report is run, the results can be seen in Figure 6 below.
Figure 6 – results from the second version of the DaysFromSubmittedToRejected function (see Figure 5)
The added IF() statement uses the flag to check whether the current row contains the LastRejectedDate; if so it returns the day count, otherwise it returns zero. This returns the record set that was expected.
The positive side of this method is that the same calculation can be re-used for a variety of statuses, requiring only a minor change to the formula. On the negative side, the more calculated columns there are, the longer it takes to generate a report.
No doubt there are other approaches that could improve on this, and the time taken to evaluate the calculated column will grow as the number of rows increases. On the other hand, the current method is significantly faster than doing the same calculations on the application database. Better still, with a Power BI report or a tabular data model, these figures could easily be used by end users to build reports which provide insight into their data. All the better that our clients could do this for themselves, with relative ease.
On the 10th of November 2017 the Glasgow SQL Server user group ran #SQLGLA. This was a half-day event, primarily for SQL Server professionals in Scotland, although we did have some visitors from further afield: namely our four MVP speakers, William Durkin, Chrissy LeMaire, Andre Kamman and Rob Sewell.
Now that the event is complete and we have gathered and compiled the feedback, I feel there is time to reflect on it. It's been a bit of a rollercoaster ride, with the full complement of good bits and scary bits. Much of the thanks for making this event a success goes to my co-organiser Craig Porteous. Without Craig this event would not have come off at all.
It would be fair to say that both Craig and I had visions of very few people coming, other than the speakers, volunteers and ourselves. People did come, and as often happens at these events, people you did not expect arrived and some we expected did not. All part of the learning curve. My memories of the actual day are: we arrived at the venue, ran up and down stairs, realised what we had forgotten to bring, and set up the venue ready for our guests. People arrived: attendees, speakers, sponsors. Speakers spoke; people mingled, chatted, had tea, coffee and sandwiches, then drank beer and wine. We cleared up and said goodbye, and some people headed off to the pub to continue the fun. It was a fantastic day, and all the objectives we had aimed for were achieved.
As we know, these events do not run themselves; there is a dedicated team of selfless people who make them happen. Craig Porteous, my co-organiser, put a massive amount of work into making this event happen. Even more, his selfless decision to have one more drink at the SQL Bits party meant he met William Durkin, and that conversation resulted in the event :-). We were also very kindly blessed with two enthusiastic and industrious volunteers, Edith and Paul, who always seemed to be in the right place at the right time and just did what needed to be done.
This event was a huge learning experience for Craig and myself. Combined with running the user group, it has taught me more than a few lessons, some of which we have learned and some I am sure we are still learning. Now that the event is over and I have some time to look back, I will blog about some of those lessons, about running a user group and a larger event such as #SQLGLA.
So first I created a variable in the query window, see Figure 1.
Now you have a card with a dynamic title. If you want to change the client name, all you have to do is change the text in the variable. A sample Power BI report file, containing the report shown above, can be downloaded here.
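The sample report file is not included here, but the idea can be sketched as a DAX measure along these lines (the measure name, title text and client name are illustrative, not taken from the sample file):

```dax
// Hypothetical sketch of a dynamic title measure.
// To rebrand the report for another client, edit ClientName in one place.
Report Title =
VAR ClientName = "Acme Grants Ltd"
RETURN
    "Application Status Report – " & ClientName
```

Binding a measure like this to the card visual is what makes the title update automatically when the variable text changes.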
Whilst nothing can beat experience, having a certificate can also improve your chances when looking for a new job. Imagine an interviewer looking at two evenly matched candidates, one without a certificate and one with. Personally speaking, if I were the person making the decision, the person with the certificate would be the one I would recommend for the job. By studying for and passing the exam, the candidate has demonstrated initiative, self-motivation, and a genuine desire for technology. At least that's my opinion. In one case, a colleague I work with said one of the reasons they were offered a job was that they had studied for and passed two exams which were relevant to the role they applied for.
Craig Porteous spoke to some recruiters he knows and asked them some questions about their view of certification, so these thoughts come directly from those who make hiring decisions.
What weight do you put on certifications when hiring?
1) A lot. With two otherwise comparable candidates, the one with certifications wins in my mind.
2) A lot. I think it shows that candidates are focused on developing their career.
3) For a technical role I see it as essential.
Do you encourage the pursuit of certifications by your team?
Do you see any downsides/negative aspects to certifications?
1) No, none at all.
2) Some of the accreditations could be more hands-on focused.
3) Cost (retaining skilled up workers)
With staff who have completed certifications, do you see any differences in working practice etc to those who haven’t?
1) Yes. People working towards certifications are more engaged with technology and tend to apply their learning in the work environment, sharing their knowledge and improving the overall team dynamic leading to improved productivity.
2) It really depends on the individual, so I don't think it's a fair comparison. A better way to look at it is how doing accreditations adds value to that person in regards to technical ability and confidence.
3) Last example was an infrastructure type. Stuck him on an SCCM course. A month later our SCCM world upgraded. A year later he left and now heads up SCCM at Dell SecureWorks in the US. For me a benefit – I get an SCCM upgrade from a capable engineer. They get a badge they can use to get their career upgraded. Win-win.
One reason I have heard for not doing any certification is the cost. My personal view is that I am investing in myself. If I gain a new skill or learn some new techniques, then yes, the company I am working for will benefit, which is excellent news for them. But if I choose to move to another role with a different company, those skills transfer with me. The skills I have invested my time and (very often, more importantly) my money in are mine. So my choice is to invest in myself, as I believe the return on investment (ROI) is excellent.
So you have decided to study for certification; what resources are available? The following list is just suggestions, based largely on my experience and that of some others.
If you are planning to take the data platform exams then I would strongly suggest investing in the books for the relevant exam. For example, this one is for the 70-461 exam. The book covers all the topics that could be questioned in the exam, and there is an accompanying CD with an electronic copy of the book and practice exam questions. One series of books which I used whilst studying for the 70-461 exam, Querying Microsoft SQL Server 2012/14, was the series titled '2 Joes'. The purpose of these books is to take a complete beginner through all the skills required to pass the exam. Personally I found this the best explanation of how to query XML data using TSQL. They were, for me, worth the investment.
CBT Nuggets - an excellent resource; the videos I viewed for my 70-462 exam were really helpful. The training package I signed up for included exam questions, which I also found excellent. It is more expensive than other sites, but you have access to all the courses. Really worth considering if you want to maximize your study time.
Udemy - with this provider you purchase one course at a time, which gives you "lifetime access" to the course. When the courses are on sale the prices are low. The quality of the courses can vary, so have a look at the reviews before purchasing. Personally, I found the 70-463 course covered the basics well but did not go into sufficient detail for the exam questions.
Pluralsight - this is a well-known training site (OK, I have heard the name in quite a few places), and there are lists of videos for specific certifications. The courses I have watched were of very high quality.
Microsoft Virtual Academy - it's free, which is not always a recommendation. That said, the quality of the courses I have viewed has been excellent. For the exams, search for the 'jump start' videos, which are a really good place to begin your studying, with excellent hints and tips from people who have passed the exams and who train others for them.
SQL Bits - apart from being one of the best SQL conferences in Europe, the organizers have given back to the SQL community by recording some of the sessions and making them available for FREE on their website. Finding what you are looking for might take a bit of searching, but most topics will have at least one video on them.
YouTube - there are a lot of videos uploaded on a wide variety of topics, so there will be some searching to find the topic you are looking for. On the downside, the quality of content is variable, ranging from the excellent to the not so good.
Blog posts - again this will require some searching, but in my experience it has been worth the time and energy. One author, whilst studying for the 70-463 exam, blogged about what she was learning as she went along.
Passing the exams
What is required to pass the exams? Practice, and lots of it! One of the keys, borne out by other people who have passed the exams, is practice exams. These practice questions will not be exactly the same as the questions you will face in the exam room, but they will get you into the way of thinking needed in the exam: reading questions carefully to see what is actually being asked, and examining multiple-choice options for the correct answer. The best way to get practice is to get hold of practice exam questions and sit them as if they were the real thing. There are a number of providers from which you can purchase them: MeasureUp, CBT Nuggets and others. If you pay attention to the score at the end of a practice exam, you will also be able to see where you need to improve.
The questions are designed quite deliberately to test your knowledge – well, you would not want them to be too easy, would you? This article from Pluralsight has some excellent examples of the format of the type of question you will be answering in the exam. If you are really interested in how the questions are constructed and the methodology behind them, this video from Pluralsight has an interview with someone who designs the exams.
All that remains to say if you have decided to study for a certification, good luck and happy studying.
Figure 2 – SELECT statement from Figure 1 showing the output when FOR JSON PATH is used
Let's make one change to the SELECT statement in Figure 2: instead of using FOR JSON PATH, use FOR JSON AUTO.
Figure 3 – Output from Select statement with FOR JSON AUTO
Looking at Figure 3, the JSON string output is different from the one shown in Figure 2. With the SELECT statement in Figure 3, all the records are for a single SalesOrderNumber and OrderDate, hence all the records from Sales.SalesOrderDetail (alias "D") are in a child node underneath the parent record from Sales.SalesOrderHeader.
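The queries themselves are shown in the figures as images; as a sketch, the kind of query being discussed (against the AdventureWorks sample database, with an assumed order number) would look like this:

```sql
-- Sketch of the kind of query discussed above (AdventureWorks sample database).
-- The SalesOrderNumber value is an assumption for illustration.
SELECT  H.SalesOrderNumber,
        H.OrderDate,
        D.ProductID,
        D.OrderQty,
        D.LineTotal
FROM    Sales.SalesOrderHeader AS H
        INNER JOIN Sales.SalesOrderDetail AS D
            ON D.SalesOrderID = H.SalesOrderID
WHERE   H.SalesOrderNumber = 'SO43659'
FOR JSON AUTO;
-- With FOR JSON AUTO, the rows from alias D are nested
-- as a child array under the row from alias H.
```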
Let's try a different SELECT query, this time selecting three records with different SalesOrderNumbers – see Figure 4.
Figure 4 – results set for the three SalesOrderNumbers
Now let's try the same query with FOR JSON AUTO; the query and output are shown in Figure 5.
Figure 5 – results set for the three SalesOrderNumbers outputted as a JSON string
Looking at Figure 5, each individual SalesOrderNumber is placed in its own node, and the corresponding SalesOrderDetail values are placed in a child node underneath the parent SalesOrderNumber.
With FOR JSON AUTO, the structure of the JSON string is determined automatically from the tables and joins in the SELECT statement, whereas with FOR JSON PATH, which we demonstrated in the previous blog post, the output is controlled explicitly by the column aliases in the SELECT statement.
Figure 1 – The function ISJSON() returns 1 showing the data in the field [Data] is valid JSON
Now the JSON data is in the SQL database, let's see about doing something useful with it. In SQL Server 2016 a number of new functions were added that allow the querying and manipulation of JSON data. Having done some research, I found this blog post - https://visakhm.blogspot.com/2016/07/whats-new-in-sql-2016-native-json_13.html. Using code from it, I was able to extract the data from the JSON string supplied by the API from the sessionize.com website.
Before querying the data, I need to explain one concept which is crucial for extracting data from structured JSON. In the example in Figure 2 below, the path of the 'title' key/value pair is as follows:
sessions.0.title – this would give the key/value pair 'title: "DAX Gotchas"' – see Figure 2.
Figure 2 – JSON data showing the sessions node and the first speaker node.
In the JSON object returned from the sessionize.com API there are a number of nodes, one for each session, numbered 0 through 29. Within each node there are a number of key/value pairs, e.g. 'id: "117469"'. These paths, nodes and arrays (e.g. speakers and categoryItems) are what the TSQL is going to extract values from.
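As an illustration of the structure being described, here is an abbreviated, hand-written sample (not the actual API response; the speaker ID in the speakers array is invented):

```json
{
  "sessions": [
    {
      "id": "117469",
      "title": "DAX Gotchas",
      "speakers": [ "a1b2c3d4" ],
      "categoryItems": [ ]
    }
  ]
}
```

Here the path sessions.0.title reaches the value "DAX Gotchas": sessions is the array, 0 is the first node, and title is the key within that node.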
Enough waffling about JSON objects; let's write some proper TSQL. In the next example we are going to use a function called OPENJSON(), which is only available in SQL Server 2016 or later. We are going to provide OPENJSON() with two arguments: @AllJson, which contains the JSON object and must be of datatype NVARCHAR(), and the path, which, the way I think about it, specifies the node or array that I want to return from @AllJson. The other function we will use is JSON_VALUE(). This function also accepts two parameters: an expression, which is a variable or field name containing JSON data, and the path, which specifies the node or array that I want to return from the JSON data (yes, I said that already – just wanted to see if you are paying attention ;->).
That's a lot of words, so let's look at some TSQL in Figure 3 below.
Figure 3 – The JSON data from the sessions node returned as a result set in SSMS
When we look at Figure 3, we notice that the first row of the data is the same as the data shown in Figure 2. In essence, FROM OPENJSON(@AllJson, '$.sessions') returns a dataset consisting of three fields: Key, Value and Type. The field Value contains the JSON object for each of the 30 session nodes. Next, the JSON_VALUE() function takes that JSON and extracts the value for one key/value pair; this is done by specifying the key of the pair. So, in the case of title, the path '$.title' is supplied for the path parameter. Since there is only one key/value pair where the key is 'title', that value is returned from the JSON_VALUE() function, in the field 'SessionTitle'.
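Figure 3 is an image; a sketch of the query being described would be along these lines (the table holding the raw JSON is an assumption, as the original post loads it elsewhere):

```sql
-- Sketch of the query described above; @AllJson holds the full JSON object.
-- The dbo.SessionizeData table name is an assumption for illustration.
DECLARE @AllJson NVARCHAR(MAX) =
    ( SELECT TOP (1) [Data] FROM dbo.SessionizeData );

SELECT  s.[Key],                                   -- the node number, 0 to 29
        JSON_VALUE ( s.[Value], '$.id' )    AS SessionID,
        JSON_VALUE ( s.[Value], '$.title' ) AS SessionTitle
FROM    OPENJSON ( @AllJson, '$.sessions' ) AS s;  -- one row per session node
```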
Looking at Figure 2, there is a matching key/value pair in the speakers array: the session's id value is "117469", and the corresponding lookup value in the speaker's sessions array is also "117469". The two values and their locations in the JSON object are shown in Figure 4 below.
Figure 4 – Showing the lookup values for both sessions to speakers and vice versa.
So we know we want access to the data in the speakers array, as this contains the list of speakerIDs for each session. How is this done? Well, I found an answer in this blog post - https://visakhm.blogspot.com/2016/07/whats-new-in-sql-2016-native-json_13.html. Below in Figure 5 is the TSQL and result set.
Figure 5 – Updated query to return the speakerID from the speakers array.
All we have done in the query shown in Figure 5 is add a CROSS APPLY with a simple SELECT statement. Now the speakerID is returned; note that if there is more than one speakerID, as in the case of sessionID 117615 (which has two awesome speakers), the query returns a row for each, with a different speakerID on each row, which is just what we wanted.
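As a sketch, the CROSS APPLY added in Figure 5 looks something like this (building on the earlier @AllJson variable; the exact query in the figure may differ):

```sql
-- Sketch: one row per (session, speaker) pair.
SELECT  JSON_VALUE ( s.[Value], '$.id' )    AS SessionID,
        JSON_VALUE ( s.[Value], '$.title' ) AS SessionTitle,
        sp.[Value]                          AS SpeakerID
FROM    OPENJSON ( @AllJson, '$.sessions' ) AS s
        CROSS APPLY
        ( SELECT [Value] FROM OPENJSON ( s.[Value], '$.speakers' ) ) AS sp;
-- OPENJSON over the speakers array emits one row per element,
-- so a session with two speakers produces two rows here.
```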
Next, let's have a look at returning data for the speakers node. Below in Figure 6 is the TSQL to return some data from the speakers array.
Figure 6 – TSQL query to return data from the speakers array
Looking at the query inside the CROSS APPLY:
SELECT Value FROM OPENJSON(s.Value, '$.links')
WHERE Value LIKE '%Twitter%'
There are a couple of things worth looking at. First, it is possible to use a WHERE clause on the columns returned by the OPENJSON() function. The reason for using the WHERE clause is that the links node can contain more than one type of link; during development some of the speakers had a LinkedIn profile, which they then removed 🙁.
So by now I am sure you are saying "show me the money". After some work I created a query which extracts the session, speaker and room information, then returns it as a single result set, as shown in Figure 7 below.
Figure 7 – Result set with Session, Speaker and room details
If you want to have a go yourself and play with the code, then you will find:
The TSQL source code in this Azure Data Studio notebook here
The Python code in this Azure Data Studio notebook here
If you have not run the Python code to import the data, then I have created an Azure Data Studio notebook containing the code to create the database and perform the other setup tasks. The notebook can be found here.
Last, but very much not least: why did I spend so much effort getting all the data out of the sessionize API? The end goal was to supply the data to SQL Server Report Builder (download from here https://www.microsoft.com/en-us/download/details.aspx?id=53613). This standalone tool allows you to build an SSRS report. Using it I created a report which, when run, outputs pages that look like the one shown in Figure 8 below.
Figure 8 – Data finally published in the SSRS report