(scottish) sql bob blog


Some thoughts of your typical data shepherd / data groomer / data dance teacher sort of person.


SQL and JSON Part 1
Being the SQL Server geek in the office means that my desk is the go-to place when people have questions about SQL Server.  One of the developers asked a question about JSON.  When people ask me about JSON and SQL Server I begin to panic, based on previous experiences.  After a short discussion with my colleagues we made some suggestions and the developer went off to investigate them.

The next day the developer asked me to look over a script which would be run against a client's database.  This was the day's lesson for me.  The developer had found some T-SQL which I freely admit I had never heard of before, namely 'FOR JSON PATH'.  Using this, the developer was able to output some JSON and append some JSON to get the data they required.

The minimum requirement is that the database is on a SQL Server 2016 instance.

So what does FOR JSON PATH do?  Very simply, it outputs the result of a SELECT statement in JSON format.  So let's create a small table, insert some records into it, then run a SELECT on the table to see what is returned.
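As a minimal sketch, something along these lines will do (the table and column names here are my assumptions, not the original script):

CREATE TABLE #Fruit
(
    ID     INT IDENTITY(1, 1),
    Name   NVARCHAR(50) NOT NULL,
    Colour NVARCHAR(20) NULL   -- deliberately nullable, to show how NULLs behave
);

INSERT INTO #Fruit (Name, Colour)
VALUES ('Apple', 'Green'),
       ('Banana', NULL),
       ('Cherry', 'Red');

SELECT ID, Name, Colour
FROM #Fruit;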

Figure 1 – Three records returned from the temporary table. 

There are three records returned; note that for the second record the 'Colour' field has a NULL value.  In the next step I am going to add 'FOR JSON PATH' after the 'FROM' clause.
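Using the sketch table from above, the query now looks like this:

SELECT ID, Name, Colour
FROM #Fruit
FOR JSON PATH;

-- Returns one JSON string along these lines:
-- [{"ID":1,"Name":"Apple","Colour":"Green"},{"ID":2,"Name":"Banana"},{"ID":3,"Name":"Cherry","Colour":"Red"}]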

Figure 2 – Adding 'FOR JSON PATH' to the SELECT statement; the result set is outputted as a JSON formatted string

When the SELECT statement with the 'FOR JSON PATH' clause is run, the results are returned as a JSON formatted string.  The query was run in Azure Data Studio, so to see the JSON string formatted all that was required was a single click with the left mouse button to open the results in a new window, as shown in Figure 2.

Note that for the second record the 'Colour' field has no data returned, since the value for that record is NULL.  Can we write a SELECT statement which includes 'FOR JSON PATH' and returns records with NULL values?

All we need to do is add 'INCLUDE_NULL_VALUES' to the query, see Figure 3.
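Against the same sketch table:

SELECT ID, Name, Colour
FROM #Fruit
FOR JSON PATH, INCLUDE_NULL_VALUES;

-- The NULL Colour is now present in the output:
-- [{"ID":1,"Name":"Apple","Colour":"Green"},{"ID":2,"Name":"Banana","Colour":null},{"ID":3,"Name":"Cherry","Colour":"Red"}]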


Figure 3 - Adding 'INCLUDE_NULL_VALUES' to the query; NULL values are now returned by the query.

Whilst demonstrating this feature to the developers in my company, one question which was asked was whether it would be possible to remove the square brackets, e.g. [ ], from the JSON string outputted.  Yes you can.
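For example, restricting the sketch table to a single row:

SELECT ID, Name, Colour
FROM #Fruit
WHERE ID = 1
FOR JSON PATH, WITHOUT_ARRAY_WRAPPER;

-- {"ID":1,"Name":"Apple","Colour":"Green"}
-- Note: with several rows the output would be comma-separated objects without
-- the enclosing [ ], which is not a valid JSON array, so this option is best
-- suited to single-row results.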


Figure 4 - Adding ‘WITHOUT_ARRAY_WRAPPER’ to the query and the square brackets are suppressed. 

All the queries shown in the screenshots were run using Azure Data Studio, which, if you click on the result set returned, opens it in another new window and formats the JSON.

That’s enough for just now; there is more to come.


How to find SQL Server Instance IP address
Quite often I need to get some information about a SQL Server instance; this morning it was the IP address for a SQL Server instance.  Having googled for this more times than I care to remember, this time I thought I would put the information in a blog post.


Method 1 – TSQL 
SELECT ConnectionProperty('net_transport')           AS 'net_transport'
     , ConnectionProperty('protocol_type')           AS 'protocol_type'
     , ConnectionProperty('auth_scheme')             AS 'auth_scheme'
     , ConnectionProperty('local_net_address')       AS 'local_net_address'
     , ConnectionProperty('local_tcp_port')          AS 'local_tcp_port'
     , ConnectionProperty('client_net_address')      AS 'client_net_address'
     , ConnectionProperty('physical_net_transport')  AS 'physical_net_transport'

Local_net_address was the field that I required.

This post was inspired by - https://www.sqlservercentral.com/forums/reply/1519373
Further information on ConnectionProperty can be found here.

Method 2 – DBA Tools

Test-DbaConnection <InstanceName>  see -> https://docs.dbatools.io/#Test-DbaConnection

The output includes a property IPAddress, which is what I wanted.  If you have not looked at DBATools then you should; it is an amazing open source project and a genuine time saver.

If there are any other methods which I have not thought of, please let me know.


DAX adventure with application statuses

The core application my current company has built allows people to apply for grants.  We track each application through various stages; much like a loan application, there are many stages an application might go through.  The status of an application is not linear, so applications can go back and forth between the same statuses more than once.  In Figure 1 below, an application can be given a status of 'Under Review' then get a status of 'Submitted' or 'Pre-Submission'.  So an application might have the same status more than once with different dates.

Figure 1 – showing the workflow an application might take.

One question our customers want answered is how long an application takes to move from one status to another.  This question could be answered using the existing application database, but that method is not ideal as the queries are quite complex and slow.  This is one of the reasons I decided to build a data warehouse as a proof of concept: to see if there was a quicker way to answer that question.

Calculating time difference 
Given that an application can be allocated a status more than once during its lifecycle, this has an impact on the method used for calculating the time difference.  The methodology used to get the time difference is defined below, with a rough T-SQL sketch after the list.

  • Find the first datetime when the application was assigned the first specified status.  Allocate value to FirstStatusDatetime
  • Find the last datetime when the application was assigned the second specified status.  Allocate value to LastStatusDatetime
  • Calculate the difference in time between FirstStatusDatetime and LastStatusDatetime
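As a rough illustration only, the same methodology expressed in T-SQL against a hypothetical status-history table (the table and column names are assumptions, not our real schema) might look like this:

-- Days between the first 'Submitted' status and the last 'Rejected' status,
-- calculated per application.
SELECT s.ApplicationID,
       DATEDIFF(DAY, MIN(s.DateAddedAt), r.LastRejectedAt) AS DaysFromSubmittedToRejected
FROM   AppStatusHistory AS s
       JOIN (SELECT ApplicationID, MAX(DateAddedAt) AS LastRejectedAt
             FROM   AppStatusHistory
             WHERE  Status = 'Rejected'
             GROUP BY ApplicationID) AS r
         ON r.ApplicationID = s.ApplicationID
WHERE  s.Status = 'Submitted'
GROUP BY s.ApplicationID, r.LastRejectedAt;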
Application Fact Table  

The fact table in the POC data warehouse stores the ApplicationID, the Status, and the datetime the application was allocated that status.  If an application has been assigned the same status more than once then there is a record for each time the application received that status.  This has some advantages; the end user can easily understand the data as it is shown in the table.  On the other hand, it presented some difficulties with calculating the time period between statuses.  After various attempts the following calculated column was created, in this case calculating the days between a grant receiving Submitted status and Rejected status.
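For reference, a simplified sketch of the fact table that the DAX below works against (the column types are assumptions):

CREATE TABLE FactAppStatuses
(
    ID            INT IDENTITY(1, 1),    -- row identifier, used by the DAX later on
    ApplicationID INT          NOT NULL,
    [Status]      NVARCHAR(50) NOT NULL,
    DateAddedAt   DATETIME2    NOT NULL   -- when the application received this status
);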

DaysFromSubmittedToRejected =
VAR FirstSubmittedDate =
    CALCULATE (
        MIN ( 'FactAppStatuses'[DateAddedAt] ),
        FILTER (
            ALL ( 'FactAppStatuses' ),
            'FactAppStatuses'[ApplicationID] = EARLIER ( 'FactAppStatuses'[ApplicationID] )
                && 'FactAppStatuses'[Status] = "Submitted"
        )
    )
VAR LastRejectedDate =
    CALCULATE (
        MAX ( 'FactAppStatuses'[DateAddedAt] ),
        FILTER (
            ALL ( 'FactAppStatuses' ),
            'FactAppStatuses'[ApplicationID] = EARLIER ( 'FactAppStatuses'[ApplicationID] )
                && 'FactAppStatuses'[Status] = 'Rejected'
        )
    )
RETURN
    DATEDIFF ( FirstSubmittedDate, LastRejectedDate, DAY )

Figure 2 – First version of the calculated column DaysFromSubmittedToRejected

For both variables the logic used is very similar: for FirstSubmittedDate the MIN() function is used to get the first date, and the FILTER statement only returns records with the same ApplicationID and Status = "Submitted"; for LastRejectedDate the MAX() function is used to return the last date.  Using variables means the logic can be re-used for other statuses, and the DATEDIFF() function is easier to read.  However, when the results were displayed they were not as expected.


Figure 3 – results from the first version of the DaysFromSubmittedToRejected calculated column (see Figure 2)

The requirement was to return the number of days elapsed from Submitted to Rejected, with the value returned against the record for the last time the application received the Rejected status.  So another calculated column was created to flag whether the current record holds the LastRejectedDate, which is shown in Figure 4.

LastRejectedDate =
IF (
    MAXX (
        FILTER (
            FactAppStatuses,
            EARLIER ( FactAppStatuses[ApplicationID] ) = FactAppStatuses[ApplicationID]
                && FactAppStatuses[Status] = "Rejected"
        ),
        [ID]
    ) = [ID],
    "Y",
    "N"
)
Figure 4 – the LastRejectedDate calculated column

With the LastRejectedDate calculated column in place, the RETURN statement of 'DaysFromSubmittedToRejected' just requires a small change, which is as follows:

DaysFromSubmittedToRejected =
VAR FirstSubmittedDate =
    CALCULATE (
        MIN ( 'FactAppStatuses'[DateAddedAt] ),
        FILTER (
            ALL ( 'FactAppStatuses' ),
            'FactAppStatuses'[ApplicationID] = EARLIER ( 'FactAppStatuses'[ApplicationID] )
                && 'FactAppStatuses'[Status] = "Submitted"
        )
    )
VAR LastRejectedDate =
    CALCULATE (
        MAX ( 'FactAppStatuses'[DateAddedAt] ),
        FILTER (
            ALL ( 'FactAppStatuses' ),
            'FactAppStatuses'[ApplicationID] = EARLIER ( 'FactAppStatuses'[ApplicationID] )
                && 'FactAppStatuses'[Status] = "Rejected"
        )
    )
RETURN
    IF (
        'FactAppStatuses'[LastRejectedDate] = "Y",
        DATEDIFF ( FirstSubmittedDate, LastRejectedDate, DAY ),
        0
    )

Figure 5 – Second version of the calculated column DaysFromSubmittedToRejected

When the calculated column shown in Figure 5 is added to the Power BI report and the report is run, the results can be seen in Figure 6 below.


Figure 6 – results from the second version of the DaysFromSubmittedToRejected function (see Figure 5)

The added IF() statement uses the LastRejectedDate flag to check whether the current row holds the last Rejected date; otherwise it returns zero.  This returns the record set which was expected.

The positive side of this method is that the same calculation can be re-used for a variety of statuses, requiring only a minor change to the formula.  On the negative side, the more calculated columns there are, the longer it takes to generate a report.

No doubt there are other approaches that could improve on this; the time taken to evaluate the calculated column will grow as the number of rows increases.  On the other hand, the current method is significantly faster than doing the same calculations on the application database.  Even better, with a Power BI report or a tabular data model these figures could easily be used by end users to build reports which provide insight into their data.  All the better that our clients could do this for themselves, with relative ease.