(scottish) sql bob blog

rss

Some thoughts of your typical data shepard / data groomer / data dance teacher sort of person.


DAX adventure with application statuses

The core application my current company has built allows people to apply for grants. We track the application through various stages, much like a loan application there are many stages that the application might go through. The status of the application is not linear.  So the applications can go back and forth between the same status more than once.  In figure 1 below an application .can given a status of ‘under review’ then get a status of ‘submitted’ or ‘Pre-Submission’.  So, an application might have the same status more than once with different dates. 

Figure 1 –showing the workflow an application might take.

One question which our customers want to know is how long an application takes to move from one status to another.  This question could be answered using the existing application database.  Using that method is not ideal as the queries are quite complex and slow.  This is one of the reasons I decided to build a Datawarehouse as a proof of concept.  To see if there was a quicker way to answer that question. 

Calculating time difference 
Given that an application can be allocated a status more than once during its lifecycle.  This has an impact on the method we used for calculating the time difference.  The methodology used to get the time difference is defined below 

  • Find the first datetime when the application was assigned the first specified status.  Allocate value to FirstStatusDatetime
  • Find the last datetime when the application was assigned the second specified status.  Allocate value to LastStatusDatetime
  • Calculate the difference in time between FirstStatusDatetime and LastStatusDatetime
Application Fact Table  

The fact table in the POC Datawarehouse, stores the ApplicationID, Status, and the datetime the application was allocated that status.  If an application has been assigned the same status then there would be a record for each time the application has received that status.  This has some advantages; the end user can easily understand the data as it is shown in the table.  On the other hand, this presented some difficulties with calculating the time period between each status.    After various attempts the following calculated column was created.  In this case calculating the days between a grant receiving Pre-Submitted status and Rejected status. 

DaysFromSubmittedToRejected =
VAR FirstSubmittedDate =
    CALCULATE (
        MIN ( 'FactAppStatuses'[DateAddedAt] ),
        FILTER (
            ALL ( 'FactAppStatuses' ),
            'FactAppStatuses'[ApplicationID] = EARLIER ( 'FactAppStatuses'[ApplicationID] )
                && 'FactAppStatuses'[Status] = 'Submitted'
        )
    )
VAR LastRejectedDate =
    CALCULATE (
        MAX ( 'FactAppStatuses'[DateAddedAt] ),
        FILTER (
            ALL ( 'FactAppStatuses' ),
            'FactAppStatuses'[ApplicationID] = EARLIER ( 'FactAppStatuses'[ApplicationID] )
                && 'FactAppStatuses'[Status] = 'Rejected'
        )
    )
RETURN
    DATEDIFF ( FirstSubmittedDateLastRejectedDateDAY )

Figure 2 – First version of the calculated column DaysFromSubmittedToRejected

For both values the logic used is very similar, for the variable FirstSubmittedDate the MIN() function is used to get the first date.  The FILTER statement only returns records with same ApplicationID and Status = ‘Submitted’.  Whereas the variable  LastRejectedDate the MAX() function is used to return the last date.  Using variables, means the logic can be re-used for other statuses, and the DATEDIFF() function is easier to read.  When the results are displayed they were not as expected. 


Figure 3 – results from the first version of the DaysFromSubmittedToRejected function see Figure 2

The requirement was to return the number of Days elapsed From Submitted To Rejected with the value returned against the record which is the last time the application received that status.  So, another function was created to see if the current record is the LastRejectedDate, which is shown in Figure 4.

LastRejectedDate = 
IF ( 
MAXX ( 
FILTER ( 
FactAppStatuses, 
EARLIER ( FactAppStatuses[ApplicationID] ) = FactAppStatuses[ApplicationID] 
&& FactAppStatuses[Status] = ‘Rejected’ 
), 
[ID] 
) 
= [ID], 
"Y", 
"N" 
)  
Figure 4 – LastRejectedDate function

With this calculated column (LastRejectedDate) in place the return statement of ‘DaysFromSubmittedToRejected’ function just requires a small change which is as follows 

DaysFromSubmittedToRejected = 
VAR FirstSubmittedDate = 
     CALCULATE ( 
         MIN ( 'FactAppStatuses'[DateAddedAt), 
         FILTER ( 
             ALL ( 'FactAppStatuses), 
             'FactAppStatuses'[ApplicationID] = EARLIER ( 'FactAppStatuses'[ApplicationID)  
              && 'FactAppStatuses'[Status] = ‘Submitted’ 
         ) 
     ) 
VAR LastRejectedDate = 
     CALCULATE ( 
         MAX ( 'FactAppStatuses'[DateAddedAt), 
         FILTER ( 
             ALL ( 'FactAppStatuses), 
             'FactAppStatuses'[ApplicationID] = EARLIER ( 'FactAppStatuses'[ApplicationID)  
              && 'FactAppStatuses'[Status] = 'Rejected’ 
         ) 
     ) 
RETURN 
IF ( 
'FactGrantStatuses'[LastRejectedDate] = "Y", 
DATEDIFF ( FirstSubmittedDate, LastRejectedDate, DAY ), 
0 
) 

Figure 5 – Second version of the calculated column DaysFromSubmittedToRejected

When the function is added to the PowerBI report shown in Figure 5 then run then the results can be see in Figure 6 below.  


Figure 6 – results from the second version of the DaysFromSubmittedToRejected function (see Figure 5)

The addition of the IF() statement uses the value to check if current row contains the LastRejectedDate.  Otherwise it returns a zero.   This returns the record set which is expected.

The positive uses of this method are that the same calculation can be re-used for a variety of statuses only requiring a minor change to the formula.  On the negative side the more calculated columns there are the longer it would take to generate a report.

No doubt there are other approaches that can be taken to improve this. The time taken to evaluate the calculated column will be slower as the number of rows increases.  On the other hand, the current method is significantly faster than doing the same calculations on the application database.  Even better with the use of a PowerBI report or a tabular data model.  These figures could easily be used by an end user to reports which provide insight into their data.  All the better that our clients could do this for themselves, with relative ease.












b i u quote


Save Comment
Showing 0 Comment