Some thoughts from your typical data shepherd / data plumber / data dance teacher sort of person.
So what am I going to show in the following blog posts?
So that’s what I am going to show; next, let’s look at what is required to do it.
Ingredients
First, download and install Azure Data Studio; you can download the program from here.
Once you have installed Azure Data Studio, open the application. From the ‘File’ menu, select ‘New Notebook’ (see Figure 1 below).
This will open a new notebook (yippee!!). That might not sound very exciting yet, however it is! When a new notebook opens, the Kernel must be set. The way I think about this is that it sets the language that will be run in the notebook; it defaults to SQL. What we want to run is Python 3, so from the list of available Kernels select ‘Python 3’.
Figure 2 – selecting the Kernel (programming language) that will be run in the notebook.
Once ‘Python 3’ has been selected, if Python is not already set up and installed, Azure Data Studio will prompt you to set up and configure Python for notebooks. A screen will open, as we can see in Figure 3. For this blog post I accepted the default location and clicked the ‘Install’ button.
Figure 3 – Install and configure Python for use in Azure Data Studio
If everything has gone to plan, then you should see something that looks like Figure 4.
Figure 4 – installation of Python going as planned
Installing Python can take some time, so it might be a good idea to get a hot beverage or do something else until it has finished installing.
Figure 5 – Installation of Python completed successfully
In sessionize.com it is possible to create different APIs to output data; in this example the data is output as JSON. It is also possible to select which parts of the data are output; in this example ‘All Data’ is selected. Setting up the API in sessionize.com is beyond the scope of this blog post, though it is very easy to do.
The last step is to get the URL that will be called in the code; this can be seen in Figure 6 below.
Figure 6 - API / Embed screen in sessionize.com for Data Scotland 2019.
Figure 6a (yes, I forgot to include this until a later edit) shows the columns that are output from sessionize.com for the API endpoint used.
Figure 6a - Settings for the API endpoint used in this blog post.
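Before we get into the setup, it may help to see the shape of the call we are building towards. This is a minimal sketch only; the URL is a made-up placeholder, so substitute the one from your own API / Embed screen in Figure 6.

import urllib.request
import json

# Placeholder endpoint - use the URL from your sessionize.com API / Embed screen
url = "https://sessionize.com/api/v2/YOUR_API_ID/view/All"

with urllib.request.urlopen(url) as response:
    data = json.loads(response.read().decode("utf-8"))

print(type(data))  # a dict (or list) built from the JSON payload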
OK, enough setting up; let’s write some code. To get access to other libraries in Python, the command used is import <library name>. In this example there are four libraries which are imported. If you run the code shown in Figure 7 you might get the error message shown.
Figure 7 – Error message if the package for the library being imported is not installed.
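The exact import list only appears in the screenshot, so treat the following as an assumption: for a script that calls a web API, parses JSON, and writes to SQL Server, the imports would typically include at least these.

import urllib.request  # call the sessionize.com API over HTTPS (standard library)
import json            # parse the returned JSON text into a dict (standard library)
import pyodbc          # connect to SQL Server (third-party package)

Of these, only pyodbc needs to be installed separately, which is why it is the one that triggers the ‘No module named’ style error in Figure 7.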
If you do see this error message, then all you need to do is install the required package. In Figure 7, at the top left-hand side, there is a button titled ‘Install Packages’. Click on that button and the terminal window will open (see Figure 8). The command that installs the ‘pyodbc’ library is ‘.\python.exe -m pip install pyodbc’; type the command into the terminal window and press Enter.
Figure 8 – Entering the command to install the ‘pyodbc’ package in the terminal window.
Hopefully the ‘pyodbc’ package will install without any challenges. If, like me, you are not so lucky and you get the error message shown in Figure 9, it is quite easy to fix.
Figure 9 – Error message stating that pip (Pip Installs Packages) needs to be upgraded before the ‘pyodbc’ package can be installed.

If you get the error message shown in Figure 9, enter the following command at the prompt: ‘.\python.exe -m pip install --upgrade pip’. If everything goes well you will see a message like the one shown in Figure 10.
Figure 10 – Successfully upgraded pip to v18.
Once the new version of pip has been installed, restart Azure Data Studio. Then open a notebook, select Python 3 as the kernel language, click on ‘Install Packages’, and install the ‘pyodbc’ library (see Figure 8). Once ‘pyodbc’ has been installed, it is time to run the Python script.
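As a quick sanity check that the install worked before running the full script, this one-line sketch should run without error in the notebook:

import pyodbc
print(pyodbc.version)  # prints the installed pyodbc version if the import succeeds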
The Python script will do the following:

1 - Call the API and load the JSON string returned into a dict object, which is then cast to a string object.
2 - Open a connection to a SQL Server database and run a SQL script to create the table if it does not exist.
3 - Insert the JSON string into a field in the table.
Below is the Python script that is used; much of the credit must go to the various websites which I have added as references in the script. In Figure 11 we can see the script. All that you need to change is the URL for the sessionize.com API and the user credentials in the connection string; otherwise this is the script exactly as I used it.
Figure 11 - Python script in an Azure Data Studio notebook to import JSON into SQL Server 2016
The Azure Data Studio Notebook that is shown in Figure 11 can be downloaded from here.
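If you would rather see the moving parts in text than in the screenshot, below is a minimal sketch of the three steps listed above. To be clear, this is a reconstruction under assumptions, not the exact notebook: the URL, ODBC driver, server, database, credentials, and the table and column names (SessionizeData, JsonData) are all placeholders to swap for your own.

import urllib.request
import json
import pyodbc

# Step 1 - call the API and load the returned JSON into a dict,
# then cast it back to a string so it can be stored in one column.
url = "https://sessionize.com/api/v2/YOUR_API_ID/view/All"  # placeholder URL
with urllib.request.urlopen(url) as response:
    payload = json.loads(response.read().decode("utf-8"))
json_text = json.dumps(payload)

# Step 2 - open a connection to SQL Server (placeholder connection details)
# and create the table if it does not already exist.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=localhost;DATABASE=Conference;UID=your_user;PWD=your_password"
)
cursor = conn.cursor()
cursor.execute(
    "IF NOT EXISTS (SELECT * FROM sys.tables WHERE name = 'SessionizeData') "
    "CREATE TABLE dbo.SessionizeData (JsonData NVARCHAR(MAX));"
)

# Step 3 - insert the JSON string into a field in the table.
cursor.execute("INSERT INTO dbo.SessionizeData (JsonData) VALUES (?)", json_text)
conn.commit()
conn.close()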
In the next blog post we will look at how to work with the JSON data in SQL Server.
In the previous post (see here) I set up the blank datasets as parameters storing the values of the database server and name. Whilst attempting to create a suitable dataset for a report for an internal customer, I was frustrated that I could not do it using Power Query. Yet on the other hand I knew that I could write some T-SQL that would give me the dataset I wanted very easily. So I wondered: is it possible to combine the two?

Let's return to the Sql.Database function, which has this very nice example:

Sql.Database("localhost", "Northwind", [Query = "select * from Customers"])

The last setting is "Query as text - Provide a query text that is run on the server to return values". It's almost like someone thought it would be a good idea to allow a user to run a SQL query. A few minutes later I had the query I wanted to run, and it looked something like this:
DECLARE @IssueType AS VARCHAR(10) = 'Problem';
DECLARE @ClientName AS NVARCHAR(100) = 'RainbowPainters';

SELECT DATEPART(MONTH, [Issue_Created]) AS Month_Number
      ,DATENAME(MONTH, [Issue_Created]) AS Month_Name
      ,DATENAME(YEAR, [Issue_Created]) AS Year_Name
      ,COUNT(DATEPART(MONTH, [Issue_Created])) AS Total
FROM [dbo].[IssueData] AS ID
WHERE [Issue_type_name] = @IssueType
  AND [ClientName] = @ClientName
GROUP BY DATEPART(MONTH, [Issue_Created])
        ,DATENAME(MONTH, [Issue_Created])
        ,DATENAME(YEAR, [Issue_Created]);

Next I went back to the dataset, opened the 'Advanced Editor' and added the query; you can see it in Figure 1 below.

Figure 1 - Advanced Editor window with the new query in place
To make it a bit easier to read, below is the actual text from the 'Advanced Editor' window above. All that has been added is the Query option and the query text it contains.
let
    Source = Sql.Database(DatabaseServer, DatabaseName,
        [Query = "DECLARE @IssueType AS VARCHAR(10) = 'Problem';
                  DECLARE @ClientName AS NVARCHAR(100) = 'RainbowPainters';
                  SELECT DATEPART(MONTH, [Issue_Created]) AS Month_Number
                        ,DATENAME(MONTH, [Issue_Created]) AS Month_Name
                        ,DATENAME(YEAR, [Issue_Created]) AS Year_Name
                        ,COUNT(DATEPART(MONTH, [Issue_Created])) AS Total
                  FROM [dbo].[IssueData] AS ID
                  WHERE [Issue_type_name] = @IssueType
                    AND [ClientName] = @ClientName
                  GROUP BY DATEPART(MONTH, [Issue_Created])
                          ,DATENAME(MONTH, [Issue_Created])
                          ,DATENAME(YEAR, [Issue_Created]);"])
in
    Source
When I refreshed the dataset, the data appeared :-) Job done? No; as often happens, one client gets something and then everybody wants what they have. So I was asked if we could create the same set of reports for another customer. There were two choices at this point: I could just change the hard-coded client name, or I could parameterise the client name. Having heard of this before, I knew about passing parameters from Excel to filter datasets; see the following posts:

https://blog.oraylis.de/2013/05/using-dynamic-parameter-values-in-power-query-queries/
http://blog.crossjoin.co.uk/2015/11/24/using-parameter-tables-to-control-data-refresh-in-power-bi/

Having looked at the blog posts, which declare and then use parameters, this looked like a simple concept. Kudos to the Power BI team for that. The first step was to add an additional dataset as a parameter. So I now had three datasets as parameters, the latest one being 'ClientName', which stores the client name.
Figure 2 - List of datasets containing parameters

So, having looked at the blog posts above, I realised that I just needed to declare a variable to hold the query text and pass the ClientName parameter into it. Following some trial and error, I ended up with the query text below.

let
    pQuery = "DECLARE @IssueType AS VARCHAR(10) = 'Problem';
              DECLARE @ClientName AS NVARCHAR(100) = '" & ClientName & "';
              SELECT DATEPART(MONTH, [Issue_Created]) AS Month_Number
                    ,DATENAME(MONTH, [Issue_Created]) AS Month_Name
                    ,DATENAME(YEAR, [Issue_Created]) AS Year_Name
                    ,COUNT(DATEPART(MONTH, [Issue_Created])) AS Total
              FROM [dbo].[IssueData] AS ID
              WHERE [Issue_type_name] = @IssueType
                AND [ClientName] = @ClientName
              GROUP BY DATEPART(MONTH, [Issue_Created])
                      ,DATENAME(MONTH, [Issue_Created])
                      ,DATENAME(YEAR, [Issue_Created]);",
    Source = Sql.Database(DatabaseServer, DatabaseName, [Query = pQuery])
in
    Source
Figure 3 - Showing the query text in the Advanced Editor window

Much to my delight, this syntax worked as expected. So I now have a way to pass parameters to SQL Server. From reading about M (the Power Query Formula Language) there is clearly more to learn. For the moment I am happy with the new-found knowledge, whilst I play with some of the other technologies offered by Power BI. The only downside is that this only works in Power BI Desktop :-(