Power Query + Dynamic SQL WHERE IN

SteelReyn

New member
Joined
Mar 1, 2016
Messages
32
Reaction score
0
Points
0
This is really two parts: 1 on whether or not what I'm trying to do is a good approach, and 2 on how to do it.

For part 1, I have a variable list of units (changes based on what the user needs to check) for which I need to pull the status of each from an Oracle database. The most direct route seemed to be:


  1. Pull the user's list into PQ using From Table and create a connection only
  2. Query the Oracle database table with a connection only (SELECT unit, status FROM table)
  3. Merge the two queries based on the unit field to filter the values I get from the Oracle table

The problem is that the table I'm querying in Oracle is about 40 million rows, and attempting to merge the two queries takes so long that I eventually cancel out of it. Even though I might only want 5 rows, I'm assuming the approach above is forcing PQ to do the matching after pulling in the data, instead of letting the server handle it. As a comparison, when I write a simple query like SELECT unit, status FROM table WHERE unit IN (1, 2, 3, 4, 5), I get those results instantly.

Code:
let
    Source = Oracle.Database("DATABASE", [Query="SELECT unit, status FROM table WHERE unit IN (1, 2, 3, 4, 5)"])
in
    Source

My thought, then, is to create a dynamic variable to generate my IN list and feed that to the SQL in PQ, bypassing the need to merge two different queries. I was able to create the unit parameter so that I have a "query" in PQ that is nothing but my IN list, and then modify the query above like so:

Code:
let
    Source = Oracle.Database("DATABASE", [Query="SELECT unit, status FROM table WHERE unit IN (" & UNIT_PARAM & ")"])
in
    Source

This is where I hit part 2, the how. The syntax works fine (I checked it with a hard-coded parameter instead of from another query - success!), but this introduced me to the "Please rebuild this data combination" error Ken mentions here (excelguru.ca/blog/2015/03/11/power-query-errors-please-rebuild-this-data-combination). I don't know if what I'm trying to do is possible, or if I don't understand the blog post well enough to adapt it to this situation. I tried variations of the following but got the same error:

Code:
let
    Source = UNITS_PARAM,
    Query = Oracle.Database("DATABASE", [Query="SELECT unit, status FROM table WHERE unit IN (" & Source & ")"])
in
    Query

Any help is appreciated. Thanks!
 
Hey SteelReyn,

It can be done, but I'm not totally thrilled with the method... I've attached an example that is dynamically passing parameters into the SQL query, but I don't have an Oracle database that I can test on. Instead I've connected to an Azure hosted SQL database (the one we provide for readers of M is for Data Monkey). So if you have the book, you'll be able to log in, toggle the field value and prove that it works. :)

The main issue here is this:

In order to avoid the formula firewall error, we have to connect to the data sources using "connection only" queries, then reference those from a master query in order to combine them. Any attempt to connect directly to a data source while pulling in data from another query is what's going to blow us apart. So far as I can tell, the only way to execute a SQL query against the database using Power Query is to pass it at the time the connection to the database is made. In short, we can't properly stage it to avoid the formula firewall issue as we'll ALWAYS need to connect to multiple data sources at once (the parameter and the actual database.)

Here's the parameterized version of a query to connect to a Sql database (and I'm sure Oracle is similar):

Code:
Source = Sql.Database(dbURL,dbName,[Query=dbQuery]),

If you see here, the dbQuery is the actual query, and seems to be required to be with the initial connection. I tried several things to break it apart, and couldn't make it work.

The good news, however is that we CAN still make this work. The bad news is that to do it we need to turn off the Privacy settings. I hate doing this - it feels wrong - but it will let you connect and refresh the query without hassling you at that point. To do this go to Query Options --> Privacy and select "Ignore the Privacy Levels and Potentially Improve Performance".

Hope this helps!
 

Attachments

  • DynamicSQL.xlsx
    49.6 KB · Views: 2,600
Thank you, Ken! First, I followed exactly what you laid out, and it worked. Documenting a couple exceptions and things I've learned in case it helps anybody.

The good news: I changed my overall query setup to mirror your format and still got the Firewall error as expected. Then I changed the Privacy setting and I could refresh the data.

However, I still couldn't do a refresh all. Everything would refresh except the query pulling from my Oracle database, which would give me the error/warning: "Permission is required to run this native database query," followed by an Edit Permission button. Clicked the Edit button and then had the option to Run the query anyway, but I wouldn't want to do this every time. To avoid it, I went to Query Options -> Security and unchecked "Require user approval for new native database queries." Yikes, but now I can refresh all.

With both of those settings change, I also tried going back and simplifying the query. I went from this:

Code:
let
    // Get the parameter
    myUnits = UNITS_PARAM,

    // Create the SQL
    myQuery = "SELECT unit, status FROM table WHERE unit IN (" & myUnits & ")",
    
    // Query values
    Source = Oracle.Database("dbName", [Query=myQuery])
in
    Source

to this:

Code:
let
    Source = Oracle.Database("dbName", [Query="SELECT unit, status FROM table WHERE unit IN (" & UNITS_PARAM & ")""])
in
    Source

Both options work equally well. So the settings we had to change are the key, more than building the query string separately. I don't like having to uncheck either of those settings, but this method works, and is much faster than letting PQ pull all of the data before doing a merge, at least in this case. I went from having to cancel out of the merge method before getting results, to near-instant results with the dynamic SQL.

As long as I know I'm not writing to the db or creating absurdly long-running queries, is there any danger I'm overlooking with this method? And is the Firewall error that started this a "feature" to keep me from messing myself up, or a hindrance? It would be nice to have something like an order of operations for PQ queries, because conceptually there is nothing difficult or tricky here. Pull the values from the workbook, get a parameter from them, and feed it to SQL.

Thanks again! Looking forward to reading the book. It's finally up next on my training to-do list.
 
I didn't mention this, but your question inspired me to finally write up a blog post on this topic. In there I remembered the bit about native database queries.

The thing with the native database query is that each time you "trust" it, it adds it to a list inside a file stored in your user settings folder. So every time you change the parameter, if you haven't pre-approved it, it prompts you. If you've approved that specific query before, then it shouldn't ask. But that's irritating.

So yes, it's the settings that are the key here.

This article talks about the privacy settings,

Honestly, I haven't tried issuing an UPDATE or DELETE query via Power Query. For some reason, I didn't think that they'd work. But if you're only using SELECT queries, then I can't see this being an issue.
 
This is all kinds of awesome. I can check "inspired blog post" off my list, and I have a validation tool that can handle more records in less time and return more information than the tool I had before. I've been bouncing off the walls all day. My manager said "It's sad what excites us." :)

My name's Kyle by the way. Thanks for the help.
 
Back
Top