referring to a PQ query in SQL Statement field for purpose of limiting records

vtoaso

New member
Joined
Sep 11, 2016
Messages
7
Reaction score
0
Points
0
This may be a tricky one - I have a need when creating a sql statement "from database" in the PowerQuery ribbon. I have a query in PowerQuery from a flat file, it has a few hundred persons records with a unique ID. I want to bring in corresponding salary information (and perhaps other info) for these persons into the data model I'm attempting from a SQL server, however, the table contains 600,000 + records; I dont want them all for this application and would like to limit records to the few hundred distinct person ID's from my flat file. Is there a way to refer to the unique ID's in the WHERE clause of the SQL Statement field (via Excel 2013 PowerQuery > From Database > From SQL Server Database > Advanced Options > SQL Statement ) ? I'm trying to find a smarter way than just copying the unique ID's, adding commas and parenthesizes and pasting into the query window. This use pattern will come up again.

I have ken plus's book M is for Monkey but I am unable to find this sort of issue or an example in the book. Below is a micro example of what Im attempting. Thank you all in advance

ScreenShot060117.JPG
 
Well..

If you skip the SQL screen and just build the whole thing through the UI, the query should fold and essentially do (almost) as well as writing the query from scratch. You can see the query by clicking on "View Native Query" in the applied steps window.

Alternately, what you really need here is SQL help to write your query. I'm thinking that you're looking SELECT DISTINCT User or something but, to be honest I use Power Query to do this as I don't know the correct SQL.

PS... Microsoft themselves recommend you use the Power Query UI, and not this dialog as it prevents further query folding.
 
Thank you for the response Ken, pardon I misspoke your name, Puls, not Plus...

When you say skip the SQL screen I assume you mean proceed through the generic workflow and load the entire dataset, then fold or merge. The issue is the volume of the dataset, its actually more than the 600k I mentioned. Ideally Id like to constrain results before loading the entire table for performance and the office is on 32 bit Excel (trying to change that). You mentioned advanced editor and M - is it possible to nest M within SQL on the SQL screen, have a hybrid Query?

As to your second suggestion, as a developer I cant see how SELECT Distinct would aid, the new data set I'm trying to bring in is transactional and there are many records for each person, the syntactic method I considered was to cut and paste the Person unique key from the brief, loaded data set into in the WHERE clause of the new query but I wanted to see if there was an easier way via PowerQuery, and honestly, I want to keep my PQ chops up while introducing PQ to a power user in the office and they wanted to achieve this as well.

Interesting about how Microsoft wants to direct intended use, I get that. Wonder if the volume issue has been an issue for other users or they just get clever about constraints and the WHERE clause.
 
Last edited:
All good, I've been called many things. ;)

So here's the deal... Power Query NEVER brings in the entire data set from a real database unless you ask it to. The window you look at is actually a preview window, which is why large data sets always say that there is 999+ rows. As you build the commands using the user interface in Power Query, it folds those up into a SQL statement. It's that statement that is sent, bringing back the most efficient data set. So if you connect, then filter out all rows except Dept 150, it rolls up that SQL statement and sends back "SELECT * WHERE 'DEPT'=150". It does not do a select all and then filter it locally.

The moment you provide your own SQL statement, all query folding stops. So having a hybrid M/SQL environment is possible, but it will bring in all the SQL data into Excel, then run the remaining M commands on that.

To be fair, not every command can be folded. At some point, the folding may collapse on you, and there's no real way to check it short of having something like SQL Profiler running on the server.

The best we can see is that when you connect to the database (not using a SQL query) then issue the commands, you'll see a "View Native Query" button. So long at that is lit up, query folding was active for that step. If it's not... the query folding MAY have broken.

Hope that helps,
 
This is very helpful Ken, thank you for the insight as always.
 
All good, I've been called many things. ;)

So here's the deal... Power Query NEVER brings in the entire data set from a real database unless you ask it to. The window you look at is actually a preview window, which is why large data sets always say that there is 999+ rows. As you build the commands using the user interface in Power Query, it folds those up into a SQL statement. It's that statement that is sent, bringing back the most efficient data set. So if you connect, then filter out all rows except Dept 150, it rolls up that SQL statement and sends back "SELECT * WHERE 'DEPT'=150". It does not do a select all and then filter it locally.

The moment you provide your own SQL statement, all query folding stops. So having a hybrid M/SQL environment is possible, but it will bring in all the SQL data into Excel, then run the remaining M commands on that.

To be fair, not every command can be folded. At some point, the folding may collapse on you, and there's no real way to check it short of having something like SQL Profiler running on the server.

The best we can see is that when you connect to the database (not using a SQL query) then issue the commands, you'll see a "View Native Query" button. So long at that is lit up, query folding was active for that step. If it's not... the query folding MAY have broken.

Hope that helps,

Something I noticed yesterday which is interesting.

You can MERGE two (maybe more) folding queries (on the same database) and that results in its own folded query - which is an actual join combining SQL from both queries.

So it appears that if the query you configure ends at the point where "View Native Query" is still available, any further uses of that query (by Reference, in MERGES, filters etc) takes the folding further.

Clever shit.
 
Back
Top