Solved A tablejoin shows a wrong record after opening it

pinarello · Sep 25, 2019

Hello,

in a German forum, a questioner is looking for a VBA solution. But the problem could be solved very easily with Power Query, if there wouldn't be such a stupid error.

The spreadsheet "Target" contains a list with the key values "Text1" and "Text2", for which various data should be transferred from the spreadsheet "Source". If there are several values for the key combination Text1/Text2, the data of the most recent entry (column "Date") should be transferred.

So really a small thing for Power Query, if there wouldn't be this error.

I loaded the data of the spreadsheet "Source" into the connection "Table5" and filtered it to the values DE0010 and GEH209 (Text1). There are 2 entries for DE0010. The newest one shows the date 23.09.2019 (Sep 23). Where I have only set the filter so that the error becomes directly visible.

If I now form a join from "Merge1" (Table2 and Table3 of "Target") and "Table5", then the preview window of the query "Merge2" also shows me exactly the right data set to "DE0010", with the date 23.09.2019. in the step "Source". But if I now open the table in the next step "Expanded (0)", then the data of the record with the older date (July 25) are shown to me, which is not included in the connection "Table5".

Only if I go the detour to load the connection "Table5" as table, then load this as connection, and then in the query "Merge2" for the Join assign the new connection, the correct record is shown to me.
View attachment mof - 2 Tabellenblätter mit jeweils 2 Spalten vergleichen und kopieren (PQ) - en.xlsx

cyborgski · Sep 25, 2019

Hi pinarello,

You have a Table.Sort operation in table5. PQ will optimize that step out depending on the conditions under which it is running. If you buffer the sort, then PQ will not be able to optimize out the line and the merge will work correctly.

Code:

let
    Source = Excel.CurrentWorkbook(){[Name="Tabelle5"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Text1", type text}, {"Spalte24", type any}, {"Spalte25", type any}, {"Text2", type text}, {"Spalte26", type any}, {"Spalte27", type any}, {"Spalte28", type any}, {"Spalte29", type any}, {"Spalte30", type any}, {"Spalte31", type any}, {"Spalte32", type any}, {"Spalte33", type any}, {"Spalte34", type any}, {"Spalte35", type any}, {"Spalte36", type any}, {"Spalte37", type any}, {"Spalte38", type any}, {"Spalte39", type any}, {"Spalte40", type any}, {"Spalte41", type any}, {"Date", type datetime}, {"Spalte1", type any}, {"Spalte2", type any}, {"Spalte3", type any}, {"Spalte4", type any}, {"Spalte5", type any}, {"Spalte6", type any}, {"Spalte7", type any}, {"Spalte8", type any}, {"Spalte9", type any}, {"Spalte10", type any}, {"Spalte11", type any}, {"Spalte12", type any}, {"Spalte13", type any}, {"Spalte14", type any}, {"Spalte15", type any}, {"Spalte16", type any}, {"Spalte17", type any}, {"Spalte18", type any}, {"Spalte19", type any}, {"Spalte20", type any}, {"Spalte21", type any}, {"Spalte22", type any}, {"pA", type text}, {"Spalte23", type any}, {"Text3", type text}}),
    #"Removed Columns" = Table.RemoveColumns(#"Changed Type",{"Spalte24", "Spalte25", "Spalte26", "Spalte27", "Spalte28", "Spalte29", "Spalte30", "Spalte31", "Spalte32", "Spalte33", "Spalte34", "Spalte35", "Spalte36", "Spalte37", "Spalte38", "Spalte39", "Spalte40", "Spalte41", "Spalte1", "Spalte2", "Spalte3", "Spalte4", "Spalte5", "Spalte6", "Spalte7", "Spalte8", "Spalte9", "Spalte10", "Spalte11", "Spalte12", "Spalte13", "Spalte14", "Spalte15", "Spalte16", "Spalte17", "Spalte18", "Spalte19", "Spalte20", "Spalte21", "Spalte22", "Spalte23"}),
    #"Filtered Rows" = Table.SelectRows(#"Removed Columns", each ([Text1] <> null)),
    #"Sorted Rows" = Table.Sort(#"Filtered Rows",{{"Text1", Order.Ascending}, {"Text2", Order.Ascending}, {"Date", Order.Descending}}),
[B]    BufferSort = Table.Buffer( #"Sorted Rows"),[/B]
    #"Changed Type1" = Table.TransformColumnTypes(BufferSort,{{"Date", type date}}),
    #"Grouped Rows" = Table.Group(#"Changed Type1", {"Text1", "Text2"}, {{"Anzahl", each _, type table [Text1=text, Text2=text, Datum=date, pA=text, Text3=text]}}),
    #"Added Custom" = Table.AddColumn(#"Grouped Rows", "Index 1", each Table.AddIndexColumn([Anzahl], "Index1", 1, 1)),
    #"Expanded {0}" = Table.ExpandTableColumn(#"Added Custom", "Index 1", {"Text1", "Text2", "Date", "pA", "Text3", "Index1"}, {"Text1.1", "Text2.1", "Date", "pA", "Text3", "Index1"}),
    #"Filtered Rows1" = Table.SelectRows(#"Expanded {0}", each ([Index1] = 1)),
    #"Filtered Rows2" = Table.SelectRows(#"Filtered Rows1", each Text.StartsWith([Text1], "DE00") or Text.StartsWith([Text1], "GEH2")),
    #"Removed Columns1" = Table.RemoveColumns(#"Filtered Rows2",{"Anzahl", "Text1", "Text2", "Index1"})
in
    #"Removed Columns1"

Regards,
Mike

pinarello · Sep 25, 2019

Hello, Mike,

Many thanks for the quick and successful help.
So far I was of the opinion that buffering should mainly reduce processing times.
Now I notice that Microsoft has implemented some pitfalls in PQ that can be bypassed with buffering.

Let's see what else awaits me.

By the way, this week I noticed that PQ in Power BI can import PDF documents very well. But PQ in Excel does not yet offer the possibility to import PDF documents.

cyborgski · Sep 25, 2019

Hi pinarello,

You’re welcome.

I would not call the behavior a pitfall. Sorting is costly and USUALLY not necessary to correctly execute an operation. Therefore, the designers decided to optimize out all but the last sort operation. SQL compilers have been doing this same optimization for years. At least PQ offers the Buffer operation as a workaround. My beef with the development team is that the documentation should warn about the optimization in Table.Sort so that we don’t have to struggle with why results are not coming in as expected.

Regarding your PDF observation, Microsoft seems to develop first for PowerBI then Excel PQ (some features may never get into PQ). I presume this setup is because Microsoft wants people to pay to use the PowerBI tool. I don’t begrudge them for wanting to sell products. But I think they are missing on the fact that PowerQuery can do so much more than Business Intelligence. For example, I work in accounting and have eliminated hundreds of hours by automating journal entry and reconciliation processes. In addition to the time savings, the processes have much better controls because these steps have been automated and not performed by a user. And the tool is self-documenting (just read the steps). The auditors love the tool because they can replicate the process by hitting refresh. I am happy for the features that Microsoft brings over, but wish it would change its strategy to help those of us using the tool to transform our organizations. Because I do have a few pdf reports that would make great automated journals.

Regards,
Mike

pinarello · Sep 25, 2019

Hello, Mike,

Thank you very much for your detailed answer.

I wouldn't know how I could aggregate the most recent or oldest entry from a reference list with several entries to a key combination without sorting the table, insert an index into the aggregation and then filter the records with index value 0 or 1 at the end.

On the whole, I also think Power Query is great and if you understand the way it works, it's very easy to learn.

But to the many possible special functions, which can be captured directly as M-code in the extended editor, there is an online help for example, but the examples all show only fixed value assignments. Maybe it's because of my stupidity, but at the moment I'm not able to adapt the mentioned examples for concrete questions to the respective needs.

Even worse, I don't realize how useful this function could be from the description of a function and the examples that don't help me.

The examples in the normal Excel help are not always perfect, but they are usually helpful.

Regards,
Pinarello

cyborgski · Sep 25, 2019

Hi Pinarello,

Oh I use the Sorting technique that you mention all the time. That's why I knew the issue that you were having. And the Microsoft help is terrible, so don't feel stupid.

I have had to read a lot of blogs to pick things up. Ken Puhls, Bill Szysz, Chris Webb and Imke Feldman are some of my favorites. Ben Gribaudo has a great primer if you really want to get into the M code. I read that one at least five times. And of course, this board is amazing. The people here have great challenges that help hone my skill-set.

BTW, instead of adding an index and selecting 0, you can also use Table.First function. You could then change to ExpandRecordColumn (which seems a little faster than ExpandTableColumn in my experience).

Code:

 #"Grouped Rows" = Table.Group(#"Filtered Rows", {"Text1", "Text2"}, {{"Tab", each _, type table}}),
    AddFirstRecord = Table.AddColumn(#"Grouped Rows", "FirstRecord", each [B]Table[/B].[B]First[/B](Table.Buffer(Table.Sort([Tab],{"Date", Order.Ascending}))), type [B]record[/B]),
    #"Expanded Tab" = Table.[B]ExpandRecordColumn[/B](AddFirstRecord, "FirstRecord", {"Date", "pA", "Text3"}, {"Date", "pA", "Text3"})

Regards,
Mike

pinarello · Sep 25, 2019

Hello, Mike,

Thank you so much for your pep talk. Yes, the Table.First feature is just as good at solving the problem.

I already know almost all of the names you gave me and on Chris Webb's blog I found some helpful information.

By the way, I only knew the previously used method because somewhere here, some time ago, the question for a Countif for Power Query was asked. Then I developed a working but very slow solution and introduced it. Combined with the question why line accesses using "Index" are so slow.

I was then informed that access via "Index" is not yet one of the strengths of Power Query and that several very good alternatives have been mentioned.

But overall this corresponds to my experience that I have gained much more than I have given by dealing with questions and working out solutions, at first almost always very primitive. Because other, better solutions are usually easier to understand if you have also dealt with the question yourself. My benefit is therefore much greater than the investment I have made.

Regards,
Pinarello

Solved A tablejoin shows a wrong record after opening it

pinarello

Member

cyborgski

New member

pinarello

Member

cyborgski

New member

pinarello

Member

cyborgski

New member

pinarello

Member