Solved The Merge window shows correct results, but the Expand shows a wrong result.

pinarello

Member
Joined
Jun 21, 2019
Messages
214
Reaction score
4
Points
18
Location
Germany
Excel Version(s)
Office 365
Question posed in the german ms-office-forum.de:

In a list with many timestamps (Input worksheet), there can be one or more IDs for each timestamp. Now the questioner wants to know which ID was booked first in a list of manually entered timestamps. But if the ID "FM0" exists for a timestamp, then this ID should be reported as the result.

With a formula combination of COUNTIFS and VLOOKUP, it is no problem to get the correct result. But since the questioner asked for a VBA solution (many users don't know that very often Power Query can be used as well), I also developed a solution with Power Query.

For all timestamps only the first ID or, if available, the FM0 ID is determined and only this one is loaded as connection. In the evaluation query the ID of the timestamps is determined by merging them. The Merge window also shows the correct IDs. But as soon as I expand the result of the merge in the next step, the FM0 IDs are no longer shown.

I have tested my PQ query several times and cannot find any errors in it. Therefore I think this must be an error in Power Query.

Some weeks ago I already reported a bug: https://www.excelguru.ca/forums/sho...rt-Converter-deletes-line-feeds-automatically
 

Attachments

  • mof - SVERWEIS - Zuordnung eines bestimmten Wertes nicht möglich (PQ).xlsx
    30.8 KB · Views: 10
Last edited:
I'll admit that I'm not thrilled with this answer, but it's not classified as a bug, but more how the engine works. To get around this problem - in the step before the merge - you either need to:
  • Wrap the step in Table.Buffer()
  • Add an Index Column, then Remove the Index column (but both steps must be there)

That essentially locks in the state before the merge takes place, and the correct results should pass through.
 
Thank you very much for the very quick reply !!!

I have incorporated your suggestions into the query, but it has no effect on me.

let
Source = Excel.CurrentWorkbook(){[Name="Tabelle2"]}[Content],
Change_Types = Table.Buffer(Table.TransformColumnTypes(Source,{{"Datum", type datetime}})),
#"Added Index" = Table.AddIndexColumn(Change_Types, "Index", 0, 1, Int64.Type),
#"Removed Columns" = Table.RemoveColumns(#"Added Index",{"Index"}),
Merge_Queries = Table.NestedJoin(#"Removed Columns", {"Datum"}, Load_Input_as_connection, {"Datum start"}, "Load_Input_as_connection", JoinKind.LeftOuter),
Expand_Merge = Table.ExpandTableColumn(Merge_Queries, "Load_Input_as_connection", {"ID"}, {"ID"}),
Replace_null = Table.ReplaceValue(Expand_Merge,null,"not found",Replacer.ReplaceValue,{"ID"})
in
Replace_null
 
Now with following code it works:

let

tb_input = Table.Buffer(Load_Input_as_connection),

Source = Excel.CurrentWorkbook(){[Name="Tabelle2"]}[Content],
Change_Types = Table.Buffer(Table.TransformColumnTypes(Source,{{"Datum", type datetime}})),


#"Added Index" = Table.AddIndexColumn(Change_Types, "Index", 0, 1, Int64.Type),
#"Removed Columns" = Table.RemoveColumns(#"Added Index",{"Index"}),
// Merge_Queries = Table.NestedJoin(#"Removed Columns", {"Datum"}, Load_Input_as_connection, {"Datum start"}, "Load_Input_as_connection", JoinKind.LeftOuter),
Merge_Queries = Table.NestedJoin(#"Removed Columns", {"Datum"}, tb_input, {"Datum start"}, "tb_input", JoinKind.LeftOuter),
#"Expanded {0}" = Table.ExpandTableColumn(Merge_Queries, "tb_input", {"ID"}, {"ID"}),

Replace_null = Table.ReplaceValue(#"Expanded {0}",null,"not found",Replacer.ReplaceValue,{"ID"})
in
Replace_null
 
Yeah, you beat me to it. It looks like the other table is the one that needed the buffering. I ended up adjusting the original code sample to use this:

Code:
	Merge_Queries =
		Table.NestedJoin(
			Table.Buffer(Change_Types),
			{"Datum"},
			Table.Buffer(Load_Input_as_connection),
			{"Datum start"},
			"Load_Input_as_connection",
			JoinKind.LeftOuter
		)
 
In any case many, many thanks for your very fast help.

Maybe you can find some time to have a look at the other case.
 
Hello, Ken,

after it was running, I tested the query again and cleaned it up for unnecessary steps. The definition of the Table.Buffer is actually only necessary for the first table.

But I have merged tables so often by now and normally I never need the Table.Buffering. And in those queries where I had inserted it for testing purposes to see if it would make the queries faster, they usually became slower.

So I don't understand why this query returns wrong results when Table.Buffering is not used.

The crazy thing is that I had the same error when I loaded the first table into the folder and re-imported it for the merge instead of just as a connection.

On the other hand I also had the following experience:
I read in raw data with a query, prepared it and loaded it as a sheet into the Workbook. With another query I then imported this sheet to create an evaluation. Loading it as a table into the Workbook was not really necessary. But for testing the 2nd query it was helpful to have this data present.

So and now the interesting effect: If I made changes in the raw data table and did not call the first query, with which the raw data are read and rooted up, but the update of the 2nd query has nevertheless considered the changes of the raw data. And this although the changes were not yet visible in the loaded raw data table.

I could only prevent this effect by outsourcing the reading, processing and loading of the raw data into a separate workbook.

So today I can only say that Power Query is really a fantastic tool, but many things are still mystical for me.

But now I attach the cleaned up folder once again.
 

Attachments

  • mof - SVERWEIS - Zuordnung eines bestimmten Wertes nicht möglich (PQ).xlsx
    30.8 KB · Views: 6
The fundamental issue is that there are a number of operations which don’t preserve sort order, and these include distinct, group and join. We've brought this up with Microsoft, but it is something deep within the engine, so I don't imagine it will ever get changed. Basically, when you hit one of these, the action you need to take is either Buffer() the correct table/step, or trigger this by adding/removing an Index column (which essentially assigns values and forces the same behaviour.)

I agree that it is less than ideal, but it underscores the importance of always checking the output. :)
 
Back
Top