Solved Count if in Power BI

pinarello

Hello,

On mrexcel.com/forum, the question "Countif in Power BI" was asked recently. Since I found the question interesting, I tried to find a solution and succeeded.

When I applied it to 11,000 rows, the query ran for more than 7 minutes. By using Table.Buffer I could reduce the runtime to about 50 seconds, but this is still much too slow.

To solve the problem, I had to use indexes, and I suspect that the long runtime is related to this, because another query in which I work with an index also performs very badly.

However, I have only been working with Power Query since last December and don't know, for example, when and how Table.Buffer is best used.

But the great solutions posted for Challenge 5 make me confident that there is a good and fast solution for this problem as well.
 

Attachments

  • Excel PQ - COUNTIF in Power BI 5000 lines.xlsx
    118.4 KB
Hi pinarello,

I don't think indexing is the issue, but rather the way you reference the prior record in the Userdef_Col step:

[ColA] <> Index_3[ColA]{[#"Index-3"]-1}

In my experience, this technique is always slow. You might get better performance by merging and expanding to get the prior record.

Alternatively, you can group on ColA, add an index, expand, and re-sort to avoid referencing the prior record. The query below runs in a couple of seconds.

Regards,
Mike

let
    Source = Excel.CurrentWorkbook(){[Name="Tabelle1"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"ColA", type text}}),
    #"Added Index" = Table.AddIndexColumn(#"Changed Type", "OriginalOrder", 1, 1),
    #"Grouped Rows" = Table.Group(#"Added Index", {"ColA"}, {{"Temp", each _, type table [ColA=text, OriginalOrder=number]}}),
    Transform = Table.TransformColumns(#"Grouped Rows",{{"Temp", each Table.AddIndexColumn(_, "ColB", 1, 1), type table}}),
    #"Expanded Temp" = Table.ExpandTableColumn(Transform, "Temp", {"OriginalOrder", "ColB"}, {"OriginalOrder", "ColB"}),
    #"Sorted Rows" = Table.Sort(#"Expanded Temp",{{"OriginalOrder", Order.Ascending}}),
    #"Removed Columns" = Table.RemoveColumns(#"Sorted Rows",{"OriginalOrder"})
in
    #"Removed Columns"
 
I have subsequently modified your code to show the self-merge technique for referencing the prior row. This query also runs in seconds.

let
    Source = Excel.CurrentWorkbook(){[Name="Tabelle1"]}[Content],
    Index_1 = Table.AddIndexColumn(Source, "Index", 0, 1),
    Sorted_Rows = Table.Sort(Index_1,{{"ColA", Order.Ascending}, {"Index", Order.Ascending}}),
    Index_2 = Table.AddIndexColumn(Sorted_Rows, "Index-2", 1, 1),
    Index_3 = Table.AddIndexColumn(Index_2, "Index-3", 0, 1),
    #"Merged Queries" = Table.NestedJoin(Index_3, {"ColA", "Index-3"}, Index_3, {"ColA", "Index-2"}, "Index_3", JoinKind.LeftOuter),
    #"Expanded Index_3" = Table.ExpandTableColumn(#"Merged Queries", "Index_3", {"ColA"}, {"PriorRow.ColA"}),
    AddCountFrom = Table.AddColumn(#"Expanded Index_3", "CountFrom", each if [PriorRow.ColA] = null then [#"Index-2"] else null),
    #"Filled Down" = Table.FillDown(AddCountFrom,{"CountFrom"}),
    #"Added Custom" = Table.AddColumn(#"Filled Down", "ColB", each [#"Index-2"] - [CountFrom] + 1),
    #"Sorted Rows" = Table.Sort(#"Added Custom",{{"Index", Order.Ascending}}),
    #"Removed Other Columns" = Table.SelectColumns(#"Sorted Rows",{"ColA", "ColB"})
in
    #"Removed Other Columns"
 
Hello, Mike,

First of all, many thanks for your two fantastic solutions. I took a close look at them and then ran them step by step to understand them.

With my current state of knowledge, I would never have come up with the idea of solving the question in this way.

With a data volume of 120,000 table rows, both are almost equally fast, with a runtime of 1-2 seconds; I can't tell which one is faster. If I test with 960,000 table rows, your first solution takes about 12 seconds and the second about 40 seconds.

By using Table.Buffer I can bring the first one down to 10.5 seconds, while on the second it has a negative effect. Though I must confess I am still not clear on when and how Table.Buffer is used most effectively.
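For example, one possible placement (I am only guessing where the buffer helps most) is around the indexed table in your first query, just before the grouping step, so that the grouping works on a materialized table:

Code:
#"Added Index" = Table.Buffer(Table.AddIndexColumn(#"Changed Type", "OriginalOrder", 1, 1)),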

The German-language reference book from 2018 that I bought three months ago is helpful at the beginning, but as soon as the questions become more complex, it doesn't help. And I can hardly use the Microsoft M reference (in English) at the moment, because in my opinion its examples can hardly be adapted to one's own questions.

In this respect, I think it was a good decision to become active on Excelguru.

By the way, Pinarello was the brand of my second racing bike, which I rode from 1983 to 1988. Since the end of 2007 I have been riding a full-carbon racing bike weighing about 7 kilograms.


Best regards,
Pinarello
 
Many thanks to all of you.


I already gave very positive feedback yesterday, but my answer still has to be approved by the administrator.

If yesterday's answer is not released, I will probably have to reformulate it.



Addendum: Funny, today's answer went straight through. Maybe yesterday I happened to use a keyword that is on the moderation index.
 
Hi pinarello,


Glad the post helped you.


I have found Table.Buffer (and List.Buffer) to be useful when calling the same table over and over again in a column calculation or in List.Generate/List.Accumulate. And I have always found that .Buffer hurts my NestedJoins.
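As a minimal sketch of that idea (hypothetical column name "TotalCount"; the rest follows the sample workbook): buffer the column once, and the per-row calculation then scans the in-memory list instead of potentially re-evaluating the source for every row.

Code:
let
    Source = Excel.CurrentWorkbook(){[Name="Tabelle1"]}[Content],
    // Materialize ColA once; without the buffer, each row's
    // calculation may trigger a re-evaluation of the source.
    BufferedColA = List.Buffer(Source[ColA]),
    // Count how often each row's ColA value appears in the whole column.
    Counted = Table.AddColumn(Source, "TotalCount",
        each List.Count(List.Select(BufferedColA, (x) => x = [ColA])))
in
    Counted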


You may find better performance by putting a key on your Index_3 step to be used in the NestedJoin step. But this is no guarantee of improvement either.


Index_3 = Table.AddKey(Table.AddIndexColumn(Index_2, "Index-3", 0, 1),"Index-3", true),


The rule that I follow most in PQ is "if it helps then keep it, otherwise lose it".


Regards,
Mike
 
Hi mike,

when I replace the Index_3 code, I get the error message: It is not possible to convert the value "Index-3" into type List.

Regards,
Pinarello
 
Sorry, I forgot the braces around "Index-3":
Index_3 = Table.AddKey(Table.AddIndexColumn(Index_2, "Index-3", 0, 1),{"Index-3"}, true),
 
Using AddKey, your self-merge solution took only about 11 seconds for 960,000 rows yesterday. Incomprehensibly to me, the same variant today needs between 35 and 40 seconds (several attempts).

But I noticed something else.

In my workbook I have your two queries, the query from citizenbh (mrexcel.com/forum, link at the beginning of the thread), which he posted after my positive experiences here, and my own (slow) query.

And the mere presence of my slow query, even if I don't open or refresh it, causes Task Manager to show a constant CPU load of 30-50 percent for Excel (with 960,000 input rows).

Only after I commented out my query and refreshed once did Excel's CPU load drop to 0.1 percent.

Do you have any idea why this might be?
 
Hi Pinarello,

I have had similar issues to what you are describing when I have multiple resource-intensive queries in an Excel workbook. I get better performance by turning off background refresh. You may have to save, close, and quit Excel, then reopen the workbook before seeing the benefit.

Attachments

  • QueryProperties1.PNG
  • QueryProperties2.PNG

I think I responded to the wrong post. I am not sure how to fix that issue.

Regards,
Mike
 
Hi, Mike,

Thank you very much for your quick answer, which I also tested immediately.

If I deactivate background refresh, the refresh runs synchronously, and at the bottom there is no longer any way to cancel it. All I can do is close Excel.

But then I deactivated the "Refresh this connection on Refresh All" checkbox.

And actually I don't need my ineffective query any more; it only shows how I found a solution with my level of knowledge at the time. But if I hadn't tested with a larger amount of data back then, I wouldn't have found out how bad my solution was, and wouldn't have learned so many new things.

Thank you so much for sharing your knowledge with me!

Many greetings,
Pinarello
 
Old thread, I know; I'm just going through these old threads for practice.
I try to answer the question before looking at other responses, and only then look at the others.
So this is a quite different approach, but it actually seems quite fast, especially with the Buffer line included:
Code:
let
    Source = Excel.CurrentWorkbook(){[Name="Tabelle1"]}[Content],
    ChangedType = Table.TransformColumnTypes(Source,{{"ColA", type text}}),
    Custom1 = List.Buffer(ChangedType[ColA]),
    AddedIndex = Table.AddIndexColumn(ChangedType, "Index", 1, 1),
    AddedCustom = Table.AddColumn(AddedIndex, "Custom", each List.FirstN(Custom1,[Index])),
    AddedCustom1 = Table.AddColumn(AddedCustom, "Custom.1", each List.Count(List.PositionOf([Custom],[ColA],2))),
    TheEnd = Table.RemoveColumns(AddedCustom1,{"Index", "Custom"})
in
    TheEnd
 
Hello p45cal,

this is a very nice solution, but unfortunately much too slow.

In a test with 500 table rows, the response time is not noticeable. But when I ran it against the table with 960,000 rows, I stopped it after 25 minutes; at that point, only 63,000 rows had been calculated.
 
but unfortunately much too slow
Quite so. The 'group on ColA, add an index, expand and resort' method seems to be the most straightforward and fast.
 