Merge Data Based on Two Columns

This past weekend I attended SQL Saturday in Portland, OR.  While I was there, I attended Reza Rad’s session on Advanced Data Transformations with Power Query.  During that session, Reza showed a cool trick to merge data based on two columns through the user interface… without concatenating the columns first.

The Issue

Assume for a second that we have data that looks like this:

SNAGHTML64de019

There’s two tables, and we want to join the account name to the transaction.  The problem is that the unique key to join these two tables (which isn’t super obvious here) is a combination of the Acct and Dept fields.  (Elsewhere in the data the same account exists in multiple departments.

To get started, I created two connection only queries, one to each table.

  • Select a cell in the left table (Transactions) –> create a new query –> From Table –> Close & Load To… Connection only
  • Select a cell in the right table (COA) –> create a new query –> From Table –> Close & Load To… Connection only

My Original Approach

Now, with both of those created, I want to merge the data so I get the account name on each row of the Transactions table.  So how…?

Originally I would have edited each query, selected the Acct and Dept columns, and merged the two columns together, probably separating them with a custom delimiter.  (This can be done via the Merge command on the Transform or the Add Column tab.)

Essentially, by concatenating the columns, I end up with a single column that I can use to dictate the matches.

Reza’s presentation showed that this isn’t actually necessary, and I don’t need to merge those columns at all…

Merge Data Based on Two Columns

So here’s how we can get those records from the COA Table into the Transactions table:

  • Right click the Transactions query in the Workbook Queries pane
  • Choose Merge
  • Select the COA query

The data now looks like this, asking for us to select the column(s) we wish to use for the merge:

SNAGHTML70855ae

So here’s the secret:

  • Under Transactions, click the Acct column
  • Hold down the CTRL key
  • Click the Dept column

And Power Query indicates the order of the columns you selected.  It will essentially use this as a temporary concatenated value!

image

So now do the same to the COA table:

image

And then complete the merge.  As you can see, you get a new column of data in your query:

image

of course, we can expand NewColumn to get just the Name field, and everything is working perfectly!

image

End Thoughts

This is pretty cool, although not super discoverable.  The really nice piece here is that it can save you the work of creating extra columns if you only need them to merge your data.

I should also mention that Reza showed this trick in Power BI Desktop, not Excel.  But because it’s Power Query dealing with the data in both, it works in both.  How cool is that?

21 thoughts on “Merge Data Based on Two Columns

  1. Ken,
    I found this by neccessity a couple of weeks ago. I thnk it will work for more than two columns.

  2. *Very* cool. Deserves to be much more discoverable. (slight edit: in '"To Get Started..." paragraph, isn't the COA the *right* table?) Thanks!

  3. Hey Alex, it absolutely will. I supposed I should have called the article Merge Data Based on Two (or more) Columns. I've tested it with up to 3, but can't see why you wouldn't be able to use many more than that should the task require it.

  4. Hi Ken,
    It is a known feature. I've shown this trick Mike ExcelIsFun Girvin a year (maybe more) ago. The order of columns in tables (for merging) doesn't matter, but order of clicking on them is very important.

    Regards

  5. Hey Bill,

    Known, but not to me (and others, of course). I'm still amazed by how much stuff there is to learn about this tool. 🙂

  6. Pingback: Excel Roundup 20151102 « Contextures Blog

  7. Hello,
    Thanks for this blog. I am a recent user in Power Query and the this trick is amazing ! (I used to create tempary concatenated columns to use as a match before merge).
    I noticed however than when I merge a column, I lose some rows when I expand, even though the Estimating matches based on data preview have the same values.
    I am not the only user to have this issue, it was highlighted here, but without an adequate solution (and I tried everything that was mentioned in the post).
    Did you encounter a similar issue by any chance ?

    https://social.technet.microsoft.com/Forums/fr-FR/7ce7cfb7-1379-486e-91d0-2dfabb53d7f7/merging-queries-some-rows-disappear-when-expanding-merged-query

  8. I might try doing the concatenation manually to make sure it comes up the same, then doing the merge based on that instead of trying to do it in the merge. Maybe something is getting lost in there? Failing that, I would use the Send a Frown feature to send it to the PQ team. I'm sure they'll get back to you on it.

  9. Hi there,
    Thanks for your advice. I notice that actually rows were not disappearing, it is the opposite, I have extra rows after expanding. It might be due to my merge table having duplicates, but I am not sure. The issue is that since the table is larger than the 1000 lines in the preview, I cannot extract (at least not in one go) the resulting table to compare it with my merge table.

  10. I can almost guarantee that you've got duplicates in the tale that you are trying to merge in. You can deal with that pretty quickly though...

    Go to the merge table that you are merging in. I assume you've got a query that pulls that data in, so once you've done that, remove duplicates from the column you'll use to create your merge. (Remove duplicates is on the right click menu.)

    If you're still getting duplicates afterwards, then there is something else going on...

  11. Bah!!!!

    I've been concatenating up until now. I knew this had to be possible..... cant believe it was so simple through gui

    Thx for confirming

  12. Hi There,
    Thank you so much for this useful trick! This is day 3 for me using power query. I was so excited when I found this website. I tried it on my data in power query for Excel 2013. After merging the two columns in both tables, I got a warning message saying" We were unable to determine how many matches the selection will return". I clicked ok anyways and I got this error message below
    DataFormat.Error: We couldn't convert to Number.
    Details:
    G0103
    I dont know what happened. Could it be because Table A has 6651 rows and 3 columns while Table B has 6476 rows and 3 columns. Does the number of rows affect the merge? I did a left outer join from Table A, matching from Table B.

    Looking forward to your feedback, thank you!

  13. Hi Christol, what's happening is that the data type in one of your original queries is set to be numeric, and the other query is set to be text. If you go back to the two individual queries, make the last step to convert the column you used for the join to text (showing the ABC logo in the top left corner of the column). That should fix it for you.

  14. Hi Ken,
    I tried to change the data type to numbers but it keeps changing back to the ABC123 logo in table A. I am able to merge both tables but i am still getting the same error message. Also, the merged table wont let me select the only column I am interested in from the merged results. I have to select all the columns to be able to view the merged table. I also think that the merged query left out some data because it has 1000 rows and 8 columns. But Table A has 6651 rows and 5 columns while Table B has 6476 rows and 3 columns. Could it be that the null values are affecting the output of the merge in both tables?
    I tried using power pivot Join by concatenating but it wont let me because the relational column in both tables doesn't have unique values. Any suggestions on how I can approach this?

    Looking forward to your feedback, thanks!

  15. The ABC123 means that it is an "any" data type, presumably because there is both text and numeric data in the preview. Remember also that you are only working with previews in Power Query, so it may not load all the data at once.

    I just want to confirm, you have two separate queries, let's call them Table A and Table B. The VERY LAST step in each of those queries should be to set the data type to text (not number) as at least one of those columns has textural values. And then you've started a NEW query to merge Table A and Table B, yes? Because that should work.

    If it doesn't, can you maybe post in our forum at http://www.excelguru.ca/forums with your data so that we can debug it?

  16. This saved me. I'm new to this and thought I had an unsolvable problem until I came across this feature. THANK YOU.

  17. This is really amazing.. I searched some complicated solution but this is simpler and elegant. Thanks a lot

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.