Data types vs formats

One of the common questions I get in live courses, blog comments and forum posts is a variant of, “How do I format my data in Power Query or Power BI?”  The short answer is that you don’t, but the longer answer is a discussion on data types vs formats.

What am I even talking about here?

To illustrate the issue, let’s take a quick look at some sample data in an Excel table:

image

What you see here is just randomly generated data. Nothing special or exciting, but I set it up to have lots of decimal places for a reason.  I also want to point out that the yellow values are rounded to 0 decimal places, and the green values are rounded to 2 decimal places.  All the other values continue on with many decimals places.

What you see here is data that has been formatted in Excel.  I’ve applied the comma style (explaining the commas), and forced the decimals to show 10 decimal places.

Looking at the data in Power Query (in Excel or Power BI)

While I’m using Excel to demo this, it’s exactly the same in Power BI.  What I did here is pull the data into the Query Editor, and here’s a view of what I see when clicking on the Source step:

image

There are three things I want to point out here:

  1. The “data type” for each column in this image is set to “any”, as denoted by the ABC123 icon in the column’s header.
  2. The value shown on row 1 of the Value1 column displays five decimals.
  3. The true value shown for this data point is shown in the preview window at the bottom, and carries many more decimals.

And this is where the number formatting question comes in.  Why can’t I see all the decimals?  How do I apply a comma style here?  How can I line up the decimals consistently?

Data Types are not formatting

The thing to understand here is that this is not about formatting in any way.  It’s all about setting the type of data.  Is it a date, a decimal number, a whole number, currency, etc.  Let’s take a look at what happens when I set some data types:

image

Let’s start with the Value1 column, which has been formatted as a date.  If you were to select the first data point, you’ll see that it has been converted to 2108-07-09, with no decimals.  Why this value?  It’s 76,162 days since Jan 1, 1900.  The more important thing here, however, is that all the decimals have been truncated.  So if I try to convert this back to a date/time later, it will return midnight as no decimals have been preserved.

Next, we’ll take a look at the last column: Value3.  Notice here that the maximum number of decimals displayed in the column is 4.  This is because the “fixed decimal number” can only hold a maximum of four decimal places.  The number is rounded off and shown here.  (The original value for row 1 in this column was 72,248.9877387719, rounded off to 72248.9877, as shown in the preview window at the bottom.)  Once again, if I come back and change the data type in a subsequent step, those decimals have been rounded off by this step.  They are gone and aren’t ever coming back (unless I replace the data type in THIS step to change the behaviour).

Finally, the Value2 column shows number formatting that is all over the place, ranging from 2 to 6 decimal places.  The only thing I truly care about here is that there are more than 4 decimals.  The reason for this is that it indicates that this is not a fixed decimal number.  This is the one data type that actually holds more than what you see on screen.  If you were to select the first row of the Value2 column, you’d see in the preview window that the value remains as 95,125.1258361885, even though is only shows to 5 decimal places.

Data types vs formats

The big thing to be aware of here is that data types and formats are not even close to the same thing:

Formats:  Control how a number is displayed, without affecting the underlying precision in any way.

 

Data Types: Control the type of data, and will change the precision of the value to become consistent with the type of data you have declared.

This is obviously a very important distinction that you should be aware of.  Setting a data type can (and often does) change the underlying value in some way, while formatting never does.

So how do we set formatting in the Query Editor?

In short, you don’t.

In the data types vs formats battle, the Query Editor is all about setting the type of data, not the formatting.  Why?  Because you’re not going to read your data in the Query Editor anyway.  This tool is about getting the data right, not presenting it.  Ultimately, we’re going to load the data into one of two places:

Excel: A worksheet table or the Power Pivot data model

 

Power BI: The data model

The formatting then gets done in the presentation layer of the solution.  That means one (or more) of the following places:

  • The “Measure” signature (if the data is landed to the data model).  In Excel this can be controlled by setting the default number format when creating your Measure, and in Power BI it’s configured by selecting the measure then setting the format on the Modeling tab.
  • Charts or Visuals.  In Excel you can force the number format to appear as you want on your chart, and you have similar options in the Power BI visuals formatting tools.
  • Worksheet cells.  Whether landed to a table, PivotTable or CUBE function, if it lives in the Excel grid, you can apply a number style to the data.

Do I have to choose data types vs formats?

This one came up the other day on a blog comment: “Since I have to format my measures in Power BI anyway, can I just avoid setting data types?”

To me, the answer is absolutely not.  The Query Editor uses strongly typed data, meaning that you can’t combine text and numbers, dates and numbers, etc…

One of the things we demonstrate in the Power Query workshop is how avoiding data types can blow up an entire solution.  The easiest way to show this is when I take something that looks like a date in the Query Editor, and load it while the data type is still undefined.  Load it to an Excel table and it shows up as values (without the date formats applied).  But change that to load to Power Pivot and it shows up as text.  That’s bad news.

Data types and formatting are two different things.  One is about the data type and precision, the other is about how it looks.  And – in my opinion – both need to be expliclty defined.

6 thoughts on “Data types vs formats

  1. It is a shame there is no FORMAT() or TEXT() function in Power Query to format data when returning it in a text format. For example, I have data about shipping containers I needed to return in a text informational field in an Excel table. I wanted it to essentially be "Shipment 01/12 - 5,000 pieces, arriving 03/15/17"

    I had to write a ridiculous formula that did all of that manually, down to the comma in the thousands of pieces.

    I get why there is nor formatting for any type of value, but not text.

  2. Hey Ed, out of curiosity, did you try something like this?

    =Number.ToText([Column1],"#,###")

    In my tests that works just like the Excel Text function... So yes, you'd still need to write a formula that does something like:

    ="Shipment "&Date.ToText([Column1],"MM/dd")&" - "&Number.ToText([Column2],"#,##0")& ...

    But it should work and really isn't that different than Excel. Or am I missing something?

  3. Pingback: M Intellisense, Bookmarks and more | Guy in a Cube

  4. I think PQ is all about transforming data arriving at a specified type, data in shape ready for analysis and then formatting? - which can be presented in BI or Excel tool?
    That's what many are confused about formatting in Excel and BI - what you see is not always what you get...so quite happy if PQ remains as it is..it's there to transform - formatting can be done elsewhere, the icing on the cake?

  5. sorry should have said data ready to be loaded into tool for analytical purpose and presentation.
    Also thanks for this great web site and all information.. The Data Monkey book is excellent. - thanks you

Leave a Reply

Your email address will not be published. Required fields are marked *