Column profile – Profiling the Data

Column profile

This option will give you a more detailed view of how the data is spread in your column. It will provide you with summary statistics for the first 1,000 rows, by default, of your dataset. This includes a count of how many rows Power BI is bringing in. This can give you not only a sense of how large the data in the column is but also whether you are missing data. If your original data source had 500 rows, but you are only seeing 250, then either you filtered out half your data in Power BI or it’s not being retrieved from the source for some reason. This row count will also show how many rows are outliers, how many are null, and how many are empty text. It will also show the minimum value and the maximum value. If the column is marked as a text column, Power BI determines the minimum and maximum alphabetically.

If the column is marked as a numeric column, then along with all those statistics, you will see how many rows are zero, the average for the column, and the standard deviation. There is also an entry for NaN, which stands for not a number. This is the count of rows that have some non-numeric character in them.

The bar chart will show you how the values are distributed. This is a larger and easier-to-read visualization of the same data from the Column distribution option. The main differences, aside from the size, are that you can easily compare distributions between columns in Column distribution, and you can easily create a filter of your data in Column profile. In Column profile, you can hover over or click one of the bars and choose to either exclude all the rows it represents from the report or filter the report to just those rows.

Here, you can see a column profile for a numeric column. Notice that the column statistics include Average, Standard deviation, Zero, and NaN. Also, note that by hovering over one of the columns, you are presented with the Equals or Does not Equal options. Choosing either one will add a Filtered Rows Step to Applied Steps and filter the report according to which you choose.

Figure 3.8 – Summary statistics and distribution for a single numeric column

Looking at the summary statistics and distribution for a single column will let you get a feel for your data, giving you a preview of what you can report on when you start adding visualization to the report page. For example, Figure 3.8 shows us the minimum, maximum, and average values that can be used as summary values on a table visual in your report.

The following screenshot shows the column statistics for a text-type column. Notice that there are fewer statistics available:

Figure 3.9 – A screenshot showing column statistics for a text type column

Notice this text column does not have an average or other number-based statistics.

Date columns have one more statistical field than text. A date column will have all the statistics as a text column, but will also have an average.

Figure 3.10 – Date fields have their own subset of statistics Looking at individual column statistics can help you ensure that the data you are reporting on is correct. It can also provide you with some information on what to expect when you start adding visualizations to the reporting page.

Author: Noah Walker

Leave a Reply

Your email address will not be published. Required fields are marked *