Insights

MASKING DATA IN POWER BI

This post addresses common scenarios Kane, our principal Power BI expert, have encountered where clients require data to be masked in certain ways in reporting. Most often this is to prevent exposing a competitors’ data when showing reports to clients or customers.

MASKING COLUMN VALUES

Imagine you are building a report that shows metrics for different clients of a business. The business wants to see all the detail but they also want to show the report to their clients in meetings and when they do so, they don’t want to expose a competitor’s data to the client viewing the report. There are many different clients and new ones are added to the data so building different versions of reports is not an option. The desired result is a toggle on the report where the business can choose to see all client names or select a specific client and toggle all other client names to be masked. The client they are showing the report too can still see how they compare to competitors but can’t see who the competitors are.

The solution is a modelling one. We basically need to take the dimension table with the client column to mask and do a cross join of it with itself. We then add a new column that masks clients other than the original column value. We end up with a column for the business to use to view all clients, a column for the business to select a client to show in the report and a masked column for clients to view that only shows the client selected in the original column. Its better explained with a diagram.

The same technique can be applied to any type of attribute. A common scenario is to mask the attribute based on a different grouping column. For example, if clients were members of a parent group and are allowed to see results for other clients withing their group but not outside it. I will step through some examples of how to implement the cross join and masking column.

I’m using a simple data model built from the financial data sample file included in Power BI. I have renamed the “Segment” column to “Customer” and will be using Customer as the column to be masked.

To implement in Power Query is relatively straightforward. We can duplicate our customer dimension to create a new version for applying the masking to. The easiest way to do a cross join in Power Query is to add a column to both tables with the number 1 as the value and then do a join on this. We can add this column then duplicate the query again to create another query to join back to the original

We then merge the two queries based on the custom column:

We can then remove unnecessary columns and clean up names to get our cross joined dimension. Of the original columns, all we need to keep is the original Customer column which will eventually serve as a name to select in a slicer. Selecting this name will not filter any specific customer instead it returns all customer keys. You can retain more of the original tables columns if you plan to use the new dimension for normal customer filtering as well as masking but personally, I think it’s much better to keep your original dimension as a standard customer dimension and create the new one specifically for masking.

For our masked column, if we give the same generic name to each our masked values (like “Customer”), they won’t be distinct in reports. We need to keep the masked values as separate attributes, so they display separately in visuals, but we don’t want any identifying features – They should all be just “Customer”. To achieve this, we can suffix a qty of zero width space unicode characters to the text “Customer”. I added a numeric sort column for my new Customer column – Customer (Business) so can use this to determine the amount of zero width spaces to add. Note that you should add characters to force uniqueness according to unique customers in the Customer (Business) column.

Note: I have used screenshots of code for the post but of you want to get your hands on it, there is a PBIX with all code and working examples linked at the end.

The relationship using the cross joined customer key will be many to many so we can also create a list of unique customer keys to use as a bridging table to implement the many to many relationship more efficiently. This is easily done by duplicating the original dimension and removing all columns but the key.

The tables can now be connected to the model:

The last steps are to add a field parameter that allows the report user to toggle between the Customer (Business) and Customer (Masked) columns and a single select slicer where the customer who will view the report can be selected.

You can also generate the new masked dimension in DAX if it suits your needs better, here is an example. I only have 5 customers in my sample data so manually specified a numeric value for the amount of zero width spaces to add for each customer. In real projects you would need to generate a numeric unique identifier for the Customer (Business) column to base your zero width spaces on.

Note: You may wonder why GENERATE instead of CROSSJOIN in the code above. It’s because heard Phil Seamark say it’s faster and it’s become habit. I haven’t tested it 🙂

What if you need to mask customer names based on a different column? To illustrate, I’ve added a new grouping column to my customer dimension. The goal is that only customers belonging to the same group should be visible in the report when a single customer is selected to view it.

Rather than step through all the steps again, you can see the difference in logic required in the following code:

Here is an example of a report page with masking by group. It’s important to set slicers for customer and the masking field parameter to single select. You can see in the screenshot only customers belonging to my “Group B” Customer group are shown and others are masked.

Here is a page with masking by individual customers.

MASKING MEASURE VALUES

Of course, it may be preferable in certain scenarios to mask measure values rather than column values. Businesses may wish to present reports with all the attribute detail but randomize or mask the actual numbers.

A calculation group can make quick work of randomizing the numbers in a report page. Added to a slicer this will randomize all non-text, measure-based values on a page.

There are some options though, what if we want to hide the values but keep the underlying number so that charts appear correctly? This would allow accurate comparison of say bar length in a bar chart but not show the specific value – you get an idea of the amount but not the specific amount.

To mask data labels, we can use format string expressions in a calculation group. Let’s assume we want a similar behaviour to our previous example where report users can select a customer to show the report too and we wish to randomize or mask values other than that customer’s. Let’s also assume we don’t have our previously created masked customer dimension.

Firstly, we need to create a disconnected table with our customer names in it so that we can use them in a slicer for our users to select. This can be easily derived in Power Query or DAX, we just need the customer column from our customer dimension.

We can now setup to calculation groups like so:

Each calculation item simply checks the current customer value in the filter context is included in the selections in the slicer values from the disconnected table and either manipulates the format string or randomizes the value. Masking labels is probably more useful unless you are simply demoing a report and the underlying numbers are meaningless.

Of course, custom format strings will mess up chart axis even if you just apply them at a measure level with the new dynamic format strings for measures. Even those this feature is still in preview at the time of writing; it seems to have addressed none of the issues present with using them in calculation groups. I really hope the PBI team gets this fixed and GA soon! For now, its easiest just to leave the axis off. You can add a dummy measure to the well when multiple measures are possible (Charts take the format string of the first data point of the first measure in the well) but otherwise your out of luck or keep hacking.

Here is a sample report with value masking implemented:

Having some data masking techniques in your toolbelt can be a game changer for certain clients. You’re broadening the possible audience of you reports and facilitating interaction between a business and clients. Being able to show a customer where the opportunity is with real live data is a powerful tool for businesses. You can also avoid many duplicate versions of a report customized for different audiences which is a difficult to maintain solution.

Thanks for sticking with me to the end through a long post, I hope there’s something useful for you! Would love to hear if you have some other use cases.

Author: Kane Snyder

The PBIX with all examples is available here

Contact Us

Receive a FREE 30-minute consultation with one of our BI Consultants

Find Out