Using merge queries – Cleansing, Transforming, and Shaping Data

Using merge queries

Merge queries are used when we want to merge or combine columns from multiple queries where some columns are different. This often happens in data warehouse environments as data tends to be normalized or split across multiple tables (imported into a Power BI data model as a query). For more information on why data is normalized in a data warehouse, we recommend reading The Data Warehouse Toolkit: The Definitive Guide to Dimensional Modeling by Ralph Kimball. Tables contain key columns that allow individual records to be merged or joined together when needed. Normalizing data is standard practice for properly designed data warehouses. For more information on data warehouse design, we recommend looking at resources on dimensional modeling, such as https://wikipedia.org/wiki/Dimensional_modeling.

In order to use merge queries in Power Query, you need to have at least two queries that contain data that you’d like to merge into one query by bringing together columns from both queries. Ideally, these queries contain columns that allow the data to relate to each other, typically called key or join key columns. If the queries don’t contain a key, then it may be possible to use other data transformation capabilities (as discussed earlier in the Using column transformations section) to create keys that would allow the data to be merged properly.

Using merge queries requires one of the two queries to be selected from the Queries pane on the leftmost side of the Power Query Editor window. Then, you click the Merge Queries button under Combine on the Home tab of the ribbon. There is the option to just merge the queries or merge the queries as a new query.

The selected table will show at the top of the merge user interface that is shown and the second query can be selected from the drop-down menu in the center. Now both queries are shown in the preview, and you only need to select the key column or join columns from both queries and then select Join Kind. There are six types of joins supported by merge:

  • Left Outer – Brings all rows from the first query and only matching records from the second query.
  • Right Outer – Brings all rows from the second query and only matching records from the first query.
  • Full Outer – Brings all rows from both queries, no matter whether they are matching or not. If there are any rows that do not match, this will create null values for some columns.
  • Inner – Brings only rows that match from both queries.
  • Left Anti – Brings only rows from the first query that don’t have matching rows in the second query.
  • Right Anti – Brings only rows from the second query that don’t have matching rows in the first query.

Additionally, you can use fuzzy matching for the join, which means the values in the key columns do not need to match exactly. Fuzzy matching can be configured using these settings:

  • Similarity threshold – This setting needs to be a decimal value between 0.00 and 1.00. A value of 1.00 means every value needs to match exactly, while a value of 0.00 means any values will match.
  • Ignore case – Without this checkbox enabled, values will have to match exactly, while this checkbox will enable values to match even if the case in the text does not match.
  • Match by combining text parts – When this option is enabled, Power Query will try to match parts of the text to find matches. For example, if one column contains “wind” and another contains “mill,” then Power Query will try to use these two parts to match against “windmill.”
  • Maximum number of matches – This option can be used to enforce a limited number of matches to take place during the merge. For example, if you only want to return the first match (not all matches), then using “1” in this option should be used.
  • Transformation table – This option will keep track of fuzzy matches and record instances where “wind” and “mill” match to “windmill.” This table needs to contain To and From columns.

Author: Noah Walker

Leave a Reply

Your email address will not be published. Required fields are marked *