Excel First

Excel Solutions

Pattern Matching in Power Query

White image Background

In Power Query there is no tool yet for matching regular expressions (patterns).
Only in Power BI we can run scripts in R and Python, hopefully these languages will be added to Excel Power Query.

Until then, to solve this problem, with a little imagination, we can design our own pattern matching process.

Here is a file you can download and test:

For this pattern, “##-##-##”, (meaning 3 groups of 2 digits, separated by “-“), the query I designed is:

let
Source = Excel.CurrentWorkbook(){[Name="Data"]}[Content],
List1 = List.Transform({0..9}, each Text.From(_)),
ReplacementsList = List.Zip({List1,List.Repeat({1},List.Count(List1))}),
#"Added Custom" = Table.AddColumn(Source, "Matches", (x)=> Text.Combine(List.Transform( Text.PositionOf( Text.Combine( List.Transform( List.ReplaceMatchingItems( Text.ToList( x[Legal]),ReplacementsList),Text.From)),"11-11-11",Occurrence.All), each Text.Range(x[Legal],_,8)),";")    )
in
#"Added Custom"

Logical Process Design:

Step 1: The solution is based on replacing any digit with 1 (or any other text). To do that, we have to create a list of digits:

 List1 = List.Transform({0..9}, each Text.From(_))

The above line will return this list:

={"0","1","2","3","4","5","6","7","8","9"}

To create the replacements list, the following line will do the trick:

ReplacementsList = List.Zip(
          {List1,List.Repeat({1},List.Count(List1))} 
                           ), 

The result is a list of lists, to set the initial chars and the corresponding replacement chars:

 ={{"0","1"},{"1","1"},{"2","1"},{"3","1"},{"4","1"},{"5","1"},{"6","1"},{"7","1"},{"8","1"},{"9","1"}}

The replacements are executed by the following line:

List.ReplaceMatchingItems(Text.ToList(x[Legal]),ReplacementsList)

The key here is converting the initial text into a list, each character in text becomes an entry in the list. This way, a text like “sample” becomes a list: {“s”,”a”,”m”,”p”,”l”,”e”}. Once the text is a list, List.ReplaceMatchingItems will apply each replacement pairs: 0 with 1, 1 with 1, 2 with 2, and so on.

After replacement, the initial text for example will contain only “1” instead of digits as below:

"16-24-61 E/2NENE TRACT 23-44-56 COMMERCIAL" becomes: "11-11-11 E/1NENE TRACT 11-11-11 COMMERCIAL"

Step 2: Because we replaced digits with “1”, in the next processing stage, we have to identify the new pattern in the initial text:

Text.PositionOf("11-11-11 E/1NENE TRACT 11-11-11 COMMERCIAL", "11-11-11", Occurrence.All)

The result is a list of all pattern matches: {0,23} (zero based)

Step 3: In last step, we have to extract 8 chars from the initial text (8 is the length of our pattern):

=List.Transform({0,23},each Text.Range("16-24-61 E/2NENE TRACT 23-44-56   COMMERCIAL",_,8))

The result is a list of 0 or more elements, depending on the number of pattern matches:

={"16-24-61","23-44-56"}

Step 4: In the final processing stage, we just combine the matches into a single piece of text, using the code below:

=Text.Combine({"16-24-61","23-44-56"}, ";"), 

The result of this step is:

"16-24-61;23-44-56"

Enjoy!

4 Comments to Pattern Matching in Power Query

  1. YasserKhalil says:

    Awesome tutorial. Can you make a video of the tutorial to be more easier and we can learn from you those magic tricks?

  2. Nosmada says:

    2 years later…

    Thank you for this very well demonstrated, explained and useful tutorial.

Leave a Reply

Your email address will not be published. Required fields are marked *

The Excel Learning Journey that I propose for you, is structured in such a way as to offer you the most practical way to learn Excel, free.

Start Your Free Excel Learning Journey!

Enter your email address to subscribe to this blog: