Pattern Matching in Power Query
In Power Query there is no tool yet for matching regular expressions (patterns).
Only in Power BI we can run scripts in R and Python, hopefully these languages will be added to Excel Power Query.
Until then, to solve this problem, with a little imagination, we can design our own pattern matching process.
Here is a file you can download and test:
For this pattern, “##-##-##”, (meaning 3 groups of 2 digits, separated by “-“), the query I designed is:
let Source = Excel.CurrentWorkbook(){[Name="Data"]}[Content], List1 = List.Transform({0..9}, each Text.From(_)), ReplacementsList = List.Zip({List1,List.Repeat({1},List.Count(List1))}), #"Added Custom" = Table.AddColumn(Source, "Matches", (x)=> Text.Combine(List.Transform( Text.PositionOf( Text.Combine( List.Transform( List.ReplaceMatchingItems( Text.ToList( x[Legal]),ReplacementsList),Text.From)),"11-11-11",Occurrence.All), each Text.Range(x[Legal],_,8)),";") ) in #"Added Custom"
Logical Process Design:
Step 1: The solution is based on replacing any digit with 1 (or any other text). To do that, we have to create a list of digits:
List1 = List.Transform({0..9}, each Text.From(_))
The above line will return this list:
={"0","1","2","3","4","5","6","7","8","9"}
To create the replacements list, the following line will do the trick:
ReplacementsList = List.Zip(
{List1,List.Repeat({1},List.Count(List1))}
),
The result is a list of lists, to set the initial chars and the corresponding replacement chars:
={{"0","1"},{"1","1"},{"2","1"},{"3","1"},{"4","1"},{"5","1"},{"6","1"},{"7","1"},{"8","1"},{"9","1"}}
The replacements are executed by the following line:
List.ReplaceMatchingItems(Text.ToList(x[Legal]),ReplacementsList)
The key here is converting the initial text into a list, each character in text becomes an entry in the list. This way, a text like “sample” becomes a list: {“s”,”a”,”m”,”p”,”l”,”e”}. Once the text is a list, List.ReplaceMatchingItems will apply each replacement pairs: 0 with 1, 1 with 1, 2 with 2, and so on.
After replacement, the initial text for example will contain only “1” instead of digits as below:
"16-24-61 E/2NENE TRACT 23-44-56 COMMERCIAL" becomes: "11-11-11 E/1NENE TRACT 11-11-11 COMMERCIAL"
Step 2: Because we replaced digits with “1”, in the next processing stage, we have to identify the new pattern in the initial text:
Text.PositionOf("11-11-11 E/1NENE TRACT 11-11-11 COMMERCIAL", "11-11-11", Occurrence.All)
The result is a list of all pattern matches: {0,23} (zero based)
Step 3: In last step, we have to extract 8 chars from the initial text (8 is the length of our pattern):
=List.Transform({0,23},each Text.Range("16-24-61 E/2NENE TRACT 23-44-56 COMMERCIAL",_,8))
The result is a list of 0 or more elements, depending on the number of pattern matches:
={"16-24-61","23-44-56"}
Step 4: In the final processing stage, we just combine the matches into a single piece of text, using the code below:
=Text.Combine({"16-24-61","23-44-56"}, ";"),
The result of this step is:
"16-24-61;23-44-56"
Enjoy!
Awesome tutorial. Can you make a video of the tutorial to be more easier and we can learn from you those magic tricks?
Hi,
I might, but until then you can download the sample file, you can learn a lot by analyzing the query in the Query Editor.
Cheers,
Catalin
Thanks a lot for your reply and your interest.
2 years later…
Thank you for this very well demonstrated, explained and useful tutorial.