Read Data into a Dictionary of Arrays
Posted by:Catalin Bombea | April 27, 2022
There are many scenarios where we need to categorize and split raw data.
In this scenario, we will split raw data into new workbooks, by category, using a Dictionary of Arrays.
Normally, it’s best to combine data into a single location, as I did in this article.
The main challenge is to find an efficient way to read data into categories:
- the table might not be sorted by category:
- reading and writing data row by row is very slow.
Download the demo file to try the solution:
Which is the most efficient way to collect and write data?
Based on my experience, even reading data row by row from a list object is a slow operation, the fastest way to read data is to load it into an array.
A bi-dimensional array is the best option to paste data into a worksheet, fits perfectly into a range of cells that has the same dimensions: rows and columns.
Because we want to separate data by category, we have to create an array of data for each category.
We will store the arrays into a dictionary object in this exercise, using category as a dictionary key.
Loading the raw data into an array should be an easy step:
'load source data into arrays, it's faster than looping through table rows Dim ArrMaster As Variant: ArrMaster = Master.DataBodyRange.Value
Building the Dictionary of Arrays
is the challenging part, at each row of the source data we have to check if that category is already in the dictionary.
In this case, we should append the data into that category array, otherwise we create a new array using ReDim:
'read raw data into a dictionary of arrays Dim MasterDict As Object: Set MasterDict = CreateObject("scripting.dictionary") For i = 1 To UBound(ArrMaster) ChangeCode = ArrMaster(i, Master.ListColumns("Category").Index) If MasterDict.Exists(ChangeCode) = True Then ArrResults = MasterDict(ChangeCode) Counter = UBound(ArrResults, 2) + 1 Else Counter = 1 ReDim ArrResults(1 To Master.ListColumns.Count, 1 To Counter) 'add headers ArrResults(Master.ListColumns("Category").Index, Counter) = "Category" ArrResults(Master.ListColumns("Product").Index, Counter) = "Product" ArrResults(Master.ListColumns("Sales").Index, Counter) = "Sales" ArrResults(Master.ListColumns("Month").Index, Counter) = "Month" Counter = Counter + 1 End If ReDim Preserve ArrResults(1 To Master.ListColumns.Count, 1 To Counter) ArrResults(Master.ListColumns("Category").Index, Counter) = ChangeCode ArrResults(Master.ListColumns("Product").Index, Counter) = ArrMaster(i, Master.ListColumns("Product").Index) ArrResults(Master.ListColumns("Sales").Index, Counter) = ArrMaster(i, Master.ListColumns("Sales").Index) ArrResults(Master.ListColumns("Month").Index, Counter) = ArrMaster(i, Master.ListColumns("Month").Index) MasterDict(ChangeCode) = ArrResults Next
Now that we have the data properly organized in arrays that include headers as well, we can start a loop through the dictionary keys (categories):
For Each aKey In MasterDict.Keys Counter = UBound(MasterDict(aKey), 2) 'add new book, sheet Set NewWb = xlApp.Workbooks.Add Set NewSh = NewWb.Worksheets(1) NewSh.Name = CStr(Replace(aKey, "/", "")) 'rename sheet 'set the file path, create exports folder if needed FilePath = DestFolder & Application.PathSeparator & "" & Replace(aKey, "/", "") & ".xlsx" NewWb.SaveAs FilePath 'data includes headers, paste data into sheet and organize it into a defined table (list object) If Counter > 0 Then Set Rng = NewSh.Range(NewSh.Cells(1), NewSh.Cells(Counter, Master.ListColumns.Count)) Rng.Value = TransposeArray(MasterDict(aKey)) Rng.Columns.AutoFit NewSh.ListObjects.Add xlSrcRange, Rng, , xlYes End If Application.DisplayAlerts = False 'a new workbook might have more than 1 sheet, remove all except the one we added For Each Wks In NewWb.Worksheets If Not Wks.Name Like NewSh.Name And NewWb.Worksheets.Count > 1 Then Wks.Delete Next Wks Application.DisplayAlerts = True NewWb.Close True Next aKey
With data properly organized in array, defining the range where to paste data is extremely simple, reading the number of rows of the current array with:
Counter = UBound(MasterDict(aKey), 2), then use this counter to set the range:
Set Rng = NewSh.Range(NewSh.Cells(1), NewSh.Cells(Counter, Master.ListColumns.Count))
When we add a new workbook, depending on each user settings, that new workbook might have more than 1 worksheet in it.
So, we might want to remove all other sheets than the one where we pasted data, the above code includes this section that does just that:
'a new workbook might have more than 1 sheet, remove all except the one we added For Each Wks In NewWb.Worksheets If Not Wks.Name Like NewSh.Name And NewWb.Worksheets.Count > 1 Then Wks.Delete Next Wks
The arrays must be transposed in order to bring the data in the structure we need.
To achieve this, here is the function I used:
Function TransposeArray(myarray As Variant) As Variant Dim X As Long Dim Y As Long Dim Xupper As Long Dim Yupper As Long Dim tempArray As Variant Xupper = UBound(myarray, 2) Yupper = UBound(myarray, 1) ReDim tempArray(LBound(myarray, 2) To Xupper, LBound(myarray, 1) To Yupper) For X = LBound(myarray, 2) To Xupper For Y = LBound(myarray, 1) To Yupper tempArray(X, Y) = myarray(Y, X) Next Y Next X TransposeArray = tempArray End Function
Of course, you can add different other pieces of useful codes.
For example, there are a few limitations:
- some chars are not valid for file names or sheet names,
- there is also a limit of 30 chars for the worksheet name.
Therefore, it’s a good idea to validate the file names and sheet names before applying them.
Have fun,
Catalin
Posted in Tips | No Comments »
Tags: arrays, dictionary, split data