Be sure to check out my Scripting4Crypto initiative. It’s a fun way to get into using cryptocurrencies all while getting your PowerShell needs met.
I received an interesting request today. A user was looking for a way to filter out data from many Excel spreadsheets to be used for a ProjectWise document import process. The user had 24 spreadsheets (in 3 different Windows folders) containing nearly 200,000 rows of data and needed to filter on documents with the file extension ‘XLSX’. Yeah, you can open each Excel spreadsheet, do a filter on the ‘document file name’ column, select and copy the desired data and finally paste it into a new spreadsheet. But where’s the fun in that. The following is what I had come up with to quickly resolve the issue and provide a spreadsheet with only the data required. The resulting spreadsheet contains only about 2,000 rows.
We will be using the following cmdlets to accomplish this task. All of the ProjectWise related cmdlets are available using the PWPS_DAB module. Take a look at the help for each of the cmdlets to become familiar with their functionality, available parameters, etc.
- Get-ChildItem (Windows)
- Get-TablesFromXLSXWorkbook (PWPS_DAB)
- New-XLSXWorkbook (PWPS_DAB)
First thing we will do is create a variable with the 3 folders to be searched for the source spreadsheets.
# Path to folders containing the source spreadsheet. $Path = "D:\!Projects\" $WindowsFolders = "$Path\SourceFolder1", "$Path\SourceFolder2", "$Path\SourceFolder3"
Next, we will create a new DataTable. We will be copying data from the source Excel spreadsheets into this datatable.
# Create a new datatable to copy data into. $dtExcel = New-Object System.Data.DataTable
We will create two counters to demonstrate the amount of data being processed.
# The following two are counters for demonstration purposes. $WorksheetCount = 0 $DataRowCount = 0 # Counter used to only create data columns on the first index in the loop. $count = 1
Now we will loop through each of the three source folders.
# Loop through each of the Windows source folders. foreach ($WindowsFolder in $WindowsFolders){
For each folder we will do the following:
- Get a list of the spreadsheets contained in the folder.
# Get all spreadsheet objects from the current folder. $SpreadSheets = Get-ChildItem $WindowsFolder -File -Verbose
- Loop through each of the spreadsheets returned.
# Loop through each of the spreadsheet objects returned. foreach ($SpreadSheet in $SpreadSheets) { # Index the counter. $WorksheetCount++
- Import the data from each of the spreadsheets using the Get-TablesFromXLSXWorkbook cmdlet.
try{ # Import the data from the source spreadsheet into datatables. $dts = Get-TablesFromXLSXWorkbook -InputFileName $SpreadSheet.FullName -Verbose -ErrorAction Stop # We only need to work with the first datatable imported from each spreadsheet. $Rows = $dts[0].Rows
- Use the column names in the spreadsheet to create the data columns within the new datatable.
- The data is consistent through each spreadsheet, so we only need to add the columns on the first index in the loop.
# Create the data columns within the target datatable # using the column headers from the current spreadsheet. if($count -eq 1) { foreach ($item in $dts.Columns) { $dtExcel.Columns.Add($item.ColumnName) | Out-Null } $count++ }
- Loop through each of the rows within each spreadsheet.
- Determine if the value in column ‘Document File Name’ ends with ‘.XLSX’.
- If it does, copy the data row in to the new datatable.
# Loop through each row of data returned from the current spreadsheet. foreach ($row in $Rows) { # Index the counter. $DataRowCount++ # Determine if the 'Document File Name' column in the current row ends with the '.xlsx' extension. # If yes, copy the data row to the target datatable. if($row.'Document File Name'.ToLower().EndsWith('.xlsx')) { $dr = $dtExcel.NewRow() $dr.ItemArray = $row.ItemArray.Clone() $dtExcel.Rows.Add($dr) } } } catch { Write-Warning -Message "Something happened. Write a good error message." } } }
Finally, we will export the new datatable a spreadsheet using the New-XLSXWorkbook cmdlet.
# Export the target datatable to a new Excel spreadsheet. New-XLSXWorkbook -InputTables $dtExcel -OutputFileName 'D:\!Projects\ExcelFiles_21081217.xlsx' -Open
Experiment with it and have fun.
Hopefully, you find this useful. Please let me know if you have any questions or comments.
New-XLSXWorkbook gives me an error: The term ‘New-XLSXWorkbook’ is not recognized as the name of a cmdlet, function, script file, or operable program, and could you tell me how to fix that? Thank you!
LikeLike
You need to ensure the PWPS_DAB module.
Install-Module pwps_dab -Force
LikeLike
I am not getting the code to work, could you tell me how to install PWPS_DAB module in my windows machine? Thank you!
LikeLike
Install-Module pwps_dab -Force
LikeLike