Convert XLSX file into CSV
using Powershell only
By Boris Knizhnik (borisk@bikinfo.com)
Any links to this site are appreciated.
If you decide to include a link to this article on your site, please drop us a note.

Abstract

This web page attempts to offer a solution to conversin of XLSX files (Excel workbooks) to CSV files.
This is easily done using Excel itself using its "Save As" option, but if one needs to do it in a batch mode and on a server machine, where it is customary not to install MS Office nor any s/w drivers including OLEDB, it becomes a real problem.
So, we offer a pure Powershell solution to conversion of xlsx files into csv.

Proposed Solution

A few points before presenting the solution:

  1. It turned out that XLSX file is a zip file which contains several xml files with data.
  2. The individual sheets are kept as separate XML files.
  3. To save on space even more Microsoft stores all the character literal values in one common xml dictionary file. The individual cell value found for this string in the actual .xml file is just an occurence number in this dictionary file.
  4. Dates are stored as day number since 1900/01/01 (at least they are supposed to. I discovered that one has to subtract 2 from this number of days to get the correct conversion).
  5. Time portion of the date is stored as a fraction of a day, so it has to be multiplied by 60*60*24 (86400) to get the actual number os seconds.
  6. Microsoft does not store empty cells or rows, so any gaps between values have to be taken care by the code.
  7. To figure out the number of skipped columns one need to be able to figure out the distance between, say, cell "AB67" and "C67". The way columns are named: A through Z, then AA through AZ, then AAA through AAZ, etc., suggests that we may assume they are using a base-26 system and therefore use a simple conversion method from a base-26 to the decimal system and then use subtraction to find out the number of commas between columns.
  8. If a cell value contains a comma, the enire value has to be enclosed in doublequotes.
  9. If a cell value contains a doublequote each of them has to be doubled and then the value shoudl be enclosed in doublequotes.

With this in mind here is the code:

To call this code you may use a command like this:

The resulting will be printed, so you will have to redirect it to a file.


We hope you found this article useful. If you have any comments about this article please send them to Boris Knizhnik borisk@bikinfo.com. We appreciate any links to this site as they show your appreciation and work as free ads. If you decide to include a link to our site on your site, please drop us a note.


Back to BIK Information Services home page  Go to top


©1997-2015 BIK Information Services, Inc. All Rights Reserved.
Last modified on Thursday, 22-Oct-2015 20:17:39 EDT