c# - Fastest method to remove Empty rows and Columns From Excel Files using Interop -
i have lot of excel files contains data , contains empty rows , empty columns. shown bellow
i trying remove empty rows , columns excel using interop. create simple winform application , used following code , works fine.
dim lstfiles new list(of string) lstfiles.addrange(io.directory.getfiles(m_strfolderpath, "*.xls", io.searchoption.alldirectories)) dim m_xlapp = new excel.application dim m_xlwrkbs excel.workbooks = m_xlapp.workbooks dim m_xlwrkb excel.workbook each strfile string in lstfiles m_xlwrkb = m_xlwrkbs.open(strfile) dim m_xlwrksheet excel.worksheet = m_xlwrkb.worksheets(1) dim introw integer = 1 while introw <= m_xlwrksheet.usedrange.rows.count if m_xlapp.worksheetfunction.counta(m_xlwrksheet.cells(introw, 1).entirerow) = 0 m_xlwrksheet.cells(introw, 1).entirerow.delete(excel.xldeleteshiftdirection.xlshiftup) else introw += 1 end if end while dim intcol integer = 1 while intcol <= m_xlwrksheet.usedrange.columns.count if m_xlapp.worksheetfunction.counta(m_xlwrksheet.cells(1, intcol).entirecolumn) = 0 m_xlwrksheet.cells(1, intcol).entirecolumn.delete(excel.xldeleteshiftdirection.xlshifttoleft) else intcol += 1 end if end while next m_xlwrkb.save() m_xlwrkb.close(savechanges:=true) marshal.releasecomobject(m_xlwrkb) marshal.releasecomobject(m_xlwrkbs) m_xlapp.quit() marshal.releasecomobject(m_xlapp)
but when cleaning big excel files takes lot of time. suggestions optimizing code? or way clean excel files faster? there function can delete empty rows in 1 click?
i don't have problem if answers using c#
edit:
i uploaded sample file sample file. not files have same structure.
i found looping through excel worksheet can take time if worksheet large. solution tried avoid looping in worksheet. avoid looping through worksheet, made 2 dimensional object array cells returned usedrange
with:
excel.range targetcells = worksheet.usedrange; object[,] allvalues = (object[,])targetcells.cells.value;
this array loop through indexes of empty rows , columns. make 2 int lists, 1 keeps row indexes delete other keeps column indexes delete.
list<int> emptyrows = getemptyrows(allvalues, totalrows, totalcols); list<int> emptycols = getemptycols(allvalues, totalrows, totalcols);
these lists sorted high low simplify deleting rows bottom , deleting columns right left. loop through each list , delete appropriate row/col.
deleterows(emptyrows, worksheet); deletecols(emptycols, worksheet);
finally after empty rows , columns have been deleted, saveas file new file name.
hope helps.
edit:
addressed usedrange issue such if there empty rows @ top of worksheet, rows removed. remove empty columns left of starting data. allows indexing work if there empty rows or columns before data starts. accomplished taking address of first cell in usedrange address of form “$a$1:$d$4”. allow use of offset if empty rows @ top , empty columns left remain , not deleted. in case deleting them. number of rows delete top can calculated first “$a$4” address “4” row first data appears. need delete top 3 rows. column address of form “a”, “ab” or “aad” required translation , how convert column number (eg. 127) excel column (eg. aa) able determine how many columns on left need deleted.
class program { static void main(string[] args) { excel.application excel = new excel.application(); string originalpath = @"h:\exceltestfolder\book1_test.xls"; excel.workbook workbook = excel.workbooks.open(originalpath); excel.worksheet worksheet = workbook.worksheets["sheet1"]; excel.range usedrange = worksheet.usedrange; removeemptytoprowsandleftcols(worksheet, usedrange); deleteemptyrowscols(worksheet); string newpath = @"h:\exceltestfolder\book1_test_removed.xls"; workbook.saveas(newpath, excel.xlsaveasaccessmode.xlnochange); workbook.close(); excel.quit(); system.runtime.interopservices.marshal.releasecomobject(workbook); system.runtime.interopservices.marshal.releasecomobject(excel); console.writeline("finished removing empty rows , columns - press key exit"); console.readkey(); } private static void deleteemptyrowscols(excel.worksheet worksheet) { excel.range targetcells = worksheet.usedrange; object[,] allvalues = (object[,])targetcells.cells.value; int totalrows = targetcells.rows.count; int totalcols = targetcells.columns.count; list<int> emptyrows = getemptyrows(allvalues, totalrows, totalcols); list<int> emptycols = getemptycols(allvalues, totalrows, totalcols); // have list of empty rows , columns need delete deleterows(emptyrows, worksheet); deletecols(emptycols, worksheet); } private static void deleterows(list<int> rowstodelete, excel.worksheet worksheet) { // rows sorted high low - index's wont shift foreach (int rowindex in rowstodelete) { worksheet.rows[rowindex].delete(); } } private static void deletecols(list<int> colstodelete, excel.worksheet worksheet) { // cols sorted high low - index's wont shift foreach (int colindex in colstodelete) { worksheet.columns[colindex].delete(); } } private static list<int> getemptyrows(object[,] allvalues, int totalrows, int totalcols) { list<int> emptyrows = new list<int>(); (int = 1; < totalrows; i++) { if (isrowempty(allvalues, i, totalcols)) { emptyrows.add(i); } } // sort list high low return emptyrows.orderbydescending(x => x).tolist(); } private static list<int> getemptycols(object[,] allvalues, int totalrows, int totalcols) { list<int> emptycols = new list<int>(); (int = 1; < totalcols; i++) { if (iscolumnempty(allvalues, i, totalrows)) { emptycols.add(i); } } // sort list high low return emptycols.orderbydescending(x => x).tolist(); } private static bool iscolumnempty(object[,] allvalues, int colindex, int totalrows) { (int = 1; < totalrows; i++) { if (allvalues[i, colindex] != null) { return false; } } return true; } private static bool isrowempty(object[,] allvalues, int rowindex, int totalcols) { (int = 1; < totalcols; i++) { if (allvalues[rowindex, i] != null) { return false; } } return true; } private static void removeemptytoprowsandleftcols(excel.worksheet worksheet, excel.range usedrange) { string addressstring = usedrange.address.tostring(); int rowstodelete = getnumberoftoprowstodelete(addressstring); deletetopemptyrows(worksheet, rowstodelete); int colstodelete = getnumberofleftcolstodelte(addressstring); deleteleftemptycolumns(worksheet, colstodelete); } private static void deletetopemptyrows(excel.worksheet worksheet, int startrow) { (int = 0; < startrow - 1; i++) { worksheet.rows[1].delete(); } } private static void deleteleftemptycolumns(excel.worksheet worksheet, int colcount) { (int = 0; < colcount - 1; i++) { worksheet.columns[1].delete(); } } private static int getnumberoftoprowstodelete(string address) { string[] splitarray = address.split(':'); string firstindex = splitarray[0]; splitarray = firstindex.split('$'); string value = splitarray[2]; int returnvalue = -1; if ((int.tryparse(value, out returnvalue)) && (returnvalue >= 0)) return returnvalue; return returnvalue; } private static int getnumberofleftcolstodelte(string address) { string[] splitarray = address.split(':'); string firstindex = splitarray[0]; splitarray = firstindex.split('$'); string value = splitarray[1]; return parsecolheadertoindex(value); } private static int parsecolheadertoindex(string coladress) { int[] digits = new int[coladress.length]; (int = 0; < coladress.length; ++i) { digits[i] = convert.toint32(coladress[i]) - 64; } int mul = 1; int res = 0; (int pos = digits.length - 1; pos >= 0; --pos) { res += digits[pos] * mul; mul *= 26; } return res; } }
edit 2: testing made method loops thru the worksheet , compared code loops thru object array. shows significant difference.
method loop thru worksheet , delete empty rows , columns.
enum roworcol { row, column }; private static void conventionalremoveemptyrowscols(excel.worksheet worksheet) { excel.range usedrange = worksheet.usedrange; int totalrows = usedrange.rows.count; int totalcols = usedrange.columns.count; removeempty(usedrange, roworcol.row); removeempty(usedrange, roworcol.column); } private static void removeempty(excel.range usedrange, roworcol roworcol) { int count; excel.range currange; if (roworcol == roworcol.column) count = usedrange.columns.count; else count = usedrange.rows.count; (int = count; > 0; i--) { bool isempty = true; if (roworcol == roworcol.column) currange = usedrange.columns[i]; else currange = usedrange.rows[i]; foreach (excel.range cell in currange.cells) { if (cell.value != null) { isempty = false; break; // can exit loop since range not empty } else { // cell value null contiue checking } } // end loop thru each cell in range (row or column) if (isempty) { currange.delete(); } } }
then main testing/timing 2 methods.
enum roworcol { row, column }; static void main(string[] args) { excel.application excel = new excel.application(); string originalpath = @"h:\exceltestfolder\book1_test.xls"; excel.workbook workbook = excel.workbooks.open(originalpath); excel.worksheet worksheet = workbook.worksheets["sheet1"]; excel.range usedrange = worksheet.usedrange; // start test looping thru each excel worksheet stopwatch sw = new stopwatch(); console.writeline("start stopwatch loop thru worksheet..."); sw.start(); conventionalremoveemptyrowscols(worksheet); sw.stop(); console.writeline("it took total of: " + sw.elapsed.milliseconds + " miliseconds remove empty rows , columns..."); string newpath = @"h:\exceltestfolder\book1_test_removedloopthruworksheet.xls"; workbook.saveas(newpath, excel.xlsaveasaccessmode.xlnochange); workbook.close(); console.writeline(""); // start test looping thru object array workbook = excel.workbooks.open(originalpath); worksheet = workbook.worksheets["sheet1"]; usedrange = worksheet.usedrange; console.writeline("start stopwatch loop thru object array..."); sw = new stopwatch(); sw.start(); deleteemptyrowscols(worksheet); sw.stop(); // display results second test console.writeline("it took total of: " + sw.elapsed.milliseconds + " miliseconds remove empty rows , columns..."); string newpath2 = @"h:\exceltestfolder\book1_test_removedloopthruarray.xls"; workbook.saveas(newpath2, excel.xlsaveasaccessmode.xlnochange); workbook.close(); excel.quit(); system.runtime.interopservices.marshal.releasecomobject(workbook); system.runtime.interopservices.marshal.releasecomobject(excel); console.writeline(""); console.writeline("finished testing methods - press key exit"); console.readkey(); }
edit 3 per op request... updated , changed code match op code. found interesting results. see below.
i changed code match functions using ie… entirerow , counta. code below found preforms terribly. running tests found code below in 800+ milliseconds execution time. 1 subtle change made huge difference.
on line:
while (rowindex <= worksheet.usedrange.rows.count)
this slowing things down lot. if create range variable usedrang , not keep regrabbibg each iteration of while loop make huge difference. so… when change while loop to…
excel.range usedrange = worksheet.usedrange; int rowindex = 1; while (rowindex <= usedrange.rows.count) , while (colindex <= usedrange.columns.count)
this performed close object array solution. did not post results, can use code below , change while loop grab usedrange each iteration or use variable usedrange test this.
private static void removeemptyrowscols3(excel.worksheet worksheet) { //excel.range usedrange = worksheet.usedrange; // <- using variable makes while loop faster int rowindex = 1; // delete empty rows //while (rowindex <= usedrange.rows.count) // <- changing 1 line makes huge difference - not grabbibg usedrange each iteration... while (rowindex <= worksheet.usedrange.rows.count) { if (excel.worksheetfunction.counta(worksheet.cells[rowindex, 1].entirerow) == 0) { worksheet.cells[rowindex, 1].entirerow.delete(excel.xldeleteshiftdirection.xlshiftup); } else { rowindex++; } } // delete empty columns int colindex = 1; // while (colindex <= usedrange.columns.count) // <- change here while (colindex <= worksheet.usedrange.columns.count) { if (excel.worksheetfunction.counta(worksheet.cells[1, colindex].entirecolumn) == 0) { worksheet.cells[1, colindex].entirecolumn.delete(excel.xldeleteshiftdirection.xlshifttoleft); } else { colindex++; } } }
update @hadi
you can alter deletecols
, deleterows
function better performance if excel contains blank rows , columns after last used ones:
private static void deleterows(list<int> rowstodelete, microsoft.office.interop.excel.worksheet worksheet) { // rows sorted high low - index's wont shift list<int> nonemptyrows = enumerable.range(1, rowstodelete.max()).tolist().except(rowstodelete).tolist(); if (nonemptyrows.max() < rowstodelete.max()) { // there empty rows after last non empty row microsoft.office.interop.excel.range cell1 = worksheet.cells[nonemptyrows.max() + 1,1]; microsoft.office.interop.excel.range cell2 = worksheet.cells[rowstodelete.max(), 1]; //delete empty rows after last used row worksheet.range[cell1, cell2].entirerow.delete(microsoft.office.interop.excel.xldeleteshiftdirection.xlshiftup); } //else last non empty row = worksheet.rows.count foreach (int rowindex in rowstodelete.where(x => x < nonemptyrows.max())) { worksheet.rows[rowindex].delete(); } } private static void deletecols(list<int> colstodelete, microsoft.office.interop.excel.worksheet worksheet) { // cols sorted high low - index's wont shift //get non empty cols list<int> nonemptycols = enumerable.range(1, colstodelete.max()).tolist().except(colstodelete).tolist(); if (nonemptycols.max() < colstodelete.max()) { // there empty rows after last non empty row microsoft.office.interop.excel.range cell1 = worksheet.cells[1,nonemptycols.max() + 1]; microsoft.office.interop.excel.range cell2 = worksheet.cells[1,nonemptycols.max()]; //delete empty rows after last used row worksheet.range[cell1, cell2].entirecolumn.delete(microsoft.office.interop.excel.xldeleteshiftdirection.xlshifttoleft); } //else last non empty column = worksheet.columns.count foreach (int colindex in colstodelete.where(x => x < nonemptycols.max())) { worksheet.columns[colindex].delete(); } }
check answer @ get last non empty column , row index excel using interop
Comments
Post a Comment