c# - Fastest method to remove Empty rows and Columns From Excel Files using Interop -


i have lot of excel files contains data , contains empty rows , empty columns. shown bellow

excel preview

i trying remove empty rows , columns excel using interop. create simple winform application , used following code , works fine.

dim lstfiles new list(of string) lstfiles.addrange(io.directory.getfiles(m_strfolderpath, "*.xls", io.searchoption.alldirectories))  dim m_xlapp = new excel.application dim m_xlwrkbs excel.workbooks = m_xlapp.workbooks dim m_xlwrkb excel.workbook  each strfile string in lstfiles     m_xlwrkb = m_xlwrkbs.open(strfile)     dim m_xlwrksheet excel.worksheet = m_xlwrkb.worksheets(1)     dim introw integer = 1      while introw <= m_xlwrksheet.usedrange.rows.count         if m_xlapp.worksheetfunction.counta(m_xlwrksheet.cells(introw, 1).entirerow) = 0             m_xlwrksheet.cells(introw, 1).entirerow.delete(excel.xldeleteshiftdirection.xlshiftup)         else             introw += 1         end if     end while      dim intcol integer = 1     while intcol <= m_xlwrksheet.usedrange.columns.count         if m_xlapp.worksheetfunction.counta(m_xlwrksheet.cells(1, intcol).entirecolumn) = 0             m_xlwrksheet.cells(1, intcol).entirecolumn.delete(excel.xldeleteshiftdirection.xlshifttoleft)         else             intcol += 1         end if     end while next  m_xlwrkb.save() m_xlwrkb.close(savechanges:=true)  marshal.releasecomobject(m_xlwrkb) marshal.releasecomobject(m_xlwrkbs) m_xlapp.quit() marshal.releasecomobject(m_xlapp) 

but when cleaning big excel files takes lot of time. suggestions optimizing code? or way clean excel files faster? there function can delete empty rows in 1 click?

i don't have problem if answers using c#

edit:

i uploaded sample file sample file. not files have same structure.

i found looping through excel worksheet can take time if worksheet large. solution tried avoid looping in worksheet. avoid looping through worksheet, made 2 dimensional object array cells returned usedrange with:

excel.range targetcells = worksheet.usedrange; object[,] allvalues = (object[,])targetcells.cells.value; 

this array loop through indexes of empty rows , columns. make 2 int lists, 1 keeps row indexes delete other keeps column indexes delete.

list<int> emptyrows = getemptyrows(allvalues, totalrows, totalcols); list<int> emptycols = getemptycols(allvalues, totalrows, totalcols); 

these lists sorted high low simplify deleting rows bottom , deleting columns right left. loop through each list , delete appropriate row/col.

deleterows(emptyrows, worksheet); deletecols(emptycols, worksheet); 

finally after empty rows , columns have been deleted, saveas file new file name.

hope helps.

edit:

addressed usedrange issue such if there empty rows @ top of worksheet, rows removed. remove empty columns left of starting data. allows indexing work if there empty rows or columns before data starts. accomplished taking address of first cell in usedrange address of form “$a$1:$d$4”. allow use of offset if empty rows @ top , empty columns left remain , not deleted. in case deleting them. number of rows delete top can calculated first “$a$4” address “4” row first data appears. need delete top 3 rows. column address of form “a”, “ab” or “aad” required translation , how convert column number (eg. 127) excel column (eg. aa) able determine how many columns on left need deleted.

class program {   static void main(string[] args) {     excel.application excel = new excel.application();     string originalpath = @"h:\exceltestfolder\book1_test.xls";     excel.workbook workbook = excel.workbooks.open(originalpath);     excel.worksheet worksheet = workbook.worksheets["sheet1"];     excel.range usedrange = worksheet.usedrange;      removeemptytoprowsandleftcols(worksheet, usedrange);      deleteemptyrowscols(worksheet);      string newpath = @"h:\exceltestfolder\book1_test_removed.xls";     workbook.saveas(newpath, excel.xlsaveasaccessmode.xlnochange);      workbook.close();     excel.quit();     system.runtime.interopservices.marshal.releasecomobject(workbook);     system.runtime.interopservices.marshal.releasecomobject(excel);     console.writeline("finished removing empty rows , columns - press key exit");     console.readkey();   }    private static void deleteemptyrowscols(excel.worksheet worksheet) {     excel.range targetcells = worksheet.usedrange;     object[,] allvalues = (object[,])targetcells.cells.value;     int totalrows = targetcells.rows.count;     int totalcols = targetcells.columns.count;      list<int> emptyrows = getemptyrows(allvalues, totalrows, totalcols);     list<int> emptycols = getemptycols(allvalues, totalrows, totalcols);      // have list of empty rows , columns need delete     deleterows(emptyrows, worksheet);     deletecols(emptycols, worksheet);   }    private static void deleterows(list<int> rowstodelete, excel.worksheet worksheet) {     // rows sorted high low - index's wont shift     foreach (int rowindex in rowstodelete) {       worksheet.rows[rowindex].delete();     }   }    private static void deletecols(list<int> colstodelete, excel.worksheet worksheet) {     // cols sorted high low - index's wont shift     foreach (int colindex in colstodelete) {       worksheet.columns[colindex].delete();     }   }    private static list<int> getemptyrows(object[,] allvalues, int totalrows, int totalcols) {     list<int> emptyrows = new list<int>();      (int = 1; < totalrows; i++) {       if (isrowempty(allvalues, i, totalcols)) {         emptyrows.add(i);       }     }     // sort list high low     return emptyrows.orderbydescending(x => x).tolist();   }    private static list<int> getemptycols(object[,] allvalues, int totalrows, int totalcols) {     list<int> emptycols = new list<int>();      (int = 1; < totalcols; i++) {       if (iscolumnempty(allvalues, i, totalrows)) {         emptycols.add(i);       }     }     // sort list high low     return emptycols.orderbydescending(x => x).tolist();   }    private static bool iscolumnempty(object[,] allvalues, int colindex, int totalrows) {     (int = 1; < totalrows; i++) {       if (allvalues[i, colindex] != null) {         return false;       }     }     return true;   }    private static bool isrowempty(object[,] allvalues, int rowindex, int totalcols) {     (int = 1; < totalcols; i++) {       if (allvalues[rowindex, i] != null) {         return false;       }     }     return true;   }    private static void removeemptytoprowsandleftcols(excel.worksheet worksheet, excel.range usedrange) {     string addressstring = usedrange.address.tostring();     int rowstodelete = getnumberoftoprowstodelete(addressstring);     deletetopemptyrows(worksheet, rowstodelete);     int colstodelete = getnumberofleftcolstodelte(addressstring);     deleteleftemptycolumns(worksheet, colstodelete);   }    private static void deletetopemptyrows(excel.worksheet worksheet, int startrow) {     (int = 0; < startrow - 1; i++) {       worksheet.rows[1].delete();     }   }    private static void deleteleftemptycolumns(excel.worksheet worksheet, int colcount) {     (int = 0; < colcount - 1; i++) {       worksheet.columns[1].delete();     }   }    private static int getnumberoftoprowstodelete(string address) {     string[] splitarray = address.split(':');     string firstindex = splitarray[0];     splitarray = firstindex.split('$');     string value = splitarray[2];     int returnvalue = -1;     if ((int.tryparse(value, out returnvalue)) && (returnvalue >= 0))       return returnvalue;     return returnvalue;   }    private static int getnumberofleftcolstodelte(string address) {     string[] splitarray = address.split(':');     string firstindex = splitarray[0];     splitarray = firstindex.split('$');     string value = splitarray[1];     return parsecolheadertoindex(value);   }    private static int parsecolheadertoindex(string coladress) {     int[] digits = new int[coladress.length];     (int = 0; < coladress.length; ++i) {       digits[i] = convert.toint32(coladress[i]) - 64;     }     int mul = 1; int res = 0;     (int pos = digits.length - 1; pos >= 0; --pos) {       res += digits[pos] * mul;       mul *= 26;     }     return res;   } } 

edit 2: testing made method loops thru the worksheet , compared code loops thru object array. shows significant difference.

enter image description here

method loop thru worksheet , delete empty rows , columns.

enum roworcol { row, column }; private static void conventionalremoveemptyrowscols(excel.worksheet worksheet) {   excel.range usedrange = worksheet.usedrange;   int totalrows = usedrange.rows.count;   int totalcols = usedrange.columns.count;    removeempty(usedrange, roworcol.row);   removeempty(usedrange, roworcol.column); }  private static void removeempty(excel.range usedrange, roworcol roworcol) {   int count;   excel.range currange;   if (roworcol == roworcol.column)     count = usedrange.columns.count;   else     count = usedrange.rows.count;    (int = count; > 0; i--) {     bool isempty = true;     if (roworcol == roworcol.column)       currange = usedrange.columns[i];     else       currange = usedrange.rows[i];      foreach (excel.range cell in currange.cells) {       if (cell.value != null) {         isempty = false;         break; // can exit loop since range not empty       }       else {         // cell value null contiue checking       }     } // end loop thru each cell in range (row or column)      if (isempty) {       currange.delete();     }   } } 

then main testing/timing 2 methods.

enum roworcol { row, column };  static void main(string[] args) {   excel.application excel = new excel.application();   string originalpath = @"h:\exceltestfolder\book1_test.xls";   excel.workbook workbook = excel.workbooks.open(originalpath);   excel.worksheet worksheet = workbook.worksheets["sheet1"];   excel.range usedrange = worksheet.usedrange;    // start test looping thru each excel worksheet   stopwatch sw = new stopwatch();   console.writeline("start stopwatch loop thru worksheet...");   sw.start();   conventionalremoveemptyrowscols(worksheet);   sw.stop();   console.writeline("it took total of: " + sw.elapsed.milliseconds + " miliseconds remove empty rows , columns...");    string newpath = @"h:\exceltestfolder\book1_test_removedloopthruworksheet.xls";   workbook.saveas(newpath, excel.xlsaveasaccessmode.xlnochange);   workbook.close();   console.writeline("");    // start test looping thru object array   workbook = excel.workbooks.open(originalpath);   worksheet = workbook.worksheets["sheet1"];   usedrange = worksheet.usedrange;   console.writeline("start stopwatch loop thru object array...");   sw = new stopwatch();   sw.start();   deleteemptyrowscols(worksheet);   sw.stop();    // display results second test   console.writeline("it took total of: " + sw.elapsed.milliseconds + " miliseconds remove empty rows , columns...");   string newpath2 = @"h:\exceltestfolder\book1_test_removedloopthruarray.xls";   workbook.saveas(newpath2, excel.xlsaveasaccessmode.xlnochange);   workbook.close();   excel.quit();   system.runtime.interopservices.marshal.releasecomobject(workbook);   system.runtime.interopservices.marshal.releasecomobject(excel);   console.writeline("");   console.writeline("finished testing methods - press key exit");   console.readkey(); } 

edit 3 per op request... updated , changed code match op code. found interesting results. see below.

i changed code match functions using ie… entirerow , counta. code below found preforms terribly. running tests found code below in 800+ milliseconds execution time. 1 subtle change made huge difference.

on line:

while (rowindex <= worksheet.usedrange.rows.count) 

this slowing things down lot. if create range variable usedrang , not keep regrabbibg each iteration of while loop make huge difference. so… when change while loop to…

excel.range usedrange = worksheet.usedrange; int rowindex = 1;  while (rowindex <= usedrange.rows.count) , while (colindex <= usedrange.columns.count) 

this performed close object array solution. did not post results, can use code below , change while loop grab usedrange each iteration or use variable usedrange test this.

private static void removeemptyrowscols3(excel.worksheet worksheet) {   //excel.range usedrange = worksheet.usedrange;     // <- using variable makes while loop faster    int rowindex = 1;    // delete empty rows   //while (rowindex <= usedrange.rows.count)     // <- changing 1 line makes huge difference - not grabbibg usedrange each iteration...   while (rowindex <= worksheet.usedrange.rows.count) {     if (excel.worksheetfunction.counta(worksheet.cells[rowindex, 1].entirerow) == 0) {       worksheet.cells[rowindex, 1].entirerow.delete(excel.xldeleteshiftdirection.xlshiftup);     }     else {       rowindex++;     }   }    // delete empty columns   int colindex = 1;   // while (colindex <= usedrange.columns.count) // <- change here    while (colindex <= worksheet.usedrange.columns.count) {     if (excel.worksheetfunction.counta(worksheet.cells[1, colindex].entirecolumn) == 0) {       worksheet.cells[1, colindex].entirecolumn.delete(excel.xldeleteshiftdirection.xlshifttoleft);     }     else {       colindex++;     }   } } 

update @hadi

you can alter deletecols , deleterows function better performance if excel contains blank rows , columns after last used ones:

private static void deleterows(list<int> rowstodelete, microsoft.office.interop.excel.worksheet worksheet) {     // rows sorted high low - index's wont shift      list<int> nonemptyrows = enumerable.range(1, rowstodelete.max()).tolist().except(rowstodelete).tolist();      if (nonemptyrows.max() < rowstodelete.max())     {          // there empty rows after last non empty row          microsoft.office.interop.excel.range cell1 = worksheet.cells[nonemptyrows.max() + 1,1];         microsoft.office.interop.excel.range cell2 = worksheet.cells[rowstodelete.max(), 1];          //delete empty rows after last used row         worksheet.range[cell1, cell2].entirerow.delete(microsoft.office.interop.excel.xldeleteshiftdirection.xlshiftup);       }    //else last non empty row = worksheet.rows.count        foreach (int rowindex in rowstodelete.where(x => x < nonemptyrows.max()))     {         worksheet.rows[rowindex].delete();     } }  private static void deletecols(list<int> colstodelete, microsoft.office.interop.excel.worksheet worksheet) {     // cols sorted high low - index's wont shift      //get non empty cols     list<int> nonemptycols = enumerable.range(1, colstodelete.max()).tolist().except(colstodelete).tolist();      if (nonemptycols.max() < colstodelete.max())     {          // there empty rows after last non empty row          microsoft.office.interop.excel.range cell1 = worksheet.cells[1,nonemptycols.max() + 1];         microsoft.office.interop.excel.range cell2 = worksheet.cells[1,nonemptycols.max()];          //delete empty rows after last used row         worksheet.range[cell1, cell2].entirecolumn.delete(microsoft.office.interop.excel.xldeleteshiftdirection.xlshifttoleft);       }            //else last non empty column = worksheet.columns.count      foreach (int colindex in colstodelete.where(x => x < nonemptycols.max()))     {         worksheet.columns[colindex].delete();     } } 

check answer @ get last non empty column , row index excel using interop


Comments

Popular posts from this blog

python - How to insert QWidgets in the middle of a Layout? -

python - serve multiple gunicorn django instances under nginx ubuntu -

module - Prestashop displayPaymentReturn hook url -