-
Notifications
You must be signed in to change notification settings - Fork 3.6k
insertNewRowBefore is more and more slow for large files #2509
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
if I double the number of columns written, it double the time to insertNewRowBefore line 500 0.013 |
real world example : it takes 18hours to write 38000 lines to a xlsx file |
This is never going to happen. Inserting/Removing Rows/Columns is not an Consider a worksheet with just 5 rows:
And you insert a new row before row And it needs to do that for every column as well... that can quickly become a lot of cell checking in larger worksheets. If you then do another insert/remove; it has to redo all that work again. That's why it's better to avoid doing multiple inserts if possible, and do a single insert for multiple rows when you can. If you need to insert 1000 new rows, then doing: for ($i = 0; i < 1000; ++$i) {
$worksheet->insertNewRowBefore($newRowNumber);
} will repeat all that work 1000 times Doing: $worksheet->insertNewRowBefore($newRowNumber, 1000); Only does that work once. And the more cells/rows/columns that you have in a worksheet, the more cells/rows/columns need checking, and the longer it will take whenever you do an insert or a removal. |
i don't know how many rows i need to add. I'm reading info and writing them in the excel file as needed. Like arrays of indexes that contains formulas, or boolean that says if we have any formula, and same for data validation, filler etc ? Or do you have any solution to speed up things ? I don't know how many rows i need to insert in advance. I could add rows by 1000 and increase the number of rows only when i've wrote 1000 lines, but that's a hack and i may need to delete empty rows when i finished writing the file. I've tested the delays, and here is what takes the most time (1.4s) while ($coordinate = array_pop($allCoordinates)) {
$cell = $worksheet->getCell($coordinate);
$cellIndex = Coordinate::columnIndexFromString($cell->getColumn());
if ($cellIndex - 1 + $numberOfColumns < 0) {
continue;
}
// New coordinate
$newCoordinate = Coordinate::stringFromColumnIndex($cellIndex + $numberOfColumns) . ($cell->getRow() + $numberOfRows);
// Should the cell be updated? Move value and cellXf index from one cell to another.
if (($cellIndex >= $beforeColumn) && ($cell->getRow() >= $beforeRow)) {
// Update cell styles
$worksheet->getCell($newCoordinate)->setXfIndex($cell->getXfIndex());
// Insert this cell at its new location
if ($cell->getDataType() == DataType::TYPE_FORMULA) {
// Formula should be adjusted
$worksheet->getCell($newCoordinate)
->setValue($this->updateFormulaReferences($cell->getValue(), $beforeCellAddress, $numberOfColumns, $numberOfRows, $worksheet->getTitle()));
} else {
// Formula should not be adjusted
$worksheet->getCell($newCoordinate)->setValueExplicit($cell->getValue(), $cell->getDataType());
}
// Clear the original cell
$worksheet->getCellCollection()->delete($coordinate);
} else {
/* We don't need to update styles for rows/columns before our insertion position,
but we do still need to adjust any formulae in those cells */
if ($cell->getDataType() == DataType::TYPE_FORMULA) {
// Formula should be adjusted
$cell->setValue($this->updateFormulaReferences($cell->getValue(), $beforeCellAddress, $numberOfColumns, $numberOfRows, $worksheet->getTitle()));
}
}
} just looping to all coordinate and doing those 2 lines takes 0.8s $cell = $worksheet->getCell($coordinate);
$cellIndex = Coordinate::columnIndexFromString($cell->getColumn()); and this takes 0.2s $this->adjustRowDimensions($worksheet, $beforeCellAddress, $numberOfColumns, $beforeRow, $numberOfRows); all the rest on the function insertNewBefore doesn't take any time . So i think that if you keep a reference of all cells coordinates that contains a formula, you could loop only to them and also loop to rows that are after the rows we are currently inserting (in my case, there are none however), you can save 63% of the time if you maths are exacts. |
If you're only adding new rows at the end of the worksheet, then why are you calling Yes, it would be possible to maintain a reference to all cells that contain formulae; this is exactly how MS Excel does this, and is able to do so quickly; but MS Excel has the advantage of not needing to worry about memory limits. Not all worksheets are simply data, some are predominantly formulae, and that reference list of cells containing formulae could become very large; and would itself need to be updated by any rows inserted/deleted |
Isn't it needed to add rows when we want to add data? I've read it somewhere. I don't think the list of rows having formula would be huge if you only save it as the coordinates and use an indexed array, accessing it would be fast. |
No Access to an indexed array would be faster; I experimented with that technique when I first identified that MS Excel uses that approach. But most developers don't use |
Thank you I will then remove insertnewrow |
are you sure that a weak map will use less memory than an array like this $coordinatesUsingFormula = ['A2' => true, 'C12' => true]; ? |
This is a real problem. |
I am also facing the same issue and disappointed that it becomes too slow and I get Gateway Timeout error while using this insertNewRowBefore function. I have to insert rows in betwteen and facing the issue. Is there any proper solution to fix the issue or should we switch to .Net if excel is used in any project? |
@MarkBaker Can you please re-open this issue? We are facing the same issue as well. The problem became more significant with each version, starting from 1.29.10, where things did work to 4.1.*, where we are now facing timeouts. The reason why we use Thank you in advance. |
This is:
What is the expected behavior?
Adding new rows is always the same speed
What is the current behavior?
insertNewRowBefore to add new rows is more and more slow depending on the number of existing rows.
With 0 rows, it takes 0 seconds. With 1000 it takes 0.082s, with 2000 it takes 0.166s
What are the steps to reproduce?
with this minimal code the time is different but the same behaviour occurs.
output
line 500 0.007
line 1000 0.013
line 1500 0.021
line 2000 0.027
line 2500 0.034
line 3000 0.042
Which versions of PhpSpreadsheet and PHP are affected?
1.21.0
php 8.1
The text was updated successfully, but these errors were encountered: