Description
The method CTokenListCategory::updateOrderedCommonTokenIds
is known to be potentially inefficient. It contains this comment:
// The algorithm used here is technically O(N^3 * log(N)), but has these
// redeeming features:
// 1. It does not allocate any memory.
// 2. It is order O(N * log(N)) in the (likely) case of no change required.
// 3. In the case where the nested loops run many times it will be reducing
// the value of N for subsequent calls, thus making those subsequent
// calls much faster. For example, suppose we start off with 100 ordered
// tokens, and one call to this method runs the outer loop 50 times to
// reduce the ordered set to 50 tokens. Then the next call will start
// with only 50 ordered tokens.
// There's a trade-off here, because reducing the complexity would mean
// changing the data structures, which would add complexity to state
// persistence, or allocating temporary storage, which would increase the
// runtime a lot in the common case where the previously ordered common
// tokens are present in the same order in the new tokens.
Usually we get away with this, but recently we have seen data sets where the inefficient algorithm causes problems, namely when there are many tokens in each message.
When m_CommonUniqueTokenIds contains more than a certain number of values, we should take the time at the beginning of CTokenListCategory::updateOrderedCommonTokenIds to build a more efficient data structure to work with. For small numbers of tokens the cost of building a new data structure (in particular the memory allocations) is likely to outweigh the cost of the looping we currently do, so some experimentation will be needed to find the point at which the switch becomes worthwhile. For large numbers of tokens, however, we should see a clear performance improvement.