Skip to content

Conversation

thomasleese
Copy link
Contributor

This makes a number of small changes to the queries to avoid preloading objects where it's no longer necessary, particularly now that we have cached status models.

Some sessions can have lots of patient sessions (community clinics can sometimes have upwards of 50,000 patients in) and preloading all of this can be slow and require a large amount of memory.

Often at this scale, it seems to be more performant to not avoid N+1 queries, particularly depending on the page that is trying to be loaded.

@thomasleese thomasleese added the performance Improving performance label Mar 26, 2025
@thomasleese thomasleese added this to the v2.1.2 milestone Mar 26, 2025
@thomasleese thomasleese force-pushed the refactor-after-performance branch from bab6438 to c3f8d44 Compare March 26, 2025 17:12
@thomasleese thomasleese force-pushed the cache-register-status branch from 7726685 to 08ed628 Compare March 26, 2025 17:31
@thomasleese thomasleese force-pushed the refactor-after-performance branch from c3f8d44 to c220f9e Compare March 26, 2025 17:31
@thomasleese thomasleese force-pushed the refactor-after-performance branch from caf9708 to 3f922ab Compare March 26, 2025 21:47
@thomasleese thomasleese force-pushed the refactor-after-performance branch from 3f922ab to cb45c07 Compare March 27, 2025 10:24
@thomasleese thomasleese changed the base branch from cache-register-status to reporting-performance March 27, 2025 10:24
@thomasleese thomasleese force-pushed the refactor-after-performance branch from cb45c07 to 00754a3 Compare March 27, 2025 10:57
@thomasleese thomasleese force-pushed the reporting-performance branch from ce9f8ce to 52ddf4e Compare March 28, 2025 09:47
@thomasleese thomasleese force-pushed the refactor-after-performance branch from 828f928 to 593ea76 Compare March 28, 2025 09:47
@thomasleese thomasleese force-pushed the reporting-performance branch from 52ddf4e to 4ae8419 Compare March 28, 2025 14:00
@thomasleese thomasleese force-pushed the refactor-after-performance branch from 593ea76 to 94dd078 Compare March 28, 2025 14:00
@thomasleese thomasleese force-pushed the reporting-performance branch from 4ae8419 to 49e5bdc Compare March 28, 2025 16:43
@thomasleese thomasleese force-pushed the refactor-after-performance branch from 94dd078 to ea64a79 Compare March 28, 2025 16:47
@thomasleese thomasleese force-pushed the reporting-performance branch from 49e5bdc to 919a844 Compare March 28, 2025 16:54
@thomasleese thomasleese force-pushed the reporting-performance branch from 2f62ab6 to 49819d6 Compare March 28, 2025 17:25
@thomasleese thomasleese force-pushed the refactor-after-performance branch from 8f901ee to e710a6b Compare March 28, 2025 17:26
@thomasleese thomasleese force-pushed the reporting-performance branch from 49819d6 to 77e1ef1 Compare March 31, 2025 13:47
@thomasleese thomasleese force-pushed the refactor-after-performance branch from e710a6b to 81de690 Compare March 31, 2025 13:47
session: :programmes
)
end

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For future consideration ...

Is there something to having a scope for the various statuses, or is it really just enough to include :registration_status, :vaccination_statuses, according to which one will be used? I'm seeing some of the changes above add session: :programmes, or other bits, so not sure.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you want only the status of a patient/patient session then it is enough to just include the statuses, however, the reason we include session: :programmes a lot is to get the programmes eligible for the current patient (this has to be done in Ruby at the moment as the year groups are dynamic and not stored in the database).

The reason I wanted to remove preload_for_status is I found that we were using it in lots of places where only one of the statuses was being shown, and therefore we ended up preloading a lot more than we really needed, and I think having a single scope can lead to that, whereas I think we need to be careful and explicit about when we want to preload something, particularly in lists.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ya, I agree with the reasoning behind removing preload_for_status. I was just wondering if the need to get the programmes (eligible for the current patient) is a concern of calculating the status, or the current context of the calling code.

If it's the former it might be worth thinking to put that, and any other status-specific concerns, into a status-specific scope (and I agree the one you removed is too much so I don't mean bringing it back), but 🥾🥫 🚢 .

If it's the latter then 🙈 🚢 .

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say it's a concern of the caller rather than calculating the status. The programmes are used to render the pages (where the various statuses are shown for each programme) rather than being used to generate the statuses.

I'll merge this in for now, but I'll have a look at adding a scope for programmes and see if it works well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've raised #3301 to implement your suggestion. I'm not sure about it myself, but happy to merge that in if you think it's an improvement.

@thomasleese thomasleese force-pushed the reporting-performance branch from 77e1ef1 to 9343b25 Compare March 31, 2025 17:23
@thomasleese thomasleese force-pushed the refactor-after-performance branch from 81de690 to bdeb2c6 Compare March 31, 2025 17:25
@thomasleese thomasleese force-pushed the refactor-after-performance branch from bdeb2c6 to b75fed3 Compare March 31, 2025 17:41
@thomasleese thomasleese force-pushed the reporting-performance branch from 9343b25 to 421c782 Compare April 1, 2025 05:31
Base automatically changed from reporting-performance to v2.1.2-wip April 1, 2025 05:41
This avoids potential issues where preloading large amounts of data can
be slow and require a lot of memory. Instead, where N+1 issues isn't a
problem, it's fine to run an additional query to the database and fetch
only the data we need as is the case in most situations.

Now that we have cached the vaccination statuses, we can preload those
without needing to preload the entire vaccination history.
This removes this scope in favour of specifying explicitly which records
should be preloading at each location where this scope is used, to avoid
unnecessary preloading.
Some sessions can have lots of patient sessions (community clinics can
sometimes have upwards of 50,000 patients in) and preloading all of this
can be slow and require a large amount of memory.

Often at this scale, it seems to be more performant to not avoid N+1
queries, particularly depending on the page that is trying to be loaded.
This adds a new scope on the `PatientSession` model that destroys a
collection of patient sessions provided it is safe to do so. No new
logic is introduce in this commit, instead it's just refactoring
existing code to reduce duplication.
This avoids potential issues where preloading large amounts of data can
be slow and require a lot of memory. Instead, where N+1 issues isn't a
problem, it's fine to run an additional query to the database and fetch
only the data we need as is the case in most situations.
This avoids the need to order every time the pre-screenings are fetched
and instead require the caller to decide the order as approriate. This
should introduce a small performance increase in some situations since
`created_at` is not indexed.
This avoids potential issues where preloading large amounts of data can
be slow and require a lot of memory. Instead, where N+1 issues isn't a
problem, it's fine to run an additional query to the database and fetch
only the data we need as is the case in most situations.
This avoids potential issues where preloading large amounts of data can
be slow and require a lot of memory. Instead, where N+1 issues isn't a
problem, it's fine to run an additional query to the database and fetch
only the data we need as is the case in most situations.
Calling `patient` makes a query to the database whereas we just need the
ID of the patient.
@thomasleese thomasleese force-pushed the refactor-after-performance branch from b75fed3 to 8351f7e Compare April 1, 2025 05:42
@tvararu tvararu temporarily deployed to mavis-pr-3280 April 1, 2025 05:42 Inactive
Copy link

sonarqubecloud bot commented Apr 1, 2025

@thomasleese thomasleese merged commit c2902c6 into v2.1.2-wip Apr 1, 2025
10 of 11 checks passed
@thomasleese thomasleese deleted the refactor-after-performance branch April 1, 2025 05:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Improving performance
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants