Adding Database side analysis functions #3

diveshjain-phy · 2025-08-27T13:23:34Z

diveshjain-phy · 2025-08-27T13:38:26Z

@JBorrow, this is ongoing work to address the analysis function issue on lightserve. It took some time to understand and resolve the primary key and foreign key requirements of the tables when creating hypertables. Once I’ve added the analysis functions, I’ll request a review.

JBorrow · 2025-08-27T13:45:01Z

Cool! So this actually works with timescaledb like this?

diveshjain-phy · 2025-08-27T13:50:57Z

Yes, it seems to be working. I had to add 'time' as a primary key in the FluxMeasurement table, as that was a requirement for creating hypertables chunked in time. Then I had to carry the changes forward.

JBorrow · 2025-08-27T13:53:13Z

Very interesting that it requires time as a primary key... As long as it's ok with a composite primary key with the id field we're ok, otherwise we might need to rethink things.

diveshjain-phy · 2025-08-27T13:54:25Z

Yes, I checked that it works.

JBorrow · 2025-08-27T13:58:11Z

Very interesting. Keep exploring this direction... Do we need a separate table for each time range? Or does timescaledb have some functionality for arbitrary time ranges too?

diveshjain-phy · 2025-08-27T14:09:59Z

As far as I have read, we don't need to worry about separate tables for different time ranges. timescaledb uses one table and automatically partitions data into time-based chunks. When we query any time range, it automatically finds the relevant chunks.

JBorrow · 2025-08-27T18:21:32Z

Ah I understand, so the chunk_time_interval => INTERVAL '6 months', is more of an optimization thing?

diveshjain-phy · 2025-08-28T09:28:00Z

Yes. The usual guidance is to set chunk_time_interval so that one chunk of recent data fits in about 25% of RAM. Most examples show 7-14 days as starting points, with TimescaleDB's default being 7 days.
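
As a rough illustration of that sizing rule, the sketch below estimates a chunk interval from an assumed steady ingest rate. The helper name, the 25% default, and the example numbers are all hypothetical, and the estimate ignores index overhead, so treat it as a starting point only.

```python
from datetime import timedelta


def estimate_chunk_interval(
    ram_bytes: int,
    ingest_bytes_per_day: float,
    ram_fraction: float = 0.25,
) -> timedelta:
    """Return an interval whose data roughly fills ram_fraction of RAM.

    Hypothetical helper: assumes a steady ingest rate and ignores indexes.
    """
    if ingest_bytes_per_day <= 0:
        raise ValueError("ingest_bytes_per_day must be positive")
    budget_bytes = ram_bytes * ram_fraction
    return timedelta(days=budget_bytes / ingest_bytes_per_day)


# Made-up numbers: 16 GiB of RAM and 512 MiB of new data per day.
interval = estimate_chunk_interval(16 * 1024**3, 512 * 1024**2)
print(interval.days)  # 8
```

With these made-up numbers the estimate lands close to the 7-day default, which is why the default is often a reasonable first choice.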

diveshjain-phy · 2025-09-17T13:00:36Z

Right now I have set the aggregate bucketing to 1 month. Each row in the aggregate table is keyed by its month start, so a query returns every bucket whose start falls inside the requested range, and each returned bucket covers its entire month. For example, asking for 1 Mar–2 Sep returns the September bucket that starts on 1 Sep, and because that bucket covers the whole month, the results run through 30 Sep. If you don’t anticipate any issues, we can combine information from the aggregate table and the raw table for accurate results at the edges of the range. Alternatively, we can use shorter buckets, but choosing the size requires optimisation unless we know what users are comfortable with.
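
To make the edge effect concrete, here is a small sketch of the bucket selection described above. It only mimics the behaviour in plain Python (the function names are hypothetical and this is not the query used in the PR): a monthly bucket is selected when its start date lies inside the requested range.

```python
from datetime import date, timedelta


def month_start(d: date) -> date:
    """Truncate a date to the first day of its month (the bucket key)."""
    return d.replace(day=1)


def next_month_start(d: date) -> date:
    """Return the first day of the month following d's month."""
    if d.month == 12:
        return date(d.year + 1, 1, 1)
    return date(d.year, d.month + 1, 1)


def buckets_in_range(start: date, end: date) -> list[date]:
    """List monthly bucket starts b with start <= b <= end."""
    bucket = month_start(start)
    if bucket < start:
        # The bucket containing `start` began before the range, so skip it.
        bucket = next_month_start(bucket)
    result = []
    while bucket <= end:
        result.append(bucket)
        bucket = next_month_start(bucket)
    return result


buckets = buckets_in_range(date(2025, 3, 1), date(2025, 9, 2))
covered_until = next_month_start(buckets[-1]) - timedelta(days=1)
print(buckets[-1])    # 2025-09-01
print(covered_until)  # 2025-09-30
```

The requested end date is 2 Sep, but the last selected bucket summarises data through 30 Sep, which is the overshoot described in the comment.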

diveshjain-phy · 2025-10-06T15:33:34Z

@JBorrow I've implemented performance tests for the aggregate statistics endpoints to compare continuous aggregates vs raw queries:

/analysis/aggregate/{source_id}/{band_name} (with continuous aggregates)
/analysis/wo_ca/aggregate/{source_id}/{band_name} (without continuous aggregates)

============================================================
Testing: GET http://localhost:8000/analysis/aggregate/1/f145
============================================================
Mean:12.94 ms
std:7.83 ms
min:8.40 ms
max:117.82 ms

============================================================
Testing: GET http://localhost:8000/analysis/wo_ca/aggregate/1/f145
============================================================
Mean:68.66 ms
std:56.27 ms
min:57.14 ms
max:2525.97 ms

Ratio of mean times with aggregate to without aggregate calls: 0.18845746342388867

Is this the right way to approach the tests?

perf_test.py
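
The attached perf_test.py is not reproduced in this thread. For readers who want the general shape of such a comparison, here is a minimal standard-library sketch; the function names and the stand-in workloads are assumptions, and a real test would replace each lambda with one HTTP GET against the endpoint under test.

```python
import statistics
import time
from typing import Callable


def time_calls(fn: Callable[[], object], repeats: int = 100) -> dict[str, float]:
    """Call fn repeatedly and summarise wall-clock latency in milliseconds."""
    samples_ms = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples_ms.append((time.perf_counter() - start) * 1000.0)
    return {
        "mean": statistics.mean(samples_ms),
        "std": statistics.stdev(samples_ms) if len(samples_ms) > 1 else 0.0,
        "min": min(samples_ms),
        "max": max(samples_ms),
    }


def mean_ratio(with_ca: dict[str, float], without_ca: dict[str, float]) -> float:
    """Ratio of mean latency with aggregates to mean latency without."""
    return with_ca["mean"] / without_ca["mean"]


# Stand-in workloads only; they do not touch the API or the database.
fast = time_calls(lambda: sum(range(1_000)), repeats=20)
slow = time_calls(lambda: sum(range(100_000)), repeats=20)
print(f"ratio of means: {mean_ratio(fast, slow):.3f}")
```

Because the maximum latencies above are far larger than the means, reporting a median alongside the mean may give a more stable comparison.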

JBorrow · 2025-10-06T21:14:55Z

Seems like a reasonable way to do this. I guess in the future we'd probably not want to support two types of endpoint though.

diveshjain-phy · 2025-10-07T09:45:39Z

Yes, definitely. I had the 'without continuous aggregate' endpoint for testing purposes and will drop it once this is cleanly implemented.

diveshjain-phy · 2025-10-07T10:53:30Z

Hi @JBorrow! I have implemented both time series and aggregate endpoints. I have verified this against the raw table statistics. Would you like to test the implementation, or should I proceed with dropping the endpoint serving data from the raw flux measurement tables?

JBorrow

Some comments before the refactor to two templated setups. You're close to that already.

JBorrow · 2025-10-10T18:37:18Z

lightcurvedb/analysis/aggregates.py

+        return f"""
+        SELECT add_continuous_aggregate_policy('{self.config.view_name}',
+            start_offset => INTERVAL '{self.config.refresh_start_offset}',
+            end_offset => INTERVAL '{self.config.refresh_end_offset}',
+            schedule_interval => INTERVAL '{self.config.refresh_schedule_interval}'
+        );
+        """


Can we not use sql parameters here? I don't like building queries with random strings...

When I do select(func.function_name(...)) I get

LINE 1: SELECT add_retention_policy('band_statistics_daily', INTERVA...
^
HINT: No function matches the given name and argument types. You might need to add explicit type casts.

It seems to be expecting a REGCLASS type variable.
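
One possible way around the REGCLASS error, sketched here under assumptions and not tested against this codebase, is to keep the SQL text fixed and pass the values as bound parameters with explicit casts. The helper name and parameter names below are hypothetical; the `:name` placeholders follow the style accepted by SQLAlchemy's `text()`.

```python
# Fixed SQL text: values are supplied separately as bound parameters.
# CAST(... AS regclass) turns the bound string into the expected REGCLASS.
POLICY_SQL = """
SELECT add_continuous_aggregate_policy(
    CAST(:view_name AS regclass),
    start_offset => CAST(:start_offset AS interval),
    end_offset => CAST(:end_offset AS interval),
    schedule_interval => CAST(:schedule_interval AS interval)
)
"""


def policy_statement(
    view_name: str,
    start_offset: str,
    end_offset: str,
    schedule_interval: str,
) -> tuple[str, dict[str, str]]:
    """Return the fixed SQL text and the parameters to bind to it."""
    params = {
        "view_name": view_name,
        "start_offset": start_offset,
        "end_offset": end_offset,
        "schedule_interval": schedule_interval,
    }
    return POLICY_SQL, params


sql, params = policy_statement("band_statistics_daily", "3 days", "1 hour", "1 hour")
# Assumed usage with SQLAlchemy: session.execute(text(sql), params)
print(params["view_name"])  # band_statistics_daily
```

The same casting approach should apply to add_retention_policy, since the error above points at its first argument, but that is an assumption that would need checking against the database.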

JBorrow · 2025-10-10T18:37:32Z

lightcurvedb/analysis/statistics.py

+            if delta_start <= 30:
+                view_name = "band_statistics_daily"
+                time_resolution = "daily"
+            elif delta_start <= 180:
+                view_name = "band_statistics_weekly"
+                time_resolution = "weekly"
+            else:
+                view_name = "band_statistics_monthly"
+                time_resolution = "monthly"


This isn't very configurable.

Refactored and set this up to use the AggregateConfigurations dictionary.
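
The refactored code is not shown in this thread. As an illustration of the idea only, a configuration-driven selection could look like the sketch below; `AggregateChoice`, `AGGREGATE_CHOICES`, and `select_aggregate` are hypothetical names and are not the AggregateConfigurations structure from the PR. The thresholds mirror the ones in the diff above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AggregateChoice:
    view_name: str
    time_resolution: str
    max_days: float  # largest time span (in days) this view should serve


# Hypothetical configuration mirroring the hard-coded thresholds in the diff.
AGGREGATE_CHOICES = [
    AggregateChoice("band_statistics_daily", "daily", 30),
    AggregateChoice("band_statistics_weekly", "weekly", 180),
    AggregateChoice("band_statistics_monthly", "monthly", float("inf")),
]


def select_aggregate(
    delta_days: float,
    choices: list[AggregateChoice] = AGGREGATE_CHOICES,
) -> AggregateChoice:
    """Pick the finest-resolution view whose max_days covers delta_days."""
    for choice in sorted(choices, key=lambda c: c.max_days):
        if delta_days <= choice.max_days:
            return choice
    raise ValueError(f"no aggregate configured for a span of {delta_days} days")


print(select_aggregate(45).view_name)  # band_statistics_weekly
```

Changing a threshold or adding a new resolution then only touches the configuration list, not the selection logic.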

JBorrow · 2025-10-10T18:38:10Z

lightcurvedb/cli/ephemeral.py

+            session.execute(text("""
+                    SELECT create_hypertable(
+                        'flux_measurements',
+                        'time',
+                        chunk_time_interval => INTERVAL '7 days',
+                        if_not_exists => TRUE
+                    );
+                     """))
+            session.commit()
+            create_continuous_aggregates(session)


Is hypertable the table that the CAs get loaded from?

This makes the flux_measurements table a hypertable.

Migration to timescaledb

578cc80

Continuous aggregate deployment and bad statistics

762bee9

diveshjain-phy mentioned this pull request Aug 29, 2025

Added preliminary api endpoint simonsobs/lightserve#20

Open

Updating mean statistic computes

65e07c3

diveshjain-phy requested a review from JBorrow September 1, 2025 15:12

diveshjain-phy added 2 commits September 17, 2025 13:45

Refactoring the code for structure and readability

75f6942

minor update

fcd23e5

diveshjain-phy added 12 commits September 17, 2025 15:02

minor update

e3a367c

update to aggregate setup. retention policy+multiple aggregate table

d43f15a

implementation of daily, weekly and monthly table

167e495

selection of table in derived statistics

6c10186

minor update

16aee35

generate data with past timestamps

a9484ca

Manual refresh for historical data

eb961b2

Implementation of manual refresh when dealing with historical data in…

1fe986e

… database while testing

added time stamps in responses and fixed minor bugs

04a6fee

minor updates

ed02963

passing time resolution as part of response body

ad879f8

added time series field and variance parameter

64e0e31

JBorrow reviewed Oct 10, 2025

diveshjain-phy added 2 commits October 15, 2025 12:59

Minor mods and handling without_aggregate requests within the same st…

24f2207

…ructure as aggregate requests

modified handling time threshold strategy

a64f12a
