Conversation

@diveshjain-phy (Member Author) commented Aug 27, 2025

@JBorrow, this is ongoing work to address the analysis function issue on lightserve. It took some time to understand and resolve the primary-key and foreign-key requirements of the tables when creating hypertables. Once I've added the analysis functions, I'll request a review.

@JBorrow (Member) commented Aug 27, 2025

Cool! So this actually works with TimescaleDB like this?

@diveshjain-phy (Member Author):

Yes, it seems to be working. I had to add 'time' as a primary key in the FluxMeasurement table, since TimescaleDB requires the time column used for chunking to be part of the primary key when creating hypertables, and then carried the changes forward through the rest of the schema.

@JBorrow (Member) commented Aug 27, 2025

Very interesting that it requires time as a primary key... As long as it works with a composite primary key alongside the id field we're fine, otherwise we might need to rethink things.

@diveshjain-phy (Member Author) commented Aug 27, 2025

Yes, I checked; a composite primary key of (id, time) works.
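
For reference, a minimal sketch of what the composite key looks like in the SQLAlchemy model (the id and flux column names are assumptions, not the exact schema):

from sqlalchemy import Column, DateTime, Float, Integer
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class FluxMeasurement(Base):
    __tablename__ = "flux_measurements"

    # TimescaleDB requires the time-partitioning column to be part of any
    # unique or primary key on a hypertable, hence the composite (id, time) key.
    id = Column(Integer, primary_key=True)
    time = Column(DateTime(timezone=True), primary_key=True)
    flux = Column(Float)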

@JBorrow (Member) commented Aug 27, 2025

Very interesting. Keep exploring this direction... Do we need a separate table for each time range? Or does TimescaleDB have some functionality for arbitrary time ranges too?

@diveshjain-phy (Member Author):

As far as I have read, we don't need separate tables for different time ranges. TimescaleDB uses a single table and automatically partitions the data into time-based chunks; when we query any time range, it automatically finds the relevant chunks.
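
For example, a plain range query like the one below (flux as a column name is an assumption) runs against the single flux_measurements table, and the planner only scans the chunks overlapping March through May:

from sqlalchemy import text

# One logical table; TimescaleDB's chunk exclusion prunes everything
# outside the requested time range at plan time.
rows = session.execute(text("""
    SELECT time, flux
    FROM flux_measurements
    WHERE time >= '2025-03-01' AND time < '2025-06-01'
""")).fetchall()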

@JBorrow (Member) commented Aug 27, 2025

Ah, I understand, so the chunk_time_interval => INTERVAL '6 months' is more of an optimization thing?

@diveshjain-phy (Member Author) commented Aug 28, 2025

Yes. The usual guidance is to set chunk_time_interval so that the most recent chunks (including their indexes) fit within about 25% of main memory. Most examples use 7-14 days as a starting point, and TimescaleDB's default is 7 days.
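
If the default ever needs tuning, the interval can be changed without recreating the table; a sketch (the 14-day value is just an illustration):

from sqlalchemy import text

# Only affects chunks created from now on; existing chunks keep their
# original interval.
session.execute(text(
    "SELECT set_chunk_time_interval('flux_measurements', INTERVAL '14 days');"
))
session.commit()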

@diveshjain-phy (Member Author):

Right now I have set the aggregate bucketing at 1 month, so every row in the aggregate table corresponds to a month start, and any query range includes the entire month for each bucket that starts inside that range. For example, asking for 1 Mar–2 Sep returns the September bucket that starts on 1 Sep, and because that bucket covers the whole month, the results run through 30 Sep. If you don't anticipate any issues, we can combine information from the aggregate table and the raw table for exact range boundaries. Alternatively, we can use shorter buckets, but that needs tuning unless we know what granularity users are comfortable with.
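
Concretely, the monthly view is built on time_bucket, which is what pins every row to a month start. A sketch of the shape (the aggregated columns, and the grouping columns living directly on flux_measurements, are assumptions):

from sqlalchemy import create_engine, text

engine = create_engine("postgresql://...")  # connection URL elided

# Continuous aggregates cannot be created inside a transaction,
# hence the AUTOCOMMIT connection.
with engine.connect().execution_options(isolation_level="AUTOCOMMIT") as conn:
    conn.execute(text("""
        CREATE MATERIALIZED VIEW band_statistics_monthly
        WITH (timescaledb.continuous) AS
        SELECT
            time_bucket(INTERVAL '1 month', time) AS bucket,
            source_id,
            band_name,
            avg(flux) AS mean_flux,
            count(*) AS n_measurements
        FROM flux_measurements
        GROUP BY bucket, source_id, band_name;
    """))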

@diveshjain-phy (Member Author) commented Oct 6, 2025

@JBorrow I've implemented performance tests for the aggregate statistics endpoints to compare continuous aggregates vs raw queries:

  • /analysis/aggregate/{source_id}/{band_name} (with continuous aggregates)
  • /analysis/wo_ca/aggregate/{source_id}/{band_name} (without continuous aggregates)
============================================================
Testing: GET http://localhost:8000/analysis/aggregate/1/f145
============================================================
Mean: 12.94 ms
std:  7.83 ms
min:  8.40 ms
max:  117.82 ms

============================================================
Testing: GET http://localhost:8000/analysis/wo_ca/aggregate/1/f145
============================================================
Mean: 68.66 ms
std:  56.27 ms
min:  57.14 ms
max:  2525.97 ms

Ratio of mean times (with / without continuous aggregates): 0.188

Is this the right way to approach the tests?

perf_test.py
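
For context, the measurement is essentially a timing loop of this shape (a simplified sketch rather than the attached file; the request count is an assumption):

import statistics
import time

import requests

def time_endpoint(url: str, n: int = 100) -> None:
    """Hit an endpoint n times and print latency statistics in ms."""
    samples = []
    for _ in range(n):
        start = time.perf_counter()
        requests.get(url).raise_for_status()
        samples.append((time.perf_counter() - start) * 1000.0)
    print(f"Mean: {statistics.mean(samples):.2f} ms")
    print(f"std:  {statistics.stdev(samples):.2f} ms")
    print(f"min:  {min(samples):.2f} ms")
    print(f"max:  {max(samples):.2f} ms")

time_endpoint("http://localhost:8000/analysis/aggregate/1/f145")
time_endpoint("http://localhost:8000/analysis/wo_ca/aggregate/1/f145")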

@JBorrow (Member) commented Oct 6, 2025

Seems like a reasonable way to do this. I guess in the future we'd probably not want to support two types of endpoint though.

@diveshjain-phy (Member Author):

Yes, definitely. I added the 'without continuous aggregate' endpoint for testing purposes and will drop it once this is cleanly implemented.

@diveshjain-phy (Member Author):

Hi @JBorrow! I have implemented both time series and aggregate endpoints. I have verified this against the raw table statistics. Would you like to test the implementation, or should I proceed with dropping the endpoint serving data from the raw flux measurement tables?

@JBorrow (Member) left a review:

Some comments before the refactor to two templated setups. You're close to that already.

Comment on lines +209 to +215
return f"""
    SELECT add_continuous_aggregate_policy('{self.config.view_name}',
        start_offset => INTERVAL '{self.config.refresh_start_offset}',
        end_offset => INTERVAL '{self.config.refresh_end_offset}',
        schedule_interval => INTERVAL '{self.config.refresh_schedule_interval}'
    );
"""
@JBorrow (Member):

Can we not use SQL parameters here? I don't like building queries from interpolated strings...

@diveshjain-phy (Member Author):

When I do select(func.function_name(...)) I get:

LINE 1: SELECT add_retention_policy('band_statistics_daily', INTERVA...
        ^
HINT: No function matches the given name and argument types. You might need to add explicit type casts.

@diveshjain-phy (Member Author):

It seems to be expecting a REGCLASS-typed argument.
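
One workaround that keeps bound parameters is to bind plain strings and cast them server-side, since the signature wants REGCLASS and INTERVAL rather than text. A sketch (the offset values are placeholders):

from sqlalchemy import text

# Bind the view name and intervals as strings and let Postgres cast them,
# so no string interpolation is needed on the Python side.
session.execute(
    text("""
        SELECT add_continuous_aggregate_policy(
            CAST(:view_name AS regclass),
            start_offset => CAST(:start_offset AS interval),
            end_offset => CAST(:end_offset AS interval),
            schedule_interval => CAST(:schedule AS interval)
        );
    """),
    {
        "view_name": "band_statistics_daily",
        "start_offset": "3 days",
        "end_offset": "1 hour",
        "schedule": "1 hour",
    },
)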

Comment on lines 79 to 87
if delta_start <= 30:
    view_name = "band_statistics_daily"
    time_resolution = "daily"
elif delta_start <= 180:
    view_name = "band_statistics_weekly"
    time_resolution = "weekly"
else:
    view_name = "band_statistics_monthly"
    time_resolution = "monthly"
@JBorrow (Member):

This isn't very configurable.

@diveshjain-phy (Member Author):

Refactored; this now selects the view via the AggregateConfigurations dictionary.
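
Roughly this shape (a sketch of the idea, not the exact code; the field names are assumptions):

from dataclasses import dataclass
from datetime import timedelta

@dataclass
class AggregateConfiguration:
    view_name: str
    time_resolution: str
    max_query_span: timedelta  # widest query range this view should serve

# Ordered finest to coarsest; the first wide-enough view wins.
AGGREGATE_CONFIGURATIONS = {
    "daily": AggregateConfiguration("band_statistics_daily", "daily", timedelta(days=30)),
    "weekly": AggregateConfiguration("band_statistics_weekly", "weekly", timedelta(days=180)),
    "monthly": AggregateConfiguration("band_statistics_monthly", "monthly", timedelta.max),
}

def select_view(query_span: timedelta) -> AggregateConfiguration:
    for config in AGGREGATE_CONFIGURATIONS.values():
        if query_span <= config.max_query_span:
            return config
    return AGGREGATE_CONFIGURATIONS["monthly"]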

Comment on lines +52 to +61
session.execute(text("""
    SELECT create_hypertable(
        'flux_measurements',
        'time',
        chunk_time_interval => INTERVAL '7 days',
        if_not_exists => TRUE
    );
"""))
session.commit()
create_continuous_aggregates(session)
@JBorrow (Member):

Is the hypertable the table that the CAs get loaded from?

@diveshjain-phy (Member Author):

yes

@diveshjain-phy (Member Author):

This makes flux_measurements a hypertable.
