Add CAG validation to synthesizer.validate #2480

Open
wants to merge 12 commits into
base: feature/single-table-CAG
from

Conversation

R-Palazzo
Contributor

CU-86b4pmjph
Resolve #2470

@R-Palazzo R-Palazzo self-assigned this Apr 24, 2025
@sdv-team
Contributor

codecov bot commented Apr 24, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 98.59%. Comparing base (7347cc6) to head (61cb10e).
Report is 1 commit behind head on feature/single-table-CAG.

Additional details and impacted files
@@                     Coverage Diff                      @@
##           feature/single-table-CAG    #2480      +/-   ##
============================================================
+ Coverage                     98.54%   98.59%   +0.04%     
============================================================
  Files                            68       68              
  Lines                          7030     7048      +18     
============================================================
+ Hits                           6928     6949      +21     
+ Misses                          102       99       -3     
Flag Coverage Δ
integration 83.88% <100.00%> (+0.33%) ⬆️
unit 97.23% <100.00%> (+0.07%) ⬆️

Base automatically changed from issue-24XX-move-multi-table-logic to feature/single-table-CAG April 28, 2025 18:41
@frances-h frances-h force-pushed the issue-2470-validation branch from 202aa7b to d955a9e Compare April 30, 2025 13:40
@pvk-developer pvk-developer force-pushed the feature/single-table-CAG branch from 12ec5bc to c2c3060 Compare April 30, 2025 17:42
@R-Palazzo R-Palazzo force-pushed the feature/single-table-CAG branch from c2c3060 to 258e10a Compare May 4, 2025 08:48
@R-Palazzo R-Palazzo force-pushed the issue-2470-validation branch 2 times, most recently from b62ec68 to 14d9936 Compare May 5, 2025 16:41
@R-Palazzo R-Palazzo changed the base branch from feature/single-table-CAG to issue-2484-add-version-parameter-to-single-table-synthesizer-get-metadata May 5, 2025 16:42
@R-Palazzo R-Palazzo changed the base branch from issue-2484-add-version-parameter-to-single-table-synthesizer-get-metadata to feature/single-table-CAG May 5, 2025 16:43
@R-Palazzo R-Palazzo changed the base branch from feature/single-table-CAG to issue-2484-add-version-parameter-to-single-table-synthesizer-get-metadata May 5, 2025 16:44
@R-Palazzo R-Palazzo changed the base branch from issue-2484-add-version-parameter-to-single-table-synthesizer-get-metadata to issue-2484-add-version-parameter-to-single-table May 5, 2025 16:44
Base automatically changed from issue-2484-add-version-parameter-to-single-table to feature/single-table-CAG May 6, 2025 19:17
@R-Palazzo R-Palazzo force-pushed the issue-2470-validation branch from 14d9936 to 6e7813e Compare May 7, 2025 11:02
@R-Palazzo R-Palazzo marked this pull request as ready for review May 7, 2025 11:57
@R-Palazzo R-Palazzo requested a review from a team as a code owner May 7, 2025 11:57
@R-Palazzo R-Palazzo force-pushed the issue-2470-validation branch from 74f045e to 85b506c Compare May 8, 2025 16:31
@@ -775,31 +768,51 @@ def _transform_helper(self, data):

        return data

    def preprocess(self, data):
        """Transform the raw data to numerical space.
    def _validate_cags(self, data):
Contributor

Should this function just call the user-facing validate_cag function (made for synthetic data)?

def validate_cag(self, synthetic_data):

Contributor

On second thought, these two functions seem slightly different.

@@ -348,6 +345,19 @@ def _validate_all_tables(self, data):

        return errors

    def _validate_cags(self, data):
Contributor

Can we call this either _validate_cag or _validate_constraints? CAG stands for constraint-augmented generation and isn't plural.

Contributor Author

I renamed it to _validate_cag for now, since there is an existing _validate_constraints that handles old-style constraints. I will rename it to _validate_constraints when solving #2492.

@@ -501,7 +483,6 @@ def get_info(self):
        return info

    def _preprocess(self, data):
        self.validate(data)
Contributor

Why did we move validate out of here?

Contributor Author

We moved it out because _preprocess() is defined only in BaseSynthesizer, while all the CAG logic is defined in BaseSingleTableSynthesizer. That's why I added a _preprocess_helper method (similar to _transform_helper()) that handles the CAG logic.
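
For readers following the thread, the split described in this reply can be sketched roughly as below. This is only an illustration under assumptions, not the code in this PR: the class names `BaseSketch` and `SingleTableSketch`, the `calls` list, and the use of plain callables as patterns are all hypothetical stand-ins. Only the method names `validate`, `_preprocess`, `_preprocess_helper`, and `preprocess` come from the discussion, and the call order shown is a guess at the intent.

```python
class BaseSketch:
    """Hypothetical stand-in for a base class that knows nothing about CAG."""

    def validate(self, data):
        if not data:
            raise ValueError('data must not be empty')

    def _preprocess(self, data):
        # Generic preprocessing only; validation is no longer triggered here.
        return [dict(row) for row in data]


class SingleTableSketch(BaseSketch):
    """Hypothetical stand-in for the subclass that owns the CAG logic."""

    def __init__(self, patterns=()):
        self._patterns = list(patterns)
        self.calls = []  # records call order, for illustration only

    def validate(self, data):
        super().validate(data)
        self.calls.append('validate')

    def _preprocess_helper(self, data):
        # CAG-specific step, applied before the generic preprocessing.
        self.calls.append('_preprocess_helper')
        for pattern in self._patterns:
            data = pattern(data)
        return data

    def preprocess(self, data):
        # Assumed order: validate, then the CAG helper, then _preprocess.
        self.validate(data)
        data = self._preprocess_helper(data)
        self.calls.append('_preprocess')
        return self._preprocess(data)
```

With this layout the base class stays unaware of CAG, and the subclass decides when validation and the CAG step run relative to the shared `_preprocess`.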

@R-Palazzo R-Palazzo requested review from amontanez24 and gsheni May 12, 2025 12:13
@frances-h frances-h force-pushed the feature/single-table-CAG branch from 9eba42f to 7347cc6 Compare May 12, 2025 15:16
@R-Palazzo R-Palazzo force-pushed the issue-2470-validation branch from 157d027 to 61cb10e Compare May 12, 2025 16:31
Comment on lines +826 to +834
        if hasattr(self, '_reject_sampling_patterns'):
            for pattern in self._reject_sampling_patterns:
                pattern.validate(data=data, metadata=self._original_metadata)

        Returns:
            pandas.DataFrame:
                The preprocessed data.
        if hasattr(self, '_chained_patterns'):
            for pattern in self._chained_patterns:
                pattern.fit(data=data, metadata=metadata)
                metadata = pattern.get_updated_metadata(metadata)
                data = pattern.transform(data)
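
The chained loop in the snippet above feeds each pattern the metadata and data produced by the previous one. The sketch below illustrates why that ordering matters, under stated assumptions: `RenameColumnSketch` and `apply_chained_patterns` are hypothetical and not part of SDV, data is modeled as a list of dicts rather than a DataFrame, and metadata is a plain dict with a `columns` list. Only the method names `validate`, `fit`, `get_updated_metadata`, and `transform` mirror the calls shown in the diff.

```python
class RenameColumnSketch:
    """Hypothetical pattern that renames one column; not an SDV class."""

    def __init__(self, old, new):
        self.old = old
        self.new = new

    def validate(self, data, metadata):
        # Reject metadata that does not contain the column this pattern needs.
        if self.old not in metadata['columns']:
            raise ValueError(f"Column '{self.old}' is missing from the metadata.")

    def fit(self, data, metadata):
        self.validate(data, metadata)

    def get_updated_metadata(self, metadata):
        columns = [self.new if c == self.old else c for c in metadata['columns']]
        return {**metadata, 'columns': columns}

    def transform(self, data):
        return [
            {(self.new if key == self.old else key): value for key, value in row.items()}
            for row in data
        ]


def apply_chained_patterns(patterns, data, metadata):
    """Fit and apply each pattern on the output of the previous one."""
    for pattern in patterns:
        pattern.fit(data=data, metadata=metadata)
        metadata = pattern.get_updated_metadata(metadata)
        data = pattern.transform(data)
    return data, metadata
```

In this toy setup, chaining a rename of `a` to `b` followed by `b` to `c` succeeds, while the reverse order fails at `fit` because the second pattern's input column does not exist yet.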
Labels
None yet
Projects
None yet
4 participants