-
Notifications
You must be signed in to change notification settings - Fork 111
feat: add the .cols_merge() method
#780
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #780 +/- ##
==========================================
+ Coverage 91.58% 91.65% +0.06%
==========================================
Files 47 47
Lines 5752 5857 +105
==========================================
+ Hits 5268 5368 +100
- Misses 484 489 +5 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
machow
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't read any of the code in utils yet, but the main piece I would vote to refactor a bit is the big method on the GT class, and avoiding full GT renders for the unit tests (see comments for suggestions).
| new_body.render_formats(self._tbl_data, self._substitutions, context) | ||
| return self._replace(_body=new_body) | ||
|
|
||
| def _perform_col_merge(self) -> Self: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mind moving this out of the GT class? Please do not put large amounts of logic onto this class (see other methods on it for smaller bits of logic that seem okay to add).
|
|
||
| # For Pandas and Polars, _set_cell() modifies in place and returns None but | ||
| # for PyArrow, _set_cell() returns a new table | ||
| if result is not None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this custom handling of backends happen in other places outside of _tbl_data.py?
|
|
||
| # With each column reference in the pattern, check that it is valid | ||
| for col_ref in pattern_cols: | ||
| # The pattern syntax uses 1-based indexing so adjust here |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
1-based indexing seems like a bad idea (but maybe it happens in other places in Python?)
| ) | ||
|
|
||
| # Process each row (according to the `rows=` parameter in `cols_merge()`) | ||
| for row_idx in rows: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there some way for you to extract out big chunks of the logic here, so that rather than...
- loop over all merges
- process columns
- process pattern
- loop over all rows
- call _process_col_merge_pattern
The merge logic can be run on just the values that might be in a given row. Something like...
- ColMergeInfo processes pattern on init
- ColMergeInfo has a method to that mostly runs _process_col_merge_pattern
- Some bigger method that handles the merge stage of GT rendering calls this method
- (It will likely contain much of the code in this big
_perform_col_mergemethod)
- (It will likely contain much of the code in this big
- This method gets used in tests, rather than rendering full tables
machow
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't read any of the code in utils yet, but the main piece I would vote to refactor a bit is the big method on the GT class, and avoiding full GT renders for the unit tests (see comments for suggestions).
This PR adds the
.cols_merge()method. It allows for selective concatenation of cell values to a single cell, automatically hiding other columns involved in the merging process (could be set to not hide, however). I'll provide two examples here of how this new method works in practice.In the first example, we merge the
open&closecolumns together, and, thelow&highcolumns (putting an em dash between both). Thecloseandhighcolumns are hidden, and we take the opportunity to renameopenandlow:This next one is a bit more involved in that we account for missing values in merged cells. We use the
.cols_merge()twice to merge together the: (1)trqandtrq_rpmcolumns, and (2)mpg_c&mpg_hcolumns. Given the presence of missing values we can use patterns that include<</>>to create conditional text spans, excluding results where any of the merged columns have missing values.Note that we formatted the floating point values to be displayed as integers and that the formatted values are present in the merged outputs. The ability to handle formatting separately from the merging (along with the NA handling) makes producing highly customized cell outputs really easy to iterate on.
This PR partially addresses #171. The remaining
.cols_merge*()methods will be added in a later PR.