Skip to content

Conversation

@rich-iannone
Copy link
Member

This PR adds the .cols_merge() method. It allows for selective concatenation of cell values to a single cell, automatically hiding other columns involved in the merging process (could be set to not hide, however). I'll provide two examples here of how this new method works in practice.

In the first example, we merge the open & close columns together, and, the low & high columns (putting an em dash between both). The close and high columns are hidden, and we take the opportunity to rename open and low:

from great_tables import GT
from great_tables.data import sp500
import polars as pl

sp500_mini = (
    pl.from_pandas(sp500)
    .slice(49, 6)
    .select("open", "close", "low", "high")
)

(
    GT(sp500_mini)
    .fmt_number(
        columns=["open", "close", "low", "high"],
        decimals=2,
        use_seps=False
    )
    .cols_merge(columns=["open", "close"], pattern="{1}—{2}")
    .cols_merge(columns=["low", "high"], pattern="{1}—{2}")
    .cols_label(open="open/close", low="low/high")
)
image

This next one is a bit more involved in that we account for missing values in merged cells. We use the .cols_merge() twice to merge together the: (1) trq and trq_rpm columns, and (2) mpg_c & mpg_h columns. Given the presence of missing values we can use patterns that include <</>> to create conditional text spans, excluding results where any of the merged columns have missing values.

from great_tables import GT
from great_tables.data import gtcars
import polars as pl
import polars.selectors as cs

gtcars_pl = (
    pl.from_pandas(gtcars)
    .filter(pl.col("year") == 2017)
    .select(["mfr", "model", "trq", "trq_rpm", "mpg_c", "mpg_h"])
)

(
    GT(gtcars_pl)
    .fmt_integer(columns=[cs.starts_with("trq"), cs.starts_with("mpg")])
    .cols_merge(columns=["trq", "trq_rpm"], pattern="{1}<< ({2} rpm)>>")
    .cols_merge(columns=["mpg_c", "mpg_h"], pattern="<<{1} city<</{2} hwy>>>>")
    .cols_label(mfr="Manufacturer", model="Car Model", trq="Torque", mpg_c="MPG")
)
image

Note that we formatted the floating point values to be displayed as integers and that the formatted values are present in the merged outputs. The ability to handle formatting separately from the merging (along with the NA handling) makes producing highly customized cell outputs really easy to iterate on.

This PR partially addresses #171. The remaining .cols_merge*() methods will be added in a later PR.

@codecov
Copy link

codecov bot commented Oct 8, 2025

Codecov Report

❌ Patch coverage is 94.49541% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 91.65%. Comparing base (85d25f6) to head (6c512e2).

Files with missing lines Patch % Lines
great_tables/_spanners.py 91.30% 2 Missing ⚠️
great_tables/_utils.py 95.00% 2 Missing ⚠️
great_tables/gt.py 94.73% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #780      +/-   ##
==========================================
+ Coverage   91.58%   91.65%   +0.06%     
==========================================
  Files          47       47              
  Lines        5752     5857     +105     
==========================================
+ Hits         5268     5368     +100     
- Misses        484      489       +5     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@github-actions github-actions bot temporarily deployed to pr-780 October 8, 2025 20:41 Destroyed
Copy link
Collaborator

@machow machow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't read any of the code in utils yet, but the main piece I would vote to refactor a bit is the big method on the GT class, and avoiding full GT renders for the unit tests (see comments for suggestions).

new_body.render_formats(self._tbl_data, self._substitutions, context)
return self._replace(_body=new_body)

def _perform_col_merge(self) -> Self:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mind moving this out of the GT class? Please do not put large amounts of logic onto this class (see other methods on it for smaller bits of logic that seem okay to add).


# For Pandas and Polars, _set_cell() modifies in place and returns None but
# for PyArrow, _set_cell() returns a new table
if result is not None:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this custom handling of backends happen in other places outside of _tbl_data.py?


# With each column reference in the pattern, check that it is valid
for col_ref in pattern_cols:
# The pattern syntax uses 1-based indexing so adjust here
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1-based indexing seems like a bad idea (but maybe it happens in other places in Python?)

)

# Process each row (according to the `rows=` parameter in `cols_merge()`)
for row_idx in rows:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there some way for you to extract out big chunks of the logic here, so that rather than...

  • loop over all merges
    • process columns
    • process pattern
    • loop over all rows
      • call _process_col_merge_pattern

The merge logic can be run on just the values that might be in a given row. Something like...

  • ColMergeInfo processes pattern on init
  • ColMergeInfo has a method to that mostly runs _process_col_merge_pattern
  • Some bigger method that handles the merge stage of GT rendering calls this method
    • (It will likely contain much of the code in this big _perform_col_merge method)
  • This method gets used in tests, rather than rendering full tables

Copy link
Collaborator

@machow machow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't read any of the code in utils yet, but the main piece I would vote to refactor a bit is the big method on the GT class, and avoiding full GT renders for the unit tests (see comments for suggestions).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants