Skip to content

fix: strip all punctuation characters #1482

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

im-adithya
Copy link

Fixes TryGhost/Ghost#22248

Description

Adds other punctuation characters as mentioned in the issue above to the regex.

Copy link

coderabbitai bot commented Mar 16, 2025

Walkthrough

The pull request introduces a new helper function, createSlug, which encapsulates the slugification logic in the slugify function. This function processes an inputString and utilizes a symbolRegex to remove specified symbols before encoding the string. The main exported function has been updated to use this new helper function, and the default ghostVersion parameter has been changed from '4.0' to '6.0'. The logic for generating slugs has been modified to handle different versions: for versions less than 6.x, a specific regex is used, while for versions 6.x and above, an expanded regex includes additional punctuation and typographic symbols. The inline slugification logic has been replaced with calls to the createSlug function, enhancing the structure of the code. Test cases have also been updated to reflect these changes.

Assessment against linked issues

Objective Addressed Explanation
Strip additional punctuation characters from auto-generated heading ID attributes (#22248)

Tip

⚡🧪 Multi-step agentic review comment chat (experimental)
  • We're introducing multi-step agentic chat in review comments. This experimental feature enhances review discussions with the CodeRabbit agentic chat by enabling advanced interactions, including the ability to create pull requests directly from comments.
    - To enable this feature, set early_access to true under in the settings.

📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3aa5209 and 5653621.

📒 Files selected for processing (2)
  • packages/kg-utils/lib/slugify.js (2 hunks)
  • packages/kg-utils/test/slugify.test.js (1 hunks)
🔇 Additional comments (6)
packages/kg-utils/lib/slugify.js (3)

3-17: Great refactoring with the new createSlug helper function

Extracting the common slugification logic into a dedicated helper function with proper JSDoc documentation improves code maintainability and readability. This eliminates duplication and makes the code more modular.


40-48: Good implementation of version-based behavior

The conditional logic based on version is a solid approach to maintain backward compatibility while enhancing functionality for newer versions. This addresses the concerns raised in the PR comments about potential breaking changes.

The expanded regex for versions ≥6.x now properly includes additional punctuation and typographic symbols, fulfilling the PR objective to strip all punctuation characters.


19-19:

❓ Verification inconclusive

Consider the impact of changing the default ghostVersion

Changing the default ghostVersion from '4.0' to '6.0' means that all calls without an explicit version will now use the expanded regex. While this aligns with the PR's goal, it could potentially be a breaking change for existing code that relies on the previous default behavior.


🏁 Script executed:

#!/bin/bash
# Check for usages of slugify without an explicit ghostVersion parameter
rg -A 1 -B 1 "slugify\(" --glob "!**/test/**" --glob "!**/slugify.js"

Length of output: 2899


Review Note: Verify Impact of Updated Default ghostVersion Behavior

The change in default from '4.0' to '6.0' in packages/kg-utils/lib/slugify.js means that any call to slugify without explicitly specifying a ghostVersion will now use the expanded regex for version 6.0. Our repository-wide search indicates several usages (e.g., in packages/kg-mobiledoc-html-renderer, packages/kg-lexical-html-renderer, packages/kg-markdown-html-renderer, and the various header renderers) where the function is invoked without an explicit version. This could potentially lead to unexpected behavior if those parts of the code are relying on the old 4.0 implementation.

  • Affected Files/Usages:
    • packages/kg-mobiledoc-html-renderer/lib/MobiledocHtmlRenderer.js – invoking slugify(text, this.options)
    • packages/kg-lexical-html-renderer/lib/utils/generate-id.ts – calling slugify(text, options)
    • packages/kg-markdown-html-renderer/lib/markdown-html-renderer.js – using slugify(title, usedHeaders)
    • packages/kg-default-nodes/lib/nodes/header/renderers/v1 and v2 – invoking slugify with header data
    • Other similar cases where no explicit ghostVersion is provided

Please verify that this change is intentional for all these contexts. If certain usages require the legacy regex logic tied to version '4.0', consider either updating those calls to explicitly pass the needed ghostVersion or adding appropriate tests/deprecation warnings to manage potential breaking changes.

packages/kg-utils/test/slugify.test.js (3)

53-53: Improved test suite description

Clarifying the test suite description from ">=4.x" to ">=4.x <6.x" makes the version range explicit and aligns with the implementation changes. This helps developers understand which functionality is being tested.


55-55: Good practice to use explicit version parameters

Adding explicit ghostVersion: '4.0' parameters to test calls ensures that tests target specific version behavior rather than relying on default values. This makes the tests more resilient to future changes in the default version.

Also applies to: 60-60, 65-65, 72-72


77-82: Comprehensive test for new functionality

The new test suite for versions ≥6.x properly verifies that the expanded regex correctly handles additional punctuation and typographic symbols. The test case is comprehensive, including various types of quotes, dashes, and special characters.

✨ Finishing Touches
  • 📝 Generate Docstrings

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
packages/kg-utils/lib/slugify.js (1)

28-28: Consider using a more maintainable approach for character definitions

For improved readability and maintainability, consider extracting the regex pattern to a named constant. This would make future additions or modifications clearer.

 module.exports = function (inputString = '', {ghostVersion = '4.0', type = 'mobiledoc'} = {}) {
     const version = semver.coerce(ghostVersion);
+    const PUNCTUATION_REGEX = /[\][!"#$%&'()*+,./:;<=>?@\\^_{|}~''""`¡¿–—•]/g;
 
     if (typeof inputString !== 'string' || (inputString || '').trim() === '') {
         return '';
     }
     // ...
     return encodeURIComponent(inputString.trim()
         .toLowerCase()
-        .replace(/[\][!"#$%&'()*+,./:;<=>?@\\^_{|}~''""`¡¿–—•]/g, '')
+        .replace(PUNCTUATION_REGEX, '')
         .replace(/\s+/g, '-')
         .replace(/^-|-{2,}|-$/g, '')
     );
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1d4ecdd and 3aa5209.

📒 Files selected for processing (1)
  • packages/kg-utils/lib/slugify.js (1 hunks)
🔇 Additional comments (2)
packages/kg-utils/lib/slugify.js (2)

28-28: Approve the expanded punctuation stripping

The change correctly extends the regex pattern to include additional punctuation and special characters (curly quotes, smart double quotes, backticks, inverted punctuation marks, dashes, and bullets) as specified in the PR objectives. This enhancement will create cleaner, more consistent slugs.


28-28:

❓ Verification inconclusive

Consider adding unit tests for new punctuation characters

While the implementation is correct, it would be beneficial to add unit tests that verify the new punctuation characters are properly stripped from generated slugs.


🏁 Script executed:

#!/bin/bash
# Check if there are existing tests for the slugify function that would cover this change
echo "Searching for existing test files for slugify..."
fd -t f --glob "*test*" | xargs grep -l "slugify" || echo "No tests found with 'slugify'"

Length of output: 195


Action: Ensure Comprehensive Test Coverage for New Punctuation in Slugify

The implementation correctly strips the new punctuation characters as per the updated regex. I identified that there is an existing test file at packages/kg-utils/test/slugify.test.js. Please review this file to confirm it includes specific test cases that validate the removal of each of the newly added punctuation characters. If coverage for these punctuation cases is missing or incomplete, kindly add the necessary tests.

@im-adithya
Copy link
Author

@kevinansfield can you please have a look :D

@kevinansfield
Copy link
Member

Hey @im-adithya 👋 Thanks for taking a look at this!

Unfortunately we can't merge as-is because the change in slugify behaviour is a breaking change as mentioned here. We can work around that by putting the updated regex in a new version conditional like we currently have for <4.0 and >=4.0. The changes will also need tests 🙂

@im-adithya
Copy link
Author

Hey Kevin, sorry I missed that comment, I added the version conditional + tests, should be good now!

(Also, can you please have a look at my other PRs as well? Thanks!)

);
}

module.exports = function (inputString = '', {ghostVersion = '6.0', type = 'mobiledoc'} = {}) {
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a heads up that I changed the default to 6.0

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Punctuation characters aren't stripped from auto-generated heading ID attributes
2 participants