Implement TiDB database monitoring #20826

Open
wants to merge 3 commits into
base: master
from

Conversation

@takaidohigasi takaidohigasi commented Jul 23, 2025

What does this PR do?

Extends part of the MySQL Database Monitoring capability to TiDB.
#20811

Note: wait events and some other features do not work for TiDB.

Motivation

Database Monitoring is a very useful tool, and we plan to use it together with other metrics through an MCP server.
Having query baseline information in Datadog would be very helpful for us.

Review checklist (to be filled by reviewers)

  • Feature or bugfix MUST have appropriate tests (unit, integration, e2e)
  • Add the qa/skip-qa label if the PR doesn't need to be tested during QA.
  • If you need to backport this PR to another branch, you can add the backport/<branch-name> label to the PR and it will automatically open a backport PR once this one is merged

Discussion and Review Points

Overview

| Feature | MySQL Table | TiDB Table | Key Differences |
| --- | --- | --- | --- |
| Statement Stats | performance_schema.events_statements_summary_by_digest | information_schema.cluster_statements_summary | TiDB aggregates across all nodes; column names differ |
| Current Activity | performance_schema.events_statements_current | information_schema.cluster_processlist | TiDB has no wait event information, so wait_event is always N/A for TiDB |
| Execution Plans | Real-time EXPLAIN | Pre-collected in the PLAN column (see information_schema.cluster_statements_summary, PLAN) | TiDB stores plans with statements |
| Variables | performance_schema.global_variables | SHOW VARIABLES only | TiDB has no performance_schema |

metric collection adjustments

  • Skips incompatible metrics for TiDB:
    * InnoDB metrics - TiDB uses TiKV storage engine, no information_schema.innodb_* tables
    * MySQL replication metrics - TiDB uses Raft consensus, no SHOW SLAVE STATUS
    * MyISAM key cache metrics - TiDB doesn't support MyISAM, no Key_* status variables
    * Binary log metrics - TiDB has different binlog implementation
  • Performance schema check returns False immediately for TiDB (TiDB has no performance_schema database)
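
The skip logic described above could look roughly like the following. This is a minimal sketch, not the code in this PR: the group names and the helpers `enabled_metric_groups` and `performance_schema_enabled` are hypothetical and only illustrate the idea of filtering out metric groups that TiDB cannot serve.

```python
# Hypothetical group names; the real check may organize its metrics differently.
TIDB_UNSUPPORTED_GROUPS = frozenset(
    {"innodb", "replication", "myisam_key_cache", "binlog"}
)


def enabled_metric_groups(requested, is_tidb):
    """Return the requested metric groups, dropping those TiDB cannot serve."""
    if not is_tidb:
        return list(requested)
    return [group for group in requested if group not in TIDB_UNSUPPORTED_GROUPS]


def performance_schema_enabled(is_tidb, query_mysql_setting):
    """Return False immediately for TiDB, which has no performance_schema database.

    query_mysql_setting is a hypothetical callable that asks a MySQL server
    whether performance_schema is enabled; it is never called for TiDB.
    """
    if is_tidb:
        return False
    return bool(query_mysql_setting())
```

Keeping the unsupported groups in one set makes it easy to see, and to review, exactly which collectors are disabled for TiDB.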

compatibility with existing MySQL

  • explain (EventType: dbm-samples)
    • To show the explain result in the GUI without errors, we output both a MySQL-compatible plan and the TiDB-specific plan.
    • MySQL outputs explain in the format shown below, so we create normalized_plan from the TiDB explain plan (_convert_tidb_plan_to_mysql_format) in plan.definition. The current Datadog server implementation expects access_type to be a MySQL-compatible value (e.g. const, ref, range, index, ... rather than Point_Get or Batch_Point_Get), so we convert access_type for now.
    • We also store the TiDB execution plan in query_block.tidb_execution_tree and query_block.tidb_original_text_plan, which can be checked in the raw JSON view in the GUI.
MySQL-format explain output:
{
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "0.00"
    },
    "table": {
      "table_name": "items",
      "access_type": "const",
      "rows_examined_per_scan": 1,
      "rows_produced_per_join": 1,
      "key": "PRIMARY(id)",
      "used_key_parts": [
        "PRIMARY(id)"
      ],
      "operator_info": "table:items, clustered index:PRIMARY(id)",
      "execution_info": "time:551.7µs, loops:2, Get:{num_rpc:1, total_time:508.9µs}, time_detail: {total_process_time: 33.7µs, total_wait_time: 38.2µs, total_kv_read_wall_time: 74.5µs, tikv_wall_time: 99µs}, scan_detail: {total_process_keys: 1, total_process_keys_size: 227, total_keys: 1, get_snapshot_time: 5.9µs, rocksdb: {block: {cache_hit_count: 8}}}",
      "operator_id": "Point_Get_1"
    },
    "tidb_execution_tree": {
      "id": "Point_Get_1",
      "taskType": "root",
      "estRows": "1",
      "operatorInfo": "table:items, clustered index:PRIMARY(id)",
      "actRows": "1",
      "executionInfo": "time:551.7µs, loops:2, Get:{num_rpc:1, total_time:508.9µs}, time_detail: {total_process_time: 33.7µs, total_wait_time: 38.2µs, total_kv_read_wall_time: 74.5µs, tikv_wall_time: 99µs}, scan_detail: {total_process_keys: 1, total_process_keys_size: 227, total_keys: 1, get_snapshot_time: 5.9µs, rocksdb: {block: {cache_hit_count: 8}}}"
    },
    "tidb_original_text_plan": "\tid         \ttask\testRows\toperator info                           \tactRows\texecution info                                                                                                                                                                                                                                                                                                                                   \tmemory\tdisk\n\tPoint_Get_1\troot\t1      \ttable:items, clustered index:PRIMARY(id)\t1      \ttime:551.7µs, loops:2, Get:{num_rpc:1, total_time:508.9µs}, time_detail: {total_process_time: 33.7µs, total_wait_time: 38.2µs, total_kv_read_wall_time: 74.5µs, tikv_wall_time: 99µs}, scan_detail: {total_process_keys: 1, total_process_keys_size: 227, total_keys: 1, get_snapshot_time: 5.9µs, rocksdb: {block: {cache_hit_count: 8}}}\tN/A   \tN/A"
  }
}

TiDB also supports EXPLAIN FORMAT=tidb_json, but it requires the original query and costs more, so I use the plan information from the statement summary. Note that this plan is not real-time; it is the plan detected during the summary period (30 minutes by default).

https://docs.pingcap.com/tidb/stable/sql-statement-explain/
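
For illustration, the tab-separated text in tidb_original_text_plan can be turned into per-operator dictionaries along these lines. This is a hedged sketch rather than the parser in this PR: `parse_text_plan` and `HEADER_TO_KEY` are hypothetical names, the key names are copied from the tidb_execution_tree sample above, and the sketch assumes that cells never contain tab characters. It returns a flat list and does not rebuild operator nesting.

```python
# Maps the header cells of the text plan to the keys used in the sample
# tidb_execution_tree above. "memory" and "disk" are kept as-is.
HEADER_TO_KEY = {
    "id": "id",
    "task": "taskType",
    "estRows": "estRows",
    "operator info": "operatorInfo",
    "actRows": "actRows",
    "execution info": "executionInfo",
    "memory": "memory",
    "disk": "disk",
}


def _cells(line):
    """Split one plan row on tabs and trim the padding around each cell."""
    parts = line.split("\t")
    # Every row starts with a tab, which yields an empty first element.
    if parts and parts[0] == "":
        parts = parts[1:]
    return [part.strip() for part in parts]


def parse_text_plan(plan_text):
    """Return one dict per operator row; an empty plan gives an empty list."""
    lines = [line for line in plan_text.splitlines() if line.strip()]
    if not lines:
        return []
    header = [HEADER_TO_KEY.get(name, name) for name in _cells(lines[0])]
    return [dict(zip(header, _cells(line))) for line in lines[1:]]
```

Unknown header names are passed through unchanged, so a plan with extra columns still parses instead of failing.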

@takaidohigasi
Author

Run python ddev/src/ddev/utils/scripts/check_pr.py changelog --diff-file /tmp/diff --pr-file "$GITHUB_EVENT_PATH"  --repo "core"
Package "mysql" has changes that require a changelog. Please run `ddev release changelog new` to add it.
Error: Package "mysql" has changes that require a changelog. Please run `ddev release changelog new` to add it.
Error: Process completed with exit code 1.

@takaidohigasi takaidohigasi changed the title Implement tidb database monitoring Implement TiDB database monitoring Jul 23, 2025

⚠️ Major version bump
The changelog type changed or removed was used in this Pull Request, so the next release will bump major version. Please make sure this is a breaking change, or use the fixed or added type instead.

@takaidohigasi takaidohigasi force-pushed the implement-tidb-database-monitoring branch 3 times, most recently from 04ae784 to 27e20c0 Compare July 23, 2025 08:11

codecov bot commented Jul 23, 2025

Codecov Report

❌ Patch coverage is 81.68215% with 355 lines in your changes missing coverage. Please review.
✅ Project coverage is 90.04%. Comparing base (4840571) to head (dadcf97).
⚠️ Report is 43 commits behind head on master.

Additional details and impacted files
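| Flag | Coverage Δ |
| --- | --- |
| activemq | ? |
| cassandra | ? |
| confluent_platform | ? |
| hive | ? |
| hivemq | ? |
| hudi | ? |
| ignite | ? |
| jboss_wildfly | ? |
| kafka | ? |
| mysql | 87.12% <81.68%> (-2.25%) ⬇️ |
| presto | ? |
| solr | ? |
| tomcat | ? |
| weblogic | ? |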

@takaidohigasi takaidohigasi force-pushed the implement-tidb-database-monitoring branch 2 times, most recently from ec97801 to 1a9eab2 Compare July 23, 2025 10:01
@takaidohigasi takaidohigasi marked this pull request as ready for review July 23, 2025 10:01
@takaidohigasi takaidohigasi requested review from a team as code owners July 23, 2025 10:01
@takaidohigasi
Author

Please tell me if it's better not to divide the files for TiDB.

@takaidohigasi takaidohigasi marked this pull request as draft July 24, 2025 04:51
@takaidohigasi
Author

Sorry, I want to add some functionality, so I changed the state back to draft for now.

@takaidohigasi

This comment was marked as resolved.

@takaidohigasi takaidohigasi force-pushed the implement-tidb-database-monitoring branch from dcfe488 to a8c9bbd Compare July 25, 2025 05:19
@takaidohigasi takaidohigasi marked this pull request as ready for review July 25, 2025 05:51
Contributor

@iadjivon iadjivon left a comment

Hi there, thanks for this PR. I have added some suggestions here. Let us know if you have any questions.

mysql/README.md Outdated
@@ -68,6 +68,24 @@ mysql> GRANT PROCESS ON *.* TO 'datadog'@'%';
Query OK, 0 rows affected (0.00 sec)
```

##### TiDB-specific setup

For TiDB databases, the user setup is similar but with some differences:
Contributor

Suggested change
For TiDB databases, the user setup is similar but with some differences:
For TiDB databases, the user setup is similar but with some differences:

Can you clarify what the set up is similar to? Is it to other Databases?

Author

I'll add that it is similar to other databases like MySQL, MariaDB, and so on.

Author

clarified in
48bec2a

mysql/README.md Outdated
@@ -141,6 +159,27 @@ For a full list of available configuration options, see the [sample `mysql.d/con

To collect `extra_performance_metrics`, your MySQL server must have `performance_schema` enabled - otherwise set `extra_performance_metrics` to `false`. For more information on `performance_schema`, see [MySQL Performance Schema Quick Start][9].

##### TiDB Configuration
Contributor

Suggested change
##### TiDB Configuration
##### TiDB configuration

Author

fixed in
48bec2a

mysql/README.md Outdated
@@ -551,6 +598,24 @@ The check does not collect all metrics by default. Set the following boolean con
| ---------------------- | ----------- |
| mysql.info.schema.size | GAUGE |

#### TiDB limitations

When using this integration with TiDB, be aware of the following limitations:
Contributor

Suggested change
When using this integration with TiDB, be aware of the following limitations:
When using this integration with TiDB, be aware of the following limitations:

Which integration is this referencing? If it is the InnoDB integration, would this change work:

When using the InnoDB integration with TiDB, be aware of the following limitations:

Author

This might not fully satisfy your request, but I fixed part of it.

48bec2a

In fact, I don't refer to any specific integration...
Thanks for the review.

mysql/README.md Outdated
For Database Monitoring features:
- Query samples and explain plans are collected from `cluster_statements_summary` with some approximations
- Wait events are not available as TiDB doesn't track them in the same way as MySQL
- Some query metrics are approximated (e.g., rows examined is estimated from keys processed)
Contributor

Suggested change
- Some query metrics are approximated (e.g., rows examined is estimated from keys processed)
- Some query metrics are approximated (for example, rows examined is estimated from keys processed)

Author

thanks, applied in
48bec2a

mysql/README.md Outdated
@@ -571,6 +636,15 @@ See [service_checks.json][22] for a list of service checks provided by this inte
- [Database user lacks privileges][29]
- [How to collect metrics with a SQL Stored Procedure?][30]

### TiDB-specific troubleshooting

**Missing metrics**: If you see warnings about missing InnoDB or performance_schema metrics when monitoring TiDB:
Contributor

Suggested change
**Missing metrics**: If you see warnings about missing InnoDB or performance_schema metrics when monitoring TiDB:
**Missing metrics**: If you see warnings about missing InnoDB or `performance_schema` metrics when monitoring TiDB:

Author

@takaidohigasi
Author

please tell me if it's better not to divide files for tidb.
→ I merged the files.

self._log.debug("Failed to parse TiDB plan to JSON: %s", e)
return json.dumps({"raw_plan": plan_text, "parse_error": str(e)})

def _convert_tidb_plan_to_mysql_format(self, tidb_plan_json):
Author

I will fix parts of this function. It turned out that it does not cover some cases.

@takaidohigasi takaidohigasi requested a review from iadjivon July 28, 2025 07:30
@takaidohigasi takaidohigasi force-pushed the implement-tidb-database-monitoring branch 3 times, most recently from bd3ecca to 5ea69a1 Compare July 30, 2025 02:28
- Map all TiDB operator types to MySQL-compatible access types
- Add support for additional TiDB operators:
  - Batch_Point_Get → const (multiple point lookups)
  - TableRowIDScan → ref (rowid lookup)
  - TableScan/IndexScan → ALL/index (generic scans)
  - IndexLookUp → ref (non-unique index lookup)
- Improve mapping accuracy for better MySQL compatibility

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
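
The operator mapping listed in this commit message can be sketched as a simple lookup. This is an illustration under stated assumptions, not the body of _convert_tidb_plan_to_mysql_format: `OPERATOR_TO_ACCESS_TYPE` and `access_type_for` are hypothetical names, the table only contains the operators named in this PR, and returning None for anything else is a choice made for the sketch.

```python
import re

# Operators and target access types as listed in this PR.
OPERATOR_TO_ACCESS_TYPE = {
    "Point_Get": "const",
    "Batch_Point_Get": "const",
    "TableRowIDScan": "ref",
    "TableScan": "ALL",
    "IndexScan": "index",
    "IndexLookUp": "ref",
}

# Operator ids carry a numeric suffix, e.g. "Point_Get_1".
_NUMERIC_SUFFIX = re.compile(r"_\d+$")


def access_type_for(operator_id, default=None):
    """Map a TiDB operator id to a MySQL-style access_type.

    Operators outside the table return `default` (an assumption of this
    sketch; the PR may handle unknown operators differently).
    """
    operator_name = _NUMERIC_SUFFIX.sub("", operator_id.strip())
    return OPERATOR_TO_ACCESS_TYPE.get(operator_name, default)
```

For example, `access_type_for("Point_Get_1")` gives `"const"`, which matches the access_type shown for operator_id Point_Get_1 in the sample explain output in the PR description.
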
@takaidohigasi takaidohigasi force-pushed the implement-tidb-database-monitoring branch from e72fbe8 to dadcf97 Compare July 30, 2025 08:27
@takaidohigasi
Author

@iadjivon I'm ready again. Would you please review the doc again?

2 participants