Implement TiDB database monitoring #20826

Open: wants to merge 3 commits into base: master
Changes from 1 commit
76 changes: 75 additions & 1 deletion mysql/README.md
@@ -8,7 +8,7 @@ The MySQL integration tracks the performance of your MySQL instances. It collect

Enable [Database Monitoring][32] (DBM) for enhanced insights into query performance and database health. In addition to the standard integration, Datadog DBM provides query-level metrics, live and historical query snapshots, wait event analysis, database load, and query explain plans.

MySQL version 5.6, 5.7, 8.0, and MariaDB versions 10.5, 10.6, 10.11 and 11.1 are supported.
MySQL version 5.6, 5.7, 8.0, MariaDB versions 10.5, 10.6, 10.11 and 11.1, and TiDB version 8.1+ are supported.

## Setup

@@ -68,6 +68,24 @@ mysql> GRANT PROCESS ON *.* TO 'datadog'@'%';
Query OK, 0 rows affected (0.00 sec)
```

##### TiDB-specific setup

For TiDB databases, the user setup is similar but with some differences:
Contributor

Suggested change
For TiDB databases, the user setup is similar but with some differences:
For TiDB databases, the user setup is similar but with some differences:

Can you clarify what the setup is similar to? Is it similar to other databases?

Author

I'll add "similar to other databases like MySQL, MariaDB, and so on."

Author

Clarified in 48bec2a.


- TiDB does not have `performance_schema`, so skip the performance_schema grant
- TiDB does not support the `REPLICATION CLIENT` privilege, but this is not needed as TiDB uses different replication mechanisms
- The `innodb_index_stats` table is not available in TiDB

For TiDB, create the user with these commands:

```shell
mysql> CREATE USER 'datadog'@'%' IDENTIFIED BY '<UNIQUEPASSWORD>';
Query OK, 0 rows affected (0.00 sec)

mysql> GRANT PROCESS ON *.* TO 'datadog'@'%';
Query OK, 0 rows affected (0.00 sec)
```

Verify the replication client. Replace `<UNIQUEPASSWORD>` with the password you created above:

```shell
@@ -141,6 +159,27 @@ For a full list of available configuration options, see the [sample `mysql.d/con

To collect `extra_performance_metrics`, your MySQL server must have `performance_schema` enabled - otherwise set `extra_performance_metrics` to `false`. For more information on `performance_schema`, see [MySQL Performance Schema Quick Start][9].

##### TiDB Configuration
Contributor

Suggested change
##### TiDB Configuration
##### TiDB configuration

Author

Fixed in 48bec2a.


For TiDB instances, some configuration options should be adjusted:

```yaml
init_config:

instances:
  - host: 127.0.0.1
    username: datadog
    password: "<YOUR_CHOSEN_PASSWORD>"
    port: 4000 # Default TiDB port
    options:
      replication: false # TiDB uses different replication mechanisms
      galera_cluster: false
      extra_status_metrics: true
      extra_innodb_metrics: false # TiDB doesn't have InnoDB
      disable_innodb_metrics: true # Disable InnoDB metrics for TiDB
      extra_performance_metrics: false # TiDB doesn't have performance_schema
```
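
The option values above can also be checked programmatically before deploying a configuration. The following is a minimal sketch, not part of the integration: the helper `find_tidb_option_mismatches` and the `TIDB_RECOMMENDED_OPTIONS` table are hypothetical names, and the recommended values are copied from the YAML example above. The sketch does not know the check's defaults, so an unset option is reported with an actual value of `None`.

```python
# Values recommended for TiDB, copied from the YAML example above.
# This table is illustrative; it is not read by the Datadog Agent.
TIDB_RECOMMENDED_OPTIONS = {
    "replication": False,
    "galera_cluster": False,
    "extra_innodb_metrics": False,
    "disable_innodb_metrics": True,
    "extra_performance_metrics": False,
}


def find_tidb_option_mismatches(options):
    """Return {option: (actual, recommended)} for every option that differs.

    `options` is the parsed `options` mapping of one instance.
    Options that are not set at all show up with an actual value of None.
    """
    mismatches = {}
    for name, recommended in TIDB_RECOMMENDED_OPTIONS.items():
        actual = options.get(name)
        if actual != recommended:
            mismatches[name] = (actual, recommended)
    return mismatches


if __name__ == "__main__":
    # An instance that still has MySQL-style replication collection enabled.
    print(find_tidb_option_mismatches({"replication": True, "disable_innodb_metrics": True}))
```

Running the check against a parsed `conf.yaml` instance lists each option that is worth reviewing, without changing the file.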

**Note**: The `datadog` user should be set up in the MySQL integration configuration as `host: 127.0.0.1` instead of `localhost`. Alternatively, you may also use `sock`.

[Restart the Agent][10] to start sending MySQL metrics to Datadog.
@@ -251,6 +290,14 @@ LABEL "com.datadoghq.ad.init_configs"='[{}]'
LABEL "com.datadoghq.ad.instances"='[{"server": "%%host%%", "username": "datadog","password": "<UNIQUEPASSWORD>"}]'
```

For TiDB instances, add the appropriate configuration options:

```yaml
LABEL "com.datadoghq.ad.check_names"='["mysql"]'
LABEL "com.datadoghq.ad.init_configs"='[{}]'
LABEL "com.datadoghq.ad.instances"='[{"server": "%%host%%", "username": "datadog", "password": "<UNIQUEPASSWORD>", "port": 4000, "options": {"disable_innodb_metrics": true, "extra_performance_metrics": false}}]'
```
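
Hand-writing JSON inside a Docker label is error-prone, so the label value can be generated instead. This is a minimal sketch that assumes the same keys as the label above; `build_tidb_instances_label` is a hypothetical helper name and is not part of the Agent. It does not escape single quotes, so a password containing `'` would need extra handling.

```python
import json


def build_tidb_instances_label(password="<UNIQUEPASSWORD>", port=4000):
    """Build the `com.datadoghq.ad.instances` LABEL line for a TiDB container."""
    instances = [
        {
            "server": "%%host%%",  # Autodiscovery template variable, kept verbatim
            "username": "datadog",
            "password": password,
            "port": port,
            "options": {
                "disable_innodb_metrics": True,
                "extra_performance_metrics": False,
            },
        }
    ]
    # json.dumps emits lowercase true/false, which is what the label expects.
    return "LABEL \"com.datadoghq.ad.instances\"='{}'".format(json.dumps(instances))


if __name__ == "__main__":
    print(build_tidb_instances_label())
```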

See [Autodiscovery template variables][12] for details on using `<UNIQUEPASSWORD>` as an environment variable instead of a label.

#### Log collection
@@ -551,6 +598,24 @@ The check does not collect all metrics by default. Set the following boolean con
| ---------------------- | ----------- |
| mysql.info.schema.size | GAUGE |

#### TiDB limitations

When using this integration with TiDB, be aware of the following limitations:
Contributor

Suggested change
When using this integration with TiDB, be aware of the following limitations:
When using this integration with TiDB, be aware of the following limitations:

Which integration is this referencing? If it is the InnoDB integration, would this change work:

When using the InnoDB integration with TiDB, be aware of the following limitations:

Author

This might not fully satisfy your request, but I fixed part of it in 48bec2a.

In fact, I don't refer to a specific integration here...
Thanks for the review.


- **InnoDB metrics**: TiDB doesn't use the InnoDB storage engine, so all InnoDB-related metrics are unavailable
- **Performance Schema**: TiDB doesn't have MySQL's `performance_schema`, so performance metrics requiring it are unavailable
- **Replication metrics**: TiDB uses a different replication mechanism (Raft consensus), so traditional MySQL replication metrics don't apply
- **MyISAM metrics**: TiDB doesn't support MyISAM, so key cache metrics are unavailable
- **Binary log metrics**: TiDB has a different binlog implementation, so traditional MySQL binlog metrics may not be accurate
- **Statement metrics**: TiDB uses `information_schema.cluster_statements_summary` instead of `performance_schema.events_statements_summary_by_digest`
- **Activity monitoring**: TiDB uses `information_schema.cluster_processlist` instead of `performance_schema.events_statements_current`

For Database Monitoring features:
- Query samples and explain plans are collected from `cluster_statements_summary` with some approximations
- Wait events are not available as TiDB doesn't track them in the same way as MySQL
- Some query metrics are approximated (e.g., rows examined is estimated from keys processed)
Contributor

Suggested change
- Some query metrics are approximated (e.g., rows examined is estimated from keys processed)
- Some query metrics are approximated (for example, rows examined is estimated from keys processed)

Author

Thanks, applied in 48bec2a.

- TiDB explain plans are retrieved from the `PLAN` column in `information_schema.cluster_statements_summary` table, which contains pre-collected execution plans in text format with embedded execution statistics
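
To make the "rows examined is estimated from keys processed" approximation concrete, here is a minimal sketch. It is an assumption about how such an estimate could be computed, not the formula used in this pull request: the parameter names `avg_processed_keys` and `exec_count` are illustrative stand-ins for a per-statement average of processed keys and an execution count.

```python
def estimate_rows_examined(avg_processed_keys, exec_count):
    """Approximate total rows examined for one statement digest.

    Both inputs are hypothetical summary values. A missing value yields 0
    so that a partially populated summary row does not raise.
    """
    if avg_processed_keys is None or exec_count is None:
        return 0
    # Average keys per execution times the number of executions.
    return int(round(avg_processed_keys * exec_count))


if __name__ == "__main__":
    print(estimate_rows_examined(12.5, 4))
```

Because keys processed counts storage-level keys rather than SQL rows, the result should be read as an order-of-magnitude indicator rather than an exact row count.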

### Events

The MySQL check does not include any events.
@@ -571,6 +636,15 @@ See [service_checks.json][22] for a list of service checks provided by this inte
- [Database user lacks privileges][29]
- [How to collect metrics with a SQL Stored Procedure?][30]

### TiDB-specific troubleshooting

**Missing metrics**: If you see warnings about missing InnoDB or performance_schema metrics when monitoring TiDB:
Contributor

Suggested change
**Missing metrics**: If you see warnings about missing InnoDB or performance_schema metrics when monitoring TiDB:
**Missing metrics**: If you see warnings about missing InnoDB or `performance_schema` metrics when monitoring TiDB:

- This is expected behavior. Set `disable_innodb_metrics: true` and `extra_performance_metrics: false` in your configuration.

**Connection issues**: TiDB typically runs on port 4000 instead of MySQL's default 3306. Make sure to specify the correct port in your configuration.

**High metric collection time**: The `CLUSTER_*` tables in TiDB aggregate data from all TiDB nodes, which can be slow in large clusters. Consider increasing the collection interval if needed.

## Further Reading

Additional helpful documentation, links, and articles:
1 change: 1 addition & 0 deletions mysql/changelog.d/20826.changed
@@ -0,0 +1 @@
Implement TiDB database monitoring
203 changes: 202 additions & 1 deletion mysql/datadog_checks/mysql/activity.py
@@ -7,7 +7,7 @@
import time
from contextlib import closing
from enum import Enum
from typing import Dict, List # noqa: F401
from typing import Dict, List, Tuple # noqa: F401

import pymysql

@@ -130,6 +130,37 @@
)
"""

# TiDB specific constants
TIDB_ACTIVITY_QUERY_LIMIT = 100

# TiDB specific activity query
TIDB_ACTIVITY_QUERY = """\
SELECT
    ID as processlist_id,
    USER as processlist_user,
    HOST as processlist_host,
    DB as processlist_db,
    COMMAND as processlist_command,
    STATE as processlist_state,
    INFO as sql_text,
    TIME as query_time,
    MEM as memory_usage,
    TxnStart as txn_start_time
FROM INFORMATION_SCHEMA.CLUSTER_PROCESSLIST
WHERE
    COMMAND != 'Sleep'
    AND INFO IS NOT NULL
    AND INFO != ''
    -- Exclude our own monitoring queries
    AND INFO NOT LIKE '%CLUSTER_PROCESSLIST%'
    AND INFO NOT LIKE '%datadog-agent%'
    -- Exclude other system queries
    AND INFO NOT LIKE '%INFORMATION_SCHEMA%'
    AND INFO NOT LIKE '%performance_schema%'
ORDER BY TIME DESC
LIMIT {}
""".format(TIDB_ACTIVITY_QUERY_LIMIT)
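
The effect of the `WHERE` clause in `TIDB_ACTIVITY_QUERY` is easier to reason about as a per-row predicate. The sketch below is an approximation for illustration only: `passes_activity_filter` and `EXCLUDED_SUBSTRINGS` are hypothetical names, and it assumes case-sensitive matching, whereas the real behavior of `LIKE` and `!=` depends on the column collation. It does show one consequence of the filter: any user query that mentions `INFORMATION_SCHEMA` or `performance_schema` is dropped together with the monitoring queries.

```python
# Substrings taken from the NOT LIKE conditions of TIDB_ACTIVITY_QUERY.
EXCLUDED_SUBSTRINGS = (
    "CLUSTER_PROCESSLIST",
    "datadog-agent",
    "INFORMATION_SCHEMA",
    "performance_schema",
)


def passes_activity_filter(row):
    """Approximate the WHERE clause for one CLUSTER_PROCESSLIST row dict."""
    command = row.get("COMMAND")
    info = row.get("INFO")
    # In SQL, NULL != 'Sleep' is not true, so a NULL command is filtered out.
    if command is None or command == "Sleep":
        return False
    # Covers both INFO IS NOT NULL and INFO != ''.
    if not info:
        return False
    return not any(fragment in info for fragment in EXCLUDED_SUBSTRINGS)


if __name__ == "__main__":
    print(passes_activity_filter({"COMMAND": "Query", "INFO": "SELECT * FROM orders"}))
```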


class MySQLVersion(Enum):
# 8.0
@@ -183,6 +214,12 @@ def run_job(self):
                'Waiting for events_waits_current availability to be determined by the check, skipping run.'
            )
        if self._check.events_wait_current_enabled is False:
            # Use TiDB-specific activity collection
            if self._check._get_is_tidb(self._db):
                self._log.debug("TiDB detected, using TiDB-specific activity collection")
                self._collect_tidb_activity()
                return

            azure_deployment_type = self._config.cloud_metadata.get("azure", {}).get("deployment_type")
            if azure_deployment_type != "flexible_server":
                self._check.record_warning(
@@ -201,6 +238,170 @@ def run_job(self):
        self._check_version()
        self._collect_activity()

    @tracked_method(agent_check_getter=agent_check_getter)
    def _collect_tidb_activity(self):
        # type: () -> None
        """Collect activity data from TiDB CLUSTER_PROCESSLIST"""
        tags = [t for t in self._tags if not t.startswith('dd.internal')]

        with closing(self._get_db_connection().cursor(CommenterDictCursor)) as cursor:
            rows = self._get_tidb_activity(cursor)
            rows = self._normalize_tidb_rows(rows)

            # Group rows by TiDB node instance
            rows_by_node = {}
            for row in rows:
                node_instance = row.get('processlist_host', 'unknown')
                if node_instance not in rows_by_node:
                    rows_by_node[node_instance] = []
                rows_by_node[node_instance].append(row)

            # Create and send separate events for each TiDB node
            for node_instance, node_rows in rows_by_node.items():
                event = self._create_tidb_activity_event(node_rows, tags, node_instance)
                payload = json.dumps(event, default=self._json_event_encoding)
                self._check.database_monitoring_query_activity(payload)
                self._check.histogram(
                    "dd.mysql.activity.collect_activity.payload_size",
                    len(payload),
                    tags=tags + ["tidb_node_instance:{}".format(node_instance)] + self._check._get_debug_tags(),
                )
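
The grouping step in `_collect_tidb_activity` can be sketched in isolation as follows. This is a simplified stand-in, not the code in this diff: `group_rows_by_node` is a hypothetical helper, and unlike the diff it also maps a `None` or empty host to `'unknown'`. One point worth checking in review: the diff groups on `processlist_host`, which the query selects from the `HOST` column. TiDB's documentation describes a separate `INSTANCE` column in `CLUSTER_PROCESSLIST` as the address of the TiDB node a row came from, so `HOST` may identify the client rather than the node.

```python
from collections import defaultdict


def group_rows_by_node(rows, key="processlist_host"):
    """Group activity rows by the value stored under `key`.

    Rows whose key is missing, None, or empty are grouped under 'unknown'.
    """
    grouped = defaultdict(list)
    for row in rows:
        grouped[row.get(key) or "unknown"].append(row)
    # Return a plain dict so callers do not create groups by accident.
    return dict(grouped)


if __name__ == "__main__":
    sample = [
        {"processlist_host": "10.0.0.1:4000", "sql_text": "SELECT 1"},
        {"processlist_host": "10.0.0.2:4000", "sql_text": "SELECT 2"},
        {"processlist_host": None, "sql_text": "SELECT 3"},
    ]
    for node, node_rows in group_rows_by_node(sample).items():
        print(node, len(node_rows))
```

A `defaultdict` removes the explicit membership test, and passing `key` as a parameter would make it straightforward to group on a different column if the node address turns out to live elsewhere.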

    @tracked_method(agent_check_getter=agent_check_getter, track_result_length=True)
    def _get_tidb_activity(self, cursor):
        # type: (pymysql.cursor) -> List[Dict[str]]
        """Execute TiDB activity query"""
        self._log.debug("Running TiDB activity query [%s]", TIDB_ACTIVITY_QUERY)
        cursor.execute(TIDB_ACTIVITY_QUERY)
        return cursor.fetchall()

    def _derive_tidb_wait_event(self, state):
        # type: (str) -> Tuple[str, str]
        """
        Derive wait event and wait event group from TiDB processlist state.
        Returns (wait_event, wait_event_group)
        """
        return 'N/A', 'N/A'

    def _normalize_tidb_rows(self, rows):
        # type: (List[Dict[str]]) -> List[Dict[str]]
        """Normalize TiDB activity rows to match expected format"""
        normalized_rows = []
        estimated_size = 0

        for row in rows:
            # Generate unique identifiers for TiDB
            thread_id = row.get('processlist_id', 0)

            # Derive wait event from state
            state = row.get('processlist_state', '')
            wait_event, wait_event_group = self._derive_tidb_wait_event(state)

            # Convert TiDB fields to match MySQL activity format
            normalized_row = {
                'thread_id': thread_id,
                'processlist_id': row.get('processlist_id'),
                'processlist_user': row.get('processlist_user'),
                'processlist_host': row.get('processlist_host'),
                'processlist_db': row.get('processlist_db'),
                'processlist_command': row.get('processlist_command'),
                'processlist_state': row.get('processlist_state'),
                'sql_text': row.get('sql_text'),
                'query_time': row.get('query_time', 0),
                'memory_usage': row.get('memory_usage', 0),
                'txn_start_time': row.get('txn_start_time'),
                # Derived wait events
                'wait_event': wait_event,
                'wait_event_type': wait_event_group,
            }

            # Add query truncation state
            if normalized_row['sql_text'] is not None:
                normalized_row['query_truncated'] = get_truncation_state(normalized_row['sql_text']).value

            # Obfuscate the query
            normalized_row = self._obfuscate_and_sanitize_row(normalized_row)

            estimated_size += self._get_estimated_row_size_bytes(normalized_row)
            if estimated_size > MySQLActivity.MAX_PAYLOAD_BYTES:
                return normalized_rows

            normalized_rows.append(normalized_row)

        return normalized_rows
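
The payload cap in `_normalize_tidb_rows` follows a simple pattern: accumulate an estimated size per row and stop as soon as the running total exceeds the budget. A minimal sketch of that pattern, under stated assumptions: `take_rows_within_budget` is a hypothetical helper, and the length of the JSON encoding stands in for `_get_estimated_row_size_bytes`, whose implementation is not shown in this diff.

```python
import json


def take_rows_within_budget(rows, max_bytes):
    """Return the leading rows whose cumulative estimated size fits max_bytes."""
    kept = []
    total = 0
    for row in rows:
        # Stand-in size estimate: length of the JSON-encoded row.
        total += len(json.dumps(row, default=str))
        if total > max_bytes:
            # The row that crosses the budget is dropped, along with the rest.
            break
        kept.append(row)
    return kept


if __name__ == "__main__":
    sample = [{"sql_text": "SELECT {}".format(i)} for i in range(5)]
    print(len(take_rows_within_budget(sample, max_bytes=60)))
```

Because the query orders rows by `TIME DESC`, truncating from the end keeps the longest-running statements and drops the shortest ones first.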

    def _create_tidb_activity_event(self, active_sessions, tags, node_instance):
        # type: (List[Dict[str]], List[str], str) -> Dict[str]
        """Create activity event payload for TiDB"""
        # Convert rows to MySQL-compatible activity format
        mysql_activity = []

        for row in active_sessions:
            # Calculate timing information
            # Use milliseconds to avoid overflow issues
            current_time_ms = int(time.time() * 1000)
            query_time_s = row.get('query_time', 0)
            query_time_ms = int(query_time_s * 1000) if query_time_s else 0
            event_start_ms = max(0, current_time_ms - query_time_ms)

            # Generate event IDs based on thread_id and timestamp
            event_id = hash(str(row['thread_id']) + str(current_time_ms)) % (2**31)  # Keep it positive and reasonable

            activity = {
                # Essential identifiers
                'thread_id': row['thread_id'],
                'processlist_id': row['processlist_id'],
                'processlist_user': row['processlist_user'],
                'processlist_host': row['processlist_host'],
                'processlist_db': row['processlist_db'],
                'processlist_command': row['processlist_command'],
                'processlist_state': row['processlist_state'],
                'sql_text': row.get('sql_text'),
                'current_schema': row.get('processlist_db'),
                'query_signature': row.get('query_signature'),
                'dd_commands': row.get('dd_commands', []),
                'dd_tables': row.get('dd_tables', []),
                'dd_comments': row.get('dd_comments', []),
                'query_truncated': row.get('query_truncated'),
                # Event identifiers
                'event_id': event_id,
                'end_event_id': event_id,  # Same as event_id for TiDB
                # Timing information
                'event_timer_start': event_start_ms * 1000000,  # Convert to nanoseconds
                'event_timer_end': current_time_ms * 1000000,  # Convert to nanoseconds
                'lock_time': 0,  # TiDB doesn't provide lock time in CLUSTER_PROCESSLIST
                # Wait event info
                'wait_event': row.get('wait_event', 'CPU'),
                'wait_timer_start': event_start_ms * 1000000,  # Same as event timer
                'wait_timer_end': current_time_ms * 1000000,
                # Additional MySQL compatibility fields
                'object_name': None,  # TiDB doesn't track file operations
                'object_type': None,
                'operation': None,
                'source': '',
            }

            mysql_activity.append(activity)

        event = {
            "host": self._check.reported_hostname,
            "ddagentversion": datadog_agent.get_version(),
            "ddsource": "mysql",
            "dbm_type": "activity",
            "collection_interval": self.collection_interval,
            "ddtags": tags,
            "timestamp": time.time() * 1000,
            "cloud_metadata": self._config.cloud_metadata,
            'service': self._config.service,
            "mysql_activity": mysql_activity,
        }

        # For TiDB, add the specific node instance for this activity event
        if node_instance:
            event['tidb'] = {'node_instance': node_instance}

        return event
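
Two details of `_create_tidb_activity_event` are worth isolating: the conversion of `query_time` (seconds) into nanosecond start and end timers, and the event ID. Python salts `str` hashes per process unless `PYTHONHASHSEED` is fixed, so `hash(...)` yields different IDs for the same input across Agent restarts. The sketch below is an alternative for discussion, not the approach taken in this diff: `event_timers_ns` and `stable_event_id` are hypothetical helpers, and `zlib.crc32` is swapped in only to obtain a deterministic 31-bit value.

```python
import time
import zlib

NS_PER_MS = 1_000_000


def event_timers_ns(query_time_s, now_ms=None):
    """Return (start_ns, end_ns) for a query that has run for query_time_s seconds."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    elapsed_ms = int(query_time_s * 1000) if query_time_s else 0
    start_ms = max(0, now_ms - elapsed_ms)  # Never report a negative start time
    return start_ms * NS_PER_MS, now_ms * NS_PER_MS


def stable_event_id(thread_id, now_ms):
    """Deterministic, non-negative 31-bit ID derived from thread ID and timestamp."""
    key = "{}:{}".format(thread_id, now_ms).encode("utf-8")
    return zlib.crc32(key) & 0x7FFFFFFF


if __name__ == "__main__":
    start_ns, end_ns = event_timers_ns(2, now_ms=10_000)
    print(start_ns, end_ns, stable_event_id(42, 10_000))
```

Passing `now_ms` explicitly also lets a caller compute the timestamp once per collection run, so every row in one event shares the same end time.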

    def _check_version(self):
        # type: () -> None
        if self._check.version.version_compatible((8,)):