Commit 8a29ca0

Add TiDB database monitoring support
1 parent 4840571 commit 8a29ca0

12 files changed: +3296 -113 lines changed

mysql/README.md

Lines changed: 75 additions & 1 deletion
@@ -8,7 +8,7 @@ The MySQL integration tracks the performance of your MySQL instances. It collect

 Enable [Database Monitoring][32] (DBM) for enhanced insights into query performance and database health. In addition to the standard integration, Datadog DBM provides query-level metrics, live and historical query snapshots, wait event analysis, database load, and query explain plans.

-MySQL version 5.6, 5.7, 8.0, and MariaDB versions 10.5, 10.6, 10.11 and 11.1 are supported.
+MySQL version 5.6, 5.7, 8.0, MariaDB versions 10.5, 10.6, 10.11 and 11.1, and TiDB version 5.0+ are supported.

 ## Setup

@@ -68,6 +68,24 @@ mysql> GRANT PROCESS ON *.* TO 'datadog'@'%';
 Query OK, 0 rows affected (0.00 sec)
 ```

+##### TiDB-specific setup
+
+For TiDB databases, the user setup is similar, with these differences:
+
+- TiDB does not have `performance_schema`, so skip the `performance_schema` grant
+- TiDB does not support the `REPLICATION CLIENT` privilege, which is not needed because TiDB uses different replication mechanisms
+- The `innodb_index_stats` table is not available in TiDB
+
+For TiDB, create the user with these commands:
+
+```shell
+mysql> CREATE USER 'datadog'@'%' IDENTIFIED BY '<UNIQUEPASSWORD>';
+Query OK, 0 rows affected (0.00 sec)
+
+mysql> GRANT PROCESS ON *.* TO 'datadog'@'%';
+Query OK, 0 rows affected (0.00 sec)
+```
+
 Verify the replication client. Replace `<UNIQUEPASSWORD>` with the password you created above:

 ```shell
@@ -141,6 +159,27 @@ For a full list of available configuration options, see the [sample `mysql.d/con

 To collect `extra_performance_metrics`, your MySQL server must have `performance_schema` enabled - otherwise set `extra_performance_metrics` to `false`. For more information on `performance_schema`, see [MySQL Performance Schema Quick Start][9].

+##### TiDB Configuration
+
+For TiDB instances, some configuration options should be adjusted:
+
+```yaml
+init_config:
+
+instances:
+  - host: 127.0.0.1
+    username: datadog
+    password: "<YOUR_CHOSEN_PASSWORD>"
+    port: 4000 # Default TiDB port
+    options:
+      replication: false # TiDB uses different replication mechanisms
+      galera_cluster: false
+      extra_status_metrics: true
+      extra_innodb_metrics: false # TiDB doesn't have InnoDB
+      disable_innodb_metrics: true # Disable InnoDB metrics for TiDB
+      extra_performance_metrics: false # TiDB doesn't have performance_schema
+```
+
 **Note**: The `datadog` user should be set up in the MySQL integration configuration as `host: 127.0.0.1` instead of `localhost`. Alternatively, you may also use `sock`.

 [Restart the Agent][10] to start sending MySQL metrics to Datadog.
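
For container deployments, the same TiDB options are serialized as a JSON string in the `com.datadoghq.ad.instances` label, where a misplaced quote or brace is easy to miss. The following is a minimal sketch, not part of the integration: `tidb_instance_config` is a hypothetical helper, and the option names are taken from this README. It builds the label value with `json.dumps` instead of writing it by hand.

```python
import json


def tidb_instance_config(password, port=4000):
    """Build one Autodiscovery instance entry for a TiDB endpoint.

    Hypothetical helper for illustration; the keys mirror the README's
    TiDB options and 4000 is the default TiDB port mentioned there.
    """
    return {
        "server": "%%host%%",  # Autodiscovery template variable
        "username": "datadog",
        "password": password,
        "port": port,
        "options": {
            "disable_innodb_metrics": True,      # TiDB has no InnoDB
            "extra_performance_metrics": False,  # no performance_schema metrics
        },
    }


# The label value is a JSON list with one entry per instance.
label_value = json.dumps([tidb_instance_config("<UNIQUEPASSWORD>")])
print("LABEL \"com.datadoghq.ad.instances\"='{}'".format(label_value))
```

Generating the value this way keeps the JSON booleans (`true`/`false`) and quoting correct even when more options are added later.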
@@ -251,6 +290,14 @@ LABEL "com.datadoghq.ad.init_configs"='[{}]'
 LABEL "com.datadoghq.ad.instances"='[{"server": "%%host%%", "username": "datadog","password": "<UNIQUEPASSWORD>"}]'
 ```

+For TiDB instances, add the appropriate configuration options:
+
+```yaml
+LABEL "com.datadoghq.ad.check_names"='["mysql"]'
+LABEL "com.datadoghq.ad.init_configs"='[{}]'
+LABEL "com.datadoghq.ad.instances"='[{"server": "%%host%%", "username": "datadog", "password": "<UNIQUEPASSWORD>", "port": 4000, "options": {"disable_innodb_metrics": true, "extra_performance_metrics": false}}]'
+```
+
 See [Autodiscovery template variables][12] for details on using `<UNIQUEPASSWORD>` as an environment variable instead of a label.

 #### Log collection
@@ -551,6 +598,24 @@ The check does not collect all metrics by default. Set the following boolean con
 | ---------------------- | ----------- |
 | mysql.info.schema.size | GAUGE |

+#### TiDB limitations
+
+When using this integration with TiDB, be aware of the following limitations:
+
+- **InnoDB metrics**: TiDB doesn't use the InnoDB storage engine, so all InnoDB-related metrics are unavailable
+- **Performance Schema**: TiDB doesn't have MySQL's `performance_schema`, so performance metrics requiring it are unavailable
+- **Replication metrics**: TiDB uses a different replication mechanism (Raft consensus), so traditional MySQL replication metrics don't apply
+- **MyISAM metrics**: TiDB doesn't support MyISAM, so key cache metrics are unavailable
+- **Binary log metrics**: TiDB has a different binlog implementation, so traditional MySQL binlog metrics may not be accurate
+- **Statement metrics**: TiDB uses `information_schema.cluster_statements_summary` instead of `performance_schema.events_statements_summary_by_digest`
+- **Activity monitoring**: TiDB uses `information_schema.cluster_processlist` instead of `performance_schema.events_statements_current`
+
+For Database Monitoring features:
+- Query samples and explain plans are collected from `cluster_statements_summary` with some approximations
+- Wait events are not available because TiDB doesn't track them in the same way as MySQL
+- Some query metrics are approximated (for example, rows examined is estimated from keys processed)
+- TiDB explain plans are retrieved from the `PLAN` column of the `information_schema.cluster_statements_summary` table, which contains pre-collected execution plans in text format with embedded execution statistics
+
 ### Events

 The MySQL check does not include any events.
@@ -571,6 +636,15 @@ See [service_checks.json][22] for a list of service checks provided by this inte
 - [Database user lacks privileges][29]
 - [How to collect metrics with a SQL Stored Procedure?][30]

+### TiDB-specific troubleshooting
+
+**Missing metrics**: If you see warnings about missing InnoDB or `performance_schema` metrics when monitoring TiDB:
+- This is expected behavior. Set `disable_innodb_metrics: true` and `extra_performance_metrics: false` in your configuration.
+
+**Connection issues**: TiDB typically runs on port 4000 instead of MySQL's default 3306. Make sure to specify the correct port in your configuration.
+
+**High metric collection time**: The `CLUSTER_*` tables in TiDB aggregate data from all TiDB nodes, which can be slow in large clusters. Consider increasing the collection interval if needed.
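
When troubleshooting, it can help to confirm that the endpoint really is TiDB and meets the 5.0+ requirement before changing options. The sketch below is illustrative only: it assumes the server's `SELECT VERSION()` result embeds the TiDB release in the form `5.7.25-TiDB-v5.0.0`, and the helper names `parse_tidb_version` and `is_supported_tidb` are hypothetical, not functions from the integration.

```python
import re

# Assumed shape of the version string: "<mysql-compat>-TiDB-v<major>.<minor>.<patch>"
_TIDB_VERSION_RE = re.compile(r"TiDB-v(\d+)\.(\d+)\.(\d+)", re.IGNORECASE)


def parse_tidb_version(version_string):
    """Return (major, minor, patch) for a TiDB version string, else None."""
    match = _TIDB_VERSION_RE.search(version_string or "")
    if match is None:
        return None  # plain MySQL or MariaDB, or an unrecognized format
    return tuple(int(part) for part in match.groups())


def is_supported_tidb(version_string, minimum=(5, 0, 0)):
    """True when the string identifies TiDB at or above the minimum version."""
    version = parse_tidb_version(version_string)
    return version is not None and version >= minimum
```

The input would typically be the single value returned by running `SELECT VERSION()` against the configured host and port.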
+
 ## Further Reading

 Additional helpful documentation, links, and articles:

mysql/changelog.d/20826.changed

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+Implement TiDB database monitoring

mysql/datadog_checks/mysql/activity.py

Lines changed: 202 additions & 1 deletion
@@ -7,7 +7,7 @@
 import time
 from contextlib import closing
 from enum import Enum
-from typing import Dict, List  # noqa: F401
+from typing import Dict, List, Tuple  # noqa: F401

 import pymysql

@@ -130,6 +130,37 @@
 )
 """

+# TiDB specific constants
+TIDB_ACTIVITY_QUERY_LIMIT = 100
+
+# TiDB specific activity query
+TIDB_ACTIVITY_QUERY = """\
+SELECT
+    ID as processlist_id,
+    USER as processlist_user,
+    HOST as processlist_host,
+    DB as processlist_db,
+    COMMAND as processlist_command,
+    STATE as processlist_state,
+    INFO as sql_text,
+    TIME as query_time,
+    MEM as memory_usage,
+    TxnStart as txn_start_time
+FROM INFORMATION_SCHEMA.CLUSTER_PROCESSLIST
+WHERE
+    COMMAND != 'Sleep'
+    AND INFO IS NOT NULL
+    AND INFO != ''
+    -- Exclude our own monitoring queries
+    AND INFO NOT LIKE '%CLUSTER_PROCESSLIST%'
+    AND INFO NOT LIKE '%datadog-agent%'
+    -- Exclude other system queries
+    AND INFO NOT LIKE '%INFORMATION_SCHEMA%'
+    AND INFO NOT LIKE '%performance_schema%'
+ORDER BY TIME DESC
+LIMIT {}
+""".format(TIDB_ACTIVITY_QUERY_LIMIT)
+

 class MySQLVersion(Enum):
     # 8.0
@@ -183,6 +214,12 @@ def run_job(self):
                 'Waiting for events_waits_current availability to be determined by the check, skipping run.'
             )
         if self._check.events_wait_current_enabled is False:
+            # Use TiDB-specific activity collection
+            if self._check._get_is_tidb(self._db):
+                self._log.debug("TiDB detected, using TiDB-specific activity collection")
+                self._collect_tidb_activity()
+                return
+
             azure_deployment_type = self._config.cloud_metadata.get("azure", {}).get("deployment_type")
             if azure_deployment_type != "flexible_server":
                 self._check.record_warning(
@@ -201,6 +238,170 @@ def run_job(self):
         self._check_version()
         self._collect_activity()

+    @tracked_method(agent_check_getter=agent_check_getter)
+    def _collect_tidb_activity(self):
+        # type: () -> None
+        """Collect activity data from TiDB CLUSTER_PROCESSLIST"""
+        tags = [t for t in self._tags if not t.startswith('dd.internal')]
+
+        with closing(self._get_db_connection().cursor(CommenterDictCursor)) as cursor:
+            rows = self._get_tidb_activity(cursor)
+            rows = self._normalize_tidb_rows(rows)
+
+            # Group rows by TiDB node instance
+            rows_by_node = {}
+            for row in rows:
+                node_instance = row.get('processlist_host', 'unknown')
+                if node_instance not in rows_by_node:
+                    rows_by_node[node_instance] = []
+                rows_by_node[node_instance].append(row)
+
+            # Create and send separate events for each TiDB node
+            for node_instance, node_rows in rows_by_node.items():
+                event = self._create_tidb_activity_event(node_rows, tags, node_instance)
+                payload = json.dumps(event, default=self._json_event_encoding)
+                self._check.database_monitoring_query_activity(payload)
+                self._check.histogram(
+                    "dd.mysql.activity.collect_activity.payload_size",
+                    len(payload),
+                    tags=tags + ["tidb_node_instance:{}".format(node_instance)] + self._check._get_debug_tags(),
+                )
+
+    @tracked_method(agent_check_getter=agent_check_getter, track_result_length=True)
+    def _get_tidb_activity(self, cursor):
+        # type: (pymysql.cursor) -> List[Dict[str]]
+        """Execute TiDB activity query"""
+        self._log.debug("Running TiDB activity query [%s]", TIDB_ACTIVITY_QUERY)
+        cursor.execute(TIDB_ACTIVITY_QUERY)
+        return cursor.fetchall()
+
+    def _derive_tidb_wait_event(self, state):
+        # type: (str) -> Tuple[str, str]
+        """
+        Derive wait event and wait event group from TiDB processlist state.
+        Returns (wait_event, wait_event_group)
+        """
+        return 'N/A', 'N/A'
+
+    def _normalize_tidb_rows(self, rows):
+        # type: (List[Dict[str]]) -> List[Dict[str]]
+        """Normalize TiDB activity rows to match expected format"""
+        normalized_rows = []
+        estimated_size = 0
+
+        for row in rows:
+            # Generate unique identifiers for TiDB
+            thread_id = row.get('processlist_id', 0)
+
+            # Derive wait event from state
+            state = row.get('processlist_state', '')
+            wait_event, wait_event_group = self._derive_tidb_wait_event(state)
+
+            # Convert TiDB fields to match MySQL activity format
+            normalized_row = {
+                'thread_id': thread_id,
+                'processlist_id': row.get('processlist_id'),
+                'processlist_user': row.get('processlist_user'),
+                'processlist_host': row.get('processlist_host'),
+                'processlist_db': row.get('processlist_db'),
+                'processlist_command': row.get('processlist_command'),
+                'processlist_state': row.get('processlist_state'),
+                'sql_text': row.get('sql_text'),
+                'query_time': row.get('query_time', 0),
+                'memory_usage': row.get('memory_usage', 0),
+                'txn_start_time': row.get('txn_start_time'),
+                # Derived wait events
+                'wait_event': wait_event,
+                'wait_event_type': wait_event_group,
+            }
+
+            # Add query truncation state
+            if normalized_row['sql_text'] is not None:
+                normalized_row['query_truncated'] = get_truncation_state(normalized_row['sql_text']).value
+
+            # Obfuscate the query
+            normalized_row = self._obfuscate_and_sanitize_row(normalized_row)
+
+            estimated_size += self._get_estimated_row_size_bytes(normalized_row)
+            if estimated_size > MySQLActivity.MAX_PAYLOAD_BYTES:
+                return normalized_rows
+
+            normalized_rows.append(normalized_row)
+
+        return normalized_rows
+
+    def _create_tidb_activity_event(self, active_sessions, tags, node_instance):
+        # type: (List[Dict[str]], List[str], str) -> Dict[str]
+        """Create activity event payload for TiDB"""
+        # Convert rows to MySQL-compatible activity format
+        mysql_activity = []
+
+        for row in active_sessions:
+            # Calculate timing information
+            # Use milliseconds to avoid overflow issues
+            current_time_ms = int(time.time() * 1000)
+            query_time_s = row.get('query_time', 0)
+            query_time_ms = int(query_time_s * 1000) if query_time_s else 0
+            event_start_ms = max(0, current_time_ms - query_time_ms)
+
+            # Generate event IDs based on thread_id and timestamp
+            event_id = hash(str(row['thread_id']) + str(current_time_ms)) % (2**31)  # Keep it positive and reasonable
+
+            activity = {
+                # Essential identifiers
+                'thread_id': row['thread_id'],
+                'processlist_id': row['processlist_id'],
+                'processlist_user': row['processlist_user'],
+                'processlist_host': row['processlist_host'],
+                'processlist_db': row['processlist_db'],
+                'processlist_command': row['processlist_command'],
+                'processlist_state': row['processlist_state'],
+                'sql_text': row.get('sql_text'),
+                'current_schema': row.get('processlist_db'),
+                'query_signature': row.get('query_signature'),
+                'dd_commands': row.get('dd_commands', []),
+                'dd_tables': row.get('dd_tables', []),
+                'dd_comments': row.get('dd_comments', []),
+                'query_truncated': row.get('query_truncated'),
+                # Event identifiers
+                'event_id': event_id,
+                'end_event_id': event_id,  # Same as event_id for TiDB
+                # Timing information
+                'event_timer_start': event_start_ms * 1000000,  # Convert to nanoseconds
+                'event_timer_end': current_time_ms * 1000000,  # Convert to nanoseconds
+                'lock_time': 0,  # TiDB doesn't provide lock time in CLUSTER_PROCESSLIST
+                # Wait event info
+                'wait_event': row.get('wait_event', 'CPU'),
+                'wait_timer_start': event_start_ms * 1000000,  # Same as event timer
+                'wait_timer_end': current_time_ms * 1000000,
+                # Additional MySQL compatibility fields
+                'object_name': None,  # TiDB doesn't track file operations
+                'object_type': None,
+                'operation': None,
+                'source': '',
+            }
+
+            mysql_activity.append(activity)
+
+        event = {
+            "host": self._check.reported_hostname,
+            "ddagentversion": datadog_agent.get_version(),
+            "ddsource": "mysql",
+            "dbm_type": "activity",
+            "collection_interval": self.collection_interval,
+            "ddtags": tags,
+            "timestamp": time.time() * 1000,
+            "cloud_metadata": self._config.cloud_metadata,
+            'service': self._config.service,
+            "mysql_activity": mysql_activity,
+        }
+
+        # For TiDB, add the specific node instance for this activity event
+        if node_instance:
+            event['tidb'] = {'node_instance': node_instance}
+
+        return event
+
     def _check_version(self):
         # type: () -> None
         if self._check.version.version_compatible((8,)):
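
The timing and event-ID arithmetic in `_create_tidb_activity_event` is easier to check in isolation. The sketch below is not the commit's code: `tidb_event_timing` and `stable_event_id` are hypothetical names, and the ID function deliberately swaps the built-in `hash()` for `zlib.crc32`. Python salts `hash()` for strings per process unless `PYTHONHASHSEED` is fixed, so the IDs produced by the commit's expression are not reproducible across Agent restarts; whether that matters for these payloads is an open question.

```python
import zlib

NS_PER_MS = 1_000_000  # nanoseconds per millisecond


def tidb_event_timing(now_ms, query_time_s):
    """Return (start_ns, end_ns) for a statement that has run query_time_s seconds.

    Mirrors the diff's arithmetic: a missing or zero TIME value yields a
    zero-length event, and the start is clamped at zero.
    """
    elapsed_ms = int(query_time_s * 1000) if query_time_s else 0
    start_ms = max(0, now_ms - elapsed_ms)
    return start_ms * NS_PER_MS, now_ms * NS_PER_MS


def stable_event_id(thread_id, now_ms):
    """Deterministic 31-bit ID (assumption: crc32 replaces the diff's hash())."""
    key = "{}:{}".format(thread_id, now_ms).encode("utf-8")
    return zlib.crc32(key) % (2 ** 31)
```

For example, a statement that has been running for 2 seconds at `now_ms = 10_000` starts at 8,000 ms, which is 8,000,000,000 ns in the `event_timer_start` field.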
