---
title: "Data Observability: Datasets"
further_reading:
  - link: '/data_observability'
    tag: 'Documentation'
    text: 'Data Observability'
  - link: '/data_jobs'
    tag: 'Documentation'
    text: 'Data Jobs Monitoring'
  - link: '/data_streams'
    tag: 'Documentation'
    text: 'Data Streams Monitoring'
  - link: '/database_monitoring'
    tag: 'Documentation'
    text: 'Database Monitoring'
---

<div class="alert alert-info">Data Observability is in Preview.</div>

{{< img src="data_observability/data_quality_tables.png" alt="Datasets page showing a list of tables with columns for query count, storage size, row count, and last data update; two tables are flagged with triggered alerts" style="width:100%;" >}}

Data Observability for Datasets detects issues such as data freshness delays, unusual data patterns, and changes in column-level metrics before they affect dashboards, machine learning models, or other downstream systems. It alerts you to potential problems and provides the context to trace them back to upstream jobs or sources.

## Key capabilities

With Data Observability, you can:

- Detect delayed updates and unexpected row count behavior in your tables
- Surface changes in column-level metrics such as null counts or uniqueness
- Set up monitors using static thresholds or historical baselines
- Trace quality issues using lineage views that show upstream jobs and downstream impact
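
To make the table-level and column-level signals above concrete, the following Snowflake query sketch computes a few of them by hand for a single table. It only illustrates what these metrics mean. It is not how Datadog collects them, and `MY_DB`, `MY_SCHEMA`, `ORDERS`, and `CUSTOMER_ID` are hypothetical names.

```sql
-- Illustrative only. MY_DB, MY_SCHEMA, ORDERS, and CUSTOMER_ID are hypothetical names.

-- Table-level signals: row count, storage size, and last modification time.
-- LAST_ALTERED also changes on DDL and background metadata operations,
-- so it only approximates data freshness.
SELECT TABLE_NAME, ROW_COUNT, BYTES, LAST_ALTERED
FROM MY_DB.INFORMATION_SCHEMA.TABLES
WHERE TABLE_SCHEMA = 'MY_SCHEMA'
  AND TABLE_NAME = 'ORDERS';

-- Column-level signals for one column: null count and number of distinct values.
SELECT
    COUNT(*)                      AS row_count,
    COUNT(*) - COUNT(CUSTOMER_ID) AS null_count,
    COUNT(DISTINCT CUSTOMER_ID)   AS distinct_count
FROM MY_DB.MY_SCHEMA.ORDERS;
```

Conceptually, a static-threshold monitor compares values like these with a fixed limit, while a historical-baseline monitor compares them with their own past values.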

## Supported data sources

Data Observability supports the following data sources:

- Snowflake
- BigQuery

## Setup

{{< tabs >}}
{{% tab "Snowflake" %}}

To monitor Snowflake data in Datadog, you must configure both your Snowflake account and the Snowflake integration in Datadog. Before you begin, make sure that:

- You have access to the `ACCOUNTADMIN` role in Snowflake.
- You have generated an RSA key pair with an unencrypted private key. For more information, see the [Snowflake key-pair authentication docs][1].

After you confirm these prerequisites, complete the following setup steps in Snowflake:

1. Define the following variables. Run this and all of the following steps in the same session (for example, the same worksheet), because Snowflake session variables do not persist across sessions.

   ```sql
   SET role_name = 'DATADOG_ROLE';
   SET user_name = 'DATADOG_USER';
   SET warehouse_name = 'DATADOG_WH';
   SET database_name = '<YOUR_DATABASE>';
   ```
1. Create a role, warehouse, and key-pair-authenticated user.

   ```sql
   USE ROLE ACCOUNTADMIN;

   -- Create monitoring role
   CREATE ROLE IF NOT EXISTS IDENTIFIER($role_name);
   GRANT ROLE IDENTIFIER($role_name) TO ROLE SYSADMIN;

   -- Create an X-SMALL warehouse (auto-suspend after 30s)
   CREATE WAREHOUSE IF NOT EXISTS IDENTIFIER($warehouse_name)
     WAREHOUSE_SIZE = XSMALL
     WAREHOUSE_TYPE = STANDARD
     AUTO_SUSPEND = 30
     AUTO_RESUME = TRUE
     INITIALLY_SUSPENDED = TRUE;

   -- Create the Datadog user with key-pair authentication only (no password)
   -- Replace <PUBLIC_KEY> with your RSA public key (PEM body only, without the header/footer lines or line breaks)
   CREATE USER IF NOT EXISTS IDENTIFIER($user_name)
     LOGIN_NAME = $user_name
     DEFAULT_ROLE = $role_name
     DEFAULT_WAREHOUSE = $warehouse_name
     RSA_PUBLIC_KEY = '<PUBLIC_KEY>';

   GRANT ROLE IDENTIFIER($role_name) TO USER IDENTIFIER($user_name);
   ```

1. Grant monitoring privileges to the role.

   ```sql
   -- Warehouse usage
   GRANT USAGE ON WAREHOUSE IDENTIFIER($warehouse_name) TO ROLE IDENTIFIER($role_name);

   -- Account-level monitoring (tasks, pipes, query history)
   GRANT MONITOR EXECUTION ON ACCOUNT TO ROLE IDENTIFIER($role_name);

   -- Imported privileges on the SNOWFLAKE database (includes ACCOUNT_USAGE)
   GRANT IMPORTED PRIVILEGES ON DATABASE SNOWFLAKE TO ROLE IDENTIFIER($role_name);

   -- If the database you monitor was created from a data share,
   -- uncomment the following line to grant imported privileges on it.
   -- GRANT IMPORTED PRIVILEGES ON DATABASE IDENTIFIER($database_name) TO ROLE IDENTIFIER($role_name);

   -- Grant these SNOWFLAKE database roles if you want to collect Snowflake account usage logs and metrics.
   GRANT DATABASE ROLE SNOWFLAKE.OBJECT_VIEWER TO ROLE IDENTIFIER($role_name);
   GRANT DATABASE ROLE SNOWFLAKE.USAGE_VIEWER TO ROLE IDENTIFIER($role_name);
   GRANT DATABASE ROLE SNOWFLAKE.GOVERNANCE_VIEWER TO ROLE IDENTIFIER($role_name);
   GRANT DATABASE ROLE SNOWFLAKE.SECURITY_VIEWER TO ROLE IDENTIFIER($role_name);

   -- Grant ORGANIZATION_USAGE_VIEWER if you want to collect Snowflake organization usage metrics.
   GRANT DATABASE ROLE SNOWFLAKE.ORGANIZATION_USAGE_VIEWER TO ROLE IDENTIFIER($role_name);

   -- Grant ORGANIZATION_BILLING_VIEWER if you want to collect Snowflake cost data.
   GRANT DATABASE ROLE SNOWFLAKE.ORGANIZATION_BILLING_VIEWER TO ROLE IDENTIFIER($role_name);
   ```

1. Grant read-only access to your data. The following procedure loops over the schemas that currently exist in the database. For each one, it grants the role `USAGE` on the schema and `SELECT` on its existing and future tables and views, including event, external, and dynamic tables. Schemas created after you call the procedure are not covered until you call it again. The procedure wraps the database name in double quotes, so set `database_name` to the exact, case-sensitive name, which is usually uppercase.

   ```sql
   USE DATABASE IDENTIFIER($database_name);

   CREATE OR REPLACE PROCEDURE grantFutureAccess(databaseName string, roleName string)
   returns string not null
   language javascript
   as
   $$
   // List every schema in the database except INFORMATION_SCHEMA
   var schemaResultSet = snowflake.execute({ sqlText: 'SELECT SCHEMA_NAME FROM "' + DATABASENAME + '"' + ".INFORMATION_SCHEMA.SCHEMATA WHERE SCHEMA_NAME != 'INFORMATION_SCHEMA';" });

   var numberOfSchemasGranted = 0;
   while (schemaResultSet.next()) {
     numberOfSchemasGranted += 1;
     var schemaName = schemaResultSet.getColumnValue('SCHEMA_NAME');
     var schemaAndRoleSuffix = ' in schema "' + DATABASENAME + '"."' + schemaName + '" to role ' + ROLENAME + ';';

     snowflake.execute({ sqlText: 'grant USAGE on schema "' + DATABASENAME + '"."' + schemaName + '" to role ' + ROLENAME + ';' });

     // Objects that already exist in the schema
     snowflake.execute({ sqlText: 'grant SELECT on all tables' + schemaAndRoleSuffix });
     snowflake.execute({ sqlText: 'grant SELECT on all views' + schemaAndRoleSuffix });
     snowflake.execute({ sqlText: 'grant SELECT on all event tables' + schemaAndRoleSuffix });
     snowflake.execute({ sqlText: 'grant SELECT on all external tables' + schemaAndRoleSuffix });
     snowflake.execute({ sqlText: 'grant SELECT on all dynamic tables' + schemaAndRoleSuffix });

     // Objects created in the schema later
     snowflake.execute({ sqlText: 'grant SELECT on future tables' + schemaAndRoleSuffix });
     snowflake.execute({ sqlText: 'grant SELECT on future views' + schemaAndRoleSuffix });
     snowflake.execute({ sqlText: 'grant SELECT on future event tables' + schemaAndRoleSuffix });
     snowflake.execute({ sqlText: 'grant SELECT on future external tables' + schemaAndRoleSuffix });
     snowflake.execute({ sqlText: 'grant SELECT on future dynamic tables' + schemaAndRoleSuffix });
   }

   return 'Granted access to ' + numberOfSchemasGranted + ' schemas';
   $$
   ;

   GRANT USAGE ON DATABASE IDENTIFIER($database_name) TO ROLE IDENTIFIER($role_name);
   CALL grantFutureAccess($database_name, $role_name);
   ```

1. (Optional) If your organization uses [Snowflake event tables][2], you can grant the Datadog role access to them.

   ```sql
   -- Grant usage on the database and schema that contain the event table, and SELECT on the table
   GRANT USAGE ON DATABASE <EVENT_TABLE_DATABASE> TO ROLE IDENTIFIER($role_name);
   GRANT USAGE ON SCHEMA <EVENT_TABLE_DATABASE>.<EVENT_TABLE_SCHEMA> TO ROLE IDENTIFIER($role_name);
   GRANT SELECT ON TABLE <EVENT_TABLE_DATABASE>.<EVENT_TABLE_SCHEMA>.<EVENT_TABLE_NAME> TO ROLE IDENTIFIER($role_name);

   -- Snowflake-provided application roles for event logs
   GRANT APPLICATION ROLE SNOWFLAKE.EVENTS_VIEWER TO ROLE IDENTIFIER($role_name);
   GRANT APPLICATION ROLE SNOWFLAKE.EVENTS_ADMIN TO ROLE IDENTIFIER($role_name);
   ```
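
Optionally, you can check the setup from the same session before you move on to Datadog. The following statements only read metadata. They are a sketch that assumes you kept the default names from step 1 (`DATADOG_ROLE`, `DATADOG_USER`). Replace the `<...>` placeholders with your own values.

```sql
-- Assumes the default names from step 1; adjust them if you changed the variables.

-- Confirm that the session variables from step 1 are still set.
SHOW VARIABLES;

-- Read the fingerprint of the public key registered for the Datadog user.
-- RESULT_SCAN(LAST_QUERY_ID()) reads the output of the DESC USER statement just before it.
DESC USER DATADOG_USER;
SELECT SUBSTR("value", LEN('SHA256:') + 1) AS public_key_fingerprint
FROM TABLE(RESULT_SCAN(LAST_QUERY_ID()))
WHERE "property" = 'RSA_PUBLIC_KEY_FP';

-- List the privileges and roles granted to the Datadog role.
SHOW GRANTS TO ROLE DATADOG_ROLE;

-- List the future grants on one of your schemas.
SHOW FUTURE GRANTS IN SCHEMA <YOUR_DATABASE>.<YOUR_SCHEMA>;

-- (Event tables only) Show which event table is active for the account.
SHOW PARAMETERS LIKE 'EVENT_TABLE' IN ACCOUNT;
```

To check the key, compare `public_key_fingerprint` with the fingerprint of your local public key, computed as described in the [Snowflake key-pair authentication docs][1].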

After completing the Snowflake setup, configure the Snowflake integration in Datadog.

1. On the [Snowflake integration tile][3], click **Add Snowflake account**.
1. Enter your Snowflake account URL.
1. Under **Logs**, turn on:
   - **Query History Logs**
   - **Enable Query Logs with Access History**
1. Under **Data Observability**, turn on:
   - **Enable Data Observability for Snowflake tables**
1. Set the **User Name** to `DATADOG_USER`, or to the value you assigned to `user_name` if you changed it.
1. Under **Configure a key pair authentication**, upload your unencrypted RSA private key.
1. Click **Save**.
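
After you save the account, you can confirm that Datadog authenticates by looking for logins by the Datadog user in Snowflake. This sketch assumes that you kept the default user name `DATADOG_USER` from the Snowflake setup steps. It also assumes that you run it with a role that can read `SNOWFLAKE.ACCOUNT_USAGE`, such as `ACCOUNTADMIN`. `ACCOUNT_USAGE` views are not real-time, so recent logins can take up to about two hours to appear.

```sql
-- Assumes the default user name; adjust it if you changed $user_name.
SELECT EVENT_TIMESTAMP,
       FIRST_AUTHENTICATION_FACTOR,  -- RSA_KEYPAIR indicates a key-pair login
       IS_SUCCESS,
       ERROR_MESSAGE
FROM SNOWFLAKE.ACCOUNT_USAGE.LOGIN_HISTORY
WHERE USER_NAME = 'DATADOG_USER'
ORDER BY EVENT_TIMESTAMP DESC
LIMIT 10;
```

For failed logins, `ERROR_MESSAGE` describes the cause, which can help you spot a key or user-name mismatch.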

[1]: https://docs.snowflake.com/en/user-guide/key-pair-auth#generate-the-private-key
[2]: https://docs.snowflake.com/en/developer-guide/logging-tracing/event-table-setting-up
[3]: https://app.datadoghq.com/integrations?search=snowflake&integrationId=snowflake-web

{{% /tab %}}
{{% tab "BigQuery" %}}

To monitor BigQuery data in Datadog, you must configure permissions in your Google Cloud project and enable the relevant features in the Datadog integration. For detailed instructions, see the [Expanded BigQuery monitoring][1] section of the Datadog Google Cloud Platform documentation.

[1]: /integrations/google_cloud_platform/?tab=dataflowmethodrecommended#expanded-bigquery-monitoring

{{% /tab %}}
{{< /tabs >}}

## Further reading

{{< partial name="whats-next/whats-next.html" >}}