[DOCS-11631] BigQuery Cost Allocation #30722
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,56 @@ | ||
--- | ||
title: Cost Allocation | ||
description: Learn how to allocate cloud costs across your organization with Datadog Cloud Cost Management | ||
further_reading: | ||
- link: "/cloud_cost_management/" | ||
tag: "Documentation" | ||
text: "Learn about Cloud Cost Management" | ||
- link: "/cloud_cost_management/cost_allocation/container_cost_allocation" | ||
tag: "Documentation" | ||
text: "Container Cost Allocation" | ||
- link: "/cloud_cost_management/cost_allocation/bigquery" | ||
tag: "Documentation" | ||
text: "BigQuery Cost Allocation" | ||
--- | ||
|
||
## Overview | ||
|
||
Datadog Cloud Cost Management (CCM) provides comprehensive cost allocation capabilities that help you understand and optimize your cloud spending by breaking down costs across different resources and organizational dimensions. Cost allocation enables you to: | ||
|
||
- **Track resource-level spending**: Allocate costs down to individual containers, pods, tasks, and data warehouse queries | ||
- **Optimize resource utilization**: Identify idle resources and underutilized capacity | ||
- **Chargeback and showback**: Attribute costs to specific teams, projects, or business units | ||
- **Make informed decisions**: Understand the true cost of your applications and services | ||
|
||
## Cost allocation methods | ||
|
||
CCM offers multiple cost allocation methods to help you understand your cloud spending at different levels of granularity: | ||
|
||
### Container cost allocation | ||
|
||
Automatically allocate the costs of your cloud clusters to individual services and workloads running in those clusters. Use cost metrics enriched with tags from pods, nodes, containers, and tasks to visualize container workload cost in the context of your entire cloud bill. | ||
|
||
Learn more about [Container Cost Allocation][1]. | ||
|
||
### BigQuery cost allocation | ||
|
||
Allocate BigQuery costs to individual queries, users, and projects to understand your data warehouse spending at a granular level. Track query performance costs, storage costs, and slot utilization across your organization. | ||
|
||
Learn more about [BigQuery Cost Allocation][2]. | ||
|
||
## Getting started | ||
|
||
To get started with cost allocation: | ||
|
||
1. **Set up Cloud Cost Management** by configuring your cloud provider integration on the [Cloud Cost Setup page][3]. | ||
2. **Enable container monitoring** by installing the Datadog Agent in your containerized environments. | ||
3. **Configure tag extraction** for detailed cost breakdown. | ||
4. **Set up BigQuery integration** for data warehouse insights. | ||
|
||
## Further reading | ||
|
||
{{< partial name="whats-next/whats-next.html" >}} | ||
|
||
[1]: /cloud_cost_management/cost_allocation/container_cost_allocation | ||
[2]: /cloud_cost_management/cost_allocation/bigquery | ||
[3]: https://app.datadoghq.com/cost/setup |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,183 @@ | ||
--- | ||
title: BigQuery Cost Allocation | ||
description: Learn how to allocate Cloud Cost Management spending across your organization with BigQuery Cost Allocation. | ||
further_reading: | ||
- link: "/cloud_cost_management/" | ||
tag: "Documentation" | ||
text: "Learn about Cloud Cost Management" | ||
--- | ||
|
||
## Overview | ||
|
||
Datadog Cloud Cost Management (CCM) automatically allocates the costs of your Google BigQuery resources to individual queries and workloads. Use cost metrics enriched with tags from queries, projects, and reservations to visualize BigQuery workload costs in the context of your entire cloud bill. | ||
|
||
CCM displays query-level, storage, and data transfer costs on the [**BigQuery dashboard**][1]. | ||
|
||
## BigQuery pricing models | ||
|
||
BigQuery offers multiple pricing components, with CCM focusing on query-related processing costs. | ||
|
||
### Query processing | ||
|
||
**On-demand queries**: You pay per query based on the amount of data processed. | ||
- Costs are directly attributed to individual queries based on bytes processed | ||
- Includes query-level tags for detailed cost attribution | ||
|
||
**Reservation-based queries**: You purchase dedicated processing capacity (slots) in advance at a fixed cost. Multiple queries can share this reserved capacity, which makes cost attribution more complex, although reservations can be more cost-effective for consistent workloads. | ||
- Costs of reserved slots are allocated proportionally to queries using those slots | ||
- Allocation based on slot consumption (`total_slot_ms`) per query | ||
- Includes idle cost calculation for unused reservation capacity | ||
|
||
**Other BigQuery Costs:** | ||
- **Storage**: Charges for data stored in BigQuery tables (active and long-term storage) | ||
- **Streaming**: Costs for real-time data ingestion via streaming inserts | ||
- **Data Transfer**: Charges for moving data between regions or exporting data | ||
- **BI Engine**: Costs for in-memory analytics acceleration | ||
- **Other services**: ML training, routine executions, and additional BigQuery features | ||
|
||
CCM allocates and enriches costs for both query-processing pricing models, providing detailed cost attribution and tagging for your BigQuery analysis workloads. Learn more about BigQuery services and pricing models [**here**][3]. | ||
|
||
[**Learn more about optimizing BigQuery performance and costs.**][8] | ||
|
||
## Prerequisites | ||
|
||
The following table lists the supported features and their minimum requirements: | ||
|
||
| Feature | Requirements | | ||
|---|---| | ||
| Retrieve tags from labels of a query | GCP CCM costs must be set up. Supported without monitoring or reservations. | | ||
| Query-Level Cost Attribution | BigQuery monitoring enabled | | ||
| Reservation Cost Allocation | BigQuery reservations configured | | ||
|
||
1. Configure the Google Cloud Cost Management integration on the [Cloud Cost Setup page][2]. | ||
2. Enable BigQuery monitoring in your Google Cloud project. | ||
[**Enable BigQuery monitoring here**][4] | ||
3. For reservation cost allocation, configure BigQuery reservations in your project. [**Learn about BigQuery reservations.**][7] | ||
|
||
## Allocating costs | ||
|
||
### Compute | ||
|
||
Costs are allocated into the following spend types: | ||
|
||
| Spend type | Description | | ||
|---|---| | ||
| `allocated_spend_type:usage` | Cost of query execution based on bytes processed (on-demand) or slot consumption (reservation) | | ||
| `allocated_spend_type:cluster_idle` | Cost of reserved slots allocated within a project but not utilized by queries | | ||
|
||
### Query-level tag extraction | ||
|
||
When the [Datadog Google BigQuery integration][4] is enabled, CCM extracts the following tags to add to your query costs: | ||
|
||
| Tag | Description | | ||
Review comment: Any other tags we can add? What about the tag that has the raw query ID? Also, maybe mention the tags for the region and project ID as well, because those are super important for BigQuery. Customers might already know those two, but in general they can never find the tags they need, so I don't think it is a bad thing to be a little extra verbose here with the top tags they would care about. |
||
|---|---| | ||
| `reservation_id` | The reservation pool that provided compute resources | | ||
| `user_email` | The user or service account that executed the query | | ||
| `dts_config_id` | Identifier for scheduled queries and data transfers | | ||
Review comment: Is there any way we can help customers understand how to take this value and use it to find their schedule in the BigQuery UI? Otherwise this is just a random number that does not mean much to them.
Reply: Yep, there's a section for this on the dashboard; I'll add the same section here. Thanks for pointing that out. |
||
|
||
To identify which BigQuery schedule a `dts_config_id` value refers to: | ||
|
||
1. Go to **BigQuery** in the [**GCP Console**][6]. | ||
2. Navigate to **Transfers > Schedules**. | ||
3. Use the **search bar** or **Ctrl+F** to locate the `dts_config_id` value. | ||
4. Click the matched entry to view details about the query schedule, including source, frequency, and target dataset. | ||
|
||
Additionally, CCM adds the following tags for cost analysis: | ||
|
||
| Tag | Description | | ||
|---|---| | ||
| `allocated_spend_type` | Categorizes costs as either `usage` (active query execution) or `cluster_idle` (unused reservation capacity) | | ||
| `allocated_resource` | Indicates the resource measurement type: `slots` for reservation-based queries or `bytes_processed` for on-demand queries | | ||
| `orchestrator` | Set to `BigQuery` for all BigQuery query-related records | | ||
|
||
The following tags come directly from the billing data that CCM processes, rather than being added by CCM, and can be especially useful in BigQuery cost analysis: | ||
|
||
| Tag | Description | | ||
|---|---| | ||
| `project_id` | GCP project ID where the BigQuery resource or job is located | | ||
| `google_location` | The specific Google Cloud region or zone where BigQuery resources are deployed (e.g., us-central1, europe-west1, asia-southeast1) | | ||
| `resource_name` | Full Google Cloud resource identifier | | ||
Comment on lines +97 to +99
Review comment: Are `project_id` and `resource_name` directly from bills? But I agree these tags are useful for customers to understand the cost allocation.
Reply: Yeah, they are directly from the bills; CCM doesn't add them. I will split these into a subsection to clarify 👍 |
||
|
||
### Using BigQuery labels for cost attribution | ||
|
||
BigQuery labels provide a powerful way to add custom metadata to your queries, jobs, datasets, and tables that automatically appear as tags in CCM. This enables highly granular cost attribution across teams, projects, applications, or any custom dimension you define. | ||
|
||
**What are BigQuery labels?** | ||
Labels are key-value pairs that you can attach to BigQuery resources. When you add labels to queries or jobs, they automatically become available as tags in CCM, allowing you to filter and group costs by these custom dimensions. | ||
|
||
**Adding labels to queries:** | ||
You can add labels to BigQuery queries using the `--label` flag with the `bq` command-line tool: | ||
|
||
```bash | ||
bq query --label department:engineering --label environment:production 'SELECT * FROM dataset.table' | ||
``` | ||
|
||
**Adding labels in SQL sessions:** | ||
For queries within a session, you can set labels that apply to all subsequent queries: | ||
|
||
```sql | ||
SET @@query_label = "team:data_science,cost_center:analytics"; | ||
``` | ||
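
If you generate labeled queries from scripts, both label formats shown above can be built from a single mapping. The following is a minimal sketch in Python. The helper names `bq_query_command` and `session_label_statement` are hypothetical and are not part of any Datadog or Google tooling; only the `--label key:value` flag and the `@@query_label` variable come from the examples above.

```python
from typing import Dict, List


def bq_query_command(sql: str, labels: Dict[str, str]) -> List[str]:
    """Build an argument list for `bq query` with one --label flag per label."""
    args = ["bq", "query"]
    for key, value in labels.items():
        args += ["--label", f"{key}:{value}"]
    args.append(sql)
    return args


def session_label_statement(labels: Dict[str, str]) -> str:
    """Build a SET @@query_label statement from the same mapping."""
    pairs = ",".join(f"{key}:{value}" for key, value in labels.items())
    return f'SET @@query_label = "{pairs}";'


# Hypothetical labels, reusing the values from the SQL example above.
labels = {"team": "data_science", "cost_center": "analytics"}
print(bq_query_command("SELECT 1", labels))
print(session_label_statement(labels))
```

Keeping the labels in one mapping makes it easier to apply the same team or cost-center values consistently, so the resulting CCM tags group cleanly.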
|
||
**Benefits for cost management:** | ||
- **Team attribution**: Tag queries with team names to track departmental BigQuery spending | ||
- **Environment tracking**: Separate development, staging, and production costs | ||
- **Application mapping**: Associate costs with specific applications or services | ||
- **Project categorization**: Group costs by business initiatives or customer projects | ||
|
||
Labels added to BigQuery resources automatically appear as tags in CCM, enabling powerful cost analysis and chargeback capabilities. [**Learn more about adding BigQuery labels**][10]. | ||
|
||
### Query-level allocation | ||
|
||
Cost allocation divides BigQuery costs from GCP into individual queries and workloads associated with them. These divided costs are enriched with tags from queries, projects, and reservations so you can break down costs by any associated dimensions. | ||
|
||
For reservation-based BigQuery costs, CCM allocates costs proportionally based on slot usage. Each query's cost is determined by its share of the total slot usage within the project's reservations. For example, if a query uses 25% of the total consumed slots in a project's reservation during a given period, it is allocated 25% of that project's total reservation cost for that period. The per-query cost is calculated using the following formula: | ||
|
||
``` | ||
cost_per_query = (query_slot_usage / total_slot_usage) * total_project_reservation_cost | ||
``` | ||
|
||
Where: | ||
- `query_slot_usage`: The number of slot-seconds consumed by an individual query | ||
- `total_slot_usage`: The total slot-seconds used across all queries in the project's reservations | ||
- `total_project_reservation_cost`: The total cost of the reservations in a given project for the time period | ||
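
As an illustration only, the formula can be sketched in Python using the same variable names. The numbers are hypothetical, and returning zero when no slot usage is recorded is an assumption of this sketch, not documented CCM behavior.

```python
def cost_per_query(query_slot_usage: float,
                   total_slot_usage: float,
                   total_project_reservation_cost: float) -> float:
    """Allocate reservation cost to one query by its share of slot usage."""
    if total_slot_usage <= 0:
        # Assumption: with no recorded slot usage, nothing is attributed to queries.
        return 0.0
    share = query_slot_usage / total_slot_usage
    return share * total_project_reservation_cost


# Hypothetical numbers: the query used 250 of 1,000 slot-seconds (25%)
# in a project whose reservations cost 100.00 for the period.
print(cost_per_query(250, 1000, 100.0))  # prints 25.0
```

Because the allocation depends only on the ratio of the two usage values, the unit of slot usage does not matter as long as both values use the same one.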
|
||
Any difference between the total billed reservation cost and the sum of allocated query costs is categorized as a project's idle cost, representing unused reservation capacity. These costs are tagged with `allocated_spend_type:cluster_idle`, while actual query execution costs (both reservation and on-demand) are tagged with `allocated_spend_type:usage`. | ||
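
The idle cost described above is a simple remainder. A minimal sketch, with hypothetical amounts and a hypothetical helper name `idle_cost`:

```python
from typing import Iterable


def idle_cost(billed_reservation_cost: float,
              allocated_query_costs: Iterable[float]) -> float:
    """Return the reservation cost left over after query costs are allocated."""
    return billed_reservation_cost - sum(allocated_query_costs)


# Hypothetical period: 100.00 billed, three queries allocated 25, 40, and 15.
usage = [25.0, 40.0, 15.0]
print({"usage": sum(usage), "cluster_idle": idle_cost(100.0, usage)})
# prints {'usage': 80.0, 'cluster_idle': 20.0}
```

The two keys in the printed result mirror the `allocated_spend_type` tag values, so the sum of both always equals the billed reservation cost.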
|
||
### Understanding idle costs | ||
|
||
Idle costs represent the portion of reservation capacity that was paid for but not utilized by queries. These costs arise when the reserved slot capacity exceeds actual usage during a billing period. | ||
|
||
**Idle slot sharing considerations**: If your organization has enabled idle slot sharing between reservations, the idle cost calculation may appear different than expected. When queries from one project use idle slots from another project's reservation, those slot costs are attributed as "free" rather than to the consuming project. This means: | ||
|
||
- A project's reservation may show higher idle costs if other projects are using its unused capacity | ||
- The original project pays full reservation costs regardless of cross-project usage | ||
- There is no automatic cost transfer: projects that consume shared idle slots don't pay the reservation owner for them | ||
|
||
[**Learn how to enable idle slot sharing for your reservations.**][5] | ||
|
||
### Storage | ||
|
||
Storage costs are categorized as: | ||
|
||
| Usage type | Description | | ||
|---|---| | ||
| `google_usage_type`: Active Logical Storage | Includes any table or table partition that has been modified in the last 90 days | | ||
| `google_usage_type`: Long Term Logical Storage | Includes any table or table partition that has not been modified for 90 consecutive days. The price of storage for that table automatically drops by approximately 50%. There is no difference in performance, durability, or availability between active and long-term storage | | ||
|
||
[**Learn more about BigQuery storage and best practices.**][9] | ||
|
||
## Further reading | ||
|
||
{{< partial name="whats-next/whats-next.html" >}} | ||
|
||
[1]: /dashboard/ecm-es8-agw/bigquery-allocation | ||
[2]: /cost/setup | ||
[3]: https://cloud.google.com/bigquery/pricing?hl=en | ||
[4]: https://docs.datadoghq.com/integrations/google-cloud-bigquery/ | ||
[5]: https://cloud.google.com/bigquery/docs/reservations-tasks | ||
[6]: https://console.cloud.google.com | ||
[7]: https://cloud.google.com/bigquery/docs/reservations-intro | ||
[8]: https://cloud.google.com/bigquery/docs/best-practices-performance-overview | ||
[9]: https://cloud.google.com/bigquery/docs/best-practices-storage | ||
[10]: https://cloud.google.com/bigquery/docs/adding-labels |
Review comment: I have no idea where this should go, but can you add a section on BigQuery labels? This is a feature BigQuery provides where you can add a label to your BigQuery queries and then the labels show up in CCM.
It is super powerful for our users and not many know about it.
It is documented here: https://cloud.google.com/bigquery/docs/adding-labels
Reply: I'll add a section for labels! This is definitely a useful feature 👍