Fix regional datasets not showing up in Data Explorer #370

Open
szmikler wants to merge 3 commits into GoogleCloudDataproc:main from szmikler:main

@szmikler

Problem
The Dataset Explorer hides regional datasets (e.g., us-central1, us-east4) when the search field is empty; only multi-region datasets are shown. Two things cause this: the listing uses the regional Dataplex Entry Groups API, and a strict frontend filter excludes any dataset whose location does not exactly match the bqRegion setting, which usually corresponds to a multi-region.

Solution
This PR switches the "empty search" listing to use the Dataplex Search API, matching the behavior already used for active searches.

  • Backend: Updated list_datasets to use the global searchEntries endpoint for private projects, ensuring all datasets are discovered regardless of region.
  • Frontend: Removed the redundant location filter in bigQueryService.tsx to allow all identified datasets to be displayed.
  • Compatibility: Public dataset listing via the BigQuery API remains unchanged.
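
For readers who have not opened the diff, the backend change could look roughly like the sketch below. It only builds a description of the request and sends nothing. The base URL, the `locations/global` path segment, the payload field names, and the default query string are assumptions made for illustration; `build_search_entries_request` is a hypothetical helper, not the function in this PR.

```python
from typing import Optional

# Assumed base URL for the Dataplex REST API.
DATAPLEX_BASE = "https://dataplex.googleapis.com/v1"

# Hypothetical placeholder; the real search query used by the PR is not shown here.
DEFAULT_QUERY = "<query selecting BigQuery datasets>"


def build_search_entries_request(
    project_id: str,
    query: str = DEFAULT_QUERY,
    page_size: int = 500,
    page_token: Optional[str] = None,
) -> dict:
    """Describe a global searchEntries call without performing any I/O."""
    url = f"{DATAPLEX_BASE}/projects/{project_id}/locations/global:searchEntries"
    payload = {"query": query, "pageSize": page_size}
    if page_token:
        # Only include the token when continuing a paginated listing.
        payload["pageToken"] = page_token
    return {"method": "POST", "url": url, "payload": payload}


request = build_search_entries_request("my-project")
print(request["method"], request["url"])
```

Because the location segment is `global` rather than a single region, one call of this shape would not depend on the user's bqRegion setting, which is the point of the change.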

Impact
All project datasets are now visible in the explorer, whether they reside in a single region or a multi-region.

Testing
Updated and verified backend tests (test_bigquery.py); all 11 cases pass.
Manual testing is currently in progress.

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request refactors the BigQuery dataset listing functionality to use the Dataplex Search API, enabling cross-region dataset discovery for user-specific projects. The changes include updating the backend to perform POST requests with search payloads and removing obsolete client-side location filtering in the frontend. Review feedback highlights that the location parameter in the list_datasets function is now redundant and should be removed, and provides a suggestion to improve the robustness of the API response parsing to handle potential null values.
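
The null-handling suggestion can be illustrated with a small sketch. The response shape used here (`results`, `dataplexEntry`, `fullyQualifiedName`) is an assumption about what the search endpoint returns, and `extract_dataset_names` is a hypothetical helper rather than code from this PR.

```python
from typing import Any, List, Optional


def extract_dataset_names(response: Optional[dict]) -> List[str]:
    """Collect entry names while tolerating missing or null fields."""
    names: List[str] = []
    # `or` guards cover both absent keys and keys explicitly set to null.
    for result in (response or {}).get("results") or []:
        entry: Any = (result or {}).get("dataplexEntry") or {}
        name = entry.get("fullyQualifiedName")
        if name:
            names.append(name)
    return names


sample = {
    "results": [
        {"dataplexEntry": {"fullyQualifiedName": "bigquery:proj.sales"}},
        {"dataplexEntry": None},  # null entry is skipped
        None,                     # null result is skipped
        {"dataplexEntry": {}},    # entry without a name is skipped
    ]
}
print(extract_dataset_names(sample))
```

A plain `response.get("results", [])` would still fail when the key is present with a null value, which is why the sketch uses `or []` instead of a default argument.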

Comment thread dataproc_jupyter_plugin/services/bigquery.py
Comment thread dataproc_jupyter_plugin/services/bigquery.py Outdated
@szmikler
Author

There's one more thing that I discovered. In JupyterLab, there's a settings page where you can specify the BQ region.
image

When set to US (the default in my case), the behavior is as described in the PR. When set to us-central1, no BQ datasets are visible, not even the datasets that reside in us-central1. I think the reason is that the filtering happens in two stages: in the backend and in the frontend. I think the settings page only modifies frontend behavior, but this needs further confirmation. If that's the case, the backend is still returning only US multi-region datasets, and the frontend is trying to keep only the "us-central1" datasets.
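
To make the suspected two-stage interaction concrete, here is a toy model of it. It is not the plugin's code: the dataset list, the function names, and the idea that the backend stays pinned to "US" are all assumptions that mirror the unconfirmed hypothesis above.

```python
# Hypothetical datasets in one project, each with its BigQuery location.
DATASETS = [
    {"id": "sales", "location": "US"},
    {"id": "logs", "location": "us-central1"},
]


def backend_list(datasets, backend_region):
    """Stage 1: the backend only returns datasets in its own region."""
    return [d for d in datasets if d["location"].lower() == backend_region.lower()]


def frontend_filter(datasets, bq_region):
    """Stage 2: the frontend keeps only exact matches of the bqRegion setting."""
    return [d for d in datasets if d["location"].lower() == bq_region.lower()]


def visible(datasets, backend_region, bq_region):
    """Dataset IDs that survive both stages."""
    return [d["id"] for d in frontend_filter(backend_list(datasets, backend_region), bq_region)]


# Setting left at US: only the multi-region dataset survives.
print(visible(DATASETS, backend_region="US", bq_region="US"))
# Setting changed to us-central1 while the backend still lists US: nothing survives.
print(visible(DATASETS, backend_region="US", bq_region="us-central1"))
```

In this model the two filters can never both pass a us-central1 dataset unless the backend region follows the setting as well, which would explain the empty list.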

@szmikler szmikler marked this pull request as ready for review April 14, 2026 08:11
@szmikler
Author

I confirmed the regional datasets show up correctly now after this update.
