-
Notifications
You must be signed in to change notification settings - Fork 4
INVALIDATE SESSION METADATA CACHE
The setting spark.sparklinedata.cache.metadata
can be used to turn on Session Level Metadata Caching.
- this feature is available because under certain deployments the time of get table(s) metadata during planning is more
than the time for query execution.
- specifically when the ThriftServer connects to a /remote/ Hive Metastore Server/ in a kerberized environment each call for table metadata incurs a call to the kerberos server in addition to a call the Metastore server.
- even without kerberos the cumulative calls to just the Metastore for a query can be 100s of milliseconds if not seconds.
-
But, the lifecycle management of metadata management is very rudimentary.
- we clear the cache on any DDL command(Olap Index, Star Schema, Model) issued through the SNAP thriftserver
-
anything else needs to be managed by the user
- any table DDLs
- any DDLs issued on other thriftservers/spark contexts that change table metadata
This command is the tool available to the user to the clear any cached metadata and have Sessions reload the current metadata from the metastore.
To issue, run
invalidate session metadata cache
Rather than clearing the whole cache, the user can also use the following command to clear the cache of individual tables. For clearing a cache of given table, use
invalidate session metadata cache <table name>
Every session keeps track of the last time the session metadata is refreshed. Before every metadata call of a session, the invalidation logic works in the following way
-
First the global invalidation flag is checked to see if there are any invalidations triggered after the last session refresh time, and all the session caches are cleared if any invalidations occured.
-
Second, we find out if there are any specific tables invalidated after the session last refresh timestamp and if found, the caches of these specific tables are cleared.
Please note that, the invalidation tracking is global level and hence all these commands invalidate caches in all sessions, not just the issuing session.
The cache invalidation is triggered lazily, hence only when there is a need for metadata of the table, the underlying calls are made to fetch the relevant metadata