Reconsider network auto-update by default #214

@brycedrennan

While wanting the latest dataset is understandable and often useful, fetching it over the network by default causes problems in some environments:

  • Ephemeral environments that cannot cache the network response to disk, such as k8s tasks or other distributed systems. They refetch the list on every invocation.
  • Firewalled or no-connection environments. I believe the library still works in this case, but only after the delay of a failed HTTP connection attempt.

I'm not sure what a solution would look like, but here are some ideas:

  • Automate publishing the Python package on a schedule with an updated tld_set.
  • Make the default non-autoupdating, but make the self-updating version easy to opt into via a function argument, something like use_latest or use_autoupdating.
  • Add a TTL to the cached version. For example, with a TTL of 7 days, the list would be refetched automatically once the cached copy is older than that.
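
To make the second and third ideas more concrete, here is a rough sketch of how an opt-in flag and a cache TTL might fit together. It is not the library's actual code: `load_suffixes`, `BUNDLED_SUFFIXES`, the `fetch` callable, and the JSON cache layout are all hypothetical names made up for illustration, and only `use_latest` comes from the suggestion above.

```python
import json
import time
from pathlib import Path
from typing import Callable, Iterable, Optional

# Hypothetical stand-in for the snapshot shipped with the package.
BUNDLED_SUFFIXES = frozenset({"com", "org", "co.uk"})

SEVEN_DAYS = 7 * 24 * 60 * 60  # TTL in seconds


def load_suffixes(
    cache_file: Path,
    fetch: Callable[[], Iterable[str]],
    use_latest: bool = False,
    ttl_seconds: float = SEVEN_DAYS,
    now: Optional[float] = None,
) -> frozenset:
    """Return the suffix set, touching the network only when opted in."""
    if not use_latest:
        # Default: no network call, so no delay when offline or firewalled.
        return BUNDLED_SUFFIXES

    now = time.time() if now is None else now

    # Reuse the cached copy while it is younger than the TTL.
    stale = None
    if cache_file.exists():
        cached = json.loads(cache_file.read_text())
        stale = frozenset(cached["suffixes"])
        if now - cached["fetched_at"] < ttl_seconds:
            return stale

    try:
        suffixes = frozenset(fetch())
    except OSError:
        # Fetch failed: prefer an expired cache over the bundled snapshot.
        return stale if stale is not None else BUNDLED_SUFFIXES

    try:
        cache_file.parent.mkdir(parents=True, exist_ok=True)
        payload = {"fetched_at": now, "suffixes": sorted(suffixes)}
        cache_file.write_text(json.dumps(payload))
    except OSError:
        # Read-only or ephemeral filesystem: skip caching, keep the result.
        pass
    return suffixes
```

With something like this, callers who do nothing get the bundled list with zero network traffic, while `load_suffixes(cache, fetch, use_latest=True)` hits the network at most once per TTL window. Passing the fetcher in as a callable is just a choice for the sketch, so that the network call can be swapped out in tests.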
