
Contribute to ISF

Welcome to the ISF contributor guide!


Code of Conduct

We are committed to providing a friendly, safe, and welcoming environment for all. Please read and adhere to our Code of Conduct.

Who can contribute?

Anyone who uses ISF and has ideas on how to improve or extend the functionality is welcome to contribute!

How to contribute

What to consider

Before you start implementing a feature or fixing a bug, please open an issue first. The functionality you need may already exist, or may be achievable with ISF as-is. At the same time, not every possible feature needs to live in ISF's source code. These trade-offs can be discussed in the issue tracker.

In the issue tracker, you can also count on the expertise and advice of the developers who have been using and developing ISF for some time now. That being said, we also welcome new ideas, and are very excited to hear how you are using ISF!

Setting up the development environment

ISF uses pixi for managing environments. The default environment includes the run dependencies: everything you need to run ISF. These are often sufficient to implement new ideas. However, if you require additional dependencies, you can simply run pixi add xyz. Before adding new dependencies to ISF, please consider the following:

  • Dependencies are not always maintained forever. In the past, we have had to undeprecate pandas-msgpack ourselves, and drop sumatra. Ideally, we should not take on the maintenance burden of a dependency that is not actively maintained.
  • Maintenance costs tend to scale exponentially with the number of dependencies.
  • Can your dependency be reasonably replaced by the standard library, or by other core packages such as numpy or scipy?

ISF is only useful if it is stable, and ISF is only stable if its dependencies, API, and reproducibility do not change significantly. Be mindful when adding dependencies.

Coding Standards and Guidelines

Please follow these coding standards and guidelines:

  • Code Style: Preferably follow PEP 8 for Python code style. We prefer the black coding style; autoformatting tools like black can help.
  • Naming Conventions: Use descriptive names for variables, functions, and classes.
  • Documentation: Write docstrings for all public functions, classes, and methods.
  • Comments: Use comments to explain complex logic and important decisions.

Testing

To test if a new feature works as advertised, we strongly recommend you write tests for it. This may not be the flashiest aspect of coding, but untested code can lead to infinitely more hassle and lost time compared to whatever time it takes to write the tests.

While this is not explicit in ISF, you can categorize a test in one of three areas:

  • Unit tests: Isolated tests that check whether a single object behaves as expected under various conditions. Example: what happens if I pass wrong types or negative values to a single function?
  • Integration tests: Broader-scoped tests that check whether a piece of software integrates well with existing code. Example: does my new pipeline work well with ISF's existing data_base functionality?
  • End-to-end (E2E) tests: Does my pipeline still operate as expected from start to finish? Example: if I adapt the file format of the cell parameters, can I still run simrun.run_new_simulations()?
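As a minimal sketch of what a unit test can look like, consider the following. The function `double_nonnegative` is purely hypothetical, used only to illustrate testing both the happy path and an error path; pytest discovers functions whose names start with `test_`.

```python
# A minimal unit-test sketch, assuming pytest-style test discovery.
# `double_nonnegative` is a hypothetical function for illustration only.

def double_nonnegative(x):
    # Stand-in for the function under test.
    if x < 0:
        raise ValueError("x must be non-negative")
    return x * 2

def test_returns_double():
    # Happy path: the function doubles its input.
    assert double_nonnegative(3) == 6

def test_rejects_negative_values():
    # Error path: invalid input should raise, not silently succeed.
    try:
        double_nonnegative(-1)
    except ValueError:
        return
    assert False, "expected ValueError for negative input"
```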

Running tests requires additional dependencies, such as pytest. These are defined in our test environment. To run the test suite in the test environment, we have preconfigured the following command:

pixi run test

To run any other command within the test environment, you can simply prefix the command with:

pixi run -e test my_command

Commits

Please keep commits single-purpose and avoid bundling all changes into a single monolithic commit. Write clear and descriptive commit messages, following these guidelines:

  • Title: A short summary of the changes (50 characters or less).
  • Body: A detailed description of the changes, if necessary. Explain the "why" and "how" of the changes.
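For example, a commit message following these guidelines might look like the following (the subject matter and issue number are illustrative only):

```
Fix off-by-one in spike time binning

The last bin was silently dropped because the upper bin edge was
excluded. Use an inclusive upper edge so all spike times are counted.
Fixes #123.
```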

Pull Request Process

To submit a pull request:

  1. Fork the repository
  2. Create a branch for your contribution from your version of develop.
  3. Commit your changes with clear commit messages. Do not commit to master or develop directly.
    • If your changes are related to a specific issue, include the issue number in the commit message (e.g., Fixes #123).
    • Use git commit --amend to update the last commit message if needed.
  4. Push your branch to your forked repository.
  5. Open a pull request against the develop branch of the main repository.
    • Ensure your pull request is based on the latest develop branch.
    • Provide a clear title and description for your pull request.
    • If your changes are related to an issue, link to the issue in the pull request description.

Issue Tracking

We use GitHub Issues to track bugs and feature requests. When reporting an issue, please include:

  • A clear and descriptive title.
  • A detailed description of the problem or request.
  • Steps to reproduce the issue, if applicable.
  • Any relevant logs, screenshots, or code snippets.

Label your issue appropriately (e.g., bug, enhancement, question).

Documentation

Our documentation is generated using Sphinx, together with the autoapi extension. Invoking pixi run build_docs triggers the Sphinx build process. This is simply a convenience alias for make html. If you are adapting the documentation configuration, you may need to delete the docs/_build and docs/tutorials directories before rebuilding.

Documentation format

We use the Google-style documentation format. The Google documentation guidelines can be broadly summarized by just a handful of rules:

  • The first line is a short description. This will appear in summary tables in the documentation.
  • Attribute blocks, argument blocks, and other blocks are indented. Note that indentation also impacts rst (see the example below).
  • Arguments and attributes are listed with their type in brackets: attr_arg (type): descr

In addition to these guidelines, ISF adds one additional rule for docstrings:

  • Class docstrings that end with a list of their attributes are equivalent to the PEP-257 convention. In PEP-257, each class attribute definition is followed by a one-line docstring. However, ISF uses Jinja templating to parse attribute documentation out of an Attributes block in the class docstring, rather than following the PEP-257 convention. You are of course allowed to follow the PEP-257 convention. Just don't mix the two within a single class.

Classes tend to require the most documentation work, so we have included an example class to showcase what you can do with documentation. rst directives are allowed in docstrings, and most HTML themes also support example blocks, attention blocks, "see also" blocks, etc.

class DocumentMePlease():
  """A short sentence.
  
  This class serves as an example. It shows how a docstring should look like.

  You can use example blocks like below. Mind the fact that code blocks in rst are defined by their indentation and a preceding double colon.

  Example::

      >>> code(
      ... arg=1
      ... )
      output
      >>> more_code()

  Example:

      Another example, just in text.

  See also:
      A link to :class:`~package.subpackage.module.AnotherClass`

  Attributes:
      test_attribute (bool): a test attribute
      esc (bool): Another test attribute
      attribute_not_arg (int): An attribute that is not in the init docstring.
  """
  # The above "Attributes" block will be parsed by Jinja, and the attributes will be documented in the class documentation.
  # This should not be mixed with PEP-257 (see below). Choose one convention.
  def __init__(self, test_arg, escape_me_):
    """
    Args:
        test_arg (bool): A test argument.
        escape_me\_ (bool): An argument ending in an underscore, which should be escaped.
    """
    self.test_attribute = test_arg
    self.esc = escape_me_
    """esc (bool): This is a PEP-257 docstring example. Because I'm now mixing conventions, this will appear twice."""
    # The above should not be mixed with the "Attributes" block in the class docstring.
    self.attribute_not_arg = self.test_attribute + self.esc

Common mistakes

Here are some common things to look out for when writing documentation:

  • Indentation: rst is strict on indentation. It is also conventionally indented with 3 spaces rather than 4. In most cases, rst works fine with 4 as well (as long as you're consistent), but if you are e.g. adapting the Jinja templates, you will need to stick with the convention.
  • Newlines: Sphinx relies heavily on newlines to recognize blocks. For example, all lists (numbered or bulleted) must start and end with a newline. This can lead to some weird-looking docstrings if you want e.g. a bullet list inside of an attribute block. But it works.
  • Forgetting the module-level docstring: please don't forget it! It's the most easily overlooked, but stands out the most in the final HTML page, as its absence yields a near-blank page.

Sphinx

A comprehensive overview of how Sphinx reads in source files (i.e. our Python code) and builds documentation is given at https://www.sphinx-doc.org/en/master/extdev/event_callbacks.html. However, we summarize some key concepts below.

  • All configuration (with the exception of templates) is defined in docs/conf.py.
  • Our documentation uses custom templates, provided in docs/_templates/autoapi/python
  • When Sphinx writes out documentation, it first writes our "stub pages" that contain the overall structure of the documentation page. These stub pages are reStructuredText (.rst) files with structural information of the page, but (generally) no explicit content yet. Only afterwards does it generate HTML (or PDF if you want) from these stub pages.
  • Sphinx and reStructuredText heavily rely on directives to make documentation. While there are many extensions with custom directives, this project relies on just a handful of core built-in directives: .. py:obj:: and .. toctree::.
  • The look and functionality of the documentation website can still depend heavily on which HTML theme you are using. Often, HTML themes offer very functional extensions beyond what one would consider "just a theme". For example, our current HTML theme (last checked 27/02/2025) is immaterial, which ships with a reflinking extension for Graphviz.
  • For debugging purposes, you can inspect what the stub files look like in docs/autoapi (only if autoapi_keep_files = True in conf.py)
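To make "stub pages" concrete, a stub for a single object might look roughly like the sketch below. The module and function names are made up for illustration; the real structure is determined by the templates in docs/_templates/autoapi/python.

```
my_module
=========

.. py:obj:: my_module.my_function(arg)

   The rendered docstring content ends up here.

.. toctree::
   :hidden:

   my_module/submodule/index
```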

Database

The database system in ISF is modular, meaning that new file formats or entirely new database systems can be implemented if needed.

Implementing new file formats is easy:

  1. Identify the current database system (should be data_base.isf_data_base, as of 24/04/2025)
  2. Add a new module to data_base.<db_backend>.IO.LoaderDumper containing:
    a. A writer function under the name dump()
    b. A reader class under the name Loader

That's it! You can now use this data format to save data:

db.set("key", my_data, dumper=dumper_module_name)
db["key"]  # returns my_data

The database will automatically use your dump() and Loader to save and read your data.
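As a sketch, such a LoaderDumper module could look like the following. The exact signatures expected by the database (e.g. whether dump() receives a target directory, and whether Loader exposes a get() method) are assumptions here; mirror one of the existing modules in data_base.<db_backend>.IO.LoaderDumper for the actual interface.

```python
# Hypothetical LoaderDumper module for JSON-serializable data.
# The dump()/Loader.get() signatures are assumptions for illustration;
# check an existing LoaderDumper module in ISF for the real interface.
import json
import os

def dump(obj, savedir):
    """Write `obj` into the directory the database manages for this key."""
    with open(os.path.join(savedir, "data.json"), "w") as f:
        json.dump(obj, f)

class Loader:
    """Read back whatever dump() wrote."""
    def get(self, savedir):
        with open(os.path.join(savedir, "data.json")) as f:
            return json.load(f)
```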

Database modularity

This codebase has been a little over 15 years in the making (depending on who you ask). Inevitably, data formats have come and gone. To balance long-term support with cutting-edge file formats, we must be modular in our database system.

What do we mean when we say a "modular database system"? In the past, we used model_data_base instead of isf_data_base. model_data_base differed from the current database system in the following aspects:

  • It used the .pickle format for saving metadata and LoaderDumper information. This introduced the issue that nothing could be refactored, moved, or renamed in the database's source code, or the pickled Loader objects would no longer work.
  • It used SQLite to save and fetch metadata. This then required filelocking to prevent concurrent writes to the metadata file.

Both issues introduced significant overhead in usage and maintenance. However, simply overhauling how it worked would invalidate all old data. As we didn't want to convert exabytes of data when we could simply keep reading it as-is, but also wanted to avoid these issues in the future, we opted for the current "modular" approach: we can use both isf_data_base and model_data_base (if necessary), and even extend it to some mysterious third future option (I will shed a tear if we need to, but we could if we must).

This modular approach is possible because we have one wrapper data_base package, a DataBase wrapper class, and an IO wrapper subpackage. Give the wrapper class a path to a database, and it will infer which database system was used and give you the correct source code to read, inspect, and write data. Import IO and it will infer which backend to use (i.e. whether it needs JSON or .pickle for metadata).

Throughout ISF, all other packages simply rely on the wrapper data_base, and do not know which database system will actually take care of saving their precious data. This agnosticism is achieved by dynamically importing the correct IO subpackages at runtime, i.e. as soon as data_base.IO is imported.

You may have noticed that we do not recommend changing the database system. It is tedious, and introduces avoidable technical debt. Generally, 99% of all flexibility you could ever want can be achieved by implementing new LoaderDumper modules in the current database system.