Skip to content

psycopg2 vs psycopg #112

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
espg opened this issue Oct 9, 2024 · 0 comments
Open

psycopg2 vs psycopg #112

espg opened this issue Oct 9, 2024 · 0 comments

Comments

@espg
Copy link
Collaborator

espg commented Oct 9, 2024

Do we want version 2 or 3 moving forward?

Last week we removed pygresql and replaced it with psycopg2 inside of pgamit. Should we be using version 3 instead of 2?

What's the difference between 2 and 3?

psycopg2 is what we have now, and it works; however, it is no longer being actively developed, and the developers of psycopg 2/3 recommend that new projects use version 3. Note that version 2 is called psycopg2, while version 3 is just called psycopg with no version number.

The main advantage of psycopg is that it supports, from version 3.2 onwards, numpy datatypes (see also this PR). This means that this sort of decimal to float conversion isn't needed using current version. It is also significantly faster, which may impact production run times.

The main advantage of psycopg2 is that it is what we are using now, so there's no code adaptation needed... however, since we just switched from pygresql, now might be an appropriate time to switch, since the debugging from that transition is likely fresh in @demiangomez 's mind.

What would need to happen to switch from psycopg2 to psycopg?

I looked at both versions while setting up our testing framework and debugging the linux runner installation-- these are the main changes I saw with respect to the the API between the versions:

# psycopg2 version

import psycopg2
import psycopg2.extras
import psycopg2.extensions
...
        # register an adapter to convert decimal to float
        # see: https://www.psycopg.org/docs/faq.html#faq-float
        DEC2FLOAT = psycopg2.extensions.new_type(
            psycopg2.extensions.DECIMAL.values,
            'DEC2FLOAT',
            lambda value, curs: float(value) if value is not None else None)

        # Define the custom type for an array of decimals
        DECIMAL_ARRAY_TYPE = psycopg2.extensions.new_type(
            (psycopg2.extensions.DECIMAL.values,),  # This matches the type codes for DECIMAL
            'DECIMAL_ARRAY',  # Name of the type
            lambda value, curs: [float(d) for d in value] if value is not None else None
        )

        psycopg2.extensions.register_type(DEC2FLOAT)
        psycopg2.extensions.register_type(DECIMAL_ARRAY_TYPE)

        # open connection to server
        err = None
        for i in range(3):
            try:
                self.cnn = psycopg2.connect(host=options['hostname'], user=options['username'],
                                            password=options['password'], dbname=options['database'])

                self.cnn.autocommit = True
                self.cursor = self.cnn.cursor(cursor_factory=psycopg2.extras.RealDictCursor)

(Note the decimal and float conversion code above)

import psycopg
from psycopg.rows import dict_row
...
        # open connection to server
        err = None
        for i in range(3):
            try:
                self.cnn = psycopg.connect(host=options['hostname'],
                                            user=options['username'],
                                            password=options['password'],
                                            dbname=options['database'],
                                            row_factory=dict_row)

                self.cnn.autocommit = True
                self.cursor = self.cnn.cursor()

The above adaptation follows from here , but would need some debugging and attention on other items mentioned in the reference link.

Presumably, since numpy types are supported natively in psycopg, the extension code can simply be dropped, but that might also require rerunning the data tables for previous processing runs to ensure uniformity of numpy types.

Is there any impact to installation switching from psycopg2 to psycopg?

Either version will work fine for the build, and I've tested both in terms of being installable and importable to python. Both psycopg2 and psycopg are conda/mamba builds in our setup because they compile C code against the postgres header and C files to produce binary code.

There are 'pure python' modules of both versions that are pip installable, however, these builds are not recommended for production or development because of performance degradation; they also have a libpg5 dependency as well, to further complicate installation (see additional notes here).

There are binary builds specifically for use with pip using pip install psycopg2-binary and pip install "psycopg[binary]" respectively. These are preferred to the pure python installs, but are produced on a best effort basis and may not be available for all platforms or version numbers. Depending on which version we use moving forward, we can specify the binary version in our pyproject.toml file for pip installs when we do package releases for pgamit.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant