Skip to content

I suspect a breaking change related to "include_relationships=True" and schema.load() in version 1.4.1 #664

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
Curiouspaul1 opened this issue Mar 14, 2025 · 5 comments

Comments

@Curiouspaul1
Copy link

Curiouspaul1 commented Mar 14, 2025

Hi ! I recently had to do some testing on my local for some portion of our code that's been running happily on staging. I didn't specify a version number for my local container instance, as such it pulled 1.4.1 which is the latest. I noticed that it seems to enforce implicitly defined fields (from the include_relationships=True option), during a schema.load() which I THINK isn't the desired or USUAL behavior.

Typically (in previous versions specifically 1.1.0 ) we've been able to deserialize into objects with schemas defined with the include_relationships option set to True - without having to provide values for relationship fields (which intuitively makes sense), however for some reason it's raising a Validation (Missing field) error on 1.4.1 . This behavior wasn't reproducable on 1.1.0 (not that I would know if this is on later versions in: 1.1.0 < version < 1.4.1 because we've not used any other up until my recent local testing which used 1.4.1).

class UserSchema(Base):
    class Meta:
        model = User
        include_relationships=True  # includes for example an 'address' relationship attr (could be something like a 'backref' or explicitly declared field on the sqlalchemy model)
        include_fk = True
        load_instance = True

# usage and expected behavior
schema = UserSchema(unknown='exclude', session=session)
user = schema.load(**data)
# returns <User> object

# Observed behavior
# .. same steps
# throws 'Missing data for required 'address' field - error'
@sloria
Copy link
Member

sloria commented Mar 21, 2025

I wasn't able to reproduce this. added a test to cover this case in #666 and it passes. @Curiouspaul1 can you post a more complete minimal reproducible example?

@Curiouspaul1
Copy link
Author

Curiouspaul1 commented Mar 22, 2025

Hi @sloria That's strange, are you sure you tested with a load() and not a dump() also make sure your model has a field that defines a sqlalchemy relationship (see example below and or above). I can guarantee that this is still happening because I tried again to reproduce the behavior by upgrading the version to 1.4.1 and it does the same thing. I guess additional information could be the version numbers for libraries of interest, namely:

sqlalchemy==2.0.35
flask-sqlalchemy==3.1.1
flask-marshmallow==1.2.1
marshmallow==3.22.0

I had to create a simple test to reproduce behavior outside of the running application, so it's pretty basic steps to reproduce. All i'm doing is using a sqlalchemyautoschema instance to dump a sqlalchemy instance, as described above:

# user model
class User(db.Model):
    # ..  define columns ..
    address = db.relationship('Address', backref='user')

class UserSchema(Base):
    class Meta:
        model = User
        include_relationships=True  # includes for example an 'address' relationship attr (could be something like a 'backref' or explicitly declared field on the sqlalchemy model)
        include_fk = True
        load_instance = True

# usage and expected behavior
schema = UserSchema(unknown='exclude', session=session)
user = schema.load(**data)
# returns <User> object

# Observed behavior
# .. same steps
# throws 'Missing data for required 'address' field - error'

PS: The User nd Address example aren't exactly the same models in our usecase, but its representative of what exactly is happening. Here's a snippet of the error i'm getting:

raise DataValidationError(message=message, errors=exc.messages, data=data)
          │                           │               │   │              └ {'quantity': 1.0, 'product_id': 2631}
          │                           │               │   └ {'location': ['Missing data for required field.']}
          │                           │               └ ValidationError({'location': ['Missing data for required field.']})
          │                           └ 'location: Missing data for required field'
          └ <class 'app.utils.exceptions.DataValidationError'>

So as you can see, the location field (which is defined as a relationship between the model i'm trying to load and a Locations table) exactly how i defined it in my example above for User and Address - is throwing a missing field error, which isn't the expected behavior. I shouldn't have to specify the value for a relationship field in version 1.4.1 but it works in 1.1.0.

Hopefully this helps.

@Curiouspaul1
Copy link
Author

@sloria any update on this?

@sloria
Copy link
Member

sloria commented Mar 30, 2025

sorry--haven't had much time to work on marshmallow these days. would definitely appreciate help on this, even if it's an incomplete PR.

@n8harmon
Copy link

n8harmon commented Apr 4, 2025

I am currently working on updating app dependencies and came across this issue. After some digging into my problem, which may not be exactly the same, I think this PR could be the breaking change.

In my use case, we had defined a Nested relationship on the schema, which seemed to trigger the error.

Prior to that commit, with models and schemas setup like:

class User(db.Model):
    id = Column(Integer, primary_key=True)
    name = Column(String, nullable=False)
    friends = relationship(UserFriend, back_populates='user')


class UserFriend(db.Model):
    user_id = ForeignKey(User.id, nullable=False)
    name = Column(String, nullable=False)
    user = relationship(User, back_populates='friends')


class UserFriendSchema(SQLAlchemyAutoSchema):
    class Meta:
        model = UserFriend
        include_relationships = True 
        load_instance = True


class UserSchema(SQLAlchemyAutoSchema):
    class Meta:
        model = User
        include_relationships = True 
        load_instance = True

    friends = Nested(UserFriendSchema, many=True)

This was possible:

user_data = {'name': 'Bob', 'friends': ['name': 'Joe', 'name': 'Tom']}
new_user, errors = UserSchema.load(user_data, instance=None)
assert not errors
db.session.add(new_user)
db.session.commit()
assert new_user.name == 'Bob'
assert sorted([f.name for f in new_user.friends]) == ['Joe', 'Tom']

After that commit, this is the result:

user_data = {'name': 'Bob', 'friends': ['name': 'Joe', 'name': 'Tom']}
new_user, errors = UserSchema.load(user_data, instance=None)

# errors == {
#   'friends': {
#       0: {'user': ['Missing data for required field.']}, 
#       1: {'user': ['Missing data for required field.']},
# }

Since this is an add operation for a User with nested relationship data related to the object being added via load, we don't have access to the User.id to include in the user_data.

I did notice that by removing the defined Nested relationship, the friends relationship gets handled automatically as a Related field, which does seem to handle the relationship data correctly (i.e. the user object gets added to the db along with the related objects). However, the following warning is raised:

/marshmallow_sqlalchemy/fields.py:136: SAWarning: fully NULL primary key identity cannot load any object

which I have determined is caused by the fact that on line 133 of fields.py lookup_values is: [None]. The below change seems to be a valid solution because _deserialize returns self.related_model(**value) (line 113 of fields.py) as a result of raising NoResultFound:

    def _get_existing_instance(self, related_model, value):
        ...
        else:
            # Use a faster path if the related key is the primary key.
            lookup_values = [value.get(prop.key) for prop in self.related_keys]
            # When lookup_values == [None], there is no primary_key data, so no need to call session.get
            # Otherwise, SAWarning: fully NULL primary key identity cannot load any object will be raised
            if len(lookup_values) == 1 and lookup_values[0] is None:
                raise NoResultFound
            try:
                result = self.session.get(related_model, lookup_values)
            except TypeError as error:
                keys = [prop.key for prop in self.related_keys]
                raise self.make_error("invalid", value=value, keys=keys) from error
            if result is None:
                raise NoResultFound
        return result

Unfortunately, in my case, right now I think I need a Nested relationship instead of a Related field, so the above solution doesn't appear fix my use case. I'm by no means a marshmallow expert, so I'm not sure the best overall solution to these problems is. But I thought I would provide this info in case it helps with further troubleshooting.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants