
Backward compatibility support for Avro schema when deserializing data #275

Open
@chioai1309

Description


My project currently uses jackson-dataformat-avro (version 2.12.2) to serialize Java POJOs to Avro and store them. The problem is that when the schema evolves, the previously stored data can no longer be deserialized; it fails with the following exception:

com.fasterxml.jackson.core.io.JsonEOFException: Unexpected end-of-input in FIELD_NAME
	at com.fasterxml.jackson.core.base.ParserMinimalBase._reportInvalidEOF(ParserMinimalBase.java:659)
	at com.fasterxml.jackson.core.base.ParserMinimalBase._reportInvalidEOF(ParserMinimalBase.java:636)
	at com.fasterxml.jackson.dataformat.avro.deser.JacksonAvroParserImpl._nextByteGuaranteed2(JacksonAvroParserImpl.java:1038)
	at com.fasterxml.jackson.dataformat.avro.deser.JacksonAvroParserImpl._nextByteGuaranteed(JacksonAvroParserImpl.java:1033)
	at com.fasterxml.jackson.dataformat.avro.deser.JacksonAvroParserImpl._decodeIntSlow(JacksonAvroParserImpl.java:265)
	at com.fasterxml.jackson.dataformat.avro.deser.JacksonAvroParserImpl.decodeInt(JacksonAvroParserImpl.java:234)
	at com.fasterxml.jackson.dataformat.avro.deser.JacksonAvroParserImpl.decodeIndex(JacksonAvroParserImpl.java:988)
	at com.fasterxml.jackson.dataformat.avro.deser.ScalarDecoder$ScalarUnionDecoder$FR.readValue(ScalarDecoder.java:412)
	at com.fasterxml.jackson.dataformat.avro.deser.RecordReader$Std.nextToken(RecordReader.java:142)
	at com.fasterxml.jackson.dataformat.avro.deser.AvroParserImpl.nextToken(AvroParserImpl.java:97)
	at com.fasterxml.jackson.databind.deser.BeanDeserializer.vanillaDeserialize(BeanDeserializer.java:288)
	at com.fasterxml.jackson.databind.deser.BeanDeserializer.deserialize(BeanDeserializer.java:156)
	at com.fasterxml.jackson.databind.ObjectReader._bindAndClose(ObjectReader.java:2079)
	at com.fasterxml.jackson.databind.ObjectReader.readValue(ObjectReader.java:1453)

The reason is that RecordReader$Std tries to decode the newly added field even though the parser has already reached the end of the stored message.
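A minimal stdlib-only sketch (not jackson's actual code) of the Avro binary encoding for ["null","string"] union fields, assuming the data is stored as a raw binary record without an Avro container-file header. Avro binary data carries neither field names nor a field count, so a reader holding the new 9-field schema cannot tell that the writer produced only 8 fields: it tries to read one more union index and runs out of bytes, which is consistent with the decodeIndex → _nextByteGuaranteed frames in the stack trace above. The class and method names are made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class AvroUnionEncodingSketch {

    // Writes an Avro "long": zigzag-encode, then emit 7 bits per byte (varint).
    static void writeLong(ByteArrayOutputStream out, long n) {
        long z = (n << 1) ^ (n >> 63);
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80));
            z >>>= 7;
        }
        out.write((int) z);
    }

    // Reads an Avro "long"; throws EOFException if the data ends mid-value.
    static long readLong(byte[] data, int[] pos) throws EOFException {
        long z = 0;
        int shift = 0;
        while (true) {
            if (pos[0] >= data.length) {
                throw new EOFException("Unexpected end-of-input");
            }
            int b = data[pos[0]++] & 0xFF;
            z |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return (z >>> 1) ^ -(z & 1); // undo zigzag
            }
            shift += 7;
        }
    }

    // Writes a ["null","string"] union: branch index 0 (null, no payload)
    // or branch index 1 followed by length-prefixed UTF-8 bytes.
    static void writeNullableString(ByteArrayOutputStream out, String s) {
        if (s == null) {
            writeLong(out, 0);
            return;
        }
        writeLong(out, 1);
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        writeLong(out, bytes.length);
        out.write(bytes, 0, bytes.length);
    }

    // Decodes `fieldCount` consecutive ["null","string"] fields. The bytes carry no
    // field names or field count, so the reader relies entirely on its own schema.
    static List<String> readFields(byte[] data, int fieldCount) throws EOFException {
        int[] pos = {0};
        List<String> values = new ArrayList<>();
        for (int i = 0; i < fieldCount; i++) {
            long branch = readLong(data, pos); // union branch index
            if (branch == 0) {
                values.add(null);
                continue;
            }
            int len = (int) readLong(data, pos);
            if (pos[0] + len > data.length) {
                throw new EOFException("Unexpected end-of-input in string");
            }
            values.add(new String(data, pos[0], len, StandardCharsets.UTF_8));
            pos[0] += len;
        }
        return values;
    }

    public static void main(String[] args) throws EOFException {
        // A record written with the OLD 8-field schema (values are made up).
        String[] oldRecord = {"S01", "SG", null, null, "42", null, null, "Main store"};
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (String v : oldRecord) {
            writeNullableString(out, v);
        }
        byte[] stored = out.toByteArray();

        System.out.println(readFields(stored, 8)); // decodes fine with the old field count
        try {
            readFields(stored, 9); // the new schema expects a 9th field, "phone"
        } catch (EOFException e) {
            System.out.println("Failed: " + e.getMessage());
        }
    }
}
```

The point of the sketch is that nothing in the stored bytes marks where the record ends; only the writer's schema can tell the reader to stop after 8 fields.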

In the first version, I had this POJO and the corresponding Avro schema generated for it:

@Getter
@Setter
@Document(StoreEntity.COLLECTION_NAME)
public class StoreEntity extends AuditableEntity {

  public static final String COLLECTION_NAME = "StoreEntity";

  @Id
  private String id;
  @Field
  @Length(max = 150)
  @NotBlank
  private String name;
  @Field
  @Indexed(unique = true)
  @Length(max = 50)
  @NotBlank
  private String code;
  @Field
  @CountryCode
  private String countryCode;
}
===================================
{
   "type":"record",
   "name":"StoredEntity",
   "namespace":"com.mydomain.entity",
   "fields":[
      { "name":"code", "type":["null","string"] },
      { "name":"countryCode", "type":["null","string"] },
      { "name":"createdBy", "type":["null","string"] },
      { "name":"createdDate", "type":["null","string"] },
      { "name":"id", "type":["null","string"] },
      { "name":"lastModifiedBy", "type":["null","string"] },
      { "name":"lastModifiedDate", "type":["null","string"] },
      { "name":"name", "type":["null","string"] }
   ]
}

Later, the schema evolved: a new field, phone, was appended to the end of the POJO and of the schema:

@Getter
@Setter
@Document(StoreEntity.COLLECTION_NAME)
public class StoreEntity extends AuditableEntity {

  public static final String COLLECTION_NAME = "StoreEntity";

  @Id
  private String id;
  @Field
  @Length(max = 150)
  @NotBlank
  private String name;
  @Field
  @Indexed(unique = true)
  @Length(max = 50)
  @NotBlank
  private String code;
  @Field
  @CountryCode
  private String countryCode;
  @Field
  @JsonProperty(defaultValue = "null")
  private String phone;
}
====================================================
{
   "type":"record",
   "name":"StoredEntity",
   "namespace":"com.mydomain.entity",
   "fields":[
      { "name":"code", "type":["null","string"] },
      { "name":"countryCode", "type":["null","string"] },
      { "name":"createdBy", "type":["null","string"] },
      { "name":"createdDate", "type":["null","string"] },
      { "name":"id", "type":["null","string"] },
      { "name":"lastModifiedBy", "type":["null","string"] },
      { "name":"lastModifiedDate", "type":["null","string"] },
      { "name":"name", "type":["null","string"] },
      { "name":"phone", "type":["null","string"], "default":null }
   ]
}

According to the Avro schema resolution rules described at http://avro.apache.org/docs/1.7.7/spec.html#Schema+Resolution:

if the reader's record schema has a field that contains a default value, and writer's schema does not have a field with the same name, then the reader should use the default value from its field.
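The resolution rule quoted above can be sketched as follows. This is stdlib-only and not the library's implementation; FieldDef and resolve are hypothetical names, and the values are assumed to be already decoded using the writer's schema (knowing the writer's schema is what lets a reader stop after the old fields). Fields are matched by name: a reader field absent from the writer schema takes its default, a reader field with neither a written value nor a default is an error, and writer fields absent from the reader schema are skipped.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SchemaResolutionSketch {

    // Hypothetical, simplified field descriptor: a name and an optional default.
    static final class FieldDef {
        final String name;
        final boolean hasDefault;
        final Object defaultValue;

        FieldDef(String name) {
            this(name, false, null);
        }

        FieldDef(String name, boolean hasDefault, Object defaultValue) {
            this.name = name;
            this.hasDefault = hasDefault;
            this.defaultValue = defaultValue;
        }
    }

    // writerValues must already be decoded with the WRITER's schema, in the same order
    // as writerFields. Reader-only fields get their default; writer-only fields are ignored.
    static Map<String, Object> resolve(List<FieldDef> writerFields, List<Object> writerValues,
                                       List<FieldDef> readerFields) {
        Map<String, Object> written = new HashMap<>();
        for (int i = 0; i < writerFields.size(); i++) {
            written.put(writerFields.get(i).name, writerValues.get(i));
        }
        Map<String, Object> result = new LinkedHashMap<>();
        for (FieldDef field : readerFields) {
            if (written.containsKey(field.name)) {
                result.put(field.name, written.get(field.name));
            } else if (field.hasDefault) {
                result.put(field.name, field.defaultValue);
            } else {
                throw new IllegalStateException("No value and no default for field: " + field.name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Writer (old) schema, shortened to two fields for brevity.
        List<FieldDef> writer = Arrays.asList(new FieldDef("code"), new FieldDef("name"));
        List<Object> values = Arrays.<Object>asList("S01", "Main store");
        // Reader (new) schema adds "phone" with "default": null.
        List<FieldDef> reader = Arrays.asList(
                new FieldDef("code"), new FieldDef("name"), new FieldDef("phone", true, null));
        System.out.println(resolve(writer, values, reader));
        // prints {code=S01, name=Main store, phone=null}
    }
}
```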

The Confluent documentation gives similar guidance on backward compatibility: https://docs.confluent.io/2.0.0/avro.html#backward-compatibility
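In the sense used by the Confluent documentation, backward compatibility means that data written with the old schema can be read with the new one. For added fields this reduces to a simple check: every field that exists only in the new schema needs a default. The sketch below assumes a deliberately simplified schema model (field name mapped to whether it declares a "default"; types, aliases and removed fields are ignored), and its class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

public class BackwardCompatibilitySketch {

    // Hypothetical simplified schema view: field name -> whether the field declares a "default".
    // Returns fields added in the new schema without a default; old data cannot supply them.
    static Set<String> fieldsMissingDefaults(Map<String, Boolean> oldFields, Map<String, Boolean> newFields) {
        Set<String> missing = new LinkedHashSet<>();
        for (Map.Entry<String, Boolean> field : newFields.entrySet()) {
            boolean added = !oldFields.containsKey(field.getKey());
            if (added && !field.getValue()) {
                missing.add(field.getKey());
            }
        }
        return missing;
    }

    // Field-level check only: type changes, aliases, etc. are not considered here.
    static boolean isBackwardCompatible(Map<String, Boolean> oldFields, Map<String, Boolean> newFields) {
        return fieldsMissingDefaults(oldFields, newFields).isEmpty();
    }

    public static void main(String[] args) {
        String[] v1Names = {"code", "countryCode", "createdBy", "createdDate",
                "id", "lastModifiedBy", "lastModifiedDate", "name"};
        Map<String, Boolean> v1 = new LinkedHashMap<>();
        for (String name : v1Names) {
            v1.put(name, false); // no "default" in the original schema
        }

        Map<String, Boolean> v2 = new LinkedHashMap<>(v1);
        v2.put("phone", true); // "phone" declared with "default": null
        System.out.println(isBackwardCompatible(v1, v2)); // prints true

        Map<String, Boolean> v2NoDefault = new LinkedHashMap<>(v1);
        v2NoDefault.put("phone", false);
        System.out.println(fieldsMissingDefaults(v1, v2NoDefault)); // prints [phone]
    }
}
```

By this check, the evolved schema in this issue is backward compatible, since phone is the only added field and it declares "default": null.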

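However, jackson-dataformat-avro does not seem to apply this default-value rule: reading data written with the old schema fails with the exception shown above.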
