Invalid Avro file produced using SequenceWriter #339

Open
@willsoto

Documentation on writing Avro to a file is sparse. I have managed to piece some code together, but I am still getting an error.

Here is some sample code:

final var avroFactory = AvroFactory.builderWithApacheDecoder().enable(AvroGenerator.Feature.AVRO_FILE_OUTPUT).build();

final var generator = new AvroSchemaGenerator().enableLogicalTypes();

final var mapper = AvroMapper.builder(avroFactory).addModule(new AvroJavaTimeModule()).build();
mapper.acceptJsonFormatVisitor(Thing.class, generator);

final var avroSchema = generator.getGeneratedSchema();

final var file = Files.createTempFile("something", ".avro").toFile();

final var out = new ByteArrayOutputStream();
final var writer = mapper.writer(avroSchema).writeValues(out);

// in a loop
writer.write(thing);

// after loop
writer.close();

try (FileOutputStream outputStream = new FileOutputStream(file)) {
  out.writeTo(outputStream);
}

When checking the resultant file using avro-tools, I get the following error:

avro-tools tojson something.avro

22/09/08 18:36:55 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Exception in thread "main" org.apache.avro.AvroRuntimeException: java.io.IOException: Invalid sync!
	at org.apache.avro.file.DataFileStream.hasNext(DataFileStream.java:224)
	at org.apache.avro.tool.DataFileReadTool.run(DataFileReadTool.java:97)
	at org.apache.avro.tool.Main.run(Main.java:67)
	at org.apache.avro.tool.Main.main(Main.java:56)
Caused by: java.io.IOException: Invalid sync!
	at org.apache.avro.file.DataFileStream.nextRawBlock(DataFileStream.java:319)
	at org.apache.avro.file.DataFileStream.hasNext(DataFileStream.java:213)
	... 3 more

From what I could find, the Invalid sync! error occurs when the file hasn't been stitched together properly, but it is unclear to me what I need to do in code to make that happen. I have looked through most of the Avro tests in this repo and cannot find one that actually writes to a file and then deserializes from that file.
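To narrow down where the sync check fails, the file can be walked by hand using only the JDK. The following is a minimal sketch, assuming the object container layout described in the Avro specification: four magic bytes (`Obj` followed by 1), a metadata map, a 16-byte sync marker, and then data blocks that each end with that same marker. The class name `AvroSyncCheck` and the method `firstBadBlock` are hypothetical names made up for this illustration; they are not part of Jackson or Apache Avro.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical diagnostic helper; not part of Jackson or Apache Avro.
public class AvroSyncCheck {
    // Every Avro object container file starts with 'O', 'b', 'j', 1.
    private static final byte[] MAGIC = {'O', 'b', 'j', 1};
    private static final int SYNC_SIZE = 16;

    // Reads one Avro "long": variable-length, zigzag-encoded.
    public static long readLong(ByteBuffer buf) {
        long raw = 0;
        int shift = 0;
        int b;
        do {
            b = buf.get() & 0xFF;
            raw |= (long) (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return (raw >>> 1) ^ -(raw & 1);
    }

    // Advances the buffer by n bytes, rejecting lengths that cannot be valid.
    private static void skip(ByteBuffer buf, long n) {
        if (n < 0 || n > buf.remaining()) {
            throw new IllegalArgumentException("truncated or corrupt length: " + n);
        }
        buf.position(buf.position() + (int) n);
    }

    // Returns -1 if every block ends with the header's sync marker,
    // otherwise the zero-based index of the first block that does not.
    public static int firstBadBlock(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data);
        byte[] magic = new byte[MAGIC.length];
        if (buf.remaining() < MAGIC.length) {
            throw new IllegalArgumentException("file too short for an Avro header");
        }
        buf.get(magic);
        if (!Arrays.equals(magic, MAGIC)) {
            throw new IllegalArgumentException("not an Avro container file");
        }

        // Skip the header metadata map (string keys, bytes values).
        for (long count = readLong(buf); count != 0; count = readLong(buf)) {
            if (count < 0) {
                count = -count;
                readLong(buf); // a negative count is followed by the block's byte size
            }
            for (long i = 0; i < count; i++) {
                skip(buf, readLong(buf)); // key
                skip(buf, readLong(buf)); // value
            }
        }

        // The sync marker declared in the header.
        byte[] sync = new byte[SYNC_SIZE];
        buf.get(sync);

        // Each block: object count, byte size, payload, sync marker.
        byte[] marker = new byte[SYNC_SIZE];
        for (int block = 0; buf.hasRemaining(); block++) {
            readLong(buf);            // number of objects in the block
            skip(buf, readLong(buf)); // serialized payload
            if (buf.remaining() < SYNC_SIZE) {
                return block;
            }
            buf.get(marker);
            if (!Arrays.equals(marker, sync)) {
                return block;
            }
        }
        return -1;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: AvroSyncCheck <file>");
            return;
        }
        int bad = firstBadBlock(Files.readAllBytes(Paths.get(args[0])));
        System.out.println(bad < 0 ? "all sync markers match" : "sync mismatch in block " + bad);
    }
}
```

Running it against something.avro would show whether the very first block already fails the check or whether the mismatch only appears in a later block, which might help tell a header problem from a per-block problem.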

I am not sure whether I have stumbled into an actual bug here. If this code does look correct, that would imply it is one, and I am happy to try to write a test case.

Thanks in advance.

Edit:

I've also tried the following:

final var file = Files.createTempFile("something", ".avro").toFile();
final SequenceWriter writer = mapper.writer(avroSchema).writeValues(file);

In that case I get the following error at that line:

java.lang.UnsupportedOperationException: Generator of type com.fasterxml.jackson.core.json.UTF8JsonGenerator does not support schema of type 'avro'

	at com.fasterxml.jackson.core.JsonGenerator.setSchema(JsonGenerator.java:592)
	at com.fasterxml.jackson.databind.ObjectWriter$GeneratorSettings.initialize(ObjectWriter.java:1393)
	at com.fasterxml.jackson.databind.ObjectWriter._configureGenerator(ObjectWriter.java:1258)
	at com.fasterxml.jackson.databind.ObjectWriter.createGenerator(ObjectWriter.java:717)
	at com.fasterxml.jackson.databind.ObjectWriter.writeValues(ObjectWriter.java:753)
