Skip to content

Concurrency Issue with Joda LocalDate #235

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
TheHound opened this issue Jan 26, 2018 · 14 comments
Open

Concurrency Issue with Joda LocalDate #235

TheHound opened this issue Jan 26, 2018 · 14 comments
Labels

Comments

@TheHound
Copy link

TheHound commented Jan 26, 2018

I am running fst 2.48 and joda-time 2.9.9.
The following test case errors most of the time:

public class FSTSerializationTest {

    private final FSTConfiguration fstConf = FSTConfiguration.createDefaultConfiguration();

    public void tryAndReProduceBug() throws InterruptedException {
        final Serializable testObject = new TestObject();
        ExecutorService pool = Executors.newFixedThreadPool(4);
        List<Future<?>> futures = new ArrayList<>();
        for (int i = 0; i < 1_000_000; i++) {

            futures.add(pool.submit(() -> {
                ByteArrayOutputStream stream = new ByteArrayOutputStream();
                try {
                    writeObject(testObject, stream);
                    readObject(new ByteArrayInputStream(stream.toByteArray()));
                } catch (Exception e) {
                    System.err.println("ERROR" + e);
                    throw new RuntimeException(e);
                }

            }));
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.HOURS);
        for (Future<?> future : futures) {
            try {
                Object object = future.get();
            } catch (Exception e) {
                Assert.fail("error", e);
            }
        }
    }

    public Object readObject(final InputStream inputStream) throws IOException {
        try (FSTObjectInput in = new FSTObjectInput(inputStream, fstConf)) {
            return in.readObject();
        } catch (Exception e) {
            throw new IOException(e);
        }
    }

    public void writeObject(final Object obj, final OutputStream outputStream) throws IOException {
        try (FSTObjectOutput out = new FSTObjectOutput(outputStream, fstConf)) {
            out.writeObject(obj);
        } catch (Exception e) {
            throw new IOException(e);
        }
    }

    public static final class TestObject implements Serializable {

        private String string = "asdasdassd";

        private DateTime dateTime = new DateTime();

        private LocalDate locaTime = new LocalDate();

        private int inty = 129;
    }
}

I have been able to work around the issue by using a ThreadLocal for FSTConfiguration. I don't think this is an issue with Joda as replacing the FST serialization with Java Serialization doesn't appear to cause an issue.

@RuedigerMoeller
Copy link
Owner

Thanks for the report, the workaround you decribe points to a concurrency issue in FSTConfiguration. Actually in the past the workaround was "works as designed", however due to memory consumption of metadata, FSTConfiguration was made multithreaded afterwards (with gaps obviously ;) ).

@andrewl102
Copy link

The issue is caused by the clnames fields in FSTStreamEncoder and FSTStreamDecoder.
Replacing this section of code:

        if ( clnames == null ) {
            clnames = new FSTClazzNameRegistry(conf.getClassRegistry());
        } else {
            clnames.clear();
        }

with

clnames = new FSTClazzNameRegistry(conf.getClassRegistry());

resolves the issue. It appears the class name registry has some concurrency issues that prevent it from working properly when the FSTConfiguration class is shared amongst threads.

There is a workaround for this issue : by preregistering all the classes you want to serialize, the registry will be effectively immutable and not subject to this problem.
The code below runs successfully, at least for me : note the explicit registration.

import org.joda.time.DateTime;
import org.joda.time.LocalDate;
import org.nustaq.serialization.FSTConfiguration;
import org.nustaq.serialization.FSTObjectInput;
import org.nustaq.serialization.FSTObjectOutput;

import java.io.*;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class FSTSerializationTest {

    public static void main(String[] args) throws InterruptedException, ClassNotFoundException {
        new FSTSerializationTest().tryAndReProduceBug();
    }

    private final FSTConfiguration fstConf = FSTConfiguration.createDefaultConfiguration();

    public void tryAndReProduceBug() throws InterruptedException, ClassNotFoundException {
        final Serializable testObject = new TestObject();
        ExecutorService pool = Executors.newFixedThreadPool(4);
        fstConf.registerClass(DateTime.class);
        fstConf.registerClass(LocalDate.class);
        fstConf.registerClass(Class.forName("org.joda.time.chrono.ISOChronology$Stub"));
        fstConf.registerClass(Class.forName("org.joda.time.DateTimeZone$Stub"));
        fstConf.registerClass(Class.forName("FSTSerializationTest$TestObject"));
        List<Future<?>> futures = new ArrayList<>();
        for (int i = 0; i < 1_000_000; i++) {

            futures.add(pool.submit(() -> {
                ByteArrayOutputStream stream = new ByteArrayOutputStream();
                try {
                    writeObject(testObject, stream);
                    readObject(new ByteArrayInputStream(stream.toByteArray()));
                } catch (Exception e) {
                    System.err.println("ERROR" + e);
                    throw new RuntimeException(e);
                }

            }));
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.HOURS);
        for (Future<?> future : futures) {
            try {
                Object object = future.get();
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        }
    }

    public Object readObject(final InputStream inputStream) throws IOException {
        try (FSTObjectInput in = new FSTObjectInput(inputStream, fstConf)) {
            return in.readObject();
        } catch (Exception e) {
            throw new IOException(e);
        }
    }

    public void writeObject(final Object obj, final OutputStream outputStream) throws IOException {
        try (FSTObjectOutput out = new FSTObjectOutput(outputStream, fstConf)) {
            out.writeObject(obj);
        } catch (Exception e) {
            throw new IOException(e);
        }
    }

    public static final class TestObject implements Serializable {

        private String string = "asdasdassd";

        private DateTime dateTime = new DateTime();

        private LocalDate locaTime = new LocalDate();

        private int inty = 129;
    }
}

@perlun
Copy link

perlun commented Jan 24, 2019

For reference, this might come handy for others (supplied by @slovdahl, thank you!). The way that @TheHound 's test case crashes is a "hard JVM crash", caused by a SIGSEGV in the JVM. Here is the HotSpot error log output:

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007f8d3c3d0743, pid=10097, tid=0x00007f8d30d86700
#
# JRE version: OpenJDK Runtime Environment (8.0_191-b12) (build 1.8.0_191-8u191-b12-0ubuntu0.16.04.1-b12)
# Java VM: OpenJDK 64-Bit Server VM (25.191-b12 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# j  org.joda.time.LocalDate.readResolve()Ljava/lang/Object;+50
#
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.java.com/bugreport/crash.jsp
#
---------------  T H R E A D  ---------------
Current thread (0x00007f8d4c8bd000):  JavaThread "pool-1-thread-1" [_thread_in_Java, id=10121, stack(0x00007f8d30c86000,0x00007f8d30d87000)]
siginfo: si_signo: 11 (SIGSEGV), si_code: 1 (SEGV_MAPERR), si_addr: 0x0000000100000068
Registers:
RAX=0x00000007c0119cc8, RBX=0x0000000100000018, RCX=0x00000007196a5d60, RDX=0x00007f8d307845a0
RSP=0x00007f8d30d855b0, RBP=0x00007f8d30d85620, RSI=0x00000000f8023399, RDI=0x0000000000003768
R8 =0x00007f8d38ef1d90, R9 =0x0000000000000006, R10=0x00007f8d51f33ae0, R11=0x00007f8d3c7a8780
R12=0x0000000000000000, R13=0x00007f8d30d855b8, R14=0x00007f8d307845a0, R15=0x00007f8d4c8bd000
RIP=0x00007f8d3c3d0743, EFLAGS=0x0000000000010297, CSGSFS=0x002b000000000033, ERR=0x0000000000000004
  TRAPNO=0x000000000000000e
Top of Stack: (sp=0x00007f8d30d855b0)
0x00007f8d30d855b0:   00007f8d3c3b3d80 00000007196a5d60
0x00007f8d30d855c0:   000001687d280000 00000007196a5d00
0x00007f8d30d855d0:   00000007196a5db0 00000007196a5db0
0x00007f8d30d855e0:   00007f8d30d855e0 00007f8d30de9ba2
0x00007f8d30d855f0:   00007f8d30d85630 00007f8d30def0c0
0x00007f8d30d85600:   00007f8d307845a0 00007f8d30de9bb8
0x00007f8d30d85610:   00007f8d30d855b8 00007f8d30d85630
0x00007f8d30d85620:   00007f8d30d85688 00007f8d3c3b3d80
0x00007f8d30d85630:   00000007196a5d00 00007f8d3c3b3d80
0x00007f8d30d85640:   00000007196a5d00 00007f8d30d85648
0x00007f8d30d85650:   00007f8d30f64f1c 00007f8d30d856a8
0x00007f8d30d85660:   00007f8d30f65028 00007f8d30f65338
0x00007f8d30d85670:   00007f8d30f64f58 00007f8d30d85640
0x00007f8d30d85680:   00007f8d30d856b0 00000000ba69b971
0x00007f8d30d85690:   00007f8d3c8b63b4 00000007196a5d80
0x00007f8d30d856a0:   00000007196a5d00 00000005d34dcb98
0x00007f8d30d856b0:   0000000400000000 0000000000000028
0x00007f8d30d856c0:   00000007196a56c8 00000007c000e320
0x00007f8d30d856d0:   00000007196a5750 00000005d3607d18
0x00007f8d30d856e0:   00000005d3607d70 00000007196a5880
0x00007f8d30d856f0:   0000000a00000001 00000007c0001610
0x00007f8d30d85700:   00000005d3607de8 00000007196a56c8
0x00007f8d30d85710:   000000a2ba6c0fbd 00000007196a5d00
0x00007f8d30d85720:   ba6be5de000000a2 00000005d35f2bf8
0x00007f8d30d85730:   00000017000000bc 00007f8d3c767e48
0x00007f8d30d85740:   00000000ba6c0f95 00007f8d3c7163a4
0x00007f8d30d85750:   00000005d3422fb0 00007f8d00000032
0x00007f8d30d85760:   00007f8d4c8bd000 00007f8d4c8b8320
0x00007f8d30d85770:   0000000700000002 0000000000000000
0x00007f8d30d85780:   00007f8d4c8bd5b8 00000000000000d8
0x00007f8d30d85790:   0000000000000000 00007f8d3c809570
0x00007f8d30d857a0:   0000000000000000 00000005d37d55b8 
Instructions: (pc=0x00007f8d3c3d0743)
0x00007f8d3c3d0723:   48 83 c2 10 4c 8b 6a d8 41 83 ed 04 41 c1 e5 03
0x00007f8d3c3d0733:   49 03 d5 48 89 55 e0 4c 8d 6c 24 08 4c 89 6d f0
0x00007f8d3c3d0743:   ff 63 50 66 90 00 00 00 00 00 00 00 00 00 00 00
0x00007f8d3c3d0753:   00 00 00 00 00 00 00 00 00 00 00 00 00 c0 02 00 
Register to memory mapping:
RAX=0x00000007c0119cc8 is pointing into metadata
RBX=0x0000000100000018 is an unknown value
RCX=0x00000007196a5d60 is an oop
org.joda.time.LocalDate 
 - klass: 'org/joda/time/LocalDate'
RDX=0x00007f8d307845a0 is pointing into metadata
RSP=0x00007f8d30d855b0 is pointing into the stack for thread: 0x00007f8d4c8bd000
RBP=0x00007f8d30d85620 is pointing into the stack for thread: 0x00007f8d4c8bd000
RSI=0x00000000f8023399 is an unknown value
RDI=0x0000000000003768 is an unknown value
R8 ={method} {0x00007f8d38ef1d90} 'monthOfYear' '()Lorg/joda/time/DateTimeField;' in 'org/joda/time/Chronology'
R9 =0x0000000000000006 is an unknown value
R10=0x00007f8d51f33ae0: <offset 0xf42ae0> in /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/libjvm.so at 0x00007f8d50ff1000
R11=0x00007f8d3c7a8780 is at entry_point+32 in (nmethod*)0x00007f8d3c7a8610
R12=0x0000000000000000 is an unknown value
R13=0x00007f8d30d855b8 is pointing into the stack for thread: 0x00007f8d4c8bd000
R14=0x00007f8d307845a0 is pointing into metadata
R15=0x00007f8d4c8bd000 is a thread
Stack: [0x00007f8d30c86000,0x00007f8d30d87000],  sp=0x00007f8d30d855b0,  free space=1021k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
j  org.joda.time.LocalDate.readResolve()Ljava/lang/Object;+50
j  sun.reflect.GeneratedMethodAccessor17.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;+36
J 1309 C2 org.nustaq.serialization.FSTObjectInput.readObjectFields(Lorg/nustaq/serialization/FSTClazzInfo$FSTFieldInfo;Lorg/nustaq/serialization/FSTClazzInfo;[Lorg/nustaq/serialization/FSTClazzInfo$FSTFieldInfo;Ljava/lang/Object;II)V (538 bytes) @ 0x00007f8d3c8b63b4 [0x00007f8d3c8b5bc0+0x7f4]
J 1313 C2 org.nustaq.serialization.FSTObjectInput.readObjectInternal([Ljava/lang/Class;)Ljava/lang/Object; (56 bytes) @ 0x00007f8d3c809570 [0x00007f8d3c8083a0+0x11d0]
J 1428 C2 fi.hibox.centre.protocol.server.FstTest.readObject(Ljava/io/InputStream;)Ljava/lang/Object; (107 bytes) @ 0x00007f8d3c8aa0fc [0x00007f8d3c8a9960+0x79c]
J 1436 C2 java.util.concurrent.FutureTask.run()V (126 bytes) @ 0x00007f8d3c8b2128 [0x00007f8d3c8b1d80+0x3a8]
J 1445% C2 java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V (225 bytes) @ 0x00007f8d3c90b6b8 [0x00007f8d3c90b420+0x298]
j  java.util.concurrent.ThreadPoolExecutor$Worker.run()V+5
j  java.lang.Thread.run()V+11
v  ~StubRoutines::call_stub
V  [libjvm.so+0x66701b]
V  [libjvm.so+0x6645f4]
V  [libjvm.so+0x664bd7]
V  [libjvm.so+0x6a7eb4]
V  [libjvm.so+0xa18917]
V  [libjvm.so+0xa18ddc]
V  [libjvm.so+0x8be122]
C  [libpthread.so.0+0x76ba]  start_thread+0xca
---------------  P R O C E S S  ---------------
Java Threads: ( => current thread )
  0x00007f8ce8001000 JavaThread "Attach Listener" daemon [_thread_blocked, id=10131, stack(0x00007f8d30682000,0x00007f8d30783000)]
  0x00007f8d4c8c1800 JavaThread "pool-1-thread-4" [_thread_in_Java, id=10124, stack(0x00007f8d30983000,0x00007f8d30a84000)]
  0x00007f8d4c8c0000 JavaThread "pool-1-thread-3" [_thread_in_Java, id=10123, stack(0x00007f8d30a84000,0x00007f8d30b85000)]
  0x00007f8d4c8be800 JavaThread "pool-1-thread-2" [_thread_in_Java, id=10122, stack(0x00007f8d30b85000,0x00007f8d30c86000)]
=>0x00007f8d4c8bd000 JavaThread "pool-1-thread-1" [_thread_in_Java, id=10121, stack(0x00007f8d30c86000,0x00007f8d30d87000)]
  0x00007f8d4c287000 JavaThread "Service Thread" daemon [_thread_blocked, id=10116, stack(0x00007f8d312e1000,0x00007f8d313e2000)]
  0x00007f8d4c282000 JavaThread "C1 CompilerThread3" daemon [_thread_in_native, id=10115, stack(0x00007f8d313e2000,0x00007f8d314e3000)]
  0x00007f8d4c280000 JavaThread "C2 CompilerThread2" daemon [_thread_in_native, id=10114, stack(0x00007f8d314e3000,0x00007f8d315e4000)]
  0x00007f8d4c277800 JavaThread "C2 CompilerThread1" daemon [_thread_in_native, id=10113, stack(0x00007f8d31dd9000,0x00007f8d31eda000)]
  0x00007f8d4c276000 JavaThread "C2 CompilerThread0" daemon [_thread_in_native, id=10112, stack(0x00007f8d31eda000,0x00007f8d31fdb000)]
  0x00007f8d4c279800 JavaThread "Monitor Ctrl-Break" daemon [_thread_in_native, id=10111, stack(0x00007f8d31fdb000,0x00007f8d320dc000)]
  0x00007f8d4c20e000 JavaThread "Signal Dispatcher" daemon [_thread_blocked, id=10110, stack(0x00007f8d320dc000,0x00007f8d321dd000)]
  0x00007f8d4c1db800 JavaThread "Finalizer" daemon [_thread_blocked, id=10109, stack(0x00007f8d325e6000,0x00007f8d326e7000)]
  0x00007f8d4c1d9000 JavaThread "Reference Handler" daemon [_thread_blocked, id=10108, stack(0x00007f8d326e7000,0x00007f8d327e8000)]
  0x00007f8d4c012800 JavaThread "main" [_thread_blocked, id=10098, stack(0x00007f8d52c51000,0x00007f8d52d52000)]
Other Threads:
  0x00007f8d4c1cf800 VMThread [stack: 0x00007f8d327e8000,0x00007f8d328e9000] [id=10107]
  0x00007f8d4c289800 WatcherThread [stack: 0x00007f8d311e0000,0x00007f8d312e1000] [id=10117]
VM state:not at safepoint (normal execution)
VM Mutex/Monitor currently owned by a thread: None
Heap:
 PSYoungGen      total 1524224K, used 131917K [0x0000000718b00000, 0x000000079e600000, 0x00000007c0000000)
  eden space 1373696K, 3% used [0x0000000718b00000,0x000000071bad3788,0x000000076c880000)
  from space 150528K, 55% used [0x000000076c880000,0x0000000771980010,0x0000000775b80000)
  to   space 144896K, 0% used [0x0000000795880000,0x0000000795880000,0x000000079e600000)
 ParOldGen       total 475136K, used 172590K [0x00000005ca000000, 0x00000005e7000000, 0x0000000718b00000)
  object space 475136K, 36% used [0x00000005ca000000,0x00000005d488bb90,0x00000005e7000000)
 Metaspace       used 11344K, capacity 11732K, committed 11904K, reserved 1060864K
  class space    used 1240K, capacity 1344K, committed 1408K, reserved 1048576K
Card table byte_map: [0x00007f8d3b03b000,0x00007f8d3bfec000] byte_map_base: 0x00007f8d381eb000
Marking Bits: (ParMarkBitMap*) 0x00007f8d51f22520
 Begin Bits: [0x00007f8cfc500000, 0x00007f8d04280000)
 End Bits:   [0x00007f8d04280000, 0x00007f8d0c000000)
Polling page: 0x00007f8d52d76000
CodeCache: size=245760Kb used=5616Kb max_used=5617Kb free=240143Kb
 bounds [0x00007f8d3c3ac000, 0x00007f8d3c93c000, 0x00007f8d4b3ac000]
 total_blobs=1823 nmethods=1453 adapters=284
 compilation: enabled
Compilation events (10 events):
Event: 4.522 Thread 0x00007f8d4c282000 nmethod 1462 0x00007f8d3c8e91d0 code [0x00007f8d3c8e9360, 0x00007f8d3c8e96a8]
Event: 5.507 Thread 0x00007f8d4c282000 1463   !   3       org.nustaq.serialization.FSTObjectInput::readObjectCompatibleRecursive (254 bytes)
Event: 5.507 Thread 0x00007f8d4c280000 1464       4       java.util.ArrayDeque::pop (5 bytes)
Event: 5.507 Thread 0x00007f8d4c277800 1466       4       org.nustaq.serialization.FSTClazzInfo::getCompInfo (24 bytes)
Event: 5.508 Thread 0x00007f8d4c277800 nmethod 1466 0x00007f8d3c8e7d10 code [0x00007f8d3c8e7e60, 0x00007f8d3c8e7ef8]
Event: 5.508 Thread 0x00007f8d4c277800 1468       4       java.util.ArrayDeque::removeFirst (19 bytes)
Event: 5.508 Thread 0x00007f8d4c280000 nmethod 1464 0x00007f8d3c8e3910 code [0x00007f8d3c8e3a60, 0x00007f8d3c8e3b38]
Event: 5.508 Thread 0x00007f8d4c277800 nmethod 1468 0x00007f8d3c8e3550 code [0x00007f8d3c8e36a0, 0x00007f8d3c8e3778]
Event: 5.508 Thread 0x00007f8d4c277800 1472       4       org.nustaq.serialization.FSTObjectInput::readByte (10 bytes)
Event: 5.508 Thread 0x00007f8d4c280000 1471       4       org.nustaq.serialization.FSTObjectInput::getObjectInputStream (47 bytes)
GC Heap History (10 events):
Event: 3.440 GC heap before
{Heap before GC invocations=6 (full 1):
 PSYoungGen      total 537088K, used 20962K [0x0000000718b00000, 0x000000076a000000, 0x00000007c0000000)
  eden space 516096K, 0% used [0x0000000718b00000,0x0000000718b00000,0x0000000738300000)
  from space 20992K, 99% used [0x0000000738300000,0x00000007397788f8,0x0000000739780000)
  to   space 150016K, 0% used [0x0000000760d80000,0x0000000760d80000,0x000000076a000000)
 ParOldGen       total 371712K, used 369231K [0x00000005ca000000, 0x00000005e0b00000, 0x0000000718b00000)
  object space 371712K, 99% used [0x00000005ca000000,0x00000005e0893d50,0x00000005e0b00000)
 Metaspace       used 11364K, capacity 11780K, committed 11904K, reserved 1060864K
  class space    used 1244K, capacity 1352K, committed 1408K, reserved 1048576K
Event: 3.736 GC heap after
Heap after GC invocations=6 (full 1):
 PSYoungGen      total 537088K, used 0K [0x0000000718b00000, 0x000000076a000000, 0x00000007c0000000)
  eden space 516096K, 0% used [0x0000000718b00000,0x0000000718b00000,0x0000000738300000)
  from space 20992K, 0% used [0x0000000738300000,0x0000000738300000,0x0000000739780000)
  to   space 150016K, 0% used [0x0000000760d80000,0x0000000760d80000,0x000000076a000000)
 ParOldGen       total 475136K, used 172574K [0x00000005ca000000, 0x00000005e7000000, 0x0000000718b00000)
  object space 475136K, 36% used [0x00000005ca000000,0x00000005d4887b90,0x00000005e7000000)
 Metaspace       used 11343K, capacity 11732K, committed 11904K, reserved ...

@perlun
Copy link

perlun commented Jan 24, 2019

The error is not consistent though, sometimes it fails with more "normal" JVM errors like this:

    java.lang.AssertionError: error
        at org.testng.Assert.fail(Assert.java:82)
        at FSTSerializationTest.tryAndReProduceBug(FSTSerializationTest.java:53)

        Caused by:
        java.util.concurrent.ExecutionException: java.lang.RuntimeException: java.io.IOException: java.io.IOException: java.lang.reflect.InvocationTargetException
            at java.util.concurrent.FutureTask.report(FutureTask.java:122)
            at java.util.concurrent.FutureTask.get(FutureTask.java:192)
            at FSTSerializationTest.tryAndReProduceBug(FSTSerializationTest.java:51)

            Caused by:
            java.lang.RuntimeException: java.io.IOException: java.io.IOException: java.lang.reflect.InvocationTargetException
                at FSTSerializationTest.lambda$tryAndReProduceBug$0(FSTSerializationTest.java:42)

                Caused by:
                java.io.IOException: java.io.IOException: java.lang.reflect.InvocationTargetException
                    at FSTSerializationTest.readObject(FSTSerializationTest.java:62)
                    at FSTSerializationTest.lambda$tryAndReProduceBug$0(FSTSerializationTest.java:39)

                    Caused by:
                    java.io.IOException: java.lang.reflect.InvocationTargetException
                        at org.nustaq.serialization.FSTObjectInput.readObject(FSTObjectInput.java:247)
                        at FSTSerializationTest.readObject(FSTSerializationTest.java:60)
                        ... 1 more

                        Caused by:
                        java.lang.reflect.InvocationTargetException
                            at sun.reflect.GeneratedMethodAccessor18.invoke(Unknown Source)
                            at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
                            at java.lang.reflect.Method.invoke(Method.java:498)
                            at org.nustaq.serialization.FSTObjectInput.handleReadRessolve(FSTObjectInput.java:585)
                            at org.nustaq.serialization.FSTObjectInput.readObjectCompatible(FSTObjectInput.java:577)
                            at org.nustaq.serialization.FSTObjectInput.instantiateAndReadNoSer(FSTObjectInput.java:559)
                            at org.nustaq.serialization.FSTObjectInput.readObjectWithHeader(FSTObjectInput.java:374)
                            at org.nustaq.serialization.FSTObjectInput.readObjectFields(FSTObjectInput.java:713)
                            at org.nustaq.serialization.FSTObjectInput.readObjectCompatibleRecursive(FSTObjectInput.java:635)
                            at org.nustaq.serialization.FSTObjectInput.readObjectCompatible(FSTObjectInput.java:574)
                            at org.nustaq.serialization.FSTObjectInput.instantiateAndReadNoSer(FSTObjectInput.java:559)
                            at org.nustaq.serialization.FSTObjectInput.readObjectWithHeader(FSTObjectInput.java:374)
                            at org.nustaq.serialization.FSTObjectInput.readObjectFields(FSTObjectInput.java:713)
                            at org.nustaq.serialization.FSTObjectInput.instantiateAndReadNoSer(FSTObjectInput.java:566)
                            at org.nustaq.serialization.FSTObjectInput.readObjectWithHeader(FSTObjectInput.java:374)
                            at org.nustaq.serialization.FSTObjectInput.readObjectInternal(FSTObjectInput.java:331)
                            at org.nustaq.serialization.FSTObjectInput.readObject(FSTObjectInput.java:311)
                            at org.nustaq.serialization.FSTObjectInput.readObject(FSTObjectInput.java:245)
                            ... 2 more

                            Caused by:
                            java.lang.IllegalArgumentException: The datetime zone id '' is not recognised
                                at org.joda.time.DateTimeZone.forID(DateTimeZone.java:229)
                                at org.joda.time.DateTimeZone$Stub.readResolve(DateTimeZone.java:1314)
                                ... 20 more

@andrewl102
Copy link

Well perhaps we should try to convince him to accept #254 or better yet provide a more performant fix :P

@perlun
Copy link

perlun commented Jan 24, 2019

@andrewl102 There was a clear plan suggested there so I tried bumping that PR a bit now.

@theigl
Copy link

theigl commented Jan 24, 2019

We've been hit by this issue with recent FST versions as well. As a consequence, we reverted back to version 2.10. It has been running in our production environment for years and is the last version that has never crashed for our system.

If you do not need any features introduced after that version I suggest downgrading until this issue is resolved.

@perlun
Copy link

perlun commented Jan 24, 2019

@theigl Thanks. The problem for us is that 2.10 is such an old version (2014) that I'm not sure it would work for our use case.

@theigl
Copy link

theigl commented Jan 24, 2019

@perlun: It is old, but there have not been too many changes to the serialization code since then. Just look at the commit diagram or the commit history. I'm usually not advocating 4 year old libraries, but as I said 2.10 has been running in our production environment for years and it works.

Having said that, I'd really love to see this issue resolved ;)

@perlun
Copy link

perlun commented Feb 1, 2019

@andrewl102 FWIW: I just tested your suggested workaround and it still runs into "unknown object tag -126" and similar errors sometimes, so it's unfortunately not a 100% safe fix. This is with Java 11.0.2.

Java 8 surprisingly does not exhibit this error at all. I tried running the test 10 times on Java 8 without failure. On Java 11, it failed 3/10 times so 30% failures. Even increasing the number of threads on Java 8 couldn't make it fail, but the crashes described below are indeed with Java 8 so... I think that there is still some issue, it's just extremely intermittent. We are talking about "once every 1-2 hours if stress tested to the max constantly", i.e. flooding the server with requests using Gatling. So, it's a very annoying bug/race condition we are dealing with here.

I will try a more long-running job with Java 8 now to see if I can provoke it to fail after, say 100 million iterations. Will keep you posted.


This could explain some of the errors we've been seeing in one of our customer's test environments lately. Random crashes, most often with a SIGSEGV in the Java process.

We've debugged this at length in gdb, attaching to the Java process and it's various assertions like these that are failing (note: you need a debug build of the JVM to see these, like the one provided by @shipilev at https://builds.shipilev.net/openjdk-jdk8-redhat/)

#  Internal Error (/home/buildbot/worker/jdk8u-redhat-linux/build/hotspot/src/share/vm/interpreter/linkResolver.cpp:100), pid=13231, tid=0x00007f94f1929700
#  assert(resolved_method->signature() == selected_method->signature()) failed: signatures must correspond 

#  Internal Error
(/home/buildbot/worker/jdk8u-dev-linux/build/hotspot/src/share/vm/oops/objArrayKlass.hpp:95),
pid=12388, tid=0x00007f35eabf3700
#  assert(k->oop_is_objArray()) failed: cast to ObjArrayKlass

#  Internal Error (/home/buildbot/worker/jdk8u-redhat-linux/build/hotspot/src/share/vm/runtime/handles.cpp:50), pid=29358, tid=0x00007f62e313b700
#  assert(obj->is_oop()) failed: not an oop: 0x0000000000000048

@theigl Thanks for that. I tested version 2.10 and here are my conclusions:

  • On Java 8: no crashes during 10 test runs.
  • On Java 11: crashed with similar (but not identical) errors 2/10 times, i.e. 20% of the invocations.

I will stress test Java 8 with 2.10 and 2.57 in more long-running CI runs now, to see if 2.10 is stable even after 100 or 1000 million iterations. (Yes, we need this level of stability for our use case.)

@perlun
Copy link

perlun commented Feb 4, 2019

I opened a separate issue and added a repro repository, moving the discussion to #274 since the workaround suggested in this issue does not work => separate root cause.

@chrisco484
Copy link

* On Java 8: no crashes during 10 test runs.

* On Java 11: crashed with similar (but not identical) errors 2/10 times, i.e. 20% of the invocations.

Hi @perlun

Similar concurrency issues are discussed in other issues and it seems to be related to turning on G1GC garbage collection strategy that was introduced with Java 8 - but not turned on by default.

I'm guessing the difference between your Java 8 and Java 11 experience is that in Java 11 G1GC is now on by default.

I suggest that if you explicitly turn on G1GC in Java 8 you might be able to see the problem there also - which is the experience on some of the other issue reports.

@RuedigerMoeller
Copy link
Owner

RuedigerMoeller commented Dec 16, 2019

Hi guys,
unfortunately i am super busy and also cash constrained so i can't find time for free work. Once we move away from java 8 with our internal infrastructure, this will get fixed for sure.

Actually its necessary to turn of use of Unsafe in order to get better diagnostics. This should be possible by setting the Unsafe singleton to null afair.

@theRealAph
Copy link

Should be fixed by #311 . Please check.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

7 participants