Skip to content

Somewhat confusing bugs during live indexing #302

@ashprice

Description

@ashprice
v-2025-03-24.0-dirty

This reports dirty because of only three changes:

  1. I modified the default cron instead of the one under local.
  2. I tried to set SOLR_ULIMIT_CHECKS to 65000 in docker-compose.yml during troubleshooting this issue.
  3. MUSICBRAINZ_SERVER_PROCESSES=${MUSICBRAINZ_SERVER_PROCESSES:-24} I don't know if this actually improves performance in any way.

Docker compose version: 2.34.0
Docker Client/Server: 28.0.4/28.0.4

Issue

During live indexing/replication I receive the following errors:

Died at ./admin/replication/LoadReplicationChanges line 228.
Sat Mar 29 15:30:01 2025 : Downloading https://metabrainz.org/api/musicbrainz/replication-175725-v2.tar.bz2 to /tmp/replication-175725-v2.tar.bz2
500 Can't connect to metabrainz.org:443 (Temporary failure in name resolution)
Content-Type: text/plain
Client-Date: Sat, 29 Mar 2025 15:30:01 GMT
Client-Warning: Internal response

Can't connect to metabrainz.org:443 (Temporary failure in name resolution)

Temporary failure in name resolution at /musicbrainz-server/local/lib/perl5/LWP/Protocol/http.pm line 49.

These issues started on Wednesday. Here is what I have done to troubleshoot:

  1. Checked I can poll most recent packages and download them from the server with my API key using curl.
  2. Starting and stopping services, recreating them.
  3. Full database and search index rebuild. This worked! Replication proceeded as expected. Until I restarted my machine - then the issue has started to recur. The replication packet mentioned (175725) is completely different!

Indexer refuses to run:

indexer-1      | 2025-03-29 15:40:26,977: HTTP Error 500: Server Error
indexer-1      | Traceback (most recent call last):
indexer-1      |   File "/usr/local/lib/python2.7/site-packages/retrying.py", line 200, in call
indexer-1      |     attempt = Attempt(fn(*args, **kwargs), attempt_number, False)
indexer-1      |   File "sir/amqp/handler.py", line 444, in _watch_impl
indexer-1      |     handler = Handler(entities)
indexer-1      |   File "sir/amqp/handler.py", line 148, in __init__
indexer-1      |     self.cores[core_name] = solr_connection(core_name)
indexer-1      |   File "sir/util.py", line 88, in solr_connection
indexer-1      |     urllib2.urlopen(ping_uri)
indexer-1      |   File "/usr/local/lib/python2.7/urllib2.py", line 154, in urlopen
indexer-1      |     return opener.open(url, data, timeout)
indexer-1      |   File "/usr/local/lib/python2.7/urllib2.py", line 435, in open
indexer-1      |     response = meth(req, response)
indexer-1      |   File "/usr/local/lib/python2.7/urllib2.py", line 548, in http_response
indexer-1      |     'http', request, response, code, msg, hdrs)
indexer-1      |   File "/usr/local/lib/python2.7/urllib2.py", line 473, in error
indexer-1      |     return self._call_chain(*args)
indexer-1      |   File "/usr/local/lib/python2.7/urllib2.py", line 407, in _call_chain
indexer-1      |     result = func(*args)
indexer-1      |   File "/usr/local/lib/python2.7/urllib2.py", line 556, in http_error_default
indexer-1      |     raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
indexer-1      | HTTPError: HTTP Error 500: Server Error
indexer-1      | 2025-03-29 15:40:26,978: Connecting to Solr failed: HTTP Error 500: Server Error
indexer-1      | 2025-03-29 16:09:11,449: Retrying...
indexer-1      | 2025-03-29 16:09:11,450: HTTP Error 500: Server Error
indexer-1      | Traceback (most recent call last):
indexer-1      |   File "/usr/local/lib/python2.7/site-packages/retrying.py", line 200, in call
indexer-1      |     attempt = Attempt(fn(*args, **kwargs), attempt_number, False)
indexer-1      |   File "sir/amqp/handler.py", line 444, in _watch_impl
indexer-1      |     handler = Handler(entities)
indexer-1      |   File "sir/amqp/handler.py", line 148, in __init__
indexer-1      |     self.cores[core_name] = solr_connection(core_name)
indexer-1      |   File "sir/util.py", line 88, in solr_connection
indexer-1      |     urllib2.urlopen(ping_uri)
indexer-1      |   File "/usr/local/lib/python2.7/urllib2.py", line 154, in urlopen
indexer-1      |     return opener.open(url, data, timeout)
indexer-1      |   File "/usr/local/lib/python2.7/urllib2.py", line 435, in open
indexer-1      |     response = meth(req, response)
indexer-1      |   File "/usr/local/lib/python2.7/urllib2.py", line 548, in http_response
indexer-1      |     'http', request, response, code, msg, hdrs)
indexer-1      |   File "/usr/local/lib/python2.7/urllib2.py", line 473, in error
indexer-1      |     return self._call_chain(*args)
indexer-1      |   File "/usr/local/lib/python2.7/urllib2.py", line 407, in _call_chain
indexer-1      |     result = func(*args)
indexer-1      |   File "/usr/local/lib/python2.7/urllib2.py", line 556, in http_error_default
indexer-1      |     raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
indexer-1      | HTTPError: HTTP Error 500: Server Error
indexer-1      | 2025-03-29 16:09:11,451: Connecting to Solr failed: HTTP Error 500: Server Error

Search is running into what looks to me like read/write issues:

search-1  | 2025-03-29 16:09:11.449 ERROR (qtp802600647-56) [   ] o.a.s.s.HttpSolrCall null:org.apache.solr.core.SolrCoreInitializationException: SolrCore 'recording' is not available due to init failure: Error opening new searcher
search-1  | 	at org.apache.solr.core.CoreContainer.getCore(CoreContainer.java:1653)
search-1  | 	at org.apache.solr.servlet.HttpSolrCall.init(HttpSolrCall.java:249)
search-1  | 	at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:469)
search-1  | 	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:395)
search-1  | 	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:341)
search-1  | 	at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1602)
search-1  | 	at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:540)
search-1  | 	at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:146)
search-1  | 	at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
search-1  | 	at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
search-1  | 	at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257)
search-1  | 	at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1588)
search-1  | 	at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)
search-1  | 	at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1345)
search-1  | 	at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)
search-1  | 	at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:480)
search-1  | 	at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1557)
search-1  | 	at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)
search-1  | 	at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1247)
search-1  | 	at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)
search-1  | 	at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:220)
search-1  | 	at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:126)
search-1  | 	at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
search-1  | 	at org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)
search-1  | 	at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
search-1  | 	at org.eclipse.jetty.server.Server.handle(Server.java:502)
search-1  | 	at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:364)
search-1  | 	at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:260)
search-1  | 	at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:305)
search-1  | 	at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)
search-1  | 	at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:118)
search-1  | 	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:333)
search-1  | 	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:310)
search-1  | 	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:168)
search-1  | 	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:126)
search-1  | 	at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:366)
search-1  | 	at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:765)
search-1  | 	at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:683)
search-1  | 	at java.lang.Thread.run(Thread.java:748)
search-1  | Caused by: org.apache.solr.common.SolrException: Error opening new searcher
search-1  | 	at org.apache.solr.core.SolrCore.<init>(SolrCore.java:1048)
search-1  | 	at org.apache.solr.core.SolrCore.<init>(SolrCore.java:874)
search-1  | 	at org.apache.solr.core.CoreContainer.createFromDescriptor(CoreContainer.java:1187)
search-1  | 	at org.apache.solr.core.CoreContainer.getCore(CoreContainer.java:1673)
search-1  | 	... 38 more
search-1  | Caused by: org.apache.solr.common.SolrException: Error opening new searcher
search-1  | 	at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:2137)
search-1  | 	at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:2257)
search-1  | 	at org.apache.solr.core.SolrCore.initSearcher(SolrCore.java:1106)
search-1  | 	at org.apache.solr.core.SolrCore.<init>(SolrCore.java:993)
search-1  | 	... 41 more
search-1  | Caused by: org.apache.lucene.index.CorruptIndexException: Unexpected file read error while reading index. (resource=BufferedChecksumIndexInput(MMapIndexInput(path="/opt/solr/server/solr/data/recording/index/segments_lh4x")))
search-1  | 	at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:293)
search-1  | 	at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:845)
search-1  | 	at org.apache.solr.update.SolrIndexWriter.<init>(SolrIndexWriter.java:124)
search-1  | 	at org.apache.solr.update.SolrIndexWriter.create(SolrIndexWriter.java:97)
search-1  | 	at org.apache.solr.update.DefaultSolrCoreState.createMainIndexWriter(DefaultSolrCoreState.java:257)
search-1  | 	at org.apache.solr.update.DefaultSolrCoreState.getIndexWriter(DefaultSolrCoreState.java:131)
search-1  | 	at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:2098)
search-1  | 	... 44 more
search-1  | Caused by: java.nio.file.NoSuchFileException: /opt/solr/server/solr/data/recording/index/_22auo.si
search-1  | 	at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
search-1  | 	at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
search-1  | 	at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
search-1  | 	at sun.nio.fs.UnixFileSystemProvider.newFileChannel(UnixFileSystemProvider.java:177)
search-1  | 	at java.nio.channels.FileChannel.open(FileChannel.java:287)
search-1  | 	at java.nio.channels.FileChannel.open(FileChannel.java:335)
search-1  | 	at org.apache.lucene.store.MMapDirectory.openInput(MMapDirectory.java:238)
search-1  | 	at org.apache.lucene.store.Directory.openChecksumInput(Directory.java:157)
search-1  | 	at org.apache.lucene.codecs.lucene70.Lucene70SegmentInfoFormat.read(Lucene70SegmentInfoFormat.java:91)
search-1  | 	at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:361)
search-1  | 	at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:291)

and

search-1  | 2025-03-29 15:40:25.723 INFO  (coreLoadExecutor-9-thread-1) [   ] o.a.s.u.UpdateHandler Using UpdateLog implementation: org.apache.solr.update.UpdateLog
search-1  | 2025-03-29 15:40:25.724 INFO  (coreLoadExecutor-9-thread-1) [   ] o.a.s.u.UpdateLog Initializing UpdateLog: dataDir=/opt/solr/server/solr/data/recording defaultSyncLevel=FLUSH numRecordsToKeep=100 maxNumLogsToKeep=10 numVersionBuckets=65536
search-1  | 2025-03-29 15:40:25.724 INFO  (coreLoadExecutor-9-thread-1) [   ] o.a.s.u.CommitTracker Hard AutoCommit: if uncommitted for 60000ms; 
search-1  | 2025-03-29 15:40:25.724 INFO  (coreLoadExecutor-9-thread-1) [   ] o.a.s.u.CommitTracker Soft AutoCommit: if uncommitted for 10000ms; 
search-1  | 2025-03-29 15:40:25.725 INFO  (coreLoadExecutor-9-thread-1) [   ] o.a.s.c.SolrCore [recording]  CLOSING SolrCore org.apache.solr.core.SolrCore@25309e2d
search-1  | 2025-03-29 15:40:25.725 INFO  (coreLoadExecutor-9-thread-1) [   ] o.a.s.m.SolrMetricManager Closing metric reporters for registry=solr.core.recording, tag=25309e2d
search-1  | 2025-03-29 15:40:25.726 INFO  (coreLoadExecutor-9-thread-1) [   ] o.a.s.m.r.SolrJmxReporter Closing reporter [org.apache.solr.metrics.reporters.SolrJmxReporter@68279487: rootName = null, domain = solr.core.recording, service url = null, agent id = null] for registry solr.core.recording / com.codahale.metrics.MetricRegistry@67f02055
search-1  | 2025-03-29 15:40:25.736 ERROR (coreContainerWorkExecutor-2-thread-1) [   ] o.a.s.c.CoreContainer Error waiting for SolrCore to be loaded on startup
search-1  | org.apache.solr.common.SolrException: Unable to create core [recording]

This is the first output of an error I see after a session is started:

search-1  | 2025-03-29 16:20:49.889 INFO  (coreLoadExecutor-9-thread-2) [   ] o.a.s.u.UpdateLog Initializing UpdateLog: dataDir=/opt/solr/server/solr/data/recording defaultSyncLevel=FLUSH numRecordsToKeep=100 maxNumLogsToKeep=10 numVersionBuckets=65536
search-1  | 2025-03-29 16:20:49.890 INFO  (coreLoadExecutor-9-thread-2) [   ] o.a.s.u.CommitTracker Hard AutoCommit: if uncommitted for 60000ms; 
search-1  | 2025-03-29 16:20:49.890 INFO  (coreLoadExecutor-9-thread-2) [   ] o.a.s.u.CommitTracker Soft AutoCommit: if uncommitted for 10000ms; 
search-1  | 2025-03-29 16:20:49.891 INFO  (coreLoadExecutor-9-thread-2) [   ] o.a.s.c.SolrCore [recording]  CLOSING SolrCore org.apache.solr.core.SolrCore@3c5c8d1f
search-1  | 2025-03-29 16:20:49.892 INFO  (coreLoadExecutor-9-thread-2) [   ] o.a.s.m.SolrMetricManager Closing metric reporters for registry=solr.core.recording, tag=3c5c8d1f
search-1  | 2025-03-29 16:20:49.892 INFO  (coreLoadExecutor-9-thread-2) [   ] o.a.s.m.r.SolrJmxReporter Closing reporter [org.apache.solr.metrics.reporters.SolrJmxReporter@3a7ec5d0: rootName = null, domain = solr.core.recording, service url = null, agent id = null] for registry solr.core.recording / com.codahale.metrics.MetricRegistry@12345f93
search-1  | 2025-03-29 16:20:49.900 ERROR (coreContainerWorkExecutor-2-thread-1) [   ] o.a.s.c.CoreContainer Error waiting for SolrCore to be loaded on startup
search-1  | org.apache.solr.common.SolrException: Unable to create core [recording]
search-1  | 	at org.apache.solr.core.CoreContainer.createFromDescriptor(CoreContainer.java:1208) ~[solr-core-7.7.2.jar:7.7.2 d4c30fc2856154f2c1fefc589eb7cd070a415b94 - janhoy - 2019-05-28 23:37:48]
search-1  | 	at org.apache.solr.core.CoreContainer.lambda$load$13(CoreContainer.java:699) ~[solr-core-7.7.2.jar:7.7.2 d4c30fc2856154f2c1fefc589eb7cd070a415b94 - janhoy - 2019-05-28 23:37:48]
search-1  | 	at com.codahale.metrics.InstrumentedExecutorService$InstrumentedCallable.call(InstrumentedExecutorService.java:197) ~[metrics-core-3.2.6.jar:3.2.6]
search-1  | 	at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_212]
search-1  | 	at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:209) ~[solr-solrj-7.7.2.jar:7.7.2 d4c30fc2856154f2c1fefc589eb7cd070a415b94 - janhoy - 2019-05-28 23:37:52]
search-1  | 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_212]
search-1  | 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_212]
search-1  | 	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_212]
search-1  | Caused by: org.apache.solr.common.SolrException: Error opening new searcher
search-1  | 	at org.apache.solr.core.SolrCore.<init>(SolrCore.java:1048) ~[solr-core-7.7.2.jar:7.7.2 d4c30fc2856154f2c1fefc589eb7cd070a415b94 - janhoy - 2019-05-28 23:37:48]
search-1  | 	at org.apache.solr.core.SolrCore.<init>(SolrCore.java:874) ~[solr-core-7.7.2.jar:7.7.2 d4c30fc2856154f2c1fefc589eb7cd070a415b94 - janhoy - 2019-05-28 23:37:48]
search-1  | 	at org.apache.solr.core.CoreContainer.createFromDescriptor(CoreContainer.java:1187) ~[solr-core-7.7.2.jar:7.7.2 d4c30fc2856154f2c1fefc589eb7cd070a415b94 - janhoy - 2019-05-28 23:37:48]
search-1  | 	... 7 more
search-1  | Caused by: org.apache.solr.common.SolrException: Error opening new searcher
search-1  | 	at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:2137) ~[solr-core-7.7.2.jar:7.7.2 d4c30fc2856154f2c1fefc589eb7cd070a415b94 - janhoy - 2019-05-28 23:37:48]
search-1  | 	at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:2257) ~[solr-core-7.7.2.jar:7.7.2 d4c30fc2856154f2c1fefc589eb7cd070a415b94 - janhoy - 2019-05-28 23:37:48]
search-1  | 	at org.apache.solr.core.SolrCore.initSearcher(SolrCore.java:1106) ~[solr-core-7.7.2.jar:7.7.2 d4c30fc2856154f2c1fefc589eb7cd070a415b94 - janhoy - 2019-05-28 23:37:48]
search-1  | 	at org.apache.solr.core.SolrCore.<init>(SolrCore.java:993) ~[solr-core-7.7.2.jar:7.7.2 d4c30fc2856154f2c1fefc589eb7cd070a415b94 - janhoy - 2019-05-28 23:37:48]
search-1  | 	at org.apache.solr.core.SolrCore.<init>(SolrCore.java:874) ~[solr-core-7.7.2.jar:7.7.2 d4c30fc2856154f2c1fefc589eb7cd070a415b94 - janhoy - 2019-05-28 23:37:48]
search-1  | 	at org.apache.solr.core.CoreContainer.createFromDescriptor(CoreContainer.java:1187) ~[solr-core-7.7.2.jar:7.7.2 d4c30fc2856154f2c1fefc589eb7cd070a415b94 - janhoy - 2019-05-28 23:37:48]
search-1  | 	... 7 more
search-1  | Caused by: org.apache.lucene.index.CorruptIndexException: Unexpected file read error while reading index. (resource=BufferedChecksumIndexInput(MMapIndexInput(path="/opt/solr/server/solr/data/recording/index/segments_lh4x")))

It does complain about SOLR_ULIMIT_CHECKS though, presumably I have declared it in the wrong place.

Any guidance on further troubleshooting would be much appreciated!

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions