Releases: lockss/lockss-daemon
LOCKSS Daemon 1.74.7
Features
- The PDF filtering code has been hardened to withstand processing uncharacteristic PDF files with excessively large in-memory representations, without filling up the heap and without requiring changes to existing plugins.
Bugs
- The proxy failed to normalize URLs in requests that include an AUID.
- Cancelling hashes started from DebugPanel or HasherService frequently did not work, and sometimes crashed the daemon.
- Aborting crawls using crawlPriorityAuMap did not work.
LOCKSS Daemon 1.74.3
Bug Fixes
- Upgraded third-party libraries to address security vulnerabilities reported against them. Updated versions include Apache PDFBox 1.8.16 (CVE-2018-11797), Apache Commons Compress 1.18 (CVE-2018-11771) and FasterXML Jackson 2.9.7 (CVE-2018-7489).
- Some of the ways ServeContent can be invoked failed in some cases on AUs having multiple crawl-start URLs, when some of the start URLs do not exist.
LOCKSS Daemon 1.74.2
Features
- The new metadata type "File" supports indexing of arbitrary publication types. Support is in place for both publication level items (MetadataField.PUBLICATION_TYPE_FILE) and article level items (MetadataField.ARTICLE_TYPE_FILE). Article level file items will be assumed to have a publication level file parent even if not explicitly defined. Item metadata beyond the standard access URL, publisher, and provider may be stored as arbitrary key-value pairs in a MetadataField.FIELD_MD_MAP.
- Content Configuration web service now adds AUs from their TDB definition rather than by AUID, matching the way other subsystems add AUs: Including non-definitional parameters, and choosing the least full repository.
- Deep crawl status information (lastDeepCrawl, lastDeepCrawlResult, lastCompletedDeepCrawl, lastCompletedDeepCrawlDepth) is tracked and reported in the UI, and through the getAuStatus() and queryAus() Web services.
- Debug Panel and AU Status now include a "Validate Files" action which runs the plugin's ContentValidator on all files in the AU, reporting any ValidationFailures thrown.
- In lieu of a MIME-type content validator factory, plugins may specify an au_url_mime_validation_map. ValidationFailures will occur for URLs that match one of the patterns but whose Content-Type does not match the corresponding MIME-type. E.g.,
<entry>
<string>au_url_mime_validation_map</string>
<list>
<string>/doi/pdf(plus)?/, application/pdf</string>
<string>/doi/(abs|full)/, text/html</string>
</list>
</entry>
- ContentValidators may throw ContentValidationException.LogOnly to record a warning message without causing validation failure.
- The "Files" list from AU Status now includes a PollWeight column.
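The matching rule behind au_url_mime_validation_map can be illustrated with a minimal sketch. It is written in Python for brevity and is not the daemon's implementation: the function names are hypothetical, and it assumes that patterns are searched anywhere in the URL, that the first matching pattern decides, and that Content-Type parameters such as a charset are ignored.

```python
import re

# Entries copied from the au_url_mime_validation_map example above:
# "<URL regex>, <expected MIME type>".
VALIDATION_MAP = [
    "/doi/pdf(plus)?/, application/pdf",
    "/doi/(abs|full)/, text/html",
]


def parse_map(entries):
    """Split each "pattern, mime" entry into (compiled regex, MIME type)."""
    rules = []
    for entry in entries:
        # Assumption: the last comma separates the pattern from the MIME type.
        pattern, mime = entry.rsplit(",", 1)
        rules.append((re.compile(pattern.strip()), mime.strip().lower()))
    return rules


def validation_failure(url, content_type, rules):
    """Return a failure message, or None if the URL passes.

    Assumption: the first matching pattern decides, and Content-Type
    parameters such as "; charset=UTF-8" are ignored.
    """
    actual = content_type.split(";", 1)[0].strip().lower()
    for regex, expected in rules:
        if regex.search(url):
            if actual != expected:
                return f"{url}: expected {expected}, got {actual}"
            return None
    return None  # URLs matching no pattern are not checked
```

Under these assumptions, a URL containing /doi/pdfplus/ that is served as text/html would be reported, while a URL that matches neither pattern is never checked.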
Bug Fixes
- SubscriptionManager omitted non-definitional parameters when adding subscribed AUs.
- The Link Rewriter rewrote in-page links ("#ref"), breaking them.
- Metadata item type inference reversed BOOKCHAPTER and BOOKVOLUME in some circumstances.
- In the queryAus() web service, selecting the newContentCrawlUrls field caused a fatal error.
- The LastMetadataIndex field in the getAuStatus() and queryAus() web services was not accessible using daemonstatusservice.py.
- Fixed unsafe database resource closings and incorrect comparisons in metadata-handling code.
- Fixed active task removal when metadata indexing for an AU is disabled.
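The in-page link fix listed above can be pictured with a small Python sketch. This is an illustration only, not the Link Rewriter's code: SERVE_PREFIX and rewrite_link are hypothetical names, and the host, port, and query parameter in the prefix are assumptions.

```python
from urllib.parse import quote, urljoin

# Hypothetical ServeContent endpoint; the real URL depends on the installation.
SERVE_PREFIX = "http://lockss.example.org:8082/ServeContent?url="


def rewrite_link(href, base_url):
    """Rewrite one href so that it is served through SERVE_PREFIX."""
    if href.startswith("#"):
        # In-page link: it points within the current page, so leave it alone.
        return href
    # Resolve relative links against the page's URL, then wrap the result.
    absolute = urljoin(base_url, href)
    return SERVE_PREFIX + quote(absolute, safe="")
```

A fragment-only link such as "#ref" is returned unchanged, while a relative link is resolved against the page URL before being wrapped.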
LOCKSS Daemon 1.73.4
Features
- Allow the content configuration Web Service to use the same storage volume selection logic as the UI when adding AUs.
Bug Fixes
- Bug fixes in ServeContent link rewriting and OpenURL resolver.
- Properly trigger configuration of AUs after synchronizing whole title subscriptions.
LOCKSS Daemon 1.73.3
Features
- The ViewContent screen now offers an option in the upper pane to run a link extractor on the content displayed in the lower pane.
Bug Fixes
- Fixed a bug in the title subscription management screen's tabbed interface, which
under some circumstances could cause the loss of title subscription data previously
entered in other tabs.
LOCKSS Daemon 1.73.2
Features
- Plugins may compute the starting URL(s) that should be used to browse an AU's content. If plugin_access_url_factory is set to the name of a FeatureUrlHelperFactory, then the FeatureUrlHelper's getAccessUrls() method will be invoked and the resulting list will be used in place of the AU's start URLs in contexts where the user is presented with starting points to browse an AU. (E.g., manifest index pages in ServeContent and the proxy.) See FeatureUrlHelper.
- Plugins may also compute feature URLs. If a value in the au_feature_urls map is the name of a FeatureUrlHelperFactory, then the FeatureUrlHelper's getFeatureUrls() method will be invoked instead of expanding a printf template. See FeatureUrlHelper.
Plugins that synthesize manifest pages (e.g., for bulk ingest content) should generally set both plugin_access_url_factory and the au_volume feature to a FeatureUrlHelperFactory.
- The AU status page now has two ServeContent links: "Serve Content" and "Serve AU". "Serve Content" does what "Serve AU" has historically done: it feeds the bibliographic information for that AU (usually issn&year or isbn&year) to the OpenURL resolver to find AUs that contain that content. In the case of multiple publishers or providers there may be more than one AU. But the results can be misleading or unintuitive in cases where the bibliographic information in the title db is incomplete. "Serve AU" now serves that specific AU and no longer depends on OpenURL resolution.
- The AU XPath expressions in org.lockss.crawler.crawlPriorityAuMap and org.lockss.poll.pollWeightAuMap can now refer to the variable $myhost, which is set to the value of org.lockss.platform.fqdn. This allows crawl and poll priorities to be set differently on different boxes.
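The access-URL fallback described in the first item of this list can be sketched as follows. The sketch is in Python and only models the decision; the real extension points are the Java FeatureUrlHelperFactory and FeatureUrlHelper. ManifestHelper, access_urls, and the dictionary layout of the plugin and AU are all hypothetical, and the Python class stands in for a factory that would be named by class name in a real plugin.

```python
class ManifestHelper:
    """Hypothetical stand-in for a plugin-supplied FeatureUrlHelper."""

    def get_access_urls(self, au):
        # Synthesize a manifest URL instead of using the AU's start URLs.
        return [au["base_url"] + "manifest/" + au["year"] + "/"]


def access_urls(plugin, au):
    """Pick the URLs offered to a user as starting points for browsing an AU."""
    factory = plugin.get("plugin_access_url_factory")
    if factory is None:
        # No factory configured: fall back to the AU's start URLs.
        return list(au["start_urls"])
    # Factory configured: its helper's list replaces the start URLs.
    return list(factory().get_access_urls(au))
```

With no factory configured the AU's start URLs are returned unchanged; with one configured, only the helper's list is used.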
Bug Fixes
- OpenUrlResolver's results were influenced by the availability of pages at the publisher site, even when org.lockss.serveContent.neverProxy was true.
- Link rewriting in ServeContent used the wrong base URL when serving redirected pages.
- Deactivating AUs or reloading plugins could cause rapid, excessive logging.
- A race condition when deactivating AUs could cause CrawlManager to exit and the daemon to restart.
- Files with erroneous Content-Encoding compression headers caused errors in login page checkers.
- URLs with path components longer than 255 chars and containing an encoded slash weren't decoded properly, resulting in their children not being seen by URL iterators.
- Changes to the name of an AU in the title DB, with no other changes, weren't reflected in status displays, etc., until daemon restart.
LOCKSS Daemon 1.72.3
This release adds several features to deal with more "non-standard" behavior by publisher sites (missing Content-Type, incorrect Content-Length, etc.), better reporting of poll results, and better control over crawl and poll priority.
For additional details, please see the full release notes.
LOCKSS Daemon 1.71.2
LOCKSS daemon 1.71.2 release candidate, packaged as a signed RPM.