- bump `django` to 5.2.8
- fix: cap length for elasticsearch document id
- bump dependencies
- `celery` to 5.5.3
- `kombu` to 5.5.4
- improve error handling in celery task-result backend
- use logging config in celery worker
- improve code docs (README.md et al.)
- add cardsearch feeds (rss and atom)
- /trove/index-card-search/rss.xml
- /trove/index-card-search/atom.xml
- fix: render >1 result in streamed index-value-search (csv, tsv, json)
- when browsing trove api in browser, wrap non-browser-friendly mediatypes in html (unless `withFileName`, which requests download)
- better trove.render test coverage
- code cleanliness
- de-collide "simple" names
- SimpleRendering => EntireRendering
- SimpleTrovesearchRenderer => TrovesearchCardOnlyRenderer
- consolidate more shared logic into trove.util
- more accurate type annotations
- use python 3.13
- use `poetry` to manage dependencies
- upgrade various dependencies
- start using `mypy` for type-checking (loosely)
- delete `RawDatum` model
- `trove.digestive_tract.extract` now must succeed before `/trove/ingest` responds
- rename `IndexcardRdf` (and kids) to `ResourceDescription`
- move most django models to their own files
- stop storing `CeleryTaskResult`s forever
- new environment variables: `CELERY_RESULT_EXPIRES`, `FAILED_CELERY_RESULT_EXPIRES`
- fix: `/api/v2/` error generating rss/atom feed links
- fix: pagination at `/api/v2/sourceconfigs`
- fix: correct osfmap IRIs (`dcat:accessURL`, `osf:verifiedLink`)
- smaller `osfmap_json` derived representation (thx bodintsov)
- prepare for next release dropping `RawDatum` model/table:
- mirror `expiration_date` database column from `RawDatum` to `IndexcardRdf`
- add management command `migrate_rawdatum_expiration` to copy old values
- fix: avoid `sharev2_elastic` queue backups by `ack`ing more correctly
- fix: more consistent `suggestedFilterOperator` values in json api
- configurable rabbitmq connection heartbeat timeout via `RABBITMQ_HEARTBEAT_TIMEOUT` env var
- remove search-text parsing from base trovesearch params (syntax may now vary by index strategy)
- add search-text syntax to `trovesearch_denorm` index strategy (using elasticsearch `simple_query_string`)
- add `osf:verifiedLinks` entry to osfmap thesaurus
- remove `trove_indexcard_flats` index strategy, a cautionary tale of elasticsearch `nested` (which is already cautioned against by its own docs, yes)
- add `SimpleChainMap` util, alternative to `collections.ChainMap` that doesn't do updates (uses more permissive `Mapping` type over `MutableMapping`)
- add `BasicTroveParams` (shared params for all trove endpoints)
- add/use base `trove.views`:
- `BaseTroveView`: parses `BaseTroveParams`, renders rdf data (response content) accordingly
- `StaticTroveView`: responds with same static rdf data every time
- `GatheredTroveView`: gathers rdf data via given `primitive_metadata.gather.GatheringOrganizer`
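For illustration only: a read-only chain map of the kind the `SimpleChainMap` entry above describes could look roughly like the sketch below. The class name `ReadOnlyChainMap` and everything in it are assumptions made for this example, not the project's actual implementation. It only needs the `Mapping` interface of the maps it wraps, so immutable mappings work too.

```python
from collections.abc import Iterator, Mapping
from typing import Any


class ReadOnlyChainMap(Mapping):
    """Hypothetical sketch: look keys up across several mappings, never write."""

    def __init__(self, maps: list) -> None:
        # earlier maps take precedence over later ones
        self._maps = list(maps)

    def __getitem__(self, key: Any) -> Any:
        for each_map in self._maps:
            if key in each_map:
                return each_map[key]
        raise KeyError(key)

    def __iter__(self) -> Iterator:
        seen = set()
        for each_map in self._maps:
            for key in each_map:
                if key not in seen:
                    seen.add(key)
                    yield key

    def __len__(self) -> int:
        # count distinct keys across all maps
        return sum(1 for _ in self)


# example values are made up for the sketch
overrides = {"page_size": 100}
defaults = {"page_size": 13, "sort": "-date"}
chain = ReadOnlyChainMap([overrides, defaults])
```

Because the class subclasses `Mapping` rather than `MutableMapping`, it has no `__setitem__`, so assigning through the chain fails instead of silently updating the first map as `collections.ChainMap` would.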
- fix `/trove/browse?iri=...` and `/trove/index-card/...`
- updo html rendering of `/trove/` responses
- add landing page of static data (links to docs, etc) rendered same way as `/trove/` responses
- easier editing feature flags via `/admin/` (list-view checkboxes)
- remove no-longer-used feature flag `TROVESEARCH_DENORMILY`
- add better "end to end" search-api tests
- further move on from SHAREv2...
- delete sharev2 ingestion pipeline
- share/harvest/*
- share/harvesters/*
- share/metadata_formats/*
- share/regulate/*
- share/schema/*
- share/sources/*
- share/tasks/*
- except `schedule_index_backfill` -- moved to `share.models.index_backfill` for now
- share/transform/*
- share/transformers/*
- anything deactivated by the `ignore_sharev2_ingest` feature flag
- delete (some) sharev2 db models/tables from share/models/...
- core.py: NormalizedData, FormattedMetadataRecord
- jobs.py: HarvestJob
- registration.py: ProviderRegistration
- sources.py: SourceStat
- delete (some) sharev2 api
- /api/v2/formattedmetadatarecords/...
- /api/v2/normalizeddata/...
- note: sharev2 “push” is a POST to this endpoint -- replaced by /trove/ingest
- /api/v2/sourceregistrations/...
- /api/v2/schemas/...
- /api/v1/share/data
- delete `sharectl` (share/bin/*)
- prefer django management commands, for now
- add management commands
- shtrove_indexer_run (replaces `sharectl search daemon`)
- shtrove_search_setup (replaces `sharectl search setup`)
- shtrove_search_teardown (replaces `sharectl search purge`)
- delete_pretrove_data (for letting go of some past)
- remove special ember-share handling (for local dev)
- remove all dead code and requirements easily removed
- update github actions flow (with more accurate code coverage)
- reduce wasteful text-field indexing (better this time)
- on the share-admin search-indexes page:
- require typed confirmation when deleting indexes
- allow deleting way more indexes
- (for osf search ui) update link description in osfmap
- update `IndexStrategy` to allow multiple indexes within a strategy
- `trovesearch_denorm` index strategy updates:
- multiple indexes: one for card-search, one for value-search on iri values
- skip indexing some text fields (e.g. `*.identifier`, glob-paths of depth > 1)
- reduce wasteful computing (fewer queries, less hashing)
- add to metadata for `osfmap:affiliation`
- improve local setup, perhaps
- update calendar version to `25`, reset semantic versions to `0`
- trove-search api:
- support jsonapi `fields[TYPE]` query params; see https://jsonapi.org/format/#fetching-sparse-fieldsets
- when `TYPE` in `fields[TYPE]` matches the value of a `cardSearchFilter[resourceType]` query param, interpret the given fields as shorthand property-paths and use for custom csv/tsv columns
- streaming "simple json" rendering (`acceptMediatype=application/json`)
- when sorting by integer values, treat missing values as zero (tho there may be future times this is wrong...)
- allow rendering search responses as downloadable CSVs/TSVs
- add, reshape renderer output types
- more stable indexer daemon
- `trovesearch_denorm` indexing tweaks:
- move iri-value delete_by_query into followup task
- suggest `affiliation` instead of `creator.affiliation` for osf:Preprint searches
- local docker-compose: give worker access to elasticsearch
- fix(trovesearch_denorm): keep iris whole in path-based fieldnames
- ignore trailing slashes on iri values
- have more than one shard and replicas
- fix for M chips with docker
- add "subject" related property for `cardSearchFilter[resourceType]=Project`
- allow "supplementary" metadata records
- allow expiration date on metadata records
- osfmap: add properties with shorthands
- add `trovesearch_denorm` index strategy (more denormalized for better scaling)
- dependency updates
- many tests
- specific exception classes within `trove`
- better search api error responses
- better search-api html experience
- more static vocabs
- fix various errors
- fix: jsonapi renderer now chooses `type` consistently
- speed up oai-pmh queries
- improve trove simple-json and html experience
- add "simple json" renderer for search api responses
- update django to 3.2.25
- fix oai-pmh feed
- add `osfmap:hasCedarTemplate` to trove.vocab
- fix: allow date literals for legacy sharev2_elastic deriver
- add docs:
- /trove/docs/openapi.json
- /trove/docs/openapi.html
- /vocab/2023/trove/...
- allow adding propertypaths to `cardSearchText` and `valueSearchText`
- e.g. `cardSearchText[creator.name]=...`
- anywhere a set of propertypaths is encoded in query params, allow simple glob-paths (`*`, `*.*`, `*.*.*`) that match any propertypath of the given length
- note: partial globs (e.g. `*.name` or `publisher.*`) are not supported (...yet?)
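As a rough illustration of the glob-path rule above (a glob made only of `*` steps matches any propertypath with the same number of steps), here is a minimal sketch. The function name `matches_glob_path` is hypothetical and the code is not taken from trove. It assumes `.` delimits steps and rejects partial globs, since those are noted as unsupported.

```python
def matches_glob_path(glob_path: str, property_path: str) -> bool:
    """Hypothetical helper: does a simple glob-path match a propertypath?"""
    glob_steps = glob_path.split(".")
    # only "whole" globs are handled; a mix of names and "*" is a partial glob
    if any(step != "*" for step in glob_steps):
        raise ValueError(f"partial globs are not supported: {glob_path!r}")
    # a simple glob matches on path length alone
    return len(property_path.split(".")) == len(glob_steps)
```

Under these assumptions, `matches_glob_path("*.*", "creator.name")` is true and `matches_glob_path("*", "creator.name")` is false.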
- when an iri value returned by an index-value-search has a full index-card, include that index-card instead of the stub built from indexed values
- friendlier FeatureFlag admin list
- BREAKING: allow multiple propertypaths in query params
- use `.` to delimit steps in a path; e.g. `creator.affiliation` is a path of two steps (previously would be `creator,affiliation`)
- use `,` to delimit multiple paths; e.g. `creator.name,contributor.name` would be two paths (previously impossible)
- hidden behind feature flag: `periodic_propertypaths`
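To make the new delimiters concrete, the sketch below splits a query-param value into propertypaths: `,` separates paths and `.` separates the steps within a path. The function name `parse_propertypaths` is an assumption for this example, not the parser trove actually uses.

```python
def parse_propertypaths(param_value: str) -> list[tuple[str, ...]]:
    """Hypothetical helper: split on "," for paths, then "." for steps."""
    return [
        tuple(path.split("."))
        for path in param_value.split(",")
        if path  # ignore empty segments such as a trailing comma
    ]


two_paths = parse_propertypaths("creator.name,contributor.name")
one_path = parse_propertypaths("creator.affiliation")
```

Here `two_paths` holds two paths of two steps each, and `one_path` holds a single two-step path.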
- add missing OSFMAP shorthands
- fix: in `index-card-search`, do not show "next" link when no results
- more consistent pagination over randomly ordered results
- correct test setup for `trove_indexcard_flats`
- skip "first" link from first page
- disable pagination on large, randomly-sorted result sets
- more efficient random sort (for sorting by relevance to nothingness)
- remove `trove_indexcard` (fully replaced by `trove_indexcard_flats`)
- `trove_indexcard_flats` updates:
- log search queries when in DEBUG mode
- disable "unnamed filter values" aggregations (expensive and yet unused)
- fix: `trove_indexcard_flats` would clobber some iri values while flattening
- skip indexing cards that don't have `osfmap_json`
- more gracefully handle erroneously circular `skos:Concept` hierarchies
- lil optimization to skip unhelpful aggregations
- disable tests using elasticsearch5 on github actions
- (will soon reenable or remove elastic5 altogether)
- add `trove_indexcard_flats` index strategy
- copy of `trove_indexcard` with flatter queries (and more info on the root doc)
- fix: allow more than 11 related properties on an `index-card-search` to have non-zero count
- small improvements to `trove_indexcard` index strategy
- skip indexing metadata with `osfmap:contains` in the path (don't index file metadata with its container)
- better consolidate `nested_iri` to reduce number of nested docs
- introducing "trove"
- store metadata records as small rdf documents called "index cards"
- ingest rdf
- add iri-centric search
- "shtrove": working to preserve back-compat (because trove may be trouble)
- make `SourceConfig.disabled` prevent `harvest` tasks running
- downgrade to python 3.10 (for now)
- improve logging
- replace `raven` (deprecated) with `sentry-sdk`
- add logging formatter for json with `severity` (for logging in deployments)
- remove squashed migrations, dead code
- fix a typo
- admin interface: allow re-ingesting all data for a source config (see "ingest" buttons at `/admin/share/sourceconfig/`)
- address possible cause of some backfill gaps
- fix logging errors
- upgrade to python 3.11
- upgrade to elasticsearch 8
- add `share.search.index_strategy` to act as a slippery abstraction layer between search-engine backend and planned friendly search api
- configure two index strategies (and make it easy to add more in the future):
- `sharev2_elastic5`: the existing/legacy SHAREv2 search index as exists on elasticsearch5 and exposed via `/api/v2/search/creativeworks/_search`
- `sharev2_elastic8`: a mirror/replacement for `sharev2_elastic5` with all the same `_source` docs (but possible incompatibilities for the existing pass-thru api)
- add a happy-path index-backfill workflow to the admin interface at `/admin/search-indexes`
- when changing index-strategy settings/mappings/whatever, the "happy path" is to create, backfill, verify a new copy of the index; then switch which is used for searching, verify again, and finally delete the old index.
- not intended to have the power of a full elasticsearch management interface -- just enough visibility to see whether things are going ok and where to start looking if something goes wrong
- for testing, support `indexStrategy` query param to `/api/v2/search/creativeworks/_search`, `/api/feeds/rss`, `/api/feeds/atom`
- may request a configured strategy (e.g. `indexStrategy=sharev2_elastic8`) or a specific version of an index within a strategy (e.g. `indexStrategy=sharev2_elastic8__bcaa90e8fa8a772580040a8edbedb5f727202d1fca20866948bc0eb0e935e51f`)
- add `FeatureFlag` model, use it to switch default search strategy (name="elastic_eight_default")
- add `suid` value to `sharev2_elastic` index
- easy additive elastic mapping changes
- add `osf_related_resource_types` field
- dockerfile updates
- update raven
- update and consolidate docs
- audit and upgrade all dependencies
- switch to github actions for tests/ci
- fix: feeds should not break on null date_published
- fix: oai_dc formatter breaks on deletions
- big rend! remove many things:
- concepts:
- merging data from multiple sources together (aiming instead for a simple, robust repository of metadata records -- let's talk later/soon about how we might do merging well)
- models:
- `ShareObject` and all its descendents
- `ShareObjectVersion` and all its descendents
- `Change`
- `ChangeSet`
- `SubjectTaxonomy`
- `UnusedCeleryProviderTask`
- `UnusedCeleryTask`
- api routes:
- all auto-generated `ShareObject` routes (e.g. `/api/v2/creativeworks/`)
- all `schema` routes (except the root `/api/v2/schema/`)
- auto-generated schema routes (e.g. `/api/v2/schema/disputes/`)
- work type hierarchy (`/api/v2/schema/creativeworks/hierarchy/`)
- `/api/v2/graph/`
- admin features/improvements
- add FormattedMetadataRecord admin
- when investigating a problem, start by finding the suid and navigate relationships from there
- add action to delete all FormattedMetadataRecords for some chosen suid(s) (good for spam control)
- fix a 500 error at `/api/v2/`
- fix sending useful debugging info to sentry
- make the oai-pmh feed respect switch-flipping
- give an accurate `date_created` in sharev2_elastic formatter
- fix admin bug -- don't hide the search box
- add django-debug-toolbar to dev dependencies
- tidy up some admin inefficiencies
- expose a few models in read-only json:api, so the frontend can be useful given a suid
- `/api/v2/formattedmetadatarecords/`
- `/api/v2/sourceconfigs/`
- `/api/v2/suids/`
- add new atom/rss feeds that get results from the new backcompat index
- `/api/v2/feeds/atom/`
- `/api/v2/feeds/rss/`
- (old feeds now deprecated, will be gone with ShareObject)
- add `--pls-reingest` arg to format_metadata_records command
- fix: facility != funder (in gov.clinicaltrials transformer)
- remove feature: oai_dc formatter no longer puts first author last
- add utility: `share.util.names.get_related_agent_name` for consistently getting an agent name from an "agent-work relation" node
- if missing both `cited_as` and `name` (true of some old, unregulated production data), reluctantly apply some cultural assumptions and build a name from parts (`given_name`, `additional_name`, `family_name`, `suffix`)
- bugfix: in share.util.graph, handle merging nodes with dictionary values
- bugfix: when formatting oai_dc, strip characters illegal in XML
- when regulating, discard gravatars as agent identifiers
- bugfix: deduping subjects in custom taxonomies
- fix up `populate_osf_suids` with more useful messaging
- improve "central node" guessing to handle old osf data on prod
- speed up `populate_osf_suids` -- exclude `NormalizedData` with null `raw`, since they'll be ignored anyway
- fix `populate_osf_suids` script to handle fun situations
- new model: `FormattedMetadataRecord`
- new sharectl commands:
- `sharectl search purge`
- `sharectl search setup <index_name>`
- `sharectl search setup --initial`
- `sharectl search set_primary <index_name>`
- `sharectl search reindex_all_suids <index_name>`
- new management commands:
- `format_metadata_records`
- `populate_osf_suids`
- new doc: `README-docker-quickstart.md` -- the easy way to get started
- define the "share schema" statically (in `share.schema`)
- stop inferring everything from the `ShareObject` models
- add a parallel ingestion path, preparing for a future without `ShareObject`
- use only the most recent `NormalizedData` for each suid (no merging)
- allow explicitly stating the suid when pushing a `NormalizedData`
- if not specified, try looking for an OSF guid
- build a `FormattedMetadataRecord` for each metadata format
- currently two metadata formatters (and room for more):
- `sharev2_elastic`: for a back-compatible elasticsearch index -- builds a document just like `share.search.fetchers.CreativeWorkFetcher`, but from a `NormalizedData` instead of all the `ShareObject` tables
- `oai_dc`: dublin core XML, for the OAI-PMH feed
- indexer daemon overhaul
- assorted cleanup; dead/useless code removal
- add `ElasticManager` to encapsulate all requests sent to elasticsearch
- add `IndexSetup` concept to describe how to get/build documents for an index and what messages to send to that index's daemon
- currently two index setups:
- `share_classic`: index by `AbstractCreativeWork` id, using existing `share.search.fetchers` logic
- `postrend_backcompat`: index by `SourceUniqueIdentifier` id, using the `sharev2_elastic` `FormattedMetadataRecord`s
- add a parallel OAI-PMH that uses `FormattedMetadataRecord` with `oai_dc`
- remains dormant for the moment -- enable with `pls_trove` query param
- NOTE: when we switch over, OAI-PMH datestamps will all be new and recent
- admin updates:
- search `IngestJob` by suid value
- Add a decorator for marking views deprecated
- Mark some views deprecated
- Sources added via API default to canonical
- Automatically schedule `ingest` tasks after harvesting
- Schedule `ingest` tasks in admin `reenqueue` action
- Pin `faker` to 4.0.3
- Update `.travis.yml`
- Fix bug in `io.osf.registrations` transformer
- Ensure order in oai-pmh
- Exclude frankenworks from oai-pmh
- Reduce oai-pmh page size
- Pin `graphql-relay` to a compatible version
- Dockerfile fixes & improvements
- Optimize oai-pmh endpoint to avoid timeouts
- Add `reindex_works` shell util
- Pin python-dateutil to a version that doesn't break tests (2.8.0)
- Temporarily (i hope) skip tests broken by 19.0.5
- Temporary fix to avoid slow IngestJob queries
- Possibly fix a rare forceingest error
- Skip indexing works with too many agent relations
- Make the indexer more configurable by environment variables
- Fix indexer deadlock
- Allow turning off ingestion (but not harvest) for non-canonical sources
- Ingestion perf improvements (faster attr access in MutableGraph)
- Handle indexer errors better
- Ingestion perf improvements
- Update `requests` dependency
- Make it easier to reingest all OSF data
- Fix worker out of memory errors
- Update nameparser dependency
- Add datacite oai-1.1 schema namespace
- Fix common datacite transform errors
- Update django to 1.11.16
- Clean up disambiguation logic to make extending it less painful
- Extend disambiguation to match contributors with different name formats
- Rename `fixpreprintdisambiguations` command to `forceingest`
- Handle more complex merges
- Improve error message for transformer errors
- Fix OSF registration transformer
- Update NSF harvester to look farther into the past
- Fix a bug in the OSF project harvester
- Fix --osf-only flag in fix_datacite command
- When a job is marked "skipped", not even `superfluous` will re-run it
- All retried jobs should be marked "rescheduled"
- Harvest jobs that are retried when the same source is already being harvested should be marked "rescheduled" rather than "failed"
- Handle OSF harvest errors gracefully
- Pin kombu to 4.1.0
- Harvest all set specs from CSIC
- Allow sorting Atom feed by `date_created` and `date_published`
- Don't create unnecessary source configs for each new source
- Update pytest-django dependency to avoid version conflict
- Fix bug in indexer daemon, stop all threads when one dies
- Fix typo in `sharectl ingest` that prevented bulk reingestion
- Fix date range filtering in com.figshare.v2 harvester
- Bulk reingestion with `IngestScheduler.bulk_reingest()` and `sharectl ingest`
- Admin interface updates
- More stable and reliable indexer daemon
- "Urgent" queues for ingestion and indexing, allowing pushed data to jump ahead of harvested data
- Various source config updates
- Fix PeerJ transformer error
- Prevent infinite task loop for certain types of errors
- Update raw data janitor to skip over datums from disabled/deleted sources
- Fix bug in fixpreprintdisambiguations command
- Fix a broken test
- Fix some time-sensitive tests
- Add IngestJob, used to keep track of a RawDatum's ingestion status
- Exposed in API at `/api/v2/ingestjobs/`
- In the response to pushed data, include a link to the IngestJob
- Rename HarvestLog to HarvestJob
- Combine `transform` and `disambiguate` tasks into `ingest` task
- Catch all errors caused by bad input data, store them on the IngestJob
- Add Regulator, a place to put logic/transforms/validation that should run on all data, regardless of source
- Fix: Prevent indexer daemon threads from exiting when elasticsearch times out
- Map work relation types in MODS transformer
- Update edu.utah source config to include more approved sets
- Update edu.umassmed source config to use HTTPS
- Update pendulum dependency to avoid infinite janitor loop
- Fix elasticsearch_janitor task
- Expect (and give) str arguments, avoiding error
- Use the indexer daemon by default
- Speed up update_elasticsearch task:
- Don't count the works just for a log message
- Use the indexer daemon by default, instead of index_model tasks
- Only run one update_elasticsearch task at a time
- Add --delete-related and --superfluous flags to `enforce_set_lists`
- Improve script output by including ids in `ShareObject.__repr__`
- Devops updates for new environment
- Actually speed up OAI feed
- Speed up OAI feed when filtering by `set`
- Delete merged works with no identifiers in `fixpreprintdisambiguations`
- Allow omitting arXiv from `fix_datacite` script
- Add parameters to `fix_datacite` script
- Use normalized agent name in Atom feed, instead of `cited_as`
- Update psycopg dependency
- Type map for Columbia Academic Commons (edu.columbia)
- Type map for University of Cambridge (uk.cambridge)
- Allow reading/writing `Source.canonical` at `/api/v2/sources/`
- Include `<author>` in atom feed at `/api/v2/atom/`
- ScholarsArchive@OSU source config for their new API
- Prevent OSF harvester from being throttled
- Update NSFAwards harvester/transformer to include more fields
- Use request context to build URLs in the API instead of SHARE_API_URL setting
- Stop displaying `localhost:8000` links
- Add `--from` parameter to `fixpreprintdisambiguations` management command
- Support for set blacklists for sources that follow OAI-PMH protocol
- `enforce_set_lists` command to enforce set blacklist and whitelist
- Set whitelist for UA Campus Repository
- Support for encrypted json field and start using it in SourceConfig model
- Enable Coveralls
- Include work lineage (based on IsPartOf relations) in the search index payload
- Add `self` links to objects returned by the API
- Collect metadata in MODS format from UA Campus Repository
- Update columbia.edu harvester source config (disabled set to false)
- Improve creating Sources at `/api/v2/sources/`
- Use POST to create, PATCH to update
- Respond with sensical status codes (409 on name conflict, etc.)
- Backfill CHANGELOG.md to include `2.10.0` and `2.11.0`
- Correctly encode &, <, > characters in the Atom feed
- Avoid DB connection leak by disabling persistent connections
- `editsubjects` management command to modify `share/subjects.yaml`
- Replace `share/models/subjects.json` with `share/subjects.yaml`
- Update central subjects taxonomy to match Bepress' 2017-07 update
- Symbiota as a source
- AEA as a source
- Used django-include for a faster OAI-PMH endpoint
- Updated regex for compatibility with Python 3.6
- University of Arizona as a source
- NAU Open Knowledge as a source
- Started collecting analytics on source APIs (response time, etc.)
- Support for custom taxonomies
- sharectl command line tool
- Profiling middleware for local development
- Janitor tasks to find and process unprocessed data
- Timestamp field to RawData
- Mendeley Harvester!
- Started to use deprecation warning
- Timeouts for harvests
- The concept of "Bots"
- A lot of dead code
- A GPL-licensed library
- Upgraded to Celery 4.0
- Deleted works now return 403s from the API
- Deleted works are now excluded from the API
- Corrected the date fields used to audit the Elasticsearch index
- Strongly defined the Harvester interface
- Harvests are now scheduled in a more friendly manner
- Updated the configurations for many OAI sources
- HarvestLogs no longer get stuck in progress
- Text parsing transformer utilities
- MODS transformer looks at the location field in addition to other fields for a work identifier
- Elasticsearch Janitor task to keep Postgres and ES in sync
- Concurrently added indexes
- Admin updates to allow quicker fixing of broken data
- More test coverage
- Elasticsearch's scroll API explicitly disabled
- Upgraded to Django 1.11
- Elasticsearch now pulls last_modified from itself rather than Postgres
- API pagination no longer times out on large collections
- Timestamps are now included in the ATOM feed
- OAI endpoint
- Sources
- OpenBU
- Updated documentation
- Sources
- A table for managing SHARE data sources
- Replaces the apps in the providers folder
- SourceConfigs
- A table for managing different methods of acquiring data from a given source
- Replaces nested apps/app labels
- HarvestLogs
- First class support for managing harvesting/back harvesting
- Source Unique Identifiers
- First class representation of what was RawData.provider_doc_id
- The Django admin now supports starting harvesters over long periods of time
- Support for the MODS OAI-PMH prefix
- Provider Django applications have been removed
- Source specific fields have been removed from ShareUser
- Harvesters have been relocated to share/harvesters/
- Various renaming/vocabulary changes
- RawData -> RawDatum
- Favicon -> Icon
- Provider -> Source
- Provider App -> SourceConfig
- Normalizer -> Transformer
- Updates to the getting started guide
- Squashed migrations to speed up local development
- Harvesters are now expected to return utf-8 strings
- Sources are no longer tied to the ShareUser model
- Title now has an "exact" multi-field in elasticsearch
- A robot that archives old succeeded celery jobs
- New Harvesters
- Scholarly Commons @ JMU
- Compensate for potential race conditions with the push API
- New Harvesters
- Research Registry Harvester
- SSOAR
- Status API endpoint
- Updated set_specs for University of Kansas
- ClinicalTrials.gov now outputs registrations
- Source icons are now stored in the database
- Removed "Notify" from the page title in the browsable API
- Support for OSF Registries
- New Harvesters
- University of Utah
- Updated the API
- Improved Elasticsearch mappings
- Updated NIH and NSFAwards
- Affiliations are now gathered
- Non-Unique URLs are no longer collected
- Lots of under-the-hood changes to make devs' lives easier
- New Harvesters
- es.csic
- edu.purdue.epubs
- Site status banners
- Retraction harvesting
- A little bit of documentation
- OAuth login failure pages look nice now
- Cascade deletes are now implemented as database cascades
- New Harvesters
- edu.cornell
- edu.richmond
- edu.scholarworks_montana
- edu.ucf
- edu.umd
- edu.utahstate
- org.seafdec
- Relations between creative works
- Updated harvesters
- Figshare v2 API
- PeerJ XML API
- Pubmed PMC prefix
- Datacite 4.0
- BePress Taxonomy for subjects
- Travis now uses postgres 9.5
- Comprehensive test suite for normalization and disambiguation
- Updated data model
- More expressive relations between people/organizations and works
- Type hierarchies
- Creative works: Publication, Preprint, DataSet, Patent, Thesis, Software, etc.
- Agents: Person, Organization, Institution, Consortium
- More aggressive and intelligent data parsing
- Stricter validation of incoming data
- Prune duplicate objects from submitted changesets
- Various bug fixes
- Formalized disambiguation methods
- App bootstrap time improved by 4x
- Better elasticsearch mappings
- URI may now be searched/matched directly
- Prettier table names
- Backport of the V1 push API
- New and improved source registration form
- JSON schema endpoint
- New sources
- College of William and Mary
- University of Wisconsin