ReCiter 3.0

@mrj4001 mrj4001 released this 13 Mar 16:05
· 215 commits to master since this release
0cc74bc

ReCiter 3.0 Release Notes

Enhanced Scoring Methodology

In previous versions (ReCiter 2.0 and earlier), publication scoring relied heavily on identity-based methods and straightforward weighting, which sometimes failed to reflect nuanced affiliations or the importance signaled by feedback. This limited our ability to prioritize publications dynamically based on user-submitted feedback.

Version 3.0 uses sigmoid functions to calculate attribute subscores dynamically from user feedback. For example, if an author has even a small number of accepted publications with a particular affiliation not listed in institutional source systems, subsequent candidate publications with that affiliation will receive higher weighting. The more publications accepted with the same affiliation, the higher the weighting.
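
As a rough illustration of the idea, the sketch below maps feedback counts for one attribute value (say, a single affiliation string) to a subscore through a sigmoid. The function names, the `steepness` parameter, and the choice to subtract rejected matches are assumptions made for this example; they are not taken from the ReCiter code base.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def attribute_subscore(accepted_matches: int,
                       rejected_matches: int = 0,
                       steepness: float = 1.0) -> float:
    """Hypothetical subscore in [0, 1] for one attribute value.

    accepted_matches: accepted publications sharing this attribute value.
    rejected_matches: rejected publications sharing it (an assumption here).
    steepness: illustrative tuning knob, not a ReCiter setting.
    """
    net_evidence = accepted_matches - rejected_matches
    return sigmoid(steepness * net_evidence)


# More accepted publications with the same affiliation give a higher subscore.
for accepted in (0, 1, 3):
    print(accepted, round(attribute_subscore(accepted), 3))
```

With no feedback the subscore sits at the neutral midpoint of 0.5, and it rises quickly after only a few accepted matches, which is the behavior the affiliation example describes.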

Attributes scored via sigmoid functions now include:

  • Target Author Name

  • Email

  • Institution

  • Organization

  • ORCID

  • ORCID Co-author

  • Co-author ORCID

  • Journal

  • Keyword

New Signals Incorporated:

  • Year of Publication: Candidate articles published before the earliest accepted article will now be increasingly penalized.

  • Count of Accepted Publications: Enhances relevance scoring based on previously accepted articles.

  • Count of Rejected Publications: Improves accuracy by considering articles previously rejected.

  • Author Count: Adjusts scoring by accounting for the increased uncertainty associated with publications having a higher number of authors.

  • Relationship Scoring: Improves accuracy by comparing the number of known relationships with the total number of co-authors. Additionally, first name matching must now be explicit and detailed.

  • Penalty for Inferred Target Authors: Adds a penalty when 0 or 2+ target authors are inferred, addressing a common source of false positives.
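
Two of these signals can be sketched in a few lines. The example below assumes a linear year penalty and a fixed penalty for inferred target authors; the function names, the `per_year` rate, and the penalty size are hypothetical, and the release notes do not state the actual formulas.

```python
def year_penalty(candidate_year: int,
                 earliest_accepted_year: int,
                 per_year: float = 0.1) -> float:
    """Penalty that grows with each year the candidate predates the
    earliest accepted article. Linear growth is an assumption."""
    years_before = max(0, earliest_accepted_year - candidate_year)
    return per_year * years_before


def inferred_target_author_penalty(inferred_count: int,
                                   penalty: float = 1.0) -> float:
    """Fixed penalty when 0 or 2+ target authors were inferred.
    Exactly one inferred target author incurs no penalty."""
    return 0.0 if inferred_count == 1 else penalty


# A candidate from 2000, with the earliest accepted article in 2005,
# is penalized more than one from 2004.
print(year_penalty(2000, 2005), year_penalty(2004, 2005))
```

Both functions return a non-negative amount that a caller would subtract from, or feed as a feature into, the overall score.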

Neural Network Integration:

All attribute subscores, along with legacy identity-based scores, now feed into an advanced neural network model, significantly enhancing system accuracy. We have developed two distinct neural network models:

  • Feedback-Driven Model: Activated when feedback is available.

  • No-Feedback Model: Engaged when no prior feedback exists for the author.

These neural networks were tuned through iterative experimentation, yielding a model configuration that is more accurate than previous methods.
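
The choice between the two models can be pictured as a simple dispatch on whether the author has any prior feedback. This is a minimal sketch: the function name and the string labels are placeholders, and it assumes that "feedback" means at least one accepted or rejected publication.

```python
def select_model(accepted_count: int, rejected_count: int) -> str:
    """Return which model to use for an author.

    Assumes feedback exists when the author has at least one accepted
    or rejected publication; the labels are illustrative only.
    """
    if accepted_count < 0 or rejected_count < 0:
        raise ValueError("feedback counts must be non-negative")
    has_feedback = (accepted_count + rejected_count) > 0
    return "feedback_driven" if has_feedback else "no_feedback"


print(select_model(0, 0))  # author with no prior feedback
print(select_model(4, 1))  # author with accepted and rejected articles
```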

Additional Improvements:

  • Improved Performance: Enhanced overall system performance by optimizing lookup processes and addressing inefficiencies.

  • No Results Fix: Previously, if a user's name returned no results from the eSearch API, the search incorrectly fell back to a first-initial search (e.g., "M[au]"). This issue has been resolved for strict searches, lenient searches, and searches involving compound names.

  • Identity Checks: Added checks that the identity object contains the mandatory fields firstName, lastName, and firstInitial.

  • Docker Hub Credentials: Included Docker Hub credentials in the Dockerfile to avoid the "image pull limit" error.

  • Degree Year Discrepancy Score: Improved the logic and effectiveness of the Degree Year Discrepancy scoring.
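
The identity check can be illustrated with a small validator. The sketch assumes the identity object is available as a flat dictionary keyed by the three field names from the release notes; the actual object layout, and the function name used here, are assumptions.

```python
REQUIRED_IDENTITY_FIELDS = ("firstName", "lastName", "firstInitial")


def missing_identity_fields(identity: dict) -> list:
    """List the mandatory fields that are absent, non-string, or blank.

    The flat dictionary layout is assumed for illustration only.
    """
    missing = []
    for field in REQUIRED_IDENTITY_FIELDS:
        value = identity.get(field)
        if not isinstance(value, str) or not value.strip():
            missing.append(field)
    return missing


identity = {"firstName": "Maria", "lastName": "", "firstInitial": "M"}
print(missing_identity_fields(identity))  # the blank lastName is reported
```

Returning the list of missing fields, rather than a bare true or false, lets the caller report exactly which fields need to be supplied.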

Related Repositories:

To fully utilize ReCiter 3.0, you must update the following related repositories:

This update marks a major step forward in refining publication matching accuracy and significantly boosts the effectiveness of user feedback within ReCiter.