
Remove document similarity alternative ranking script #427

@marekhorst

Some time ago an alternative approach to the ranking operation was introduced:

https://github.com/CeON/CoAnSys/blob/298863befc2f0e3a96b25a9ee53f6b53b41090a6/document-similarity/document-similarity-logic/src/main/pig/document-similarity-s1-ship-rank_filter.pig

It involves a custom rank operation written in the rank.py script, introduced in commit 318d88c.
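For readers unfamiliar with what such a custom rank step does, here is a minimal sketch. It is not the contents of the actual rank.py. It assumes tab-separated records and mimics only the basic behaviour of Pig's RANK operator without a BY clause, which prepends a sequential number to each tuple. The name `rank_lines` and the sample records are hypothetical.

```python
from typing import Iterable, Iterator


def rank_lines(lines: Iterable[str], start: int = 1) -> Iterator[str]:
    """Prepend a sequential rank to each tab-separated record.

    Hypothetical helper for illustration; not the real rank.py.
    """
    for rank, line in enumerate(lines, start=start):
        # Strip only the trailing newline so field contents stay untouched.
        yield f"{rank}\t{line.rstrip(chr(10))}"


# Made-up sample records, as they might arrive from a streaming step.
records = ["docA\tterm1\n", "docB\tterm2\n"]
ranked = list(rank_lines(records))
```

A generator like this handles one record at a time, so its memory use does not grow with input size. The numbering is only globally consistent if all records pass through a single stream, which is an assumption of this sketch.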

The alternative Oozie execution path can be selected by enabling the load_filterTerms_calcTfidf_filter_ship_ranked flag.

This was a workaround for memory-related issues with the rank operator embedded in Pig. In fact, those issues may have had the very same cause as #425.

As soon as #425 is fixed and the Pig embedded rank operator works properly, we can get rid of this alternative path.

It is useless anyway because it causes a failure at a later docsim stage. Probably the two ranking-related Pig scripts diverged at some point, and the alternative one is no longer fully compatible with the main one.
