fix: Dynamic max_answers for SquadProcessor (fixes IndexError when max_answers is less than the number of answers in the dataset) (#4817)
* #4320 implemented dynamic max_answers for SquadProcessor, fixed IndexError when max_answers is less than the number of answers in the dataset
* #4320 added two unit tests for dataset_from_dicts testing default and manual max_answers
* apply suggestions from code review
Co-authored-by: bogdankostic <bogdankostic@web.de>
* simplify comment, fix mypy & pylint errors, fix old test
* adjust max_answers to each dataset individually
---------
Co-authored-by: bogdankostic <bogdankostic@web.de>
 :param max_query_length: Maximum length of the question (in number of subword tokens)
 :param proxies: proxy configuration to allow downloads of remote datasets.
 Format as in "requests" library: https://2.python-requests.org//en/latest/user/advanced/#proxies
-:param max_answers: number of answers to be converted. QA dev or train sets can contain multi-way annotations, which are converted to arrays of max_answer length
+:param max_answers: Number of answers to be converted. QA sets can contain multi-way annotations, which are converted to arrays of max_answer length.
+Adjusts to maximum number of answers in the first processed datasets if not set.
+Truncates or pads to max_answer length if set.
 :param kwargs: placeholder for passing generic parameters
 """
 self.ph_output_type="per_token_squad"
@@ -469,12 +470,19 @@ def dataset_from_dicts(
 # Split documents into smaller passages to fit max_seq_len
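
The max_answers behaviour described in the docstring above (adjust to the largest answer count when unset, truncate or pad when set) can be illustrated with a minimal sketch. This is not the code from this commit: the helper names resolve_max_answers and fit_answers, the (start, end) tuple representation, and the (-1, -1) padding value are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple

# Hypothetical representation: one answer is a (start, end) token span.
Answer = Tuple[int, int]
# Hypothetical padding value for missing answers.
PAD: Answer = (-1, -1)


def resolve_max_answers(
    samples: List[List[Answer]], max_answers: Optional[int] = None
) -> int:
    """Return max_answers if it was set, else the largest answer count seen (at least 1)."""
    if max_answers is not None:
        return max_answers
    return max([1] + [len(answers) for answers in samples])


def fit_answers(answers: List[Answer], max_answers: int) -> List[Answer]:
    """Truncate or pad a list of answers to exactly max_answers entries."""
    truncated = answers[:max_answers]
    return truncated + [PAD] * (max_answers - len(truncated))


samples = [[(3, 5)], [(1, 2), (4, 6), (7, 9)]]

# Not set: the length adapts to the sample with the most answers.
dynamic = resolve_max_answers(samples)
padded = [fit_answers(answers, dynamic) for answers in samples]

# Set manually: longer answer lists are truncated instead of raising IndexError.
fixed = [fit_answers(answers, 2) for answers in samples]
```

Slicing before padding is what keeps a manually chosen max_answers smaller than the number of annotated answers from indexing past the end of a fixed-size array, which is the kind of failure the commit title describes.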