0.4.4
ELECTRA Model
We welcome a new language model to the FARM family that we found to be a powerful alternative to the existing ones. ELECTRA is pretrained with a small generator network that replaces tokens with plausible alternatives and a discriminator model that learns to detect these replaced tokens (see the paper for details: https://arxiv.org/abs/2003.10555). This makes pretraining more efficient and improves down-stream performance on many tasks.
You can load it as usual via

```python
from farm.modeling.language_model import LanguageModel

model = LanguageModel.load("google/electra-base-discriminator")
```
See the Hugging Face model hub for more model variants.
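Since ELECTRA plugs into FARM like any other language model, you can also use it right away for tasks such as embedding extraction. A minimal sketch, assuming FARM's usual embedding API (`task_type="embeddings"`, `extract_vectors` and its extraction arguments are assumptions, not shown in this release's examples):

```python
from farm.infer import Inferencer

# Sketch: use the ELECTRA discriminator as a sentence encoder.
# The extraction arguments and the "vec" result key are assumptions
# based on FARM's usual embedding-extraction API.
model = Inferencer.load(
    "google/electra-base-discriminator",
    task_type="embeddings",
    gpu=False,
    batch_size=4,
)
result = model.extract_vectors(
    dicts=[{"text": "FARM now speaks ELECTRA."}],
    extraction_strategy="cls_token",
    extraction_layer=-1,
)
print(result[0]["vec"].shape)
```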
Natural Questions Style QA
With QA being our favorite and most actively developed down-stream task, we are happy to support an additional style of QA in FARM (#334). In contrast to the popular SQuAD-based models, these NQ models support binary answers, i.e. questions like "Is Berlin the capital of Germany?" can be answered with "Yes", plus an additional span that the model used as a "supporting fact" to give this answer.
The implementation leverages FARM's modular prediction heads by combining one QuestionAnsweringHead that predicts a span (like in SQuAD) with one TextClassificationHead that predicts what type of answer the model should give (current options: span, yes, no, is_impossible).
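To make the two-head setup concrete, here is a minimal sketch of how such a model could be assembled. The layer dimensions and constructor arguments are illustrative assumptions, not the exact code of the NQ implementation:

```python
import torch

from farm.modeling.adaptive_model import AdaptiveModel
from farm.modeling.language_model import LanguageModel
from farm.modeling.prediction_head import QuestionAnsweringHead, TextClassificationHead

device = torch.device("cpu")
language_model = LanguageModel.load("roberta-base")

# One head predicts answer spans via start/end logits per token ...
qa_head = QuestionAnsweringHead(layer_dims=[768, 2])
# ... the other classifies the answer type: span, yes, no or is_impossible
answer_type_head = TextClassificationHead(layer_dims=[768, 4])

model = AdaptiveModel(
    language_model=language_model,
    prediction_heads=[qa_head, answer_type_head],
    embeds_dropout_prob=0.1,
    lm_output_types=["per_token", "per_sequence"],
    device=device,
)
```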
Example:
```python
from farm.infer import Inferencer

QA_input = [
    {
        "qas": ["Is Berlin the capital of Germany?"],
        "context": "Berlin (/bɜːrˈlɪn/) is the capital and largest city of Germany by both area and population."
    }
]
batch_size = 32  # pick a value that fits your hardware
model = Inferencer.load(model_name_or_path="../models/roberta-base-squad2-nq", batch_size=batch_size, gpu=True)
result = model.inference_from_dicts(dicts=QA_input, return_json=False)
print(f"Answer: {result[0].prediction[0].answer}")
# >> Answer: yes
```
See this new example script for more details on training and inference.
Note: This release includes the initial version for NQ, but we are already working on further simplifications and improvements in #411.
New speed benchmarking
With inference speed being crucial for many deployments, especially for QA, we introduce a new benchmarking tool in #321. It allows us to easily compare the performance of different frameworks (e.g. ONNX vs. PyTorch), parameters (e.g. batch size) and code optimizations across FARM versions, as sketched below.
See the readme for usage details and this spreadsheet for current results.
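As an illustration of the kind of measurement involved, here is a minimal timing sketch. The model name, workload size and batch sizes are placeholder assumptions, and this is not the benchmark suite itself:

```python
import time

from farm.infer import Inferencer

# A toy workload: the same QA input repeated 64 times
QA_input = [{
    "qas": ["Is Berlin the capital of Germany?"],
    "context": "Berlin is the capital and largest city of Germany.",
}] * 64

for batch_size in [1, 16, 32]:
    model = Inferencer.load(
        "deepset/roberta-base-squad2",
        task_type="question_answering",
        batch_size=batch_size,
        gpu=True,
    )
    start = time.perf_counter()
    model.inference_from_dicts(dicts=QA_input)
    print(f"batch_size={batch_size}: {time.perf_counter() - start:.2f}s")
```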
A few more changes ...
Modeling
- Add support for Camembert-like models #396
- Speed up in BERTLMHead by doing argmax on logits on GPU #377
- Fix bug in BERT-style pretraining #369
- Remove additional XLM-R tokens #360
- ELECTRA: use gelu for pooled output of ELECTRA model #364
Data handling
- Option to specify text col name in `TextClassificationProcessor` and `RegressionProcessor` #387
- Document magic data loading in `TextClassificationProcessor` PR #383
- Multilabel support for `data_silo.calculate_class_weights` #389
- Implement Prediction Objects for Question Answering #405
- Removing lambda function from AdaptiveModel so the class is picklable #345
- Add target device optimisations for ONNX export #354
Examples / Docs
- Add script to reproduce results from COVID-QA paper #412
- Update tutorials #348
- Docstring Format fix #382
Other
- Adjust code to squad inferencing #367
- Convert pydantic objects to regular classes #410
- Rename top n recall to top n accuracy #409
- Add test for embedding extraction #394
- Quick fix CI problems with OOM and unclosed worker pool #406
- Update transformers version to 2.11 #407
- Managing pytorch pip find-links directive #393
- Zero based Epoch Display in Training Progress Bar #398
- Add stalebot #400
- Update pytorch to 1.5.0 #392
- Question answering accuracy test #357
- Add `__init__.py` files for `farm.conversion` module #365
- Make ONNX imports optional #363
- Make ONNXRuntime dependency optional #347
👨‍🌾 👩‍🌾 Thanks to all contributors for making FARMer's life better!
@PhilipMay, @stefan-it, @ftesser, @tstadel, @renaud, @skirdey, @brandenchan, @tanaysoni, @Timoeller, @tholor, @bogdankostic