Adds lm_eval to evaluations #282

Open · wants to merge 19 commits into `main`
92 changes: 92 additions & 0 deletions docs/user_guide/evaluators.md
@@ -0,0 +1,92 @@
# Evaluations

Fast-LLM allows you to perform various evaluations during training or as a separate evaluation step. In both cases, you need to use your training config with `training.evaluators` specified.

For evaluators used during training, both `interval` and `offset` must be specified. Then, start training as usual with:

`fast-llm train gpt --config path/to/training/config.yaml`
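
For illustration, here is a minimal sketch of where `interval` and `offset` go. The evaluator name `my_eval`, the dataset name `my_dataset`, and the exact placement of `offset` next to `interval` are assumptions based on the examples later in this page, not verified defaults:

```yaml
training:
  evaluations:
    my_eval:            # illustrative evaluator name (assumption)
      interval: 100     # run this evaluator every 100 training iterations
      offset: 0         # iteration offset of the first run (assumed to sit alongside interval)
      evaluator:
        type: loss
        iterations: 10
        dataset_name: my_dataset   # must match an entry under data.datasets
```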

To perform evaluation as a separate step, use the same training config. Depending on the training progress, either the start model or the latest checkpoint will be loaded, and `interval` and `offset` will be ignored. To start evaluation:

`fast-llm evaluate gpt --config path/to/training/config.yaml`

## Currently Supported Evaluators

- `loss`
- `lm_eval`

## Loss Evaluator

To set up loss evaluation, specify a dataset to be used in the `data.datasets` section of the config. You must also define the loss evaluator in the `training.evaluators` config section. See example below.

```yaml
training:
  evaluations:
    stack_3b:
      interval: 10
      evaluator:
        type: loss
        iterations: 10
        dataset_name: stack_3b
    fineweb:
      interval: 10
      evaluator:
        type: loss
        iterations: 10
        dataset_name: fineweb
data:
  datasets:
    stack_3b:
      type: memmap
      path: path/to/memmap/dataset
    fineweb:
      type: memmap
      path: path/to/memmap/dataset1
```

## Evaluation Harness (`lm_eval`) Evaluator

**Note:** Only data parallelism is currently supported for the `lm_eval` evaluator.

To run `lm_eval` evaluations, version `0.4.9` of `lm_eval` must be installed along with all dependencies required for your evaluation tasks.
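
For example, assuming the package is published on PyPI under the name `lm_eval`, it can typically be installed with `pip install "lm_eval==0.4.9"`, plus any task-specific extras your evaluation tasks require.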

The following environment variables may need to be set:

- `HF_HOME`: Path for Hugging Face data caching
- `WANDB_API_KEY_PATH`: Path to a file containing your Weights & Biases API key (if logging to W&B)
- `HUGGINGFACE_API_KEY_PATH`: Path to a file containing your Hugging Face hub token
- `NLTK_DATA`: Path to a directory that will contain downloaded NLTK packages (needed for some tasks)
- `HF_ALLOW_CODE_EVAL=1`: Required for some evaluation tasks

You may need to specify additional environment variables depending on the `lm_eval` tasks you want to run.
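
As a minimal sketch of a typical setup (all paths below are placeholders, not defaults):

```bash
export HF_HOME=/path/to/hf_cache                   # Hugging Face cache directory
export WANDB_API_KEY_PATH=/path/to/wandb_api_key   # only if logging to W&B
export HUGGINGFACE_API_KEY_PATH=/path/to/hf_token  # file containing your Hugging Face hub token
export NLTK_DATA=/path/to/nltk_data                # directory for downloaded NLTK packages
export HF_ALLOW_CODE_EVAL=1                        # required by some tasks

fast-llm evaluate gpt --config path/to/training/config.yaml
```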

An `lm_eval` evaluator is configured through the following parts:

### Model Config

The model instantiated for training is reused for evaluation, so you don't need to specify it separately. However, there are some parameters specific to `lm_eval`. See `fast_llm/engine/evaluation/config.EvaluatorLmEvalConfig` for details.

### CLI Parameters for `lm_eval`

All other parameters are specified as a list of strings, exactly as if you were calling the `lm_eval` CLI. Some CLI parameters are ignored or restricted, specifically those related to model loading, W&B, batch sizes, and device setup, since these are managed by the rest of the Fast-LLM configuration.

Also, the tokenizer must be specified in `data.tokenizer`. If the tokenizer does not have a `bos_token`, it must be specified explicitly in `data.tokenizer.bos_token`. Although `lm_eval` does not use the `bos_token` directly, it is still required because the same tokenizer is used by other Fast-LLM components.
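
For instance, a minimal sketch of the tokenizer section (the `<s>` value is a placeholder assumption; use whatever start-of-sequence token your tokenizer expects):

```yaml
data:
  tokenizer:
    path: path/to/the/tokenizer
    bos_token: "<s>"   # only needed if the tokenizer itself does not define a bos_token
```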

Below is an example of the config:

```yaml
training:
  evaluations:
    lm_eval_tasks1:
      interval: 10
      evaluator:
        type: lm_eval
        cli_args:
          - --tasks
          - gsm8k,xnli_en,wikitext,ifeval
          - --output_path
          - /path/to/lm_eval/output
data:
  tokenizer:
    path: path/to/the/tokenizer
```
11 changes: 11 additions & 0 deletions fast_llm/cli.py
@@ -1,4 +1,5 @@
import logging
import os
import sys
import traceback

@@ -7,6 +8,16 @@
from fast_llm.engine.config_utils.run import log_main_rank
from fast_llm.engine.config_utils.runnable import RunnableConfig

# This must be set before importing numexpr,
# because by default, the maximum number of threads is 64.
# On systems with more cores, numexpr logs an error and
# ignores the thread setting if it exceeds the limit.
if "NUMEXPR_MAX_THREADS" not in os.environ:
import multiprocessing as mp

os.environ["NUMEXPR_MAX_THREADS"] = str(mp.cpu_count())


# Import these submodules to ensure classes are added to the dynamic class registry.
import fast_llm.data.auto # isort: skip
import fast_llm.engine.checkpoint.convert # isort: skip