This repository was archived by the owner on Jul 22, 2025. It is now read-only.
FIX: Ignore captions and quotes when detecting locale and update prompts (#1483)
A more deterministic way to make sure the LLM detects the correct language, instead of relying on the prompt to tell the LLM to ignore such content, is to take the post's cooked HTML and remove the unwanted elements before detection.
In this commit:
- we remove quotes, image captions, etc. and use only the remaining text, falling back to the unmodified cooked HTML when nothing remains
- and update the prompts related to detection and translation
- /152465/12
lib/personas/post_raw_translator.rb
9 additions & 20 deletions
@@ -9,20 +9,20 @@ def self.default_enabled
 
       def system_prompt
         <<~PROMPT.strip
-          You are a highly skilled translator tasked with translating content from one language to another. Your goal is to provide accurate and contextually appropriate translations while preserving the original structure and formatting of the content. Follow these instructions carefully:
+          You are a highly skilled translator tasked with translating content from one language to another. Your goal is to provide accurate and contextually appropriate translations while preserving the original structure and formatting of the content. Follow these instructions strictly:
 
-          1. Translate the content accurately while preserving any Markdown, HTML elements, or newlines.
+          1. Preserve Markdown elements, HTML elements, or newlines. Text must be translated without altering the original formatting.
           2. Maintain the original document structure including headings, lists, tables, code blocks, etc.
           3. Preserve all links, images, and other media references without translation.
-          4. Handle code snippets appropriately:
-            - Do not translate variable names, functions, or syntax within code blocks (```).
-            - Translate comments within code blocks.
-          5. For technical terminology:
+          4. For technical terminology:
             - Provide the accepted target language term if it exists.
             - If no equivalent exists, transliterate the term and include the original term in parentheses.
-          6. For ambiguous terms or phrases, choose the most contextually appropriate translation.
-          7. Do not add any content besides the translation.
-          8. Ensure the translation only contains the original language and the target language.
+          5. For ambiguous terms or phrases, choose the most contextually appropriate translation.
+          6. Ensure the translation only contains the original language and the target language.
+
+          Follow these instructions on what NOT to do:
+          7. Do not translate code snippets or programming language names, but ensure that any comments within the code are translated.
+          8. Do not add any content besides the translation.
 
           The text to translate will be provided in JSON format with the following structure:
           {"content": "Text to translate", "target_locale": "Target language code"}
@@ -62,17 +62,6 @@ def examples
             }.to_json,
             {translation: "Nueva actualización para Minecraft añade templos submarinos"}.to_json,
           ],
-          [
-            {
-              content:
-                "# Machine Learning 101\n\nMachine Learning (ML) is a subset of Artificial Intelligence (AI) that focuses on the development of algorithms and statistical models that enable computer systems to improve their performance on a specific task through experience.\n\n## Key Concepts\n\n1. **Supervised Learning**: The algorithm learns from labeled training data.\n2. **Unsupervised Learning**: The algorithm finds patterns in unlabeled data.\n3. **Reinforcement Learning**: The algorithm learns through interaction with an environment.\n\n```python\n# Simple example of a machine learning model\nfrom sklearn.model_selection import train_test_split\nfrom sklearn.linear_model import LogisticRegression\n\n# Assuming X and y are your features and target variables\nX_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)\n\nmodel = LogisticRegression()\nmodel.fit(X_train, y_train)\n\n# Evaluate the model\naccuracy = model.score(X_test, y_test)\nprint(f'Model accuracy: {accuracy}')\n```\n\nFor more information, visit [Machine Learning on Wikipedia](https://en.wikipedia.org/wiki/Machine_learning).",
-              target_locale: "fr",
-            }.to_json,
-            {
-              translation:
-                "# Machine Learning 101\n\nLe Machine Learning (ML) est un sous-ensemble de l'Intelligence Artificielle (IA) qui se concentre sur le développement d'algorithmes et de modèles statistiques permettant aux systèmes informatiques d'améliorer leurs performances sur une tâche spécifique grâce à l'expérience.\n\n## Concepts clés\n\n1. **Apprentissage supervisé** : L'algorithme apprend à partir de données d'entraînement étiquetées.\n2. **Apprentissage non supervisé** : L'algorithme trouve des motifs dans des données non étiquetées.\n3. **Apprentissage par renforcement** : L'algorithme apprend à travers l'interaction avec un environnement.\n\n```python\n# Exemple simple d'un modèle de machine learning\nfrom sklearn.model_selection import train_test_split\nfrom sklearn.linear_model import LogisticRegression\n\n# En supposant que X et y sont vos variables de caractéristiques et cibles\nX_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)\n\nmodel = LogisticRegression()\nmodel.fit(X_train, y_train)\n\n# Évaluer le modèle\naccuracy = model.score(X_test, y_test)\nprint(f'Model accuracy: {accuracy}')\n```\n\nPour plus d'informations, visitez [Machine Learning sur Wikipedia](https://en.wikipedia.org/wiki/Machine_learning).",