Yes, training on DataSAIL splits generally leads to better-generalizing models. The DataSAIL splits are designed to reduce information leakage between splits.
Therefore, when used for hyperparameter tuning, they help select models (and their hyperparameters) that generalize better to unseen data.
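To make the notion of information leakage concrete, the following sketch (not part of DataSAIL) assumes that every sample already carries a cluster label, as produced by a clustering step, and measures the fraction of test samples that share a cluster with at least one training sample. The helper names and the toy labels are hypothetical. A split that assigns whole clusters to one side cannot leak in this sense, whereas a random split usually does.

.. code-block:: python

    import random


    def leakage_fraction(clusters, train_idx, test_idx):
        """Fraction of test samples whose cluster also occurs in the training set."""
        train_clusters = {clusters[i] for i in train_idx}
        leaked = sum(1 for i in test_idx if clusters[i] in train_clusters)
        return leaked / len(test_idx)


    def random_split(n, test_fraction, seed=0):
        """Assign samples to train/test at random, ignoring their clusters."""
        idx = list(range(n))
        random.Random(seed).shuffle(idx)
        n_test = round(n * test_fraction)
        return idx[n_test:], idx[:n_test]


    def cluster_split(clusters, test_clusters):
        """Put every sample of a cluster on the same side (no cluster is shared)."""
        train = [i for i, c in enumerate(clusters) if c not in test_clusters]
        test = [i for i, c in enumerate(clusters) if c in test_clusters]
        return train, test


    # Hypothetical toy data: 10 samples in 4 clusters of similar items.
    clusters = ["A", "A", "A", "B", "B", "B", "C", "C", "D", "D"]

    train, test = random_split(len(clusters), test_fraction=0.2)
    print("random split leakage: ", leakage_fraction(clusters, train, test))

    train, test = cluster_split(clusters, test_clusters={"D"})
    print("cluster split leakage:", leakage_fraction(clusters, train, test))  # 0.0
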

2. What are the limitations of DataSAIL?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The most time- and memory-consuming step in DataSAIL is the clustering of the data. For most data types, this is done by third-party programs such as FoldSeek,
DIAMOND, or MASH. In those cases, DataSAIL has no control over their runtime and memory consumption. However, users can provide their own command-line arguments to these
programs.

Practical Questions
-------------------

1. How can I relax the split constraints if DataSAIL fails to find a solution?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Sometimes, DataSAIL is unable to solve the split problem and might output a message like:

.. code-block:: shell

DataSAIL compiles your input into multiple variables and constraints that for a …

- If you are already on :code:`v1.2.0` or newer, you can set :code:`epsilon` to a higher value. The default is :code:`0.05`, but anything up to :code:`0.2`
  or :code:`0.3` is reasonable. If you use stratification, you also need to increase :code:`delta`, because both values are coupled in that scenario.
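Why does a larger :code:`epsilon` help? The toy sketch below assumes (an illustrative assumption, not DataSAIL's exact formulation) that whole clusters must go to a single split and that :code:`epsilon` bounds the relative deviation of each split's size from its requested size, i.e. :code:`abs(n_k - s_k * N) <= epsilon * s_k * N`, where :code:`n_k` is the size of split :code:`k`, :code:`s_k` its requested fraction, and :code:`N` the number of samples. It enumerates every cluster-to-split assignment by brute force, so it only suits tiny inputs; the function name and cluster sizes are hypothetical.

.. code-block:: python

    from itertools import product


    def feasible_assignments(cluster_sizes, split_ratios, epsilon):
        """Yield every cluster-to-split assignment whose split sizes deviate from
        the requested sizes by at most epsilon (relative; assumed semantics)."""
        total = sum(cluster_sizes)
        targets = [ratio * total for ratio in split_ratios]
        # assignment[c] is the index of the split that cluster c is placed in.
        for assignment in product(range(len(split_ratios)), repeat=len(cluster_sizes)):
            sizes = [0] * len(split_ratios)
            for cluster, split in enumerate(assignment):
                sizes[split] += cluster_sizes[cluster]
            if all(abs(size - target) <= epsilon * target for size, target in zip(sizes, targets)):
                yield assignment


    # Hypothetical cluster sizes (100 samples) and a requested 80/20 split.
    cluster_sizes = [45, 30, 17, 8]
    for eps in (0.05, 0.2, 0.3):
        count = sum(1 for _ in feasible_assignments(cluster_sizes, [0.8, 0.2], eps))
        print(f"epsilon={eps}: {count} feasible assignment(s)")

With these hypothetical sizes, no combination of whole clusters lands close enough to the requested 20 test samples at :code:`epsilon=0.05`, one assignment fits at :code:`0.2`, and two fit at :code:`0.3`: loosening the tolerance enlarges the set of valid splits the solver can choose from.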

2. DataSAIL shows a log message stating that the solution found is :code:`optimal_inaccurate`. What does that mean?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This message means that the solver in DataSAIL found a solution, but the optimization did not finish because it was terminated by the timeout.
Therefore, the solution is not guaranteed to be optimal, but it is still a valid solution that satisfies all constraints and, in most cases, is close to optimal.
You can therefore use an :code:`optimal_inaccurate` solution without problems.
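The difference between a proven optimum and a best-so-far answer can be illustrated with a toy search that stops after a fixed number of evaluated candidates, standing in for the solver timeout. This is only an analogy: it is not how DataSAIL's solver works internally, and the function name, the step budget, and the reuse of the status strings are assumptions for illustration. Whatever the search returns satisfies the size constraint; only a search that runs to completion can also prove optimality.

.. code-block:: python

    from itertools import product


    def split_with_budget(cluster_sizes, test_ratio, epsilon, max_steps):
        """Toy two-way split: choose which clusters go to the test set (1) or the
        training set (0) so that the test size is closest to its target.

        The search stops after `max_steps` candidates, a stand-in for a solver
        timeout, and returns (best_assignment, status).
        """
        target = test_ratio * sum(cluster_sizes)
        best, best_dev = None, float("inf")
        for step, assignment in enumerate(product((0, 1), repeat=len(cluster_sizes))):
            if step >= max_steps:
                # Budget exhausted: the best valid candidate so far is not proven optimal.
                return best, ("optimal_inaccurate" if best is not None else "no_solution_yet")
            test_size = sum(size for size, a in zip(cluster_sizes, assignment) if a == 1)
            deviation = abs(test_size - target)
            # Constraint: the test size must stay within the relative tolerance epsilon.
            if deviation <= epsilon * target and deviation < best_dev:
                best, best_dev = assignment, deviation
        # Every candidate was checked, so the best one is provably optimal.
        return best, ("optimal" if best is not None else "infeasible")


    sizes = [45, 30, 17, 8]  # hypothetical cluster sizes, target test size 20
    print(split_with_budget(sizes, 0.2, 0.3, max_steps=4))    # ((0, 0, 1, 0), 'optimal_inaccurate')
    print(split_with_budget(sizes, 0.2, 0.3, max_steps=100))  # ((0, 0, 1, 0), 'optimal')

In this example, the budget-limited run already returns the same assignment as the exhaustive run, but it can only report :code:`optimal_inaccurate` because it never checked the remaining candidates.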

3. I set :code:`runs>1`, but DataSAIL outputs the same splits each time. Why is that?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

When you set :code:`runs` to a value greater than :code:`1`, DataSAIL shuffles the dataset between splitting rounds so that the optimization starts from different initializations.
However, because many datasets have a unique optimal solution, DataSAIL may find the same solution in several rounds and output it multiple times.
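The effect can be reproduced with a toy exhaustive optimizer (a hypothetical stand-in, not DataSAIL's solver) that is rerun on a freshly shuffled copy of the input, mirroring the shuffling described for :code:`runs>1`. Because the toy data have exactly one best test set, every shuffle leads to the same split; in this toy, differing splits can only appear when several assignments tie for the best objective and the shuffled order decides which one is found first.

.. code-block:: python

    import random
    from itertools import product


    def best_split(named_sizes, test_ratio):
        """Exhaustively pick the set of clusters for the test split whose total size
        is closest to the target; ties go to the first candidate in iteration order."""
        target = test_ratio * sum(size for _, size in named_sizes)
        best, best_dev = None, float("inf")
        for assignment in product((0, 1), repeat=len(named_sizes)):
            test = frozenset(name for (name, _), a in zip(named_sizes, assignment) if a)
            test_size = sum(size for (_, size), a in zip(named_sizes, assignment) if a)
            deviation = abs(test_size - target)
            if deviation < best_dev:
                best, best_dev = test, deviation
        return best


    def repeated_runs(named_sizes, test_ratio, runs, seed=0):
        """Shuffle the input before each run and collect the distinct test sets found."""
        rng = random.Random(seed)
        solutions = set()
        for _ in range(runs):
            shuffled = list(named_sizes)
            rng.shuffle(shuffled)
            solutions.add(best_split(shuffled, test_ratio))
        return solutions


    # Hypothetical clusters: only {"C"} (size 17) gets closest to a test size of 20.
    clusters = [("A", 45), ("B", 30), ("C", 17), ("D", 8)]
    print(repeated_runs(clusters, test_ratio=0.2, runs=5))  # {frozenset({'C'})}
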