Hello!
I found an AI-specific code smell in your project.
The smell is called: TensorArray Not Used
You can find more information about it in this paper: https://dl.acm.org/doi/abs/10.1145/3522664.3528620.
According to the paper, the smell is described as follows:
**Problem:** If the developer initializes an array with `tf.constant()` and then tries to assign new values to it inside a loop to make it grow, the code will raise an error. The error can be worked around with the low-level `tf.while_loop()` API, but coding this way is inefficient: many intermediate tensors are built in the process.

**Solution:** In TensorFlow 2, using `tf.TensorArray()` to grow an array inside a loop is the better solution for this kind of problem.

**Impact:** Efficiency, error-proneness
Example:
### TensorFlow
```diff
 import tensorflow as tf

 @tf.function
 def fibonacci(n):
     a = tf.constant(1)
     b = tf.constant(1)
-    c = tf.constant([1, 1])
+    c = tf.TensorArray(tf.int32, n)
+    c = c.write(0, a)
+    c = c.write(1, b)
     for i in range(2, n):
         a, b = b, a + b
-        c = tf.concat([c, [b]], 0)
+        c = c.write(i, b)
-    return c
+    return c.stack()
```
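Putting the `+` lines of the diff above together gives a self-contained version of the fix that you can run directly (`tf.TensorArray`, `write()`, and `stack()` are all standard TensorFlow 2 APIs):

```python
import tensorflow as tf

@tf.function
def fibonacci(n):
    # Use a TensorArray instead of growing a tf.constant with tf.concat:
    # each write is staged in place, so no chain of intermediate tensors
    # is built while tracing the loop.
    a = tf.constant(1)
    b = tf.constant(1)
    c = tf.TensorArray(tf.int32, size=n)
    c = c.write(0, a)
    c = c.write(1, b)
    for i in range(2, n):
        a, b = b, a + b
        c = c.write(i, b)  # write() returns the updated TensorArray
    return c.stack()       # stack() packs all entries into one tensor

print(fibonacci(7).numpy())  # the first 7 Fibonacci numbers: 1, 1, 2, 3, 5, 8, 13
```

Note that `write()` does not mutate the array in place; it returns a new handle, which is why each result is reassigned to `c`.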
You can find the code related to this smell in CLUE/baselines/models_pytorch/classifier_pytorch/transformers/tokenization_utils.py, lines 855 to 875 (commit 2ea9046):

```python
if add_special_tokens:
    sequence = self.build_inputs_with_special_tokens(ids, pair_ids)
    token_type_ids = self.create_token_type_ids_from_sequences(ids, pair_ids)
    encoded_inputs["special_tokens_mask"] = self.get_special_tokens_mask(ids, pair_ids)
else:
    sequence = ids + pair_ids if pair else ids
    token_type_ids = [0] * len(ids) + ([1] * len(pair_ids) if pair else [])

if return_tensors == 'tf' and is_tf_available():
    sequence = tf.constant([sequence])
    token_type_ids = tf.constant([token_type_ids])
elif return_tensors == 'pt' and is_torch_available():
    sequence = torch.tensor([sequence])
    token_type_ids = torch.tensor([token_type_ids])
elif return_tensors is not None:
    logger.warning("Unable to convert output to tensors format {}, PyTorch or TensorFlow is not available.".format(return_tensors))

encoded_inputs["input_ids"] = sequence
encoded_inputs["token_type_ids"] = token_type_ids

if max_length and len(encoded_inputs["input_ids"]) > max_length:
```
I also found instances of this smell in other files, such as:
File: https://github.com/CLUEbenchmark/CLUE/blob/master/baselines/models/bert/optimization_test.py#L26-L36 Line: 31
File: https://github.com/CLUEbenchmark/CLUE/blob/master/baselines/models/bert_wwm_ext/optimization_test.py#L26-L36 Line: 31
File: https://github.com/CLUEbenchmark/CLUE/blob/master/baselines/models/ernie/optimization_test.py#L26-L36 Line: 31
File: https://github.com/CLUEbenchmark/CLUE/blob/master/baselines/models/roberta_wwm_ext/optimization_test.py#L26-L36 Line: 31
File: https://github.com/CLUEbenchmark/CLUE/blob/master/baselines/models/roberta_wwm_large_ext/optimization_test.py#L26-L36 Line: 31
I hope this information is helpful!