error: list not sorted #1

Open
matthpich opened this issue Mar 10, 2018 · 14 comments
Comments

@matthpich

Hi,
I ran MeShClust on 700k sequences and got the following error message:

Using 16 bit histograms
Counting 4-mers [======================================================] 100 %
Splitting data
Point pairs: 38
Sorting data [=========================================================] 100 %
Warning: Alignment may be too large for sampling
Before Pair: >158256496-stool1_revised_C820061_1_gene84242      strand:+, >158256496-stool1_revised_C820061_1_gene84242 strand:+
Before Pair: >158256496-stool1_revised_C820061_1_gene84242      strand:+, >158256496-stool1_revised_C844273_1_gene26404 strand:+
Before Pair: >158256496-stool1_revised_C820061_1_gene84242      strand:+, >158256496-stool1_revised_C850045_1_gene50883 strand:-
Before Pair: >158256496-stool1_revised_C820061_1_gene84242      strand:+, >158256496-stool1_revised_C928413_1_gene23126 strand:-
Alignment [============================================================] 100 %
positive=56 negative=1008
resizing positive
Vector size: 56 min size: 56
resizing negative
Vector size: 1008 min size: 56
index size: 952
positive=56 negative=56
Adding combo 18
new single feature 2
new single feature 16
Adding combo 6
new single feature 4
Adding combo 32
new single feature 32
bounds[0]: 0 to 16290
bounds[1]: 0.0944969 to 1
bounds[2]: 0 to 16290
bounds[3]: -0.188998 to 15.5225
Accuracy: 96.4286% Sensitivity: 100% Specificity: 92.8571% 
Accuracy: 94.6429% Sensitivity: 100% Specificity: 89.2857% 
Adding combo 1026
new single feature 1024
bounds[0]: 0 to 16290
bounds[1]: 0.0944969 to 1
bounds[2]: 0 to 16290
bounds[3]: -0.188998 to 15.5225
bounds[4]: 34393 to 65536
Accuracy: 98.2143% Sensitivity: 100% Specificity: 96.4286% 
Accuracy: 100% Sensitivity: 100% Specificity: 100% 
breaking from acc cutoff
Final: feat size is 4
Using 4 features Mar  9 2018
error: list not sorted===============>                                 ] 40 %
terminate called after throwing an instance of 'int'

Can this be overcome?
Many thanks,
Matthieu

@matthpich changed the title from "Warning: Alignment may be too large for sampling" to "error: list not sorted" on Mar 10, 2018
@CBorreda

I'm having the same problem with one file. Did you solve it?

@benjamin-james
Member

The new commit should fix this issue.

@jgoodson

jgoodson commented Sep 7, 2018

I have a different, but possibly related, error with the same error message when trying to cluster 500k short sequences. This is on the current Master.

❯❯❯ ~/software/MeShClust/bin/meshclust experiment.fasta --id 0.6 --threads 16 --output experiment.clstr
avg length: 74
Recommended K: 3
Reading in sequences [=================================================] 100 %
Using 8 bit histograms
Counting 3-mers [======================================================] 100 %
Splitting data
Point pairs: 38
Sorting data [=========================================================] 100 %
Before Pair: >JONATHAN:1:34:1:10053:35678:13647 1:N:0:1, >JONATHAN:1:34:1:50854:28360:84667 1:N:0:1
Before Pair: >JONATHAN:1:34:1:10069:82808:87017 2:N:0:1, >JONATHAN:1:34:1:26251:60815:76440 1:N:0:1
Before Pair: >JONATHAN:1:34:1:1007:70526:36221 2:N:0:1, >JONATHAN:1:34:1:77317:28403:22037 1:N:0:1
Before Pair: >JONATHAN:1:34:1:10179:75351:98929 2:N:0:1, >JONATHAN:1:34:1:74366:29020:95042 2:N:0:1
Alignment [============================================================] 100 %
positive=785 negative=735
resizing positive
Vector size: 785 min size: 735
index size: 50
resizing negative
Vector size: 735 min size: 735
positive=735 negative=735
Adding combo 18
new single feature 2
new single feature 16
Adding combo 6
new single feature 4
Adding combo 32
new single feature 32
bounds[0]: 0 to 3
bounds[1]: 0.632353 to 1
bounds[2]: 0 to 100
bounds[3]: -0.34485 to 1
Accuracy: 100% Sensitivity: 100% Specificity: 100%
Accuracy: 99.7283% Sensitivity: 100% Specificity: 99.4565%
breaking from acc cutoff
Final: feat size is 3
Using 3 features Sep  7 2018
error: list is not sorted
error: no bins to insert into, item not inserted
[1]    17949 segmentation fault  ~/software/MeShClust/bin/meshclust experiment.fasta --id 0.6  16

@liupfskygre

Hi,
I got the same problem. I want to cluster 40 bp sequences (10K) with --id 0.6:
bin/meshclust 100000_seqs_40_40_bp.fasta --id 0.60 --output 100000_seqs_40_40_bp.clstr --threads 6

avg length: 40
Recommended K: 2
Reading in sequences [=================================================] 100 %
Using 8 bit histograms
Counting 2-mers [======================================================] 100 %
Splitting data
Point pairs: 38
Sorting data [=========================================================] 100 %
Warning: Alignment may be too large for sampling
Before Pair: >A10028|random sequence|A: 0.25|C: 0.25|G: 0.25|T: 0.25|length: 40 bp, >A45634|random sequence|A: 0.25|C: 0.25|G: 0.25|T: 0.25|length: 40 bp
Before Pair: >A10034|random sequence|A: 0.25|C: 0.25|G: 0.25|T: 0.25|length: 40 bp, >A28459|random sequence|A: 0.25|C: 0.25|G: 0.25|T: 0.25|length: 40 bp
Before Pair: >A1003|random sequence|A: 0.25|C: 0.25|G: 0.25|T: 0.25|length: 40 bp, >A94460|random sequence|A: 0.25|C: 0.25|G: 0.25|T: 0.25|length: 40 bp
Before Pair: >A10065|random sequence|A: 0.25|C: 0.25|G: 0.25|T: 0.25|length: 40 bp, >A94460|random sequence|A: 0.25|C: 0.25|G: 0.25|T: 0.25|length: 40 bp
Alignment [============================================================] 100 %
positive=45 negative=977
resizing positive
Vector size: 45 min size: 45
resizing negative
Vector size: 977 min size: 45
index size: 932
positive=45 negative=45
Adding combo 18
new single feature 2
new single feature 16
Adding combo 6
new single feature 4
Adding combo 32
new single feature 32
bounds[0]: 0 to 2.22507e-308
bounds[1]: 0.709091 to 1
bounds[2]: 0 to 32
bounds[3]: -0.388057 to 1
Accuracy: 86.3636% Sensitivity: 86.3636% Specificity: 86.3636%
Accuracy: 91.3043% Sensitivity: 91.3043% Specificity: 91.3043%
Adding combo 1026
new single feature 1024
bounds[0]: 0 to 2.22507e-308
bounds[1]: 0.709091 to 1
bounds[2]: 0 to 32
bounds[3]: -0.388057 to 1
bounds[4]: 181.527 to 256
Accuracy: 81.8182% Sensitivity: 86.3636% Specificity: 77.2727%
Accuracy: 89.1304% Sensitivity: 100% Specificity: 78.2609%
Final: feat size is 4
Using 4 features Sep 5 2018
error: list is not sorted
error: no bins to insert into, item not inserted
Segmentation fault (core dumped)

Do you have a solution for this yet?
Thanks!
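
For anyone who wants to try reproducing this report, test input of the same kind (short, uniformly random sequences) can be regenerated. The following is a minimal sketch, not the script used for the original file: the function names, the seed, and the simplified header are assumptions made for illustration, and there is no guarantee that a freshly generated file triggers the same error.

```python
import random


def random_fasta_records(n_seqs, length=40, seed=0, alphabet="ACGT"):
    """Yield (header, sequence) pairs of uniformly random DNA.

    The header layout is a simplified, hypothetical version of the one
    seen in the log above.
    """
    rng = random.Random(seed)  # fixed seed so the file can be regenerated
    for i in range(n_seqs):
        seq = "".join(rng.choice(alphabet) for _ in range(length))
        yield f"A{i}|random sequence|length: {length} bp", seq


def write_fasta(records, path):
    """Write (header, sequence) pairs as single-line FASTA; return the count."""
    count = 0
    with open(path, "w") as handle:
        for header, seq in records:
            handle.write(f">{header}\n{seq}\n")
            count += 1
    return count
```

Calling `write_fasta(random_fasta_records(100000), "random_40bp.fasta")` would produce a file comparable in size to the one named in the command above; the output file name here is a placeholder.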

@benjamin-james
Member

No solution yet, but I am working on it.
Do you have the data that caused the error? Thanks.

@jgoodson

Here is a sample of some data that causes this error.

https://gist.github.com/jgoodson/253f56ef4c49388304eb51fc42b9eeba

With this input, a call to MeShClust with default options does not crash and returns

Identity value does not match sampled data: Too many sequences below identity

If I specify an identity value, even the default of 0.90, I get the previous error:

error: list is not sorted
error: no bins to insert into, item not inserted
[1]    11218 segmentation fault  ~/software/MeShClust/bin/meshclust exp5ks.fasta --id 0.90 --output /dev/null
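
The comparison described above (one run with default options, one with an explicit `--id`) can be scripted so that the exit status and the relevant messages are captured side by side. This is a rough sketch that uses only the flags that appear in this thread (`--id`, `--output`, `--threads`); the helper names and the binary path are placeholders, and the message filter is an assumption based on the logs shown here.

```python
import subprocess


def meshclust_command(binary, fasta, identity=None, output=None, threads=None):
    """Build an argument list using only the flags seen in this thread."""
    cmd = [binary, fasta]
    if identity is not None:
        cmd += ["--id", str(identity)]
    if output is not None:
        cmd += ["--output", output]
    if threads is not None:
        cmd += ["--threads", str(threads)]
    return cmd


def run_and_summarize(cmd):
    """Run a command; return its exit code and the diagnostic lines.

    On POSIX, a negative return code means the process was killed by a
    signal (for example, -11 for a segmentation fault).
    """
    proc = subprocess.run(cmd, capture_output=True, text=True)
    combined = (proc.stdout + proc.stderr).splitlines()
    messages = [
        line for line in combined
        if "error:" in line or line.startswith("Identity value")
    ]
    return proc.returncode, messages
```

For example, `run_and_summarize(meshclust_command("bin/meshclust", "exp5ks.fasta", output="/dev/null"))` and the same call with `identity=0.90` would give the two results to compare.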

@benjamin-james
Member

Thanks

@AnaSofia94

AnaSofia94 commented Oct 4, 2018

Sorry to bother you, but have you found a solution to this problem?
Thanks

@benjamin-james
Member

Not yet

@matsen

matsen commented Oct 25, 2018

Hi @benjamin-james -- I'm sure you're quite busy, but we're also hitting this problem. Can you help us understand if this is something likely to be fixed in the next few weeks, or is it something bigger that will require a significant amount of time?

@benjamin-james
Member

Close. It is fixed in a few places, but not in all cases yet.

@benjamin-james
Member

The current master should fix this bug.

@ElHirad

ElHirad commented Jun 30, 2019

Hello. I'm still having these issues with MeShClust. What should I do? The sequences are about 1 kbp long, and there are about 300k of them.

avg length: 972
Recommended K: 4
Reading in sequences [=================================================] 100 %
Using 16 bit histograms
Counting 4-mers [======================================================] 100 %
Splitting data
Point pairs: 38
Sorting data [=========================================================] 100 %
Warning: Alignment may be too large for sampling
Before Pair: >align_id:1854781|asmbl_145 gene=PASA_cluster_114, >align_id:1942275|asmbl_87639 gene=PASA_cluster_75689
Before Pair: >align_id:1855658|asmbl_1022 gene=PASA_cluster_869, >align_id:1932409|asmbl_77773 gene=PASA_cluster_67074
Before Pair: >align_id:1855659|asmbl_1023 gene=PASA_cluster_870, >align_id:2204054|asmbl_349418 gene=PASA_cluster_288159
Before Pair: >align_id:1855697|asmbl_1061 gene=PASA_cluster_907, >align_id:1917658|asmbl_63022 gene=PASA_cluster_54328
Alignment [============================================================] 100 %
positive=45 negative=1019
resizing positive
Vector size: 45 min size: 45
resizing negative
Vector size: 1019 min size: 45
index size: 974
positive=45 negative=45
Adding combo 18
new single feature 2
new single feature 16
Adding combo 6
new single feature 4
Adding combo 32
new single feature 32
bounds[0]: 0 to 17418
bounds[1]: 0.0997519 to 1
bounds[2]: 0 to 17418
bounds[3]: 0.272044 to 1.52156
Inverse does not exist
Accuracy: 0% Sensitivity: 0% Specificity: 0% 
Accuracy: 0% Sensitivity: 0% Specificity: 0% 
Adding combo 1026
new single feature 1024
bounds[0]: 0 to 17418
bounds[1]: 0.0997519 to 1
bounds[2]: 0 to 17418
bounds[3]: 0.272044 to 1.52156
bounds[4]: 34488.1 to 65536
Inverse does not exist
Accuracy: 0% Sensitivity: 0% Specificity: 0% 
Accuracy: 0% Sensitivity: 0% Specificity: 0% 
Final: feat size is 4
Using 4 features Feb  2 2018
error: list not sorted                                                 ] 2 %
terminate called after throwing an instance of 'int'
Aborted (core dumped)

@hani-girgis
Member

Hi. I am happy to help. In order to reproduce this error on my machine, would you share the input sequences that caused this error? My email address is hzgirgis at buffalo dot edu
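
When the full input is too large to send, a random subset is sometimes enough to reproduce a crash. Below is a minimal sketch of one way to cut a FASTA file down before sharing it; the function names and the seed are illustrative assumptions, the whole file is read into memory, and the subset has to be re-run through MeShClust to confirm that it still triggers the error.

```python
import random


def read_fasta(path):
    """Yield (header, sequence) pairs, joining multi-line sequences."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.rstrip("\n")
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            elif header is not None:
                chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def subsample_fasta(src, dst, n, seed=0):
    """Write up to n randomly chosen records from src to dst; return the count."""
    records = list(read_fasta(src))
    rng = random.Random(seed)  # fixed seed so the subset can be regenerated
    chosen = records if n >= len(records) else rng.sample(records, n)
    with open(dst, "w") as out:
        for header, seq in chosen:
            out.write(f">{header}\n{seq}\n")
    return len(chosen)
```

If a subset of a given size no longer fails, trying a different seed or a larger `n` may help; which records matter for this error is not known from the logs in this thread.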
