DeepBGC failed with ValueError: Grouper for 'sequence_id' not 1-dimensional #100

@marlaux

Hello,
Could you please help me with the following error: DeepBGC failed with ValueError: Grouper for 'sequence_id' not 1-dimensional
Thank you very much!

Commands:

deepbgc prepare --output-tsv cat_saxi_clusters_refs.prepared.tsv cat_saxi_clusters_refs_genomic.fasta
deepbgc train --model deepbgc.json --output SaxiDetector.pkl --config PFAM2VEC ./pfam2vec.csv Saxi_Positives.incluster.pfam.tsv Fake_negatives.pfam.tsv
ERROR 29/07 18:14:08 DeepBGC failed with ValueError: Grouper for 'sequence_id' not 1-dimensional

cat_saxi_clusters_refs_genomic.fasta contains six nucleotide sequences of one BGC, one from each of six species.

Using the GeneSwap_Negatives.pfam.tsv file as a template, I edited cat_saxi_clusters_refs.prepared.tsv so that it has the same columns, including 'sequence_id'. In the positive file, 'sequence_id' holds six BGC identifiers; in the edited Fake_negatives.pfam.tsv it is always 'NEG_FAKE_CLUSTER'.
Both files have these columns:
sequence_id|contig_id|protein_id|gene_start|gene_end|gene_strand|pfam_id|domain_start|domain_end|bitscore|in_cluster

Saxi_Positives.incluster.pfam.tsv has in_cluster = 1 and six sequence_id values to group by during training.
Fake_negatives.pfam.tsv has in_cluster = 0 and a single sequence_id value.
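
Before training, the header of each input file can be checked against the column list above. The following is a minimal sketch, not part of DeepBGC: the helper name header_problems is hypothetical, and it reads the raw header with the Python standard library so that a repeated column name is reported exactly as written in the file.

```python
import csv
from collections import Counter

# Column names copied from the list given in this issue.
EXPECTED_COLUMNS = [
    "sequence_id", "contig_id", "protein_id", "gene_start", "gene_end",
    "gene_strand", "pfam_id", "domain_start", "domain_end", "bitscore",
    "in_cluster",
]


def header_problems(path, expected=EXPECTED_COLUMNS):
    """Report missing, unexpected, and repeated names in a TSV header.

    Hypothetical helper for illustration; it only inspects the first line.
    """
    with open(path, newline="") as handle:
        header = next(csv.reader(handle, delimiter="\t"))
    counts = Counter(header)
    return {
        "missing": [name for name in expected if name not in counts],
        "unexpected": [name for name in header if name not in expected],
        "repeated": sorted(name for name, n in counts.items() if n > 1),
    }
```

Calling header_problems("Saxi_Positives.incluster.pfam.tsv") and header_problems("Fake_negatives.pfam.tsv") should return three empty lists each if both headers match the list above.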

I downloaded deepbgc.json and pfam2vec.csv from GitHub.

Complete error message:
Traceback (most recent call last):
  File "/home/marlaux/anaconda3/envs/deepbgc/lib/python3.7/site-packages/deepbgc/main.py", line 113, in main
    run(argv)
  File "/home/marlaux/anaconda3/envs/deepbgc/lib/python3.7/site-packages/deepbgc/main.py", line 102, in run
    args.func.run(**args_dict)
  File "/home/marlaux/anaconda3/envs/deepbgc/lib/python3.7/site-packages/deepbgc/command/train.py", line 60, in run
    train_samples, train_y = util.read_samples(inputs, target)
  File "/home/marlaux/anaconda3/envs/deepbgc/lib/python3.7/site-packages/deepbgc/util.py", line 561, in read_samples
    samples = [sample for sample_id, sample in domains.groupby('sequence_id')]
  File "/home/marlaux/anaconda3/envs/deepbgc/lib/python3.7/site-packages/pandas/core/generic.py", line 7632, in groupby
    observed=observed, **kwargs)
  File "/home/marlaux/anaconda3/envs/deepbgc/lib/python3.7/site-packages/pandas/core/groupby/groupby.py", line 2110, in groupby
    return klass(obj, by, **kwds)
  File "/home/marlaux/anaconda3/envs/deepbgc/lib/python3.7/site-packages/pandas/core/groupby/groupby.py", line 360, in __init__
    mutated=self.mutated)
  File "/home/marlaux/anaconda3/envs/deepbgc/lib/python3.7/site-packages/pandas/core/groupby/grouper.py", line 602, in _get_grouper
    if not isinstance(gpr, Grouping) else gpr)
  File "/home/marlaux/anaconda3/envs/deepbgc/lib/python3.7/site-packages/pandas/core/groupby/grouper.py", line 322, in __init__
    "Grouper for '{}' not 1-dimensional".format(t))
ValueError: Grouper for 'sequence_id' not 1-dimensional
ERROR 29/07 18:14:08 ================================================================================
ERROR 29/07 18:14:08 DeepBGC failed with ValueError: Grouper for 'sequence_id' not 1-dimensional
ERROR 29/07 18:14:08 ================================================================================
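
For reference, pandas raises this same ValueError when the label passed to groupby matches more than one column, because selecting that label then yields a two-dimensional DataFrame instead of a single column. The sketch below reproduces the message on a toy frame. The data values and the helper name try_groupby are made up for illustration, and whether the frame that DeepBGC builds from the files above actually ends up with a repeated 'sequence_id' label is an assumption that has not been verified here.

```python
import pandas as pd

# Toy frame with made-up values; two columns deliberately share one label.
frame = pd.DataFrame(
    [["BGC_A", "BGC_A", "PF00001"], ["BGC_B", "BGC_B", "PF00002"]],
    columns=["sequence_id", "sequence_id", "pfam_id"],
)


def try_groupby(df, key="sequence_id"):
    """Return the number of groups, or the error text if grouping fails."""
    try:
        return df.groupby(key).ngroups
    except ValueError as error:
        return str(error)


# Grouping by the repeated label fails with the "not 1-dimensional" error.
message = try_groupby(frame)

# Keeping only the first column for each label makes the grouper 1-dimensional.
deduplicated = frame.loc[:, ~frame.columns.duplicated()]
group_count = try_groupby(deduplicated)
```

Printing frame.columns[frame.columns.duplicated()] on a loaded table is a quick way to see whether any label is repeated.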
