Duplicate nodes issues and request for better implementation #893

@jrkarki

Hi there team,

We are using neomodel to create our graphs and are confused about a couple of its behaviours. I will try to explain the situation below.

In the code below, the function takes 2 arguments, where data_dict is a dictionary of all datasets and node_dict is a dictionary of unique entities from data_dict with specific node attributes.

For instance,

node_dict = {
    'E3Binder': {
        1: {'name': 'E3Binder 7364', 'inchi_key': 'LRGLFYOYKPSPJQ-UHFFFAOYSA-N'},
        2: {'name': 'E3Binder 7365', 'inchi_key': 'LRGLFYOYKPSPJQ-ABCDEFGHHI-B'},
    },
    'E3Ligase': {3: {'name': 'ABC', 'uprot': 'Q23T0Y'}},
}

While creating the graph, as expected, there will be many-to-many relationships, which depend on data_dict. However, we get duplicated nodes when saving nodes, even when the node attributes have not changed. For instance,

e3b_node = node_dict['E3Binder'][1]
e3b_node.save()

creates duplicated nodes every time it appears in the data_dict. How can we avoid this?
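One option that may be worth trying here is neomodel's `get_or_create` class method, which issues a MERGE rather than a plain CREATE and, as far as I understand, matches on the required properties (here `name`). A minimal sketch, assuming the node attributes are available as plain dicts like in the node_dict example above; `merge_node` is a hypothetical helper name, not part of neomodel:

```python
def merge_node(node_cls, props):
    """Return the node matching `props`, creating it only if it is missing.

    `node_cls` is assumed to be a neomodel StructuredNode subclass such as
    E3BinderNode. `get_or_create` accepts one dict per node and returns a
    list of nodes, so a single dict gives a one-item list.
    """
    return node_cls.get_or_create(props)[0]
```

With the classes from this issue, a call could look like `e3b_node = merge_node(E3BinderNode, {'name': 'E3Binder 7364', 'inchi_key': 'LRGLFYOYKPSPJQ-UHFFFAOYSA-N'})`. Calling it again with the same `name` should hand back the stored node instead of writing a second one, though this has not been verified against our database.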

For now, we have found a workaround to avoid duplicate nodes (code below). However, it seems we might have to do it for all functions which create relationships. We have several functions to do this and doing it for all of them this way doesn't seem very convenient. Can you help with a better implementation?
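To avoid repeating the same check in every relationship function, the workaround could perhaps be pulled into two small helpers. This is only a sketch: `get_or_save` and `connect_once` are hypothetical names, and the code relies only on the neomodel calls already used in our function (`nodes.first_or_none`, `save`, `is_connected`, `connect`). It also assumes that `name` identifies a node uniquely.

```python
def get_or_save(node_cls, node, key="name"):
    """Return the stored node whose `key` matches, saving `node` if none exists."""
    existing = node_cls.nodes.first_or_none(**{key: getattr(node, key)})
    if existing is not None:
        return existing
    node.save()
    return node


def connect_once(relationship, target):
    """Connect `target` through `relationship` unless the edge already exists."""
    if not relationship.is_connected(target):
        relationship.connect(target)
```

Each relationship function would then shrink to three calls, for example `e3b_node = get_or_save(E3BinderNode, e3b_node)`, `e3lig_node = get_or_save(E3LigaseNode, e3lig_node)` and `connect_once(e3b_node.has_e3_ligase, e3lig_node)`. It still costs one lookup query per node, so it tidies the code rather than making it faster.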

In py2neo, a single line such as e3rel = Relationship(e3b_node, 'has_e3_ligase ', node_dict['E3 Ligase'][e3ligase]) was very straightforward. We never had issues with duplicate nodes, nor did we have to save nodes before creating relationships.

Thanks,
R

Code looks like this:

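from neomodel import JSONProperty, RelationshipTo, StringProperty, StructuredNode
from tqdm import tqdm


class E3BinderNode(StructuredNode):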
    __label__ = 'E3Binder'
    name = StringProperty(unique_index=True,required=True)
    inchi_key = StringProperty()
    smiles = StringProperty()
    proxidrugsdb = StringProperty()
    pubchem = StringProperty()
    nomenclature = StringProperty()
    molecular_formula = StringProperty()
    chembl_id = StringProperty()
    chembl_url = StringProperty()
    metadata = JSONProperty(default={})
    
    # Relationships
    has_e3_ligase = RelationshipTo('E3LigaseNode', 'hasE3Ligase') 

class E3LigaseNode(StructuredNode):
    __label__ = 'E3Ligase'
    name = StringProperty(unique_index=True,required=True)
    uniprot_id = StringProperty()
    approved_name = StringProperty()
    gene_name = StringProperty()
    function = StringProperty()
    subcellular_location = StringProperty()
    structural_family = StringProperty()
    metadata = JSONProperty(default={})


def createE3b2E3lig_Wh2Prot_Relns_neo4j(data_dict, node_dict):
    """Create E3 binder -> E3 ligase edges, reusing nodes that already exist."""
    for item in tqdm(data_dict, desc='Creating E3binder2E3ligase edges'):
        if item['UBM'] and item['PD_Ubiquitin_Ligase_involved']:
            e3b_regno = item['UBM']['CBDREGNO']
            e3b_node = node_dict['E3Binder'][e3b_regno]

            e3ligase_name = item['PD_Ubiquitin_Ligase_involved'][0]['pd_ubiquitin_ligase_involved']
            e3lig_node = node_dict['E3Ligase'][e3ligase_name]

            # Save each node only if no node with the same name is stored yet;
            # otherwise continue with the stored node.
            e3b_already_saved = E3BinderNode.nodes.first_or_none(name=e3b_node.name)
            if e3b_already_saved is None:
                e3b_node.save()
            else:
                e3b_node = e3b_already_saved

            e3lig_already_saved = E3LigaseNode.nodes.first_or_none(name=e3lig_node.name)
            if e3lig_already_saved is None:
                e3lig_node.save()
            else:
                e3lig_node = e3lig_already_saved

            # Create the edge only if the two nodes are not connected yet.
            if not e3b_node.has_e3_ligase.is_connected(e3lig_node):
                e3b_node.has_e3_ligase.connect(e3lig_node)
