Description
Hi there team,
We are using neomodel to build our graphs and are confused by a couple of implementation details. I will try to explain the situation below.
In the code below, the function takes two arguments: data_dict, which holds all of our datasets, and node_dict, a dictionary of the unique entities in data_dict with their node attributes.
For instance,
node_dict = {
    'E3Binder': {
        1: {'name': 'E3Binder 7364', 'inchi_key': 'LRGLFYOYKPSPJQ-UHFFFAOYSA-N'},
        2: {'name': 'E3Binder 7365', 'inchi_key': 'LRGLFYOYKPSPJQ-ABCDEFGHHI-B'},
    },
    'E3Ligase': {3: {'name': 'ABC', 'uprot': 'Q23T0Y'}},
}
While creating the graph there will, as expected, be many-to-many relationships, which depend on data_dict. However, we get duplicate nodes when we save nodes, even when their attributes have not changed. For instance,
e3b_node = node_dict['E3Binder'][1]
e3b_node.save()
creates a duplicate node every time that entity appears in data_dict. How can we avoid this?
For now, we have found a workaround that avoids duplicate nodes (code below). However, it seems we would have to repeat it in every function that creates relationships. We have several such functions, and doing this in all of them is not very convenient. Can you suggest a better implementation?
In py2neo, a single line, e3rel = Relationship(e3b_node, 'has_e3_ligase', node_dict['E3 Ligase'][e3ligase]), was very straightforward. We never had issues with duplicate nodes, nor did we have to save nodes before creating relationships.
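To make the distinction in the question concrete, here is a plain-Python sketch (not neomodel code) contrasting "create a node on every save" with "get or create, keyed on the unique name property". InMemoryStore and its methods are hypothetical stand-ins for the database. For the real case, neomodel documents batch StructuredNode.get_or_create and create_or_update class methods that match on required/unique properties; check the docs for the version in use.

```python
from dataclasses import dataclass, field


@dataclass
class InMemoryStore:
    """Hypothetical stand-in for a graph database: maps node ids to properties."""
    rows: dict = field(default_factory=dict)
    next_id: int = 0

    def create(self, props):
        # Always inserts a new row, like running CREATE for every fresh node.
        node_id = self.next_id
        self.next_id += 1
        self.rows[node_id] = dict(props)
        return node_id

    def get_or_create(self, props, key="name"):
        # Reuses an existing row whose unique key matches (MERGE-like behaviour).
        for node_id, row in self.rows.items():
            if row.get(key) == props[key]:
                return node_id
        return self.create(props)


# The same binder appears twice, as an entity can appear many times in data_dict.
records = [{"name": "E3Binder 7364"}, {"name": "E3Binder 7364"}, {"name": "E3Binder 7365"}]

naive = InMemoryStore()
for rec in records:
    naive.create(rec)

deduped = InMemoryStore()
for rec in records:
    deduped.get_or_create(rec)

print(len(naive.rows), len(deduped.rows))  # 3 2
```

The naive loop ends with one row per occurrence, while the keyed lookup ends with one row per unique name.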
Thanks,
R
Code looks like this:
from neomodel import JSONProperty, RelationshipTo, StringProperty, StructuredNode
from tqdm import tqdm


class E3BinderNode(StructuredNode):
    __label__ = 'E3Binder'
    name = StringProperty(unique_index=True, required=True)
    inchi_key = StringProperty()
    smiles = StringProperty()
    proxidrugsdb = StringProperty()
    pubchem = StringProperty()
    nomenclature = StringProperty()
    molecular_formula = StringProperty()
    chembl_id = StringProperty()
    chembl_url = StringProperty()
    # Callable default, so each node gets its own dict instead of one shared {}.
    metadata = JSONProperty(default=dict)
    # Relationships
    has_e3_ligase = RelationshipTo('E3LigaseNode', 'hasE3Ligase')


class E3LigaseNode(StructuredNode):
    __label__ = 'E3Ligase'
    name = StringProperty(unique_index=True, required=True)
    uniprot_id = StringProperty()
    approved_name = StringProperty()
    gene_name = StringProperty()
    function = StringProperty()
    subcellular_location = StringProperty()
    structural_family = StringProperty()
    # Callable default, so each node gets its own dict instead of one shared {}.
    metadata = JSONProperty(default=dict)
def createE3b2E3lig_Wh2Prot_Relns_neo4j(data_dict, node_dict):
    # E3 Binder to E3 Ligase edges
    for item in tqdm(data_dict, desc='Creating E3binder2E3ligase edges'):
        if item['UBM'] and item['PD_Ubiquitin_Ligase_involved']:
            e3b_regno = item['UBM']['CBDREGNO']
            e3b_node = node_dict['E3Binder'][e3b_regno]
            e3ligase_name = item['PD_Ubiquitin_Ligase_involved'][0]['pd_ubiquitin_ligase_involved']
            e3lig_node = node_dict['E3Ligase'][e3ligase_name]

            # Reuse the stored binder if one with this name exists; otherwise save this one.
            e3b_already_saved = E3BinderNode.nodes.first_or_none(name=e3b_node.name)
            if e3b_already_saved is None:
                e3b_node.save()
            else:
                e3b_node = e3b_already_saved

            # Same lookup-or-save step for the ligase.
            e3lig_already_saved = E3LigaseNode.nodes.first_or_none(name=e3lig_node.name)
            if e3lig_already_saved is None:
                e3lig_node.save()
            else:
                e3lig_node = e3lig_already_saved

            # Only add the edge if it does not exist yet.
            if not e3b_node.has_e3_ligase.is_connected(e3lig_node):
                e3b_node.has_e3_ligase.connect(e3lig_node)
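One way to avoid repeating this workaround in every relationship function is to factor it into two small helpers. This is a sketch, not a neomodel API: get_or_save and connect_once are hypothetical names, and they rely only on the calls the workaround already uses (Model.nodes.first_or_none, node.save(), rel.is_connected, rel.connect). It assumes that name (or whichever property is passed as key) uniquely identifies a node in each class.

```python
def get_or_save(node_cls, node, key="name"):
    """Return the stored node whose `key` matches `node`; save `node` if none exists.

    Hypothetical helper: uses only `node_cls.nodes.first_or_none(**filters)`
    and `node.save()`, as in the workaround above.
    """
    existing = node_cls.nodes.first_or_none(**{key: getattr(node, key)})
    if existing is not None:
        return existing
    node.save()
    return node


def connect_once(rel_manager, target):
    """Connect `target` through `rel_manager` unless that edge already exists."""
    if not rel_manager.is_connected(target):
        rel_manager.connect(target)


# Hypothetical usage inside the loop of createE3b2E3lig_Wh2Prot_Relns_neo4j:
#   e3b_node = get_or_save(E3BinderNode, node_dict['E3Binder'][e3b_regno])
#   e3lig_node = get_or_save(E3LigaseNode, node_dict['E3Ligase'][e3ligase_name])
#   connect_once(e3b_node.has_e3_ligase, e3lig_node)
```

Each relationship function then needs one get_or_save call per endpoint and one connect_once call per edge, instead of the full lookup-and-reassign block.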