@@ -25,8 +25,8 @@ For multiple small graphs a list of these dictionaries serves to
25
25
represent the common case of datasets for supervised learning tasks, for
26
26
example small molecules or crystal structures.
27
27
28
- Graph dictionary
29
- ----------------
28
+ Graph Dict
29
+ ----------
30
30
31
31
Graphs are represented by a dictionary ``GraphDict`` of (numpy) arrays
32
32
which behaves like a Python dict. In principle the ``GraphDict`` can
@@ -43,14 +43,16 @@ arrays or run ``validate()``.
43
43
graph = GraphDict({"edge_indices": np.array([[1, 0], [0, 1]]), "node_label": np.array([[0], [1]])})
44
44
graph.set("graph_labels", np.array([0]))
45
45
graph.set("edge_attributes", np.array([[1.0], [2.0]]));
46
- print({x: v.shape for x,v in graph.items()})
46
+ print({key: value.shape for key, value in graph.items()})
47
47
print("Is dict: %s" % isinstance(graph, dict))
48
+ print("Graph label", graph["graph_labels"])
48
49
49
50
50
51
.. parsed-literal::
51
52
52
53
{'edge_indices': (2, 2), 'node_label': (2, 1), 'graph_labels': (1,), 'edge_attributes': (2, 1)}
53
54
Is dict: True
55
+ Graph label [0]
54
56
55
57
56
58
The class ``GraphDict`` can be converted to, for example, a strict graph
@@ -77,7 +79,7 @@ Or compiling a dictionary of (tensorial) graph properties from a
77
79
.. code:: ipython3
78
80
79
81
graph = GraphDict().from_networkx(nx.cubical_graph())
80
- print({x: v.shape for x,v in graph.items()})
82
+ print({key: value.shape for key, value in graph.items()})
81
83
82
84
83
85
.. parsed-literal::
@@ -94,41 +96,247 @@ accordingly.
94
96
**WARNING**: However, they should be used with caution since they
95
97
only apply to tensor properties regardless of any underlying graph.
96
98
97
- For example ``SortEdgeIndices`` can sort an ‘edge_indices’ tensor and
98
- sort attributed properties such as ‘edge_attributes’ or ‘edge_labels’ or
99
+ For example ``SortEdgeIndices`` can sort an “edge_indices” tensor and
100
+ sort attributed properties such as “edge_attributes” or “edge_labels” or
99
101
a list of multiple (named) properties accordingly. In the example below
100
102
a generic search string is also valid. To directly update a
101
- ``GraphDict`` make a preprocessor with ``in_place=True``.
103
+ ``GraphDict``, make a preprocessor with ``in_place=True``. Note that
104
+ preprocessors can be serialised and have a ``get_config`` method.
102
105
103
106
.. code:: ipython3
104
107
105
108
from kgcnn.graph.preprocessor import SortEdgeIndices, AddEdgeSelfLoops, SetEdgeWeightsUniform
106
109
107
110
SortEdgeIndices(edge_indices="edge_indices", edge_attributes="^edge_(?!indices$).*", in_place=True)(graph)
111
+
108
112
SetEdgeWeightsUniform(edge_indices="edge_indices", value=1.0, in_place=True)(graph)
113
+
109
114
AddEdgeSelfLoops(
110
115
edge_indices="edge_indices", edge_attributes="^edge_(?!indices$).*",
111
116
remove_duplicates=True, sort_indices=True, fill_value=0, in_place=True)(graph);
112
117
113
- print({x: v.shape for x,v in graph.items()})
118
+ print({key: value.shape for key, value in graph.items()})
114
119
115
120
116
121
.. parsed-literal::
117
122
118
123
{'node_number': (8,), 'edge_indices': (20, 2), 'edge_weights': (20, 1)}
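
To see which property names a search string such as ``^edge_(?!indices$).*`` would select, the pattern can be tried with Python's standard ``re`` module. This is only a minimal sketch: it assumes the string is applied as a regular expression from the start of each property name, and the list of keys is made up for illustration. The matching inside kgcnn may differ in detail.

```python
import re

# Search string taken from the preprocessor example above.
pattern = re.compile(r"^edge_(?!indices$).*")

# Hypothetical property names of a graph dict (illustration only).
keys = ["edge_indices", "edge_attributes", "edge_labels", "node_number"]

# Keep every key that starts with "edge_" but is not exactly "edge_indices".
matched = [key for key in keys if pattern.match(key)]
print(matched)
```

The negative lookahead ``(?!indices$)`` is what excludes ``edge_indices`` itself, so the index tensor is not treated as one of its own attributes.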
119
124
120
125
121
- Graph list
126
+ Graph List
122
127
----------
123
128
124
129
A ``MemoryGraphList`` should behave identically to a Python list but
125
- contain only ``GraphDict`` items.
130
+ contain only ``GraphDict`` items. Here are a few examples with some utility
131
+ methods of the class.
132
+
133
+ .. code:: ipython3
134
+
135
+ from kgcnn.data.base import MemoryGraphList
136
+
137
+ # List of graph dicts.
138
+ graph_list = MemoryGraphList([
139
+ GraphDict({"edge_indices": [[0, 1], [1, 0]], "graph_label": [0]}),
140
+ GraphDict({"edge_indices": [[0, 0]], "graph_label": [1]}),
141
+ GraphDict({"graph_label": [0]})
142
+ ])
143
+
144
+ # Remove graphs without certain property
145
+ graph_list.clean(["edge_indices"])
146
+ print("New length of graph:", len(graph_list))
147
+
148
+ # Go to every graph dict and take out the requested property. Opposite is set().
149
+ print("Labels (list):", graph_list.get("graph_label"))
150
+
151
+ # Or directly modify list.
152
+ for i, x in enumerate(graph_list):
153
+ x.set("graph_number", [i])
154
+
155
+ print(graph_list) # Also supports indexing lists.
156
+
157
+
158
+ .. parsed-literal::
159
+
160
+ INFO:kgcnn.data.base:Property 'edge_indices' is not defined for graph '2'.
161
+ WARNING:kgcnn.data.base:Found invalid graphs for properties. Removing graphs '[2]'.
162
+
163
+
164
+ .. parsed-literal::
165
+
166
+ New length of graph: 2
167
+ Labels (list): [array([0]), array([1])]
168
+ <MemoryGraphList [{'edge_indices': array([[0, 1],
169
+ [1, 0]]), 'graph_label': array([0]), 'graph_number': array([0])} ...]>
170
+
171
+
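
The effect of ``clean()`` and ``get()`` shown above can be mimicked with plain Python lists and dicts. The following is a rough sketch of the idea only, not the kgcnn implementation; the variable names ``required``, ``cleaned`` and ``labels`` are chosen for illustration.

```python
# Plain-Python stand-in for a list of graph dicts.
graphs = [
    {"edge_indices": [[0, 1], [1, 0]], "graph_label": [0]},
    {"edge_indices": [[0, 0]], "graph_label": [1]},
    {"graph_label": [0]},  # has no "edge_indices" property
]

# "Clean": keep only graphs that define every required property.
required = ["edge_indices"]
cleaned = [g for g in graphs if all(name in g for name in required)]

# "Get": collect one property from every remaining graph.
labels = [g["graph_label"] for g in cleaned]

print("Remaining graphs:", len(cleaned))
print("Labels:", labels)
```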
172
+ It is also easy to map a method over the graph dicts in the list. This
173
+ can be a class method of ``GraphDict`` or a callable function (or class)
174
+ or, for legacy compatibility, the default name of a preprocessor.
175
+
176
+ .. code:: ipython3
177
+
178
+ graph_list.map_list(method=AddEdgeSelfLoops(edge_indices="edge_indices", in_place=True))
179
+
180
+ # Note: The former, now deprecated, option is to pass a method name that is looked up in the preprocessor class.
181
+ # graph_list.map_list(method="add_edge_self_loops")
182
+
183
+
184
+
185
+
186
+ .. parsed-literal::
187
+
188
+ <MemoryGraphList [{'edge_indices': array([[0, 0],
189
+ [0, 1],
190
+ [1, 0],
191
+ [1, 1]]), 'graph_label': array([0]), 'graph_number': array([0])} ...]>
192
+
193
+
194
+
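
What adding self-loops does to an edge list can be sketched without any library. The helper ``add_self_loops`` below is hypothetical and written for illustration; it assumes one loop is added for every node that occurs in an edge, that duplicates are dropped, and that the result is sorted. The actual ``AddEdgeSelfLoops`` preprocessor may determine the set of nodes differently and also handles edge attributes.

```python
def add_self_loops(edge_indices):
    """Return sorted, de-duplicated edges plus one (i, i) loop per occurring node."""
    # Every node index that appears in at least one edge.
    nodes = {node for edge in edge_indices for node in edge}
    # A set removes duplicate edges, including loops that already exist.
    edges = {tuple(edge) for edge in edge_indices} | {(n, n) for n in nodes}
    # Sort by source index first, then by target index.
    return [list(edge) for edge in sorted(edges)]


first = add_self_loops([[0, 1], [1, 0]])
second = add_self_loops([[0, 0]])
print(first)
print(second)
```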
195
+ Most important is obtaining a ragged tensor for direct model input.
196
+ You can simply pass a list or dict of the config of Keras Input layers
197
+ as shown below:
198
+
199
+ .. code:: ipython3
200
+
201
+ graph_list.tensor([
202
+ {"name": "edge_indices", "shape": (None, 2), "ragged": True, "dtype": "int64"},
203
+ {"name": "graph_label", "shape": (1, ), "ragged": False}
204
+ ])
205
+
206
+
207
+
208
+
209
+ .. parsed-literal::
210
+
211
+ [<tf.RaggedTensor [[[0, 0],
212
+ [0, 1],
213
+ [1, 0],
214
+ [1, 1]], [[0, 0]]]>,
215
+ <tf.Tensor: shape=(2, 1), dtype=int32, numpy=
216
+ array([[0],
217
+ [1]])>]
218
+
219
+
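
A ragged tensor is needed here because each graph has a different number of edges. One common way to store such data, and the one used by ``tf.RaggedTensor`` via ``row_splits``, is a flat list of values plus the offsets at which each row starts. The sketch below reproduces that layout in plain Python for the two graphs above; it is an illustration of the storage idea, not of what ``tensor()`` does internally.

```python
# Edge lists of the two graphs from the output above (4 edges and 1 edge).
edge_lists = [
    [[0, 0], [0, 1], [1, 0], [1, 1]],
    [[0, 0]],
]

# Flatten all edges into one list and record where each graph starts.
flat_values = [edge for graph in edge_lists for edge in graph]
row_splits = [0]
for graph in edge_lists:
    row_splits.append(row_splits[-1] + len(graph))

# Recover the per-graph edge lists from the flat representation.
restored = [
    flat_values[start:stop]
    for start, stop in zip(row_splits[:-1], row_splits[1:])
]
print(row_splits)
```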
126
220
127
221
Datasets
128
222
--------
129
223
130
- Model input
131
- -----------
224
+ The ``MemoryGraphDataset`` inherits from ``MemoryGraphList`` but must be
225
+ initialized with file information on disk that points to a
226
+ ``data_directory`` for the dataset. The ``data_directory`` can have a
227
+ subdirectory for files and/or a single file such as a CSV file. The usual
228
+ data structure looks like this:
229
+
230
+ .. code:: bash
231
+
232
+ ├── data_directory
233
+ ├── file_directory
234
+ │ ├── *.*
235
+ │ └── ...
236
+ ├── file_name
237
+ └── dataset_name.kgcnn.pickle
238
+
239
+ .. code:: ipython3
240
+
241
+ from kgcnn.data.base import MemoryGraphDataset
242
+ dataset = MemoryGraphDataset(
243
+ data_directory=".", # Path to file directory or current folder
244
+ dataset_name="Example",
245
+ file_name=None, file_directory=None)
246
+
247
+ # Modify like a MemoryGraphList
248
+ for x in graph_list:
249
+ dataset.append(x)
250
+ dataset[0]["node_attributes"] = np.array([[0.9, 3.2], [1.2, 2.4]])
251
+ print(dataset)
252
+
253
+
254
+ .. parsed-literal::
255
+
256
+ <MemoryGraphDataset [{'edge_indices': array([[0, 0],
257
+ [0, 1],
258
+ [1, 0],
259
+ [1, 1]]), 'graph_label': array([0]), 'graph_number': array([0]), 'node_attributes': array([[0.9, 3.2],
260
+ [1.2, 2.4]])} ...]>
261
+
262
+
263
+ You can also change the location on disk with ``relocate()``. Note that
264
+ in this case only the file information is changed, but no files are
265
+ moved or copied. Save the dataset as a pickled Python list of Python dicts
266
+ to file:
267
+
268
+ .. code:: ipython3
269
+
270
+ dataset.save()
271
+ dataset.load()
272
+
273
+
274
+ .. parsed-literal::
275
+
276
+ INFO:kgcnn.data.Example:Pickle dataset...
277
+ INFO:kgcnn.data.Example:Load pickled dataset...
278
+
279
+
280
+
281
+
282
+ .. parsed-literal::
283
+
284
+ <MemoryGraphDataset [{'edge_indices': array([[0, 0],
285
+ [0, 1],
286
+ [1, 0],
287
+ [1, 1]]), 'graph_label': array([0]), 'graph_number': array([0]), 'node_attributes': array([[0.9, 3.2],
288
+ [1.2, 2.4]])} ...]>
289
+
290
+
291
+
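
Saving a dataset as a "pickled Python list of Python dicts" can be pictured with the standard ``pickle`` module. This is a sketch under assumptions: the file name follows the ``dataset_name.kgcnn.pickle`` pattern from the directory tree above, a temporary directory stands in for ``data_directory``, and the graphs are plain dicts rather than ``GraphDict`` objects. The actual ``save()`` and ``load()`` methods may write additional information.

```python
import os
import pickle
import tempfile

# Plain-Python stand-in for the graphs held by a dataset.
graphs = [
    {"edge_indices": [[0, 0], [0, 1], [1, 0], [1, 1]], "graph_label": [0]},
    {"edge_indices": [[0, 0]], "graph_label": [1]},
]

# Temporary directory used in place of a real data directory.
with tempfile.TemporaryDirectory() as data_directory:
    path = os.path.join(data_directory, "Example.kgcnn.pickle")

    # Save: serialise the whole list into one file.
    with open(path, "wb") as handle:
        pickle.dump(graphs, handle)

    # Load: read the list back from disk.
    with open(path, "rb") as handle:
        restored_graphs = pickle.load(handle)

print(restored_graphs == graphs)
```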
292
+ Special Datasets
293
+ ~~~~~~~~~~~~~~~~
294
+
295
+ ``MemoryGraphDataset`` has many subclasses, such as ``QMDataset``,
296
+ ``MoleculeNetDataset``, ``CrystalDataset``, ``VisualGraphDataset`` and
297
+ ``GraphTUDataset``, which further have functions required for the
298
+ specific dataset type to convert and process files such as ‘.txt’,
299
+ ‘.sdf’, ‘.xyz’, ‘.cif’, ‘.jpg’ etc. They are located in ``kgcnn.data``.
300
+ Most subclasses implement ``prepare_data()`` and ``read_in_memory()``
301
+ with dataset-dependent arguments to preprocess and finally load data
302
+ from different formats.
303
+
304
+ Then there are fully prepared subclasses in ``kgcnn.data.datasets``
305
+ which download and process common benchmark datasets and can be used as
306
+ simply as this:
307
+
308
+ .. code:: ipython3
309
+
310
+ from kgcnn.data.datasets.MUTAGDataset import MUTAGDataset
311
+ dataset = MUTAGDataset() # inherits from GraphTUDataset2020()
312
+ dataset[0].keys()
313
+
314
+
315
+ .. parsed-literal::
316
+
317
+ INFO:kgcnn.data.download:Checking and possibly downloading dataset with name MUTAG
318
+ INFO:kgcnn.data.download:Dataset directory located at C:\Users\patri\.kgcnn\datasets
319
+ INFO:kgcnn.data.download:Dataset directory found. Done.
320
+ INFO:kgcnn.data.download:Dataset found. Done.
321
+ INFO:kgcnn.data.download:Directory for extraction exists. Done.
322
+ INFO:kgcnn.data.download:Not extracting zip file. Stopped.
323
+ INFO:kgcnn.data.MUTAG:Reading dataset to memory with name MUTAG
324
+ INFO:kgcnn.data.MUTAG:Shift start of graph ID to zero for 'MUTAG' to match python indexing.
325
+ INFO:kgcnn.data.MUTAG:Graph index which has unconnected '[]' with '[]' in total '0'.
326
+
327
+
328
+
329
+
330
+ .. parsed-literal::
331
+
332
+ dict_keys(['node_degree', 'node_labels', 'edge_indices', 'edge_labels', 'graph_labels', 'node_attributes', 'edge_attributes', 'node_symbol', 'node_number', 'graph_size'])
333
+
334
+
335
+
336
+ Here are some examples of custom usage of the base classes:
337
+
338
+ MoleculeNetDatasets
339
+ ^^^^^^^^^^^^^^^^^^^
132
340
133
341
134
342
**note**: You can find this page as a Jupyter notebook in