
Commit f64f53a

Update docs.
1 parent: 48b9cc5

2 files changed (+235, -27 lines)

docs/source/data.rst

Lines changed: 220 additions & 12 deletions
@@ -25,8 +25,8 @@ For multiple small graphs a list of these dictionaries serves to
represent the common case of datasets for supervised learning tasks, for
example small molecules or crystal structures.

Graph Dict
----------

Graphs are represented by a dictionary ``GraphDict`` of (numpy) arrays
which behaves like a python dict. In principle the ``GraphDict`` can
@@ -43,14 +43,16 @@ arrays or run ``validate()``.
    graph = GraphDict({"edge_indices": np.array([[1, 0], [0, 1]]), "node_label": np.array([[0], [1]])})
    graph.set("graph_labels", np.array([0]))
    graph.set("edge_attributes", np.array([[1.0], [2.0]]));
    print({key: value.shape for key, value in graph.items()})
    print("Is dict: %s" % isinstance(graph, dict))
    print("Graph label", graph["graph_labels"])


.. parsed-literal::

    {'edge_indices': (2, 2), 'node_label': (2, 1), 'graph_labels': (1,), 'edge_attributes': (2, 1)}
    Is dict: True
    Graph label [0]
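A ``GraphDict`` stays a genuine ``dict`` while adding helpers such as ``set()`` and ``validate()``. The stdlib-only sketch below illustrates that design choice; the class ``TinyGraphDict``, its ``get_shape()`` helper and its validation rule (every edge index must point to an existing node) are hypothetical and do not reproduce kgcnn's implementation, which works on numpy arrays.

```python
class TinyGraphDict(dict):
    """Hypothetical stand-in for GraphDict: a real dict with a few helpers."""

    def set(self, key, value):
        # Store the value under key (kgcnn would convert it to a numpy array).
        self[key] = list(value)
        return self

    def get_shape(self, key):
        # Shape of a nested list with uniform rows, e.g. [[1, 0], [0, 1]] -> (2, 2).
        value = self[key]
        if value and isinstance(value[0], (list, tuple)):
            return (len(value), len(value[0]))
        return (len(value),)

    def validate(self, num_nodes):
        # Hypothetical rule: every edge index must refer to an existing node.
        for edge in self.get("edge_indices", []):
            if not all(0 <= i < num_nodes for i in edge):
                raise ValueError("Edge %s refers to a missing node." % (edge,))
        return True


graph = TinyGraphDict({"edge_indices": [[1, 0], [0, 1]], "node_label": [[0], [1]]})
graph.set("graph_labels", [0])
print({key: graph.get_shape(key) for key in graph})
# {'edge_indices': (2, 2), 'node_label': (2, 1), 'graph_labels': (1,)}
print("Is dict:", isinstance(graph, dict))
print("Valid:", graph.validate(num_nodes=2))
```

Subclassing ``dict`` keeps every consumer that expects a plain mapping working, which is what the ``Is dict: True`` output above reflects.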

The class ``GraphDict`` can be converted, for example, to a strict graph
@@ -77,7 +79,7 @@ Or compiling a dictionary of (tensorial) graph properties from a
.. code:: ipython3

    graph = GraphDict().from_networkx(nx.cubical_graph())
    print({key: value.shape for key, value in graph.items()})


.. parsed-literal::
@@ -94,41 +96,247 @@ accordingly.
**WARNING**: However, they should be used with caution, since they
only act on tensor properties regardless of any underlying graph.

For example, ``SortEdgeIndices`` can sort an ``edge_indices`` tensor and,
accordingly, sort attributed properties such as ``edge_attributes`` or
``edge_labels``, or a list of multiple (named) properties. As the example
below shows, a generic search string (here a regular expression) is also
valid. To directly update a ``GraphDict``, create a preprocessor with
``in_place=True``. Note that preprocessors can be serialised and have a
``get_config`` method.

.. code:: ipython3

    from kgcnn.graph.preprocessor import SortEdgeIndices, AddEdgeSelfLoops, SetEdgeWeightsUniform

    SortEdgeIndices(edge_indices="edge_indices", edge_attributes="^edge_(?!indices$).*", in_place=True)(graph)

    SetEdgeWeightsUniform(edge_indices="edge_indices", value=1.0, in_place=True)(graph)

    AddEdgeSelfLoops(
        edge_indices="edge_indices", edge_attributes="^edge_(?!indices$).*",
        remove_duplicates=True, sort_indices=True, fill_value=0, in_place=True)(graph);

    print({key: value.shape for key, value in graph.items()})


.. parsed-literal::

    {'node_number': (8,), 'edge_indices': (20, 2), 'edge_weights': (20, 1)}
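The core of edge sorting can be sketched without kgcnn: order the index pairs lexicographically and apply the same permutation to every property whose key matches the search pattern. ``SortEdgesSketch`` is a hypothetical class; the lexicographic ordering, the use of ``re.match`` for the pattern and the ``get_config``/``from_config`` round trip are assumptions chosen to mirror the description above, not kgcnn's code.

```python
import re


class SortEdgesSketch:
    """Hypothetical preprocessor: sort edges and permute matching edge properties."""

    def __init__(self, edge_indices="edge_indices", edge_attributes="^edge_(?!indices$).*", in_place=False):
        self.edge_indices = edge_indices
        self.edge_attributes = edge_attributes
        self.in_place = in_place

    def __call__(self, graph):
        # Work on the graph itself or on a shallow copy.
        target = graph if self.in_place else dict(graph)
        edges = target[self.edge_indices]
        # Permutation that sorts the index pairs lexicographically.
        order = sorted(range(len(edges)), key=lambda i: tuple(edges[i]))
        target[self.edge_indices] = [edges[i] for i in order]
        # Apply the same permutation to every key matching the pattern.
        pattern = re.compile(self.edge_attributes)
        for key in list(target):
            if key != self.edge_indices and pattern.match(key):
                values = target[key]
                target[key] = [values[i] for i in order]
        return target

    def get_config(self):
        # Plain dict of constructor arguments, so the preprocessor can be serialised.
        return {"edge_indices": self.edge_indices,
                "edge_attributes": self.edge_attributes,
                "in_place": self.in_place}

    @classmethod
    def from_config(cls, config):
        return cls(**config)


graph = {"edge_indices": [[1, 0], [0, 2], [0, 1]],
         "edge_weights": [[0.5], [0.2], [0.9]],
         "node_number": [6, 1, 8]}
SortEdgesSketch(in_place=True)(graph)
print(graph["edge_indices"])  # [[0, 1], [0, 2], [1, 0]]
print(graph["edge_weights"])  # [[0.9], [0.2], [0.5]]
```

Note that ``node_number`` is left untouched because it does not match the pattern; a property that is not covered by the pattern would silently keep the old edge order, which is the kind of pitfall the warning above refers to.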

Graph List
----------

A ``MemoryGraphList`` should behave identically to a python list but
contain only ``GraphDict`` items. Here are a few examples with some utility
methods of the class.

.. code:: ipython3

    from kgcnn.data.base import MemoryGraphList

    # List of graph dicts.
    graph_list = MemoryGraphList([
        GraphDict({"edge_indices": [[0, 1], [1, 0]], "graph_label": [0]}),
        GraphDict({"edge_indices": [[0, 0]], "graph_label": [1]}),
        GraphDict({"graph_label": [0]})
    ])

    # Remove graphs that lack a required property.
    graph_list.clean(["edge_indices"])
    print("New length of graph:", len(graph_list))

    # Go to every graph dict and take out the requested property. The opposite is set().
    print("Labels (list):", graph_list.get("graph_label"))

    # Or directly modify the list.
    for i, x in enumerate(graph_list):
        x.set("graph_number", [i])

    print(graph_list)  # Also supports indexing lists.


.. parsed-literal::

    INFO:kgcnn.data.base:Property 'edge_indices' is not defined for graph '2'.
    WARNING:kgcnn.data.base:Found invalid graphs for properties. Removing graphs '[2]'.


.. parsed-literal::

    New length of graph: 2
    Labels (list): [array([0]), array([1])]
    <MemoryGraphList [{'edge_indices': array([[0, 1],
           [1, 0]]), 'graph_label': array([0]), 'graph_number': array([0])} ...]>
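The ``clean()``, ``get()`` and ``set()`` helpers amount to simple loops over the list. A minimal sketch of that behaviour with a plain ``list`` subclass follows; the class name ``TinyGraphList`` and its exact rules are assumptions, and kgcnn additionally logs which graphs were removed, as the output above shows.

```python
class TinyGraphList(list):
    """Hypothetical list of graph dicts with kgcnn-like helper methods."""

    def clean(self, properties):
        # Keep only graphs that define every requested property; return removed positions.
        removed = [i for i, g in enumerate(self) if any(g.get(p) is None for p in properties)]
        self[:] = [g for i, g in enumerate(self) if i not in removed]
        return removed

    def get(self, key):
        # Collect one property from every graph (None where it is missing).
        return [g.get(key) for g in self]

    def set(self, key, values):
        # Opposite of get(): write exactly one value per graph.
        if len(values) != len(self):
            raise ValueError("Need exactly one value per graph.")
        for g, v in zip(self, values):
            g[key] = v
        return self


graph_list = TinyGraphList([
    {"edge_indices": [[0, 1], [1, 0]], "graph_label": [0]},
    {"edge_indices": [[0, 0]], "graph_label": [1]},
    {"graph_label": [0]},
])
print(graph_list.clean(["edge_indices"]))  # [2]
print(graph_list.get("graph_label"))  # [[0], [1]]
graph_list.set("graph_number", [[i] for i in range(len(graph_list))])
```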


It is also easy to map a method over the graph dicts in the list. This
can be a class method of ``GraphDict``, a callable function (or class),
or, for legacy compatibility, the default name of a preprocessor.

.. code:: ipython3

    graph_list.map_list(method=AddEdgeSelfLoops(edge_indices="edge_indices", in_place=True))

    # Note: the former, deprecated option is to pass a method name that is looked up in the preprocessor class.
    # graph_list.map_list(method="add_edge_self_loops")


.. parsed-literal::

    <MemoryGraphList [{'edge_indices': array([[0, 0],
           [0, 1],
           [1, 0],
           [1, 1]]), 'graph_label': array([0]), 'graph_number': array([0])} ...]>
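One ``method`` argument that accepts a callable, a method of the graph object, or a legacy name can be dispatched as sketched below. The function ``map_graphs``, the ``REGISTRY`` lookup table and the ``add_self_loops`` helper are all hypothetical; in particular, this ``add_self_loops`` only knows the nodes that appear in the edge list, whereas a real preprocessor may use the full node count.

```python
def add_self_loops(graph):
    # Hypothetical preprocessor: add (i, i) for every node seen in the edges, dedupe and sort.
    edges = {tuple(e) for e in graph.get("edge_indices", [])}
    nodes = {i for e in edges for i in e}
    edges |= {(i, i) for i in nodes}
    graph["edge_indices"] = [list(e) for e in sorted(edges)]
    return graph


# Hypothetical lookup table standing in for the legacy "name of a preprocessor" option.
REGISTRY = {"add_edge_self_loops": add_self_loops}


def map_graphs(graphs, method, **kwargs):
    for i, graph in enumerate(graphs):
        if callable(method):
            # A function or callable object (e.g. a preprocessor instance).
            result = method(graph, **kwargs)
        elif hasattr(graph, method):
            # A method defined on the graph object itself.
            result = getattr(graph, method)(**kwargs)
        else:
            # Legacy: look the name up in the registry.
            result = REGISTRY[method](graph, **kwargs)
        graphs[i] = result
    return graphs


graphs = [{"edge_indices": [[0, 1], [1, 0]]}, {"edge_indices": [[0, 0]]}]
map_graphs(graphs, add_self_loops)
print(graphs[0]["edge_indices"])  # [[0, 0], [0, 1], [1, 0], [1, 1]]
map_graphs(graphs, "add_edge_self_loops")  # same result via the legacy name
```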


Most importantly, you can obtain a ragged tensor for direct model input.
Simply pass a list or dict of keras ``Input`` layer configs, as shown below:


.. code:: ipython3

    graph_list.tensor([
        {"name": "edge_indices", "shape": (None, 2), "ragged": True, "dtype": "int64"},
        {"name": "graph_label", "shape": (1, ), "ragged": False}
    ])


.. parsed-literal::

    [<tf.RaggedTensor [[[0, 0],
    [0, 1],
    [1, 0],
    [1, 1]], [[0, 0]]]>,
     <tf.Tensor: shape=(2, 1), dtype=int32, numpy=
    array([[0],
           [1]])>]
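A ragged tensor such as the ``edge_indices`` output stores all rows in one flat array plus offsets (``row_splits``) that mark where each graph starts; this is the layout accepted by ``tf.RaggedTensor.from_row_splits``. The stdlib sketch below builds those two pieces from the edge lists above and recovers the per-graph lists again; the function names are hypothetical and TensorFlow is not needed to run it.

```python
def to_row_splits(rows):
    # Concatenate variable-length rows and record the cumulative offsets.
    values, splits = [], [0]
    for row in rows:
        values.extend(row)
        splits.append(splits[-1] + len(row))
    return values, splits


def from_row_splits(values, splits):
    # Inverse: slice the flat values back into rows.
    return [values[start:end] for start, end in zip(splits[:-1], splits[1:])]


edge_indices = [[[0, 0], [0, 1], [1, 0], [1, 1]], [[0, 0]]]
values, splits = to_row_splits(edge_indices)
print(splits)  # [0, 4, 5]
print(len(values))  # 5 edges in total, each a pair, i.e. a flat shape of (5, 2)
```

With TensorFlow installed, ``tf.RaggedTensor.from_row_splits(values, row_splits=splits)`` would build the corresponding ragged tensor from these two lists.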


Datasets
--------

The ``MemoryGraphDataset`` inherits from ``MemoryGraphList`` but must be
initialized with file information on disk that points to a
``data_directory`` for the dataset. The ``data_directory`` can have a
subdirectory for files and/or a single file such as a CSV file. The usual
data structure looks like this:

.. code:: bash

    ├── data_directory
        ├── file_directory
        │   ├── *.*
        │   └── ...
        ├── file_name
        └── dataset_name.kgcnn.pickle

.. code:: ipython3

    from kgcnn.data.base import MemoryGraphDataset
    dataset = MemoryGraphDataset(
        data_directory=".",  # Path to file directory or current folder
        dataset_name="Example",
        file_name=None, file_directory=None)

    # Modify like a MemoryGraphList
    for x in graph_list:
        dataset.append(x)
    dataset[0]["node_attributes"] = np.array([[0.9, 3.2], [1.2, 2.4]])
    print(dataset)


.. parsed-literal::

    <MemoryGraphDataset [{'edge_indices': array([[0, 0],
           [0, 1],
           [1, 0],
           [1, 1]]), 'graph_label': array([0]), 'graph_number': array([0]), 'node_attributes': array([[0.9, 3.2],
           [1.2, 2.4]])} ...]>
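The file layout above can be expressed with ``pathlib``. This sketch assumes, as the tree suggests, that ``file_name`` and ``file_directory`` are relative to ``data_directory`` and that the pickle is named ``<dataset_name>.kgcnn.pickle``; the helper ``dataset_paths`` is hypothetical and not part of kgcnn.

```python
from pathlib import Path


def dataset_paths(data_directory, dataset_name, file_name=None, file_directory=None):
    # Hypothetical helper: resolve the files of the layout shown in the tree.
    root = Path(data_directory)
    return {
        "file_path": root / file_name if file_name else None,
        "file_directory_path": root / file_directory if file_directory else None,
        "pickle_path": root / (dataset_name + ".kgcnn.pickle"),
    }


paths = dataset_paths("data_directory", "Example", file_name="data.csv")
print(paths["pickle_path"].as_posix())  # data_directory/Example.kgcnn.pickle
print(paths["file_path"].as_posix())  # data_directory/data.csv
```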


You can also change the location on disk with ``relocate()``. Note that
in this case only the file information is changed; no files are
moved or copied. The dataset can be saved to file as a pickled python
list of python dicts and loaded again:

.. code:: ipython3

    dataset.save()
    dataset.load()


.. parsed-literal::

    INFO:kgcnn.data.Example:Pickle dataset...
    INFO:kgcnn.data.Example:Load pickled dataset...


.. parsed-literal::

    <MemoryGraphDataset [{'edge_indices': array([[0, 0],
           [0, 1],
           [1, 0],
           [1, 1]]), 'graph_label': array([0]), 'graph_number': array([0]), 'node_attributes': array([[0.9, 3.2],
           [1.2, 2.4]])} ...]>
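Saving a dataset as a pickled python list of python dicts comes down to a standard ``pickle`` round trip. The stdlib sketch below writes such a list to a temporary directory and reads it back. The class ``TinyDataset`` is hypothetical, and its ``relocate()`` only updates the stored directory, mirroring the note that no files are moved or copied; it is not kgcnn's implementation.

```python
import pickle
import tempfile
from pathlib import Path


class TinyDataset(list):
    """Hypothetical dataset: a list of dicts that knows where its pickle file lives."""

    def __init__(self, data_directory, dataset_name, graphs=()):
        super().__init__(graphs)
        self.data_directory = Path(data_directory)
        self.dataset_name = dataset_name

    @property
    def pickle_path(self):
        return self.data_directory / (self.dataset_name + ".kgcnn.pickle")

    def relocate(self, data_directory):
        # Only the file information changes; nothing on disk is moved or copied.
        self.data_directory = Path(data_directory)
        return self

    def save(self):
        # Pickle a plain list of plain dicts.
        with open(self.pickle_path, "wb") as f:
            pickle.dump([dict(g) for g in self], f)
        return self

    def load(self):
        with open(self.pickle_path, "rb") as f:
            self[:] = pickle.load(f)
        return self


with tempfile.TemporaryDirectory() as tmp:
    dataset = TinyDataset(tmp, "Example", [{"graph_label": [0], "edge_indices": [[0, 0]]}])
    dataset.save()
    dataset.clear()
    dataset.load()
    print(dataset)  # [{'graph_label': [0], 'edge_indices': [[0, 0]]}]
```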


Special Datasets
~~~~~~~~~~~~~~~~

``MemoryGraphDataset`` has several subclasses, such as ``QMDataset``,
``MoleculeNetDataset``, ``CrystalDataset``, ``VisualGraphDataset`` and
``GraphTUDataset``, which add the functions required for the
specific dataset type to convert and process files such as ``.txt``,
``.sdf``, ``.xyz``, ``.cif``, ``.jpg`` etc. They are located in ``kgcnn.data``.
Most subclasses implement ``prepare_data()`` and ``read_in_memory()``
with dataset-dependent arguments to preprocess and finally load data
from different formats.
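The two-step pattern can be sketched as follows: ``prepare_data()`` does the one-off preprocessing of the raw file, and ``read_in_memory()`` turns the prepared data into graph dicts. The class ``EdgeListDataset`` and its CSV edge-list format (columns ``graph_id``, ``source``, ``target``) are hypothetical; kgcnn's real subclasses read formats such as SDF or CIF and take dataset-dependent arguments.

```python
import csv
import io


class EdgeListDataset(list):
    """Hypothetical dataset reading graphs from a CSV edge list (graph_id,source,target)."""

    def __init__(self, raw_text):
        super().__init__()
        self.raw_text = raw_text
        self.prepared_rows = None

    def prepare_data(self):
        # One-off preprocessing: parse the raw CSV and drop duplicate edges.
        reader = csv.DictReader(io.StringIO(self.raw_text))
        rows = {(int(r["graph_id"]), int(r["source"]), int(r["target"])) for r in reader}
        self.prepared_rows = sorted(rows)
        return self

    def read_in_memory(self):
        # Group the prepared edges by graph id into kgcnn-style graph dicts.
        if self.prepared_rows is None:
            self.prepare_data()
        graphs = {}
        for graph_id, source, target in self.prepared_rows:
            graphs.setdefault(graph_id, {"edge_indices": []})["edge_indices"].append([source, target])
        self[:] = [graphs[k] for k in sorted(graphs)]
        return self


raw = "graph_id,source,target\n0,0,1\n0,1,0\n0,0,1\n1,0,0\n"
dataset = EdgeListDataset(raw).prepare_data().read_in_memory()
print(dataset)  # [{'edge_indices': [[0, 1], [1, 0]]}, {'edge_indices': [[0, 0]]}]
```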


Then there are fully prepared subclasses in ``kgcnn.data.datasets``
which download and process common benchmark datasets and are as simple
to use as this:

.. code:: ipython3

    from kgcnn.data.datasets.MUTAGDataset import MUTAGDataset
    dataset = MUTAGDataset()  # inherits from GraphTUDataset2020()
    dataset[0].keys()


.. parsed-literal::

    INFO:kgcnn.data.download:Checking and possibly downloading dataset with name MUTAG
    INFO:kgcnn.data.download:Dataset directory located at C:\Users\patri\.kgcnn\datasets
    INFO:kgcnn.data.download:Dataset directory found. Done.
    INFO:kgcnn.data.download:Dataset found. Done.
    INFO:kgcnn.data.download:Directory for extraction exists. Done.
    INFO:kgcnn.data.download:Not extracting zip file. Stopped.
    INFO:kgcnn.data.MUTAG:Reading dataset to memory with name MUTAG
    INFO:kgcnn.data.MUTAG:Shift start of graph ID to zero for 'MUTAG' to match python indexing.
    INFO:kgcnn.data.MUTAG:Graph index which has unconnected '[]' with '[]' in total '0'.


.. parsed-literal::

    dict_keys(['node_degree', 'node_labels', 'edge_indices', 'edge_labels', 'graph_labels', 'node_attributes', 'edge_attributes', 'node_symbol', 'node_number', 'graph_size'])


Here are some examples of custom usage of the base classes:

MoleculeNetDatasets
^^^^^^^^^^^^^^^^^^^


**note**: You can find this page as a jupyter notebook in

0 commit comments

Comments
 (0)