Commit ec923ed

Merge pull request #18 from salsadigitalauorg/develop
Release/0.2.0
2 parents 50b29ad + 780f045 commit ec923ed

38 files changed

+1395
-311
lines changed

.circleci/config.yml

Lines changed: 42 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -74,15 +74,56 @@ jobs:
7474
--tag $CIRCLE_TAG \
7575
--name merlin-framework \
7676
--file /tmp/merlin-framework.phar
77+
deploy_docs:
78+
docker:
79+
- image: circleci/php:7.3-stretch-node-browsers
80+
steps:
81+
- checkout
82+
- checkout:
83+
path: /tmp/docs
84+
- run:
85+
name: "Deploy docs"
86+
command: |
87+
git config --global user.email "docusaurus-bot@users.noreply.github.com"
88+
git config --global user.name "Website Deployment Script"
89+
90+
git -C /tmp/docs checkout --track origin/docs
91+
92+
npm --prefix=/tmp/docs/website install
93+
94+
./.circleci/scripts/docs-sidebar /tmp/docs
95+
96+
cp ~/project/docs/* /tmp/docs/docs
97+
98+
cd /tmp/docs/website
99+
npm run version $CIRCLE_TAG
100+
101+
cd /tmp/docs
102+
git add .
103+
git commit -m "Automated documentation generation"
104+
git push origin docs -f
105+
106+
cd /tmp/docs/website
107+
CURRENT_BRANCH=docs npm run publish-gh-pages
77108
78109
workflows:
79110
version: 2
80111
main:
81112
jobs:
82-
- build
113+
- build:
114+
filters:
115+
branches:
116+
ignore:
117+
- docs
83118
- deploy:
84119
filters:
85120
branches:
86121
ignore: /.*/
87122
tags:
88123
only: /^\d+\.\d+\.\d+$/
124+
- deploy_docs:
125+
filters:
126+
branches:
127+
ignore: /.*/
128+
tags:
129+
only: /^\d+\.\d+\.\d+$/
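The `deploy` and `deploy_docs` jobs both ignore every branch and run only for tags matching `/^\d+\.\d+\.\d+$/`. The sketch below is a quick way to check which tag names that pattern accepts. It assumes the pattern is applied to the whole tag name; the helper name `triggers_release_jobs` and the sample tags are illustrative and not part of the commit.

```python
import re

# Same pattern as the `only:` tag filter in the workflow above.
TAG_PATTERN = re.compile(r"^\d+\.\d+\.\d+$")


def triggers_release_jobs(tag):
    """Return True when a tag name matches the release tag filter."""
    return TAG_PATTERN.fullmatch(tag) is not None


# Hypothetical tag names, used only to exercise the pattern.
for tag in ["0.2.0", "v0.2.0", "0.2", "0.2.0-rc1"]:
    print(tag, triggers_release_jobs(tag))
```

Under that assumption, only plain `MAJOR.MINOR.PATCH` tags such as `0.2.0` would start the release and documentation jobs; prefixed or suffixed tags would not.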

.circleci/scripts/docs-sidebar

Lines changed: 68 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,68 @@
1+
#!/usr/local/bin/php
2+
3+
<?php
4+
/**
5+
*
6+
*/
7+
$docs_dir = $argv[1];
8+
9+
if (!file_exists("$docs_dir/website/sidebars.json")) {
10+
echo "Invalid documentation directory.";
11+
exit(1);
12+
}
13+
14+
$dir = new RecursiveDirectoryIterator(__DIR__ . "/../../docs");
15+
$iterator = new RecursiveIteratorIterator($dir, RecursiveIteratorIterator::SELF_FIRST);
16+
17+
$menu_configuration = [
18+
"docs" => [
19+
'Introduction' => [],
20+
'Types' => [],
21+
'Processors' => [],
22+
],
23+
];
24+
25+
foreach ($iterator as $file) {
26+
if ($file->isFile()) {
27+
$contents = file_get_contents($file->getPathname());
28+
$id = [];
29+
preg_match("/id:\s([-\w]+)/", $contents, $id);
30+
if (empty($id[1])) {
31+
// Not a valid doc file.
32+
continue;
33+
}
34+
preg_match("/weight:\s([-\d]+)/", $contents, $weight);
35+
$weight = empty($weight[1]) ? 0 : $weight[1];
36+
$menu_configuration['docs'][get_menu_key($id[1])][$id[1]] = $weight;
37+
}
38+
}
39+
40+
foreach ($menu_configuration['docs'] as $type => &$links) {
41+
asort($links);
42+
$links = array_keys($links);
43+
}
44+
45+
echo "Updated sidebar!" . PHP_EOL;
46+
file_put_contents("$docs_dir/website/sidebars.json", json_encode($menu_configuration, JSON_PRETTY_PRINT));
47+
exit(0);
48+
49+
/**
50+
* Get the doctype for a file; this is used to decide which sidebar menu section the file belongs to.
51+
*
52+
* @return string
53+
* The menu key.
54+
*/
55+
function get_menu_key($id)
56+
{
57+
$parts = explode('-', $id);
58+
$type = reset($parts);
59+
60+
switch ($type) {
61+
case 'processor':
62+
return 'Processors';
63+
case 'type':
64+
return 'Types';
65+
default:
66+
return 'Introduction';
67+
}
68+
}
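The grouping and ordering logic of the `docs-sidebar` script can be hard to follow in diff form, so here is a rough Python re-expression of it. This is a sketch, not the script itself: it works on in-memory strings rather than files, and it uses a slightly stricter weight pattern (`-?\d+`) so the value can be converted to an integer safely. The function name `build_sidebar` is hypothetical.

```python
import json
import re


def get_menu_key(doc_id):
    """Map a doc id prefix to its sidebar section, as the PHP switch does."""
    prefix = doc_id.split("-")[0]
    return {"processor": "Processors", "type": "Types"}.get(prefix, "Introduction")


def build_sidebar(documents):
    """Group doc ids by section and order each section by front-matter weight."""
    sections = {"Introduction": {}, "Types": {}, "Processors": {}}
    for contents in documents:
        id_match = re.search(r"id:\s([-\w]+)", contents)
        if not id_match:
            continue  # Not a valid doc file.
        weight_match = re.search(r"weight:\s(-?\d+)", contents)
        weight = int(weight_match.group(1)) if weight_match else 0
        doc_id = id_match.group(1)
        sections[get_menu_key(doc_id)][doc_id] = weight
    # Lower weights come first; only the ids are kept in the output.
    return {"docs": {name: sorted(links, key=links.get) for name, links in sections.items()}}


sample_docs = [
    "---\nid: examples\ntitle: Examples\n---\n",
    "---\nid: getting-started\ntitle: Getting Started\nweight: -1\n---\n",
    "---\nid: processor-nl2br\ntitle: nl2br\n---\n",
]
print(json.dumps(build_sidebar(sample_docs), indent=4))
```

With these sample strings, `getting-started` sorts ahead of `examples` because of its `weight: -1`, which matches the front matter added in `docs/GettingStarted.md`.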

.gitignore

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -3,6 +3,7 @@ composer.phar
33
*.json
44
*.html
55
_local
6+
website
67
# Commit your application's lock file https://getcomposer.org/doc/01-basic-usage.md#commit-your-composer-lock-file-to-version-control
78
# You may choose to ignore a library lock file http://getcomposer.org/doc/02-libraries.md#lock-file
89
# composer.lock

docs/ExampleConfiguration.md

Lines changed: 112 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,112 @@
1+
---
2+
id: examples
3+
title: Examples
4+
---
5+
6+
# Menu
7+
8+
Menu structures use the `menu_link` type. This sample configuration demonstrates how to pull the main menu from the Health.vic site, with parent/child relationships intact.
9+
10+
The selector uses an XPath expression to reference the element in the DOM; in this case, all list items contained in the header nav are evaluated for top-level links. The `text` and `link` options are sub-selectors that define where the link text and link values should come from.
11+
12+
The `children` section allows for sub-menu items to be defined via their own `selector` and configuration.
13+
14+
```
15+
---
16+
domain: https://www2.health.vic.gov.au
17+
18+
urls:
19+
- /
20+
21+
entity_type: menus
22+
23+
mappings:
24+
-
25+
field: main_menu
26+
name: health_main_menu
27+
type: menu_link
28+
selector: '//*[@class="header-nav"]/*/ul/li'
29+
options:
30+
text: './a'
31+
link: './a/@href'
32+
remove_duplicates: true
33+
children:
34+
-
35+
type: menu_link
36+
selector: './descendant::li[@class="dd-level2"]'
37+
options:
38+
text: './a/h3'
39+
link: './a/@href'
40+
```
41+
42+
# URL aliases
43+
44+
The URL alias of each piece of content should be preserved so that URLs remain intact when migrated into the destination CMS. Attach the `alias` type to the mappings configuration to ensure URL aliases are captured.
45+
46+
```
47+
mappings:
48+
-
49+
field: alias
50+
type: alias
51+
```
52+
53+
54+
# Basic text
55+
56+
Basic text fields can be mapped in the `mappings` section using the `text` type. Example configuration below:
57+
58+
```
59+
mappings:
60+
-
61+
field: title
62+
selector: "#phbody_1_ctl01_h1Title"
63+
type: text
64+
```
65+
66+
This type was used for the 'key messages' content. It supports both individual items and arrays of items. For example, in the case of key messages there are multiple matches on the selector, so an array of plain-text results will exist in the JSON object for import.
67+
68+
```
69+
mappings:
70+
-
71+
field: field_key_messages
72+
selector: .m-key-messages .m-b li
73+
type: text
74+
processors:
75+
convert_encoding:
76+
to_encoding: "HTML-ENTITIES"
77+
from_encoding: UTF-8
78+
html_entity_decode: { }
79+
whitespace: { }
80+
```
81+
82+
This example also includes additional processors; more detail on these can be found on the [Processors]() page.
83+
84+
# Long, formatted text
85+
86+
Long text is used for body content, or anywhere a rich-text WYSIWYG editor may be used. It also allows for embedded media (e.g. documents, images).
87+
88+
This content will generally pass through multiple processors to ensure clean markup, and optionally allows for stripping undesirable attributes or tags.
89+
90+
The example below captures the entire body of content found within the `#main` div, removing non-standard tags, removing empty tags, and stripping whitespace.
91+
92+
```
93+
mappings:
94+
-
95+
field: field_paragraph_body
96+
selector: '//*[@id="main"]'
97+
type: long_text
98+
processors:
99+
- processor: remove_empty_tags
100+
-
101+
processor: convert_encoding
102+
to_encoding: HTML-ENTITIES
103+
from_encoding: UTF-8
104+
-
105+
processor: strip_tags
106+
allowed_tags: <h1><h2><h3><h4><h5><ul><ol><dl><dt><dd><li><p><a><strong><em><cite><blockquote><code><s><span><sup><sub><table><caption><tbody><thead><tfoot><th><td><tr><hr><pre><drupal-entity><br>
107+
remove_attr:
108+
- class
109+
- id
110+
- style
111+
- processor: whitespace
112+
```

docs/GettingStarted.md

Lines changed: 75 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,75 @@
1+
---
2+
id: getting-started
3+
title: Getting Started
4+
weight: -1
5+
---
6+
7+
The migration tool provides a standard mechanism for scraping content from DHHS websites, splitting it into logical content structures, and performing additional processing to ensure the result is ready for import into Drupal.
8+
9+
- Initial code is available at https://github.com/salsadigitalauorg/merlin-framework
10+
- As this codebase is likely to be open-sourced and see ongoing development effort, the branch `<TBD>` is the safest to use with DHHS migration configurations
11+
12+
13+
# Core concepts
14+
15+
The migration framework expects a YAML (.yml) file containing all the configuration required for a migration run. A separate migration configuration exists for each logical content structure, for example:
16+
- Menus
17+
- Content Type A
18+
- Content Type B
19+
- Taxonomy A
20+
- Taxonomy B
21+
- .. etc
22+
23+
Each configuration file contains a reference to either a website domain and list of URLs, or a path to relevant XML files (see [XML File Support]()).
24+
25+
Content from these sources is then passed through mappings, which use selectors (XPath or jQuery-like selectors) to map content from the DOM to the JSON file that is generated during a run. These data values can also pass through processors that further refine and alter the data.
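To make the mapping idea concrete, here is a minimal Python sketch of selectors turning DOM content into a JSON row. It is an illustration only and not how the framework is implemented: it uses the limited XPath support in Python's standard library, and the sample page, field names, and the `apply_mappings` helper are all hypothetical. A single match is stored as one value and multiple matches as an array.

```python
import json
import xml.etree.ElementTree as ET

# A tiny, well-formed sample page (hypothetical content).
PAGE = """<html><body>
  <h1 id="title">Treating persistent pain</h1>
  <ul class="key-messages"><li>First message</li><li>Second message</li></ul>
</body></html>"""

# Each mapping pairs an output field with a selector, loosely mirroring
# the `field` and `selector` keys of a mappings entry.
MAPPINGS = [
    {"field": "title", "selector": ".//h1[@id='title']"},
    {"field": "field_key_messages", "selector": ".//ul[@class='key-messages']/li"},
]


def apply_mappings(page, mappings):
    """Evaluate each selector against the page and collect the text it finds."""
    root = ET.fromstring(page)
    row = {}
    for mapping in mappings:
        values = [(el.text or "").strip() for el in root.findall(mapping["selector"])]
        # One match becomes a single value; several matches become an array.
        row[mapping["field"]] = values[0] if len(values) == 1 else values
    return row


print(json.dumps(apply_mappings(PAGE, MAPPINGS), indent=2))
```

A processor would then be one more function applied to each collected value before the row is written out.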
26+
27+
# Prerequisites
28+
The framework requires PHP (latest recommended, but tested on most versions of 7.x) and Composer. All other dependencies are pulled in by running `composer install`.
29+
30+
# Running a migration
31+
To run a migration, run the tool with the input configuration .yml file and a path to the output, e.g.:
32+
33+
`php migrate generate -c configs/bhc/fact_sheet.yml -o /path/to/output/`
34+
35+
You will see output like the following:
36+
```
37+
Migration framework
38+
===================
39+
40+
Preparing the configuration
41+
---------------------------
42+
43+
[OK] Done!
44+
45+
Processing requests
46+
-------------------
47+
48+
Parsing... https://www.betterhealth.vic.gov.au/health/conditionsandtreatments/Treating-persistent-pain (Done!)
49+
50+
... etc (x2000 pages)
51+
52+
Generating files
53+
----------------
54+
55+
Generating /tmp/page_type.json Done!
56+
Generating /tmp/error-not-found.json Done!
57+
Generating /tmp/media-image-bhc_fact_sheet.json Done!
58+
Generating /tmp/call_to_action.json Done!
59+
Generating /tmp/content_partner.json Done!
60+
Generating /tmp/fact_sheet.json Done!
61+
Generating /tmp/error-404.json Done!
62+
Generating /tmp/media-embedded_video-bhc_fact_sheet.json Done!
63+
64+
[OK] Done!
65+
66+
Completed in 87.295419931412
67+
```
68+
69+
## Refreshing JSON assets
70+
71+
The resulting JSON files are now ready to push into the Drupal Migration plugins. These files should be hosted somewhere that Drupal can access, e.g. a web-accessible URL.
72+
73+
## Error handling
74+
75+
A run also generates JSON files containing error reports. These may include `error-not-found.json`, `error-404.json` and `error-unhandled.json`. They indicate where selectors cannot find matches on a given page, or where a URL does not resolve (404, 500, or similar).

docs/ProcessorConvertEncoding.md

Lines changed: 24 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,24 @@
1+
---
2+
id: processor-convert-encoding
3+
title: Convert Encoding
4+
sidebar_label: Convert Encoding
5+
---
6+
7+
Converts the character encoding of data from one encoding to another. This uses PHP's `mb_convert_encoding` and should accept the same values.
8+
9+
- [phpdocs](https://www.php.net/manual/en/function.mb-convert-encoding.php)
10+
11+
## Options
12+
13+
- **to_encoding**`<default: UTF-8>`: The encoding to convert to.
14+
- **from_encoding**`<default: null>`: The encoding to convert from.
15+
16+
## Usage
17+
18+
```
19+
processors:
20+
-
21+
processor: convert_encoding
22+
to_encoding: UTF-8
23+
from_encoding: auto
24+
```
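For readers unfamiliar with encoding conversion, the following Python sketch shows the general idea behind a `to_encoding`/`from_encoding` pair: decode the bytes using the source encoding, then re-encode them in the target encoding. It is an analogy, not the processor's code, which relies on PHP's `mb_convert_encoding`. Unlike the processor, this sketch has no `auto` detection and requires an explicit source encoding; the function name is reused here only for readability.

```python
def convert_encoding(data, to_encoding="utf-8", from_encoding="utf-8"):
    """Re-encode bytes from one character encoding to another."""
    return data.decode(from_encoding).encode(to_encoding)


# "café" stored as Latin-1 uses a single byte for the accented letter.
latin1_bytes = "café".encode("latin-1")
utf8_bytes = convert_encoding(latin1_bytes, to_encoding="utf-8", from_encoding="latin-1")

print(latin1_bytes)  # b'caf\xe9'
print(utf8_bytes)    # b'caf\xc3\xa9'
```

The text is unchanged; only its byte representation differs, which is why a wrong `from_encoding` tends to show up as garbled accented characters.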

docs/ProcessorHtmlEntityDecode.md

Lines changed: 19 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,19 @@
1+
---
2+
id: processor-html-entity-decode
3+
title: Html Entity Decode
4+
sidebar_label: Html Entity Decode
5+
---
6+
7+
Converts HTML entities (e.g. `&quot;`) to their corresponding characters.
8+
9+
## Options
10+
11+
This processor has no options.
12+
13+
## Usage
14+
15+
```
16+
processors:
17+
-
18+
processor: html_entity_decode
19+
```
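As a rough illustration of what entity decoding does to a value, here is a Python sketch using the standard library's `html.unescape`. It is an analogy only: the source does not show the processor's implementation, and the exact set of entities it decodes may differ. The function name mirrors the processor name for readability.

```python
import html


def html_entity_decode(value):
    """Replace HTML entities in a string with the characters they represent."""
    return html.unescape(value)


print(html_entity_decode("&quot;Fish &amp; chips&quot;"))  # "Fish & chips"
```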

docs/ProcessorNl2br.md

Lines changed: 19 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,19 @@
1+
---
2+
id: processor-nl2br
3+
title: nl2br
4+
sidebar_label: nl2br
5+
---
6+
7+
Converts raw newlines to `<br>` markup.
8+
9+
## Options
10+
11+
This processor has no options.
12+
13+
## Usage
14+
15+
```
16+
processors:
17+
-
18+
processor: nl2br
19+
```
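The sketch below illustrates the kind of transformation involved. It is modelled on PHP's `nl2br` function, which inserts `<br />` before each newline and keeps the newline itself; the processor name suggests it wraps that function, but the source does not confirm this, so treat the exact markup as an assumption.

```python
import re


def nl2br(text):
    """Insert a <br /> tag before every newline sequence, keeping the newline."""
    return re.sub(r"(\r\n|\n|\r)", r"<br />\1", text)


print(repr(nl2br("line one\nline two")))  # 'line one<br />\nline two'
```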
