---
id: getting-started
title: Getting Started
weight: -1
---

The migration tool provides a standard mechanism for scraping content from DHHS websites, splitting it into logical content structures, and performing additional processing so that the result is ready for import into Drupal.

- Initial code is available at https://github.com/salsadigitalauorg/merlin-framework
- As this codebase is likely to be open-sourced and to see ongoing development, the branch `<TBD>` is the safest one to use with DHHS migration configurations.

# Core concepts

The migration framework expects a YAML (`.yml`) file containing all the configuration required for a migration run. A separate migration configuration exists for each logical content structure, for example:
- Menus
- Content Type A
- Content Type B
- Taxonomy A
- Taxonomy B
- etc.

Each configuration file references either a website domain and a list of URLs, or a path to the relevant XML files (see [XML File Support]()).
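
As an illustration of the domain-plus-URLs mode, the sketch below shows how a base domain and a list of page paths could be combined into the absolute URLs a run would fetch. The `domain` and `urls` keys and the `resolve_urls` helper are hypothetical and only stand in for whatever the framework's configuration actually uses.

```python
from urllib.parse import urljoin

# Hypothetical config shape: a base domain plus page paths.
# These key names are assumptions for illustration only.
config = {
    "domain": "https://www.betterhealth.vic.gov.au",
    "urls": [
        "/health/conditionsandtreatments/Treating-persistent-pain",
        "health/conditionsandtreatments/Treating-persistent-pain",  # same page, no leading slash
    ],
}


def resolve_urls(domain: str, paths: list[str]) -> list[str]:
    """Join each path onto the domain, dropping duplicates while keeping the original order."""
    base = domain.rstrip("/") + "/"
    seen: dict[str, None] = {}
    for path in paths:
        # Absolute URLs pass through urljoin unchanged; relative paths are joined onto the base.
        seen[urljoin(base, path.lstrip("/"))] = None
    return list(seen)


print(resolve_urls(config["domain"], config["urls"]))
```

Normalising paths before de-duplicating means the same page listed twice in slightly different forms is only scraped once.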

Content from these sources is then passed through mappings, which use selectors (XPath or jQuery-like selectors) to map content from the page DOM into the JSON output generated during a run. Mapped values can also pass through processors to further refine or alter the data.
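
To make the mapping and processor idea concrete, here is a minimal Python sketch; it is not the framework's PHP implementation. It applies XPath-style selectors (the limited XPath subset supported by the standard library's `xml.etree.ElementTree`) to a well-formed page fragment, maps each match to a named field, runs the value through processor functions, and serialises the result as JSON. The field names, the `MAPPINGS` table and the `collapse_whitespace` processor are all hypothetical.

```python
import json
import xml.etree.ElementTree as ET
from typing import Callable

# Hypothetical, well-formed page fragment; real HTML usually needs a more tolerant parser.
PAGE = """
<html>
  <body>
    <h1>  Treating persistent pain  </h1>
    <div class="summary">Pain that lasts   more than three months.</div>
    <div class="content"><p>First paragraph.</p><p>Second paragraph.</p></div>
  </body>
</html>
"""


def collapse_whitespace(value: str) -> str:
    """Example processor: trim the value and collapse runs of whitespace."""
    return " ".join(value.split())


# Hypothetical mapping table: output field -> (XPath-style selector, processors to apply).
MAPPINGS: dict[str, tuple[str, list[Callable[[str], str]]]] = {
    "title": (".//h1", [collapse_whitespace]),
    "summary": (".//div[@class='summary']", [collapse_whitespace]),
    "body": (".//div[@class='content']/p", [collapse_whitespace]),
}


def map_page(markup: str) -> dict[str, list[str]]:
    """Apply every mapping to the page and return one list of values per field."""
    root = ET.fromstring(markup.strip())
    row: dict[str, list[str]] = {}
    for field, (selector, processors) in MAPPINGS.items():
        values = []
        for element in root.findall(selector):
            value = "".join(element.itertext())
            for process in processors:
                value = process(value)
            values.append(value)
        row[field] = values
    return row


print(json.dumps(map_page(PAGE), indent=2))
```

Keeping processors as plain functions applied in order makes each refinement step small and easy to test on its own.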

# Prerequisites
The framework requires PHP (the latest release is recommended; it has been tested on most 7.x versions) and Composer. All other dependencies are pulled in by running `composer install`.
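
Before the first run it can help to confirm that both tools are on the `PATH`. The sketch below is a hypothetical helper, not part of the framework: it looks up `php` and `composer` with `shutil.which` and parses the version from `php --version`, whose first line starts with `PHP <major>.<minor>.<patch>`, so it can flag a PHP release older than 7.x.

```python
import re
import shutil
import subprocess
from typing import Optional


def parse_php_version(version_output: str) -> Optional[tuple[int, int, int]]:
    """Extract (major, minor, patch) from the start of `php --version` output."""
    match = re.match(r"PHP (\d+)\.(\d+)\.(\d+)", version_output.strip())
    if match is None:
        return None
    major, minor, patch = (int(part) for part in match.groups())
    return (major, minor, patch)


def check_prerequisites() -> list[str]:
    """Return human-readable problems; an empty list means every check passed."""
    problems = [
        f"{tool} was not found on PATH"
        for tool in ("php", "composer")
        if shutil.which(tool) is None
    ]
    if shutil.which("php") is not None:
        result = subprocess.run(
            ["php", "--version"], capture_output=True, text=True, check=False
        )
        version = parse_php_version(result.stdout)
        if version is None:
            problems.append("could not determine the PHP version")
        elif version < (7, 0, 0):
            problems.append(f"PHP {version[0]}.{version[1]} is older than 7.x")
    return problems


if __name__ == "__main__":
    issues = check_prerequisites()
    print("\n".join(issues) if issues else "PHP and Composer found.")
```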

# Running a migration
To run a migration, run the tool with the input configuration `.yml` file and a path for the output, e.g.:

`php migrate generate -c configs/bhc/fact_sheet.yml -o /path/to/output/`

You will see output similar to the following:
```
Migration framework
===================

Preparing the configuration
---------------------------

 [OK] Done!

Processing requests
-------------------

Parsing... https://www.betterhealth.vic.gov.au/health/conditionsandtreatments/Treating-persistent-pain (Done!)

 ... etc (x2000 pages)

Generating files
----------------

Generating /tmp/page_type.json Done!
Generating /tmp/error-not-found.json Done!
Generating /tmp/media-image-bhc_fact_sheet.json Done!
Generating /tmp/call_to_action.json Done!
Generating /tmp/content_partner.json Done!
Generating /tmp/fact_sheet.json Done!
Generating /tmp/error-404.json Done!
Generating /tmp/media-embedded_video-bhc_fact_sheet.json Done!

 [OK] Done!

Completed in 87.295419931412
```
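
Because each logical content structure has its own configuration file, a full migration usually means one run per file. The following is a minimal sketch of a wrapper around the `php migrate generate -c <config> -o <output>` invocation shown above. The one-subdirectory-per-configuration output layout is an assumption, and `build_commands` and `run_all` are illustrative names, not part of the framework. It should be run from the framework checkout, where the `migrate` script lives.

```python
import subprocess
from pathlib import Path


def build_commands(config_dir: Path, output_root: Path) -> list[list[str]]:
    """Build one `php migrate generate` command per .yml file, in sorted order."""
    commands = []
    for config in sorted(config_dir.glob("*.yml")):
        # Hypothetical layout: one output subdirectory per configuration file.
        # The trailing slash mirrors the `-o /path/to/output/` example above.
        output_dir = str(output_root / config.stem) + "/"
        commands.append(
            ["php", "migrate", "generate", "-c", str(config), "-o", output_dir]
        )
    return commands


def run_all(config_dir: Path, output_root: Path) -> dict[str, int]:
    """Run each migration in turn and record its exit code, keyed by config file name."""
    results = {}
    for command in build_commands(config_dir, output_root):
        # Create the output directory up front in case the tool does not.
        Path(command[6]).mkdir(parents=True, exist_ok=True)
        completed = subprocess.run(command, check=False)
        results[Path(command[4]).name] = completed.returncode
    return results


if __name__ == "__main__":
    codes = run_all(Path("configs/bhc"), Path("/path/to/output"))
    failed = [name for name, code in codes.items() if code != 0]
    print("Failed:", ", ".join(failed) if failed else "none")
```

Running the configurations sequentially keeps the console output readable; separate output directories stop files with the same name (such as the error reports) from overwriting each other between runs.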

## Refreshing JSON assets

The resulting JSON files are now ready to feed into the Drupal Migration plugins. These files should be hosted somewhere Drupal can access them, e.g. at a web-accessible URL.
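
For local testing, one way to make the output directory web-accessible is Python's built-in HTTP server. This is a minimal sketch, not a production hosting setup; the directory path, host and port are placeholders, and `make_server` is a hypothetical helper name.

```python
from functools import partial
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


def make_server(directory: str, port: int = 8000, host: str = "127.0.0.1") -> ThreadingHTTPServer:
    """Create an HTTP server that serves the files in `directory` over plain HTTP."""
    # Use host="0.0.0.0" if Drupal runs on another machine or in a container.
    handler = partial(SimpleHTTPRequestHandler, directory=directory)
    return ThreadingHTTPServer((host, port), handler)
```

Calling `make_server("/path/to/output").serve_forever()` blocks and serves the files until interrupted; `python3 -m http.server 8000 --directory /path/to/output` does the same from the command line.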

## Error handling

Error-reporting JSON files are also generated. These may include `error-not-found.json`, `error-404.json` and `error-unhandled.json`, and indicate where selectors could not find a match on a given page, or where a URL did not resolve (404, 500, or similar).
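
To get a quick overview after a run, a small script can list which error files were produced and how many entries each holds. This sketch assumes only that each `error-*.json` file is valid JSON whose top level is a list or an object; it makes no assumption about the fields inside each entry, which should be checked against real output. The `summarise_errors` name is hypothetical.

```python
import json
from pathlib import Path


def summarise_errors(output_dir: Path) -> dict[str, int]:
    """Map each error-*.json file name to the number of top-level entries it contains."""
    summary = {}
    for path in sorted(output_dir.glob("error-*.json")):
        data = json.loads(path.read_text(encoding="utf-8"))
        # Count list items or object keys; treat any other JSON value as a single entry.
        summary[path.name] = len(data) if isinstance(data, (list, dict)) else 1
    return summary


if __name__ == "__main__":
    for name, count in summarise_errors(Path("/path/to/output")).items():
        print(f"{name}: {count} entries")
```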