
Commit 16dcc65

steveworley, Andrew Rowlands, stooit, gargsuchi and sonnykt authored
Release 1.0.0 (#157)
* Improved relative link handling
* New Group type, docs, tests that didn't make it pre namespace change
* Introduces allowed_classes filtering. Fixes encoding issues
* UnwrapLinks processor
* More comprehensive unwrap links but still WIP
* Add option for referer to fetcher
* Unescape slashes on json output
* Whitespace leave one space
* Latest group type
* namespace exception
* Support for uuidv3 on group item content and json output to be consumed as paragraphs in Drupal world
* Allow generic any name of output
* Optionally use Guzzle redirect info for speed
* Use Guzzle redirect
* Check a url exists in cache and report path
* Use Guzzle redirect info
* composer
* Group crawl by query string
* Track redirects on crawl in Guzzle
* Add mandatory support for field in group
* Build effective after redirect url lists
* Option to use effective url in fetcher if redirect
* Group crawled urls by regex
* PHP warnings
* More unicode fixing
* More options and features
* More unicode fixes
* Pass in whole object to callback
* Fix redirect check
* More unicode support
* More unicode support
* More unicode support
* Add method to return console io
* General group_uuid instead of paragraph
* Support for extra media attributes
* Use results from fetcher, remove JSON UTF8 error check
* New sub_fetch processor to fetch and process a URL. Nested Merls.
* Proper check for config and rename entity based on config
* Track what page media was on
* Support for a prebuilt alias map
* comment
* Generator for mappings
* spelling
* Array config holder for sub_fetch processor
* composer
* Moved uuid generation to standard MerlinUuid method.
* Unicode menu links
* process_file for xpath Type/Media
* Comment typo
* Better error reporting for SubFetch. WIP: still needs a bit more tidying up in the case the fetched thing wasn't TEXT/HTML.
* Use v4 ip resolve for Curl options, a lot faster
* Use v4 ip resolve for Curl options, a lot faster
* Resolve robots.txt ignore.
* Fixed ordered type to emit the field name as well.
* Updated error message to be more descriptive.
* Allowed to have dot in cache dir name.
* Fixing ordered.
* Allowed to group URLs by the value of a meta tag.
* Added a URL options flag to control content duplicates for redirects.
* Print url cache path from CLI exists lookup.
* Remove alpha UnwrapLinks type.
* MD rendering.
* Linting.
* Remove old unused functions.
* Fix existing tests.
* Linting.
* Remove old getMapping().
* Comment typo.
* Return original reset comment.
* phpcs
* Add cURL IP resolve method as option.
* Default address IP resolve to any/whatever.
* Update Fetcher Docs.
* Make some features of group optional.
* Update Group type tests and docs.
* Docs update.
* Pass same config object to Output as used in GenerateCommand
* Rename _redirected_from. Add curl ip resolve func.
* Use ip resolve func.
* Getter for multicurl object
* Separate build duplicates function
* Options for SubFetch.
* phpcs, typos
* Save sub fetch status error similar to normal fetch.
* Subfetch tests.
* Composer update.
* Typo and missing JSON files for subfetch test.
* sub_fetch processor docs.
* Add is_external flag to redirect info.
* Only add internal or non redirect links to queue when loading from cache.
* Only add redirect to effective url list if internal.
* Update browsershot for dependencies vulnerability.
* Minor package update. Moved from drupal-entity to drupal-media tags.
* Updated packages.
* MediaNullAttributeTest update.
* Update tests.
* Use puppeteer orb.
* Remove orb in favour of hardcoding.
* Apt-update.
* Update to non-stretch Debian.
* Add the Google signing key

Co-authored-by: Andrew Rowlands <andrew@firecannon.com>
Co-authored-by: Stuart Rowlands <1256274+stooit@users.noreply.github.com>
Co-authored-by: Stuart Rowlands <stuart@firecannon.com>
Co-authored-by: Suchi Garg <gargsuchi@gmail.com>
Co-authored-by: Sonny Kieu <sonny@salsadigital.com.au>
Co-authored-by: Stuart Rowlands <stuart.rowlands@quantcdn.io>
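The commit adds UUIDv3 support for group item content and moves UUID generation into a standard `MerlinUuid` method. Version 3 UUIDs are name-based (an MD5 hash of a namespace UUID plus a name), so the same input always yields the same identifier, which lets repeated migration runs produce stable IDs. A minimal Python sketch, assuming a URL namespace and a hypothetical `page_url#field:index` key; the real input format used by `MerlinUuid` is not shown in this commit.

```python
import uuid

# Assumption: a URL namespace. Merlin's MerlinUuid method may use a different namespace.
NAMESPACE = uuid.NAMESPACE_URL


def group_item_uuid(page_url: str, field: str, index: int) -> str:
    """Return a deterministic UUIDv3 for one item of a group field.

    The "page_url#field:index" key format is hypothetical, for illustration only.
    """
    name = f"{page_url}#{field}:{index}"
    return str(uuid.uuid3(NAMESPACE, name))


if __name__ == "__main__":
    a = group_item_uuid("https://example.com/about", "accordion", 0)
    b = group_item_uuid("https://example.com/about", "accordion", 0)
    c = group_item_uuid("https://example.com/about", "accordion", 1)
    print(a == b, a == c)        # True False: same input, same ID
    print(uuid.UUID(a).version)  # 3
```

Because the ID depends only on the name, re-running an import over the same content regenerates the same UUIDs instead of creating new entities each time.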
1 parent a130915 commit 16dcc65
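Several entries in the commit message above describe grouping crawled URLs: by query string, by regex, and by the value of a meta tag. A minimal Python sketch of the first two rules, assuming URLs with any query string form their own group and the rest are matched against an ordered list of path patterns (first match wins). The group names, patterns and ordering are hypothetical and do not reproduce Merlin's crawler configuration format.

```python
import re
from collections import defaultdict
from urllib.parse import urlsplit

# Hypothetical group definitions: (group name, regex matched against the URL path).
REGEX_GROUPS = [
    ("news", re.compile(r"^/news/")),
    ("events", re.compile(r"^/events/\d{4}/")),
]


def group_urls(urls):
    """Bucket URLs: query-string URLs first, then the first matching regex, else 'default'."""
    groups = defaultdict(list)
    for url in urls:
        parts = urlsplit(url)
        if parts.query:
            groups["query_string"].append(url)
            continue
        for name, pattern in REGEX_GROUPS:
            if pattern.search(parts.path):
                groups[name].append(url)
                break
        else:  # no pattern matched
            groups["default"].append(url)
    return dict(groups)


if __name__ == "__main__":
    grouped = group_urls([
        "https://example.com/news/launch",
        "https://example.com/search?q=merlin",
        "https://example.com/about",
    ])
    print(grouped["query_string"])  # ['https://example.com/search?q=merlin']
```

Checking the query string before the regexes means a URL such as `/news/item?page=2` lands in the query-string group; reversing that order is an equally valid design choice.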


71 files changed (+4810 / -1290 lines)
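The commit message also lists redirect-related changes: tracking redirects on crawl, an `is_external` flag on redirect info, and only adding a redirect's effective URL to the effective URL list when it is internal. A minimal Python sketch of that filtering rule, assuming a redirect counts as internal when the effective URL's host matches the site's host. `redirect_info`, `build_effective_urls` and every record field except `is_external` are hypothetical; this is not Merlin's (PHP) implementation.

```python
from urllib.parse import urlsplit


def redirect_info(source_url, effective_url, base_host):
    """Describe one fetched URL (hypothetical record shape).

    Assumption: 'internal' means the effective URL's host equals base_host.
    """
    host = urlsplit(effective_url).hostname  # already lower-cased by urlsplit
    return {
        "url": source_url,
        "effective_url": effective_url,
        "redirect": source_url != effective_url,
        "is_external": host is not None and host != base_host.lower(),
    }


def build_effective_urls(records):
    """Collect effective URLs of internal redirects only, de-duplicated in first-seen order."""
    effective = []
    for info in records:
        if info["redirect"] and not info["is_external"]:
            if info["effective_url"] not in effective:
                effective.append(info["effective_url"])
    return effective


if __name__ == "__main__":
    records = [
        redirect_info("https://example.com/old", "https://example.com/new", "example.com"),
        redirect_info("https://example.com/go", "https://other.org/x", "example.com"),
    ]
    print(build_effective_urls(records))  # ['https://example.com/new']
```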

.circleci/config.yml

Lines changed: 18 additions & 4 deletions
@@ -2,22 +2,36 @@
 #
 # Check https://circleci.com/docs/2.0/language-php/ for more details
 #
-version: 2.0
+version: 2.1
+
 jobs:
   build:
     docker:
       # Specify the version you desire here
-      - image: circleci/php:7.3-stretch-node-browsers
-
+      - image: circleci/php:7.3-node-browsers
     steps:
       - checkout
+      - run:
+          name: Update apt repositories
+          command: |
+            wget -q -O - https://dl.google.com/linux/linux_signing_key.pub | sudo apt-key add
+            sudo apt update
+
+      - run:
+          name: Install Headless Chrome dependencies
+          command: |
+            sudo apt-get install -yq \
+            gconf-service libasound2 libatk1.0-0 libatk-bridge2.0-0 libc6 libcairo2 libcups2 libdbus-1-3 \
+            libexpat1 libfontconfig1 libgcc1 libgconf-2-4 libgdk-pixbuf2.0-0 libglib2.0-0 libgtk-3-0 libnspr4 \
+            libpango-1.0-0 libpangocairo-1.0-0 libstdc++6 libx11-6 libx11-xcb1 libxcb1 libxcomposite1 libxcursor1 \
+            libxdamage1 libxext6 libxfixes3 libxi6 libxrandr2 libxrender1 libxss1 libxtst6 ca-certificates \
+            fonts-liberation libappindicator1 libnss3 lsb-release xdg-utils wget
 
       - run:
           name: Install puppeteer with chromium
           command: |
             npm i puppeteer
 
-      - run: sudo apt update
       - run: sudo docker-php-ext-install zip
       - run: sudo docker-php-ext-install exif && sudo docker-php-ext-enable exif

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -7,3 +7,4 @@ website
 # Commit your application's lock file https://getcomposer.org/doc/01-basic-usage.md#commit-your-composer-lock-file-to-version-control
 # You may choose to ignore a library lock file http://getcomposer.org/doc/02-libraries.md#lock-file
 # composer.lock
+node_modules

composer.json

Lines changed: 14 additions & 13 deletions
@@ -31,22 +31,23 @@
         "migration"
     ],
     "require": {
-        "symfony/yaml": "~4.2.0",
-        "mustangostang/spyc": "^0.6.2",
         "chuyskywalker/rolling-curl": "^3.1",
-        "symfony/dom-crawler": "~4.2.0",
-        "symfony/css-selector": "~4.2.0",
+        "consolidation/comments": "^1.0",
+        "dompdf/dompdf": "^0.8.3",
+        "league/uri": "^6.0",
         "masterminds/html5": "^2.5",
-        "symfony/console": "~4.2.0",
+        "mustangostang/spyc": "^0.6.2",
+        "myclabs/deep-copy": "^1.9",
+        "php-curl-class/php-curl-class": "^8.6",
         "ramsey/uuid": "^3.8",
-        "spatie/crawler": "^4.4",
-        "consolidation/comments": "^1.0",
+        "samchristy/piechart": "^2.0",
         "spatie/browsershot": "^3.32",
-        "php-curl-class/php-curl-class": "^8.6",
-        "myclabs/deep-copy": "^1.9",
-        "twig/twig": "^2.0",
-        "dompdf/dompdf": "^0.8.3",
-        "samchristy/piechart": "^2.0"
+        "spatie/crawler": "^4.4",
+        "symfony/console": "~4.4.0",
+        "symfony/css-selector": "~4.4.0",
+        "symfony/dom-crawler": "~4.4.0",
+        "symfony/yaml": "~4.4.0",
+        "twig/twig": "^2.0"
     },
     "require-dev": {
         "phpunit/phpunit": "^7.5",
@@ -60,7 +61,7 @@
     },
     "config": {
         "platform": {
-            "php": "7.2.0"
+            "php": "7.2.5"
         },
         "optimize-autoloader": true,
         "sort-packages": true
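Most of the `require` churn in this diff is reordering: the new block is alphabetical, which is consistent with the `"sort-packages": true` setting in the second hunk (Composer documents that option as keeping packages sorted by name when `composer require` adds one, so the reorder plausibly came from adding `league/uri`). The Symfony packages also move from `~4.2.0` to `~4.4.0`. The Python sketch below checks the ordering and expands the `~` and `^` constraints into explicit ranges. It is a simplified stand-in written for illustration, not Composer's own constraint parser, and handles only single-operator constraints with two- or three-part versions.

```python
import json

# Post-commit "require" block, copied from the composer.json diff above.
NEW_REQUIRE = json.loads("""{
    "chuyskywalker/rolling-curl": "^3.1", "consolidation/comments": "^1.0",
    "dompdf/dompdf": "^0.8.3", "league/uri": "^6.0", "masterminds/html5": "^2.5",
    "mustangostang/spyc": "^0.6.2", "myclabs/deep-copy": "^1.9",
    "php-curl-class/php-curl-class": "^8.6", "ramsey/uuid": "^3.8",
    "samchristy/piechart": "^2.0", "spatie/browsershot": "^3.32",
    "spatie/crawler": "^4.4", "symfony/console": "~4.4.0",
    "symfony/css-selector": "~4.4.0", "symfony/dom-crawler": "~4.4.0",
    "symfony/yaml": "~4.4.0", "twig/twig": "^2.0"
}""")

# Pre-commit package order, from the removed and context lines of the same diff.
OLD_ORDER = [
    "symfony/yaml", "mustangostang/spyc", "chuyskywalker/rolling-curl",
    "symfony/dom-crawler", "symfony/css-selector", "masterminds/html5",
    "symfony/console", "ramsey/uuid", "spatie/crawler", "consolidation/comments",
    "spatie/browsershot", "php-curl-class/php-curl-class", "myclabs/deep-copy",
    "twig/twig", "dompdf/dompdf", "samchristy/piechart",
]


def is_sorted(names):
    """True if the package names are in plain alphabetical order."""
    names = list(names)
    return names == sorted(names)


def parse_version(text):
    """'6.0' -> (6, 0, 0): pad missing parts with zeros."""
    parts = [int(p) for p in text.split(".")]
    return tuple(parts + [0] * (3 - len(parts)))


def constraint_range(constraint):
    """Return (inclusive lower, exclusive upper) for one '~' or '^' constraint."""
    op, version = constraint[0], constraint[1:]
    n_parts = len(version.split("."))
    if op not in ("~", "^") or n_parts not in (2, 3):
        raise ValueError(f"unsupported constraint: {constraint}")
    major, minor, patch = parse_version(version)
    if op == "~":
        # ~1.2 -> >=1.2.0 <2.0.0 ; ~1.2.3 -> >=1.2.3 <1.3.0
        upper = (major + 1, 0, 0) if n_parts == 2 else (major, minor + 1, 0)
    elif major > 0:
        upper = (major + 1, 0, 0)   # ^1.2.3 -> <2.0.0
    elif minor > 0:
        upper = (0, minor + 1, 0)   # ^0.8.3 -> <0.9.0
    else:
        upper = (0, 0, patch + 1)   # ^0.0.3 -> <0.0.4
    return (major, minor, patch), upper


if __name__ == "__main__":
    print(is_sorted(OLD_ORDER), is_sorted(NEW_REQUIRE))  # False True
    for name in ("symfony/console", "league/uri", "dompdf/dompdf"):
        low, high = constraint_range(NEW_REQUIRE[name])
        print(name, NEW_REQUIRE[name],
              ">=" + ".".join(map(str, low)), "<" + ".".join(map(str, high)))
```

Read this way, the Symfony bump changes the allowed range from 4.2.x to 4.4.x, while pre-1.0 caret constraints such as `^0.8.3` stay within a single minor line.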
