|
| 1 | +[Home](README.md) | [Previous - Examples of using DeepSpeech](EXAMPLES.md) |
| 2 | + |
| 3 | +# Continuous Integration with DeepSpeech |
| 4 | + |
| 5 | +- [Continuous Integration with DeepSpeech](#continuous-integration-with-deepspeech) |
| 6 | + * [An introduction to GitHub Actions](#an-introduction-to-github-actions) |
| 7 | + * [Key concepts and how they relate to files in the DeepSpeech GitHub repo](#key-concepts-and-how-they-relate-to-files-in-the-deepspeech-github-repo) |
| 8 | + + [Workflows](#workflows) |
| 9 | + + [Events](#events) |
| 10 | + + [Jobs, Steps and Actions](#jobs--steps-and-actions) |
| 11 | + + [Runners](#runners) |
| 12 | + + [TensorFlow builds, cache limitations and how they have been worked around](#tensorflow-builds--cache-limitations-and-how-they-have-been-worked-around) |
| 13 | + |
| 14 | +_This section of the PlayBook assumes that you have [forked](https://docs.github.com/en/github/getting-started-with-github/fork-a-repo) DeepSpeech and wish to customize it for your own purposes. The intent of this section is to provide guidance on setting up [continuous integration](https://en.wikipedia.org/wiki/Continuous_integration) using [GitHub Actions](https://docs.github.com/en/actions/learn-github-actions/introduction-to-github-actions)._ |
| 15 | + |
| 16 | +## An introduction to GitHub Actions |
| 17 | + |
| 18 | +DeepSpeech uses GitHub Actions for continuous integration (CI). We recommend that you read through [this introduction to GitHub Actions](https://docs.github.com/en/actions/learn-github-actions/introduction-to-github-actions) before proceeding to the rest of the document. |
| 19 | + |
| 20 | +## Key concepts and how they relate to files in the DeepSpeech GitHub repo |
| 21 | + |
| 22 | +This section outlines the key concepts of GitHub Actions, and relates the concepts to files you will see in the DeepSpeech GitHub repository. |
| 23 | + |
| 24 | +### Workflows |
| 25 | + |
| 26 | +A _Workflow_ consists of one or more _Jobs_. _Workflows_ can be scheduled to execute at a particular time, or can be triggered by an _Event_. |
| 27 | + |
| 28 | +In DeepSpeech, _Workflows_ are defined in the `workflows` subdirectory of the `.github` directory, which sits at the top level of the `DeepSpeech` repository: |
| 29 | + |
| 30 | +``` |
| 31 | +~/DeepSpeech/.github$ ls |
| 32 | +total 20 |
| 33 | +4 drwxrwxr-x 4 root root 4096 Apr 20 22:52 ./ |
| 34 | +4 drwxrwxr-x 16 root root 4096 Apr 20 22:52 ../ |
| 35 | +4 drwxrwxr-x 16 root root 4096 Apr 20 22:52 actions/ |
| 36 | +4 -rw-rw-r-- 1 root root 1153 Feb 1 23:58 lock.yml |
| 37 | +4 drwxrwxr-x 2 root root 4096 Apr 20 22:52 workflows/ |
| 38 | +``` |
| 39 | + |
| 40 | +As of the time of writing, the following _Workflows_ had been defined for DeepSpeech: |
| 41 | +``` |
| 42 | +~/DeepSpeech/.github/workflows$ ls |
| 43 | +total 72 |
| 44 | + 4 drwxrwxr-x 2 root root 4096 Apr 20 22:52 ./ |
| 45 | + 4 drwxrwxr-x 4 root root 4096 Apr 20 22:52 ../ |
| 46 | +56 -rw-rw-r-- 1 root root 55439 Apr 20 22:52 build-and-test.yml |
| 47 | + 4 -rw-rw-r-- 1 root root 786 Apr 20 22:52 docker.yml |
| 48 | + 0 -rw-rw-r-- 1 root root 0 Apr 20 22:52 .git-keep-empty-folder |
| 49 | + 4 -rw-rw-r-- 1 root root 1047 Apr 20 22:52 lint.yml |
| 50 | +``` |
| 51 | + |
| 52 | +_Workflows_ are defined using [YAML](https://yaml.org/), a data serialization language. Each _Workflow_ specifies one or more _Events_ on which the _Workflow_ should run, and the _Jobs_ that should be run as part of the _Workflow_. |
| 53 | + |
| 54 | +### Events |
| 55 | + |
| 56 | +An _Event_ is an activity in the repository that can trigger a _Workflow_. An example is a `push` event, which occurs when a developer does a `git push` to the `DeepSpeech` repository. Note that a `git commit` on its own is local and does not create an event; GitHub only sees the commit once it is pushed. Other events include a developer opening a new `pull request` or `issue`. |
| 57 | + |
| 58 | +_Events_ are specified in a _Workflow_. For example, in `DeepSpeech`, most of the _Workflows_ are triggered by `push` and `pull_request` events: |
| 59 | + |
| 60 | +``` |
| 61 | +~/DeepSpeech/.github/workflows$ cat build-and-test.yml |
| 62 | +name: "Builds and tests" |
| 63 | +on: |
| 64 | +  pull_request: |
| 65 | +  push: |
| 66 | +    branches: |
| 67 | +      - master |
| 68 | +``` |
| 69 | + |
| 70 | +In this excerpt from the `build-and-test.yml` file, the YAML specifies that the _Workflow_ should run on two _Events_: `pull_request` and `push`. The `push` event is further restricted by a `branches` filter, so the _Workflow_ only runs on pushes to the `master` branch of the repository and not, say, on pushes to a working branch. The `pull_request` event has no filter here, so it applies to pull requests against any branch. |
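
If your fork uses a different default branch, or you want CI to run on working branches too, the trigger can be widened. The fragment below is a minimal sketch, not the DeepSpeech configuration: the branch names `main` and `ci/**` are assumptions chosen for illustration.

```yaml
name: "Builds and tests"
on:
  pull_request:
  push:
    branches:
      - master
      - main        # assumed: your fork's default branch, if you renamed it
      - "ci/**"     # assumed: any working branch whose name starts with ci/
  workflow_dispatch:  # lets you start the Workflow manually from the Actions tab
```

Keeping the `branches` filter narrow matters for DeepSpeech because each run is long; widening it to every branch would queue a full build for every push.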
| 71 | + |
| 72 | +### Jobs, Steps and Actions |
| 73 | + |
| 74 | +A _Workflow_ contains one or more _Jobs_. By default, _Jobs_ execute in parallel, but they can be configured to execute sequentially if needed. A _Job_ is a collection of _Steps_ that are executed in order on the same _Runner_. Each _Step_ can run commands or _Actions_, and _Steps_ can share data with each other. _Actions_ are reusable, "pre-built" _Steps_, either created by other GitHub users or defined in the repository itself. |
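
Sequential execution is expressed with the `needs` keyword. The fragment below is a minimal sketch with hypothetical job names (`build`, `test`) and placeholder commands; it is not taken from the DeepSpeech _Workflows_.

```yaml
jobs:
  build:
    runs-on: ubuntu-20.04
    steps:
      - uses: actions/checkout@v2
      - run: echo "build step placeholder"
  test:
    # `needs` makes this Job wait until `build` has finished successfully.
    needs: build
    runs-on: ubuntu-20.04
    steps:
      - run: echo "test step placeholder"
```

Without the `needs` line, GitHub would start both _Jobs_ at the same time on separate _Runners_.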
| 75 | + |
| 76 | +An example of a _Job_ within a `DeepSpeech` _Workflow_ (the `build-and-test.yml` _Workflow_) is shown below. You can see it has several _Steps_, some of which are commands, and some of which are _Actions_. |
| 77 | + |
| 78 | +``` |
| 79 | +jobs: |
| 80 | +  # Linux jobs |
| 81 | +  swig_Windows_crosscompiled: |
| 82 | +    name: "Lin|Build SWIG for Windows" |
| 83 | +    runs-on: ubuntu-20.04 |
| 84 | +    env: |
| 85 | +      CFLAGS: "-static-libgcc -static-libstdc++" |
| 86 | +      CXXFLAGS: "-static-libgcc -static-libstdc++" |
| 87 | +    steps: |
| 88 | +      - run: | |
| 89 | +          sudo apt-get install -y --no-install-recommends autoconf automake bison build-essential mingw-w64 |
| 90 | +      - uses: actions/checkout@v2 |
| 91 | +        with: |
| 92 | +          repository: "lissyx/swig" |
| 93 | +          ref: "fec7d5d3179833e37759ffc6532f86344982e26a" |
| 94 | +      - run: | |
| 95 | +          mkdir -p build-static/ |
| 96 | +      - run: | |
| 97 | +          curl -sSL https://ftp.pcre.org/pub/pcre/pcre-8.43.tar.gz > pcre-8.43.tar.gz |
| 98 | +          ./Tools/pcre-build.sh --host=x86_64-w64-mingw32 |
| 99 | +      - run: | |
| 100 | +          sh autogen.sh |
| 101 | +          ./configure --host=x86_64-w64-mingw32 \ |
| 102 | +            --prefix=`pwd`/build-static/ \ |
| 103 | +            --program-prefix=ds- |
| 104 | +      - run: | |
| 105 | +          make -j |
| 106 | +      - run: | |
| 107 | +          make install |
| 108 | +      - uses: actions/upload-artifact@v2 |
| 109 | +        with: |
| 110 | +          name: ${{ github.job }} |
| 111 | +          path: ${{ github.workspace }}/build-static/ |
| 112 | +``` |
| 113 | + |
| 114 | +The _Actions_ specific for `DeepSpeech` are housed in the `.github/actions` directory of the repository: |
| 115 | + |
| 116 | +``` |
| 117 | +~/DeepSpeech/.github/actions$ ls |
| 118 | +total 64 |
| 119 | +4 drwxrwxr-x 16 root root 4096 Apr 20 22:52 ./ |
| 120 | +4 drwxrwxr-x 4 root root 4096 Apr 20 22:52 ../ |
| 121 | +4 drwxrwxr-x 2 root root 4096 Apr 20 22:52 build-tensorflow/ |
| 122 | +4 drwxrwxr-x 3 root root 4096 Apr 20 22:52 check_artifact_exists/ |
| 123 | +4 drwxrwxr-x 2 root root 4096 Apr 20 22:52 get_cache_key/ |
| 124 | +4 drwxrwxr-x 2 root root 4096 Apr 20 22:52 host-build/ |
| 125 | +4 drwxrwxr-x 2 root root 4096 Apr 20 22:52 install-python-upstream/ |
| 126 | +4 drwxrwxr-x 2 root root 4096 Apr 20 22:52 node-build/ |
| 127 | +4 drwxrwxr-x 2 root root 4096 Apr 20 22:52 numpy_vers/ |
| 128 | +4 drwxrwxr-x 2 root root 4096 Apr 20 22:52 package/ |
| 129 | +4 drwxrwxr-x 2 root root 4096 Apr 20 22:52 package-tensorflow/ |
| 130 | +4 drwxrwxr-x 2 root root 4096 Apr 20 22:52 python-build/ |
| 131 | +4 drwxrwxr-x 2 root root 4096 Apr 20 22:52 run-tests/ |
| 132 | +4 drwxrwxr-x 2 root root 4096 Apr 20 22:52 select-xcode/ |
| 133 | +4 drwxrwxr-x 2 root root 4096 Apr 20 22:52 setup-tensorflow/ |
| 134 | +4 drwxrwxr-x 2 root root 4096 Apr 20 22:52 win-install-sox/ |
| 135 | +``` |
| 136 | + |
| 137 | +Within each of these directories is an `action.yml` file that defines an _Action_. Each _Action_ calls a script. The scripts are usually written in `bash` or `node.js`, depending on which makes it easier to interact with the GitHub API. The scripts are held in the `ci_scripts` directory in the `DeepSpeech` repository: |
| 138 | + |
| 139 | +``` |
| 140 | +~/DeepSpeech/ci_scripts$ ls |
| 141 | +total 164 |
| 142 | + 4 drwxrwxr-x 2 root root 4096 Apr 20 22:52 ./ |
| 143 | + 4 drwxrwxr-x 16 root root 4096 Apr 20 22:52 ../ |
| 144 | + 4 -rwxrwxr-x 1 root root 3678 Apr 20 22:52 all-utils.sh |
| 145 | + 4 -rwxrwxr-x 1 root root 2472 Apr 20 22:52 all-vars.sh |
| 146 | +24 -rwxrwxr-x 1 root root 23668 Apr 20 22:52 asserts.sh |
| 147 | + 4 -rwxrwxr-x 1 root root 1300 Apr 20 22:52 build-utils.sh |
| 148 | + 4 -rwxrwxr-x 1 root root 382 Apr 20 22:52 cpp-bytes-tests.sh |
| 149 | + 4 -rwxrwxr-x 1 root root 488 Apr 20 22:52 cpp-tests-prod.sh |
| 150 | + 4 -rwxrwxr-x 1 root root 345 Apr 20 22:52 cpp-tests.sh |
| 151 | + 4 -rwxrwxr-x 1 root root 428 Apr 20 22:52 cpp_tflite_basic-tests.sh |
| 152 | + 4 -rwxrwxr-x 1 root root 569 Apr 20 22:52 cpp_tflite-tests-prod.sh |
| 153 | + 4 -rwxrwxr-x 1 root root 541 Apr 20 22:52 cpp_tflite-tests.sh |
| 154 | + 4 -rwxrwxr-x 1 root root 318 Apr 20 22:52 cppwin-tests.sh |
| 155 | + 4 -rwxrwxr-x 1 root root 467 Apr 20 22:52 cppwin_tflite-tests.sh |
| 156 | + 4 -rw-rw-r-- 1 root root 316 Apr 20 22:52 docs-requirements.txt |
| 157 | + 4 -rwxrwxr-x 1 root root 602 Apr 20 22:52 electronjs-tests-prod.sh |
| 158 | + 4 -rwxrwxr-x 1 root root 384 Apr 20 22:52 electronjs-tests.sh |
| 159 | + 4 -rwxrwxr-x 1 root root 631 Apr 20 22:52 electronjs_tflite-tests-prod.sh |
| 160 | + 4 -rwxrwxr-x 1 root root 523 Apr 20 22:52 electronjs_tflite-tests.sh |
| 161 | + 4 -rwxrwxr-x 1 root root 538 Apr 20 22:52 host-build.sh |
| 162 | + 4 -rwxrwxr-x 1 root root 556 Apr 20 22:52 node-tests-prod.sh |
| 163 | + 4 -rwxrwxr-x 1 root root 343 Apr 20 22:52 node-tests.sh |
| 164 | + 4 -rwxrwxr-x 1 root root 591 Apr 20 22:52 node_tflite-tests-prod.sh |
| 165 | + 4 -rwxrwxr-x 1 root root 482 Apr 20 22:52 node_tflite-tests.sh |
| 166 | + 4 -rwxrwxr-x 1 root root 614 Apr 20 22:52 package.sh |
| 167 | + 4 -rwxrwxr-x 1 root root 2787 Apr 20 22:52 package-utils.sh |
| 168 | + 4 -rwxrwxr-x 1 root root 521 Apr 20 22:52 python-tests-prod.sh |
| 169 | + 4 -rwxrwxr-x 1 root root 274 Apr 20 22:52 python-tests.sh |
| 170 | + 4 -rwxrwxr-x 1 root root 502 Apr 20 22:52 python_tflite-tests-prod.sh |
| 171 | + 4 -rwxrwxr-x 1 root root 413 Apr 20 22:52 python_tflite-tests.sh |
| 172 | + 4 -rwxrwxr-x 1 root root 2688 Apr 20 22:52 tf-build.sh |
| 173 | + 4 -rwxrwxr-x 1 root root 2384 Apr 20 22:52 tf-package.sh |
| 174 | + 4 -rwxrwxr-x 1 root root 3371 Apr 20 22:52 tf-setup.sh |
| 175 | +12 -rwxrwxr-x 1 root root 8375 Apr 20 22:52 tf-vars.sh |
| 176 | +``` |
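
To illustrate how an _Action_ definition can hand off to one of these scripts, here is a minimal sketch of a composite `action.yml`. It is an assumption about the general shape, not a copy of any file in the repository: the `name`, `description` and the choice of `python-tests.sh` with no arguments are hypothetical.

```yaml
# Hypothetical .github/actions/<some-action>/action.yml
name: "Run Python tests"
description: "Sketch of a composite Action that calls a script from ci_scripts"
runs:
  using: "composite"
  steps:
    # Composite Actions must name the shell explicitly for each run step.
    - run: ./ci_scripts/python-tests.sh
      shell: bash
```

A _Workflow_ would then reference such an _Action_ by its path, for example `uses: ./.github/actions/<some-action>`, after checking out the repository.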
| 177 | + |
| 178 | +### Runners |
| 179 | + |
| 180 | +A _Runner_ is a server that listens for available _Jobs_ to run. A _Runner_ runs one _Job_ at a time and then reports the results back to GitHub. Here, we assume you will use the GitHub-hosted _Runners_ to perform continuous integration. |
| 181 | + |
| 182 | +This example assumes you have forked DeepSpeech to the location: |
| 183 | + |
| 184 | +`https://github.com/{USERNAME}/DeepSpeech/` |
| 185 | + |
| 186 | +1. Navigate to your GitHub repository's Settings -> Actions. This should be located at `https://github.com/{USERNAME}/DeepSpeech/settings/actions`. For security, it is a good idea to change the default Action Permissions from "Allow all actions" to "Allow local actions only". |
| 187 | + |
| 188 | +2. Next, navigate to your GitHub repository's Actions. This should be located at `https://github.com/{USERNAME}/DeepSpeech/actions`. You should see three _Workflows_ already set up. GitHub recognises these from the files stored in the `.github/workflows` directory. Remember from the [Events](#events) section earlier in this tutorial that the _Workflows_ are set up to trigger on `pull_request` events and on `push` events that occur on the `master` branch of the code. |
| 189 | + |
| 190 | +3. To test that they are working, we need to make a change to the `DeepSpeech` repository that triggers one of those events. Here, we will create a `push` event because it's easier to demonstrate. Navigate to the repository's Code listing. This should be at `https://github.com/{USERNAME}/DeepSpeech`. Add a file called `test-workflow.txt` containing dummy text and commit the change to the `master` branch. This should trigger a `push` event, and the _Workflows_ will activate. You can monitor the _Workflows_ executing from the repository's Actions page. This is at `https://github.com/{USERNAME}/DeepSpeech/actions`. |
| 191 | + |
| 192 | +Be aware that the first time your GitHub Actions run, the _Workflow_ will take around four hours to execute. This is because the TensorFlow build cache does not exist yet, so TensorFlow has to be built from scratch. Subsequent runs will be much shorter; typically we have seen them take just over an hour to execute. |
| 193 | + |
| 194 | +### TensorFlow builds, cache limitations and how they have been worked around |
| 195 | + |
| 196 | +As part of understanding DeepSpeech's continuous integration pipeline, it is useful to know about the limitations of the GitHub platform, and how they have been worked around. |
| 197 | + |
| 198 | +* Building TensorFlow for DeepSpeech takes approximately 3 hours using GitHub Actions. However, TensorFlow _itself_ does not change upon each `pull request` - unless TensorFlow itself is being changed (such as a version upgrade, which is infrequent). |
| 199 | + |
| 200 | +* Additionally, the GitHub cache is limited to 5GB for the `DeepSpeech` repository. This cache size is smaller than the build cache required for TensorFlow. That is, building just TensorFlow would consume _all_ the cache for the entire repository. |
| 201 | + |
| 202 | +* To work around this limitation, the TensorFlow library build is split into two parts. This allows GitHub Actions to run more quickly when a `push` or `pull_request` event occurs. The two parts are: |
| 203 | + |
| 204 | + - a TensorFlow prebuild cache |
| 205 | + - the actual code of the TensorFlow library |
| 206 | + |
| 207 | +That is, the cache is _prebuilt_ as part of the build pipeline, so that it does not exceed the capacity of GitHub Actions. |
| 208 | + |
| 209 | +* The _Actions_ used to facilitate this are `get_cache_key` and `check_artifact_exists`. They are in the `.github/actions` directory of the `DeepSpeech` repository. `get_cache_key` allows a cache to be accessed. `check_artifact_exists` determines whether an artifact has already been built - whether it is `missing` or `found`. This allows the _Workflow_ to decide whether the artifact needs to be built. |
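
The `missing` / `found` decision can be sketched with a conditional _Step_. The fragment below is illustrative only: the input `name`, the output `status`, the job name and the artifact file name are all assumptions, so check the `action.yml` in `.github/actions/check_artifact_exists` for the real interface before reusing it.

```yaml
jobs:
  tensorflow_build:   # hypothetical job name
    runs-on: ubuntu-20.04
    steps:
      - uses: actions/checkout@v2
      # Assumed interface: an input called `name` and an output called `status`.
      - id: check_artifact
        uses: ./.github/actions/check_artifact_exists
        with:
          name: "tensorflow-prebuilt.tar.xz"   # hypothetical artifact name
      # Run the expensive build only when the artifact was reported as missing.
      - if: steps.check_artifact.outputs.status == 'missing'
        run: echo "build TensorFlow here"
```

The point of the pattern is that the three-hour build is skipped whenever a previous run has already produced the artifact.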
| 210 | + |
| 211 | +For [more information on caching _Workflow_ dependencies in GitHub Actions, please see the GitHub documentation](https://docs.github.com/en/actions/guides/caching-dependencies-to-speed-up-workflows). |
| 212 | + |
| 213 | +[Home](README.md) | [Previous - Examples of using DeepSpeech](EXAMPLES.md) |