Commit 4af2935

Merge pull request #21 from mozilla/feature/19-github-actions
Add GitHub Actions CI documentation
2 parents dbd6a11 + e8f2578 commit 4af2935

3 files changed

+221
-4
lines changed

CONTINUOUS_INTEGRATION.md

Lines changed: 213 additions & 0 deletions
@@ -0,0 +1,213 @@
1+
[Home](README.md) | [Previous - Examples of using DeepSpeech](EXAMPLES.md)
2+
3+
# Continuous Integration with DeepSpeech
4+
5+
- [Continuous Integration with DeepSpeech](#continuous-integration-with-deepspeech)
6+
  * [An introduction to GitHub Actions](#an-introduction-to-github-actions)
7+
  * [Key concepts and how they relate to files in the DeepSpeech GitHub repo](#key-concepts-and-how-they-relate-to-files-in-the-deepspeech-github-repo)
8+
    + [Workflows](#workflows)
9+
    + [Events](#events)
10+
    + [Jobs, Steps and Actions](#jobs-steps-and-actions)
11+
    + [Runners](#runners)
12+
    + [TensorFlow builds, cache limitations and how they have been worked around](#tensorflow-builds-cache-limitations-and-how-they-have-been-worked-around)
13+
14+
_This section of the PlayBook assumes that you have [forked](https://docs.github.com/en/github/getting-started-with-github/fork-a-repo) DeepSpeech and wish to customize it for your own purposes. The intent of this section is to provide guidance on setting up [continuous integration](https://en.wikipedia.org/wiki/Continuous_integration) using [GitHub Actions](https://docs.github.com/en/actions/learn-github-actions/introduction-to-github-actions)._
15+
16+
## An introduction to GitHub Actions
17+
18+
DeepSpeech uses GitHub Actions for continuous integration (CI). We recommend that you read through [this introduction to GitHub Actions](https://docs.github.com/en/actions/learn-github-actions/introduction-to-github-actions) before proceeding to the rest of the document.
19+
20+
## Key concepts and how they relate to files in the DeepSpeech GitHub repo
21+
22+
This section outlines the key concepts of GitHub Actions, and relates the concepts to files you will see in the DeepSpeech GitHub repository.
23+
24+
### Workflows
25+
26+
A _Workflow_ consists of one or more _Jobs_. A _Workflow_ can be scheduled to execute at a particular time, or can be triggered by an _Event_.
27+
28+
In DeepSpeech, the GitHub Actions configuration lives in the `.github` directory at the root of the `DeepSpeech` repository, and _Workflows_ are defined in its `workflows` subdirectory:
29+
30+
```
31+
~/DeepSpeech/.github$ ls
32+
total 20
33+
4 drwxrwxr-x 4 root root 4096 Apr 20 22:52 ./
34+
4 drwxrwxr-x 16 root root 4096 Apr 20 22:52 ../
35+
4 drwxrwxr-x 16 root root 4096 Apr 20 22:52 actions/
36+
4 -rw-rw-r-- 1 root root 1153 Feb 1 23:58 lock.yml
37+
4 drwxrwxr-x 2 root root 4096 Apr 20 22:52 workflows/
38+
```
39+
40+
At the time of writing, the following _Workflows_ were defined for DeepSpeech:
41+
```
42+
~/DeepSpeech/.github/workflows$ ls
43+
total 72
44+
4 drwxrwxr-x 2 root root 4096 Apr 20 22:52 ./
45+
4 drwxrwxr-x 4 root root 4096 Apr 20 22:52 ../
46+
56 -rw-rw-r-- 1 root root 55439 Apr 20 22:52 build-and-test.yml
47+
4 -rw-rw-r-- 1 root root 786 Apr 20 22:52 docker.yml
48+
0 -rw-rw-r-- 1 root root 0 Apr 20 22:52 .git-keep-empty-folder
49+
4 -rw-rw-r-- 1 root root 1047 Apr 20 22:52 lint.yml
50+
```
51+
52+
_Workflows_ are defined in [YAML](https://yaml.org/), a data serialization format. Each _Workflow_ specifies one or more _Events_ on which the _Workflow_ should run, and the _Jobs_ that should be run as part of the _Workflow_.
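
As a minimal sketch of that structure, the hypothetical _Workflow_ below runs a single _Job_ both on a schedule and on `push` events. The file name `example.yml`, the job name `say_hello` and the cron schedule are assumptions for illustration; none of them exist in the DeepSpeech repository.

```yaml
# Hypothetical file: .github/workflows/example.yml (not part of DeepSpeech)
name: "Example workflow"
on:
  # Scheduled trigger: every day at 03:00 UTC (cron syntax)
  schedule:
    - cron: "0 3 * * *"
  # Event trigger: any push to the repository
  push:
jobs:
  say_hello:
    runs-on: ubuntu-latest
    steps:
      - run: echo "Hello from GitHub Actions"
```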
53+
54+
### Events
55+
56+
An _Event_ is an activity that triggers a _Workflow_. For example, a `push` event occurs when a developer does a `git push` to the `DeepSpeech` repository. Other events include a developer opening a new `pull request` or `issue`. Note that a local `git commit` is not an event by itself; GitHub only sees the commit once it has been pushed.
57+
58+
_Events_ are specified in a _Workflow_. For example, in `DeepSpeech`, most of the _Workflows_ are triggered by `push` events:
59+
60+
```
61+
~/DeepSpeech/.github/workflows$ cat build-and-test.yml
62+
name: "Builds and tests"
63+
on:
64+
  pull_request:
65+
  push:
66+
    branches:
67+
      - master
68+
```
69+
70+
In this example, which is an excerpt from the `build-and-test.yml` file, the YAML specifies that the _Workflow_ should run on two _Events_: `pull_request` and `push`. The `push` event is specified in more detail: the _Workflow_ will only run on `push` events to the `master` branch of the repository and not, say, on pushes to a working branch. The `pull_request` event has no such filter, so it applies to pull requests against any branch.
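
If you wanted your fork to be stricter, a sketch along the following lines would also restrict `pull_request` events to those targeting `master`, and add a manual trigger. This is an illustration only, not the configuration DeepSpeech uses.

```yaml
on:
  pull_request:
    branches:
      - master        # only pull requests whose base branch is master
  push:
    branches:
      - master        # only pushes to master
  workflow_dispatch:  # allows the Workflow to be started manually from the Actions tab
```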
71+
72+
### Jobs, Steps and Actions
73+
74+
A _Workflow_ contains one or more _Jobs_. By default _Jobs_ execute in parallel, but they can be configured to execute sequentially if needed. A _Job_ is a collection of _Steps_ that are executed in order on the same _Runner_. Each _Step_ can run commands or _Actions_, and _Steps_ can share data with each other. _Actions_ are reusable, "pre-built" _Steps_, either published by other GitHub users or defined within the repository itself.
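
The relationships between _Jobs_, _Steps_ and _Actions_ can be sketched as follows. The job names, the step `id` and the version string are hypothetical, and `actions/checkout@v4` is simply one commonly used published _Action_; this is not taken from the DeepSpeech _Workflows_.

```yaml
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      # A command Step that publishes an output for later Steps
      - id: version
        run: echo "value=1.0.0" >> "$GITHUB_OUTPUT"
      # A later Step in the same Job reads that output
      - run: echo "Building version ${{ steps.version.outputs.value }}"
  test:
    # `needs` makes this Job wait for `build`; without it, both Jobs run in parallel
    needs: build
    runs-on: ubuntu-latest
    steps:
      # An Action Step: checks out the repository
      - uses: actions/checkout@v4
      - run: echo "Running tests"
```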
75+
76+
An example of a _Job_ within a `DeepSpeech` _Workflow_ (the `build-and-test.yml` _Workflow_) is shown below. You can see it has several _Steps_, some of which are commands (`run:`), and some of which are _Actions_ (`uses:`).
77+
78+
```
79+
jobs:
80+
  # Linux jobs
81+
  swig_Windows_crosscompiled:
82+
    name: "Lin|Build SWIG for Windows"
83+
    runs-on: ubuntu-20.04
84+
    env:
85+
      CFLAGS: "-static-libgcc -static-libstdc++"
86+
      CXXFLAGS: "-static-libgcc -static-libstdc++"
87+
    steps:
88+
      - run: |
89+
          sudo apt-get install -y --no-install-recommends autoconf automake bison build-essential mingw-w64
90+
      - uses: actions/checkout@v2
91+
        with:
92+
          repository: "lissyx/swig"
93+
          ref: "fec7d5d3179833e37759ffc6532f86344982e26a"
94+
      - run: |
95+
          mkdir -p build-static/
96+
      - run: |
97+
          curl -sSL https://ftp.pcre.org/pub/pcre/pcre-8.43.tar.gz > pcre-8.43.tar.gz
98+
          ./Tools/pcre-build.sh --host=x86_64-w64-mingw32
99+
      - run: |
100+
          sh autogen.sh
101+
          ./configure --host=x86_64-w64-mingw32 \
102+
            --prefix=`pwd`/build-static/ \
103+
            --program-prefix=ds-
104+
      - run: |
105+
          make -j
106+
      - run: |
107+
          make install
108+
      - uses: actions/upload-artifact@v2
109+
        with:
110+
          name: ${{ github.job }}
111+
          path: ${{ github.workspace }}/build-static/
112+
```
113+
114+
The _Actions_ specific to `DeepSpeech` are housed in the `.github/actions` directory of the repository:
115+
116+
```
117+
~/DeepSpeech/.github/actions$ ls
118+
total 64
119+
4 drwxrwxr-x 16 root root 4096 Apr 20 22:52 ./
120+
4 drwxrwxr-x 4 root root 4096 Apr 20 22:52 ../
121+
4 drwxrwxr-x 2 root root 4096 Apr 20 22:52 build-tensorflow/
122+
4 drwxrwxr-x 3 root root 4096 Apr 20 22:52 check_artifact_exists/
123+
4 drwxrwxr-x 2 root root 4096 Apr 20 22:52 get_cache_key/
124+
4 drwxrwxr-x 2 root root 4096 Apr 20 22:52 host-build/
125+
4 drwxrwxr-x 2 root root 4096 Apr 20 22:52 install-python-upstream/
126+
4 drwxrwxr-x 2 root root 4096 Apr 20 22:52 node-build/
127+
4 drwxrwxr-x 2 root root 4096 Apr 20 22:52 numpy_vers/
128+
4 drwxrwxr-x 2 root root 4096 Apr 20 22:52 package/
129+
4 drwxrwxr-x 2 root root 4096 Apr 20 22:52 package-tensorflow/
130+
4 drwxrwxr-x 2 root root 4096 Apr 20 22:52 python-build/
131+
4 drwxrwxr-x 2 root root 4096 Apr 20 22:52 run-tests/
132+
4 drwxrwxr-x 2 root root 4096 Apr 20 22:52 select-xcode/
133+
4 drwxrwxr-x 2 root root 4096 Apr 20 22:52 setup-tensorflow/
134+
4 drwxrwxr-x 2 root root 4096 Apr 20 22:52 win-install-sox/
135+
```
136+
137+
Within each of these directories is an `action.yml` file that defines an _Action_. Each _Action_ calls a script. The scripts are usually written in `bash` or `node.js`, depending on which makes it easier to interact with the GitHub API. The scripts are held in the `ci_scripts` directory in the `DeepSpeech` repository:
138+
139+
```
140+
~/DeepSpeech/ci_scripts$ ls
141+
total 164
142+
4 drwxrwxr-x 2 root root 4096 Apr 20 22:52 ./
143+
4 drwxrwxr-x 16 root root 4096 Apr 20 22:52 ../
144+
4 -rwxrwxr-x 1 root root 3678 Apr 20 22:52 all-utils.sh
145+
4 -rwxrwxr-x 1 root root 2472 Apr 20 22:52 all-vars.sh
146+
24 -rwxrwxr-x 1 root root 23668 Apr 20 22:52 asserts.sh
147+
4 -rwxrwxr-x 1 root root 1300 Apr 20 22:52 build-utils.sh
148+
4 -rwxrwxr-x 1 root root 382 Apr 20 22:52 cpp-bytes-tests.sh
149+
4 -rwxrwxr-x 1 root root 488 Apr 20 22:52 cpp-tests-prod.sh
150+
4 -rwxrwxr-x 1 root root 345 Apr 20 22:52 cpp-tests.sh
151+
4 -rwxrwxr-x 1 root root 428 Apr 20 22:52 cpp_tflite_basic-tests.sh
152+
4 -rwxrwxr-x 1 root root 569 Apr 20 22:52 cpp_tflite-tests-prod.sh
153+
4 -rwxrwxr-x 1 root root 541 Apr 20 22:52 cpp_tflite-tests.sh
154+
4 -rwxrwxr-x 1 root root 318 Apr 20 22:52 cppwin-tests.sh
155+
4 -rwxrwxr-x 1 root root 467 Apr 20 22:52 cppwin_tflite-tests.sh
156+
4 -rw-rw-r-- 1 root root 316 Apr 20 22:52 docs-requirements.txt
157+
4 -rwxrwxr-x 1 root root 602 Apr 20 22:52 electronjs-tests-prod.sh
158+
4 -rwxrwxr-x 1 root root 384 Apr 20 22:52 electronjs-tests.sh
159+
4 -rwxrwxr-x 1 root root 631 Apr 20 22:52 electronjs_tflite-tests-prod.sh
160+
4 -rwxrwxr-x 1 root root 523 Apr 20 22:52 electronjs_tflite-tests.sh
161+
4 -rwxrwxr-x 1 root root 538 Apr 20 22:52 host-build.sh
162+
4 -rwxrwxr-x 1 root root 556 Apr 20 22:52 node-tests-prod.sh
163+
4 -rwxrwxr-x 1 root root 343 Apr 20 22:52 node-tests.sh
164+
4 -rwxrwxr-x 1 root root 591 Apr 20 22:52 node_tflite-tests-prod.sh
165+
4 -rwxrwxr-x 1 root root 482 Apr 20 22:52 node_tflite-tests.sh
166+
4 -rwxrwxr-x 1 root root 614 Apr 20 22:52 package.sh
167+
4 -rwxrwxr-x 1 root root 2787 Apr 20 22:52 package-utils.sh
168+
4 -rwxrwxr-x 1 root root 521 Apr 20 22:52 python-tests-prod.sh
169+
4 -rwxrwxr-x 1 root root 274 Apr 20 22:52 python-tests.sh
170+
4 -rwxrwxr-x 1 root root 502 Apr 20 22:52 python_tflite-tests-prod.sh
171+
4 -rwxrwxr-x 1 root root 413 Apr 20 22:52 python_tflite-tests.sh
172+
4 -rwxrwxr-x 1 root root 2688 Apr 20 22:52 tf-build.sh
173+
4 -rwxrwxr-x 1 root root 2384 Apr 20 22:52 tf-package.sh
174+
4 -rwxrwxr-x 1 root root 3371 Apr 20 22:52 tf-setup.sh
175+
12 -rwxrwxr-x 1 root root 8375 Apr 20 22:52 tf-vars.sh
176+
```
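
To illustrate how an _Action_ directory and a script fit together, here is a hedged sketch of a composite _Action_. The directory name `example-tests`, the script `example-tests.sh` and the input `runtime` are all hypothetical; the real definitions in `.github/actions` may be structured differently.

```yaml
# Hypothetical file: .github/actions/example-tests/action.yml
name: "Example tests"
description: "Runs a CI script from the ci_scripts directory"
inputs:
  runtime:
    description: "Hypothetical input naming the runtime under test"
    required: false
    default: "tflite"
runs:
  using: "composite"
  steps:
    # Composite run steps must declare their shell explicitly
    - run: ./ci_scripts/example-tests.sh "${{ inputs.runtime }}"
      shell: bash
```

A _Workflow_ _Step_ could then reference such an _Action_ by its path, for example `uses: ./.github/actions/example-tests`, after the repository has been checked out.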
177+
178+
### Runners
179+
180+
A _Runner_ is a server that listens for available _Jobs_ to run. Each _Runner_ runs one _Job_ at a time and reports the results back to GitHub. Here, we assume you will use GitHub-hosted _Runners_ to perform continuous integration.
181+
182+
The steps below assume that you have forked DeepSpeech to the location:
183+
184+
`https://github.com/{USERNAME}/DeepSpeech/`
185+
186+
1. Navigate to your GitHub repository's Settings -> Actions. This should be located at `https://github.com/{USERNAME}/DeepSpeech/settings/actions`. For security, it's a good idea to change the default Action Permissions from "Allow all actions" to "Allow local actions only".
187+
188+
2. Next, navigate to your GitHub repository's Actions. This should be located at `https://github.com/{USERNAME}/DeepSpeech/actions`. You should see three _Workflows_ already set up. GitHub recognises these from the files stored in the `.github/workflows` directory. Remember from the [Events](#events) section earlier in this tutorial that the _Workflows_ are set up to trigger on `pull_request` events and on `push` events that occur on the `master` branch of the code.
189+
190+
3. To test that they are working, we need to make a change to the `DeepSpeech` repository that triggers one of those events. Here, we will create a `push` event because it's easier to demonstrate. Navigate to the repository's Code listing. This should be at `https://github.com/{USERNAME}/DeepSpeech`. Add a file called `test-workflow.txt` containing dummy text and commit the change to the `master` branch. This should trigger a `push` event, and the _Workflows_ will activate. You can monitor the _Workflows_ executing from the repository's Actions page. This is at `https://github.com/{USERNAME}/DeepSpeech/actions`.
191+
192+
Be aware that the first time your GitHub Actions run, the _Workflow_ will take around four hours to execute, because the TensorFlow build cache does not exist yet and has to be created. Subsequent runs will be much shorter; typically we have seen them take just over an hour to execute.
193+
194+
### TensorFlow builds, cache limitations and how they have been worked around
195+
196+
As part of understanding DeepSpeech's continuous integration pipeline, it is useful to know about the limitations of the GitHub platform, and how they have been worked around.
197+
198+
* Building TensorFlow for DeepSpeech takes approximately 3 hours using GitHub Actions. However, TensorFlow _itself_ does not change with each `pull request`, unless the TensorFlow dependency is being changed (such as a version upgrade, which is infrequent).
199+
200+
* Additionally, the GitHub cache is limited to 5GB for the `DeepSpeech` repository. This cache size is smaller than the build cache required for TensorFlow. That is, building just TensorFlow would consume _all_ the cache for the entire repository.
201+
202+
* To work around this limitation, the TensorFlow library build is split into two parts. This allows GitHub Actions to run more quickly when a `push` or `pull_request` event occurs. The two parts are:
203+
204+
- a TensorFlow prebuild cache
205+
- the actual code of the TensorFlow library
206+
207+
That is, the cache is _prebuilt_ as part of the build pipeline, so that it does not exceed the capacity of GitHub Actions.
208+
209+
* The _Actions_ used to facilitate this are `get_cache_key` and `check_artifact_exists`. They are in the `.github/actions` directory of the `DeepSpeech` repository. `get_cache_key` allows a cache to be accessed. `check_artifact_exists` determines whether an artifact has already been built, reporting it as either `missing` or `found`. The _Workflow_ can then use this result to decide whether the artifact needs to be built.
210+
211+
For more information on caching _Workflow_ dependencies in GitHub Actions, please see [the GitHub documentation](https://docs.github.com/en/actions/guides/caching-dependencies-to-speed-up-workflows).
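
The general pattern of "look up a cache key, then build only if nothing was found" can be sketched with the official `actions/cache` _Action_. This is a swapped-in illustration rather than DeepSpeech's own `get_cache_key` and `check_artifact_exists` _Actions_; the job name, cache key, `tensorflow-build/` path and `build-tensorflow.sh` script are all assumptions.

```yaml
jobs:
  tensorflow:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Try to restore a previously saved build directory under a fixed key
      - id: tf_cache
        uses: actions/cache@v4
        with:
          path: tensorflow-build/
          key: tensorflow-${{ runner.os }}-v1
      # Run the expensive build only when the cache lookup missed
      - if: steps.tf_cache.outputs.cache-hit != 'true'
        run: ./build-tensorflow.sh
```

When the key is found, the build _Step_ is skipped and the restored directory is used as-is; changing the key (here, the `v1` suffix) forces a rebuild.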
212+
213+
[Home](README.md) | [Previous - Examples of using DeepSpeech](EXAMPLES.md)

EXAMPLES.md

Lines changed: 2 additions & 2 deletions
@@ -1,4 +1,4 @@
1-
[Home](README.md) | [Previous - Deploying your model](DEPLOYMENT.md)
1+
[Home](README.md) | [Previous - Deploying your model](DEPLOYMENT.md) | [Next - Setting up Continuous Integration](CONTINUOUS_INTEGRATION.md)
22

33
# Example applications of DeepSpeech
44

@@ -44,4 +44,4 @@ One example of a voice-controlled application using DeepSpeech is the [voice add
4444

4545
---
4646

47-
[Home](README.md) | [Previous - Deploying your model](DEPLOYMENT.md)
47+
[Home](README.md) | [Previous - Deploying your model](DEPLOYMENT.md) | [Next - Setting up Continuous Integration](CONTINUOUS_INTEGRATION.md)

README.md

Lines changed: 6 additions & 2 deletions
@@ -28,7 +28,7 @@ If you are training a model that uses a different alphabet to English, for examp
2828

2929
## [Building your own scorer](SCORER.md)
3030

31-
Learn what the scorer does, and how you can go about building your own.
31+
Learn what the scorer does, and how you can go about building your own.
3232

3333
## [Acoustic model and language model](AM_vs_LM.md)
3434

@@ -54,6 +54,10 @@ Once trained and tested, your model is deployed. This section provides an overvi
5454

5555
This section covers specific use cases where DeepSpeech can be applied to real world problems, such as transcription, keyword searching and voice controlled applications.
5656

57+
## [Setting up Continuous Integration](CONTINUOUS_INTEGRATION.md)
58+
59+
Learn how to set up Continuous Integration (CI) for your own fork of DeepSpeech. Intended for developers who are utilising DeepSpeech for their own specific use cases.
60+
5761
---
5862

5963
# Introductory courses on machine learning
@@ -66,7 +70,7 @@ Here, we've linked to several resources that you may find helpful; they're liste
6670

6771
* [Google's machine learning crash course](https://developers.google.com/machine-learning/crash-course/ml-intro) provides a gentle introduction to the main concepts of machine learning, including _gradient descent_, _learning rate_, _training, test and validation sets_ and _overfitting_.
6872

69-
* If machine learning is something that sparks your interest, then you may enjoy [the MIT Open Learning Library's Introduction to Machine Learning course](https://openlearninglibrary.mit.edu/courses/course-v1:MITx+6.036+1T2019/course/), a 13-week college-level course covering perceptrons, neural networks, support vector machines and convolutional neural networks.
73+
* If machine learning is something that sparks your interest, then you may enjoy [the MIT Open Learning Library's Introduction to Machine Learning course](https://openlearninglibrary.mit.edu/courses/course-v1:MITx+6.036+1T2019/course/), a 13-week college-level course covering perceptrons, neural networks, support vector machines and convolutional neural networks.
7074

7175
---
7276
