It runs the tests and parses the nosetests output to detect failures, then
checks them for regressions against a reference version. If it finds a
regression, the regression is automatically bisected.
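A minimal sketch of what that parsing and comparison can look like. The regex, the id format and the `known_failures.txt` reference file are illustrative assumptions, not the actual implementation:

```python
import re
import subprocess
import sys

# nosetests' summary section prints lines like:
#   FAIL: test_foo (test.test_download.TestDownload)
#   ERROR: test_bar (test.test_download.TestDownload)
FAILURE_RE = re.compile(r'^(?:FAIL|ERROR): (\w+) \(([\w.]+)\)')


def run_and_collect_failures(extra_args=()):
    """Run nosetests and return the ids of the failing tests."""
    proc = subprocess.run(
        ['nosetests', '-v'] + list(extra_args),
        stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    failures = set()
    for line in proc.stdout.splitlines():
        m = FAILURE_RE.match(line)
        if m:
            failures.add('%s.%s' % (m.group(2), m.group(1)))
    return failures


if __name__ == '__main__':
    # known_failures.txt is a hypothetical snapshot of the failures seen
    # with the reference version, one test id per line.
    with open('known_failures.txt') as f:
        reference = {line.strip() for line in f if line.strip()}
    regressions = run_and_collect_failures() - reference
    if regressions:
        print('Regressions:\n' + '\n'.join(sorted(regressions)))
        sys.exit(1)  # the offending test can then be bisected, e.g. with git bisect
```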
Unstable or flaky tests are detected and ignored automatically by
running them multiple times in a row.
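A hedged sketch of that retry logic; the run count and the helper name are assumptions:

```python
import subprocess


def is_flaky(test_id, runs=3):
    """Re-run a single failing test a few times; if it passes at least
    once, treat it as unstable rather than as a real regression."""
    for _ in range(runs):
        ret = subprocess.call(
            ['nosetests', test_id],
            stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        if ret == 0:
            return True   # passed at least once: unstable/flaky, ignore it
    return False          # failed every time: worth investigating


# Hypothetical usage, with a test id in nose's selection syntax:
#   is_flaky('test.test_download:TestDownload.test_Youtube')
```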
We keep the original test suite around, but mark it as allowed to fail.
It serves as a dashboard of current test statuses, but since tests can
fail for reasons outside our control (this is the essence of this
project), we don't want it to be blocking.
Using regression detection as the failure criterion means that any failing
build needs to be examined. Even if the next build is "fixed", it does not
mean that the regression has been fixed. This is a change in semantics
when analyzing the build history.
We map/reduce by splitting the test suite into 7 parts, abusing Travis's
build matrix feature. The reduce step is done by hand, by looking at the
Travis dashboard; any failing test is critical.
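The "map" side only needs to slice the enumerated test list by a matrix index. A sketch, under the assumption that each matrix entry exports a CHUNK environment variable and that the enumeration step described below wrote the test ids to a tests.txt file (both names are hypothetical):

```python
import os
import subprocess
import sys

N_CHUNKS = 7


def chunk(tests, index, n=N_CHUNKS):
    """Round-robin split: every n-th test starting at `index`."""
    return tests[index::n]


if __name__ == '__main__':
    # tests.txt is a hypothetical file produced by the enumeration step.
    with open('tests.txt') as f:
        tests = [line.strip() for line in f if line.strip()]
    index = int(os.environ.get('CHUNK', '0'))  # set per entry in the Travis matrix
    sys.exit(subprocess.call(['nosetests', '-v'] + chunk(tests, index)))
```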
Because nosetests --processes just doesn't work with the youtube-dl test
suite (yet), we route around it by enumerating the tests first. This step
is a bit slow because nosetests needs to find and load all the test files
in order to enumerate them, but it should take at most 30 seconds, whereas
the full test suite can take more than 2 hours on Travis's infrastructure.
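A sketch of that enumeration step, assuming nosetests' verbose collect-only output keeps its usual `test_name (module.Class) ... ok` shape and that classes are printed with their full module path:

```python
import re
import subprocess

# Verbose collect-only output looks like:
#   test_Youtube (test.test_download.TestDownload) ... ok
TEST_RE = re.compile(r'^(\w+) \(([\w.]+)\)')


def enumerate_tests():
    """List all test ids without actually running the tests."""
    proc = subprocess.run(
        ['nosetests', '--collect-only', '-v'],
        stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    tests = []
    for line in proc.stdout.splitlines():
        m = TEST_RE.match(line)
        if m:
            # assumes the class is printed with its full module path
            module, cls = m.group(2).rsplit('.', 1)
            tests.append('%s:%s.%s' % (module, cls, m.group(1)))
    return tests


if __name__ == '__main__':
    with open('tests.txt', 'w') as f:
        f.write('\n'.join(enumerate_tests()) + '\n')
```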
By doing this, we make sure the tests can be run much faster, since they
are mostly I/O (network) bound due to the nature of the project.
Closes #8496