Add docs publishing spec
Change-Id: I39b4927226a126438e410a89d805a38ce68d4f22
This commit is contained in:
parent
bec7680a04
commit
9c34887b7c
290
specs/doc-publishing.rst
Normal file
290
specs/doc-publishing.rst
Normal file
@ -0,0 +1,290 @@
|
||||
::
|
||||
|
||||
Copyright 2014 Hewlett-Packard Development Company, L.P.
|
||||
|
||||
This work is licensed under a Creative Commons Attribution 3.0
|
||||
Unported License.
|
||||
http://creativecommons.org/licenses/by/3.0/legalcode
|
||||
|
||||
..
|
||||
This work is licensed under a Creative Commons Attribution 3.0
|
||||
Unported License.
|
||||
http://creativecommons.org/licenses/by/3.0/legalcode
|
||||
|
||||
=========================
|
||||
Docs Publishing via Swift
|
||||
=========================
|
||||
|
||||
Story: https://storyboard.openstack.org/#!/story/168
|
||||
|
||||
We need to update the docs publishing pipeline to add features the doc
|
||||
team needs as well as provide a process that does not rely on Jenkins
|
||||
since we intend to retire it.
|
||||
|
||||
Problem Description
|
||||
===================
|
||||
|
||||
Juno summit session: https://etherpad.openstack.org/p/summit-b301-ci-doc-automation
|
||||
|
||||
The current doc publishing process uses Jenkins to FTP the results of
|
||||
doc builds to a Rackspace cloud site. Multiple versions of docs (eg,
|
||||
for releases or branches) are handled by publishing to a subdirectory
|
||||
of the destination. The publishing job for the nova developer
|
||||
documentation is configured to publish everything that appears in
|
||||
`doc/build/html` to `developer/nova`, so that content generated from
|
||||
the master branch appears at
|
||||
`http://docs.openstack.org/developer/nova/`. However, the doc build
|
||||
script detects if it is being run on a change that has merged to a
|
||||
stable branch and moves the output into a subdirectory before
|
||||
uploading. Therefore, havana documentation ends up in
|
||||
`doc/build/html/havana`, and the entire `doc/build/html` tree is still
|
||||
uploaded to `developer/nova` but since no content is present above the
|
||||
`havana` subdirectory, the documentation from the master branch is not
|
||||
touched, while the new havana docs appear at
|
||||
`http://docs.openstack.org/developer/nova/havana/`. Similar
|
||||
approaches may be used for other kinds of documentation builds.
|
||||
|
||||
Put another way, the docs jobs do not have a holistic view of the site
|
||||
to which they are publishing, but instead operate only within the
|
||||
context of one subdirectory and are expected not to interfere with
|
||||
jobs publishing to other locations in the same site.
|
||||
|
||||
The major drawback for the docs team is that this mechanism does not
|
||||
support automatically deleting a file when it has been removed from
|
||||
the documentation. Even if it is not linked internally, it may still
|
||||
remain in search engine indexes and users may find it.
|
||||
|
||||
The infrastructure team wants to remove Jenkins, and both the current
|
||||
FTP and SCP based publishers depend on Jenkins unique security
|
||||
arrangements to prevent exposure of the credentials used. Our
|
||||
intended replacement has no such feature and we expect workers to be
|
||||
completely untrusted.
|
||||
|
||||
The current system is also not atomic or concurrency safe, as multiple
|
||||
publishing jobs may be running at the same time, and using SCP or FTP
|
||||
to upload simultaneously.
|
||||
|
||||
This process is also very similar to log and artifact publishing, and
|
||||
keeping alignment with those process is desirable.
|
||||
|
||||
Proposed Change
|
||||
===============
|
||||
|
||||
The new system will use a dedicated virtual server running Apache and
|
||||
using LVM cinder volumes to serve content. The existing build+publish
|
||||
Jenkins job will be replaced with two jobs: the build job (which will
|
||||
only build the documents and upload them to an intermediate storage
|
||||
location) and the publish job (which will run locally on the docs
|
||||
server and retrieve the contents and publish them to their final
|
||||
locations).
|
||||
|
||||
As part of the effort to migrate away from Jenkins, we have developed
|
||||
a script that uploads logs from test runs to an OpenStack Swift
|
||||
container. Each job is granted limited permission to upload logs with
|
||||
a specific prefix to their ids (like a subdirectory) within the
|
||||
container. We will use the same script to upload the results of the
|
||||
build job to Swift. For jobs that run in the check queue that
|
||||
currently publish to docs-draft, this will be the end of the process
|
||||
(Zuul can link directly to the swift URL). [Alternative: use
|
||||
os-loganalize to proxy from swift at a more friendly url.]
|
||||
|
||||
For final publishing, further action is needed. The general approach
|
||||
will be to have a dedicated Zuul worker on the publishing server that
|
||||
will download files from swift and rsync them to the correct location.
|
||||
|
||||
The publish job needs to know what files to download from swift (as it
|
||||
can not rely on index pages). It should also do this without using
|
||||
any Swift credentials (mostly for simplicity; there is not much of a
|
||||
security concern with the publish worker accessing the Swift API if it
|
||||
needs to). In order to retrieve the full set of files from swift, the
|
||||
build job should create a manifest file called ``.manifest.txt`` at
|
||||
the root of the output directory with a recursive directory listing of
|
||||
all files to be published. The publish job can then retrieve that
|
||||
file (by constructing the URL from the ZUUL_* environment variables)
|
||||
and subsequently all of the contents that it lists. [Alternative: have
|
||||
the build job upload a tarball, but that means we can't use exactly
|
||||
the same process for draft and publishing.]
|
||||
|
||||
The publish job also needs to determine the target directory to rsync.
|
||||
To do this, it will take the contents of the ZUUL_PROJECT environment
|
||||
variable as a base, and to this it will append the contents of the
|
||||
``.target.txt`` file that it will expect to find in the contents it
|
||||
downloaded from swift. The file should be empty for rsyncing to the
|
||||
root ("openstack/nova"), or contain, e.g., the string "havana" to
|
||||
rsync to "nova/havana". This lets the build jobs specify the final
|
||||
publishing location but only within the context of the project (eg,
|
||||
nova can't accidentally overwrite keystone's documentation).
|
||||
|
||||
Before the publish job rsyncs the downloaded data into its final
|
||||
location, it must first create a list of directories that should not
|
||||
be deleted. This way if an entire directory is removed from a
|
||||
document, it will still be removed from the website, but directories
|
||||
which are themselves roots of other documents (e.g., the havana
|
||||
branch) are not removed. A marker file at the root of each such
|
||||
directory will accomplish this, and in fact, simply leaving either the
|
||||
.target.txt or .manifest.txt files in place after copying to the
|
||||
destination will suffice. The publishing job should find each of
|
||||
those in the destination hierarchy and add their containing
|
||||
directories to a list of directories to exclude from rsyncing. Then
|
||||
an rsync command of the form::
|
||||
|
||||
rsync -a --delete-after --exclude-from=/path/to/exclude-file /src/ /dest/
|
||||
|
||||
should safely update and delete only the relevant data.
|
||||
|
||||
The openstack-manuals repo builds multiple manuals, in separate
|
||||
subdirectories, from a single repository. Using an overlay method
|
||||
similar to what is used for the developer documentation, the current
|
||||
build/publishing job updates only the changed documents and places
|
||||
them in appropriate target directories. The approach described above
|
||||
assumes the build job will output the full contents of a single
|
||||
"module" each time; having missing directories in that output risks
|
||||
the publisher removing them in the rsync step. One of the following
|
||||
approaches will need to be chosen to deal with this:
|
||||
|
||||
1) Reconfigure the docs build job to build all of the manuals for
|
||||
publication (the optimization to only build changed manuals could
|
||||
be retained for efficiency in gating). This wastes some CPU time
|
||||
on build hosts but perhaps not that much (and should be
|
||||
quantified).
|
||||
|
||||
2) Alter the above approach to handle multiple rsync source and
|
||||
destination path outputs for a single job. This makes the
|
||||
build/publishing process more complex.
|
||||
|
||||
3) Allow the build job to provide an initial exclusion list for the
|
||||
rsync command (so that it can add directories that it knows are
|
||||
under its control but are not being updated in this run).
|
||||
|
||||
After the rsync is complete, the documents will be in a location that
|
||||
does not necessarily map to the desired URL. The apache process on
|
||||
the docs server can be configured to rewrite URLs as necessary. For
|
||||
instance::
|
||||
|
||||
docs.openstack.org/developer/nova/ -> /srv/docs/openstack/nova/
|
||||
docs.openstack.org/icehouse/install-guide -> /srv/docs/openstack/openstack-manuals/icehouse/install-guide
|
||||
|
||||
Could be achieved with rewrite rules like::
|
||||
|
||||
/developer/(.*) -> /srv/docs/openstack/$1
|
||||
anything else: -> /srv/docs/openstack/openstack-manuals/
|
||||
|
||||
Finally, in the rare cases of major restructuring, or the need to
|
||||
delete an entire "module" from the site, a member of infra-root can
|
||||
log in and manually remove anything needed.
|
||||
|
||||
The developer.openstack.org and specs.openstack.org sites are
|
||||
published using the same mechanisms as docs.openstack.org currently.
|
||||
Under the new system, we can create apache virtual hosts for these
|
||||
sites that connect the appropriate URLs with their publishing
|
||||
locations on disk.
|
||||
|
||||
Alternatives
|
||||
------------
|
||||
|
||||
The above implementation has several minor alternative changes noted
|
||||
within. In addition to this approach, we also considered the
|
||||
following:
|
||||
|
||||
ReadTheDocs
|
||||
~~~~~~~~~~~
|
||||
|
||||
While readthedocs does handle docs publishing, including being
|
||||
version-aware, it is specific to python-based sphinx documentation and
|
||||
would not be useful for openstack-manuals (or other artifacts). It is
|
||||
also considered quite complex to set up.
|
||||
|
||||
AFS
|
||||
~~~
|
||||
|
||||
The Andrew File System is a global distributed filesystem that would
|
||||
work quite well in this instance. Workers could be granted limited
|
||||
ACLs to publish to specific locations, so we could use the current
|
||||
combined build+publish job approach, but the worker could rsync
|
||||
directly to the final publishing location in AFS, and volume
|
||||
replication could be used to make atomic updates to the entire site.
|
||||
A static web server would then serve files out of AFS; more web
|
||||
servers can be added as needed to scale.
|
||||
|
||||
This approach requires some investment in creating and maintaining an
|
||||
AFS cell for OpenStack, as well as some enhancement work to Nodepool
|
||||
and Zuul to deal with Kerberos credentials. This is all work that we
|
||||
would like to do for other reasons (including mirrors), but is more
|
||||
substantial than what would be needed for the selected approach.
|
||||
Moreover, it should not be difficult to move from the selected
|
||||
approach to use AFS later should a cell materialize.
|
||||
|
||||
Implementation
|
||||
==============
|
||||
|
||||
Assignee(s)
|
||||
-----------
|
||||
|
||||
Primary assignee:
|
||||
TBD
|
||||
|
||||
Work Items
|
||||
----------
|
||||
|
||||
* Create new publish.openstack.org server (this will be a server name
|
||||
that is not publicized, instead we will use apache virtual hosts for
|
||||
the public hostnames which will be CNAME DNS entries).
|
||||
* Create apache vhosts for docs.openstack.org,
|
||||
developer.openstack.org, and specs.openstack.org on
|
||||
publish.openstack.org
|
||||
* Create new doc build job that publishes to swift; start running this
|
||||
in addition to current publishing jobs on at least one project and
|
||||
openstack-manuals
|
||||
* Enhance the doc publishing jobs to create .target.txt files
|
||||
* Enhance the swift-upload tool to create .manifest.txt files
|
||||
* Write and install the Zuul worker that will run on the docs server
|
||||
* After testing, add the new jobs to all projects
|
||||
* Copy data from the FTP site
|
||||
* Change DNS to point to the new server
|
||||
* Remove old build jobs
|
||||
* Remove Rackspace cloud sites instances
|
||||
|
||||
Repositories
|
||||
------------
|
||||
|
||||
N/A.
|
||||
|
||||
Servers
|
||||
-------
|
||||
|
||||
publish.openstack.org will be a new server with LVM managed cinder
|
||||
volumes. Perhaps using SSD.
|
||||
|
||||
DNS Entries
|
||||
-----------
|
||||
|
||||
publish.openstack.org will need to point to the new server.
|
||||
docs.openstack.org, developer.openstack.org, and specs.openstack.org
|
||||
will need their TTLs lowered in advance of the moves. On moving, they
|
||||
will become CNAME entries for publish.openstack.org.
|
||||
|
||||
Documentation
|
||||
-------------
|
||||
|
||||
Infra documentation will need to be written for the new server and
|
||||
this process.
|
||||
|
||||
Security
|
||||
--------
|
||||
|
||||
The build jobs will have no special access and will only be able to
|
||||
put content in swift. The publishing job will run locally on the docs
|
||||
server, but will run no user-supplied code, and will constrain the
|
||||
publishing of content to a project-specific area.
|
||||
|
||||
Testing
|
||||
-------
|
||||
|
||||
This can operate in parallel with the current system without
|
||||
disruption.
|
||||
|
||||
Dependencies
|
||||
============
|
||||
|
||||
We should finalize the log publishing system first (this is nearly
|
||||
done at the time of writing).
|
Loading…
Reference in New Issue
Block a user