Discussion:
[Numpy-discussion] Move scipy.org docs to Github?
Matthew Brett
2017-03-15 02:21:12 UTC
Permalink
Hi,

The scipy.org site is down at the moment, and has been for more than 36 hours:

https://github.com/numpy/numpy/issues/8779#issue-213781439

This has happened before:

https://github.com/scipy/scipy.org/issues/187#issue-186426408

I think it was down for about 24 hours that time.

From the number of people opening issues or commenting on the
scipy.org website this time, it seems to be causing quite a bit of
disruption.

It seems to me that we would have a much better chance of avoiding
significant downtime if we switched to hosting the docs on GitHub
Pages.

What do y'all think?

Cheers,

Matthew
Ralf Gommers
2017-03-15 09:47:36 UTC
Permalink
Once the site is back up we should look at migrating to a better (hosted)
infrastructure. I suspect that GitHub Pages won't work: we'll exceed or be
close to exceeding both the 1 GB site size limit and the 100 GB/month
bandwidth limit [1].

Rough bandwidth estimate (using page size from
http://scipy.github.io/devdocs/ and Alexa stats): 2 million visits per
month, 2.5 page views per visit, 5 kB/page = 25 GB/month (HTML). Add to
that the PDF docs, which are ~20 MB in size: if only a small fraction of
visitors download those, we'll be at >100 GB.

Ralf

[1] https://help.github.com/articles/what-is-github-pages/#usage-limits
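
Ralf's rough estimate above can be reproduced with a few lines of arithmetic. This is only a sketch: the visit count, page views, page size, PDF size, and the 100 GB limit are the figures quoted in his message, decimal units are assumed, and the PDF download fraction is solved for rather than measured.

```python
# Figures quoted in the estimate above.
VISITS_PER_MONTH = 2_000_000
PAGE_VIEWS_PER_VISIT = 2.5
PAGE_SIZE_KB = 5
PDF_SIZE_MB = 20
BANDWIDTH_LIMIT_GB = 100

# Assumption: decimal units (1 GB = 1000 MB = 1,000,000 kB).
KB_PER_GB = 1_000_000
MB_PER_GB = 1_000


def html_traffic_gb(visits, views_per_visit, page_kb):
    """Monthly HTML traffic in GB for the given visit statistics."""
    return visits * views_per_visit * page_kb / KB_PER_GB


def pdf_fraction_to_hit_limit(visits, html_gb, pdf_mb, limit_gb):
    """Fraction of visits that must download one PDF to reach the limit."""
    remaining_gb = limit_gb - html_gb
    downloads_needed = remaining_gb * MB_PER_GB / pdf_mb
    return downloads_needed / visits


html_gb = html_traffic_gb(VISITS_PER_MONTH, PAGE_VIEWS_PER_VISIT, PAGE_SIZE_KB)
fraction = pdf_fraction_to_hit_limit(
    VISITS_PER_MONTH, html_gb, PDF_SIZE_MB, BANDWIDTH_LIMIT_GB
)

print(f"HTML traffic: {html_gb:.0f} GB/month")
print(f"PDF download fraction that reaches the limit: {fraction:.2%}")
```

With these inputs the HTML traffic is 25 GB/month, and PDF downloads on roughly 0.19% of visits would use up the remaining 75 GB.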
Ilhan Polat
2017-03-15 10:55:47 UTC
Permalink
In the meantime maybe it's a good idea to keep one of the issues open so
that people can see that this is an open issue? As we close them they
disappear from the issues tab on GitHub.
_______________________________________________
NumPy-Discussion mailing list
https://mail.scipy.org/mailman/listinfo/numpy-discussion
Ilhan Polat
2017-03-15 10:59:53 UTC
Permalink
By the way, this is not bad at all in the absence of the actual
documentation: http://devdocs.io/numpy~1.12/
Bryan Van de ven
2017-03-15 13:10:30 UTC
Permalink
Doesn't help all the CI builds that are failing because they utilize intersphinx to link to the official docs, unfortunately.

Bryan
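
For readers unfamiliar with the intersphinx dependency Bryan mentions, a `conf.py` fragment along these lines is what ties a project's doc build to the hosted docs. This is a sketch under assumptions: the base URLs are assumed to be the docs.scipy.org locations in use at the time, and the local inventory paths are hypothetical files a project would have to check in itself. Sphinx accepts a tuple of inventory locations and tries each in turn, so a local copy should let the build proceed when the remote `objects.inv` cannot be fetched.

```python
# conf.py fragment for Sphinx (not a complete configuration).
extensions = ["sphinx.ext.intersphinx"]

# Each entry maps a name to (base URL, inventory locations).
# None means "fetch objects.inv from the base URL"; the second item is a
# hypothetical local copy used as a fallback if that fetch fails.
intersphinx_mapping = {
    "numpy": (
        "https://docs.scipy.org/doc/numpy/",  # assumed base URL
        (None, "inventories/numpy-objects.inv"),  # hypothetical local file
    ),
    "scipy": (
        "https://docs.scipy.org/doc/scipy/reference/",  # assumed base URL
        (None, "inventories/scipy-objects.inv"),  # hypothetical local file
    ),
}
```

A local inventory only keeps the build from failing; the generated cross-reference links still point at the base URL, so they stay broken for readers until the site is back.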
Nathaniel Smith
2017-03-15 12:24:41 UTC
Permalink
Post by Ralf Gommers
Rough bandwidth estimate (using page size from
http://scipy.github.io/devdocs/ and Alexa stats): 2 million visits per
month, 2.5 page views per visit, 5 kb/page = 25 GB/month (html). Add to
that pdf docs, which are ~20 MB in size: if only a small fraction of
visitors download those, we'll be at >100 GB.
No matter where we go, we can likely reduce the endpoint bandwidth
requirements substantially by putting something like cloudflare's free tier
in front. That doesn't help for the actual disk size though of course...

-n
Bryan Van de ven
2017-03-15 13:16:52 UTC
Permalink
NumPy is a NumFOCUS fiscally sponsored project; perhaps they can help with the costs of different/better hosting.

Bryan
Benjamin Root
2017-03-15 15:07:02 UTC
Permalink
How is it that scipy.org's bandwidth usage is that much greater than
matplotlib.org's? We are quite image-heavy, but we haven't hit any
bandwidth limits that I am aware of.

Ben Root
Nathaniel Smith
2017-03-15 21:56:52 UTC
Permalink
Post by Bryan Van de ven
NumPy is a NumFocus fiscally sponsored project, perhaps they can help with the costs of different/better hosting.
Enthought already provides hosting and operations support (thanks!) –
the problem is that it doesn't make sense to have a full-time ops
person just for numpy, but if we're taking a tiny slice of someone's
time then occasional outages are going to happen.

The advantage of something like github pages is that it's big enough
that it *does* have dedicated ops support.

As long as we can fit under the 1 gig size limit then GH pages seems
like the best option so far... it's reliable, widely understood, and
all of the limits besides the 1 gig size are soft limits where they
say they'll work with us to figure things out.

-n
--
Nathaniel J. Smith -- https://vorpus.org
Pauli Virtanen
2017-03-15 23:20:03 UTC
Permalink
Wed, 15 Mar 2017 14:56:52 -0700, Nathaniel Smith wrote:
[clip]
Post by Nathaniel Smith
As long as we can fit under the 1 gig size limit then GH pages seems
like the best option so far... it's reliable, widely understood, and all
of the limits besides the 1 gig size are soft limits where they say
they'll work with us to figure things out.
The Scipy html docs weigh 60M apiece and numpy is 35M, so it can be done
if a limited number of releases are kept, and the rest and the auxiliary
files are put as release downloads.

Otherwise there's probably no problem, as you can stick a CDN in front if
the traffic is too heavy.

Sounds sensible? Certainly it's the lowest-effort approach, and would
simplify management of S3/etc origin site access permissions.

Pauli
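
The size argument above can be checked with a short calculation. This is a sketch only: the 60 MB and 35 MB per-release figures and the 1 GB limit come from the thread, while treating 1 GB as 1000 MB and putting both projects' docs on one shared site are assumptions made here.

```python
# Figures from the thread; 1 GB is taken as 1000 MB (assumption).
SITE_LIMIT_MB = 1000
SCIPY_DOCS_MB = 60
NUMPY_DOCS_MB = 35


def releases_that_fit(limit_mb, per_release_mb):
    """Number of whole releases whose HTML docs fit under the size limit."""
    return limit_mb // per_release_mb


# Assumption: SciPy and NumPy docs share one site and keep the same
# number of releases each.
paired = releases_that_fit(SITE_LIMIT_MB, SCIPY_DOCS_MB + NUMPY_DOCS_MB)
scipy_only = releases_that_fit(SITE_LIMIT_MB, SCIPY_DOCS_MB)
numpy_only = releases_that_fit(SITE_LIMIT_MB, NUMPY_DOCS_MB)

print(f"SciPy + NumPy release pairs under the limit: {paired}")
print(f"SciPy-only releases under the limit: {scipy_only}")
print(f"NumPy-only releases under the limit: {numpy_only}")
```

Under these assumptions about ten paired releases fit, which is consistent with keeping only a limited number of releases online and moving the rest to release downloads.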
Matthew Brett
2017-03-16 04:14:24 UTC
Permalink
Hi,
Post by Pauli Virtanen
Sounds sensible? Certainly it's the lowest-effort approach, and would
simplify management of S3/etc origin site access permissions.
Sounds very sensible to me,

Cheers,

Matthew
Didrik Pinte
2017-03-16 07:15:08 UTC
Permalink
Post by Nathaniel Smith
Post by Bryan Van de ven
NumPy is a NumFocus fiscally sponsored project, perhaps they can help
with the costs of different/better hosting.
Enthought already provides hosting and operations support (thanks!) –
the problem is that it doesn't make sense to have a full-time ops
person just for numpy, but if we're taking a tiny slice of someone's
time then occasional outages are going to happen.
The key issue in this specific case is that the entire system was totally
undocumented. If it had been documented, the outage would have lasted much
less than an hour!

Post by Nathaniel Smith
The advantage of something like github pages is that it's big enough
that it *does* have dedicated ops support.
Agreed. One issue is that we are working with a lot of legacy. Github will
more than likely be a great solution to host static web pages but the
evaluation for the shift needs to get into all the funky legacy
redirects/rewrites we have in place, etc. This is probably not a real issue
for docs.scipy.org but would be for other services.
Post by Nathaniel Smith
As long as we can fit under the 1 gig size limit then GH pages seems
like the best option so far... it's reliable, widely understood, and
all of the limits besides the 1 gig size are soft limits where they
say they'll work with us to figure things out.
Another option would be to just host the content on S3 with CloudFront.
It would also be pretty simple to set up, would scale nicely, and wouldn't
have many restrictions on size.

-- Didrik
Pauli Virtanen
2017-03-16 21:08:31 UTC
Permalink
Post by Didrik Pinte
Post by Nathaniel Smith
The advantage of something like github pages is that it's big enough
that it *does* have dedicated ops support.
Agreed. One issue is that we are working with a lot of legacy. Github
will more than likely be a great solution to host static web pages but
the evaluation for the shift needs to get into all the funky legacy
redirects/rewrites we have in place, etc. This is probably not a real
issue for docs.scipy.org but would be for other services.
IIRC, there aren't that many of them, so in principle it should be possible
to cobble them together with <meta> redirects.
Post by Didrik Pinte
Post by Nathaniel Smith
As long as we can fit under the 1 gig size limit then GH pages seems
like the best option so far... it's reliable, widely understood, and
all of the limits besides the 1 gig size are soft limits where they say
they'll work with us to figure things out.
Another option would be to just host the content under S3 with
Cloudfront.
It will also be pretty simple as a setup, scale nicely and won't have
much restrictions on sizing.
Some minor-ish disadvantages of this are that it brings a new set of
credentials to manage, it will be somewhat less transparent, and the
tooling will be less familiar to people (eg release managers) who have to
deal with it.
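
The <meta> redirect idea mentioned above could look roughly like the following on a static host with no server-side rewrite rules. This is a minimal sketch, not the site's actual setup: the function names, the old path, and the target URL are all hypothetical and only illustrate generating one stub page per legacy URL.

```python
import tempfile
from html import escape
from pathlib import Path


def meta_redirect_page(target_url: str) -> str:
    """Return a small HTML page that redirects the browser to target_url."""
    safe = escape(target_url, quote=True)
    return (
        "<!DOCTYPE html>\n"
        "<html>\n<head>\n"
        '<meta charset="utf-8">\n'
        f'<meta http-equiv="refresh" content="0; url={safe}">\n'
        f'<link rel="canonical" href="{safe}">\n'
        "</head>\n<body>\n"
        f'<p>This page has moved to <a href="{safe}">{safe}</a>.</p>\n'
        "</body>\n</html>\n"
    )


def write_redirects(site_root: Path, redirects: dict[str, str]) -> None:
    """Write one redirect stub per legacy path under site_root."""
    for old_path, new_url in redirects.items():
        page = site_root / old_path
        page.parent.mkdir(parents=True, exist_ok=True)
        page.write_text(meta_redirect_page(new_url), encoding="utf-8")


# Demonstration in a temporary directory with a hypothetical mapping.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    write_redirects(root, {"old/page.html": "https://example.org/new/page.html"})
    written = (root / "old" / "page.html").read_text(encoding="utf-8")

print(written)
```

Unlike a server-side redirect, a stub like this only works for legacy paths that can exist as files in the site tree, so pattern-based rewrites would each need their own page.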
Robert T. McGibbon
2017-03-16 22:18:47 UTC
Permalink
I have always put my docs on Amazon S3 (examples: http://mdtraj.org/1.8.0/,
http://msmbuilder.org/3.7.0/). For static webpages, you can't beat the
cost, and there's a lot of tooling in the wild for uploading pages to S3.

It might be an option to consider.

-Robert
Didrik Pinte
2017-03-17 10:49:57 UTC
Permalink
Quick update:

- the current static content for docs.scipy.org is about 2.7 GB. Some cleanup
can happen, but it probably won't go below 1 GB.
- www.scipy.org is really small.

-- Didrik
--
Didrik Pinte +32 475 665 668
CTO +44 1223 969515
Enthought Inc. ***@enthought.com
Scientific Computing Solutions http://www.enthought.com

Matthew Brett
2017-03-15 15:33:54 UTC
Permalink
Hi,
Post by Ralf Gommers
Rough bandwidth estimate (using page size from
http://scipy.github.io/devdocs/ and Alexa stats): 2 million visits per
month, 2.5 page views per visit, 5 kb/page = 25 GB/month (html). Add to that
pdf docs, which are ~20 MB in size: if only a small fraction of visitors
download those, we'll be at >100 GB.
Maybe we could host the PDF docs somewhere else? I wonder if Github
would consider allowing us to go a bit over if necessary?

Cheers,

Matthew
Daπid
2017-03-15 15:36:56 UTC
Permalink
What about readthedocs? I haven't seen any explicit limit in traffic.
Marten van Kerkwijk
2017-03-15 16:11:09 UTC
Permalink
Astropy uses readthedocs quite happily (auto-updates on merges to master too).
-- Marten
Pauli Virtanen
2017-03-15 19:28:18 UTC
Permalink
Post by Marten van Kerkwijk
Astropy uses readthedocs quite happily (auto-updates on merges to master too).
AFAIK, scipy cannot be built on readthedocs.
Nathaniel Smith
2017-03-15 20:02:57 UTC
Permalink
Post by Pauli Virtanen
Post by Marten van Kerkwijk
Astropy uses readthedocs quite happily (auto-updates on merges to master too).
AFAIK, scipy cannot be built on readthedocs.
Another issue is that switching to rtd would (I think?) force us into their
URL structure and break all incoming links, which would be really bad.

-n
Todd
2017-03-15 22:07:20 UTC
Permalink
Post by Ralf Gommers
Rough bandwidth estimate (using page size from
http://scipy.github.io/devdocs/ and Alexa stats): 2 million visits per
month, 2.5 page views per visit, 5 kb/page = 25 GB/month (html). Add to
that pdf docs, which are ~20 MB in size: if only a small fraction of
visitors download those, we'll be at >100 GB.
Would it be possible to include the PDF and zipped HTML as sources in
GitHub releases and (if desired) link to those on the documentation page?
That way only the individual HTML pages would need to be stored in GitHub
pages, reducing both the size and bandwidth.
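
Producing the zipped HTML for such a release download needs only the standard library. The sketch below assumes the built HTML sits in a single directory; the function name, directory layout, and archive name are hypothetical, and attaching the resulting file to a GitHub release is left out.

```python
import shutil
import tempfile
from pathlib import Path


def zip_html_docs(html_dir: Path, out_dir: Path, name: str) -> Path:
    """Zip the contents of html_dir into out_dir/name.zip and return its path."""
    out_dir.mkdir(parents=True, exist_ok=True)
    archive = shutil.make_archive(str(out_dir / name), "zip", root_dir=str(html_dir))
    return Path(archive)


# Demonstration with a stand-in for a built docs tree (hypothetical layout).
with tempfile.TemporaryDirectory() as tmp:
    html_dir = Path(tmp) / "html"
    html_dir.mkdir()
    (html_dir / "index.html").write_text("<h1>docs</h1>", encoding="utf-8")

    archive = zip_html_docs(html_dir, Path(tmp) / "dist", "example-html-docs")
    archive_name = archive.name
    archive_exists = archive.exists()

print(archive_name, archive_exists)
```

The archive is written to a separate output directory so that it does not end up inside the tree being zipped.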