[Numpy-discussion] Making datetime64 timezone naive

Stephan Hoyer

2015-10-12 07:10:26 UTC

As has come up repeatedly over the past few years, nobody seems to be very
happy with the way that NumPy's datetime64 type parses and prints datetimes
in local timezones.

The tentative consensus from last year's discussion was that we should make
datetime64 timezone naive, like the standard library's datetime.datetime:
http://thread.gmane.org/gmane.comp.python.numeric.general/57184

That makes sense to me, and it's exactly what I'd like to see happen for
NumPy 1.11. Here's my PR to make that happen:
https://github.com/numpy/numpy/pull/6453

As a temporary measure, we still will parse datetimes that include a
timezone specification by converting them to UTC, but will issue a
DeprecationWarning. This is important for a smooth transition, because at
the very least I suspect the "Z" modifier for UTC is widely used. Another
option would be to preserve this conversion indefinitely, without any
deprecation warning.
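
A minimal pure-Python sketch of the transitional behaviour described above, for illustration only. It uses the standard library's datetime rather than NumPy, and the helper name parse_naive is hypothetical, not part of any NumPy API. Strings with an offset (including "Z") are converted to UTC and returned as naive values with a DeprecationWarning; strings without an offset are taken as written.

```python
import warnings
from datetime import datetime, timezone


def parse_naive(text):
    """Parse an ISO 8601 string into a timezone-naive datetime.

    Hypothetical helper: strings carrying an offset (or "Z") are converted
    to UTC and stripped of tzinfo, with a DeprecationWarning.
    """
    # datetime.fromisoformat() only accepts a trailing "Z" from Python 3.11,
    # so normalise it to an explicit offset first.
    if text.endswith("Z"):
        text = text[:-1] + "+00:00"
    parsed = datetime.fromisoformat(text)
    if parsed.tzinfo is None:
        return parsed  # already naive: keep the wall-clock fields as written
    warnings.warn(
        "parsing timezone-aware strings is deprecated; converting to UTC",
        DeprecationWarning,
        stacklevel=2,
    )
    return parsed.astimezone(timezone.utc).replace(tzinfo=None)


with warnings.catch_warnings():
    warnings.simplefilter("ignore", DeprecationWarning)
    print(parse_naive("2015-10-12T07:10:26"))        # naive, unchanged
    print(parse_naive("2015-10-12T07:10:26Z"))       # "Z" accepted, same fields
    print(parse_naive("2015-10-12T09:10:26+02:00"))  # shifted to 07:10:26
```

Keeping the conversion indefinitely, the other option mentioned, would amount to dropping the warnings.warn call.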

There's one (slightly) contentious API decision to make: What should we do
with the numpy.datetime_to_string function? As far as I can tell, it was
never documented as part of the NumPy API and has not been used very much
or at all outside of NumPy's own test suite, but it is exposed in the main
numpy namespace. If we can remove it, then we can delete and simplify a lot
more code related to timezone parsing and display. If not, we'll need to do
a bit of work so we can distinguish between the string representations of
timezone naive and UTC.

Best,
Stephan

Nathaniel Smith

2015-10-12 07:38:09 UTC

Post by Stephan Hoyer
As has come up repeatedly over the past few years, nobody seems to be very
happy with the way that NumPy's datetime64 type parses and prints datetimes
in local timezones.
The tentative consensus from last year's discussion was that we should make
datetime64 timezone naive, like the standard library's datetime.datetime:
http://thread.gmane.org/gmane.comp.python.numeric.general/57184
That makes sense to me, and it's exactly what I'd like to see happen for
NumPy 1.11. Here's my PR to make that happen:
https://github.com/numpy/numpy/pull/6453
As a temporary measure, we still will parse datetimes that include a
timezone specification by converting them to UTC, but will issue a
DeprecationWarning. This is important for a smooth transition, because at
the very least I suspect the "Z" modifier for UTC is widely used. Another
option would be to preserve this conversion indefinitely, without any
deprecation warning.

I'm dubious about supporting conversions in the long run -- even "Z"
-- because UTC datetimes and naive datetimes are really not the same
thing. OTOH maybe if we dropped this it would break everyone's code
and they would hate us -- I actually have no idea what people are
doing with datetime64 outside of pandas. One way to find out is to
start issuing DeprecationWarnings and see if anyone notices :-).
(Though of course this is far from fool-proof.)
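
The distinction between UTC and naive datetimes can be seen in the standard library, which datetime64 is meant to resemble. This is a small sketch using datetime.datetime only (not NumPy): the two objects carry the same calendar fields but are not interchangeable.

```python
from datetime import datetime, timezone

naive = datetime(2015, 10, 12, 7, 10, 26)
aware_utc = datetime(2015, 10, 12, 7, 10, 26, tzinfo=timezone.utc)

# Same calendar fields, but the two objects never compare equal.
print(naive == aware_utc)  # False

# Ordering comparisons are refused outright.
try:
    naive < aware_utc
except TypeError as exc:
    print("TypeError:", exc)

# Dropping the tzinfo is an explicit, lossy step.
print(aware_utc.replace(tzinfo=None) == naive)  # True
```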

Post by Stephan Hoyer
There's one (slightly) contentious API decision to make: What should we do
with the numpy.datetime_to_string function? As far as I can tell, it was
never documented as part of the NumPy API and has not been used very much or
at all outside of NumPy's own test suite, but it is exposed in the main
numpy namespace. If we can remove it, then we can delete and simplify a lot
more code related to timezone parsing and display. If not, we'll need to do
a bit of work so we can distinguish between the string representations of
timezone naive and UTC.

One possible strategy here would be to do some corpus analysis to find
out whether anyone is actually using it, like I did for the ufunc ABI
stuff:
https://github.com/njsmith/codetrawl
https://github.com/njsmith/ufunc-abi-analysis

"datetime_to_string" is an easy token to search for, though it looks
like enough people have their own functions named that that you'd have
to do a bit of filtering to ignore non-numpy-related uses. A
filter("content", "import.*numpy") would collect all files that import
numpy into a single group for further examination.
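
The filtering idea can be sketched in a few lines of plain Python. This is not codetrawl's API; the function name and the dict-of-file-contents input are assumptions made for illustration. It keeps only files that both mention the token and match the "import.*numpy" pattern quoted above.

```python
import re

IMPORTS_NUMPY = re.compile(r"import.*numpy")


def numpy_files_using(token, files):
    """Return names of files that mention `token` and also import numpy.

    `files` maps a file name to its text content (hypothetical input format).
    """
    hits = []
    for name, content in sorted(files.items()):
        if token in content and IMPORTS_NUMPY.search(content):
            hits.append(name)
    return hits


corpus = {
    "a.py": "import numpy as np\nprint(np.datetime_to_string)\n",
    "b.py": "def datetime_to_string(d):\n    return str(d)\n",
    "c.py": "import numpy\n",
}
print(numpy_files_using("datetime_to_string", corpus))  # ['a.py']
```

Note that this pattern does not match a single-line "from numpy import ..." statement, so a real analysis would probably want a broader pattern.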

-n

--
Nathaniel J. Smith -- http://vorpus.org

Stephan Hoyer

2015-10-13 17:36:29 UTC

Post by Nathaniel Smith
One possible strategy here would be to do some corpus analysis to find
out whether anyone is actually using it, like I did for the ufunc ABI
stuff:
https://github.com/njsmith/codetrawl
https://github.com/njsmith/ufunc-abi-analysis
"datetime_to_string" is an easy token to search for, though it looks
like enough people have their own functions named that that you'd have
to do a bit of filtering to ignore non-numpy-related uses.

Yes, this is a good approach. I actually mistyped the name here -- it's
actually "datetime_as_string". A GitHub search does turn up a handful of
uses outside of NumPy:
https://github.com/search?utf8=%E2%9C%93&q=numpy.datetime_as_string+in%3Afile%2Cpath+NOT+numpy%2Fcore+NOT+test_datetime.py+NOT+arrayprint.py&type=Code&ref=searchresults

That said, I'm not sure it's worth going to the trouble to ensure it
continues to work in the future. This function was entirely undocumented,
and doesn't even have an inspectable function signature.

Stephan

Chris Barker

2015-10-13 22:04:50 UTC

Post by Nathaniel Smith

Post by Stephan Hoyer
As a temporary measure, we still will parse datetimes that include a
timezone specification by converting them to UTC, but will issue a
DeprecationWarning. This is important for a smooth transition, because at
the very least I suspect the "Z" modifier for UTC is widely used. Another
option would be to preserve this conversion indefinitely, without any
deprecation warning.

I'm dubious about supporting conversions in the long run -- even "Z"
-- because UTC datetimes and naive datetimes are really not the same
thing.

no -- but almost!

Post by Nathaniel Smith
OTOH maybe if we dropped this it would break everyone's code
and they would hate us --

I think it probably would. In the current implementation, an ISO string
without an offset specifier is converted using the system's locale
timezone. So to get naive time (or UTC), we need to tack a Z (or 00:00) on
there.

So killing that would likely break a lot of code!

And accepting a Z or 00:00 and then treating it as naive, while being
perhaps misleading, would not actually change any results. So I say we keep
it.

Deprecating it eventually would be good in the long run -- but maybe when
we have actual time zone support.

Post by Nathaniel Smith
I actually have no idea what people are
doing with datetime64 outside of pandas.

What do we need to do with this not to break pandas? I'm guessing more
people use datetime64 wrapped by pandas than any other way...

(not me, though)

Post by Stephan Hoyer
There's one (slightly) contentious API decision to make: What should we do
with the numpy.datetime_to_string function? As far as I can tell, it was
never documented as part of the NumPy API and has not been used very
much

Well, I'm not using it :-) though I can see that it might be pretty useful.
Though once we get rid of datetime64 adjusting for the locale timezone,
maybe not anymore.

-CHB
--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception

***@noaa.gov

Alexander Belopolsky

2015-10-12 18:48:30 UTC

Post by Stephan Hoyer
The tentative consensus from last year's discussion was that we should
make datetime64 timezone naive, like the standard library's
datetime.datetime

If you are going to make datetime64 more like datetime.datetime, please
consider adding the "fold" bit. See PEP 495. [1]

[1]: https://www.python.org/dev/peps/pep-0495/

Benjamin Root

2015-10-13 16:42:38 UTC

I'd be totally in support of switching to timezone naive form. While it
would be ideal that everyone stores their dates in UTC, the real world is
messy and most of the time, people are just loading dates as-is and don't
even care about timezones. I work on machines with different TZs, and I
hate it when I save a bunch of data on one machine in UTC, but then go to
view it on my local machine and everything is shifted. It gets even more
confusing around DST switches because it gets all mixed up.

Ben Root

Post by Alexander Belopolsky

Post by Stephan Hoyer
The tentative consensus from last year's discussion was that we should
make datetime64 timezone naive, like the standard library's
datetime.datetime

If you are going to make datetime64 more like datetime.datetime, please
consider adding the "fold" bit. See PEP 495. [1]
[1]: https://www.python.org/dev/peps/pep-0495/
_______________________________________________
NumPy-Discussion mailing list
https://mail.scipy.org/mailman/listinfo/numpy-discussion

Chris Barker

2015-10-13 21:52:38 UTC

Post by Alexander Belopolsky
If you are going to make datetime64 more like datetime.datetime, please
consider adding the "fold" bit. See PEP 495. [1]
[1]: https://www.python.org/dev/peps/pep-0495/

well, adding any timezone support is not (yet) on the table.

(no need for "fold" with purely naive time, yes?)

But yes, when we get there, absolutely.

-CHB

Nathaniel Smith

2015-10-13 22:21:07 UTC

Post by Alexander Belopolsky

Post by Stephan Hoyer
The tentative consensus from last year's discussion was that we should
make datetime64 timezone naive, like the standard library's
datetime.datetime

If you are going to make datetime64 more like datetime.datetime, please
consider adding the "fold" bit. See PEP 495. [1]
[1]: https://www.python.org/dev/peps/pep-0495/

The challenge here is that we literally do not have a bit to use :-)
Unless we make it datetime65 + 63 bits of padding, stealing a bit to use
for fold would halve the range of representable times, and I'm guessing
this would not be acceptable? -- pandas's 64-bits-of-nanoseconds already
has a somewhat narrow range (584 years).
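
The 584-year figure, and the effect of giving up one bit, follow from simple arithmetic. A quick check, assuming a Julian year of 365.25 days as the unit (an approximation chosen here, not something stated in the thread):

```python
NS_PER_SECOND = 10**9
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # Julian year, an approximation


def span_in_years(bits):
    """Years covered by `bits` bits of nanosecond ticks."""
    return 2**bits / NS_PER_SECOND / SECONDS_PER_YEAR


print(round(span_in_years(64), 1))  # 584.5
print(round(span_in_years(63), 1))  # 292.3
```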

I think for now the two goals are to make the built in datetime64 minimally
functional and self consistent, and to make it possible for fancier
datetime needs to be handled using third party dtypes.

-n

Chris Barker

2015-10-13 22:48:38 UTC

Post by Alexander Belopolsky
If you are going to make datetime64 more like datetime.datetime, please
consider adding the "fold" bit. See PEP 495. [1]

Post by Nathaniel Smith
The challenge here is that we literally do not have a bit to use :-)

hmm -- I was first thinking that this could all be in the timezone stuff
(when we get there), but while I imagine we'll want an entire array to be
in a single timezone, each individual value would need its own "fold" flag.

But in any case, we don't need it 'till we do timezones, and my
understanding is that we aren't going to do timezones until we have the
mythical new-and-improved-dtype-system.

So a future datetime dtype could be 64 bits + a byte of extra info, or be
63 bits plus the fold flag, or...

Post by Nathaniel Smith
Unless we make it datetime65 + 63 bits of padding, stealing a bit to use
for fold would halve the range of representable times, and I'm guessing
this would not be acceptable?

well, not now, with the fixed epoch, but if the epoch could be adjusted,
maybe a small range would be fine -- who needs nanosecond accuracy, AND
centuries of range?

Thinking a bit more here:

For those that didn't follow the massive discussion on this on Python-dev
and the new datetime list:

the fold flag is required to round-trip properly for timezones with
discontiguous time -- i.e. daylight saving time. So if you have:

2015-11-01T01:30

Do you mean the first 1:30 am or the second one, after the DST transition?
(i.e. in the fold, or not?)

So it is key, for Python's datetime, to make sure to keep that information
around.
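
The ambiguity can be made concrete with the fold attribute from PEP 495 (Python 3.6 and later). The sketch uses a toy tzinfo subclass, ToyEastern, which is hypothetical and only models the single US Eastern fall-back on 2015-11-01; a real application would use a proper timezone database instead.

```python
from datetime import datetime, timedelta, timezone, tzinfo


class ToyEastern(tzinfo):
    """Toy US Eastern rules, valid only near 2015-11-01 (hypothetical)."""

    # Wall-clock start and end of the repeated hour 01:00-01:59.
    FOLD_START = datetime(2015, 11, 1, 1, 0)
    FOLD_END = datetime(2015, 11, 1, 2, 0)

    def dst(self, dt):
        wall = dt.replace(tzinfo=None)
        if wall < self.FOLD_START:
            return timedelta(hours=1)      # still on daylight time
        if wall < self.FOLD_END and dt.fold == 0:
            return timedelta(hours=1)      # first pass through 01:xx
        return timedelta(0)                # second pass, or later

    def utcoffset(self, dt):
        return timedelta(hours=-5) + self.dst(dt)

    def tzname(self, dt):
        return "EDT" if self.dst(dt) else "EST"


tz = ToyEastern()
first = datetime(2015, 11, 1, 1, 30, tzinfo=tz)           # fold=0
second = datetime(2015, 11, 1, 1, 30, tzinfo=tz, fold=1)  # fold=1

print(first.astimezone(timezone.utc))   # 2015-11-01 05:30:00+00:00
print(second.astimezone(timezone.utc))  # 2015-11-01 06:30:00+00:00
```

The two values share identical wall-clock fields and differ only in fold, yet they map to UTC instants one hour apart.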

However: Python's datetime was designed to be optimized for:
- converting between datetime and other representations in Database, etc.
- fast math for "naive time" -- i.e. basic manipulations within the same
timezone, like "one day later"
- Fast math for "absolute relative deltas" is of secondary concern.

The result of this is that datetime stores: year, month, day, hour, minute,
second, microsecond

It does NOT store some time_unit_since_an_epoch, like unix time or numpy
datetime64.

Also, IIUC, when you associate a datetime with a timezone, it stores the
year, month, day, hour, second,... in the specified timezone -- NOT in UTC,
or anything else. This makes manipulations within that timezone easy -- the
next day simply requires adding a day to the day field (then normalizing
to the month).

Given all that -- the "fold" bit is needed, as a particular datetime in a
particular timezone may have more than one meaning.

Note that to compute a proper time span between two "aware" datetimes, it
is necessary to convert to UTC, do the math, then convert back to the
timezone you want.

However, numpy datetime is optimized for compact storage and fast
computation of absolute deltas (actual hours, minutes, seconds... not
calendar units like "the next day" ).

Because of this, and because it's what we already have, datetime64 stores
times as "some number of time units since an epoch" -- a simple integer.

And because we probably want fast absolute delta computation, when we add
timezones, we'll probably want to store the datetime in UTC, and apply the
timezone on I/O.

Alexander: Am I right that we don't need the "fold" bit in this case? You'd
still need it when specifying a time in a timezone with folds.. -- but
again, only on I/O

-Chris

Nathaniel Smith

2015-10-13 22:58:14 UTC

On Oct 13, 2015 3:49 PM, "Chris Barker" <***@noaa.gov> wrote:
[...]

Post by Chris Barker
However, numpy datetime is optimized for compact storage and fast
computation of absolute deltas (actual hours, minutes, seconds... not
calendar units like "the next day" ).

Except that ironically it actually can't compute absolute deltas accurately
with one second resolution, because it does the POSIX time thing of
pretending that all UTC days have the same number of seconds, even though
this is not true (leap seconds).
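
The "POSIX time thing" is easy to demonstrate with the standard library's calendar.timegm, which implements the same every-day-has-86400-seconds rule. The example relies on one real leap second, inserted at the end of 2015-06-30; it illustrates POSIX arithmetic, not NumPy's code.

```python
import calendar

# A leap second (23:59:60 UTC) was inserted at the end of 2015-06-30.
before = calendar.timegm((2015, 6, 30, 23, 59, 59))
after = calendar.timegm((2015, 7, 1, 0, 0, 0))

# POSIX time counts 1 second here, although 2 SI seconds elapsed.
print(after - before)  # 1

# Every UTC day is treated as exactly 86400 seconds long.
day_start = calendar.timegm((2015, 6, 30, 0, 0, 0))
print(after - day_start)  # 86400
```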

This isn't really relevant to anything else in this thread, except as a
reminder of how freaky date/time handling is.

-n

Marten van Kerkwijk

2015-10-14 00:08:14 UTC

Post by Nathaniel Smith

Post by Chris Barker
However, numpy datetime is optimized for compact storage and fast
computation of absolute deltas (actual hours, minutes, seconds... not
calendar units like "the next day" ).

Except that ironically it actually can't compute absolute deltas
accurately with one second resolution, because it does the POSIX time thing
of pretending that all UTC days have the same number of seconds, even
though this is not true (leap seconds).
This isn't really relevant to anything else in this thread, except as a
reminder of how freaky date/time handling is.

Maybe not directly relevant, but also very clearly why one should ideally
not use these at all! Perhaps even less relevant, but if you do need
absolute times (and thus work with UTC or TAI or GPS), have a look at
astropy's `Time` class. It does use two doubles, but with that maintains
"sub-nanosecond precision over times spanning the age of the universe" [1].
And it even converts to strings nicely!

-- Marten

[1] http://docs.astropy.org/en/latest/time/index.html

Chris Barker

2015-10-14 16:07:41 UTC

On Tue, Oct 13, 2015 at 5:08 PM, Marten van Kerkwijk <

Post by Marten van Kerkwijk
Maybe not directly relevant, but also very clearly why one should ideally
not use these at all!

I wouldn't say not at all -- I'd say "not in some circumstances"

Post by Marten van Kerkwijk
Perhaps even less relevant, but if you do need absolute times (and thus
work with UTC or TAI or GPS), have a look at astropy's `Time` class. It
does use two doubles,

interesting -- I wonder why not two integers?

Post by Marten van Kerkwijk
but with that maintains "sub-nanosecond precision over times spanning the
age of the universe" [1].

well, we do all need that!

Seriously, though -- if we are opening all this up, maybe it's worth
considering other options, rather than kludging datetime64 -- particularly
if there is something someone has already implemented and tested...

But for now, Stephan's patches to make datetime64 far more useful and easy
are very welcome!

-CHB

[1] http://docs.astropy.org/en/latest/time/index.html


Chris Barker

2015-10-14 15:59:53 UTC

Post by Nathaniel Smith

Post by Chris Barker
However, numpy datetime is optimized for compact storage and fast
computation of absolute deltas (actual hours, minutes, seconds... not
calendar units like "the next day" ).

Except that ironically it actually can't compute absolute deltas
accurately with one second resolution, because it does the POSIX time thing
of pretending that all UTC days have the same number of seconds, even
though this is not true (leap seconds).

Note that I said "fast", not "accurate" -- but the leap second thing may be
one more reason not to call datetime64 "UTC" -- who's to say that "naive"
time should include leap seconds :-)

Also, we could certainly add a leap seconds implementation to the current
infrastructure -- the real technical problem with that is how to keep the
leap-seconds table up to date -- we have no way to know when there will be
leap-seconds in the future...
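
A table-driven correction might look roughly like the following sketch. The helper name and the table layout are hypothetical, and the table is deliberately incomplete: it holds only two real leap seconds (end of 2012-06-30 and end of 2015-06-30). The result depends entirely on which table is supplied, which is the maintenance problem described above.

```python
from datetime import datetime

# UTC midnights immediately preceded by an inserted leap second.
# Deliberately incomplete: only two real entries, for illustration.
LEAP_SECOND_ENDS = [
    datetime(2012, 7, 1),
    datetime(2015, 7, 1),
]


def elapsed_si_seconds(start, end, table=LEAP_SECOND_ENDS):
    """Elapsed seconds between two naive UTC datetimes, counting any
    leap seconds from `table` that fall in (start, end]."""
    naive_seconds = (end - start).total_seconds()
    leaps = sum(1 for t in table if start < t <= end)
    return naive_seconds + leaps


start = datetime(2015, 6, 30, 23, 59, 59)
end = datetime(2015, 7, 1, 0, 0, 0)
print(elapsed_si_seconds(start, end))            # 2.0
print(elapsed_si_seconds(start, end, table=[]))  # 1.0
```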

Also -- this may be one more reason to have a selectable epoch -- then
you'd likely overlap fewer leap-seconds in a given use case.

Post by Nathaniel Smith
This isn't really relevant to anything else in this thread, except as a
reminder of how freaky date/time handling is.

yup -- it sure is.

-CHB

Phil Hodge

2015-10-14 17:34:42 UTC

Post by Chris Barker
we have no way to know when there will be leap-seconds in the future

Leap seconds are announced about six months in advance.

Phil

Chris Barker

2015-10-14 17:55:17 UTC

Post by Phil Hodge

we have no way to know when there will be leap-seconds in the future

Leap seconds are announced about six months in advance.

exactly -- so more than six months out, we have no idea.

and even within six months, then you'd need to update some sort of database
of leapseconds to get it.

So depending on what version of the DB someone was using, they'd get
different answers.

That could all get ugly :-(

-CHB

Alexander Belopolsky

2015-10-16 17:19:58 UTC

Post by Chris Barker
And because we probably want fast absolute delta computation, when we add
timezones, we'll probably want to store the datetime in UTC, and apply the
timezone on I/O.
Alexander: Am I right that we don't need the "fold" bit in this case?
You'd still need it when specifying a time in a timezone with folds.. --
but again, only on I/O

Since Guido hates leap seconds, PEP 495 is silent on this issue, but
strictly speaking UTC leap seconds are "folds." AFAICT, a strictly POSIX
system must repeat the same value of time_t when a leap second is
inserted. While datetime will never extend the second field to allow
second=60, with PEP 495, it is now possible to represent 23:59:60 as
23:59:59/fold=1.
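
In stdlib terms (Python 3.6 or later) the two spellings behave as sketched here. Reading fold=1 on 23:59:59 as "the leap second" is a convention the caller would have to impose; datetime itself attaches no such meaning to it.

```python
from datetime import datetime

# The stdlib refuses a literal 60th second...
try:
    datetime(2015, 6, 30, 23, 59, 60)
except ValueError as exc:
    print("ValueError:", exc)

# ...but the same instant can be written as a repeated 23:59:59.
leap = datetime(2015, 6, 30, 23, 59, 59, fold=1)
print(leap.second, leap.fold)  # 59 1
```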

Apart from leap seconds, there is no need to use "fold" on datetimes that
represent time in UTC or any timezone at a fixed offset from UTC.

Chris Barker

2015-10-17 22:59:16 UTC

Post by Alexander Belopolsky
Since Guido hates leap seconds, PEP 495 is silent on this issue, but
strictly speaking UTC leap seconds are "folds." AFAICT, a strictly POSIX
system must repeat the same value of time_t when a leap second is
inserted. While datetime will never extend the second field to allow
second=60, with PEP 495, it is now possible to represent 23:59:60 as
23:59:59/fold=1.

Thanks -- If anyone decides to actually get around to leap seconds support
in numpy datetime, s/he can decide whether to do folds or allow second=60.

Off the top of my head, I think allowing a 60th second makes more sense --
just like we do leap years. Granted, external systems often don't
understand/allow a 60th second, but they generally don't understand a fold
bit, either....

-CHB

Post by Alexander Belopolsky
Apart from leap seconds, there is no need to use "fold" on datetimes that
represent time in UTC or any timezone at a fixed offset from utc.

Alexander Belopolsky

2015-10-18 19:20:03 UTC

Post by Chris Barker
Off the top of my head, I think allowing a 60th second makes more sense --
just like we do leap years.

Yet we don't implement DST by allowing the 24th hour. Even the countries
that adjust the clocks at midnight don't do that.

In some sense leap seconds are more similar to timezone changes (DST or
political) because they are irregular and unpredictable.

Furthermore, the notion of "fold" is not tied to a particular 24/60/60
system of encoding times and thus more applicable to numpy where
times are encoded as binary integers.

Chris Barker

2015-10-19 19:34:56 UTC

Post by Alexander Belopolsky

Post by Chris Barker
Off the top of my head, I think allowing a 60th second makes more sense
-- just like we do leap years.

Yet we don't implement DST by allowing the 24th hour. Even the countries
that adjust the clocks at midnight don't do that.

Well, isn't that about conforming to already existing standards? DST is a
civil construct -- and mst (all?) implementations use the convention of
having repeated times. -- so that's what software has to deal with.

IIUC, at least *some* standards handle leap seconds by adding a 60th (61st)
second, rather than having a repeated one. So it's at least an option to do
it that way. And it can then fit into the already existing standards for
representing datetimes, etc.

Does the "fold" flag approach for representing, well, "folds" exist in any
widely used standards? It's my impression that it doesn't since we had to
argue a lot about what to call it :-)

Post by Alexander Belopolsky
In some sense leap seconds are more similar to timezone changes (DST or
political) because they are irregular and unpredictable.

in that regard, yes -- you need a constantly updating database to use them.
but I don't know that that has any impact on how you represent them. They
seem a lot more like leap years to me -- some Februaries have a 29th day --
some hours on some days have a 61st second.

Post by Alexander Belopolsky
Furthermore, the notion of "fold" is not tied to a particular 24/60/60
system of encoding times and thus more applicable to numpy where
times are encoded as binary integers.

but there are no folds in the underlying integer representation -- that is
the "continuous" time scale -- the folds (or leap seconds, or leap years,
or any of the 24/60/60 business) come in only when you want to go to/from
the "datetime" representation.

Post by Alexander Belopolsky

Post by Chris Barker
If anyone decides to actually get around to leap seconds support in numpy
datetime, s/he can decide ...

This attitude is the reason why we will probably never have bug free
software when it comes to civil time reckoning.

OK -- fair enough -- good to think about it sooner rather than later.

Post by Alexander Belopolsky
Similarly, current numpy.datetime64 design ties arithmetic with encoding.
This makes arithmetic easier, but in the long run may preclude designs that
better match the problem domain.

I don't follow here -- how can you NOT tie arithmetic to encoding? sure
you could decide that you are going to overload the arithmetic, and it's up
to the object that encodes the data to do that math -- but that's pretty
much what datetime64 is doing -- defining an encoding so that it can do
math -- numpy dtypes are very much about binary representation. No reason
one couldn't make a different numpy dtype for datetimes that encoded it a
different way, and then it would have to implement math, too.

Post by Alexander Belopolsky
Note how the development of PEP 495 has highlighted the fact that allowing
binary operations (subtraction, comparison etc.) between times in different
timezones was a design mistake. It will be wise to learn from such
mistakes when redesigning numpy.datetime64.

So was not considering folds -- frankly, and I think this may be your point,
I don't think timezones were well thought out at all when datetime
was first introduced -- and however well thought out it was, if you don't
provide an implementation, you are not going to find the limitations. And
despite Tim's articulate defense of the original implementation decisions,
I think encoding the datetime in the local "calendar/clock" just invites a
mess. And I'm quite convinced that it wouldn't be the way to go for numpy
use-cases.

Post by Alexander Belopolsky
If you ever plan to support civil time in some form, you should think about
it now.

well, the goal for now is naive time -- and unlike the original datetime --
we are not adding on a "you can implement your own timezone handling this
way" hook yet.

Post by Alexander Belopolsky
In Python 3.6, datetime.now() will return different values in the first
and the second repeated hour in the "fall-back fold." If you allow
datetime.datetime to numpy.datetime64 conversion, you should decide what
you do with that difference.

Indeed. Though will that only occur with timezones that have DST? I know
I'd be fine with NOT being able to create a numpy datetime64 from a
non-naive datetime object. Which would force the user to think about and
convert to the timezone they want before passing off to numpy.

Unless you can suggest a sensible default way to handle this. At first
blush, I think naive time does not have folds, so there is no way to handle
them "properly"

Also -- I think we are at phase one of a (at least) two step process:

1) clean up datetime64 just enough that it is useful, and less error-prone
-- i.e. have it not pretend to support anything other than naive datetimes.

2) Do it right -- perhaps adding some time zone support. This is going to
wait until the numpy dtype machinery is cleaned up some.

Phase 2 is where we really need the thinking ahead. And I'm still confused
about what thinking ahead needs to be done for potential leap second
support.

-CHB

Alexander Belopolsky

2015-10-19 20:00:49 UTC

Post by Chris Barker
DST is a civil construct -- and mst (all?) implementations use the
convention of having repeated times.

What is "mst"?

Alexander Belopolsky

2015-10-19 20:11:54 UTC

Post by Chris Barker

Post by Alexander Belopolsky
In Python 3.6, datetime.now() will return different values in the first
and the second repeated hour in the "fall-back fold." If you allow
datetime.datetime to numpy.datetime64 conversion, you should decide what
you do with that difference.

Indeed. Though will that only occur with timezones that have DST? I know
I'd be fine with NOT being able to create a numpy datetime64 from a
non-naive datetime object.

datetime.now() returns *naive* datetime objects unless you supply the
timezone. In Python 3.6 *naive* datetime objects will have the fold
attribute and datetime.now() will occasionally return fold=1 values unless
your system timezone has a fixed UTC offset.

Stephan Hoyer

2015-10-19 20:12:19 UTC

Post by Chris Barker
1) clean up datetime64 just enough that it is useful, and less error-prone
-- i.e. have it not pretend to support anything other than naive datetimes.
2) Do it right -- perhaps adding some time zone support. This is going to
wait until the numpy dtype machinery is cleaned up some.

I agree with Chris. My intent with this work for now (for NumPy 1.11) is
simply to complete phase 1. Once NumPy stops pretending to be time zone
aware (and with a few other small cleanups), datetime64 will be far more
useable. For major fixes, we'll have to wait until dtype support is better.

Alexander -- by "mst" I think Chris meant "most".

Best,
Stephan

Alexander Belopolsky

2015-10-19 20:14:38 UTC

Post by Stephan Hoyer
Alexander -- by "mst" I think Chris meant "most".

Good because in context it could be "Moscow Standard Time" or "Mean Solar
Time". :-)

Alexander Belopolsky

2015-10-19 20:25:07 UTC

Post by Stephan Hoyer

Post by Chris Barker
1) clean up datetime64 just enough that it is useful, and less
error-prone -- i.e. have it not pretend to support anything other than
naive datetimes.

I agree with Chris. My intent with this work for now (for NumPy 1.11) is
simply to complete phase 1.

This is fine. Just be aware that *naive* datetimes will also have the PEP
495 "fold" attribute in Python 3.6. You are free to ignore it, but you
will lose the ability to round-trip between naive stdlib datetimes and
numpy.datetime64.
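
The round-trip loss can be sketched without NumPy. The encode and decode helpers are hypothetical stand-ins for an epoch-plus-integer encoding of the kind datetime64 uses (microseconds since 1970 are assumed here); they are not NumPy code. Any such integer has no room for fold, so the attribute comes back as 0.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)


def encode(dt):
    """Naive datetime -> integer microseconds since the epoch."""
    return (dt - EPOCH) // timedelta(microseconds=1)


def decode(ticks):
    """Integer microseconds since the epoch -> naive datetime."""
    return EPOCH + timedelta(microseconds=ticks)


original = datetime(2015, 11, 1, 1, 30, fold=1)
restored = decode(encode(original))

print(restored == original)          # True: equality ignores fold
print(original.fold, restored.fold)  # 1 0
```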

Chris Barker - NOAA Federal

2015-10-20 00:54:49 UTC

Post by Alexander Belopolsky
This is fine. Just be aware that *naive* datetimes will also have the PEP
495 "fold" attribute in Python 3.6. You are free to ignore it, but you
will lose the ability to round-trip between naive stdlib datetimes and
numpy.datetime64.

Sigh. I can see why it's there ( primarily to support now(), I
suppose). But a naive datetime doesn't have a timezone, so how could
you know what time one actually corresponds to if fold is True? And
what could you do with it if you did know?

I've always figured that if you are using naive time for times in a
timezone that has DST, then you'd better know whether you were in DST
or not.

(Which fold tells you, I guess) but the fold isn't guaranteed to be an
hour is it? So without more info, what can you do? And if the fold bit
is False, then you still have no idea if you are in DST or not.

And then what if you attach a timezone to it? Then the fold bit could
be wrong...

I take it back, I can't see why the fold bit could be anything but
confusing for a naive datetime. :-)

Anyway, all I can see to do here is for the datetime64 docs to say
that fold is ignored if it's there.

But what should datetime64 do when provided with a datetime with a timezone?

- Raise an exception?
- ignore the timezone?
- Convert to UTC?

If the time zone is ignored, then you could get DST and non DST times
in the same array - that could be ugly.

Is there any way to query a timezone object to ask if it's a constant-offset?
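
The three options can be compared side by side in plain Python. The helper to_naive and its policy names are hypothetical and say nothing about what NumPy does or will do. On the constant-offset question: the tzinfo interface has no general query for that, but instances of the stdlib's datetime.timezone class are fixed-offset by construction, so an isinstance check covers that one class.

```python
from datetime import datetime, timedelta, timezone


def to_naive(dt, policy="raise"):
    """Reduce a datetime to naive form under one of three policies
    (hypothetical helper; names are illustrative)."""
    if dt.tzinfo is None or dt.utcoffset() is None:
        return dt.replace(tzinfo=None)  # already naive
    if policy == "raise":
        raise ValueError("timezone-aware datetime not accepted")
    if policy == "ignore":
        return dt.replace(tzinfo=None)  # keep wall-clock fields
    if policy == "utc":
        return dt.astimezone(timezone.utc).replace(tzinfo=None)
    raise ValueError(f"unknown policy: {policy!r}")


plus_two = timezone(timedelta(hours=2))  # a fixed-offset zone
aware = datetime(2015, 10, 20, 2, 54, 49, tzinfo=plus_two)

print(to_naive(aware, "ignore"))  # 2015-10-20 02:54:49
print(to_naive(aware, "utc"))     # 2015-10-20 00:54:49

# datetime.timezone instances always have a constant offset, so an
# isinstance check answers the question for that one class only.
print(isinstance(aware.tzinfo, timezone))  # True
```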

And yes, I did mean "most". There is no way I'm ever going to
introduce a three letter "timezone" abbreviation in one of these
threads!

-CHB

Alexander Belopolsky

2015-10-18 19:57:36 UTC

Post by Chris Barker
If anyone decides to actually get around to leap seconds support in numpy
datetime, s/he can decide ...

This attitude is the reason why we will probably never have bug free
software when it comes to civil time reckoning. Even though ANSI C has
the difftime(time_t time1, time_t time0) function which in theory may not
reduce to time1 - time0, in practice it is only useful to avoid overflows
in integer to float conversions in cross-platform code and cannot account
for the fact some days are longer than others.

Similarly, current numpy.datetime64 design ties arithmetic with encoding.
This makes arithmetic easier, but in the long run may preclude designs that
better match the problem domain.

Note how the development of PEP 495 has highlighted the fact that allowing
binary operations (subtraction, comparison etc.) between times in different
timezones was a design mistake. It will be wise to learn from such
mistakes when redesigning numpy.datetime64.

If you ever plan to support civil time in some form, you should think about
it now. In Python 3.6, datetime.now() will return different values in the
first and the second repeated hour in the "fall-back fold." If you allow
datetime.datetime to numpy.datetime64 conversion, you should decide what
you do with that difference.
