Hi, I'm looking for a "data extract date" or "data as of date" so that we know when the data was pulled into Redshift. I didn't see a status date or table anywhere in the documentation, but maybe I missed it? We could retrieve the max(timestamp) from one of the fact tables, but is that the best/only way?
Thanks for your question, Ruby Nugent. At this time the best we can say is that the date associated with the latest dump is the data extract date.
You mean the date from the files? Is there anything within the Redshift DW?
Looks like the "best" date from within the Redshift instance would be max(timestamp) from requests - which is a day earlier than the date on the files. We plan to use that as the data extract date for communication to users.
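For anyone looking for the exact query, a minimal sketch against the standard Canvas Data requests table might be (the column is quoted because timestamp is a reserved word in Redshift):

-- Latest request timestamp as a proxy for the data extract date.
SELECT MAX("timestamp") AS data_extract_date
FROM requests;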
I sometimes see a discrepancy where the max date from requests is days prior to the max date for a discussion post. I assume this is related to the various issues with handling what I understand to be a very large requests table.
So for this I am using the MAX date of the discussion_entry_dim.created_at field. I can do so as all of our courses use discussions.
@rubyn
As @Jeff_F says, you should look at the MAX date of a created_at or updated_at field. submission_dim is a good one for us because it's our largest table outside of requests.
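In query form, a minimal sketch (column names per the standard Canvas Data schema):

-- Latest write to a big, busy dim as a freshness indicator;
-- created_at on discussion_entry_dim works the same way.
SELECT MAX(updated_at) AS approx_extract_date
FROM submission_dim;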
I import every day and I have an extra field in our DB for data_timestamp that puts the timestamp of when the data import process was started. That way we at least know how old OUR imported data is.
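One way to implement that, sketched in generic SQL; the import_log table name is illustrative, not from Joni's actual setup:

-- Record when each import run started.
CREATE TABLE import_log (data_timestamp TIMESTAMP);
-- At the start of every import:
INSERT INTO import_log (data_timestamp) VALUES (CURRENT_TIMESTAMP);
-- "How old is OUR imported data?"
SELECT MAX(data_timestamp) AS last_import FROM import_log;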
You are awesome, Joni.
PS: chances are many of us have this, but just in case it isn't so, for extra credit I am sharing a few lines I use to convert the UTC timestamp to EDT/EST. Cheers ~!
----
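// Each branch below brackets one year's US DST window: subtract 4 hours (EDT)
// inside the window, 5 hours (EST) otherwise. The date literals are intended
// as DD-MM-YYYY; see the note on that format further down the thread.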
if date([Timestamp]) > #13-03-2016# and date([Timestamp]) < #06-11-2016# then DATEADD('hour',-4,[Timestamp])
elseif date([Timestamp]) > #12-03-2017# and date([Timestamp]) < #05-11-2017# then DATEADD('hour',-4,[Timestamp])
elseif date([Timestamp]) > #11-03-2018# and date([Timestamp]) < #04-11-2018# then DATEADD('hour',-4,[Timestamp])
elseif date([Timestamp]) > #10-03-2019# and date([Timestamp]) < #03-11-2019# then DATEADD('hour',-4,[Timestamp])
elseif date([Timestamp]) > #08-03-2020# and date([Timestamp]) < #01-11-2020# then DATEADD('hour',-4,[Timestamp])
elseif date([Timestamp]) > #14-03-2021# and date([Timestamp]) < #07-11-2021# then DATEADD('hour',-4,[Timestamp])
elseif date([Timestamp]) > #13-03-2022# and date([Timestamp]) < #06-11-2022# then DATEADD('hour',-4,[Timestamp])
elseif date([Timestamp]) > #12-03-2023# and date([Timestamp]) < #05-11-2023# then DATEADD('hour',-4,[Timestamp])
elseif date([Timestamp]) > #10-03-2024# and date([Timestamp]) < #03-11-2024# then DATEADD('hour',-4,[Timestamp])
elseif date([Timestamp]) > #09-03-2025# and date([Timestamp]) < #02-11-2025# then DATEADD('hour',-4,[Timestamp])
else DATEADD('hour',-5,[Timestamp]) end
@Jeff_F do you store in the db in UTC or local? right now I'm storing in UTC and just convert on query.
^^^ Exactly the same here --> convert in the reports.
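If you are converting on query inside Redshift itself, the built-in CONVERT_TIMEZONE function handles the DST switch for named zones; a minimal sketch:

-- Convert the stored UTC value at query time.
SELECT "timestamp" AS ts_utc,
       CONVERT_TIMEZONE('UTC', 'America/New_York', "timestamp") AS ts_eastern
FROM requests;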
All,
Two issues here. Firstly, requests data is always delayed more than the rest of the data. Clearly, the latest requests data can be found by max(timestamp). For the rest of the data, I've found max(last_request_at) in the pseudonym dim to be the best indicator. For the data this morning, this is what I see in UTC:
requests: 2019-11-02 23:59:59.973
pseudonym_dim: 2019-11-04 00:38:21.533
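In query form (table and column names per the standard Canvas Data schema):

SELECT MAX("timestamp") FROM requests;           -- latest requests data
SELECT MAX(last_request_at) FROM pseudonym_dim;  -- latest for the rest of the data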
Secondly, not sure which platform you are using, but Oracle has built-in functionality for timezone conversion including automatically handling daylight saving time changes. We extract all data from the text files into raw tables using VARCHAR2 and CLOB data types, then transform the data to convert '\N' strings to nulls and date / timestamp data to appropriate datatypes in local time in a second set of tables. For example, here's how we convert the requests timestamp column to give UTC and local time columns in the transformed table:
SELECT
TO_TIMESTAMP(DECODE(lmrr.timestamp, '\N', NULL, lmrr.timestamp), 'YYYY-MM-DD HH24:MI:SS.FF') AS timestamp_utc,
CAST(FROM_TZ(TO_TIMESTAMP(DECODE(lmrr.timestamp, '\N', NULL, lmrr.timestamp), 'YYYY-MM-DD HH24:MI:SS.FF'), 'UTC') AT TIME ZONE 'Australia/Adelaide' AS TIMESTAMP) AS timestamp
~
~
FROM dwstglm.lm_requests_raw lmrr;
Regards,
Stuart.
@a1222252 - Interesting that pseudonym_dim is so up to date compared to requests! I always thought the whole batch got exported at the same time.
I could do it on import but as long as I know it's all in UTC I know they all match. The US time zones and daylight savings time date ranges are kind of difficult to work with.
Hi Joni,
This is because the data sources are different. The dims and facts will be extracted from the database, while the requests data is extracted from the web server activity log. I would expect that extracting data from the log is more involved and takes longer, and probably can't be done until the log rotates. This would be why the requests data always runs up until midnight UTC.
S.
@Jeff_F ,
Those timestamps in your example code seem to be dependent upon some locale setting. When I tried #10-03-2019#, it interpreted it as October 3, 2019, and not March 10, 2019. People may be better served using the unambiguous YYYY-MM-DD format and #2019-03-10#. That's what Tableau shows in their documentation.
Also, because the time change happens at 2 am local time, which would be 07:00 UTC for EST switching to EDT and 06:00 UTC for EDT switching to EST, you're missing out on 17 hours of DST by making it be > March 10, 2019, instead of >=.
You can include the timestamp as part of your check and not bother chopping it to the date.
IF DATE([Timestamp]) > #10-03-2019# AND DATE([Timestamp]) < #03-11-2019# THEN
could be written more precisely as
IF [Timestamp] >= #2019-03-10 07:00# AND [Timestamp] < #2019-11-03 06:00# THEN
You could make it a little more readable (arguably) but definitely shorter by combining the checks with an OR instead of doing IF THEN ELSEIF ELSEIF ... ELSEIF END
IF
([Timestamp] >= #2019-03-10 07:00# AND [Timestamp] < #2019-11-03 06:00#) OR
([Timestamp] >= #2020-03-08 07:00# AND [Timestamp] < #2020-11-01 06:00#) OR
// fill in the rest of the years here
([Timestamp] >= #2025-03-09 07:00# AND [Timestamp] < #2025-11-02 06:00#)
THEN DATEADD('hour',-4,[Timestamp])
ELSE DATEADD('hour',-5,[Timestamp]) END
I don't know that this is clearer, but you could also manipulate that to be
DATEADD('hour',
IF
([Timestamp] >= #2019-03-10 07:00# AND [Timestamp] < #2019-11-03 06:00#) OR
([Timestamp] >= #2020-03-08 07:00# AND [Timestamp] < #2020-11-01 06:00#) OR
// fill in the rest of the years here
([Timestamp] >= #2025-03-09 07:00# AND [Timestamp] < #2025-11-02 06:00#)
THEN -4 ELSE -5 END,
[Timestamp])
Tableau uses short-circuit Boolean evaluation with the AND and OR logical functions, so it won't evaluate the remaining checks once it finds one that matches. I don't know what kind of performance hit there is to checking all the way out to 2025 when it's still 2019, but that's present in both of our examples.
You might be able to speed up performance by partitioning the data off of the year. Something like this would let you skip a bunch of checks for the later years
DATEADD('hour',
CASE YEAR([Timestamp])
WHEN 2019 THEN
IIF([Timestamp] >= #2019-03-10 07:00# AND [Timestamp] < #2019-11-03 06:00#,-4,-5)
WHEN 2020 THEN
IIF([Timestamp] >= #2020-03-08 07:00# AND [Timestamp] < #2020-11-01 06:00#,-4,-5)
// Keep going with additional years
WHEN 2025 THEN
IIF([Timestamp] >= #2025-03-09 07:00# AND [Timestamp] < #2025-11-02 06:00#,-4,-5)
ELSE 0
END,
[Timestamp])
I don't have a real budget for Canvas Data, so I use MySQL. It isn't as high-powered as some of the other database systems, but here is what I found a year or so ago when I did some checking.
If I store my values as a DATETIME rather than a TIMESTAMP, then the database doesn't do automatic conversions. A TIMESTAMP field in MySQL is converted to UTC when stored and converted back when retrieved. That made it hard to work with timestamps that were already in UTC, as it erroneously converted them again. Using DATETIME or DATETIME(3) for the requests table, the timestamps are left in UTC and then I can use the CONVERT_TZ(dt, 'UTC', 'US/Eastern') function to get the local time (although it would be US/Central for me).
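A minimal MySQL sketch of that setup; the table and column names are illustrative, and it assumes the server's time zone tables have been loaded (e.g. via the mysql_tzinfo_to_sql utility), which CONVERT_TZ needs for named zones:

-- DATETIME(3) stores the value as-is (already UTC); TIMESTAMP would convert it.
CREATE TABLE requests_local (
    request_ts DATETIME(3)
);
-- Convert to local time on the way out.
SELECT CONVERT_TZ(request_ts, 'UTC', 'US/Eastern') AS ts_eastern
FROM requests_local;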
Earlier this week, I saw that many states were pushing to do away with the time split and go with one standard year round. The Illinois Senate just passed a bill on November 13 to put Illinois permanently on Daylight Saving Time. I think the bill still has to be approved by the Illinois House, and it would require a change at the federal level to the Uniform Time Act of 1966. Currently, states can opt out of DST without federal say-so, but we cannot opt into year-round DST without their OK. Some New England states are trying to get into a different time zone so that they can be on permanent Atlantic Standard Time, which would amount to permanent Eastern Daylight Saving Time. I suppose that Illinois could take that approach and request a move into Eastern Time, but almost none of the substantive requirements are met, so it wouldn't be likely to happen.
From a technology perspective, it would be easier if people stuck to one time year round. Somehow, I don't think the complexity of a Tableau calculation is going to be the deciding factor in that decision.