Would love to know how and where we are able to access a Root Cause Analysis for any degradations and/or outages that affect us.
Typically the status page is horrifically opaque:
"We have found the cause and have applied a fix for this behavior."
That is utterly inadequate for the administrative and development teams who support large-scale instances and who are accountable for providing outage details throughout our organisations.
Additionally, our users consistently contact us to report a problem before any monitoring you appear to have in place raises an alert. Allowing us to generate internal comms IN ADVANCE of user reports is necessary and should not be controversial.
Providing full and open detail on what went wrong and why should also be seen as necessary. If you can't explain it to us in customer-facing terms, I don't think you understand your platform properly.
@Mikee Not sure if this is the answer you are looking for but someone at your institution should be considered the "Field Admin" and is sent a full incident report a few days after each outage. This typically contains much more detail about the specifics of the outage, the cause, and what Instructure is doing to address it for the future. I do not have any insight into why these are not included in or added to the status page other than it normally takes several days to get the full details (might be because they want all the details and plan for the future, guessing here).
Hope this helps!
-Nick
Thanks @nwilson7 - yes, I am the field admin. In the most recent case, the only notice provided was via the status page; it was posted about half an hour after the outage commenced, and after the platform was already responding to our checks again.
Our CSM is great - and will be providing the report when it's done. The ones I've seen in the past are great for general users, but less detailed than what other vendors provide to technical support teams. The best I've seen went into full depth on the user action that triggered a race condition on a database table, how the issue was addressed, and then how the problem was traced and identified, and what additional logic was put in place to prevent a recurrence. I'd love to see that sort of detail here, but am not holding out much hope.
Canvas, at this point, is not kicking goals with the proactive comms when things don't go as well as hoped - to the point where I'm now actively investigating external uptime monitoring for our instance so we can know when the platform isn't responsive before users contact us.
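For anyone else considering the same thing, here is a rough sketch of the kind of external check I mean. It is only an illustration, not something we run in production: the URL is a placeholder, the names `probe`, `OutageDetector` and `monitor` are made up for this post, and the three-failure threshold and 60-second interval are arbitrary assumptions. It polls a page over HTTP and only reports an outage after several consecutive failures, so a single dropped request doesn't trigger internal comms.

```python
import time
import urllib.error
import urllib.request
from typing import Optional, Tuple

# Hypothetical placeholder; substitute the address of your own instance.
INSTANCE_URL = "https://canvas.example.edu/"


def probe(url: str, timeout: float = 10.0) -> Tuple[bool, float]:
    """Make one HTTP GET and return (is_up, seconds_taken)."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            ok = 200 <= resp.status < 400
    except (urllib.error.URLError, OSError):
        # Covers HTTP 4xx/5xx, DNS failures, refused connections and timeouts.
        ok = False
    return ok, time.monotonic() - start


class OutageDetector:
    """Turn a stream of up/down probe results into outage/recovery events."""

    def __init__(self, threshold: int = 3) -> None:
        self.threshold = threshold  # consecutive failures before alerting
        self.failures = 0
        self.down = False

    def record(self, is_up: bool) -> Optional[str]:
        """Return "outage", "recovered", or None if nothing changed."""
        if is_up:
            self.failures = 0
            if self.down:
                self.down = False
                return "recovered"
            return None
        self.failures += 1
        if not self.down and self.failures >= self.threshold:
            self.down = True
            return "outage"
        return None


def monitor(url: str = INSTANCE_URL, interval: float = 60.0) -> None:
    """Poll forever, printing a line whenever the state changes."""
    detector = OutageDetector()
    while True:
        is_up, elapsed = probe(url)
        event = detector.record(is_up)
        if event is not None:
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            print(f"{stamp} {event}: {url} (last probe took {elapsed:.1f}s)")
        time.sleep(interval)
```

Calling `monitor()` from a machine outside your own network would start the loop. In practice the `print` would be replaced with whatever alerting you already use, and a hosted monitoring service may well be the better option.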
I'd like to note that we've had a very responsive, proactive notification go out since I grumpily posted this, and wanted to acknowledge it as well. Thank you to whoever is in and around that process - it's appreciated. 🙂