I agree that in some way we are twisting the system to do something it wasn't designed to do - but we were pushed this way by Instructure as the real-time alternatives we are currently using do not scale or offer incremental data approaches (and don't cover all the data we require). We currently utilise various REST API calls either directly (horrendous cost if there are hundreds of thousands of calls required) or to run various reports - some of which time out on our volumes, and others which can take 45 minutes to run, and transfer hundreds of megabytes of essentially the same data to just detect a few changes. We were advised that CD2 was going to solve these issues - but it also generates its own challenges.
We are looking to use this data for a number of purposes. One is to effectively synchronise the state of some courses (enrolments mainly) with the SIS - and to implement rules that handle updates made in Canvas itself. Lagging data we can handle, but inconsistencies that cause things to appear and disappear will cause noise and result in updates being attempted which are not necessary.
Results are also being pushed through to other systems for compliance record-keeping and other purposes. This will be done on an incremental basis - pushing through only the changes. Having results removed because of referential issues is not something we want to contemplate.
A lot of these issues and concerns arise from the user_id and the way in which it can be changed. In order to understand the magnitude of this, and why it may be relevant to us but not a concern for others, it is important to note that we are forced to use a Consortia due to volumes.
We have over 1,250 schools each with their own sub account. There are over 1.3 million users, of which over 650,000 are active. We have around 450,000 active courses. Our largest courses have over 120,000 active enrolments, and the most used course will have over 200,000 submissions.
This is supported by 11 Canvas instances sitting under one additional instance that is the Consortia root. The Consortia root only handles logins - all other objects are in one of the child instances. The consortia is set up with a trust, and any user can participate in courses in any of the instances - but this means that the user and the course data are in different instances. This all works, but there are Canvas limitations when working across instances which impact on some functionality - limitations which do not occur if the user and course are on the same Canvas instance.
In order to minimise these impacts, users are assigned to the instance where they are expected to have most of their courses - which works really well for the majority of school students, who attend a single school. It gets more problematic where students take courses at multiple schools, or where teachers teach at multiple schools. The real impact occurs when students or teachers move schools (the transition from primary to secondary being a major change, but moves happen all the time for many reasons). We did a lot of work to try to minimise the need to move a user from one instance to another by analysing past movements and trying to keep the schools that users most commonly move between in the same instance, but obviously this is not perfect.
In order to facilitate moving a user from one instance to another, Instructure developed the "home_account" feature in the users sis import. This uses Canvas user merge functionality to "move" a user from one instance to another. What this does is actually create a new canvas_user_id - and then changes all the previous references to point to that new user - across all the Canvas instances in the consortia - and in all the tables (so enrolments, assignments etc.). When a user has their base location changed, and that location is in a different instance to where they previously were, we do a home_account merge of the user via SIS import. (We don't actually check whether the location has changed instances, we simply compare the user attributes with the previous attributes, and for any change be it location, name, email etc. simply send a SIS import with home_account TRUE always - which will move if necessary, or leave where currently located).
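As a rough illustration of the compare-and-resend approach described above, here is a minimal Python sketch. The field names in TRACKED_FIELDS, the CSV columns other than home_account, and both function names are assumptions made for the example, not our actual schema, and deciding which instance receives the file is left out entirely.

```python
import csv
import io

# Hypothetical attribute set; a real comparison may cover more fields.
TRACKED_FIELDS = ("login_id", "full_name", "email", "location")


def changed_users(previous, current):
    """Return ids of users that are new or whose tracked attributes changed."""
    changed = []
    for user_id, attrs in current.items():
        old = previous.get(user_id)
        if old is None or any(old.get(f) != attrs.get(f) for f in TRACKED_FIELDS):
            changed.append(user_id)
    return changed


def build_users_csv(current, user_ids):
    """Build a users import payload with home_account TRUE on every row.

    The column layout is an assumption for illustration. "location" is only
    used for change detection here and is not written to the file.
    """
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(
        ["user_id", "login_id", "full_name", "email", "status", "home_account"]
    )
    for uid in user_ids:
        a = current[uid]
        writer.writerow(
            [uid, a["login_id"], a["full_name"], a["email"], "active", "TRUE"]
        )
    return out.getvalue()
```

Because home_account is always TRUE, the sketch never needs to know whether a location change actually crosses instances, which mirrors the point made in the parenthesis above.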
When dealing with the data for our Canvas implementation, we aggregate the data from all 12 instances (as there are references that point between instances, integrity can only be achieved by doing this). We add instance references to all the data so that we avoid duplicate keys. Not only do we need to deal with the difference in currency between tables, we also have to address the timing differences within the same table across 11 or 12 instances.
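To make the aggregation step concrete, here is a small sketch of the idea, assuming each table arrives as a list of row dicts per instance. The instance labels, the "id" field and the function names are hypothetical, and taking the oldest per-instance refresh time as a common cut-off is just one possible way to reason about the timing differences, not something CD2 provides.

```python
from datetime import datetime


def aggregate(rows_by_instance):
    """Merge one table from several instances under (instance, id) keys.

    The same local id can exist in more than one instance, so the instance
    label becomes part of the key to avoid duplicate keys.
    """
    merged = {}
    for instance, rows in rows_by_instance.items():
        for row in rows:
            merged[(instance, row["id"])] = {**row, "instance": instance}
    return merged


def common_watermark(last_refreshed_by_instance):
    """Return the oldest refresh time across instances.

    Assumption: anything newer than this may not yet be visible in every
    instance, so cross-instance references past it are not trusted.
    """
    return min(last_refreshed_by_instance.values())
```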
With the volume of user moves, and the number of different instances, we have a lot of potential for large numbers of enrolment and result entities to have no corresponding user - and for users to be "missing" any results. If the timing is wrong, sending a report to downstream systems that certain users no longer have current compliance training results, simply because the keys haven't been updated, is something to be avoided.
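One way to guard against this, sketched here in Python under assumed names, is to hold back any result whose user key is not yet present, and to flag (rather than act on) users who previously had results pushed downstream but now appear to have none. The "user_key" field, the key shape and both function names are hypothetical.

```python
def partition_results(results, known_user_keys):
    """Split results into those with a matching user and those held back.

    Held results are retried on a later run instead of being treated as
    deletions, since the user key may simply not have caught up yet.
    """
    ready, held = [], []
    for result in results:
        if result["user_key"] in known_user_keys:
            ready.append(result)
        else:
            held.append(result)
    return ready, held


def suspicious_removals(previously_sent_user_keys, ready):
    """Return users that had results sent before but have none in this batch.

    These are candidates for review, not confirmed removals.
    """
    still_present = {result["user_key"] for result in ready}
    return sorted(previously_sent_user_keys - still_present)
```

The design choice is that a missing reference delays an update rather than generating one, so lagging data is tolerated while apparent disappearances are not passed on.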