@cassidy_vela
My advice is to move past thinking that it needs to come through the API. That's not happening in the normal sense of an API call.
The API is not the only place Canvas shares data. Canvas makes data available through the REST API, GraphQL, Canvas Data 2, Live Events, and its reports. Not all information is available from all sources (GraphQL has some things that the REST API doesn't, and vice versa). Canvas Data 2 is the most complete of those sources, but some objects available elsewhere are not available through Canvas Data 2.
There is some information that is not available through any of those sources.
When it comes to SIS information, your SIS should be the definitive guide.
Technically speaking, you can obtain the provisioning report through the API. It's a batch job: you request the report, check periodically until it's done, and then download it. It works this way because reports are large and take a lot of resources to generate, so they aren't suited to REST API requests that return results immediately.
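A minimal polling sketch, assuming the Account Reports API flow: start the report with a POST to `/api/v1/accounts/<account_id>/reports/provisioning_csv`, then repeatedly GET `/api/v1/accounts/<account_id>/reports/provisioning_csv/<report_id>` until it finishes. The `get_status` callable is a hypothetical stand-in for that GET request, and the status names treated as terminal and the `attachment.url` / `file_url` fields used for the download link are assumptions to check against the Canvas API documentation.

```python
import time
from typing import Any, Callable, Dict

# Status values treated as finished or failed. These are assumptions based on
# the Account Reports API; confirm them against your Canvas instance.
DONE_STATES = {"complete"}
FAILED_STATES = {"error", "aborted", "deleted"}


def wait_for_report(
    get_status: Callable[[], Dict[str, Any]],
    poll_interval: float = 10.0,
    timeout: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll a report's status until it finishes; return its download URL."""
    waited = 0.0
    while waited <= timeout:
        report = get_status()  # hypothetical: one GET to the report status endpoint
        status = report.get("status")
        if status in DONE_STATES:
            # Assumed: the finished report links to its CSV via the attachment URL.
            attachment = report.get("attachment") or {}
            url = attachment.get("url") or report.get("file_url")
            if not url:
                raise RuntimeError("report finished but has no download URL")
            return url
        if status in FAILED_STATES:
            raise RuntimeError(f"report failed with status {status!r}")
        # Still created/running: wait before asking again.
        sleep(poll_interval)
        waited += poll_interval
    raise TimeoutError("report did not finish before the timeout")
```

Injecting `get_status` and `sleep` keeps the loop independent of any particular HTTP library and makes it easy to test without a live Canvas instance.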
Still, it is best to get it through your SIS as it's the one that sent the information to Canvas in the first place.
You can also use Canvas Data 2 to get a list. This query is MySQL-flavored; for everyone who has multiple pseudonyms, it gives their SIS user IDs as a comma-separated list.
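```sql
-- Users with more than one pseudonym, with their SIS user IDs collected
-- into a comma-separated string. n counts every pseudonym, including ones
-- without an SIS ID; GROUP_CONCAT skips NULL values.
SELECT
    user_id,
    GROUP_CONCAT(sis_user_id) AS sis_user_ids,
    COUNT(1) AS n
FROM canvas__pseudonyms
GROUP BY user_id
HAVING n > 1;
```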
Running that query took 0.031 seconds and gave me 65 users with multiple pseudonyms, but it doesn't show the status (workflow_state) of each sis_user_id.
This next query took 0.032 seconds and returned 142 records with SIS user IDs, all belonging to users for whom at least one SIS user ID record has been deleted.
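```sql
-- Users who have at least one deleted pseudonym carrying an SIS user ID.
WITH cte1 AS (
    SELECT user_id
    FROM canvas__pseudonyms
    WHERE workflow_state = 'deleted'
      AND sis_user_id IS NOT NULL
    GROUP BY user_id
)
-- Every pseudonym with an SIS user ID for those users, active or deleted,
-- so the workflow_state of each SIS ID is visible.
SELECT
    user_id,
    sis_user_id,
    workflow_state
FROM cte1
JOIN canvas__pseudonyms USING (user_id)
WHERE sis_user_id IS NOT NULL;
```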
That can duplicate the relevant functionality of the provisioning report without having to run the provisioning report. But there is a huge overhead in setting up Canvas Data 2, and the information is not live: even right after you run the database update, the data may be up to 4 hours old. That's better than Canvas Data [1], where data could have been up to 36 hours old when you fetched it.
Canvas Data 2 may not be a complete solution, either. This spring, I lost the database that contained all my integration data between our SIS and Canvas. It wasn't the SIS server; it was my local database that acted as a go-between. I tried to recreate it from information in Canvas Data 2. When the information was there, it was great. But we require students to complete an online orientation course before they can get into their real courses, and although almost 400 students had completed it, Canvas Data 2 had no record of those completions. I was also trying to sync people up by email, and Canvas Data 2 didn't have pseudonym records for all users. Notably, there was no pseudonym record for me; my email wasn't in Canvas Data 2, and I've been using Canvas since 2012. Anecdotally, the missing information tended to be older, but I don't fully trust Canvas Data 2 to recover information.
Some of the information I had to get through the REST API, and some of it I had to infer from other data that I had.
The reports (such as the provisioning report) had the data, so it's still in the system, but they didn't have the timestamps that were available in Canvas Data 2. Again, not all information is available in all places, if at all.
The things I didn't have to guess about were which courses we had, what the user logins were, and what the SIS IDs were, because those came from our SIS. The SIS is the authoritative source for SIS data, and you should use it instead of trying to get that information back out of Canvas.
The individual API requests are not going to give you what you want. Only one sis_user_id is exposed for a user account at a time. The old ones are soft-deleted, meaning they're still in the system but have a workflow_state of 'deleted'. They aren't returned in API responses because they are deleted rather than active. Getting deleted information out of Canvas through the API or GraphQL is difficult, if not impossible.
The include_deleted_users parameter for the list users endpoint doesn't return just the users who have deleted pseudonyms. Also, unless you are a small institution with few users, that API call is going to be very time-consuming. You can only get 100 users at a time (per_page=100), it uses numbered pagination (page=2, page=3), and the Link header doesn't include a "last" link, so you can't tell how many pages there are to fetch. That essentially rules out parallel requests to speed things up.
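Since there is no "last" link, the pages have to be walked one at a time. A minimal sketch, assuming the endpoint returns a standard HTTP Link header with a rel="next" entry on every page except the final one (the first URL might look something like `/api/v1/accounts/<account_id>/users?include_deleted_users=true&per_page=100`, which is an assumption). The `fetch` callable, which should return a page's JSON list plus its Link header, is a hypothetical stand-in for your HTTP client.

```python
import re
from typing import Any, Callable, Dict, List, Optional, Tuple

# Matches one entry of an HTTP Link header: <url>; rel="name"
_LINK_RE = re.compile(r'<([^>]+)>\s*;\s*rel="([^"]+)"')


def parse_link_header(header: Optional[str]) -> Dict[str, str]:
    """Map each rel name (next, current, first, ...) to its URL."""
    if not header:
        return {}
    return {rel: url for url, rel in _LINK_RE.findall(header)}


def fetch_all_pages(
    fetch: Callable[[str], Tuple[List[Any], Optional[str]]],
    first_url: str,
) -> List[Any]:
    """Follow rel="next" links one page at a time and collect every item.

    Without a rel="last" link the total page count is unknown, so each
    request has to wait for the previous page's Link header.
    """
    items: List[Any] = []
    url: Optional[str] = first_url
    while url:
        page, link_header = fetch(url)  # hypothetical HTTP call
        items.extend(page)
        url = parse_link_header(link_header).get("next")
    return items
```

Keeping the HTTP call out of the loop means the same pagination logic works with any client, and it makes the sequential dependency between pages explicit.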
If I download the entire list of users without the include_deleted_users flag, using per_page=50, I get 31,770 users for my institution. That took 846.7 seconds (14.1 minutes). With include_deleted_users, I got 31,965 users. That run was faster because the results from the first download were still cached.
That makes it unsuitable for any real-time reporting. You're going to have to download the information and store it somewhere locally. That's part of the reason I keep steering you toward your SIS: it already has this information and can provide it much faster than Canvas can.
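As a minimal sketch of that local store, assuming Python's built-in `sqlite3` module and user dictionaries shaped like the list users response (`id`, `sis_user_id`, `login_id`), the following upserts each download into a table. The table name `canvas_users`, the function name, and the `fetched_at` column are hypothetical; the timestamp records how stale each row is.

```python
import sqlite3
from datetime import datetime, timezone
from typing import Any, Dict, Iterable


def store_users(conn: sqlite3.Connection, users: Iterable[Dict[str, Any]]) -> int:
    """Upsert downloaded user records into a local table; return rows written."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS canvas_users (
               id INTEGER PRIMARY KEY,
               sis_user_id TEXT,
               login_id TEXT,
               fetched_at TEXT NOT NULL
           )"""
    )
    # One timestamp per download so you can tell how old the data is.
    fetched_at = datetime.now(timezone.utc).isoformat()
    rows = [
        (u["id"], u.get("sis_user_id"), u.get("login_id"), fetched_at)
        for u in users
    ]
    # INSERT OR REPLACE overwrites an existing row with the same Canvas id.
    conn.executemany(
        "INSERT OR REPLACE INTO canvas_users (id, sis_user_id, login_id, fetched_at) "
        "VALUES (?, ?, ?, ?)",
        rows,
    )
    conn.commit()
    return len(rows)
```

Reporting can then query the local table instead of waiting on a fresh 14-minute download.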
But wait, it gets better -- and reinforces my point.
I ran the report and found one of my users who had an old SIS ID that had been merged into a new one. The list of accounts did not give me two entries for that user, because there wasn't a deleted pseudonym (login ID). It was the same login, but with two SIS IDs, and Canvas gave me only the active one. Even the report you asked about doesn't give you the second SIS ID.
The user object doesn't include a workflow_state either. That is, even when you include the users with deleted pseudonyms, nothing in the record indicates they're deleted. The only difference I could find is that the users with deleted pseudonyms didn't have a login_id in the record.
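Because nothing marks those records as deleted, one workaround is to compare the two downloads. A minimal sketch, assuming both downloads are lists of user dictionaries with `id` and `login_id` keys; the function name is hypothetical, and the missing-`login_id` check is only the heuristic observed above, not a documented flag.

```python
from typing import Any, Dict, Iterable, List, Set


def flag_deleted_pseudonym_users(
    active_users: Iterable[Dict[str, Any]],
    all_users: Iterable[Dict[str, Any]],
) -> List[Dict[str, Any]]:
    """Flag records from the include_deleted_users download that look deleted.

    Heuristic: a record is flagged if its id is absent from the regular
    download, or if it has no login_id.
    """
    active_ids: Set[int] = {u["id"] for u in active_users}
    return [
        u for u in all_users
        if u["id"] not in active_ids or not u.get("login_id")
    ]
```

With the numbers above, this kind of comparison would focus attention on the roughly 195 records that appear only when include_deleted_users is set.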
The problem is that you don't really have deleted pseudonyms, at least not as you've described them; you have merged records. Either way, the include_deleted_users parameter isn't going to help you.
In short, I know of no way to get the second SIS ID included in the API responses. You will need a non-Canvas database that links those duplicated SIS IDs, and the most logical source for it is the SIS you're using. You will then have to add extra code to handle it.
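That extra code could look something like the following minimal sketch. It assumes your SIS can export a table mapping each retired SIS ID to the ID it was merged into; `merged_into`, `resolve_current_id`, and `aliases_by_current_id` are hypothetical names. The lookup follows chains of merges (A merged into B, then B into C) and raises an error instead of looping forever on a cyclic table.

```python
from typing import Dict, List


def resolve_current_id(sis_id: str, merged_into: Dict[str, str]) -> str:
    """Follow old -> new links until reaching an ID that was never merged."""
    seen = {sis_id}
    while sis_id in merged_into:
        sis_id = merged_into[sis_id]
        if sis_id in seen:
            raise ValueError(f"cycle in merge table at {sis_id!r}")
        seen.add(sis_id)
    return sis_id


def aliases_by_current_id(merged_into: Dict[str, str]) -> Dict[str, List[str]]:
    """Group every old SIS ID under the current ID it now resolves to."""
    groups: Dict[str, List[str]] = {}
    for old_id in merged_into:
        current = resolve_current_id(old_id, merged_into)
        groups.setdefault(current, []).append(old_id)
    return groups
```

For example, if the SIS says S100 was merged into S200 and S200 into S300, then `resolve_current_id("S100", ...)` returns `"S300"`, which is the ID Canvas should expose.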
A longer-term solution is to fix your broken system so that it doesn't automatically create new SIS IDs for people. SIS IDs are meant to be unique.