Celebrate Excellence in Education: Nominate Outstanding Educators by April 15!
Found this content helpful? Log in or sign up to leave a like!
Hi,
Does anyone know a way to find student IDs or user emails from CD2?
I have checked users and accounts tables in the canvas namespace - but they don't have "email" column. I only have the primary key of userID and accountID and only their full name.
Is there some other place that I should be looking?
Thanks
Z
I'd also love to find this. I keep a "local" table that connects UserID and StudentId/email ☹️
SIS ID for users is in pseudonyms table, and emails are in the communication_channels table.
https://data-access-platform-api.s3.eu-central-1.amazonaws.com/schema.json
Great! Thanks for that - legend!
Also 2 questions to you as I asked the same to Pete in the thread below:
1. I am not sure but looks like there is no need for communication_channels - atleast to get the Emails and SIS ID. The pseudonym table has the `Unique_ID` as Email and sis_user_id. Am I missing something here?
2. Are you aware of this link to get the datasets schema? Was curious why are you using the JSON link for the schema. I presume you are doing something differntly?
1) UniqueID is the login_id, it could be an email, username, or whatever the institution has configured.
2) Noise, I guess. The JSON page is fine for my purposes, Firefox makes it easily readable.
Keep in mind that a user can have more than one pseudonym and more than one communication channel. For instance:
SELECT
u.sortable_name,
p.user_id,
p.unique_id,
p.sis_user_id,
p.authentication_provider_id,
cc.path_type, -- e.g. 'email', 'SMS' etc).
cc.path, -- e.g person@example.com
FROM
canvas.users u
LEFT JOIN canvas.pseudonyms p ON p.user_id = u.id
LEFT JOIN canvas.communication_channels cc ON cc.pseudonym_id = p.id;
Great! Thanks for the query Pete. Yes I was planning to filter on workflow state too - I am doing the same for the courses table.
I am not sure but looks like there is no need for communication_channels - atleast to get the Emails and SIS ID
The pseudonym table has the `Unique_ID` as Email and sis_user_id.
Am I missing something here?
Below is the CTE that I'd be using to bring in Emails & SIS_ID to the users.
SELECT
USER_ID,
CREATED_AT,
UPDATED_AT,
WORKFLOW_STATE AS USERSTATUS,
SIS_BATCH_ID,
UNIQUE_ID AS EMAIL,
LOGIN_COUNT,
--LAST_REQUEST_AT,
LAST_LOGIN_AT,
--CURRENT_LOGIN_AT,
SIS_USER_ID
FROM
CANVAS.PSEUDONYMS
canvas.pseudonyms.unique_id represents login_id, not email per se. It sounds like your institution uses email address as the value for login_id. In that case, the unique_id values should match canvas.communication_channels.path for their email address. You probably do not need to bother with the communications_channels table.
Yep, if you're using something like School Data Sync/SSO then you unique_id (login_id) will be an email address. But like @ZaidGandhi mentioned, it could be a username, a student/staff id number, etc.
If users are able to choose their preferred email address (ie not locked down in Account settings) then email (cc.path) might still differ from p.unique_id. It can also be relevant for LTI configs that use default email.
It is actually much more complicated than that if you actually want to get the default email for a user who may have multiples, as the email in the communication channel is joined via the pseudonyms. The default pseudonym and communication channel is the one with the lowest position that is active - which may not be the lowest or one, as positions are nor necessarily changed when an email or pseudonym is removed. A simplified SQL server query (ignoring consortia considerations) example....
SELECT [USER].id,, pse.integration_id, [USER].workflow_state as userstatus,
[USER].[name], pse.sis_user_id, com.[path] as default_email,
FROM canvas.users [user]
inner join [canvas.pseudonyms] pse
on and pse.[user_id] = [user].id
and pse.workflow_state = 'active'
and pse.position = (select MIN(position) from [canvas.pseudonyms] pse2
where pse2.user_id = [user].id
and pse2.workflow_state = 'active')
left outer join [working_cd2_communication_channels] com
on com.[user_id] = [user].id
and com.pseudonym_id = pse.id
and com.path_type = 'email'
and com.workflow_state = 'active'
and com.position = (select MIN(position) from [working_cd2_communication_channels] com2
where com2.user_id = [user].id
and com2.pseudonym_id = pse.id
and com2.path_type = 'email'
and com2.workflow_state = 'active')
@KeithSmith_au @Pete5484 @stimme
Thanks guys, this is super helpful and quite crisp explanation - covers the use cases. Yes I can confirm that my org uses emails as logins, hence I used pseudonym table.
Correctly pointed as it was a one to many join on users - since there were not many additional emails (for users - duplicates), I removed them manually for now, but your queries can come in handy when my dataset would increase eventually!
Just to mention - I am doing this to find non-engaged students from canvas_logs.web_logs.
Have you guys done any analytics work regarding student engagement with DAP?
Happy to have a chat - if possible.
Cheers,
Z
We typically use last_activity_at and total_activity_time off of the enrollments table rather web_logs for student engagement questions. Instructure usually disclaims that web_logs is not guaranteed to be 100% accurate, shouldn't be used for individual-level auditing or academic integrity questions, etc. If you do use the enrollments data, a few things to think about:
1) You'll be dealing with course/section-level data rather than an absolute "last time this user shows up in web_logs" so you'll have to think about how to aggregate across courses. E.g., a student may have a last_activity_at of weeks ago in one course, but activity yesterday in another course. Does that mean they are non-engaged?
2) One way of trying to answer that is by averaging those measures across a course. Then you can see, for example, well every student in this class has a last_activity_at of several weeks ago so maybe the course just doesn't require much Canvas use. (Quite possible at our institution, may of course be different at yours.)
3) Combining last_activity_at and total_activity_time with other data points like missing assignments or submissions can give a fuller sense of (non-)engagement as well. But, the same caveat applies that you probably want to normalize those values across a course rather than just looking at absolutes. It's not out of the ordinary for one of our instructors, for example, to be using Canvas in a way where every student has multiple missing submissions but that is actually fine for their engagement and grade.
@jwals @ZaidGandhi I'm also using the enrollments table for similar purposes.
I'm trialling an analytics table loosely based on scd; so each day I'm grabbing changes in enrolments total_activity_time (ie incremental activity in a course by a user) and storing it in a way that I can aggregate/analyse at a more granular level.
Great points from @jwals about interpreting the enrolment table.
Not even sure if it's useful or works as intended, yet, but basically I execute the following insert each day in databricks immediately after the cd2 incremental update completes. Because the table could get huge, there's a monthly aggregation script with a clean up of the daily data (could do more or less frequently, or partition eg USING DELTA PARTITIONED BY (year_month)). It's still an experiment at the moment.
CREATE TABLE IF NOT EXISTS canvas_analytics.user_activity_daily (etc...)
INSERT INTO canvas_analytics.user_activity_daily (
timestamp_column,
year_day,
year_week,
year_month,
user_id,
enrollment_id,
course_id,
course_section_id,
enrollment_type,
role_id,
total_activity_time,
incremental_activity_time,
workflow_state,
e_created_at,
e_updated_at,
e_start_at,
e_end_at,
e_completed_at
)
SELECT
current_timestamp() AS timestamp_column,
CONCAT(
EXTRACT(YEAR FROM current_date()),
'-',
LPAD(EXTRACT(DOY FROM current_date()), 3, '0')
) AS year_day,
CONCAT(
EXTRACT(YEAR FROM current_date()),
'-',
LPAD(EXTRACT(WEEK FROM current_date()), 2, '0')
) AS year_week,
CONCAT(
EXTRACT(YEAR FROM current_date()),
'-',
LPAD(EXTRACT(MONTH FROM current_date()), 2, '0')
) AS year_month,
e.user_id,
e.id AS enrollment_id,
e.course_id,
e.course_section_id,
e.type AS enrollment_type,
e.role_id,
COALESCE(e.total_activity_time, 0) AS total_activity_time,
CASE
WHEN ua.max_total_activity_time IS NULL THEN COALESCE(e.total_activity_time, 0)
ELSE COALESCE(e.total_activity_time, 0) - ua.max_total_activity_time
END AS incremental_activity_time,
e.workflow_state,
e.created_at AS e_created_at,
e.updated_at AS e_updated_at,
e.start_at AS e_start_at,
e.end_at AS e_end_at,
e.completed_at AS e_completed_at
FROM canvas.enrollments e
--JOIN canvas.pseudonyms p ON e.user_id = p.user_id
LEFT JOIN (
SELECT enrollment_id, MAX(total_activity_time) AS max_total_activity_time
FROM canvas_analytics.user_activity_daily
GROUP BY enrollment_id
) ua ON e.id = ua.enrollment_id
WHERE e.workflow_state = 'active'
AND (ua.max_total_activity_time IS NULL OR COALESCE(e.total_activity_time, 0) > ua.max_total_activity_time)
AND e.type NOT IN ('StudentViewEnrollment', 'DesignerEnrollment', 'ObserverEnrollment')
To participate in the Instructure Community, you need to sign up or log in:
Sign In