Canvas Data 2 is coming 😍

The content in this blog is over six months old, and the comments are closed. For the most recent product updates and discussions, you're encouraged to explore newer posts from Instructure's Product Managers.

Edina_Tipter
Instructure Alumni


We are very excited to share that Canvas Data 2 has evolved from infancy into adolescence. We started in June last year with input and feedback from 13 Alpha customers in two regions and have grown to over 100 Beta customers around the world. The maturity will continue to accelerate as Canvas Data 2 is made publicly available in March. At full maturity, Canvas Data 2 will become a modern data access platform that gives developers efficient and flexible bulk access to data from various Instructure products, with high fidelity and low latency.

Progress

 

During the Alpha and Beta phases, we have focused on the following:

  • Listening to and incorporating customer feedback (the wish list is not exhaustive, so further changes and improvements are ongoing). We have great experts and collaborators in the Beta group who help us understand customers’ perspective, pain points, and the data journey “over the hedge” to build the right tool in the right way.
  • Platform improvement, stability, and scalability. When we started the Alpha phase we only had a walking skeleton and a goal of achieving an MVP (minimum viable product) by the Beta phase. We at Instructure see this first platform piece as a cornerstone that we can build on in the future, so it’s important that we make it robust and reliable. Because robustness and scalability are paramount, most of our energy goes into improving, stabilising, and scaling all components of CD2 by the General Availability (GA) date.
  • Performance optimisation. As our customer base has grown and data volumes increase, we have learned where we need to optimise.
  • Testing. We've built a testing framework that includes an expanding unit testing suite and daily end-to-end testing and monitoring according to industry-standard practices. This framework will aid us in monitoring the pipeline's functionality and ensure that we won't disrupt working features when we release updates.
  • Sustainability, monitoring and alerting: Software issues occur from time to time. When they do, we want to make sure you feel confident that we’re addressing them. That is why we have focused on having more than just basic monitoring and alerting for important KPIs. 
  • Documentation: During the past few months, we have worked to expand the documentation with examples, complementary documentation, videos, and much more, to give a quick start both to institutions that will be using CD2 as their first big data solution with Instructure and to institutions migrating from CD1. As such, you might want to view our OpenAPI specification and the links referenced inside. All documentation, including videos, will be shared in the Community space once we release CD2 to production.
  • Integrations: Between the Alpha and Beta phases, CD2 integrated with the Identity Management Service and API Gateway as part of our platform vision for a more streamlined customer/developer experience. More specifically, all the API calls now go through an API gateway service while the authentication/authorisation is managed through our brand new identity management service. Integrating with the Identity Management Service and API Gateway opens up new possibilities for a unified API look and feel. In the short term, it also provides improved API key management and secure API key sharing between an institution and its partners or providers.

Release Timeline and CD1 Sunset Plans

 

Canvas Data 2 will be released no later than the end of March. In terms of datasets, this initial release will contain this set of Canvas data tables. We are aware that many customers leverage Apache weblogs (requests tables) and Catalog data. Those datasets will be added in the second quarter of the year. Nevertheless, we encourage all users to start planning their transition as soon as Canvas Data 2 becomes available. Because the CD1 and CD2 pipelines are not compatible and the schema has also changed, consuming table deltas requires changes to your ETL. For more details please see the CD1-CD2 comparison below.
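
To make the ETL impact more concrete: table deltas in CD2 include deleted records, so a consumer has to handle deletes as well as inserts and updates. The following is only a minimal sketch of that idea. The record shape (`id`, `deleted`, `value`) and the function name `apply_delta` are assumptions made for illustration; they are not the actual CD2 output format or part of any Instructure tooling.

```python
from typing import Dict, Iterable


def apply_delta(table: Dict[int, dict], delta: Iterable[dict]) -> Dict[int, dict]:
    """Apply change records to an in-memory table keyed by primary key.

    Assumed (hypothetical) record shape:
        {"id": 1, "deleted": False, "value": {...}}
    """
    for record in delta:
        key = record["id"]
        if record.get("deleted"):
            # Remove the row; ignore deletes for rows we never saw.
            table.pop(key, None)
        else:
            # Insert a new row or overwrite the existing one (upsert).
            table[key] = record["value"]
    return table


# Local state after an initial snapshot (illustrative data only).
snapshot = {1: {"name": "Essay"}, 2: {"name": "Quiz"}}

# An incremental result containing a delete, an insert, and an update.
delta = [
    {"id": 2, "deleted": True},
    {"id": 3, "deleted": False, "value": {"name": "Project"}},
    {"id": 1, "deleted": False, "value": {"name": "Essay v2"}},
]

current = apply_delta(snapshot, delta)
```

In a real pipeline the same upsert-or-delete logic would typically run against a database table rather than a dictionary.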

To assist customers with the transition, we are planning to provide a reference solution for downloading and importing data into a database. Furthermore, a data mapping sheet is being prepared to explain the CD1 to CD2 schema differences if you need to remap existing reports and dashboards. Both of these are a work in progress and we are hoping to release them by the end of March.

Given the release of our new data pipeline, the target date for sunsetting CD1 is the end of 2023. By this date we are expecting all CD1 customers to have transitioned to Canvas Data 2 to benefit from the new feature set and fresher data.

* Customers may opt to use Instructure Professional Services to perform data warehousing services for a cost. Those customers who have purchased the Hosted Data Services will have their data warehouse transitioned automatically.  Additional migration support for queries, integrations, or consulting can be purchased for an additional fee.

Onboarding

 

Customer onboarding requires loading an institution’s data into the data lake so that it can be consumed via the CD2 API or CLI. Onboarding for CD2 will happen in a phased manner:

  • On the GA date we will start onboarding customers actively using CD1. Those users will be notified by their CSM as soon as their institution has been added. From that time onwards they will be able to query their institution’s data.
  • For those who haven’t leveraged CD1 but want to work with CD2, we will define a separate workflow for how to request access. This is a work in progress on our end and I will share the process in my next blog post.

What is CD2 (for those who haven’t already heard...)

 

The Canvas Data 2 offering is a service that enables institutions to download their raw data across various Instructure products. It is a revamp and expansion of our “Canvas Data” offering. The purpose of this offering is to allow institutions’ IT & data teams to retrieve LMS data in bulk and keep it up to date (≤ 4 hours data freshness). Data can be used to conduct research and build custom reports, dashboards and tools to meet the unique needs of the institution. It allows access to high-fidelity source data and is more granular than the existing Canvas Data 1 star schema. It is also worth noting that Canvas Data 2 as a product doesn’t provide users with custom data request tooling. In other words, there is no reporting engine on top of the data to produce custom data extracts. Canvas Data 2 has a defined relational schema as opposed to the Canvas Data 1 star schema which dictates what is available in each file.

API Usage Workflow

  1. Create API key via the Identity service to access the CD2 API
  2. Request JWT access token to authenticate and get access to your root account’s data
  3. Trigger your first snapshot
  4. Chain it to your next incremental query 
  5. Continue chaining incremental queries to get the latest changes
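
Steps 3 to 5 above can be sketched as follows. This is an illustration of the chaining idea only, under stated assumptions: `run_query` stands in for an authenticated call to the CD2 API (steps 1 and 2 are not shown), and `QueryResult`, its `at` timestamp, the `since` parameter and `TableSync` are hypothetical names chosen for this sketch rather than the real API surface. The OpenAPI specification is the place to find the actual request and response shapes.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class QueryResult:
    """Hypothetical result: the rows plus the time the data is valid up to."""
    rows: List[dict]
    at: str  # becomes the starting point of the next incremental query


# Hypothetical signature: (table, since) -> QueryResult.
# since=None asks for a full snapshot; otherwise only changes after `since`.
RunQuery = Callable[[str, Optional[str]], QueryResult]


class TableSync:
    """Remembers the last timestamp per table so each query chains to the previous one."""

    def __init__(self, run_query: RunQuery) -> None:
        self._run_query = run_query
        self._last_at: Dict[str, str] = {}

    def sync(self, table: str) -> QueryResult:
        since = self._last_at.get(table)  # None on the first call: snapshot
        result = self._run_query(table, since)
        self._last_at[table] = result.at  # the next call continues from here
        return result


# Stand-in backend that records what it was asked for (no network access).
calls: List[Optional[str]] = []
_timestamps = iter(["2023-03-01T00:00:00Z", "2023-03-01T04:00:00Z"])


def fake_run_query(table: str, since: Optional[str]) -> QueryResult:
    calls.append(since)
    return QueryResult(rows=[], at=next(_timestamps))


sync = TableSync(fake_run_query)
first = sync.sync("assignments")   # snapshot
second = sync.sync("assignments")  # incremental, chained to the snapshot
```

A real job would persist the last timestamp between runs (for example in a small state table) so that the chain survives restarts.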


 

High Level Comparison between CD1 and CD2

| Features | Canvas Data | Canvas Data 2 |
|---|---|---|
| Latency (data freshness) | 24 – 48 hours | ≤ 4 hours |
| Table snapshot | | |
| Table deltas (incremental query), includes deleted records | X | |
| CLI | | |
| API | | |
| UI downloads | | X |
| Schema | Star schema | Relational schema |
| Schema versioning | | |
| Available in all regions | | |
| Canvas LMS data | 65 dimensions | 90 unique datasets |
| Multiple file format | tsv flat files | json ✓, csv ✓, tsv ✓, parquet ✓ |

Features/data not included in the initial release (GA) but which we are considering for future releases in 2023

| Feature/data | Status |
|---|---|
| Weblogs aka requests | Target Q2 |
| Catalog data | Target Q2 |
| New quizzes | TBC |
| Mobile data | TBC |
| Pageviews | TBC |

 

Let our platform journey begin.


 

 


12 Comments
KathyPalm
Community Participant

Exciting news! Looking forward to Canvas Data 2 😃

Maeve_McCooey
Community Coach

Good news that the first GA is on the horizon. However, it is disappointing to see the TBC status for mobile data and pageviews. Could you give some indication of when this data will be included? This would be extremely useful when planning transition timelines.

adam_c_voyton
Community Participant

This is exciting, glad to see the Canvas Data 2 product will become available by the end of March 2023. The four hour latency is a game changer! 

leturno
Community Participant

Just as I was about to test the waters again on taking on a Canvas Data project you go and give me Canvas Data 2 releasing in March.  Aren't you so nice! 

Thank you 

Scott

Jeff_F
Community Coach

@Edina_Tipter  

Re: Those customers who have purchased the Hosted Data Services will have their data warehouse transitioned automatically.  

Hello - we are a hosted customer and I am being asked for additional details on the timing aspect of this transition. In the FAQ there is a response to a similar question that says 3 months of overlap. When does that start (and end)?

https://community.canvaslms.com/t5/Data-Access-Platform-Canvas-Data/Canvas-Data-2-Frequently-Asked-Q... 

Also, details on which system is to be used is needed. For example, will this remain on AWS RedShift? 

Edina_Tipter
Instructure Alumni
Author

@Jeff_F 
As the overlap will vary, the Hosted Data Services team will reach out to each of their customers and coordinate the timing. Thus there is no generic start and end date we can communicate at this time.

jason_hill
Community Contributor

All the Canvas Admins rn.

This high-five is for you, @Edina_Tipter, from all us Canvas admins.

ChadMcGuire
Instructure

@Jeff_F  

If you reach out to your CSM they can put you in touch with the Custom Development team (which manages the Hosted Redshift offering) and they can help with some of the clarification of the plan for hosted accounts and also perhaps let you in on some of their Beta testing as well.

IsaacOdeh
Community Member

Hi all,

Not sure if anyone has tried this. I am trying to build an ETL pipeline using Azure Data Factory. This will interact with the Canvas Data API to extract all the assignment entities and dump the tables in an Azure Data Lake Gen 2 container.

I have tried checking the API calls using Postman and everything seems to work. I was able to generate an access token using my institution API key. However, I don't seem to be able to get it working in Azure Data Factory when building the pipeline.

Has anyone in this community tried? Has anyone tried extracting canvas data via api using ADF? Please, share your experience

Chrisleej
Community Explorer

@IsaacOdeh we have done something similar with CD1 using Azure Synapse. Our solution involves running the canvas data sync tool on a VM, sending the CSV files from that VM to ADLS Gen 2 using azcopy, then processing the CSV files into parquet files/lake tables using a PySpark notebook (via scheduled synapse pipeline). I would be happy to share more detail about what we have done if it would be helpful! Once we get access to CD2 things will change for us, but we needed a solution sooner rather than later and it was fairly simple to implement.

IsaacOdeh
Community Member

@Chrisleej Thanks for your timely reply. Yes, I would be happy if you can share more detail about what you have done and how that was achieved. Your approach might be the solution to my problems.

 

Thanks

Aasmund
Community Member

@Chrisleej Absolutely! We run Canvas Data on a VM, and would like to know more about "sending the CSV files from that VM to ADLS Gen 2 using azcopy".