Hi, sorry if this has already been discussed somewhere. We are looking to build a data pipeline that continuously pulls data (daily) from the Canvas Data 2 API. The Data Access Platform Python client looks promising for accomplishing this. However, I cannot find any documentation about how to actually use "instructure-dap-client" in Python code, such as a readme page covering the modules, classes, and functions. The only related document I can find right now is: https://pypi.org/project/instructure-dap-client/
Has anyone spotted any more documentation than the above? Or, if there is a better way to build a data pipeline using CD2, I'm open to suggestions! Thanks very much in advance!
@heyuan Reach out to me at dpassey@everettsd.org, as I may have the more Pythonic solution you are looking for.
Dave
It would be great to share this with the whole group if you can!
--Colin
Thanks so much Dave! I have sent you an email.
@heyuan --
The documentation on PyPI is what is available right now; as you note, it isn't developer-focused. But Instructure has said that they plan to add an API to the DAP client library so it would be easier for developers to integrate into a custom workflow, and that they plan to make the code available on GitHub at some point.
I haven't seen a timeframe for these changes, but maybe @Edina_Tipter or @LeventeHunyadi could chime in.
--Colin
Hi @heyuan and @ColinMurtaugh ,
We are actively working on adding the API to the client library, which will be released in June.
As for making the code available, that is still the plan; we just could not prioritize it so far due to other competing tasks.
Edina
Hi, thanks so much for the reply!
Just wondering, has there been any update on this so far?
@heyuan Not exactly the progress we wanted by now, but we have taken some steps in this direction. With the 0.3.9 release, the Client Library adopts an extensible plugin architecture: integrations with various database engines become plugins. This opens up the opportunity to contribute integrations for other database engines in the future, e.g. Oracle, MSSQL, MySQL, or SQLite support in addition to the PostgreSQL support that exists today. With the 0.3.9 release, PostgreSQL support has been rewritten as a plugin but remains bundled with the Client Library package. While the Python class interfaces may be subject to change, early adopters are welcome to explore the solution and leave feedback as we solidify the plugin framework.
(Just to say, this has been shared in the Canvas API and CLI change log, too: https://community.canvaslms.com/t5/Canvas-Change-Log/2023-API-and-CLI-Change-Log/ta-p/549097)
Hello hello!
Just want to circle back on this again and see if the team has made some promising progress!
I'm also curious whether Instructure would consider pushing back the sunsetting of Canvas Data 1 to Q1 2024 instead of the end of this year. I imagine that, with the delay of developer-friendly documentation, some teams might also be holding off on starting their development work. We certainly are. I understand it is no easy task, but we would appreciate some clarity around this. Thanks!
I agree that it would be ideal if Instructure could push the sunsetting of CD1 to Q1 2024 at the earliest. In my view, developers need at least 6 months of full availability (including the documentation resources that @heyuan refers to) to migrate not only data resources but also any applications that may depend on CD1.
Where can I contribute and see the latest development of the DAP Python library? Is there a public git repository (e.g., GitHub.com, GitLab.com, etc.)? I am not able to find a public repo; I have searched the DAP client library documentation, the PyPI package page, and GitHub.com. I notice that there is a contributing page in the DAP client library documentation, but no explanation of how to actually contribute.
I am considering creating a MySQL plugin for the DAP client. I would be more confident in creating this plugin if I had community support. I'd rather not build this in a silo only to have the API change and require a lot of rework.
Have you checked out the latest release of the DAP client library? It comes with MySQL support out of the box.
Due to shifting priorities, we stopped investing in the DAP client library before a solid plugin interface could be established, and we chose to delay open-sourcing the client library on an open platform like GitHub or GitLab. Today, adding a new plugin (e.g. MSSQL or Oracle) is rather cumbersome and involves substantial code duplication, leading to limited maintainability. New plugins contributed at this stage would likely break in future versions of the DAP client library.
Where is the MySQL support documented? The release notes don't seem to mention it, but I may be just missing something.
Thanks
The Common use cases section covers database connection string syntax for both PostgreSQL and MySQL. The dialects the library supports are postgresql and mysql.
Examples:
mysql://scott:password@localhost:3306/testdb
mysql://scott:password@localhost/testdb
Apart from the different connection string, commands such as initdb and syncdb have the same syntax.
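If you are driving the library from Python rather than the CLI, the rough equivalent looks like the sketch below. The class names (DAPClient, DatabaseConnection, SQLReplicator) follow my reading of the API reference at the time of writing, and the interfaces may change between releases, so verify them against the current documentation:

```python
import asyncio
import os

import dap.plugins
from dap.api import DAPClient
from dap.integration.database import DatabaseConnection  # module path per the current API reference
from dap.replicator.sql import SQLReplicator

# Register the bundled dialect plugins (postgresql, mysql) before use.
dap.plugins.load()

async def main() -> None:
    # Only the connection string differs between dialects; the code path is the same,
    # e.g. mysql://scott:password@localhost:3306/testdb
    db_connection = DatabaseConnection(os.environ["DAP_CONNECTION_STRING"])

    # DAPClient reads DAP_API_URL, DAP_CLIENT_ID and DAP_CLIENT_SECRET from the environment.
    async with DAPClient() as session:
        replicator = SQLReplicator(session, db_connection)
        await replicator.initialize("canvas", "accounts")   # equivalent of initdb
        await replicator.synchronize("canvas", "accounts")  # equivalent of syncdb

asyncio.run(main())
```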
What would you recommend as the best way to be notified about new features and releases of the DAP client library? Luckily I posted here and you told me about MySQL support; otherwise, I might have written a MySQL plugin myself.
The MySQL support was a pleasant surprise because there was no public time frame. Do you have any time frame for out-of-the-box Oracle support?
What do "shifting priorities," "cumbersome," etc. imply in terms of months, quarters, years?
Thanks!
@RolfSchenk We are aware that customers load data into various DB dialects, and that the two we support will not help everybody. We see these as reference implementations, but we cannot commit to building out-of-the-box solutions for more. Further DB support is not on our roadmap for this year. Thank you for your understanding.
I also agree that it would be ideal if Instructure could push the sunsetting of CD1 to Q1 24 at the earliest.
Hi All,
In response to your request, here is the first version of the improved documentation: https://data-access-platform-api.s3.eu-central-1.amazonaws.com/client/README.html
We plan to iterate on it continuously, based on your feedback.
Please take a look to see if this is what you were after, and let us know what else would be worth including or clarifying to make the "instructure-dap-client" Python code easier to work with.
As a next step, we plan to add a separate section about how to write integrations with the Plugin API, similar to what we have today for Postgres, and to enable you to open issues/bugs or start discussions on GitHub.
Thank you for your patience so far and I am looking forward to your opinions.
Great work! Thanks for addressing our requests. So I assume the part of my original post about a "readme page about the modules, classes and functions" will be covered in the later integrations part of the documentation?
@heyuan The documentation we shared was the first attempt to improve code understanding. As a next step we will add more in-line comments to the code and separate readme files under the main modules: actions, commands, plugins, replicator. In the meantime you might want to check the API reference section of the documentation page I shared above.
I've just gotten around to updating to version 0.3.10 of the dap library and started to check out the new documentation.
As a very early first impression, I'm completely annoyed that the code examples do not run as written. The docs say "you must wrap the examples below into an async function", which is incredibly unhelpful. Async programming is relatively new in Python, and many people (including me, who has been using Python for at least a decade) will be unfamiliar with it. If it is necessary to "wrap the examples below into an async function", why on earth would you not just do that?!
Ditto re: the requirement that code must call dap.plugins.load() -- the example code absolutely will not run without adding this.
The missing call to dap.plugins.load() is an oversight on our end; future versions of the client library should call this function automatically when you import one of the packages that deal with initdb/syncdb functionality. We should place it in dap.replicator.sql to make sure PostgreSQL support is always initialized before use. Experimental or third-party database integrations would still require an explicit call to the plugin registration function.
async/await has been around since Python version 3.5, released almost 8 years ago. However, I totally agree that not everyone is familiar with it, and we might want to link to a step-by-step tutorial. We don't wrap the examples individually into async functions because the examples would have more clutter, and you still couldn't run them as-is due to function scope isolation. On the other hand, I see tremendous value in sharing complete examples that run out of the box once you provide your client key/secret that you obtain from Instructure Identity Service.
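To illustrate, a complete example that runs as-is once you substitute your own key and secret might look like this (a sketch; check the current API reference for the exact names):

```python
import asyncio
import os

from dap.api import DAPClient
from dap.dap_types import Credentials, Format, SnapshotQuery

async def main() -> None:
    # Substitute the client key/secret obtained from Instructure Identity Service,
    # or export DAP_CLIENT_ID and DAP_CLIENT_SECRET and omit the arguments entirely.
    credentials = Credentials.create(
        client_id=os.environ["DAP_CLIENT_ID"],
        client_secret=os.environ["DAP_CLIENT_SECRET"],
    )
    async with DAPClient(base_url="https://api-gateway.instructure.com", credentials=credentials) as session:
        # Download a full snapshot of one table as JSON Lines files.
        query = SnapshotQuery(format=Format.JSONL, mode=None)
        await session.download_table_data("canvas", "accounts", query, "downloads")

asyncio.run(main())
```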
@ColinMurtaugh Thanks for your comment. Would you mind showing the complete Python code we should have, so that we include the async function properly? We can't get out of a constant error loop, and we think it is because of this issue.
@SamB5 --
Here's how we're using it. Note that there's a little extra complexity due to the fact that we're running our code in AWS Lambda, but hopefully you can get the gist.
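In outline, it looks something like this (a simplified sketch rather than our exact code; the event shape and table name are placeholders):

```python
import asyncio

import dap.plugins
from dap.api import DAPClient
from dap.dap_types import Format, SnapshotQuery

# Register the bundled plugins once, at module import time, so warm
# Lambda invocations don't repeat the setup.
dap.plugins.load()

async def fetch_table(table_name: str) -> None:
    # Credentials come from the DAP_API_URL / DAP_CLIENT_ID / DAP_CLIENT_SECRET
    # environment variables configured on the Lambda function.
    async with DAPClient() as session:
        query = SnapshotQuery(format=Format.JSONL, mode=None)
        await session.download_table_data("canvas", table_name, query, "/tmp/dap")

def lambda_handler(event, context):
    # Lambda's handler is synchronous, so asyncio.run() bridges into the async client.
    table_name = event.get("table", "accounts")  # placeholder event shape
    asyncio.run(fetch_table(table_name))
    return {"status": "ok", "table": table_name}
```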
--Colin
Has anyone else been trying out the python package? We're a new adopter and I've been working through the python documentation (actually a really great first pass, thank you all for writing that) and have been able to grab schemas and table data with ease.
I also set up a local Postgres server and database to test out the built-in initialize & synchronize functionality, but I get an error saying there is no plugin for the postgresql dialect. I can connect to the local database using SQLAlchemy, so I'm not sure whether the issue is my particular environment or on the DAP client side, since this is under ongoing development.
The PostgreSQL plugin comes bundled with the Python package instructure-dap-client that is published to PyPI. We actively use the PostgreSQL integration in our own tests; the postgresql dialect should work out of the box. The fact that the postgresql dialect is not available seems to be either an environment issue or a bug. Can you share some specifics about your deployment (e.g. OS, SQLAlchemy version) so that our team can take a look? Thanks in advance!
dap.plugins.load() is invoked in __main__.py, but if you are not working with the CLI and instead access functionality directly through Python calls, you may need to invoke it explicitly. I will reach out to the development team to make sure we can bypass this step in the future.
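Concretely, the snippet to run before using any initdb/syncdb functionality from Python is:

```python
import dap.plugins

# Explicitly register the bundled dialect plugins (e.g. postgresql)
# when using the library outside the CLI.
dap.plugins.load()
```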
Thanks for the plugins suggestion. Adding that snippet did the trick.
For what it's worth: I'm using Windows at the moment, with SQLAlchemy 2.0.19 and psycopg2 2.9.6, and I was able to read/write a set of Postgres database tables using pure SQLAlchemy connections before adding in the dap.plugins piece.
I also don't think having to add the load call is a pain point. If bypassing it within the package itself is a lot of work, a quick addition to the Python portion of the documentation would be more than sufficient. I was just confused as to why I was getting dialect failures when I knew I had the required packages installed.
The call to `dap.plugins.load()` is completely absent from the example code in the documentation, and yet that code cannot work without it. It's unbelievably frustrating when example code does not work out of the box.