Celebrate Excellence in Education: Nominate Outstanding Educators by April 15!
Hi all,
I have just gain access to the Canvas Data portal last week and have been exploring it. From what I can tell so far, it is a tab delimited file stored inside a gz file.
Excel can to a certain point understand the Tab delimitation but it is not the best tool, so I am wondering what everyone else is using to access the data in those files.
Thanks,
Sc.
Solved! Go to Solution.
A relational database. Canvas uses Postgresql on the backend, but you can use MySQL, MSSQL, Oracle (has some problems with long field names), Informix, etc. Basically, whatever database system you're already using, but if you're talking "Excel" be prepared for the difficulties of big data. People who can afford it may use Amazon Redshift since that eliminates the need to host and manage the data locally.
Excel cannot handle the size of some of the tables. Excel also has issues with the IDs, which are bigInt unless they are imported as strings instead of numbers, but they aren't by default so it's problematic. Tableau likewise had problems with the bigInt fields (that may have been recently fixed, but I can't find the announcement I thought I read about it).
For additional information about getting Canvas Data loaded, please see these threads.
A relational database. Canvas uses Postgresql on the backend, but you can use MySQL, MSSQL, Oracle (has some problems with long field names), Informix, etc. Basically, whatever database system you're already using, but if you're talking "Excel" be prepared for the difficulties of big data. People who can afford it may use Amazon Redshift since that eliminates the need to host and manage the data locally.
Excel cannot handle the size of some of the tables. Excel also has issues with the IDs, which are bigInt unless they are imported as strings instead of numbers, but they aren't by default so it's problematic. Tableau likewise had problems with the bigInt fields (that may have been recently fixed, but I can't find the announcement I thought I read about it).
For additional information about getting Canvas Data loaded, please see these threads.
@jamesjonespa , thanks for the reply...
but that is where my first problem was. As the documentation mentioned Postgres, I figured I could have dump the data into MySQL. the dump file created is tab delimited and MyQL workbench doesn't like that. I figured I may have to create a brand new schema and define my tables within the schema based on the definition before loading the file in via commandline...
However, I was wondering if that's what everyone else is doing, as it seems like a lot of repeating work for all administrators... but of course, there's always Redshift....
Well, I guess that's what i'll have to do... basically create a schema with all the tables an inject the data....
Thanks,
Sc.
As @James shared, when dealing with Big Data the issue is tools. The community tries to share what it can but some institutions have fancy tools that don't even need to define the schema, see the AWS article James linked above. Other's are stuck building a schema and trying to get the data into a database in a timely manner, see the Embulk thread James shared. This one has the schema built for 4 databases, even MySQL. This way you don't have to create another solution before you start.
Thank you! I will take a look at your post that James have linked....
To participate in the Instructure Community, you need to sign up or log in:
Sign In
This discussion post is outdated and has been archived. Please use the Community question forums and official documentation for the most current and accurate information.