[ARCHIVED] Our Canvas data journey
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
In case it's useful for others, below is information on our locally hosted canvas data setup.
1. Setup a local linux server and installed Postgresql 9.5.x.
2. Used the provided Canvas data CLI to download and unpack the data after configuring API keys within the canvas admin portal. GitHub - instructure/canvas-data-cli: Command line tool to connect and download files from Canvas Da...
3. Since we had to locally install nodejs for the cli, and the DB schema canvas provides is in json format, we wrote code in js to generate the postgres table creates DDL with comments. Script attached. Example output snippet:
drop table "course_dim";
create table "course_dim"
(
"id" bigint,
"canvas_id" bigint,
"root_account_id" bigint,
"account_id" bigint,
"enrollment_term_id" bigint,
"name" varchar(256),
"code" varchar(256),
"type" varchar(256),
"created_at" timestamp DEFAULT TIMESTAMP 'epoch',
"start_at" timestamp DEFAULT TIMESTAMP 'epoch',
"conclude_at" timestamp DEFAULT TIMESTAMP 'epoch',
"publicly_visible" boolean,
"sis_source_id" varchar(256),
"workflow_state" varchar(256),
"wiki_id" bigint
) TABLESPACE canvasdata_tablespc;
comment on table "course_dim" is 'A course in the canvas system';
comment on column "course_dim"."wiki_id" is 'Foreign key to the wiki_dim table.';
comment on column "course_dim"."workflow_state" is 'Workflow status indicating the current state of the course, valid values are: completed, created, deleted, available, claimed';
comment on column "course_dim"."sis_source_id" is 'Correlated id for the record for this course in the SIS system (assuming SIS integration is configured)';
comment on column "course_dim"."publicly_visible" is 'True if the course is publicly visible';
comment on column "course_dim"."conclude_at" is 'Timestamp for when the course finishes';
comment on column "course_dim"."start_at" is 'Timestamp for when the course starts.';
comment on column "course_dim"."created_at" is 'Timestamp when the course object was created in Canvas';
comment on column "course_dim"."type" is 'TBD';
comment on column "course_dim"."code" is 'The code for the course (e.g. FA12 MATH 2000)';
comment on column "course_dim"."name" is 'The friendly name of the course.';
comment on column "course_dim"."enrollment_term_id" is 'Foreign key to enrollment term table';
comment on column "course_dim"."account_id" is 'The parent account for this course.';
comment on column "course_dim"."root_account_id" is 'The root account associated with this course.';
comment on column "course_dim"."canvas_id" is 'Primary key for this course in the canvas courses table.';
comment on column "course_dim"."id" is 'Unique surrogate id for a course';
.................................
4. With a bash shell, we invoked the canvasdatacli to unpack each table we were interested in and load it into postgres. The unpacked files are just what postgres wants, but it seems the unpack by the CLI sometimes left blank lines within the data. Also, a few times data inthe file was too large for the column as defined by the schema. Since Postgres's "copy into" tool is an all or nothing, any error made the entire loading of that file fail. So with some simplistic "sed" regex, the below script was used to skip a line that failed and retry. Truncates and postgresql's "vacuum"(free space from deleted rows) was used to reduce disk space usage.
DATADIR=......../CanvasData/unpackedFiles
export DATADIR
psql -d canvasdata -c "vacuum;" 2>/dev/null
for t in `cat ~/Config/table_names.txt`
do
sedfile=/tmp/${t}_load.sed
errfile=/tmp/${t}_load.err
fname=$DATADIR/${t}.txt
# sed regex to delete first line of file
echo "1d" > $sedfile
echo "" > $errfile
echo "Loading table $t from file $fname ..."
until sed '/^$/d' < $fname | sed -f $sedfile | psql -d canvasdata -c "truncate $t; copy $t from STDIN" 2>$errfile
do
cat $errfile
echo "Trying to load $t again, skipping bad input line..."
grep "^CONTEXT: .*, line " < $errfile | sed 's/^.* line //' | sed 's/[:, ].*$/d/' >> $sedfile
done
done
psql -d canvasdata -c "vacuum;" 2>/dev/null
This discussion post is outdated and has been archived. Please use the Community question forums and official documentation for the most current and accurate information.