Community Help

wre0001 · ‎06-22-2016

In case it's useful for others, below is information on our locally hosted canvas data setup.

1. Setup a local linux server and installed Postgresql 9.5.x.

2. Used the provided Canvas data CLI to download and unpack the data after configuring API keys within the canvas admin portal. GitHub - instructure/canvas-data-cli: Command line tool to connect and download files from Canvas Da...

3. Since we had to locally install nodejs for the cli, and the DB schema canvas provides is in json format, we wrote code in js to generate the postgres table creates DDL with comments. Script attached. Example output snippet:

drop table "course_dim";

create table "course_dim"

(

"id" bigint,

"canvas_id" bigint,

"root_account_id" bigint,

"account_id" bigint,

"enrollment_term_id" bigint,

"name" varchar(256),

"code" varchar(256),

"type" varchar(256),

"created_at" timestamp DEFAULT TIMESTAMP 'epoch',

"start_at" timestamp DEFAULT TIMESTAMP 'epoch',

"conclude_at" timestamp DEFAULT TIMESTAMP 'epoch',

"publicly_visible" boolean,

"sis_source_id" varchar(256),

"workflow_state" varchar(256),

"wiki_id" bigint

) TABLESPACE canvasdata_tablespc;

comment on table "course_dim" is 'A course in the canvas system';

comment on column "course_dim"."wiki_id" is 'Foreign key to the wiki_dim table.';

comment on column "course_dim"."workflow_state" is 'Workflow status indicating the current state of the course, valid values are: completed, created, deleted, available, claimed';

comment on column "course_dim"."sis_source_id" is 'Correlated id for the record for this course in the SIS system (assuming SIS integration is configured)';

comment on column "course_dim"."publicly_visible" is 'True if the course is publicly visible';

comment on column "course_dim"."conclude_at" is 'Timestamp for when the course finishes';

comment on column "course_dim"."start_at" is 'Timestamp for when the course starts.';

comment on column "course_dim"."created_at" is 'Timestamp when the course object was created in Canvas';

comment on column "course_dim"."type" is 'TBD';

comment on column "course_dim"."code" is 'The code for the course (e.g. FA12 MATH 2000)';

comment on column "course_dim"."name" is 'The friendly name of the course.';

comment on column "course_dim"."enrollment_term_id" is 'Foreign key to enrollment term table';

comment on column "course_dim"."account_id" is 'The parent account for this course.';

comment on column "course_dim"."root_account_id" is 'The root account associated with this course.';

comment on column "course_dim"."canvas_id" is 'Primary key for this course in the canvas courses table.';

comment on column "course_dim"."id" is 'Unique surrogate id for a course';

.................................

4. With a bash shell, we invoked the canvasdatacli to unpack each table we were interested in and load it into postgres. The unpacked files are just what postgres wants, but it seems the unpack by the CLI sometimes left blank lines within the data. Also, a few times data inthe file was too large for the column as defined by the schema. Since Postgres's "copy into" tool is an all or nothing, any error made the entire loading of that file fail. So with some simplistic "sed" regex, the below script was used to skip a line that failed and retry. Truncates and postgresql's "vacuum"(free space from deleted rows) was used to reduce disk space usage.

DATADIR=......../CanvasData/unpackedFiles

export DATADIR

psql -d canvasdata -c "vacuum;" 2>/dev/null

for t in `cat ~/Config/table_names.txt`

do

sedfile=/tmp/${t}_load.sed

errfile=/tmp/${t}_load.err

fname=$DATADIR/${t}.txt

# sed regex to delete first line of file

echo "1d" > $sedfile

echo "" > $errfile

echo "Loading table $t from file $fname ..."

until sed '/^$/d' < $fname | sed -f $sedfile | psql -d canvasdata -c "truncate $t; copy $t from STDIN" 2>$errfile

do

cat $errfile

echo "Trying to load $t again, skipping bad input line..."

grep "^CONTEXT: .*, line " < $errfile | sed 's/^.* line //' | sed 's/[:, ].*$/d/' >> $sedfile

done

psql -d canvasdata -c "vacuum;" 2>/dev/null

wlehnertz · ‎10-11-2016

William,

Thanks for this!! Worked beautifully.

Did you do anything for foreign keys and Uniques?

What do you plan on doing for a front end?

I'm hoping to build a Django app.

Thanks again!

[ARCHIVED] Our Canvas data journey

Archived

Questions

You're signed out

[ARCHIVED] Our Canvas data journey

Archived

Questions

Community Help

View our top guides and resources: