@schang11 ,
I'm going to provide three things to try. Hopefully one of them helps.
All files test okay with gunzip
If the files are all fine with gunzip, then you might check to make sure that you have the latest version of the zlib package for node that works with your version of node. Sometimes the ones that come with the package managers aren't the latest. I don't know that this is the issue, but there were some "unknown compression method" posts in 2014 involving the zlib.js file.
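If you want to see which versions are in play, something like this works from any shell (this assumes you installed the CLI globally with -g; adjust if you didn't):
node --version                     # your Node version
node -p "process.versions.zlib"    # the zlib version bundled with that Node
npm ls -g canvas-data-cli          # which version of the CLI is installed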
By the way, you can use this gunzip command with a BASH shell to test them. Run it in the dataFiles/requests folder.
for f in *.gz ; do gunzip -t "$f" ; done
The problem with gunzip -t *.gz is that it stops after the first error, so it doesn't check every file once it finds a bad one. There is gunzip -l *.gz (that's a lowercase L), but it only reads the headers and trailers, so it can still report values for a file that is corrupt.
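If you would rather see only the names of the bad files, a small variation on that loop works (a sketch; it assumes GNU gzip and a Bash shell, run from the dataFiles/requests folder):
for f in *.gz ; do gunzip -t "$f" 2>/dev/null || echo "corrupt: $f" ; done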
Corrupted files / Re-downloading
The rest of this might not help you if your files are all valid, but maybe someone else will find it of use, so I'm going to leave it here.
I finally broke down and installed the CLI tool so I could try to diagnose what you're saying. When I ran it, there were some request files that were incomplete and much smaller than the others. Some of the ones that were smaller were valid files.
But my error wasn't the same as yours; mine reached the maximum number of attempts trying to download a file with no URL.
The error you reported isn't complete: the files it references aren't part of the CLI tool itself but of its included libraries, and you didn't include the line that called it. Normally, a crash dump includes a stack trace. Mine had this:
/usr/lib/node_modules/canvas-data-cli/lib/FileDownloader.js:32
if (attempt > MAX_ATTEMPTS) return cb(new Error('max number of retries reached for ' + fileUrl + ', aborting'));
^
ReferenceError: fileUrl is not defined
at FileDownloader._downloadRetry (/usr/lib/node_modules/canvas-data-cli/lib/FileDownloader.js:32:94)
at null._onTimeout (/usr/lib/node_modules/canvas-data-cli/lib/FileDownloader.js:46:26)
at Timer.listOnTimeout (timers.js:92:15)
From that, I can tell to go look at line 32 of the FileDownloader.js file.
That said, I figured out enough to kind of guess what I think might be happening and make a suggestion for your problem.
When you look at the readme from the CLI tool site, it says this about the sync process.
canvasDataCli sync -c path/to/config.js
will start the sync process.
On the first sync, it will look through all the data exports and download only the latest version of any tables that are not marked as partial
and will download any files from older exports to complete a partial table.
On subsequent executions, it will check for newest data exports after the last recorded export, delete any old tables if the table is NOT a partial
table and will append new files for partial tables.
That makes it sound like it only goes back for the older files of a partial table (which requests is) on the first sync.
So it sounds like you can manually download the files, place them into the dataFiles folder, and name them so they start with the sequence number. You'll need to do that for all of the missing files between the corrupt ones and the last sequence number, which is found in the state.json file.
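For example, something along these lines (the file name here is made up; use the real name from the dump you download, prefixed with its sequence number):
mv ~/Downloads/requests-00001-abcdef12.gz dataFiles/requests/76_requests-00001-abcdef12.gz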
Unfortunately for me, the program crashed after downloading 187 out of 297 request files on the initial run and never wrote a state.json file. Four of those were incomplete. That's 3.5 GB of data it got before it crashed. Unfortunately, when you run the program again, it wipes out all existing files rather than checking whether they are the same ones it would be downloading again, so the 3.5 GB was gone. The next time, it claimed that it ran successfully and downloaded 8.5 GB of data, but it said the sequence was 52. When I downloaded the dump file manually, it said its sequence number was 81.
Even worse, it saved a schema.json file that had the 1.2.0 schema in it, even though the current one is 1.3.0. Sequence 52 isn't even the last sequence to use schema 1.2.0; that was sequence 53 for us. They may be starting with 0 instead of 1, though, so the sequence numbering may not match what's in the actual dump.
When I ran the sync again, it started over with version 1.3.0 of the schema and deleted all existing files, including the requests -- presumably because they were schema 1.2.0. So I had to download 12 GB of data (between the crash and the worthless 1.2.0 schema) before I could ever get access to the correct and most current dump. We're not a huge institution, either.
I would say the program needs its logic reworked, but it's definitely easier to use than downloading the files by hand. It would be nice if the sync command supported the --filter option like the unpack command does. You may also want to go to GitHub and file an issue; I don't know if the developer monitors the community.
However, if you're lucky and you had that initial download and you've got the state.json file, you may be able to just supplement the bad files with the manual downloads as I described. Just make sure that the file names use the sequence number from the dump, but that the sequence in state.json (if you need to modify it) is one less than the actual sequence number.
Summary of this section:
Here's how you should be able to solve the issue of corrupt files. Let's say that the first bad file starts with 76_. That means sequence 76 was corrupt. Edit the state.json file and change the sequence number to 75 (it needs to be one less than the bad one). Then re-run the sync command. It will redownload everything from 76 on, but that's still easier than doing it by hand.
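In practice that is just a backup, an edit, and a re-run (the location of state.json depends on your config, so cd to wherever yours lives first; a plain text editor works just as well as the shell):
cp state.json state.json.bak              # keep a backup before editing
# open state.json and change the sequence from 76 to 75, then:
canvasDataCli sync -c path/to/config.js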
Luckily for us, the complete dumps are only about 500 MB while the requests are 7.3 GB, so this step isn't too bad.
If you ever need to determine which *.gz files are corrupt, the gunzip loop from the first section of this reply (run from the dataFiles/requests folder) will do it.
Check hard drive space
Finally, it shouldn't be the cause of an "unknown compression method" error, but be aware of how much hard drive space you have available. There is a copy of the data in compressed form and then a copy in uncompressed form. My request files were 7.3 GB compressed and 45 GB uncompressed, so I would need a drive with at least 53 GB free just for the requests table.
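Two quick checks before running unpack (paths are from my setup; adjust to yours):
df -h .                      # free space on the drive you're running from
du -sh dataFiles/requests    # size of the compressed request files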
I ran the unpack command. 45 minutes later, it finished without any errors, so that didn't help in diagnosing the problem. Sorry.