Community Help

rexj · ‎03-06-2025

Hi all,

I am an admin and Python novice. I am trying to create a script that will pull a user's pageviews within a date range and save them to a CSV file. To date, I have only been able to access the first page of results. I can't seem to get the script to navigate through subsequent pages.

I have seen posts with example script snippets (here, here, and here) but am having a difficult time implementing those strategies. I have attached two example files, one using the Requests module and one using the CanvasAPI module.

Can someone please look at my scripts and point out what I am missing?

Thank you in advance for any assistance.

mclark19 · ‎03-07-2025

Hi @rexj -- Are you sure that there are more than 100 records for the person you are searching for? I ran your script successfully with minimal changes (just changing the base url, access token and student id) for a date range of an entire semester and pulled down 13,756 rows for a student. This was the "pageviews-requests-2a.TXT" script.

rexj · ‎03-07-2025

Thanks @mclark19 ! I am glad to hear that the script worked for you. At least I know the script works for someone. 😉

Yes, I pulled the pageviews for a user for a single date (2/18/25) from the user account details page, downloading the CSV. There are 280 pageviews in that CSV. I am not sure why I seem to only be able to pull 100 through the API.

Do you have any thoughts about the cause of the discrepancy?

mclark19 · ‎03-07-2025

Hi @rexj I had a trailing slash on mine (e.g., https://YOUR-INSTANCE.instructure.com/api/v1/), but the same other than that. Are you seeing more than one url being output? If you reduce your "per_page" to 10, do you only get 10 results?

The only thing I could think of might be a timing issue due to converting the timezone, but that doesn't make sense for missing so many records (unless the student is only active at a particular time).

rexj · ‎03-07-2025

Thank you.

If I change the per_page parameter to a value less than or equal to 100, then I receive that many results. Changing it to more than 100 doesn't yield any more results. I think that 100 is the maximum number of links in a Pageview item.

I am going to try just passing the date without the time to see if that makes a difference.

rexj · ‎03-07-2025

@mclark19 How was your base URL formatted? Was it the same as mine?

chriscas · ‎03-07-2025

Hi @rexj,

I was able to run the 2a version of your code and get 750+ results for myself during a 1-week period in Jan, which seems to match everything else I see, and matches results from code I wrote myself pretty quickly using some functions I've written for other projects in the past.

It's very hard to explain why it's working for other people but not you. I guess the biggest clue is that you stated if you change your per_page to 10, you only get 10 results. That indicates for some reason in your environment, the pagination isn't working. Maybe you could add some additional debug print lines in that area just to see what exactly is executing and perhaps go from there?
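One way to add that kind of debug output is to separate the "follow the next link" loop from the HTTP call, so each step can be logged and the loop can be exercised without touching the network. The sketch below is only an illustration: `collect_pages`, the `fetch` callable, and the `example.invalid` URLs are hypothetical names, not part of the attached scripts.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical contract: fetch(url) returns (items_on_this_page, next_url_or_None).
FetchPage = Callable[[str], Tuple[List, Optional[str]]]


def collect_pages(fetch: FetchPage, first_url: str, max_pages: int = 1000) -> List:
    """Follow 'next' URLs until none is returned, printing what happens at each step."""
    items = []
    seen = set()
    url = first_url
    page_number = 0
    while url:
        # Guard against a server handing back the same URL forever.
        if url in seen or page_number >= max_pages:
            print(f"Stopping: repeated URL or page limit reached at {url}")
            break
        seen.add(url)
        page_number += 1
        page_items, next_url = fetch(url)
        print(f"Page {page_number}: {len(page_items)} items, next={next_url!r}")
        items.extend(page_items)
        url = next_url
    return items


# Stand-in for the network: two made-up pages joined by a 'next' URL.
first = "https://example.invalid/page_views?per_page=2"
second = "https://example.invalid/page_views?page=bookmark:x&per_page=2"
fake_pages = {first: (["a", "b"], second), second: (["c"], None)}

result = collect_pages(lambda u: fake_pages[u], first)
print(result)
```

With Requests, `fetch` could be a small wrapper that calls `requests.get(url, headers=headers)` and returns `response.json()` together with `response.links.get('next', {}).get('url')`. If the log shows `next=None` on page 1, the problem is in what the server returned rather than in the loop.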

-Chris

rexj · ‎03-10-2025

Thanks @chriscas

I added a debug print:

def get_pageviews(user_id, start_date, end_date):
    url = f'{base_url}/users/{user_id}/page_views'
    params = {
        'start_time': start_date_utc,
        'end_time': end_date_utc,
        'per_page': 100
    }
    pageviews = []
    
    while url:
        try:
            print(f"Request URL: {url}")  # Debug print
            response = requests.get(url, headers=headers, params=params)
            response.raise_for_status()
            data = response.json()
            pageviews.extend(data)
            
            if 'next' in response.links:
                url = response.links['next']['url']
                params = None  # Clear params for subsequent requests
            else:
                url = None
        except requests.exceptions.RequestException as e:
            print(f"An error occurred: {e}")
            break
    
    return pageviews

After running the script I receive the following:

Request URL: https://lawrence.instructure.com/api/v1/users/3311/page_views
Request URL: https://lawrence.instructure.com/api/v1/users/3311/page_views?end_time=2025-02-19T05%3A59%3A59%2B00%...
The results have been written to pageviews.csv

Some questions:

I am not sure why it seems that the start and end time are duplicated in the second URL. Any ideas?
After reading this, I wonder if the bookmark is being stripped out because subsequent requests are removing parameters. Could that be happening?
I know that I shouldn't add the parameters, namely the start/end dates to subsequent requests. I assumed that based on the previous link that the script would pick up the bookmark. Do you think that this could be part of the issue?
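A quick way to check questions like these is to take the printed URL apart and look at its query string directly. The sketch below uses only the standard library. It assumes the bookmark travels in a `page` query parameter, and the sample URL is made up for illustration (it is not the truncated URL from the output above).

```python
from collections import Counter
from urllib.parse import parse_qsl, urlsplit


def describe_query(url: str) -> dict:
    """Report which query keys repeat and whether a 'page' parameter is present."""
    pairs = parse_qsl(urlsplit(url).query, keep_blank_values=True)
    counts = Counter(key for key, _ in pairs)
    return {
        "duplicated_keys": sorted(k for k, n in counts.items() if n > 1),
        "has_page_param": "page" in counts,  # assumption: bookmark lives in 'page'
    }


# Made-up example URL with start_time appearing twice.
sample = (
    "https://example.invalid/api/v1/users/1/page_views"
    "?start_time=2025-02-18T06%3A00%3A00Z&page=bookmark%3Aabc&per_page=100"
    "&start_time=2025-02-18T06%3A00%3A00Z"
)
report = describe_query(sample)
print(report)
```

Running this against each "Request URL" line would show whether the second request still carries its bookmark and whether any parameter was appended twice.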

Thank you.

chriscas · ‎03-10-2025

Hi @rexj,

I usually send the parameters for GET calls as querystrings, which is really how the 'next' URLs work. With that being said, you're correct that subsequent next-page calls shouldn't need any extra parameters added; just use the next url exactly as given back by the previous call. I'd suggest the revisions below (just editing by hand without testing):

def get_pageviews(user_id, start_date, end_date):
    url = f'{base_url}/users/{user_id}/page_views?start_time=start_date_utc&end_time=end_date_utc&per_page=100'
    pageviews = []
    
    while url:
        try:
            print(f"Request URL: {url}")  # Debug print
            response = requests.get(url, headers=headers)
            response.raise_for_status()
            data = response.json()
            pageviews.extend(data)
            
            if 'next' in response.links:
                url = response.links['next']['url']
            else:
                url = None
        except requests.exceptions.RequestException as e:
            print(f"An error occurred: {e}")
            break
    
    return pageviews
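For anyone following along, the 'next' entry in `response.links` is derived from the `Link` response header, so printing that header is another way to see what the server actually sent. Below is a rough standard-library sketch of pulling the `rel` values out of such a header. `parse_links` is a hypothetical helper, the header text is invented, and the regex only handles the simple `<url>; rel="name"` form.

```python
import re


def parse_links(link_header: str) -> dict:
    """Map each rel value (e.g. 'next', 'first') to its URL in a Link header."""
    pairs = re.findall(r'<([^>]+)>\s*;\s*rel="([^"]+)"', link_header)
    return {rel: url for url, rel in pairs}


# Invented header for illustration: a middle page has a 'next' entry.
middle_page = (
    '<https://example.invalid/page_views?page=bookmark:abc&per_page=100>; rel="next", '
    '<https://example.invalid/page_views?page=first&per_page=100>; rel="first"'
)
# Invented header for a final page: no 'next' entry at all.
last_page = '<https://example.invalid/page_views?page=first&per_page=100>; rel="first"'

links = parse_links(middle_page)
print(links.get("next"))
print(parse_links(last_page).get("next"))
```

If the first response already lacks a `next` entry, the loop will stop after one page no matter how the parameters are passed.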

I'm still a bit perplexed by the fact that your original 2a code ran just fine for me and @mclark19, yet it apparently doesn't run correctly for you. Something strange is definitely going on, but let us know if my suggested code here makes any difference for you.

I can also make a public version of the code I developed for this, which is really similar to yours but has a bunch of extra error checking for the Canvas environment, which adds some perhaps unneeded complexity for your smaller project.

-Chris

rexj · ‎03-11-2025

Thanks @chriscas

I upgraded requests. I was running 2.31.0 and upgraded to 2.32.3.

I ran the code with your update and received:

PS C:\Users\rexj-a\Desktop\canvasapi> python pageviews-requests-2a.py
Request URL: https://lawrence.instructure.com/api/v1/users/3311/page_views?start_time=start_date_utc&end_time=end...
An error occurred: 503 Server Error: Service Unavailable for url: https://lawrence.instructure.com/api/v1/users/3311/page_views?start_time=start_date_utc&end_time=end...
The results have been written to pageviews.csv

I noticed that the Request URL includes the names of the start and end date variables rather than the explicit UTC date/time values. I added curly brackets around those variables in the url f-string and things worked better.

I still only received 100 pageviews.

My supervisor wondered if it could be due to my user-generated API access key, and so I tried creating a Developer key. That did not work, resulting in a 401 permissions error.

Thanks for your help in trying to sort this out.

chriscas · ‎03-11-2025

Hi @rexj,

Whoops, I knew I'd miss something editing by hand... I forgot the curly braces around the variables in the url.

def get_pageviews(user_id, start_date, end_date):
    url = f'{base_url}/users/{user_id}/page_views?start_time={start_date_utc}&end_time={end_date_utc}&per_page=100'
    pageviews = []
    
    while url:
        try:
            print(f"Request URL: {url}")  # Debug print
            response = requests.get(url, headers=headers)
            response.raise_for_status()
            data = response.json()
            pageviews.extend(data)
            
            if 'next' in response.links:
                url = response.links['next']['url']
            else:
                url = None
        except requests.exceptions.RequestException as e:
            print(f"An error occurred: {e}")
            break
    
    return pageviews

What version of Python itself are you running, just in case that's an issue? Trying to use a developer key will definitely complicate things. I usually just run my own scripts with a self-generated API token, though I am a full admin so I can access anything with that. If you have a more limited role, some things may not work at all, but pagination should not be affected.
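A small snippet like the following can collect the interpreter and package versions in one go, which makes it easier to compare environments across machines. It is only a sketch: `environment_report` is a hypothetical helper, and the package names checked are the two libraries mentioned in this thread.

```python
import sys
from importlib import metadata


def environment_report(packages=("requests", "canvasapi")) -> dict:
    """Return the Python version plus the installed version of each named package."""
    report = {"python": ".".join(str(part) for part in sys.version_info[:3])}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is not installed in this interpreter's environment.
            report[name] = "not installed"
    return report


env = environment_report()
print(env)
```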

I was just copying the code you had without making too many modifications, but I notice that your function takes start_date and end_date as arguments yet actually uses start_date_utc and end_date_utc, which are defined outside the function. You may want to do some code cleanup on that.
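As a rough idea of what that cleanup could look like, the sketch below converts local times to UTC and builds the first request URL entirely from the function's own arguments. The helper names, the fixed UTC-6 offset, the user id, and the `example.invalid` host are all assumptions for illustration. A fixed offset ignores daylight saving, so a real script would need proper time zone handling.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode


def to_utc_iso(local_text: str, utc_offset_hours: int) -> str:
    """Convert 'YYYY-MM-DD HH:MM:SS' at a fixed UTC offset to an ISO 8601 UTC string."""
    local_tz = timezone(timedelta(hours=utc_offset_hours))
    local_dt = datetime.strptime(local_text, "%Y-%m-%d %H:%M:%S").replace(tzinfo=local_tz)
    return local_dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def build_pageviews_url(base_url: str, user_id: int, start_utc: str, end_utc: str,
                        per_page: int = 100) -> str:
    """Build the first page_views URL from arguments only (no outside variables)."""
    query = urlencode({"start_time": start_utc, "end_time": end_utc, "per_page": per_page})
    # rstrip('/') means the base URL works with or without a trailing slash.
    return f"{base_url.rstrip('/')}/users/{user_id}/page_views?{query}"


# Assumed values for illustration only.
start = to_utc_iso("2025-02-18 00:00:00", -6)
end = to_utc_iso("2025-02-18 23:59:59", -6)
url = build_pageviews_url("https://example.invalid/api/v1/", 123, start, end)
print(url)
```

Letting `urlencode` assemble the query string also avoids the kind of missing-brace mistake that is easy to make when the URL is typed out by hand.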

See if the above code improves anything for you (again, apologies for the issue).

-Chris

chriscas · ‎03-11-2025

Hi @rexj,

If you want to try my version just to see if it produces anything different for you, I'm attaching it here. It should prompt you for all required info, or you can set up your canvas environment info in the file itself. This is using pieces of validation code I made for other more complex projects, which I know makes things more complex but is sometimes handy to find and catch weird input errors.

-Chris

rexj · ‎03-12-2025

Thanks for this @chriscas

I tried using it for our test instance. I realized that there were a few modules I needed to install. That led me down a trail and now I am looking at re-installing Python and all packages to start fresh. I am also interested in getting the PyCharm IDE running on my machine but have run into issues with that. More than you want to know I am sure, but I am committed to getting this running. I will let you know where things end up.

User Pageviews Pagination <sigh/>

API Canvas

CanvasAPI library

python

Requests
