I have already proposed in another discussion that the Total Activity column be eliminated from the People web page, since it is completely bogus data. Nope, Canvas keeps it, so it is fair to try to verify it.
Too many people want it for them to get rid of it.
You can try to verify the numbers, but it will be futile because you do not have the full picture. Run your own server and log everything, or convince Canvas to share all of the data and not just some of it, and you might have a chance to duplicate their numbers and verify their calculations. But you're not going to be able to do it with what's available right now, as they don't give you all of the information that you need to verify it.
But study the source code and see what they're doing and see what counts and what doesn't count and you might have a better chance of getting closer. I still think it's going to be a nearly impossible task, but perhaps that is because I don't understand Ruby enough to know what they're doing.
James, you seem to say that there will be no way to recreate this data, yet Canvas creates it?
Correct.
Recreating is not the same as creating. If I give you summary statistics, you're not going to be able to recreate the original data, although I can easily calculate any appropriate statistics because I have access to the raw data.
Now, let's say that I want to find the mean and standard deviation of a set of numbers. I don't have to save the numbers in order to do that. All I have to do is keep the number of values, the sum of the values, and the sum of the squares of the values. If I don't save all of the information, only what is needed to calculate what I'm sharing, then the person on the other end isn't going to be able to recreate the information, even though my results are probably as accurate as they can be. Even if I save all of the information, but don't give it to you, you're not going to be able to recreate my results.
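To make that concrete, here is a minimal sketch of the idea in Python. The class name and the sample values are made up for illustration; nothing here is taken from Canvas. It keeps only the count, the sum, and the sum of the squares, and still produces the mean and the sample standard deviation.

```python
import math


class RunningStats:
    """Accumulates count, sum, and sum of squares; the values themselves are never stored."""

    def __init__(self):
        self.n = 0
        self.total = 0.0
        self.total_sq = 0.0

    def add(self, x):
        self.n += 1
        self.total += x
        self.total_sq += x * x

    def mean(self):
        return self.total / self.n

    def sample_std(self):
        # s^2 = (sum(x^2) - (sum(x))^2 / n) / (n - 1)
        variance = (self.total_sq - self.total ** 2 / self.n) / (self.n - 1)
        # Guard against a tiny negative value caused by floating point rounding.
        return math.sqrt(max(variance, 0.0))


stats = RunningStats()
for value in [2, 4, 4, 4, 5, 5, 7, 9]:  # illustrative numbers only
    stats.add(value)

print(stats.n, stats.mean(), stats.sample_std())
```

Once the loop finishes, the eight original values are gone. Anyone handed only the mean and standard deviation cannot work backwards to them, which is the situation we are in with Canvas. (This textbook formula can lose precision when the values are very large; it is used here only because it matches the description above.)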
Transferring the discussion back to Canvas ... Canvas doesn't give us all of the information that they used to create it in the first place, so it is impossible for us to recreate it with the information we're supplied. They don't even have to hang on to all of the data to generate the statistics.
This is my point. They don't give us access to the raw data that is used so it is impossible for us to recreate their numbers exactly.
The requests table is not the raw data that is used to create their numbers in the first place. They have specific code within the controllers that determines what gets logged as a page_view and what gets counted as an asset_access. It's not exactly the same as every web request. You may have several web requests
There is (or at least there used to be) a ping sent by Canvas to signal that you were still on a page. There is code to explicitly not return any page views that come from that particular ping, but there is still ping information in the requests table from another source. There are other commands that don't get returned. When you look through the code, there is a comment about page_views in the application controller.
We only record page_views for html page requests coming from within the app, or if coming from a developer api request and specified as a page_view.
That statement is ambiguous unless you study the code. Is it that page_views are the only thing they record for html requests coming from within the app, or is it that html requests coming from within the app are the only thing they record page_views for?
If you look at the HTTP requests that are made when you load a page, none of those for javascript, css, or svg files make it into the requests table. That's all noise that people don't want to see. It's the application controller that gets the page_view information logged. I don't have access to the code they use to extract the information that they supply in Canvas Data, so I don't know whether they're even sending us everything in the requests table. I imagine they keep webserver logs (they used to use Cassandra, but I don't know if they still do), but not all of those get sent to us with the page view requests.
James, no, there are not more than a million records for one student in one course.
You didn't say anything about a single student in a single course. You were talking about the speed in getting access to the information. Canvas doesn't pull much from the page views table except when the admin goes in and asks for a user's page_views. The reason is that the table is huge and takes too long to process. Canvas is a web application and needs to return nearly instantaneous results. People don't want to wait more than a couple of seconds, let alone minutes. When you request the page views through the admin interface, or through the API, you can only get a maximum of 100 at a time.
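To show what "100 at a time" means in practice, here is a rough Python sketch of walking through paginated results by following the rel="next" URL in each response's Link header, which is how the Canvas API paginates. The function names are mine, and fetch_page is a hypothetical callable that you would supply (for example, a wrapper around an authenticated GET of the user page views endpoint with per_page=100); it is not part of any Canvas library.

```python
import re


def next_link(link_header):
    """Return the URL marked rel="next" in an HTTP Link header, or None if absent."""
    for part in link_header.split(","):
        match = re.match(r'\s*<([^>]+)>\s*;\s*rel="next"', part)
        if match:
            return match.group(1)
    return None


def fetch_all(fetch_page, first_url):
    """Collect records from every page.

    fetch_page(url) is a hypothetical callable that returns a tuple of
    (list_of_records, link_header_string) for the given URL.
    """
    records = []
    url = first_url
    while url:
        page, link_header = fetch_page(url)
        records.extend(page)
        url = next_link(link_header or "")
    return records
```

A user with tens of thousands of page views means hundreds of round trips at 100 records each, which is why this is slow and why Canvas doesn't do it for you on every page load.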
If you want to subset the data so that all you have is the data that comes from your course, then you'll have a much smaller number of records and it will be easier to work with.
I don't understand why more people aren't asking to verify "analytics" from Canvas.
Perhaps because they realize that the numbers themselves aren't important in isolation. Does it really matter whether someone spent 2 hours or 50 hours in the course? Not for some courses. Good students may not need to spend much time, while other students are doing well because they did spend a lot of time in there. I know this last semester that many of my students who had low activity time, when compared to others, were the ones in trouble in the course. But that still doesn't give a threshold, it gives a relative number.
By the way, Canvas does that as well, lumping people into categories by participation and page_views relative to the other people in the class. The information I record each day from the student analytics includes the number of page_views, the maximum number of page_views in the class, and the level of page_views; it does the same thing with participations. It also has an overview of tardiness for assignments.
If you are willing to accept the first time and last time someone viewed an asset, then that's available. If you want to look at page_views, then the count by hour is the best they make available. For most people, that's enough; it tells them that a student was busy between 11 and midnight, for example. If you want to know when people participated, that's available for every participation in the course.
If you want to know when every page_view happened, then start gathering the data every day while the course is going on and archive it over time. That still only gets you down to the nearest day, not the exact moment. If the count was 12 one day and it's 15 the next, you can tell that they viewed that asset 3 times. The problem is that the access report, which details the information, isn't available through the API. I think I had to write a script that would log into Canvas and fetch it daily when I was tracking things.
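The daily differencing can be sketched like this in Python. The snapshot layout (a date paired with a dictionary of cumulative view counts per asset) is an assumption of mine about how you might archive the access report, not a format that Canvas provides.

```python
def daily_views(snapshots):
    """Turn cumulative daily snapshots into per-day view counts.

    snapshots: list of (date_string, {asset_name: cumulative_count}),
    sorted by date. This layout is hypothetical.
    Returns {date_string: {asset_name: views since the previous snapshot}}.
    """
    result = {}
    for (_, previous), (day, current) in zip(snapshots, snapshots[1:]):
        changes = {}
        for asset, count in current.items():
            delta = count - previous.get(asset, 0)
            if delta > 0:
                changes[asset] = delta
        result[day] = changes
    return result


archive = [
    ("2019-01-14", {"Syllabus": 12, "Quiz 1": 2}),
    ("2019-01-15", {"Syllabus": 15, "Quiz 1": 2, "Homework 1": 1}),
]
print(daily_views(archive))
```

With the sample archive, the Syllabus goes from 12 to 15, so it is credited with 3 views on the second day. You still only know the day, not the moment, and if you miss a day the views for the gap all land on the next snapshot.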
No, there are not "analytics and there are analytics."
There is an existing view course analytics and there is a version that Canvas is working on. You may not have participated in the beta testing of Analytics 2.0. My point was that Canvas realizes that people weren't happy with the existing analytics and Canvas is working to improve it.
But analytics isn't about getting the raw data. It's about finding meaningful patterns. Most people using Canvas would not know how to take a requests table, even for a single user, and come up with meaningful results from it. So Canvas does that for them. Most users do not need to see the nail prints in the hand to believe what Canvas says; they're perfectly happy to accept on blind faith that the numbers are correct.
Canvas doesn't design for the needs of the power user or for the few. Those of us in mathematics and the sciences have been dealing with that in quizzes for a long time. If you're not in the middle of the user base, you may just have to make do with what you have available because there's not enough return on their investment to warrant creating something more powerful. You can make a feature request, but it probably won't do any good.
P.S. I would love to be wrong, because I would like this information out of Canvas Data as well. Other people know a lot more about Canvas Data than I do and have cobbled together things that approximate this. I seem to remember them sharing it in the community, too. They may be able to shed some light on how to approximate things.