Understanding Analytics


#1

I have 3 questions about Podlove Analytics.

(1) My analytics page says I had 1,495 downloads in September and 5,127 all time.

But if I do this query in MYSQL: “SELECT COUNT(DISTINCT request_id) FROM wp_podlove_downloadintent
WHERE accessed_at BETWEEN ‘2017-09-01 00:00:00’ and ‘2017-09-30 23:59:59’” I get 607 downloads in September.

And if I change COUNT(DISTINCT request_id) to COUNT(*) instead, I get 4,355 downloads in September.

How does the plugin arrive at 1,495 downloads in September (in my instance)?

(2) If I click on a specific episode from the Analytics main page, I get a set of graphs headed by a graph that tracks three factors: Cumulative, Current Episode, and Average Episode. Can you define for me what each of these factor labels mean?

(3) What does the httprange field track in the wp_podlove_downloadintent table?


App for publishers
#2

If you want to create your own queries, always base them of wp_podlove_downloadintentclean, not wp_podlove_downloadintent, which is already cleaned of bots and duplicate requests.

“Cumulative” and “Current Episode” represents the same data in different ways: The latter displays the number of downloads of that episode in the given time frame, the former adds those numbers together. “Average Episode” is an esimation how the download development is for an average episode, so you can compare if the current episode is doing normally or abnormally.

httprange: When clients do a request they may specify what bytes they want, instead of requesting the whole file. See https://developer.mozilla.org/en-US/docs/Web/HTTP/Range_requests for details.


#3

Thanks for the explanations.

I still don’t understand the numbers I’m getting, though.

I did as you suggested and used the “clean” table for my queries.

Now, if I take the modified monthly queries, put them into a spreadsheet, and add them all up, they add up to less than the totals the plugin shows on the dashboard. In fact, the overall total from the beginning of time as calculated by my spreadsheet is less than what the dashboard shows for last month alone.

What am I overlooking?

EDIT

Oh, wait a minute. Does request_id represent the user, regardless of how many sessions the user has and regardless of how many recordings he/she listens to, and not the session? That might explain it.

If that’s the case, then SELECT COUNT(DISTINCT request_id) FROM wp_podlove_downloadintentclean tells me how many total users have attempted to download podcasts on the site since the beginning of time, correct?


#4

It is a hash from the combined IP address and user agent string.

(from http://docs.podlove.org/podlove-publisher/guides/download-analytics.html)

So correct, you should not use it for grouping. In fact, the point of the “clean” table is that you don’t need to clean it up at all, each entry should be viewed as an actual Download Intent.