That’s a huge difference! Would be interesting to compare and find out why.
Here’s the relevant excerpt on how we handle aggregation:
Before tracking data is presented in the analytics area, it is cleaned up. Cleanup involves the following steps:
- Based on the UA analysis bots are filtered out.
- Duplicate requests are filtered out. A request is considered a duplicate if it contains
  - the same File ID
  - and the same Request ID
  - and was made within the same hour
- Pre-Release downloads are filtered out. They may happen if you test downloads before publishing the episode.
(Source: http://docs.podlove.org/guides/download-analytics/)
… which immediately raises the question of how much of a difference it would make if I changed “within the same hour” to “within 24 hours”. I will investigate.
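To get a feel for that question, here is a minimal Python sketch of the duplicate rule quoted above, run once with a one-hour window and once with a 24-hour window. It is not the plugin's actual code: the `dedupe` function, the tuple layout, and the sample data are made up for illustration, and it assumes "within the same hour" means "falls into the same fixed time bucket", which may not match the real implementation.

```python
from datetime import datetime, timezone


def dedupe(requests, window_seconds):
    """Keep the first request per (file_id, request_id, time bucket).

    Assumption: the time window is a fixed bucket counted from the Unix
    epoch, not a sliding window relative to the previous request.
    """
    seen = set()
    kept = []
    for file_id, request_id, accessed_at in requests:
        bucket = int(accessed_at.timestamp()) // window_seconds
        key = (file_id, request_id, bucket)
        if key not in seen:
            seen.add(key)
            kept.append((file_id, request_id, accessed_at))
    return kept


def at(hour, minute=0):
    # Hypothetical helper: a timestamp on an arbitrary sample day (UTC).
    return datetime(2024, 1, 1, hour, minute, tzinfo=timezone.utc)


# Invented sample requests: (file_id, request_id, accessed_at)
sample = [
    (1, "abc", at(9, 5)),
    (1, "abc", at(9, 40)),  # same hour: duplicate under both windows
    (1, "abc", at(15, 0)),  # same day, other hour: duplicate only under 24 h
    (1, "xyz", at(9, 10)),  # different Request ID: always kept
    (2, "abc", at(9, 20)),  # different File ID: always kept
]

hourly = dedupe(sample, 60 * 60)
daily = dedupe(sample, 24 * 60 * 60)
print(len(hourly), len(daily))
```

On this sample the wider window drops one more request than the hourly one. How large the effect is on real data depends on how often clients repeat a request with the same Request ID later in the day.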
It's unlikely that they are all bots. I use the same UA (user agent) parser as Piwik, and it already has pretty decent bot detection. Its one real blind spot was podcast clients. That’s why I forked their library and started adding my own detection rules for popular clients. This is far from complete, because it’s a slow and tedious process. We have a crowd-sourced solution in mind but lack the manpower to build it at the moment.
When building the system, I only had data from the Metaebene to work with. I kept adding rules until the “Unknown” podcast client dropped out of the top 10 most of the time. That’s why your report of such a high number surprises me. If you want to help, run this against your database:
SELECT
    COUNT(ua.id) AS cnt, ua.*
FROM
    wp_podlove_downloadintentclean di
    JOIN `wp_podlove_useragent` ua ON ua.id = di.`user_agent_id`
WHERE ua.client_name IS NULL
GROUP BY ua.id
ORDER BY cnt DESC;
It finds the user agents for which no client was detected and orders them by number of downloads, most popular first.
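If you would like to see the shape of the result before touching a live database, here is a small Python sketch that runs an equivalent query against an in-memory SQLite database. The real tables live in MySQL and have more columns; the `user_agent` column and all sample rows are assumptions made up for the demo. Only the table names, `id`, `user_agent_id`, and `client_name` come from the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE wp_podlove_useragent (
    id INTEGER PRIMARY KEY,
    user_agent TEXT,   -- assumed column holding the raw UA string
    client_name TEXT   -- NULL when no client was detected
);
CREATE TABLE wp_podlove_downloadintentclean (
    id INTEGER PRIMARY KEY,
    user_agent_id INTEGER
);
-- Invented sample data: two undetected user agents and one detected.
INSERT INTO wp_podlove_useragent VALUES
    (1, 'ExampleCatcher/1.0', NULL),
    (2, 'KnownPlayer/2.3', 'KnownPlayer'),
    (3, 'MysteryPod/0.9', NULL);
INSERT INTO wp_podlove_downloadintentclean (user_agent_id) VALUES
    (1), (3), (3), (2), (3), (1);
""")

# Same logic as the query above, selecting explicit columns for readability.
rows = conn.execute("""
SELECT COUNT(ua.id) AS cnt, ua.id, ua.user_agent
FROM wp_podlove_downloadintentclean di
JOIN wp_podlove_useragent ua ON ua.id = di.user_agent_id
WHERE ua.client_name IS NULL
GROUP BY ua.id
ORDER BY cnt DESC
""").fetchall()

for cnt, ua_id, user_agent in rows:
    print(cnt, ua_id, user_agent)
```

Each output row is one undetected user agent with its download count. The strings at the top of the list are the ones most worth adding detection rules for.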