Overcast crawling confounding Analytics

Hi community!

I finally took a look at the database of my Podlove instance after noticing inexplicable, constant downloads in the Analytics statistics over the last two years. Only the newest episode was affected.

I could attribute those downloads to the user agent “Overcast/1.0 Podcast Sync (+http://overcast.fm/)”. Every one or two days it incremented the counter by 1-2 downloads without being an actual listener. This made my statistics almost worthless, considering that about 500 of my podcast’s 2100 downloads are attributable to this user agent.

Is there a sustainable way to exclude this user agent from the calculations of Podlove Analytics?

(And if not: if I deleted all the relevant entries from the _download_intent table, would Analytics recalculate and deliver corrected statistics?
Answering myself: Yes, that works. My statistics are now more accurate, and the ranking is completely different from before.)

Thanks for your efforts and ideas :slight_smile:

My solution:

  • In the end I manually deleted all entries linked to the BrowserID “Overcast/1.0 Podcast Sync” from the table podlove_download_intent.
  • Under Podlove/Tools in the WP admin panel I pressed the buttons under “Tracking & Analysis”. I have no idea whether that was necessary at all, since these processes are triggered regularly anyway, but nevertheless …
  • Then I extended WordPress’s .htaccess with
BrowserMatchNoCase "Overcast/1.0 Podcast Sync" badbots
Order Allow,Deny
Allow from ALL
Deny from env=badbots 
  • Finally, I tested it with
curl --user-agent 'Overcast/1.0 Podcast Sync' -v https://plapperbu.de

in a shell, and the block worked perfectly. So the problem seems to be solved for me, at least until Overcast or similar “services” release another one :smirk:
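
The curl test above can also be scripted so that it reports only the HTTP status rather than the full verbose output. This is a minimal sketch, not part of the original setup: the function names status_for_ua and describe_status are hypothetical, and it assumes Apache answers a denied request with 403, which is its default response for access-control denials.

```shell
#!/bin/sh
# Hypothetical helper: print the HTTP status code the server returns
# when a request is sent with the given User-Agent.
# $1 = URL, $2 = User-Agent string
status_for_ua() {
    curl --silent --output /dev/null --write-out '%{http_code}' \
         --user-agent "$2" "$1"
}

# Hypothetical helper: turn a status code into a short verdict.
# 403 is what Apache sends for a "Deny" match; 2xx/3xx means the
# request was let through (redirects are not followed here).
describe_status() {
    case "$1" in
        403)     echo "blocked" ;;
        2??|3??) echo "allowed" ;;
        *)       echo "unclear" ;;
    esac
}

# Example usage (needs network access):
#   describe_status "$(status_for_ua https://plapperbu.de 'Overcast/1.0 Podcast Sync')"
#   describe_status "$(status_for_ua https://plapperbu.de 'Mozilla/5.0')"
```

Running both example calls side by side shows whether the rule hits only the targeted user agent and leaves ordinary requests alone.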

Thanks for your efforts! I’ll see about adding a rule for this new user agent in the next Publisher release.

Sorry!

I hadn’t realized before that I can simply set the bot field in wp_podlove_useragent to 1.
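
A rough sketch of how that flag could be set from a shell, for anyone who prefers not to edit the table by hand. Only the table name wp_podlove_useragent and the bot field come from the post above. The column name user_agent, the function name bot_flag_sql, and the database name in the usage comment are assumptions that should be checked against the actual installation first.

```shell
#!/bin/sh
# Hypothetical helper: print an UPDATE statement that marks every stored
# user agent containing the given substring as a bot.
# ASSUMPTION: the user agent string lives in a column named "user_agent".
# The substring must not contain single quotes, since it is pasted into
# the SQL text unescaped.
# $1 = substring of the user agent to match
bot_flag_sql() {
    printf "UPDATE wp_podlove_useragent SET bot = 1 WHERE user_agent LIKE '%%%s%%';\n" "$1"
}

# Example usage ("wordpress_db" is a placeholder database name):
#   bot_flag_sql 'Overcast/1.0 Podcast Sync'            # inspect the statement first
#   bot_flag_sql 'Overcast/1.0 Podcast Sync' | mysql wordpress_db
```

Printing the statement before piping it into the database client makes it easy to review exactly what will be changed, and a backup of the table beforehand is advisable.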

I have since come across another shady user agent:
“Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.182 Safari/537.36”

which triggers a download at very regular intervals:
[Screenshot from 2022-01-05 13-56-55]

It looks like a normal browser but downloads the newest episode exactly once a week, so it appears to be an automated script.
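
One way to check such a suspicion without eyeballing a chart is to measure the gaps between a user agent's downloads. The following is a minimal sketch under stated assumptions: the download times have already been exported as Unix timestamps, one per line and sorted ascending, and the function name looks_weekly and the one-hour tolerance are illustrative choices, not anything Podlove provides.

```shell
#!/bin/sh
# Hypothetical helper: read Unix timestamps (seconds, ascending, one per
# line) from stdin and report whether every gap between consecutive
# downloads lies within one hour of exactly seven days.
looks_weekly() {
    awk -v week=604800 -v tol=3600 '
        NR > 1 {
            diff = ($1 - prev) - week      # deviation from a 7-day gap
            if (diff < 0) diff = -diff
            if (diff > tol) irregular = 1
            gaps++
        }
        { prev = $1 }
        END {
            if (gaps == 0)      print "not enough data"
            else if (irregular) print "irregular"
            else                print "weekly"
        }'
}

# Example usage with three downloads exactly one week apart:
#   printf '%s\n' 1640000000 1640604800 1641209600 | looks_weekly
```

Human listeners rarely hit the same hour every week, so a long run of near-identical gaps is a reasonable hint, though not proof, that a script is behind the requests.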
