Feedback wanted: Exporting Analytics Data

timpritlove · April 8, 2017, 3:11pm

The team is considering adding an export function to the Podlove Publisher Analytics feature so that episode download data can be exported to one or more files.

We’d like to collect your feedback on how these files should be structured, what information they should contain and which format(s) they should use.

What kind of functionality we’d like to enable

Our front-end analytics are off to a good start. It looks as if we can produce stable results that provide actual insight into what’s going on with our podcasts but we’d love to be able to dig deeper. Before pushing out new features we’d love to see others doing statistical experiments with their data.

What is in the database

We are currently tracking Download Intents, which is our name for a download that is expected to be triggered when the Publisher receives a request for the download URL from some program.

We store time and date along with user agent information and we generate a location by evaluating the IP address the request came from. We do not store IP addresses in the analytics database for privacy reasons.

This data is regularly aggregated so that it can be quickly displayed on the analytics page.

So any export could present either raw download data (with every single download on record) or aggregated data (by 24h).
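To make the difference between the two options concrete, here is a minimal sketch of how raw records could be rolled up into daily counts. The record layout (a timestamp plus an episode id) and the sample values are assumptions for illustration only, not the Publisher's actual schema.

```python
from collections import Counter
from datetime import datetime

# Hypothetical raw records: (ISO timestamp, episode id).
# The layout and the values are made up for illustration.
raw_intents = [
    ("2017-04-08T09:15:00", 12),
    ("2017-04-08T21:40:00", 12),
    ("2017-04-09T07:05:00", 12),
    ("2017-04-09T08:30:00", 13),
]

def aggregate_by_day(intents):
    """Count download intents per (calendar day, episode id)."""
    counts = Counter()
    for timestamp, episode_id in intents:
        day = datetime.fromisoformat(timestamp).date().isoformat()
        counts[(day, episode_id)] += 1
    return dict(counts)

daily = aggregate_by_day(raw_intents)
```

A raw export would contain the four input records, while an aggregated export would contain only the three daily counts.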

Episode metadata description file

An upcoming release would also provide a general file (in XML and/or JSON format) containing the most important metadata about all episodes. This metadata would include both titles and descriptions as well as all associated media file names and image URLs. This file can serve as a reference for statistical software to display stats in a meaningful way.
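As a rough idea of how statistical software might use such a file, the sketch below loads a JSON variant and indexes it by episode id. The format is not decided yet, so every key name (`episodes`, `id`, `title`, `description`, `media_files`, `image_url`) and every value here is a hypothetical placeholder.

```python
import json

# Hypothetical metadata document; all key names and values are assumptions.
metadata_json = """
{
  "episodes": [
    {
      "id": 12,
      "title": "Example Episode",
      "description": "A placeholder description.",
      "media_files": ["example-012.mp3", "example-012.m4a"],
      "image_url": "https://example.com/images/012.png"
    }
  ]
}
"""

def index_by_id(document):
    """Parse the metadata file and return a dict keyed by episode id."""
    parsed = json.loads(document)
    return {episode["id"]: episode for episode in parsed["episodes"]}

episodes = index_by_id(metadata_json)
```

With an index like this, a download record that only carries an episode id can be labelled with the episode title when it is displayed.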

What we need feedback on

Most importantly, we’d love to get suggestions on:

structure of the export format (XML, line-based, JSON, etc.)
which datapoints to include
if you prefer raw or aggregated data

Thanks in advance

pommes · April 9, 2017, 11:36am

Here are my first thoughts about this:

I would prefer a simple csv file with a header line that can directly be opened in Excel if - and only if - the data structures really are that simple. In case it needs 1:n relations of course json or xml would be ok. It does not matter to me if it is xml or json. That’s just personal taste. I personally like json more but use xml more often.

Columns:

Number of Download intents and the dimensions and attributes from below (No 3).

Aggregated number of download intents per day and the typical other dimensions would be sufficient for me. The dimensions could be:

Date
Episode id
A combination of Download Source and Download Context. (or even two dimensions if that makes sense.)
Episode Asset
Podcast Client
Operating System

So I guess there would be anything from a handful of entries to many hundreds per episode and day, depending on how successful the podcast is and whether the episode was just released.

Additional (redundant) attributes maybe:

Episode Name
Release Date
Episode length
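A flat CSV along these lines could look like the output of the following sketch. The column names are one possible spelling of the dimensions listed above, and the rows are invented sample data, so treat both as assumptions.

```python
import csv
import io

# Assumed column names derived from the dimensions listed above.
COLUMNS = [
    "date", "episode_id", "download_source", "download_context",
    "episode_asset", "podcast_client", "operating_system", "download_intents",
]

# Invented sample rows, one per combination of dimensions and day.
rows = [
    ["2017-04-08", 12, "feed", "mp3-feed", "mp3", "ExampleClient", "iOS", 140],
    ["2017-04-08", 12, "webplayer", "episode-page", "mp3", "ExampleBrowser", "Windows", 25],
]

def to_csv(header, data):
    """Write a header line followed by the data rows and return the CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(header)
    writer.writerows(data)
    return buffer.getvalue()

csv_text = to_csv(COLUMNS, rows)
```

Because the first line carries the column names, a file like this should open directly in a spreadsheet program.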

teubi · April 11, 2017, 5:28pm

Hi,
I’d love to have this feature! I’m a data nerd and creating my own scripts for every niche use case I can think of would be awesome!
The more details the better: you can always filter things out, but it’s impossible to create more details. Nevertheless, it is important to have clear information about what is what. In my opinion, exporting one row per valid* download intent with as much data as possible (user agent, feed) and a timestamp is the best and at the same time easiest to handle solution.

As far as I know, podlove analytics are filtered, such that invalid requests (e.g., from bots or for unpublished episodes) don’t show up in the statistics. I’d do the same thing for a csv export.

For this, a line-based format (e.g. comma-separated values) is the most suitable as it is simple to interpret and creates almost no overhead.

The most important details (in my opinion) are:

timestamp
episode-id (name)
feed
user-agent
time after episode published
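The last item would not need to be stored separately, since it can be derived from the download timestamp and the episode's publish date. A minimal sketch, assuming ISO-formatted timestamps and made-up field names and values:

```python
from datetime import datetime

def seconds_after_publish(download_time, publish_time):
    """Return the whole seconds between publication and the download intent."""
    downloaded = datetime.fromisoformat(download_time)
    published = datetime.fromisoformat(publish_time)
    return int((downloaded - published).total_seconds())

# Hypothetical raw row; field names and values are placeholders.
row = {
    "timestamp": "2017-04-11T17:28:00",
    "episode": "Example Episode",
    "feed": "mp3",
    "user_agent": "ExampleClient/1.0",
}
row["seconds_after_publish"] = seconds_after_publish(
    row["timestamp"], "2017-04-10T17:28:00"
)
```

Here the download happens exactly one day after publication, so the derived value is 86400 seconds.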

<tl;dr>

CSV
time, episode, feed, useragent
raw (but with invalid data removed)

Thanks for your efforts, I’m looking forward to this amazing feature!

katrinleinweber · April 11, 2017, 6:41pm

I’m interested in XML & JSON equally in order to try out parsing those in R (inspired by).

Does it make a difference in the amount of coding work, whether simply all available datapoints are exported, compared to the amount of work needed to decide on the subset?

Aggregated by default, raw as an option.

henningkrause · May 3, 2017, 7:02am

I have a feature request re the analytics. The first 50 or so episodes of our podcasts were published before podlove analytics were around. We have download numbers derived from the server logs re each episode for this period of time until we enabled podlove analytics. For our all time record I’d like to add those numbers as a starting offset to these first 50 episodes in the podlove analytics, maybe by once uploading a CSV file.

Right now I’m manually adding those historical numbers to the podlove analytics numbers in a spreadsheet to get the whole picture for all episodes. But that kind of sucks. Thanks for considering!
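Until something like that exists, the spreadsheet step could be scripted. The sketch below assumes a two-column CSV of historical numbers and a dict of totals taken from the analytics page. Both layouts and all numbers are invented for illustration and are not an existing Publisher feature.

```python
import csv
import io

# Hypothetical CSV of pre-analytics download numbers from server logs.
offsets_csv = "episode_id,historical_downloads\n1,4200\n2,3900\n"

# Hypothetical totals as currently reported by the analytics.
analytics_totals = {1: 150, 2: 90, 51: 1200}

def apply_offsets(totals, csv_text):
    """Add each historical offset to the matching episode's analytics total."""
    combined = dict(totals)
    for row in csv.DictReader(io.StringIO(csv_text)):
        episode_id = int(row["episode_id"])
        combined[episode_id] = (
            combined.get(episode_id, 0) + int(row["historical_downloads"])
        )
    return combined

all_time = apply_offsets(analytics_totals, offsets_csv)
```

Episodes without an entry in the CSV, such as episode 51 here, keep their analytics total unchanged.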