How it Works

Datastream is a raw data pipeline that delivers real-time, user-level data from visitor interactions on a page, streamed to your Amazon S3 or Google Cloud Storage bucket.

Our feed contains over 50 fields of data. These can be categorized into four broad groups:

  • Engagement: Chartbeat’s best-in-class engagement metrics, such as engaged time, time on page, scroll depth, and page and browser geometry.

  • Data about the page: Data points related to the identity of the page, such as the path, title, section and author, content type, platform, and sponsor data associated with each page view.

  • Data about the user: For user-level analysis, a unique ID, the browser’s user agent string, frequency, and recency.

  • Timestamp: When the visitor arrived on and left the page, plus the user's time zone.

Note: Unique IDs are unique to a given user, in a given browser, on a given website. Chartbeat only uses a first-party cookie, so our IDs cannot be used to track a user between sites. Customers who want to track users between sites can pass us an ID and perform user journey analysis on their backend systems.

Platforms supported in Datastream include:

  • Web

  • Google AMP

  • Facebook Instant Articles

  • Apple News

  • Your own native app

Chartbeat’s Datastream Reporting supports exporting data to the following data storage platforms:

  • Amazon Web Services

  • Google Cloud Storage

Datastream specifications and formats

File Format: CSV, one row per Chartbeat-logged, session-expired page view

Compression Type: GZIP

Delimiters: pipe-separated

Character Encoding: UTF-8

Example File Naming Convention: rawdata/YYYY/MM/DD/h/[00|30]/[epoch timestamp].[file hash].csv.gz

Data Batch Interval: by minute

Delivery Frequency: by minute

Delivery Destination: Amazon S3 or GCS bucket with shared read/write permissions

Note: Files are created every minute, with each minute’s files representing the users whose page views ended in that minute.
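The specifications above can be exercised with Python's standard library alone. The following is a minimal sketch, not Chartbeat's own tooling: the function name `read_datastream_rows` is hypothetical, the column values are made up, and it assumes each file begins with a header row naming the columns.

```python
import csv
import gzip
import io


def read_datastream_rows(fileobj):
    """Yield one dict per page view from a gzipped, pipe-delimited file object."""
    # Decompress and decode as UTF-8; newline="" lets the csv module handle line endings.
    with gzip.open(fileobj, mode="rt", encoding="utf-8", newline="") as text:
        # Assumption: the first line is a header row naming each column.
        yield from csv.DictReader(text, delimiter="|")


# Build a tiny in-memory file with made-up values to demonstrate the round trip.
raw = "host|path|engaged_time_on_page_seconds\nmysite.com|mysite.com/news/1|5\n"
sample_file = io.BytesIO(gzip.compress(raw.encode("utf-8")))
rows = list(read_datastream_rows(sample_file))
print(rows[0]["path"])
```

Every value arrives as a string, so numeric columns need an explicit conversion before any arithmetic. The same function should also accept a path to a downloaded `.csv.gz` file in place of the in-memory object.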

Download a sample data file

The header row and one row of page view data from a sample CSV file are shown below.

distribution|last_ping_timestamp|host|cookie_id|page_session_id|domain|path|new_user|device|engaged_time_on_page_seconds|page_width|page_height|max_scroll_position_top|window_height|external_referrer|no_client_storage|city_name|region_name|country_code|country_name|continent_name|dma_code|utc_offset_minutes|user_agent|recency|frequency|internal_referrer|author|section|content_type|sponsor|utm_campaign|utm_medium|utm_source|utm_content|utm_term|account_id|page_title|virtual_page|scrolldepth|total_time_on_page_seconds|ga_client_id|login_id|id_sync|subscriber_acct|page_load_time
SITE|1571314031|mysite.com|M_qCECKGCqIcP9a3|Ffe3bT84JvqrIDfBDS+w5FgVyRY=|mysite.com|mysite.com/news/3977611002|false|desktop|5|1366|768.0|0.0|768||false|Brooklyn|New York|US|United States|North America|501|-240|Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.90 Safari/537.36|1|16|mysite.com/|no author|news,local|how-to|||||||REMOVED|How to become a writer for your local newspaper|false|768|74|1236546315.5527916466||"{""clientId"":""62d8fbb2-0060-1cfd-a004-a6f56c0dc7a4"",""anonymousId"":""46e7a61c3a0d3208cf504ff859008b70"",""userMeterState"":""3""}"||784

Map Chartbeat data with other data sources

With ID Sync, website owners can populate their Datastream feed with custom ID values via a few extra lines of JavaScript in our tracking snippet for standard websites. This custom metadata can be used to join a user’s Chartbeat engagement data to other data sources, or to enrich engagement data by specifying information about the current viewing session.
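As a rough illustration of such a join, the sketch below decodes the id_sync value from the sample row above and looks it up in a backend table. The `crm_plans` dictionary and the choice of `clientId` as the join key are hypothetical; the keys present in id_sync depend entirely on what a site passes through ID Sync.

```python
import csv
import io
import json

# Abbreviated header plus the id_sync value copied from the sample row.
lines = (
    "cookie_id|login_id|id_sync\n"
    'M_qCECKGCqIcP9a3||"{""clientId"":""62d8fbb2-0060-1cfd-a004-a6f56c0dc7a4"",'
    '""anonymousId"":""46e7a61c3a0d3208cf504ff859008b70"",""userMeterState"":""3""}"\n'
)

row = next(csv.DictReader(io.StringIO(lines), delimiter="|"))

# The csv module undoes the doubled quotes; an empty field means no ID Sync data was sent.
id_sync = json.loads(row["id_sync"]) if row["id_sync"] else {}

# Hypothetical backend table keyed by the custom ID passed through ID Sync.
crm_plans = {"62d8fbb2-0060-1cfd-a004-a6f56c0dc7a4": "annual"}
plan = crm_plans.get(id_sync.get("clientId"))
print(id_sync["userMeterState"], plan)
```

Because the JSON is wrapped in standard CSV quoting, a naive split on the pipe character would still work for this row, but a CSV parser is the safer choice in case a quoted value ever contains a pipe.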

Dimensions and Metrics included

Datastream exports all of the following dimensions and metrics to clients. Each row of data in a Datastream file represents a completed user session, where a session corresponds to the time a visitor spent on a single page before either going to a new page or leaving your website.

Unless otherwise mentioned, data should be considered “raw”, meaning it has been unaltered by Chartbeat.

Note: All data of type STRING is UTF-8 encoded.

Page Dimensions

Name | ID | Description | Data Type
Host | host | The Chartbeat dashboard identifier that the page’s tracking code is instrumented to track. | STRING
Domain | domain | The DNS domain from which the tracking request was made. If the tracking request came from a subdomain of your site, this value will be the subdomain, not the top-level domain. | STRING
Page Path | path | A page’s full URL on the website, made up of the domain, path, and/or query parameters. | STRING
Page Title | page_title | The page's title. Multiple pages may have the same page title. | STRING
Page Authors | author | List of authors associated with the page. Returned as a comma-separated list. If no authors are specified, the value will be set as “no author.” | STRING
Page Sections | section | List of sections associated with the page. Returned as a comma-separated list. If no sections are specified, the value will be set as “no section.” | STRING
Page Content Type | content_type | Optional. The type of content represented on a page, for example, Gallery or Article. | STRING

Engagement Metrics

Name | ID | Description | Data Type
Engaged Time on Page | engaged_time_on_page_seconds | Time (in seconds) a user spent engaging with the page’s content. | NUMERIC
Total Time on Page | total_time_on_page_seconds | Time (in seconds) a user spent on this page. | NUMERIC
Scroll Depth | scrolldepth | The furthest down the page the bottom of the user’s viewport reached, in pixels. | NUMERIC

User Dimensions

Name | ID | Description | Data Type
New Visitor | new_user | A boolean indicating whether the user is a New Visitor or Returning Visitor. | BOOLEAN
Days Since Last Visit | recency | The number of days that have elapsed since the user last visited this property. | NUMERIC
User Visit Frequency | frequency | The number of days out of the last 16 in which the user visited this property. Used to calculate user loyalty: 0 = New User; 1 - 7 = Returning User; 8+ = Loyal User. | NUMERIC
Subscription Status | subscriber_acct | Optional. An enumerated value indicating if the user is subscribed to the publication: 0 = Guest User; 1 = Registered User; 2 = Subscribed User. | NUMERIC
User ID Synchronizer | id_sync | Optional. An arbitrary key-value store for user identification information. Example values may include a payer ID used by payment/subscription services or a unique identifier set by various advertising platforms. | JSON
Google Analytics Client ID | ga_client_id | Optional. A unique value used in Google Analytics to identify if different browser sessions came from the same user. Requires implementation of Chartbeat’s User ID Synchronizer and is derived from the idSync.ga field in the Chartbeat tracker. | STRING
Login ID | login_id | Optional. Any unique value to distinctly identify a user across browsers and devices. Requires implementation of Chartbeat’s User ID Synchronizer and is derived from the idSync.l field in the Chartbeat tracker. | STRING
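The enumerated values listed above under User Dimensions can be turned into readable labels with a small lookup. This is a minimal sketch: the function names and the "Unknown" fallback label are hypothetical, and the thresholds simply mirror the documented ranges for frequency and subscription status.

```python
def loyalty_segment(frequency: int) -> str:
    """Map the frequency field to the loyalty labels listed for that field."""
    if frequency <= 0:
        return "New User"
    if frequency <= 7:
        return "Returning User"
    return "Loyal User"


# Labels for the optional subscription status field, as enumerated above.
SUBSCRIPTION_STATUS = {0: "Guest User", 1: "Registered User", 2: "Subscribed User"}


def subscription_label(raw_value: str) -> str:
    """Translate the raw CSV value; the field is optional, so it may be empty."""
    if raw_value == "":
        return "Unknown"  # hypothetical label for a missing value
    return SUBSCRIPTION_STATUS.get(int(raw_value), "Unknown")


# The sample row has frequency 16 and an empty subscription status.
print(loyalty_segment(16), "/", subscription_label(""))
```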

Traffic Source Dimensions

Name | ID | Description | Data Type
External Referrer Path | external_referrer | The path of the referring URL (document.referrer) if the page containing the referring link is on a different domain than this piece of content. Content on a different subdomain than the referring content is also classified as an external referrer: if a reader clicks a link on blog.mysite.com that leads to a page on mysite.com, the referring URL is considered external. | STRING
Internal Referrer Path | internal_referrer | The path of the referring URL (document.referrer) if the page containing the referring link is on the same domain as this piece of content. To be considered internal, the referring page and this piece of content must share not only the same top-level domain but also the same subdomain: if a reader clicks a link on blog.mysite.com that leads to a page on mysite.com, the referring URL is considered external, not internal. | STRING
Distribution Channel | distribution | The distribution channel used to access the content. Supported distribution channels include Native App, Google AMP, Facebook Instant Article, and Apple News. | STRING
Campaign Name | utm_campaign | The value of the utm_campaign tracking parameter. Use to identify a specific product promotion or strategic campaign. | STRING
Campaign Medium | utm_medium | The value of the utm_medium tracking parameter. Use to identify a type of referral, such as email. | STRING
Campaign Source | utm_source | The value of the utm_source tracking parameter. Use to identify the source of a referral, such as a search engine, newsletter name, or another source. | STRING
Campaign Term | utm_term | The value of the utm_term tracking parameter. Use to note the keywords for an ad in a paid search campaign. | STRING
Campaign Content | utm_content | The value of the utm_content tracking parameter. Use to differentiate ads or links that point to the same URL. | STRING
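The internal versus external rule described for the referrer fields comes down to an exact hostname comparison. The sketch below illustrates that rule only; it is not Chartbeat's implementation, the function name is hypothetical, and it assumes both arguments are full URLs including a scheme.

```python
from urllib.parse import urlsplit


def classify_referrer(referrer_url: str, page_url: str) -> str:
    """Return "internal" only when both URLs share the exact same hostname."""
    ref_host = urlsplit(referrer_url).hostname
    page_host = urlsplit(page_url).hostname
    if not ref_host:
        return "none"  # no referrer was available
    # A different subdomain counts as a different host, hence external.
    return "internal" if ref_host == page_host else "external"


print(classify_referrer("https://blog.mysite.com/post", "https://mysite.com/news/1"))
print(classify_referrer("https://mysite.com/", "https://mysite.com/news/1"))
```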

Platform or Device

Name | ID | Description | Data Type
Device Type | device | The type of device: desktop, tablet, or mobile. | STRING
Page Width | page_width | The initial width, in pixels, of all content on the page. | NUMERIC
Page Height | page_height | The initial height, in pixels, of all content on the page, including content not currently in the user’s viewport. | NUMERIC
Viewport Height | window_height | The initial height, in pixels, of the user’s viewport. | NUMERIC
Maximum Scroll Top | max_scroll_position_top | The furthest down the page the top of the user’s viewport reached, in pixels. | NUMERIC
Browser Cookies Disabled | no_client_storage | A boolean indicator of whether the user has cookies disabled in their browser. | BOOLEAN
User Agent | user_agent | Additional browser and device information, including browser, browser version, operating system, and operating system version. | STRING
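The geometry fields can be combined with scrolldepth to estimate how much of a page a reader saw. The sketch below is one possible derivation, not a Chartbeat-defined metric: `scroll_summary` and its output keys are hypothetical names, and the sum of scroll top and viewport height is only an approximation because both heights are recorded as initial values.

```python
def scroll_summary(row: dict) -> dict:
    """Derive scroll statistics from the raw string values of one row."""
    page_height = float(row["page_height"])
    window_height = float(row["window_height"])
    scroll_top = float(row["max_scroll_position_top"])
    scroll_bottom = float(row["scrolldepth"])
    return {
        # Share of the page's initial height that reached the viewport, capped at 1.
        "fraction_seen": min(scroll_bottom / page_height, 1.0) if page_height else None,
        # Deepest top edge plus viewport height approximates the deepest bottom edge.
        "estimated_bottom": scroll_top + window_height,
    }


# Values taken from the sample row.
sample = {
    "page_height": "768.0",
    "window_height": "768",
    "max_scroll_position_top": "0.0",
    "scrolldepth": "768",
}
summary = scroll_summary(sample)
print(summary)
```

In the sample row the page is exactly one viewport tall, so the reader saw the whole page without scrolling.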

Geography

Name | ID | Description | Data Type
Continent | continent_name | Users' continent, derived from users' IP addresses. | STRING
Country | country_name | Users' country, derived from users' IP addresses. | STRING
Region | region_name | Users' region, derived from users' IP addresses. In the U.S., a region is a state, New York, for example. | STRING
Metro ID | dma_code | The three-digit Designated Market Area (DMA) code from where traffic arrived, derived from users' IP addresses. | STRING
City | city_name | Users' city, derived from their IP addresses. | STRING
Country ISO Code | country_code | Users' country's ISO code (in ISO-3166-1 alpha-2 format), derived from their IP addresses. For example, BR for Brazil, CA for Canada. | STRING

Site Timing

Name | ID | Description | Data Type
Page Load Time (ms) | page_load_time | Total time (in milliseconds), from pageview initiation (e.g., a click on a page link) to page load completion in the browser. | NUMERIC

Time

Name | ID | Description | Data Type
Time of Last Ping | last_ping_timestamp | The time the last ping in this user’s session was fired, in Unix Time. | TIME
UTC Offset | utc_offset_minutes | Offset, in minutes, between the user’s timezone and UTC. This can be used to localize the session to when the reader was accessing content in their local time. | NUMERIC
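Localizing a session as described can be done with Python's `datetime` module. This is a minimal sketch using the values from the sample row; the function name is hypothetical, and it assumes a negative offset means the reader is behind UTC, which matches the sample (a New York reader with an offset of -240).

```python
from datetime import datetime, timedelta, timezone


def local_ping_time(last_ping_timestamp: str, utc_offset_minutes: str) -> datetime:
    """Convert the Unix timestamp to the reader's local time using the UTC offset."""
    utc_time = datetime.fromtimestamp(int(last_ping_timestamp), tz=timezone.utc)
    reader_tz = timezone(timedelta(minutes=int(utc_offset_minutes)))
    return utc_time.astimezone(reader_tz)


# Values taken from the sample row.
local = local_ping_time("1571314031", "-240")
print(local.isoformat())  # 2019-10-17T08:07:11-04:00
```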

Name | ID | Description | Data Type
Sponsor ID | sponsor | A unique Sponsor ID. Use to identify the content sponsor or sponsored campaign. | STRING

Additional Metadata

Name | ID | Description | Data Type
Organization Account ID | account_id | Your organization’s Chartbeat account ID. | NUMERIC
Session ID | page_session_id | A unique value used to identify all tracking requests which make up the viewing session for this page by this user. A new session identifier is generated for each page that a user visits on your site. | STRING
Cookie ID | cookie_id | A unique value used to identify all tracking requests made for this user. This value is not fully persistent. Cookie identifiers are regenerated for users every 30 days. | STRING
Virtual Page | virtual_page | A boolean value indicating if this page session was the result of a virtual page. This will often be true if your website is a single-page app or has infinite scroll article views. | BOOLEAN
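Because each row is one page session and cookie_id identifies the visitor, counting rows per cookie_id gives page views per visitor. The rows below are made up for illustration, and the variable names are hypothetical. Since cookie identifiers are regenerated every 30 days, counts over longer windows would split one visitor across several IDs.

```python
from collections import Counter

# Made-up rows; only the two identifier columns are shown.
rows = [
    {"cookie_id": "A1", "page_session_id": "s1"},
    {"cookie_id": "A1", "page_session_id": "s2"},
    {"cookie_id": "B2", "page_session_id": "s3"},
]

# One row per page session, so rows per cookie_id equals page views per visitor.
views_per_visitor = Counter(row["cookie_id"] for row in rows)
print(views_per_visitor.most_common())
```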