Additional Page Metadata

Learn how we're beginning to collect metadata about your content to power our next-generation products, and get a head start by configuring this metadata in your page HTML.

Last updated 1 year ago

Introduction

Important: We automatically collect metadata from your webpages as described below, and we're using it to inform future product development at Chartbeat. Some of this data is now used in our Topics and Categories tabs in the Historical Dashboard.

We rely on your integration of our JavaScript configuration variables to populate key page properties in your Chartbeat dashboards and reports, like page sections, authors, paths, and titles.

To collect metadata about the content of a given page, Chartbeat uses a web scraper that makes an HTTP GET request to the page’s URL, then extracts certain attributes from the page’s HTML as well as the main body text. For convenience and ease of implementation, we rely on existing standards as much as possible. Metadata may be provided in the following formats:

  • JSON-LD (recommended): Embed metadata in a <script> tag in the <head> of an HTML document, with properties and values in JSON form following the schema.org vocabulary. This standard is used by Google and others to display enriched content in search results.

  • Microdata: Embed metadata within existing elements of an HTML document's <head> or <body>, using HTML tag properties and values following the schema.org vocabulary. This standard is also used by Google and others to display enriched content in search results.

  • Repeated <meta> tags: Embed metadata in multiple <meta> tags in the <head> of an HTML document, using HTML tag properties and values following the Open Graph protocol. This standard is used by Facebook and others to display enriched content in social media posts.

JSON-LD

The JSON-LD format embeds structured data in JSON form following a flexible, expressive vocabulary developed and maintained by schema.org. Chartbeat mostly follows the official specification, but allows alternate forms of some properties for your convenience. For more info, see the official JSON-LD site and Google's structured data guide.

Examples

Here's a basic example that follows the standard spec:

<script type="application/ld+json">
  {
    "@context": "http://schema.org",
    "@type": "NewsArticle",
    "headline": "Data, platforms, and subscriptions: What you read (by Total Engaged Time) in 2019",
    "description": "We dove into our own Historical Dashboard to understand the topics that garnered the most total engaged time on the Chartbeat blog in 2019. See more on our website.",
    "articleBody": "In December, we showed you where readers spent over 294 billion minutes of Total Engaged Time through our 100 Most Engaging Stories...",
    "url": "https://blog.chartbeat.com/2020/01/16/total-engaged-time-chartbeat-blog-data/",
    "datePublished": "2020-01-16T11:07:35-05:00",
    "articleSection": "Blog,Customer",
    "author": {
        "@type": "Person",
        "name": "Nick Lioudis"
    },
    "publisher": {
        "@type": "Organization",
        "name": "Chartbeat"
    },
    "keywords": "2019,audience,data,engagement",
    "thumbnailUrl": "http://4ehuia1v75h912e6wht7ul1m-wpengine.netdna-ssl.com/wp-content/uploads/2017/09/Historical@2x.png"
  }
</script>

And here's the same example, using alternate properties and Chartbeat's more permissive spec:

<script type="application/ld+json">
  {
    "@context": "http://schema.org",
    "@type": "NewsArticle",
    "headline": "Data, platforms, and subscriptions: What you read (by Total Engaged Time) in 2019",
    "description": "We dove into our own Historical Dashboard to understand the topics that garnered the most total engaged time on the Chartbeat blog in 2019. See more on our website.",
    "articleBody": "In December, we showed you where readers spent over 294 billion minutes of Total Engaged Time through our 100 Most Engaging Stories...",
    "mainEntityOfPage": {
        "@type": "WebPage",
        "@id": "https://blog.chartbeat.com/2020/01/16/total-engaged-time-chartbeat-blog-data/"
    },
    "dateCreated": "2020-01-16 11:07:35-05:00",
    "articleSection": ["Blog", "Customer"],
    "creator": ["Nick Lioudis"],
    "publisher": "Chartbeat",
    "keywords": ["2019", "audience", "data", "engagement"],
    "image": {
        "@type": "ImageObject",
        "url": "http://4ehuia1v75h912e6wht7ul1m-wpengine.netdna-ssl.com/wp-content/uploads/2017/09/Historical@2x.png"
    }
  }
</script>
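
Before publishing, it can help to confirm that each JSON-LD block on a page parses as strict JSON. The sketch below uses only Python's standard library. The names `JsonLdCollector`, `extract_json_ld`, and `SAMPLE_HTML` are illustrative assumptions and are not part of Chartbeat's tooling; Python's `json` module is strict and rejects, for example, trailing commas.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collects the raw text of every <script type="application/ld+json"> block."""

    def __init__(self):
        super().__init__()
        self._in_json_ld = False
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_json_ld = True
            self.blocks.append("")

    def handle_data(self, data):
        # Script content may arrive in several chunks, so accumulate it.
        if self._in_json_ld:
            self.blocks[-1] += data

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_json_ld = False


def extract_json_ld(html):
    """Return the parsed JSON-LD objects; raises json.JSONDecodeError on invalid JSON."""
    collector = JsonLdCollector()
    collector.feed(html)
    return [json.loads(block) for block in collector.blocks]


# Hypothetical page fragment used only for this illustration.
SAMPLE_HTML = """
<html><head>
<script type="application/ld+json">
  {
    "@context": "http://schema.org",
    "@type": "NewsArticle",
    "headline": "Data, platforms, and subscriptions",
    "author": {"@type": "Person", "name": "Nick Lioudis"}
  }
</script>
</head></html>
"""

if __name__ == "__main__":
    for item in extract_json_ld(SAMPLE_HTML):
        print(item["@type"], "-", item["headline"])
```

Running a check like this in a build or QA step surfaces syntax errors early, independent of how any particular consumer of the markup handles them.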

Properties

Required properties:

Key properties:

Additional properties:

Microdata

Examples

Here's a basic example that follows the standard spec:

<html>
    <body>
        <div itemscope itemtype="http://schema.org/NewsArticle">
            <h1 itemprop="headline">Data, platforms, and subscriptions: What you read (by Total Engaged Time) in 2019</h1>
            <span itemprop="datePublished" content="2020-01-16T11:07:35-05:00">2020-01-16T11:07:35-05:00</span>
            <span itemprop="description">We dove into our own Historical Dashboard to understand the topics that garnered the most total engaged time on the Chartbeat blog in 2019. See more on our website.</span><br>
            <div itemprop="image" itemscope itemtype="http://schema.org/ImageObject">
                <meta itemprop="url" content="http://4ehuia1v75h912e6wht7ul1m-wpengine.netdna-ssl.com/wp-content/uploads/2017/09/Historical@2x.png">
                <img src="http://4ehuia1v75h912e6wht7ul1m-wpengine.netdna-ssl.com/wp-content/uploads/2017/09/Historical@2x.png">
            </div>
            Author: <span itemprop="author">Nick Lioudis</span><br>
            <div itemprop="publisher" itemscope itemtype="http://schema.org/Organization">
                <span itemprop="name">Chartbeat</span>
            </div>
            <span itemprop="articleBody">In December, we showed you where readers spent over 294 billion minutes of Total Engaged Time through our 100 Most Engaging Stories...</span>
        </div>
    </body>
</html>

Properties

Since this format follows the same schema.org vocabulary as JSON-LD, it uses the same properties as described above for JSON-LD. However, those properties are incorporated into HTML differently, through the following attributes:

  • itemprop: Adds a property to an HTML element.

  • itemscope: Marks an HTML element as a new item. Works in conjunction with itemtype, which specifies the type of that item.

  • itemtype: Provides the URL of the (schema.org) vocabulary item associated with the HTML element, and gives the context for its constituent itemprop properties.
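
To see how these attributes translate into property values, here is a minimal sketch that reads `itemprop` values from an HTML fragment with Python's standard library. It is a simplification for illustration, not Chartbeat's extraction logic: the names `ItempropCollector` and `extract_itemprops` are hypothetical, a `content` attribute is preferred over the element's text when present, and elements that also carry `itemscope` are skipped so that their child properties supply the values.

```python
from html.parser import HTMLParser


class ItempropCollector(HTMLParser):
    """Collects (itemprop, value) pairs in document order."""

    def __init__(self):
        super().__init__()
        self.props = []
        self._open = None  # [name, tag, text] for the text property being read

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        name = attrs.get("itemprop")
        if name is None or "itemscope" in attrs:
            return  # not a property, or a nested item whose children hold the values
        if attrs.get("content") is not None:
            self.props.append((name, attrs["content"]))
        else:
            self._open = [name, tag, ""]

    def handle_data(self, data):
        if self._open is not None:
            self._open[2] += data

    def handle_endtag(self, tag):
        # Simplification: assumes the property element has no same-named child tags.
        if self._open is not None and tag == self._open[1]:
            self.props.append((self._open[0], self._open[2].strip()))
            self._open = None


def extract_itemprops(html):
    collector = ItempropCollector()
    collector.feed(html)
    return collector.props


# Hypothetical fragment modeled on the Microdata example in this section.
SAMPLE_HTML = """
<div itemscope itemtype="http://schema.org/NewsArticle">
  <h1 itemprop="headline">Example headline</h1>
  <span itemprop="datePublished" content="2020-01-16T11:07:35-05:00">Jan 16</span>
  Author: <span itemprop="author">Nick Lioudis</span>
  <div itemprop="publisher" itemscope itemtype="http://schema.org/Organization">
    <span itemprop="name">Chartbeat</span>
  </div>
</div>
"""

if __name__ == "__main__":
    for name, value in extract_itemprops(SAMPLE_HTML):
        print(f"{name}: {value}")
```

The flat list of pairs loses the nesting between an item and its sub-items, which is acceptable for a quick QA check but not for a full Microdata parser.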

Repeated <meta> tags

Examples

<html>
    <head>
        <meta property="og:type" content="article">
        <meta property="og:title" content="Data, platforms, and subscriptions: What you read (by Total Engaged Time) in 2019">
        <meta property="og:url" content="https://blog.chartbeat.com/2020/01/16/total-engaged-time-chartbeat-blog-data/">
        <meta property="og:image" content="http://4ehuia1v75h912e6wht7ul1m-wpengine.netdna-ssl.com/wp-content/uploads/2017/09/Historical@2x.png">
        <meta property="article:published_time" content="2020-01-16T11:07:35-05:00">
        <meta property="article:author" content="Nick Lioudis">
        <meta property="og:description" content="We dove into our own Historical Dashboard to understand the topics that garnered the most total engaged time on the Chartbeat blog in 2019. See more on our website">
        <meta property="og:site_name" content="Chartbeat">
    </head>
</html>

Properties

Required properties:

Key properties:

Additional properties:

Web scraper

Chartbeat uses a web scraper to collect metadata about the pages tracked by Chartbeat's main pinger. For each page, the scraper makes a standard HTTP GET request to the URL and downloads the raw HTML, from which metadata is then extracted. Page metadata is joined to the visitor engagement data collected by our pinger using the host and path fields (aka h and p keys).
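
The join on host and path can be pictured with the sketch below. It is only an illustration under stated assumptions: the helper names `join_key` and `join_rows`, the `engaged_seconds` field, and the row shapes are hypothetical, and any URL normalization that Chartbeat applies before joining is not described here.

```python
from urllib.parse import urlsplit


def join_key(url):
    """Derive a (host, path) key from a URL; query string and fragment are ignored."""
    parts = urlsplit(url)
    return (parts.hostname or "", parts.path or "/")


def join_rows(engagement_rows, metadata_by_key):
    """Attach scraped metadata to engagement rows that share the same h and p values."""
    joined = []
    for row in engagement_rows:
        metadata = metadata_by_key.get((row["h"], row["p"]), {})
        joined.append({**row, **metadata})
    return joined


# Hypothetical data for illustration only.
ARTICLE_URL = "https://blog.chartbeat.com/2020/01/16/total-engaged-time-chartbeat-blog-data/"
METADATA = {join_key(ARTICLE_URL): {"headline": "Data, platforms, and subscriptions"}}
ENGAGEMENT = [
    {
        "h": "blog.chartbeat.com",
        "p": "/2020/01/16/total-engaged-time-chartbeat-blog-data/",
        "engaged_seconds": 42,
    }
]

if __name__ == "__main__":
    print(join_rows(ENGAGEMENT, METADATA))
```

One practical consequence of a host-and-path join is that the page URL the scraper fetches and the h and p values your tracking code reports need to refer to the same page.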

In general, Chartbeat's web scraper will only visit pages once, shortly after their first appearance in our engagement data pipeline. Load on your site's servers should be minimal; however, scraper activity will be higher at first, when many pages appear "new" to our system, before declining to match your site's usual publication rate.

The scraper identifies itself with the User-Agent Chartbeat-ContentX/0.3 (http://www.chartbeat.com). Note that the version number (e.g. 0.3) will change over time. To prevent unsuccessful scrapes and ensure that every page has associated metadata, please add this User-Agent to your site's whitelist.
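
Because the version number changes, a rule that matches the scraper should not hard-code 0.3. The following is a minimal sketch of a version-tolerant check. The function name `is_chartbeat_scraper` is hypothetical, and the pattern assumes the version stays in a dotted-number form, which this page does not guarantee.

```python
import re

# Matches "Chartbeat-ContentX/<version>" at the start of the header value.
# Assumption: the version is one or more dot-separated numbers (e.g. 0.3, 1.12).
SCRAPER_UA = re.compile(r"^Chartbeat-ContentX/\d+(\.\d+)*\b")


def is_chartbeat_scraper(user_agent):
    """Return True if the User-Agent header looks like Chartbeat's metadata scraper."""
    return bool(SCRAPER_UA.match(user_agent or ""))


if __name__ == "__main__":
    print(is_chartbeat_scraper("Chartbeat-ContentX/0.3 (http://www.chartbeat.com)"))
    print(is_chartbeat_scraper("Mozilla/5.0 (X11; Linux x86_64)"))
```

A User-Agent header can be set by any client, so a match on its own does not prove a request came from Chartbeat.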

Chartbeat regularly tries to visit pages where we detect significant traffic in order to examine their content. You may notice traffic on your pages from Chartbeat (this traffic will not appear in your Chartbeat dashboards). You can identify our visits by their source IP addresses, listed below:

52.200.230.127
35.174.236.164
44.211.104.80
54.159.123.51
52.44.209.155
52.0.154.147
50.17.188.178
44.210.22.85
54.242.4.200

Please note that while this list does not regularly change, it may be expanded in the future due to infrastructure improvements.
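
As one way to recognize these visits in server-side code, the sketch below checks a request's source address against the list above using Python's `ipaddress` module. The names `CHARTBEAT_SCRAPER_IPS` and `is_chartbeat_ip` are hypothetical, and the set would need updating if the list is expanded.

```python
import ipaddress

# Addresses copied from the list above; keep in sync if the list is expanded.
CHARTBEAT_SCRAPER_IPS = frozenset(
    ipaddress.ip_address(ip)
    for ip in (
        "52.200.230.127", "35.174.236.164", "44.211.104.80",
        "54.159.123.51", "52.44.209.155", "52.0.154.147",
        "50.17.188.178", "44.210.22.85", "54.242.4.200",
    )
)


def is_chartbeat_ip(remote_addr):
    """Return True if remote_addr is one of the listed scraper addresses."""
    try:
        return ipaddress.ip_address(remote_addr) in CHARTBEAT_SCRAPER_IPS
    except ValueError:
        # Not a syntactically valid IP address.
        return False


if __name__ == "__main__":
    print(is_chartbeat_ip("52.200.230.127"))
    print(is_chartbeat_ip("203.0.113.7"))
```

If your site sits behind a proxy or CDN, the address to check is the original client address as reported by that layer, not the proxy's own address.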

JSON-LD properties

@context: Collection in which the schema is defined. Should always be "http://schema.org" or "https://schema.org".

@type: Type of schema used. For articles, should be one of Article, TechArticle, NewsArticle (or one of its sub-types), or BlogPosting (or one of its sub-types).

articleSection (Text): One or more sections assigned to the article. Multiple sections may be expressed as a single string delimited by commas or as an array of strings.

author (Text, Person, or Organization): One or more authors of the article. If not specified, the creator property is checked. Multiple authors may be expressed as a single string delimited by commas, an array of strings, or an array of Person or Organization items.

headline (Text): Title of the article. If not specified, the alternativeHeadline and name properties are checked, in that order.

datePublished (Date or DateTime): Date and (optionally) time the article was first published. If not specified, the dateCreated property is checked. This value shouldn't change over time, and should be the same as or earlier than the dateModified property.

keywords (Text): One or more subject matter tags associated with the article. Multiple keywords may be expressed as a single string delimited by commas or as an array of strings.

thumbnailUrl (Text): URL of the main image associated with the article. If not specified, the image property (Text or ImageObject) is checked.

url (Text): Canonical URL of the article.

articleBody (Text): Full text of the article. If not specified, the text property is checked.

dateModified (Date or DateTime): Date and (optionally) time the article was last modified.

description (Text): Brief description of the article.

lang (Text): Language code or locale of the article's content.

publisher (Text, Person, or Organization): Name of the article's publisher.

wordCount (Number): Number of words in the full text of the article.
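
The fallback and multi-value rules described for these properties can be sketched as a small normalization step. This is an illustration of the rules as written on this page, not Chartbeat's implementation: the helper names and the output keys (`title`, `authors`, and so on) are hypothetical.

```python
def first_present(item, *names):
    """Return the value of the first property in names that is set and non-empty."""
    for name in names:
        value = item.get(name)
        if value:
            return value
    return None


def as_list(value):
    """Normalize a comma-delimited string, a single item, or an array into a list."""
    if value is None:
        return []
    if isinstance(value, str):
        return [part.strip() for part in value.split(",") if part.strip()]
    if isinstance(value, dict):  # a single Person or Organization item
        value = [value]
    names = []
    for entry in value:
        name = entry.get("name") if isinstance(entry, dict) else entry
        if name:
            names.append(name)
    return names


def image_url(value):
    """Accept either a plain URL string or an ImageObject with a url property."""
    return value.get("url") if isinstance(value, dict) else value


def normalize(item):
    return {
        "title": first_present(item, "headline", "alternativeHeadline", "name"),
        "authors": as_list(first_present(item, "author", "creator")),
        "sections": as_list(item.get("articleSection")),
        "keywords": as_list(item.get("keywords")),
        "published": first_present(item, "datePublished", "dateCreated"),
        "image": image_url(first_present(item, "thumbnailUrl", "image")),
        "body": first_present(item, "articleBody", "text"),
    }


# Two hypothetical inputs: one in the standard form, one in the alternate form.
STANDARD = {
    "headline": "Example headline",
    "articleSection": "Blog,Customer",
    "author": {"@type": "Person", "name": "Nick Lioudis"},
    "keywords": "2019,audience",
    "thumbnailUrl": "http://example.com/image.png",
}
ALTERNATE = {
    "name": "Example headline",
    "articleSection": ["Blog", "Customer"],
    "creator": ["Nick Lioudis"],
    "keywords": ["2019", "audience"],
    "image": {"@type": "ImageObject", "url": "http://example.com/image.png"},
}

if __name__ == "__main__":
    print(normalize(STANDARD) == normalize(ALTERNATE))
```

Both inputs reduce to the same normalized record, which is the practical meaning of the alternate forms being accepted.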

The Microdata format embeds structured data into existing HTML elements throughout the document using tag properties specified by the schema.org vocabulary. Chartbeat mostly follows the official specification, but allows alternate forms of some properties for your convenience. For more info, see the official Microdata specification and Google's structured data guide.

Structured data may be embedded into an HTML document with repeated <meta> tags in the <head> element, identified with tag properties specified by the Open Graph protocol. Chartbeat mostly follows the official specification, but allows alternate forms of some properties for your convenience. This format was designed to incorporate webpages into a "social graph", and as such, includes fewer properties in more restrictive data structures.

Repeated <meta> tag properties

og:type (enum): Type of article within a social graph. Must be one of article, blog, or website.

og:image (url): URL of the image used to represent the article within a social graph. If not specified, og:image:url is checked.

og:title (string): Title of the article.

og:url (url): Canonical URL of the article that can be used as its permanent ID in a social graph.

article:author (string or profile): One or more authors of the article.

article:modified_time (datetime): Date and (optionally) time the article was last modified. If not specified, og:updated_time is checked.

article:published_time (datetime): Date and (optionally) time the article was first published. If not specified, og:pubdate is checked.

article:section (string): One or more sections assigned to the article. Multiple sections may be expressed as a single string delimited by commas.

article:tag (string array): One or more subject matter tags associated with the article. Multiple tags may be expressed as a single string delimited by commas.

og:description (string): Brief description of the article.

article:content_tier (enum): Access tier assigned to the article by its publisher. Should be one of free, metered, or locked.

og:locale (string): Locale in which the article's tags are marked up. Should be formatted as language_TERRITORY, e.g. en_US.

og:site_name (string): Name of the site on which the article appears. If not specified, article:publisher is checked.

og:video (url): URL of the video used to represent the article within a social graph. If not specified, og:video:url is checked.
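
The fallbacks named in these descriptions can be expressed as a lookup table applied after the <meta> tags are read. The sketch below is an illustration under stated assumptions, not Chartbeat's scraper: `MetaCollector`, `read_open_graph`, and `FALLBACKS` are hypothetical names, and every property is kept as a list because tags may repeat.

```python
from html.parser import HTMLParser

# Primary property -> alternates checked when the primary is absent,
# taken from the property descriptions above.
FALLBACKS = {
    "og:image": ["og:image:url"],
    "og:video": ["og:video:url"],
    "og:site_name": ["article:publisher"],
    "article:modified_time": ["og:updated_time"],
    "article:published_time": ["og:pubdate"],
}


class MetaCollector(HTMLParser):
    """Collects content values of <meta property="..."> tags in document order."""

    def __init__(self):
        super().__init__()
        self.values = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        prop, content = attrs.get("property"), attrs.get("content")
        if prop and content:
            self.values.setdefault(prop, []).append(content)


def read_open_graph(html):
    collector = MetaCollector()
    collector.feed(html)
    resolved = dict(collector.values)
    for primary, alternates in FALLBACKS.items():
        if primary in resolved:
            continue
        for alternate in alternates:
            if alternate in collector.values:
                resolved[primary] = collector.values[alternate]
                break
    return resolved


# Hypothetical <head> fragment for illustration only.
SAMPLE_HTML = """
<head>
  <meta property="og:type" content="article">
  <meta property="og:title" content="Example headline">
  <meta property="og:image:url" content="http://example.com/image.png">
  <meta property="og:pubdate" content="2020-01-16T11:07:35-05:00">
  <meta property="article:tag" content="data">
  <meta property="article:tag" content="audience">
</head>
"""

if __name__ == "__main__":
    for name, values in read_open_graph(SAMPLE_HTML).items():
        print(name, values)
```

Keeping repeated tags as lists preserves multi-valued properties such as article:tag without special-casing them.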

You can also increase our ability to detect data for our Topics and Categories tabs and additional metadata-driven features (such as the word count pivot in the Historical Dashboard) by allowing traffic from our IP addresses to bypass any features that obfuscate the content of the page (e.g. paywalls).
