Additional Page Metadata

Learn how we're beginning to collect metadata about your content to power our next generation products, and get a head start by configuring this metadata in your page HTML.

Introduction

Important: We automatically collect metadata from your webpages as described below, and we're using it to inform future product development at Chartbeat. Some of this data is now being used in our Topics and Categories tabs in the Historical Dashboard.

We rely on your integration of our JavaScript configuration variables to populate key page properties in your Chartbeat dashboards and reports, like page sections, authors, paths, and titles.

To collect metadata about the content of a given page, Chartbeat uses a web scraper that makes an HTTP GET request to the page’s URL, then extracts certain attributes from the page’s HTML as well as the main body text. For convenience and ease of implementation, we rely on existing standards as much as possible. Metadata may be provided in the following formats:

  • JSON-LD (recommended): Embed metadata in a <script> tag in the <head> of an HTML document, with properties and values in JSON form following the schema.org vocabulary. This standard is used by Google and others to display enriched content in search results.

  • Microdata: Embed metadata within existing elements of an HTML document's <head> or <body>, using HTML tag properties and values following the schema.org vocabulary. This standard is also used by Google and others to display enriched content in search results.

  • Repeated <meta> tags: Embed metadata in multiple <meta> tags in the <head> of an HTML document, using HTML tag properties and values following the Open Graph protocol. This standard is used by Facebook and others to display enriched content in social media posts.

JSON-LD

The JSON-LD format embeds structured data in JSON form following a flexible, expressive vocabulary developed and maintained by schema.org. Chartbeat mostly follows the official specification, but allows alternate forms of some properties for your convenience. For more info, see the official JSON-LD site and Google's structured data guide.

Examples

Here's a basic example that follows the standard spec:

<script type="application/ld+json">
  {
    "@context": "http://schema.org",
    "@type": "NewsArticle",
    "headline": "Data, platforms, and subscriptions: What you read (by Total Engaged Time) in 2019",
    "description": "We dove into our own Historical Dashboard to understand the topics that garnered the most total engaged time on the Chartbeat blog in 2019. See more on our website.",
    "articleBody": "In December, we showed you where readers spent over 294 billion minutes of Total Engaged Time through our 100 Most Engaging Stories...",
    "url": "https://blog.chartbeat.com/2020/01/16/total-engaged-time-chartbeat-blog-data/",
    "datePublished": "2020-01-16T11:07:35-05:00",
    "articleSection": "Blog,Customer",
    "author": {
        "@type": "Person",
        "name": "Nick Lioudis"
    },
    "publisher": {
        "@type": "Organization",
        "name": "Chartbeat"
    },
    "keywords": "2019,audience,data,engagement",
    "thumbnailUrl": "http://4ehuia1v75h912e6wht7ul1m-wpengine.netdna-ssl.com/wp-content/uploads/2017/09/Historical@2x.png"
  }
</script>

And here's the same example, using alternate properties and Chartbeat's more permissive spec:

<script type="application/ld+json">
  {
    "@context": "http://schema.org",
    "@type": "NewsArticle",
    "headline": "Data, platforms, and subscriptions: What you read (by Total Engaged Time) in 2019",
    "description": "We dove into our own Historical Dashboard to understand the topics that garnered the most total engaged time on the Chartbeat blog in 2019. See more on our website.",
    "articleBody": "In December, we showed you where readers spent over 294 billion minutes of Total Engaged Time through our 100 Most Engaging Stories...",
    "mainEntityOfPage": {
        "@type": "WebPage",
        "@id": "https://blog.chartbeat.com/2020/01/16/total-engaged-time-chartbeat-blog-data/"
    },
    "dateCreated": "2020-01-16 11:07:35-05:00",
    "articleSection": ["Blog", "Customer"],
    "creator": ["Nick Lioudis"],
    "publisher": "Chartbeat",
    "keywords": ["2019", "audience", "data", "engagement"],
    "image": {
        "@type": "ImageObject",
        "url": "http://4ehuia1v75h912e6wht7ul1m-wpengine.netdna-ssl.com/wp-content/uploads/2017/09/Historical@2x.png"
    }
  }
</script>
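
To check your own markup before publishing, you can pull the JSON-LD blocks out of a page and confirm that they parse. The following is a minimal sketch using only Python's standard library; it is not Chartbeat's scraper, and the names JsonLdExtractor and extract_jsonld are made up for illustration. Python's json module is strict, so a block with a trailing comma is rejected and skipped here.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects the parsed contents of every JSON-LD <script> block."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._chunks = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._chunks = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.items.append(json.loads("".join(self._chunks)))
            except json.JSONDecodeError:
                pass  # invalid JSON (e.g. a trailing comma) is skipped


def extract_jsonld(html):
    """Return a list with one parsed object per valid JSON-LD block."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return parser.items
```

An empty result for a page that does contain a JSON-LD block usually means the JSON inside it is malformed.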

Properties

Required properties:

Key properties:

  • articleSection (Text): One or more sections assigned to the article. Multiple sections may be expressed as a single string delimited by commas or as an array of strings.

  • author (Text, Person, or Organization): One or more authors of the article. If not specified, the creator property is checked. Multiple authors may be expressed as a single string delimited by commas, an array of strings, or an array of Person or Organization items.

  • headline (Text): Title of the article. If not specified, the alternativeHeadline and name properties are checked, in that order.

  • datePublished (Date or DateTime): Date and (optionally) time the article was first published. If not specified, the dateCreated property is checked. This value shouldn't change over time, and should be the same as or earlier than the dateModified property.

  • keywords (Text): One or more subject matter tags associated with the article. Multiple keywords may be expressed as a single string delimited by commas or as an array of strings.

  • thumbnailUrl (Text): URL of the main image associated with the article. If not specified, the image property (Text or ImageObject) is checked.

  • url (Text): Canonical URL of the article.
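
The fallbacks and alternate forms described in this list can be sketched as a small normalization step. This is an illustration of the rules as written above, not Chartbeat's implementation; the function names and the output keys (sections, authors, title, and so on) are hypothetical. It only applies fallbacks that this list names, so it does not read a URL from mainEntityOfPage.

```python
def as_list(value):
    """Turn a comma-delimited string, a single item, or an array into a list."""
    if value is None:
        return []
    if isinstance(value, str):
        return [part.strip() for part in value.split(",") if part.strip()]
    if isinstance(value, dict):
        return [value]
    return list(value)


def name_of(item):
    """Return the name of a Person/Organization item, or the string itself."""
    if isinstance(item, dict):
        return item.get("name")
    return item


def first_present(item, *keys):
    """Return the value of the first key that is present and non-empty."""
    for key in keys:
        value = item.get(key)
        if value:
            return value
    return None


def normalize_article(item):
    """Map one parsed JSON-LD article object to a flat, uniform dict."""
    image = first_present(item, "thumbnailUrl", "image")
    if isinstance(image, dict):  # ImageObject form
        image = image.get("url")
    authors = [name_of(a) for a in as_list(first_present(item, "author", "creator"))]
    return {
        "sections": as_list(item.get("articleSection")),
        "authors": [a for a in authors if a],
        "title": first_present(item, "headline", "alternativeHeadline", "name"),
        "published": first_present(item, "datePublished", "dateCreated"),
        "keywords": as_list(item.get("keywords")),
        "image": image,
        "url": item.get("url"),
    }
```

With this sketch, an article that uses author as a Person item and one that uses creator as an array of strings produce the same authors list.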

Additional properties:

  • articleBody (Text): Full text of the article. If not specified, the text property is checked.

  • dateModified (Date or DateTime): Date and (optionally) time the article was last modified.

  • description (Text): Brief description of the article.

  • lang (Text): Language code or locale of the article's content.

  • publisher (Text, Person, or Organization): Name of the article's publisher.

  • wordCount (Number): Number of words in the full text of the article.
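
If you set both datePublished and dateModified, a quick check that they are in the expected order can catch template mistakes. The sketch below is one possible check, with a hypothetical function name. It assumes both values use the same form (both with a UTC offset, or both without), because Python cannot compare the two kinds directly, and it relies on datetime.fromisoformat, which accepts either a "T" or a space between date and time.

```python
from datetime import datetime


def dates_are_consistent(date_published, date_modified):
    """Return True when datePublished is not later than dateModified."""
    published = datetime.fromisoformat(date_published)
    modified = datetime.fromisoformat(date_modified)
    return published <= modified
```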

Microdata

The Microdata format embeds structured data into existing HTML elements throughout the document using tag properties specified by the schema.org vocabulary. Chartbeat mostly follows the official specification, but allows alternate forms of some properties for your convenience. For more info, see the official Microdata specification and Google's structured data guide.

Examples

Here's a basic example that follows the standard spec:

<html>
    <body>
        <div itemscope itemtype="http://schema.org/NewsArticle">
            <h1 itemprop="headline">Data, platforms, and subscriptions: What you read (by Total Engaged Time) in 2019</h1>
            <span itemprop="datePublished" content="2020-01-16T11:07:35-05:00">2020-01-16T11:07:35-05:00</span>
            <span itemprop="description">We dove into our own Historical Dashboard to understand the topics that garnered the most total engaged time on the Chartbeat blog in 2019. See more on our website.</span><br>
            <div itemprop="image" itemscope itemtype="http://schema.org/ImageObject">
                <meta itemprop="url" content="http://4ehuia1v75h912e6wht7ul1m-wpengine.netdna-ssl.com/wp-content/uploads/2017/09/Historical@2x.png">
                <img src="http://4ehuia1v75h912e6wht7ul1m-wpengine.netdna-ssl.com/wp-content/uploads/2017/09/Historical@2x.png">
            </div>
            Author: <span itemprop="author">Nick Lioudis</span><br>
            <div itemprop="publisher" itemscope itemtype="http://schema.org/Organization">
                <span itemprop="name">Chartbeat</span>
            </div>
            <span itemprop="articleBody">In December, we showed you where readers spent over 294 billion minutes of Total Engaged Time through our 100 Most Engaging Stories...</span>
        </div>
    </body>
</html>

Properties

Since this format follows the same schema.org vocabulary as JSON-LD, it uses the same properties as described above for JSON-LD. However, those properties are incorporated into HTML differently.

  • itemprop: Adds a property to an HTML element.

  • itemscope: Works in conjunction with itemtype to specify the type of item associated with a particular HTML element.

  • itemtype: Provides the URL of the (schema.org) vocabulary item associated with the HTML element, and gives the context for its constituent itemprop properties.
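
To see how these three attributes combine, here is a rough sketch of reading Microdata with Python's standard library. It is a simplification made for illustration, not the Microdata specification's algorithm or Chartbeat's scraper, and the class and function names are hypothetical. A property's value is taken from the content attribute if present, then from src or href, and otherwise from the element's text; properties of a nested item are prefixed with the parent property's name (for example publisher.name).

```python
from html.parser import HTMLParser

VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class MicrodataCollector(HTMLParser):
    """Collects itemprop values into a {property path: [values]} mapping."""

    def __init__(self):
        super().__init__()
        self.props = {}
        # Open elements: [tag, nested-item name or None, text property or None, text chunks]
        self._stack = []

    def _path(self, name):
        scopes = [entry[1] for entry in self._stack if entry[1]]
        return ".".join(scopes + [name])

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        name = attrs.get("itemprop")
        scope = text = None
        if name and "itemscope" in attrs:
            scope = name  # nested item: its properties get this prefix
        elif name:
            value = attrs.get("content") or attrs.get("src") or attrs.get("href")
            if value:
                self.props.setdefault(self._path(name), []).append(value)
            else:
                text = self._path(name)  # value is the element's text
        if tag not in VOID_TAGS:
            self._stack.append([tag, scope, text, []])

    def handle_data(self, data):
        for entry in self._stack:
            if entry[2]:
                entry[3].append(data)

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag, plus anything left open inside it.
        for i in range(len(self._stack) - 1, -1, -1):
            if self._stack[i][0] == tag:
                for _, _, text, chunks in reversed(self._stack[i:]):
                    if text:
                        self.props.setdefault(text, []).append("".join(chunks).strip())
                del self._stack[i:]
                return


def extract_microdata(html):
    parser = MicrodataCollector()
    parser.feed(html)
    parser.close()
    return parser.props
```

Applied to markup like the example above, this yields keys such as headline, author, image.url, and publisher.name.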

Repeated <meta> tags

Structured data may be embedded into an HTML document with repeated <meta> tags in the <head> element, identified with tag properties specified by the Open Graph protocol. Chartbeat mostly follows the official specification, but allows alternate forms of some properties for your convenience. This format was designed to incorporate webpages into a "social graph", and as such, includes fewer properties in more restrictive data structures.

Examples

<html>
    <head>
        <meta property="og:type" content="article">
        <meta property="og:title" content="Data, platforms, and subscriptions: What you read (by Total Engaged Time) in 2019">
        <meta property="og:url" content="https://blog.chartbeat.com/2020/01/16/total-engaged-time-chartbeat-blog-data/">
        <meta property="og:image" content="http://4ehuia1v75h912e6wht7ul1m-wpengine.netdna-ssl.com/wp-content/uploads/2017/09/Historical@2x.png">
        <meta property="article:published_time" content="2020-01-16T11:07:35-05:00">
        <meta property="article:author" content="Nick Lioudis">
        <meta property="og:description" content="We dove into our own Historical Dashboard to understand the topics that garnered the most total engaged time on the Chartbeat blog in 2019. See more on our website.">
        <meta property="og:site_name" content="Chartbeat">
    </head>
</html>

Properties

Required properties:

  • og:type (enum): Type of article within a social graph. Must be one of article, blog, or website.

  • og:image (url): URL of the image used to represent the article within a social graph. If not specified, og:image:url is checked.

  • og:title (string): Title of the article.

  • og:url (url): Canonical URL of the article that can be used as its permanent ID in a social graph.

Key properties:

  • article:author (string or profile): One or more authors of the article.

  • article:modified_time (datetime): Date and (optionally) time the article was last modified. If not specified, og:updated_time is checked.

  • article:published_time (datetime): Date and (optionally) time the article was first published. If not specified, og:pubdate is checked.

  • article:section (string): One or more sections assigned to the article. Multiple sections may be expressed as a single string delimited by commas.

  • article:tag (string array): One or more subject matter tags associated with the article. Multiple tags may be expressed as a single string delimited by commas.

  • og:description (string): Brief description of the article.

Additional properties:

  • article:content_tier (enum): Access tier assigned to article by its publisher. Should be one of free, metered, or locked.

  • og:locale (string): Locale in which the article's tags are marked up. Should be formatted as language_TERRITORY, e.g. en_US.

  • og:site_name (string): Name of the site on which the article appears. If not specified, article:publisher is checked.

  • og:video (url): URL of the video used to represent the article within a social graph. If not specified, og:video:url is checked.
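
The required properties and fallbacks listed above can be checked with a short script before you deploy a template change. This is a sketch under the rules stated in this section, not Chartbeat's own validation; read_open_graph and the constant names are hypothetical. Repeated tags (such as several article:tag entries) are kept as a list.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collects <meta property="..." content="..."> tags, keeping repeats."""

    def __init__(self):
        super().__init__()
        self.tags = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("property") and attrs.get("content"):
            self.tags.setdefault(attrs["property"], []).append(attrs["content"])


# Primary property -> property checked when the primary one is missing.
FALLBACKS = {
    "og:image": "og:image:url",
    "article:modified_time": "og:updated_time",
    "article:published_time": "og:pubdate",
    "og:site_name": "article:publisher",
    "og:video": "og:video:url",
}
REQUIRED = ("og:type", "og:image", "og:title", "og:url")
ALLOWED_TYPES = {"article", "blog", "website"}


def read_open_graph(html):
    """Return (tags, problems) for one page's Open Graph metadata."""
    parser = MetaCollector()
    parser.feed(html)
    parser.close()
    tags = dict(parser.tags)
    for primary, fallback in FALLBACKS.items():
        if primary not in tags and fallback in tags:
            tags[primary] = tags[fallback]
    problems = [f"missing {name}" for name in REQUIRED if name not in tags]
    og_type = tags.get("og:type", [None])[0]
    if og_type is not None and og_type not in ALLOWED_TYPES:
        problems.append(f"unsupported og:type: {og_type}")
    return tags, problems
```

An empty problems list means all four required properties were found, directly or through a fallback, and og:type has one of the accepted values.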

Web scraper

Chartbeat uses a web scraper to collect metadata about the pages tracked by Chartbeat's main pinger. For each page, the scraper makes a standard HTTP GET request to the URL and downloads the raw HTML, from which metadata is then extracted. Page metadata is joined to the visitor engagement data collected by our pinger using the host and path fields (aka h and p keys).
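
As a rough illustration of a join on host and path, the sketch below derives both fields from a URL and attaches metadata to matching engagement rows. The row shape (dicts with h and p keys) and the function names are assumptions made for this example, and Chartbeat's actual normalization of hosts and paths may differ.

```python
from urllib.parse import urlsplit


def join_key(url):
    """Return the (host, path) pair for a URL, ignoring query and fragment."""
    parts = urlsplit(url)
    return parts.hostname, parts.path


def join_metadata(engagement_rows, metadata_by_url):
    """Attach scraped metadata to engagement rows that share host and path."""
    by_key = {join_key(url): meta for url, meta in metadata_by_url.items()}
    return [
        {**row, "metadata": by_key.get((row["h"], row["p"]))}
        for row in engagement_rows
    ]
```

One consequence of such a join is that the URL the scraper fetched and the host and path your pages report need to agree for metadata to attach to the right page.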

In general, Chartbeat's web scraper will only visit pages once, shortly after their first appearance in our engagement data pipeline. Load on your site's servers should be minimal; however, scraper activity will be higher at first, when many pages appear "new" to our system, before declining to match your site's usual publication rate.

The scraper identifies itself with the User-Agent string Chartbeat-ContentX/0.3 (http://www.chartbeat.com). The version number (e.g. 0.3) will change over time. To prevent unsuccessful scrapes and ensure that every page has associated metadata, please add this User-Agent to your site's whitelist.
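
Because the version number changes, a rule that matches the exact string will eventually stop working. One way to match any version is a pattern like the one below; this is a sketch with a hypothetical function name, and how you apply it depends on your web server or firewall. User-Agent strings are easy to spoof, so consider pairing this check with the scraper's IP addresses.

```python
import re

# Matches "Chartbeat-ContentX/" followed by any dotted version number.
CHARTBEAT_SCRAPER_UA = re.compile(r"^Chartbeat-ContentX/\d+(\.\d+)*")


def is_chartbeat_scraper(user_agent):
    """Return True if the User-Agent header looks like Chartbeat's scraper."""
    return bool(CHARTBEAT_SCRAPER_UA.match(user_agent or ""))
```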

You can also improve our ability to detect data for our Topics and Categories tabs and for additional metadata-driven features (such as the word count pivot in the Historical Dashboard) by allowing traffic from our IP addresses to bypass any features that obscure the content of the page (e.g. paywalls).

Chartbeat regularly visits pages where we detect significant traffic in order to examine their content. You may notice this traffic on your site (it will not appear in your Chartbeat dashboards). You can identify our visits by their source IP addresses, listed below:

  • 52.200.230.127

  • 35.174.236.164

  • 44.211.104.80

  • 54.159.123.51

  • 52.44.209.155

  • 52.0.154.147

  • 50.17.188.178

  • 44.210.22.85

  • 54.242.4.200

Please note that while this list does not regularly change, it may be expanded in the future due to infrastructure improvements.
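
If your paywall or bot protection is implemented in application code, the bypass can be as simple as a membership check against the addresses above. The sketch below uses Python's ipaddress module; the function name is hypothetical, and the list is copied from this page, so it would need updating if the list is expanded.

```python
import ipaddress

# Addresses copied from the list above; review them if the list is expanded.
CHARTBEAT_SCRAPER_IPS = frozenset(ipaddress.ip_address(ip) for ip in (
    "52.200.230.127", "35.174.236.164", "44.211.104.80", "54.159.123.51",
    "52.44.209.155", "52.0.154.147", "50.17.188.178", "44.210.22.85",
    "54.242.4.200",
))


def is_chartbeat_ip(remote_addr):
    """Return True if the request's source address is a listed scraper IP."""
    try:
        return ipaddress.ip_address(remote_addr) in CHARTBEAT_SCRAPER_IPS
    except ValueError:
        return False  # not a valid IP address
```

If your site sits behind a proxy or CDN, make sure the address you pass in is the original client address rather than the proxy's.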
