# Additional Page Metadata

## Introduction

{% hint style="warning" %}
**Important:** We automatically collect metadata from your webpages as described below, and we're using it to inform future product development at Chartbeat. Some of this data is now being used in our [Topics and Categories tabs](https://help.chartbeat.com/hc/en-us/articles/16574320245787) in the Historical Dashboard.

We rely on your integration of our [JavaScript configuration variables](https://docs.chartbeat.com/cbp/tracking/standard-websites/configuration-variables) to populate key page properties in your Chartbeat dashboards and reports, like page sections, authors, paths, and titles.
{% endhint %}

To collect metadata about the content of a given page, Chartbeat uses a web scraper that makes an HTTP GET request to the page’s URL, then extracts certain attributes from the page’s HTML as well as the main body text. For convenience and ease of implementation, we rely on existing standards as much as possible. Metadata may be provided in the following formats:

* **JSON-LD (*****recommended*****):** Embed metadata in a `<script>` tag in the `<head>` of an HTML document, with properties and values in JSON form following the [**schema.org**](https://schema.org/) vocabulary. This standard is used by Google and others to display enriched content in search results.
* **Microdata:** Embed metadata within existing elements of an HTML document's `<head>` or `<body>`, using HTML tag properties and values following the [**schema.org**](https://schema.org/) vocabulary. This standard is also used by Google and others to display enriched content in search results.
* **Repeated `<meta>` tags:** Embed metadata in multiple `<meta>` tags in the `<head>` of an HTML document, using HTML tag properties and values following the [**Open Graph protocol**](https://ogp.me/). This standard is used by Facebook and others to display enriched content in social media posts.

## JSON-LD

The JSON-LD format embeds structured data in JSON form following a flexible, expressive vocabulary developed and maintained by [**schema.org**](https://schema.org/). Chartbeat mostly follows the official specification, but allows alternate forms of some properties for your convenience. For more info, see the [**official JSON-LD site**](https://json-ld.org/) and [**Google's structured data guide**](https://developers.google.com/search/docs/guides/search-gallery).

### **Examples**

Here's a basic example that follows the standard spec:

```markup
<script type="application/ld+json">
  {
    "@context": "http://schema.org",
    "@type": "NewsArticle",
    "headline": "Data, platforms, and subscriptions: What you read (by Total Engaged Time) in 2019",
    "description": "We dove into our own Historical Dashboard to understand the topics that garnered the most total engaged time on the Chartbeat blog in 2019. See more on our website.",
    "articleBody": "In December, we showed you where readers spent over 294 billion minutes of Total Engaged Time through our 100 Most Engaging Stories...",
    "url": "https://blog.chartbeat.com/2020/01/16/total-engaged-time-chartbeat-blog-data/",
    "datePublished": "2020-01-16T11:07:35-05:00",
    "articleSection": "Blog,Customer",
    "author": {
        "@type": "Person",
        "name": "Nick Lioudis"
    },
    "publisher": {
        "@type": "Organization",
        "name": "Chartbeat"
    },
    "keywords": "2019,audience,data,engagement",
    "thumbnailUrl": "http://4ehuia1v75h912e6wht7ul1m-wpengine.netdna-ssl.com/wp-content/uploads/2017/09/Historical@2x.png"
  }
</script>
```

And here's the same example, using alternate properties and Chartbeat's more permissive spec:

```markup
<script type="application/ld+json">
  {
    "@context": "http://schema.org",
    "@type": "NewsArticle",
    "headline": "Data, platforms, and subscriptions: What you read (by Total Engaged Time) in 2019",
    "description": "We dove into our own Historical Dashboard to understand the topics that garnered the most total engaged time on the Chartbeat blog in 2019. See more on our website.",
    "articleBody": "In December, we showed you where readers spent over 294 billion minutes of Total Engaged Time through our 100 Most Engaging Stories...",
    "mainEntityOfPage": {
        "@type": "WebPage",
        "@id": "https://blog.chartbeat.com/2020/01/16/total-engaged-time-chartbeat-blog-data/"
    },
    "dateCreated": "2020-01-16 11:07:35-05:00",
    "articleSection": ["Blog", "Customer"],
    "creator": ["Nick Lioudis"],
    "publisher": "Chartbeat",
    "keywords": ["2019", "audience", "data", "engagement"],
    "image": {
        "@type": "ImageObject",
        "url": "http://4ehuia1v75h912e6wht7ul1m-wpengine.netdna-ssl.com/wp-content/uploads/2017/09/Historical@2x.png"
    }
  }
</script>
```
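
Because a JSON-LD block must be valid JSON, it can be worth checking that yours parses before you publish it. The following is a minimal sketch in Python using only the standard library. `JsonLdExtractor` and `parse_jsonld` are hypothetical names chosen for illustration, and this is not Chartbeat's own parser. Strict parsers such as `json.loads` reject common mistakes like a trailing comma after the last property.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the raw text of every <script type="application/ld+json"> block."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buffer))
            self._in_jsonld = False


def parse_jsonld(html):
    """Return (parsed_items, errors) for all JSON-LD blocks found in html."""
    extractor = JsonLdExtractor()
    extractor.feed(html)
    extractor.close()
    items, errors = [], []
    for raw in extractor.blocks:
        try:
            items.append(json.loads(raw))
        except json.JSONDecodeError as exc:
            # Invalid JSON (e.g. a trailing comma) ends up here.
            errors.append(str(exc))
    return items, errors
```

An empty `errors` list means every JSON-LD block on the page parsed cleanly; it does not by itself confirm that the properties follow the schema.org vocabulary.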

### **Properties**

#### Required properties:

* `@context`: Collection in which the schema is defined. Should always be "<http://schema.org>" or "<https://schema.org>".
* `@type`: Type of schema used. For articles, this should be one of [**Article**](https://schema.org/Article), [**TechArticle**](https://schema.org/TechArticle), [**NewsArticle**](https://schema.org/NewsArticle) (or one of its sub-types), or [**BlogPosting**](https://schema.org/BlogPosting) (or one of its sub-types).

#### Key properties:

* `articleSection` ([**Text**](https://schema.org/Text)): One or more sections assigned to the article. Multiple sections may be expressed as a single string delimited by commas or as an array of strings.
* `author` ([**Text**](https://schema.org/Text), [**Person**](https://schema.org/Person), or [**Organization**](https://schema.org/Organization)): One or more authors of the article. If not specified, the `creator` property is checked. Multiple authors may be expressed as a single string delimited by commas, an array of strings, or an array of Person or Organization items.
* `headline` ([**Text**](https://schema.org/Text)): Title of the article. If not specified, the `alternativeHeadline` and `name` properties are checked, in that order.
* `datePublished` ([**Date**](https://schema.org/Date) or [**DateTime**](https://schema.org/DateTime)): Date and (optionally) time the article was first published. If not specified, the `dateCreated` property is checked. This value shouldn't change over time, and should be the same as or earlier than the `dateModified` property.
* `keywords` ([**Text**](https://schema.org/Text)): One or more subject matter tags associated with the article. Multiple keywords may be expressed as a single string delimited by commas or as an array of strings.
* `thumbnailUrl` ([**Text**](https://schema.org/Text)): URL of the main image associated with the article. If not specified, the `image` property ([**Text**](https://schema.org/Text) or [**ImageObject**](https://schema.org/ImageObject)) is checked.
* `url` ([**Text**](https://schema.org/Text)): Canonical URL of the article.
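
To illustrate the fallback order and the alternate forms described above, here is a rough sketch of how a parsed JSON-LD item could be normalized. It is an illustration under stated assumptions, not Chartbeat's implementation: the function names and the output keys (`title`, `authors`, and so on) are hypothetical.

```python
def first_present(item, *names):
    """Return the value of the first property in names that is set and non-empty."""
    for name in names:
        value = item.get(name)
        if value not in (None, "", [], {}):
            return value
    return None


def as_list(value):
    """Normalize a comma-delimited string, an array, or a single item to a list."""
    if value is None:
        return []
    if isinstance(value, str):
        return [part.strip() for part in value.split(",") if part.strip()]
    if isinstance(value, (list, tuple)):
        return list(value)
    return [value]  # a single Person, Organization, or other item


def names_of(value):
    """Extract names from strings or from Person/Organization items."""
    names = []
    for entry in as_list(value):
        if isinstance(entry, dict):
            if entry.get("name"):
                names.append(entry["name"])
        else:
            names.append(str(entry).strip())
    return names


def image_url(value):
    """Accept either a URL string or an ImageObject with a `url` property."""
    if isinstance(value, dict):
        return value.get("url")
    return value


def normalize_article(item):
    """Resolve the key properties using the fallback order listed above."""
    return {
        "title": first_present(item, "headline", "alternativeHeadline", "name"),
        "authors": names_of(first_present(item, "author", "creator")),
        "sections": as_list(item.get("articleSection")),
        "keywords": as_list(item.get("keywords")),
        "published": first_present(item, "datePublished", "dateCreated"),
        "image": image_url(first_present(item, "thumbnailUrl", "image")),
        "url": item.get("url"),
    }
```

With this kind of normalization, an item that uses `creator`, `dateCreated`, `image`, and arrays resolves to the same result as one that uses `author`, `datePublished`, `thumbnailUrl`, and comma-delimited strings.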

#### Additional properties:

* `articleBody` ([**Text**](https://schema.org/Text)): Full text of the article. If not specified, the `text` property is checked.
* `dateModified` ([**Date**](https://schema.org/Date) or [**DateTime**](https://schema.org/DateTime)): Date and (optionally) time the article was last modified.
* `description` ([**Text**](https://schema.org/Text)): Brief description of the article.
* `lang` ([**Text**](https://schema.org/Text)): Language code or locale of the article's content.
* `publisher` ([**Text**](https://schema.org/Text), [**Person**](https://schema.org/Person), or [**Organization**](https://schema.org/Organization)): Name of the article's publisher.
* `wordCount` ([**Number**](https://schema.org/Number)): Number of words in the full text of the article.
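
One check you might run on your own markup is that `datePublished` is not later than `dateModified`. The sketch below assumes ISO 8601 values like those in the examples above and, as a simplifying assumption, treats values without a UTC offset as UTC. The function names are hypothetical.

```python
from datetime import datetime, timezone


def parse_timestamp(value):
    """Parse an ISO 8601 date or date-time string.

    Values without an offset are treated as UTC (an assumption made here so
    that dates and date-times can be compared). Note that a trailing "Z" is
    only accepted by datetime.fromisoformat on Python 3.11 and later.
    """
    parsed = datetime.fromisoformat(value)
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def dates_are_consistent(item):
    """True when the publish date is not later than dateModified, or either is absent."""
    published = item.get("datePublished") or item.get("dateCreated")
    modified = item.get("dateModified")
    if not published or not modified:
        return True
    return parse_timestamp(published) <= parse_timestamp(modified)
```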

## **Microdata**

The Microdata format embeds structured data into existing HTML elements throughout the document using tag properties specified by the [**schema.org**](https://schema.org/) vocabulary. Chartbeat mostly follows the official specification, but allows alternate forms of some properties for your convenience. For more info, see the [**official Microdata specification**](https://www.w3.org/TR/microdata/) and [**Google's structured data guide**](https://developers.google.com/search/docs/guides/search-gallery).

### **Examples**

Here's a basic example that follows the standard spec:

```markup
<html>
    <body>
        <div itemscope itemtype="http://schema.org/NewsArticle">
            <h1 itemprop="headline">Data, platforms, and subscriptions: What you read (by Total Engaged Time) in 2019</h1>
            <span itemprop="datePublished" content="2020-01-16T11:07:35-05:00">2020-01-16T11:07:35-05:00</span>
            <span itemprop="description">We dove into our own Historical Dashboard to understand the topics that garnered the most total engaged time on the Chartbeat blog in 2019. See more on our website.</span><br>
            <div itemprop="image" itemscope itemtype="http://schema.org/ImageObject">
                <meta itemprop="url" content="http://4ehuia1v75h912e6wht7ul1m-wpengine.netdna-ssl.com/wp-content/uploads/2017/09/Historical@2x.png">
                <img src="http://4ehuia1v75h912e6wht7ul1m-wpengine.netdna-ssl.com/wp-content/uploads/2017/09/Historical@2x.png">
            </div>
            Author: <span itemprop="author">Nick Lioudis</span><br>
            <div itemprop="publisher" itemscope itemtype="http://schema.org/Organization">
                <span itemprop="name">Chartbeat</span>
            </div>
            <span itemprop="articleBody">In December, we showed you where readers spent over 294 billion minutes of Total Engaged Time through our 100 Most Engaging Stories...</span>
        </div>
    </body>
</html>
```

### **Properties**

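Since this format follows the same schema.org vocabulary as JSON-LD, it uses the same properties as described above for JSON-LD. However, those properties are incorporated into HTML differently, using the following attributes:

* `itemprop`: Adds a property to the enclosing item. The value is taken from the element, for example from its text or its `content` attribute.
* `itemscope`: Marks an HTML element as a new item, so that the `itemprop` properties nested inside it belong to that item. It works in conjunction with `itemtype`, which specifies the item's type.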
* `itemtype`: Provides the URL of the (schema.org) vocabulary item associated with the HTML element, and gives the context for its constituent `itemprop` properties.
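
To see how these three attributes combine, the following sketch walks an HTML document and builds one nested dictionary per `itemscope` element. It is a simplified illustration rather than a complete Microdata parser: it ignores `itemref`, reads values only from the `content` attribute or the element's text, and assumes well-nested HTML. The class and function names are hypothetical.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class MicrodataParser(HTMLParser):
    """Build nested dicts from itemscope/itemtype/itemprop attributes."""

    def __init__(self):
        super().__init__()
        self.items = []        # top-level items found in the document
        self._depth = 0        # nesting depth of open (non-void) elements
        self._scopes = []      # stack of (depth, item) for open itemscope elements
        self._capture = None   # (depth, item, name, text_parts) while reading text

    @staticmethod
    def _add(item, name, value):
        # A property may repeat, so every property maps to a list of values.
        item.setdefault(name, []).append(value)

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        is_void = tag in VOID_TAGS
        if not is_void:
            self._depth += 1
        current = self._scopes[-1][1] if self._scopes else None
        name = attrs.get("itemprop")
        if "itemscope" in attrs:
            item = {"@type": attrs.get("itemtype")}
            if name and current is not None:
                self._add(current, name, item)   # nested item, e.g. publisher
            else:
                self.items.append(item)          # top-level item
            if not is_void:
                self._scopes.append((self._depth, item))
        elif name and current is not None:
            if "content" in attrs:
                self._add(current, name, attrs["content"])
            elif not is_void and self._capture is None:
                self._capture = (self._depth, current, name, [])

    def handle_data(self, data):
        if self._capture is not None:
            self._capture[3].append(data)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self._capture is not None and self._capture[0] == self._depth:
            _, item, name, parts = self._capture
            self._add(item, name, "".join(parts).strip())
            self._capture = None
        if self._scopes and self._scopes[-1][0] == self._depth:
            self._scopes.pop()
        self._depth -= 1


def extract_microdata(html):
    parser = MicrodataParser()
    parser.feed(html)
    parser.close()
    return parser.items
```

Each property maps to a list because Microdata allows a property to appear more than once within the same item.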

## **Repeated `<meta>` tags**

Structured data may be embedded into an HTML document with repeated `<meta>` tags in the `<head>` element, identified with tag properties specified by the [**Open Graph protocol**](https://ogp.me/). Chartbeat mostly follows the official specification, but allows alternate forms of some properties for your convenience. This format was designed to incorporate webpages into a "social graph", and as such, includes fewer properties in more restrictive data structures.

### **Examples**

```markup
<html>
    <head>
        <meta property="og:type" content="article">
        <meta property="og:title" content="Data, platforms, and subscriptions: What you read (by Total Engaged Time) in 2019">
        <meta property="og:url" content="https://blog.chartbeat.com/2020/01/16/total-engaged-time-chartbeat-blog-data/">
        <meta property="og:image" content="http://4ehuia1v75h912e6wht7ul1m-wpengine.netdna-ssl.com/wp-content/uploads/2017/09/Historical@2x.png">
        <meta property="article:published_time" content="2020-01-16T11:07:35-05:00">
        <meta property="article:author" content="Nick Lioudis">
        <meta property="og:description" content="We dove into our own Historical Dashboard to understand the topics that garnered the most total engaged time on the Chartbeat blog in 2019. See more on our website.">
        <meta property="og:site_name" content="Chartbeat">
    </head>
</html>
```

### **Properties**

#### Required properties:

* `og:type` ([**enum**](https://ogp.me/#enum)): Type of article within a social graph. Must be one of `article`, `blog`, or `website`.
* `og:image` ([**url**](https://ogp.me/#url)): URL of the image used to represent the article within a social graph. If not specified, `og:image:url` is checked.
* `og:title` ([**string**](https://ogp.me/#string)): Title of the article.
* `og:url` ([**url**](https://ogp.me/#url)): Canonical URL of the article that can be used as its permanent ID in a social graph.

#### Key properties:

* `article:author` ([**string**](https://ogp.me/#string) or [**profile**](https://ogp.me/#type_profile)): One or more authors of the article.
* `article:modified_time` ([**datetime**](https://ogp.me/#datetime)): Date and (optionally) time the article was last modified. If not specified, `og:updated_time` is checked.
* `article:published_time` ([**datetime**](https://ogp.me/#datetime)): Date and (optionally) time the article was first published. If not specified, `og:pubdate` is checked.
* `article:section` ([**string**](https://ogp.me/#string)): One or more sections assigned to the article. Multiple sections may be expressed as a single string delimited by commas.
* `article:tag` ([**string**](https://ogp.me/#string) [**array**](https://ogp.me/#array)): One or more subject matter tags associated with the article. Multiple tags may be expressed as a single string delimited by commas.
* `og:description` ([**string**](https://ogp.me/#string)): Brief description of the article.

#### Additional properties:

* `article:content_tier` ([**enum**](https://ogp.me/#enum)): Access tier assigned to article by its publisher. Should be one of `free`, `metered`, or `locked`.
* `og:locale` ([**string**](https://ogp.me/#string)): Locale in which the article's tags are marked up. Should be formatted as language\_TERRITORY, e.g. `en_US`.
* `og:site_name` ([**string**](https://ogp.me/#string)): Name of the site on which the article appears. If not specified, `article:publisher` is checked.
* `og:video` ([**url**](https://ogp.me/#url)): URL of the video used to represent the article within a social graph. If not specified, `og:video:url` is checked.
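
The fallbacks listed above can be sketched as a small reader for `<meta>` tags. This is an illustration only, not Chartbeat's scraper: the names `MetaTagCollector`, `resolve`, and `read_open_graph` are hypothetical, and treating repeated `article:author` tags as a list is an assumption.

```python
from html.parser import HTMLParser


class MetaTagCollector(HTMLParser):
    """Collect <meta property="..." content="..."> pairs, keeping repeats in order."""

    def __init__(self):
        super().__init__()
        self.values = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        prop, content = attrs.get("property"), attrs.get("content")
        if prop and content:
            self.values.setdefault(prop, []).append(content)


# Fallback pairs taken from the property descriptions above.
FALLBACKS = {
    "og:image": "og:image:url",
    "og:video": "og:video:url",
    "og:site_name": "article:publisher",
    "article:published_time": "og:pubdate",
    "article:modified_time": "og:updated_time",
}
COMMA_DELIMITED = {"article:section", "article:tag"}
REPEATABLE = COMMA_DELIMITED | {"article:author"}  # assumption for article:author
PROPERTIES = (
    "og:type", "og:title", "og:url", "og:image", "og:description",
    "og:site_name", "article:author", "article:section", "article:tag",
    "article:published_time", "article:modified_time",
)


def resolve(found, prop):
    """Look up prop, then its fallback; return a list for multi-valued properties."""
    values = found.get(prop) or found.get(FALLBACKS.get(prop, ""), [])
    if prop in COMMA_DELIMITED:
        # Repeated tags and comma-delimited strings both yield one flat list.
        return [part.strip() for value in values
                for part in value.split(",") if part.strip()]
    if prop in REPEATABLE:
        return list(values)
    return values[0] if values else None


def read_open_graph(html):
    collector = MetaTagCollector()
    collector.feed(html)
    collector.close()
    return {prop: resolve(collector.values, prop) for prop in PROPERTIES}
```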

## Web scraper

Chartbeat uses a web scraper to collect metadata about the pages tracked by Chartbeat's main pinger. For each page, the scraper makes a standard HTTP GET request to the URL and downloads the raw HTML, from which metadata is then extracted. Page metadata is joined to the visitor engagement data collected by our pinger using the host and path fields (the `h` and `p` keys).
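
As a rough illustration of that join, the sketch below derives a (host, path) pair from a scraped URL and matches it against engagement rows keyed by `h` and `p`. The row and page structures are hypothetical, and the exact normalization Chartbeat applies to hosts and paths is not specified here, so treat the lowercasing of the host and the dropping of the query string as assumptions.

```python
from urllib.parse import urlsplit


def join_key(url):
    """Return a (host, path) pair for a page URL, ignoring query string and fragment."""
    parts = urlsplit(url)
    return (parts.hostname or "", parts.path or "/")


def attach_metadata(engagement_rows, scraped_pages):
    """Attach scraped metadata to each engagement row with a matching host and path."""
    by_key = {join_key(page["url"]): page for page in scraped_pages}
    return [
        {**row, "metadata": by_key.get((row["h"], row["p"]))}
        for row in engagement_rows
    ]
```

Rows with no matching page get `metadata` set to `None`, which mirrors the situation where a page could not be scraped.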

In general, Chartbeat's web scraper will only visit pages once, shortly after their first appearance in our engagement data pipeline. Load on your site's servers should be minimal; however, scraper activity will be higher at first, when many pages appear "new" to our system, before declining to match your site's usual publication rate.

The scraper identifies itself with the User-Agent `Chartbeat-ContentX/0.3 (http://www.chartbeat.com)`. Note that the version number (e.g. 0.3) will change over time. To prevent unsuccessful scrapes and ensure that every page has associated metadata, please add this User-Agent to your site's allowlist.
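
If you maintain allow rules in application code, matching on the `Chartbeat-ContentX/` prefix rather than the full string avoids breaking when the version changes. A minimal sketch, assuming the version remains a dotted number (an assumption, since the format of future versions is not documented here); `is_chartbeat_scraper` is a hypothetical helper:

```python
import re

# Matches "Chartbeat-ContentX/<version>"; the version is deliberately not pinned.
# Assumption: the version stays a dotted number such as 0.3 or 1.12.
SCRAPER_UA = re.compile(r"^Chartbeat-ContentX/\d+(?:\.\d+)*\b")


def is_chartbeat_scraper(user_agent):
    """True if the User-Agent header looks like Chartbeat's metadata scraper."""
    return bool(SCRAPER_UA.match(user_agent or ""))
```

Keep in mind that any client can send this header, so a User-Agent match alone is a weak signal for decisions such as bypassing a paywall.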

You can also increase our ability to detect data for [our Topics and Categories tabs](https://help.chartbeat.com/hc/en-us/articles/16574320245787) and additional metadata-driven features (such as the word count pivot in the Historical Dashboard) by allowing traffic from our IP addresses to bypass any features that obscure the content of the page (e.g. paywalls).

Chartbeat regularly tries to visit pages where we detect significant traffic in order to examine their content. You may notice this traffic on your site (it will not appear in your Chartbeat dashboards). You can identify our visits by their source IP addresses, which are listed below:

52.200.230.127\
35.174.236.164\
44.211.104.80\
54.159.123.51\
52.44.209.155\
52.0.154.147\
50.17.188.178\
44.210.22.85\
54.242.4.200

Please note that while this list does not regularly change, it may be expanded in the future due to infrastructure improvements.
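
For example, a paywall bypass rule could compare a request's source address against this list. The sketch below uses Python's standard `ipaddress` module. `is_chartbeat_ip` is a hypothetical helper, and the addresses are copied from this page, so the set would need updating if the list expands.

```python
import ipaddress

# Addresses copied from the list above; update this set if the list changes.
CHARTBEAT_SCRAPER_IPS = frozenset(
    ipaddress.ip_address(ip)
    for ip in (
        "52.200.230.127",
        "35.174.236.164",
        "44.211.104.80",
        "54.159.123.51",
        "52.44.209.155",
        "52.0.154.147",
        "50.17.188.178",
        "44.210.22.85",
        "54.242.4.200",
    )
)


def is_chartbeat_ip(remote_addr):
    """True if remote_addr is one of the listed scraper addresses."""
    try:
        return ipaddress.ip_address(remote_addr) in CHARTBEAT_SCRAPER_IPS
    except ValueError:
        # Not a valid IP address string.
        return False
```

If your site sits behind a proxy or CDN, make sure the address you pass in is the original client address rather than the proxy's.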

## Up next

#### Choose your next integration.

* ~~**Standard Website Tracking**~~ ✅
* [Google AMP Tracking](https://docs.chartbeat.com/cbp/tracking/google-amp)
* [Mobile App SDKs](https://docs.chartbeat.com/cbp/tracking/mobile-app-sdks)
* [Headline and Image Testing](https://docs.chartbeat.com/cbp/feature-integrations/testing)
* [Video Tracking](https://docs.chartbeat.com/cbp/feature-integrations/video-engagement)
