Additional Page Metadata
Learn how we're beginning to collect metadata about your content to power our next generation products, and get a head start by configuring this metadata in your page HTML.
Important: We automatically collect metadata from your webpages as described below, and we're using it to inform future product development at Chartbeat. Some of this data is now being used in the Historical Dashboard.
We rely on your integration of our tracking code to populate key page properties in your Chartbeat dashboards and reports, such as page sections, authors, paths, and titles.
To collect metadata about the content of a given page, Chartbeat uses a web scraper that makes an HTTP GET request to the page’s URL, then extracts certain attributes from the page’s HTML as well as the main body text. For convenience and ease of implementation, we rely on existing standards as much as possible. Metadata may be provided in the following formats:
JSON-LD (recommended): Embed metadata in a <script type="application/ld+json"> tag in the <head> of an HTML document, with properties and values in JSON form following the schema.org vocabulary. This standard is used by Google and others to display enriched content in search results.
Microdata: Embed metadata within existing elements of an HTML document's <head> or <body>, using HTML attributes and values following the schema.org vocabulary. This standard is also used by Google and others to display enriched content in search results.
Repeated <meta> tags: Embed metadata in multiple <meta> tags in the <head> of an HTML document, using HTML attributes and values following the Open Graph protocol. This standard is used by Facebook and others to display enriched content in social media posts.
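To check which of these formats a page already exposes, you can scan its HTML. The sketch below uses Python's standard html.parser. It is a minimal illustration and not Chartbeat's scraper. The class name, the format labels ("json-ld", "microdata", "meta"), and the choice to accept both property and name on <meta> tags are all assumptions.

```python
from html.parser import HTMLParser


class FormatDetector(HTMLParser):
    """Records which of the three metadata formats appear in a page's HTML."""

    def __init__(self):
        super().__init__()
        self.formats = set()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # JSON-LD: <script type="application/ld+json">
        if tag == "script" and (attrs.get("type") or "").lower() == "application/ld+json":
            self.formats.add("json-ld")
        # Microdata: any element carrying an itemscope or itemprop attribute
        if "itemscope" in attrs or "itemprop" in attrs:
            self.formats.add("microdata")
        # Repeated <meta> tags: Open Graph-style og:* or article:* properties
        # (accepting name= as well as property= is an assumption)
        if tag == "meta":
            prop = attrs.get("property") or attrs.get("name") or ""
            if prop.startswith(("og:", "article:")):
                self.formats.add("meta")


def detect_formats(html):
    """Return the set of metadata formats found in an HTML string."""
    parser = FormatDetector()
    parser.feed(html)
    parser.close()
    return parser.formats
```

A page carrying both a JSON-LD block and og:* tags would yield {"json-ld", "meta"}.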
The JSON-LD format embeds structured data in JSON form following a flexible, expressive vocabulary developed and maintained by schema.org. Chartbeat mostly follows the official specification, but accepts alternate forms of some properties for your convenience. For more info, see the official JSON-LD and schema.org documentation.
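If your templates are rendered server-side, one option is to generate the JSON-LD block from the same data you already use for the page. The sketch below serializes a dictionary of schema.org properties into a <script> tag with Python's json module. The function name and the article values are hypothetical, and "NewsArticle" is just one schema.org article type chosen for illustration.

```python
import json


def jsonld_script_tag(metadata):
    """Serialize schema.org properties into a JSON-LD <script> tag for the page <head>."""
    doc = {"@context": "https://schema.org", **metadata}
    body = json.dumps(doc, indent=2, ensure_ascii=False)
    # A literal "</" inside a JSON string could close the <script> element early;
    # "<\/" is an equivalent, valid JSON escape.
    body = body.replace("</", "<\\/")
    return '<script type="application/ld+json">\n' + body + "\n</script>"


# Hypothetical article values, for illustration only.
print(jsonld_script_tag({
    "@type": "NewsArticle",
    "headline": "Example headline",
    "author": [{"@type": "Person", "name": "Jane Doe"}],
    "articleSection": ["Politics", "World"],
    "datePublished": "2024-01-15T09:30:00Z",
}))
```

Escaping "</" matters when headlines or article bodies can contain user-supplied text. Without it, a stray "</script>" in a value would end the block early and leave invalid JSON behind.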
Since Microdata follows the same schema.org vocabulary as JSON-LD, it uses the same properties described for JSON-LD. However, those properties are attached to HTML elements differently, using the following attributes:
itemprop: Adds a property to an HTML element.
itemscope: Works in conjunction with itemtype to specify the type of item associated with a particular HTML element.
itemtype: Provides the URL of the schema.org vocabulary item associated with the HTML element, and gives the context for its constituent itemprop properties.
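To show how these attributes turn into property values, here is a simplified itemprop extractor built on html.parser. It is a sketch and does not implement the full HTML microdata algorithm. It assumes well-formed markup, flattens nested items (for example an author Person) into a single dictionary, and reads values from content, datetime, href, or src only for the few element types listed in VALUE_ATTR. All names are hypothetical.

```python
from html.parser import HTMLParser

# Elements whose microdata value comes from an attribute rather than their text.
VALUE_ATTR = {"meta": "content", "time": "datetime", "a": "href", "link": "href", "img": "src"}
VOID_TAGS = {"meta", "link", "img", "br", "hr", "input", "source"}


class MicrodataParser(HTMLParser):
    """Flat itemprop extractor; assumes well-formed markup and flattens nested items."""

    def __init__(self):
        super().__init__()
        self.props = {}   # itemprop name -> list of values, in document order
        self._open = []   # one entry per open element: [name, text_parts] or None

    def _add(self, name, value):
        self.props.setdefault(name, []).append(" ".join(value.split()))

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        name = a.get("itemprop")
        if name and "itemscope" in a:
            name = None  # nested item: its own itemprops are collected flat instead
        attr = VALUE_ATTR.get(tag)
        if name and attr and a.get(attr) is not None:
            self._add(name, a[attr])
            name = None
        if tag not in VOID_TAGS:
            self._open.append([name, []] if name else None)

    def handle_data(self, data):
        # Text counts toward every enclosing text-valued itemprop.
        for entry in self._open:
            if entry is not None:
                entry[1].append(data)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self._open:
            return
        entry = self._open.pop()
        if entry is not None:
            self._add(entry[0], "".join(entry[1]))


def extract_microdata(html):
    parser = MicrodataParser()
    parser.feed(html)
    parser.close()
    return parser.props
```

Values are returned as lists because a property such as articleSection may legitimately appear more than once on a page.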
Chartbeat uses a web scraper to collect metadata about the pages tracked by Chartbeat's main pinger. For each page, the scraper makes a standard HTTP GET request to the URL and downloads the raw HTML, from which metadata is then extracted. Page metadata is joined to the visitor engagement data collected by our pinger using the host and path fields (the h and p keys).
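Conceptually, this join is a left join keyed on (host, path). The sketch below shows the idea with plain dictionaries. The row layout and every field name other than h and p are hypothetical, and this is not Chartbeat's pipeline code.

```python
def join_metadata(engagement_rows, metadata_by_page):
    """Left-join scraped page metadata onto engagement rows using the (h, p) key."""
    joined = []
    for row in engagement_rows:
        meta = metadata_by_page.get((row["h"], row["p"]), {})
        # Pages not yet scraped keep their engagement fields with no metadata attached.
        joined.append({**row, **meta})
    return joined


rows = [
    {"h": "example.com", "p": "/news/story", "engaged": 42},
    {"h": "example.com", "p": "/other", "engaged": 3},
]
metadata = {("example.com", "/news/story"): {"headline": "Story"}}
print(join_metadata(rows, metadata))
```

Because the key is the host and path, metadata can only attach to a page when the pinger reports the same h and p values that identify the scraped URL.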
In general, Chartbeat's web scraper will only visit pages once, shortly after their first appearance in our engagement data pipeline. Load on your site's servers should be minimal; however, scraper activity will be higher at first, when many pages appear "new" to our system, before declining to match your site's usual publication rate.
The scraper identifies itself with the user agent Chartbeat-ContentX/0.3 (http://www.chartbeat.com). Note that the version number (e.g. 0.3) will change over time. To prevent unsuccessful scrapes and ensure that every page has associated metadata, please add this user agent to your site's whitelist.
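Because the version number changes, an allowlist rule should match the product token rather than one exact string. Here is a minimal sketch with a regular expression. The function name is hypothetical, and user agent strings can be spoofed, so you may want to combine this check with the IP addresses Chartbeat publishes.

```python
import re

# Matches any version, e.g. "Chartbeat-ContentX/0.3", since the version changes over time.
CHARTBEAT_UA = re.compile(r"\bChartbeat-ContentX/\d+(?:\.\d+)*\b")


def is_chartbeat_scraper(user_agent):
    """Return True if the User-Agent header looks like Chartbeat's scraper."""
    return bool(CHARTBEAT_UA.search(user_agent or ""))
```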
Chartbeat regularly visits pages where we detect significant traffic in order to examine their content. You may notice this traffic on your site (it will not appear in your Chartbeat dashboards), and you can identify our visits by the IP addresses listed below:
52.200.230.127 35.174.236.164 44.211.104.80 54.159.123.51 52.44.209.155 52.0.154.147 50.17.188.178 44.210.22.85 54.242.4.200
Please note that while this list does not regularly change, it may be expanded in the future due to infrastructure improvements.
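If your bot protection or paywall logic needs to recognize these visits, one approach is an exact-match check against the published addresses, using Python's ipaddress module. The sketch hard-codes the list above. Since the list may grow, sourcing it from configuration is an assumption worth considering. The names are hypothetical.

```python
import ipaddress

# The IP addresses listed above; keep this in sync if Chartbeat expands the list.
CHARTBEAT_SCRAPER_IPS = frozenset(
    ipaddress.ip_address(ip)
    for ip in """
    52.200.230.127 35.174.236.164 44.211.104.80 54.159.123.51 52.44.209.155
    52.0.154.147 50.17.188.178 44.210.22.85 54.242.4.200
    """.split()
)


def is_chartbeat_ip(remote_addr):
    """Return True if the request's source address is one of Chartbeat's scraper IPs."""
    try:
        return ipaddress.ip_address(remote_addr) in CHARTBEAT_SCRAPER_IPS
    except ValueError:
        return False  # malformed address strings are treated as non-matching
```

Parsing with ipaddress instead of comparing raw strings means equivalent spellings of the same address compare equal, and malformed input is rejected cleanly.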
@context: Collection in which the schema is defined. Should always be "https://schema.org" or "http://schema.org".
@type: Type of schema used. For articles, should be one of , , (or one of its sub-types), (or one of its sub-types).
articleSection: One or more sections assigned to the article. Multiple sections may be expressed as a single string delimited by commas or as an array of strings.
author: One or more authors of the article. If not specified, the creator property is checked. Multiple authors may be expressed as a single string delimited by commas, an array of strings, or an array of Person or Organization items.
headline: Title of the article. If not specified, the alternativeHeadline and name properties are checked, in that order.
datePublished: Date and (optionally) time the article was first published. If not specified, the dateCreated property is checked. This value shouldn't change over time, and should be no later than the dateModified property.
keywords: One or more subject matter tags associated with the article. Multiple keywords may be expressed as a single string delimited by commas or as an array of strings.
thumbnailUrl: URL of the main image associated with the article. If not specified, the image property is checked.
url: Canonical URL of the article.
articleBody: Full text of the article. If not specified, the text property is checked.
dateModified: Date and (optionally) time the article was last modified.
description: Brief description of the article.
lang: Language code or locale of the article's content.
publisher: Name of the article's publisher.
wordCount: Number of words in the full text of the article.
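The fallbacks and alternate forms described above can be read as a small normalization step applied to each parsed JSON-LD object. The sketch below is a hypothetical reading of those rules and not Chartbeat's implementation. The output field names (authors, sections, thumbnail, and so on) are invented for illustration, and only the first image is kept when image is an array.

```python
def _first(obj, *keys):
    """Return the value of the first listed property that is present and non-empty."""
    for key in keys:
        if obj.get(key):
            return obj[key]
    return None


def _as_list(value):
    """Accept a comma-delimited string, an array of strings, or Person/Organization items."""
    if not value:
        return []
    if isinstance(value, str):
        return [part.strip() for part in value.split(",") if part.strip()]
    if isinstance(value, dict):
        value = [value]
    items = []
    for item in value:
        if isinstance(item, dict):
            item = item.get("name") or ""  # Person or Organization item
        if isinstance(item, str) and item.strip():
            items.append(item.strip())
    return items


def _image_url(value):
    """thumbnailUrl is a URL; image may be a URL, an ImageObject, or a list of either."""
    if isinstance(value, list):
        value = value[0] if value else None
    if isinstance(value, dict):
        value = value.get("url")
    return value


def normalize_article(obj):
    """Apply the property fallbacks described above to one parsed JSON-LD object."""
    return {
        "headline": _first(obj, "headline", "alternativeHeadline", "name"),
        "authors": _as_list(_first(obj, "author", "creator")),
        "sections": _as_list(obj.get("articleSection")),
        "keywords": _as_list(obj.get("keywords")),
        "published": _first(obj, "datePublished", "dateCreated"),
        "thumbnail": _image_url(_first(obj, "thumbnailUrl", "image")),
        "body": _first(obj, "articleBody", "text"),
    }
```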
The Microdata format embeds structured data into existing HTML elements throughout the document using attributes specified by the schema.org vocabulary. Chartbeat mostly follows the official specification, but accepts alternate forms of some properties for your convenience. For more info, see the official Microdata and schema.org documentation.
Structured data may be embedded into an HTML document with repeated <meta> tags in the <head> element, identified with attributes specified by the Open Graph protocol. Chartbeat mostly follows the official specification, but accepts alternate forms of some properties for your convenience. This format was designed to incorporate webpages into a "social graph", and as such, includes fewer properties in more restrictive data structures.
og:type: Type of article within a social graph. Must be one of article, blog, or website.
og:image: URL of the image used to represent the article within a social graph. If not specified, og:image:url is checked.
og:title: Title of the article.
og:url: Canonical URL of the article that can be used as its permanent ID in a social graph.
article:author: One or more authors of the article.
article:modified_time: Date and (optionally) time the article was last modified. If not specified, og:updated_time is checked.
article:published_time: Date and (optionally) time the article was first published. If not specified, og:pubdate is checked.
article:section: One or more sections assigned to the article. Multiple sections may be expressed as a single string delimited by commas.
article:tag: One or more subject matter tags associated with the article. Multiple tags may be expressed as a single string delimited by commas.
og:description: Brief description of the article.
article:content_tier: Access tier assigned to the article by its publisher. Should be one of free, metered, or locked.
og:locale: Locale in which the article's tags are marked up. Should be formatted as language_TERRITORY, e.g. en_US.
og:site_name: Name of the site on which the article appears. If not specified, article:publisher is checked.
og:video: URL of the video used to represent the article within a social graph. If not specified, og:video:url is checked.
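In this format a property can appear in several <meta> tags, so a reader has to keep every value rather than only the last one. The sketch below collects repeated tags into lists, applies the fallbacks listed above, and splits comma-delimited sections and tags. It is a hypothetical illustration, not Chartbeat's parser. Reading name= when property= is absent is an assumption.

```python
from html.parser import HTMLParser


class MetaTagParser(HTMLParser):
    """Collects <meta property=... content=...> tags; repeated properties keep every value."""

    def __init__(self):
        super().__init__()
        self.values = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        a = dict(attrs)
        prop = a.get("property") or a.get("name")
        content = a.get("content")
        if prop and content is not None:
            self.values.setdefault(prop, []).append(content.strip())


# Preferred property -> property checked when the preferred one is missing.
FALLBACKS = {
    "og:image": "og:image:url",
    "og:video": "og:video:url",
    "article:modified_time": "og:updated_time",
    "article:published_time": "og:pubdate",
    "og:site_name": "article:publisher",
}


def parse_meta_tags(html):
    parser = MetaTagParser()
    parser.feed(html)
    parser.close()
    values = parser.values
    for prop, fallback in FALLBACKS.items():
        if prop not in values and fallback in values:
            values[prop] = values[fallback]
    # Sections and tags may be repeated tags or one comma-delimited string.
    for prop in ("article:section", "article:tag"):
        if prop in values:
            values[prop] = [p.strip() for v in values[prop] for p in v.split(",") if p.strip()]
    return values
```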
You can also improve our ability to collect data for metadata-driven features (such as the word count pivot in the Historical Dashboard) by allowing traffic from our IP addresses to bypass any features that obscure the content of the page (e.g. paywalls).