Additional Page Metadata
Learn how we're beginning to collect metadata about your content to power our next generation products, and get a head start by configuring this metadata in your page HTML.
Important: We are beginning to automatically collect metadata from your webpages as described below, and we're using it to inform future product development at Chartbeat. However, this data is not yet exposed in our current tools. We currently rely on your integration of our JavaScript configuration variables to populate key page properties in your Chartbeat dashboards and reports, like page sections, authors, paths, and titles.
To collect metadata about the content of a given page, Chartbeat uses a web scraper that makes an HTTP GET request to the page’s URL, then extracts certain attributes from the page’s HTML as well as the main body text. For convenience and ease of implementation, we rely on existing standards as much as possible. Metadata may be provided in the following formats:
- JSON-LD (recommended): Embed metadata in a <script> tag in the <head> of an HTML document, with properties and values in JSON form following the schema.org vocabulary. This standard is used by Google and others to display enriched content in search results.
- Microdata: Embed metadata within existing elements of an HTML document's <head> or <body>, using HTML tag properties and values following the schema.org vocabulary. This standard is also used by Google and others to display enriched content in search results.
- Repeated <meta> tags: Embed metadata in multiple <meta> tags in the <head> of an HTML document, using HTML tag properties and values following the Open Graph protocol. This standard is used by Facebook and others to display enriched content in social media posts.
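As a rough illustration of the extraction step for the recommended format, the sketch below pulls every JSON-LD block out of a page's HTML using only the Python standard library. It is not Chartbeat's actual scraper: the names JsonLdExtractor and extract_json_ld are hypothetical, and skipping malformed blocks is an assumption about how a tolerant scraper might behave.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects the parsed contents of <script type="application/ld+json"> tags."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.items.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                # Assumption: malformed blocks are skipped rather than repaired.
                pass


def extract_json_ld(html):
    """Return a list with one parsed object per valid JSON-LD block."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return parser.items
```

A strict JSON parser such as the one used here rejects blocks with syntax errors, so it is worth validating your JSON-LD before publishing.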
The JSON-LD format embeds structured data in JSON form following a flexible, expressive vocabulary developed and maintained by schema.org. Chartbeat mostly follows the official specification, but allows alternate forms of some properties for your convenience. For more info, see the official JSON-LD site and Google's structured data guide.
Here's a basic example that follows the standard spec:
<script type="application/ld+json">
{
"@context": "http://schema.org",
"@type": "NewsArticle",
"headline": "Data, platforms, and subscriptions: What you read (by Total Engaged Time) in 2019",
"description": "We dove into our own Historical Dashboard to understand the topics that garnered the most total engaged time on the Chartbeat blog in 2019. See more on our website.",
"articleBody": "In December, we showed you where readers spent over 294 billion minutes of Total Engaged Time through our 100 Most Engaging Stories...",
"url": "https://blog.chartbeat.com/2020/01/16/total-engaged-time-chartbeat-blog-data/",
"datePublished": "2020-01-16T11:07:35-05:00",
"articleSection": "Blog,Customer",
"author": {
"@type": "Person",
"name": "Nick Lioudis"
},
"publisher": {
"@type": "Organization",
"name": "Chartbeat"
},
"keywords": "2019,audience,data,engagement",
"thumbnailUrl": "http://4ehuia1v75h912e6wht7ul1m-wpengine.netdna-ssl.com/wp-content/uploads/2017/09/[email protected]"
}
</script>
And here's the same example, using alternate properties and Chartbeat's more permissive spec:
<script type="application/ld+json">
{
"@context": "http://schema.org",
"@type": "NewsArticle",
"headline": "Data, platforms, and subscriptions: What you read (by Total Engaged Time) in 2019",
"description": "We dove into our own Historical Dashboard to understand the topics that garnered the most total engaged time on the Chartbeat blog in 2019. See more on our website.",
"articleBody": "In December, we showed you where readers spent over 294 billion minutes of Total Engaged Time through our 100 Most Engaging Stories...",
"mainEntityOfPage": {
"@type": "WebPage",
"@id": "https://blog.chartbeat.com/2020/01/16/total-engaged-time-chartbeat-blog-data/"
},
"dateCreated": "2020-01-16 11:07:35-05:00",
"articleSection": ["Blog", "Customer"],
"creator": ["Nick Lioudis"],
"publisher": "Chartbeat",
"keywords": ["2019", "audience", "data", "engagement"],
"image": {
"@type": "ImageObject",
"url": "http://4ehuia1v75h912e6wht7ul1m-wpengine.netdna-ssl.com/wp-content/uploads/2017/09/[email protected]"
}
}
</script>
- @context: Collection in which the schema is defined. Should always be "http://schema.org" or "https://schema.org".
- @type: Type of schema used. For articles, should be one of Article, TechArticle, NewsArticle (or one of its sub-types), BlogPosting (or one of its sub-types).
- articleSection (Text): One or more sections assigned to the article. Multiple sections may be expressed as a single string delimited by commas or as an array of strings.
- author (Text, Person, or Organization): One or more authors of the article. If not specified, the creator property is checked. Multiple authors may be expressed as a single string delimited by commas, an array of strings, or an array of Person or Organization items.
- headline (Text): Title of the article. If not specified, the alternativeHeadline and name properties are checked, in that order.
- keywords (Text): One or more subject matter tags associated with the article. Multiple keywords may be expressed as a single string delimited by commas or as an array of strings.
- thumbnailUrl (Text): URL of the main image associated with the article. If not specified, the image property (Text or ImageObject) is checked.
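To make the fallback and multi-value rules above concrete, here is a minimal sketch of how a parsed JSON-LD object could be normalized so that the standard and permissive forms yield the same result. The helper names (as_list, first_present, normalize_article) and the output field names are hypothetical, chosen for illustration; this is not Chartbeat's implementation.

```python
def as_list(value):
    """Normalize a comma-delimited string, an array of strings, or an
    array of Person/Organization-style objects into a list of strings."""
    if value is None:
        return []
    if isinstance(value, str):
        return [part.strip() for part in value.split(",") if part.strip()]
    if isinstance(value, dict):
        return [value["name"]] if value.get("name") else []
    result = []
    for entry in value:
        if isinstance(entry, dict):
            if entry.get("name"):
                result.append(entry["name"])
        elif entry:
            result.append(str(entry).strip())
    return result


def first_present(item, *keys):
    """Return the value of the first key that is set, checked in order."""
    for key in keys:
        if item.get(key):
            return item[key]
    return None


def normalize_article(item):
    image = first_present(item, "thumbnailUrl", "image")
    if isinstance(image, dict):  # ImageObject form: use its url
        image = image.get("url")
    return {
        "headline": first_present(item, "headline", "alternativeHeadline", "name"),
        "authors": as_list(first_present(item, "author", "creator")),
        "sections": as_list(item.get("articleSection")),
        "keywords": as_list(item.get("keywords")),
        "image": image,
    }
```

With this approach, an object that uses creator, array values, and an image ImageObject normalizes to the same dictionary as one that uses author, comma-delimited strings, and thumbnailUrl.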
The Microdata format embeds structured data into existing HTML elements throughout the document using tag properties specified by the schema.org vocabulary. Chartbeat mostly follows the official specification, but allows alternate forms of some properties for your convenience. For more info, see the official Microdata specification and Google's structured data guide.
Here's a basic example that follows the standard spec:
<html>
<body>
<div itemscope itemtype="http://schema.org/NewsArticle">
<h1 itemprop="headline">Data, platforms, and subscriptions: What you read (by Total Engaged Time) in 2019</h1>
<span itemprop="datePublished" content="2020-01-16T11:07:35-05:00">2020-01-16T11:07:35-05:00</span>
<span itemprop="description">We dove into our own Historical Dashboard to understand the topics that garnered the most total engaged time on the Chartbeat blog in 2019. See more on our website.</span><br>
<div itemprop="image" itemscope itemtype="http://schema.org/ImageObject">
<meta itemprop="url" content="http://4ehuia1v75h912e6wht7ul1m-wpengine.netdna-ssl.com/wp-content/uploads/2017/09/[email protected]">
<img src="http://4ehuia1v75h912e6wht7ul1m-wpengine.netdna-ssl.com/wp-content/uploads/2017/09/[email protected]">
</div>
Author: <span itemprop="author">Nick Lioudis</span><br>
<div itemprop="publisher" itemscope itemtype="http://schema.org/Organization">
<span itemprop="name">Chartbeat</span>
</div>
<span itemprop="articleBody">In December, we showed you where readers spent over 294 billion minutes of Total Engaged Time through our 100 Most Engaging Stories...</span>
</div>
</body>
</html>
Since this format follows the same schema.org vocabulary as JSON-LD, it uses the same properties as described above for JSON-LD. However, those properties are incorporated into HTML differently.
- itemprop: Adds a property to an HTML element.
- itemscope: Works in conjunction with itemtype to specify the type of item associated with a particular HTML element.
- itemtype: Provides the URL of the (schema.org) vocabulary item associated with the HTML element, and gives the context for its constituent itemprop properties.
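The way these three attributes combine can be sketched with a small standard-library parser. This is a simplified illustration, not a complete Microdata processor and not Chartbeat's scraper: the names MicrodataExtractor and extract_microdata are hypothetical, only the first value of each property is kept, and a property's value is taken from the content attribute when present, otherwise from src or href on void elements, otherwise from the element's text.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they are not tracked as "open".
VOID_TAGS = {"meta", "img", "br", "link", "hr", "input"}


class MicrodataExtractor(HTMLParser):
    """Simplified Microdata reader: keeps the first value seen per property."""

    def __init__(self):
        super().__init__()
        self.items = []    # top-level items found in the document
        self._scopes = []  # item dicts currently being filled, innermost last
        self._open = []    # one record per open non-void element

    def _add(self, name, value):
        if self._scopes and value is not None:
            self._scopes[-1].setdefault(name, value)

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        prop = attrs.get("itemprop")
        is_void = tag in VOID_TAGS
        opens_item = "itemscope" in attrs and not is_void
        text_prop = None
        if opens_item:
            item = {"@type": attrs.get("itemtype")}
            if prop and self._scopes:
                self._add(prop, item)    # nested item, e.g. publisher
            else:
                self.items.append(item)  # top-level item
            self._scopes.append(item)
        elif prop:
            if "content" in attrs:
                self._add(prop, attrs["content"])
            elif is_void:
                self._add(prop, attrs.get("src") or attrs.get("href"))
            else:
                text_prop = prop         # value is the element's text
        if not is_void:
            self._open.append({"tag": tag, "item": opens_item,
                               "prop": text_prop, "text": []})

    def handle_data(self, data):
        for record in self._open:
            if record["prop"]:
                record["text"].append(data)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not any(r["tag"] == tag for r in self._open):
            return
        # Close everything up to and including the matching open element.
        while self._open:
            record = self._open.pop()
            if record["prop"]:
                self._add(record["prop"], "".join(record["text"]).strip())
            if record["item"]:
                self._scopes.pop()
            if record["tag"] == tag:
                break


def extract_microdata(html):
    parser = MicrodataExtractor()
    parser.feed(html)
    parser.close()
    return parser.items
```

Each itemscope element becomes one dictionary keyed by itemprop name, with nested items such as the publisher stored as nested dictionaries.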
Structured data may be embedded into an HTML document with repeated <meta> tags in the <head> element, identified with tag properties specified by the Open Graph protocol. Chartbeat mostly follows the official specification, but allows alternate forms of some properties for your convenience. This format was designed to incorporate webpages into a "social graph", and as such, includes fewer properties in more restrictive data structures.
<html>
<head>
<meta property="og:type" content="article">
<meta property="og:title" content="Data, platforms, and subscriptions: What you read (by Total Engaged Time) in 2019">
<meta property="og:url" content="https://blog.chartbeat.com/2020/01/16/total-engaged-time-chartbeat-blog-data/">
<meta property="og:image" content="http://4ehuia1v75h912e6wht7ul1m-wpengine.netdna-ssl.com/wp-content/uploads/2017/09/[email protected]">
<meta property="article:published_time" content="2020-01-16T11:07:35-05:00">
<meta property="article:author" content="Nick Lioudis">
<meta property="og:description" content="We dove into our own Historical Dashboard to understand the topics that garnered the most total engaged time on the Chartbeat blog in 2019. See more on our website">
<meta property="og:site_name" content="Chartbeat">
</head>
</html>
- og:image (url): URL of the image used to represent the article within a social graph. If not specified, og:image:url is checked.
- article:modified_time (datetime): Date and (optionally) time the article was last modified. If not specified, og:updated_time is checked.
- article:published_time (datetime): Date and (optionally) time the article was first published. If not specified, og:pubdate is checked.
- article:section (string): One or more sections assigned to the article. Multiple sections may be expressed as a single string delimited by commas.
- article:content_tier (enum): Access tier assigned to article by its publisher. Should be one of free, metered, or locked.
- og:locale (string): Locale in which the article's tags are marked up. Should be formatted as language_TERRITORY, e.g. en_US.
- og:site_name (string): Name of the site on which the article appears. If not specified, article:publisher is checked.
- og:video (url): URL of the video used to represent the article within a social graph. If not specified, og:video:url is checked.
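The fallback rules listed above can be sketched as a simple lookup over the page's <meta> tags. The code below is an illustrative assumption about how such a lookup might work, not Chartbeat's implementation; FALLBACKS, MetaTagCollector, and read_open_graph are hypothetical names.

```python
from html.parser import HTMLParser

# Primary property -> fallback property, taken from the list above.
FALLBACKS = {
    "og:image": "og:image:url",
    "article:modified_time": "og:updated_time",
    "article:published_time": "og:pubdate",
    "og:site_name": "article:publisher",
    "og:video": "og:video:url",
}


class MetaTagCollector(HTMLParser):
    """Collects the content of every <meta property="..."> tag, in order."""

    def __init__(self):
        super().__init__()
        self.tags = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        prop, content = attrs.get("property"), attrs.get("content")
        if prop and content:
            self.tags.setdefault(prop, []).append(content)


def read_open_graph(html):
    collector = MetaTagCollector()
    collector.feed(html)
    collector.close()
    tags = collector.tags
    # Start from the first value of every property that was found.
    result = {name: values[0] for name, values in tags.items()}
    # Fill in a primary property from its fallback only when it is missing.
    for primary, fallback in FALLBACKS.items():
        if primary not in result and fallback in result:
            result[primary] = result[fallback]
    # Sections may arrive as one comma-delimited string.
    if "article:section" in tags:
        result["article:section"] = [
            part.strip()
            for value in tags["article:section"]
            for part in value.split(",")
            if part.strip()
        ]
    return result
```

Here a page that only sets og:image:url and og:pubdate still ends up with og:image and article:published_time values, while a page that sets both og:site_name and article:publisher keeps og:site_name.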
Chartbeat uses a web scraper to collect metadata about the pages tracked by Chartbeat's main pinger. For each page, the scraper makes a standard HTTP GET request to the URL and downloads the raw HTML, from which metadata is then extracted. Page metadata is joined to the visitor engagement data collected by our pinger using the host and path fields (aka h and p keys).
In general, Chartbeat's web scraper will only visit pages once, shortly after their first appearance in our engagement data pipeline. Load on your site's servers should be minimal; however, scraper activity will be higher at first, when many pages appear "new" to our system, before declining to match your site's usual publication rate.
The scraper identifies itself with the User-Agent string Chartbeat-ContentX/0.3 (http://www.chartbeat.com). Note that the version number (e.g. 0.3) will change over time. To prevent unsuccessful scrapes and ensure that every page has associated metadata, please add this User-Agent to your site's whitelist.
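Because the version number changes over time, a whitelist rule that matches the exact string would eventually stop working. One possible approach, sketched below, is to match on the product name and accept any numeric version. The pattern and the function name is_chartbeat_scraper are assumptions for illustration; adapt the idea to whatever rule syntax your firewall, CDN, or bot-management tool uses.

```python
import re

# Matches "Chartbeat-ContentX/<version>" regardless of the version number.
CHARTBEAT_SCRAPER_UA = re.compile(r"^Chartbeat-ContentX/\d+(?:\.\d+)*\b")


def is_chartbeat_scraper(user_agent):
    """Return True if the User-Agent header looks like Chartbeat's scraper."""
    return bool(user_agent) and CHARTBEAT_SCRAPER_UA.match(user_agent) is not None
```

For example, is_chartbeat_scraper("Chartbeat-ContentX/0.3 (http://www.chartbeat.com)") returns True, and so would the same string with a later version number. Keep in mind that any client can send this header, so a User-Agent match identifies the scraper's requests but does not authenticate them.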