Structured data, JSON-LD, Schema.org and hackered

Hackered
Wednesday, March 18, 2015
by Sean McAlinden

Over the last year I have been lucky enough to be exposed to some great technologies within the semantic web realm, from RDF triple stores and inference engines to great new standards such as JSON-LD.

I have decided to take a little of this knowledge, specifically JSON-LD, and apply it to my blog, with some great results.

Why bother implementing JSON-LD?

The answer is pretty simple: it provides the main search engines, such as Google and Bing, with some contextual information about my blog.

Contextual information can include the fact that it is a blog, that it is aimed at developers, that it is written by me, and so on.

Also each post has some very important sections such as a title, description, main content area and details such as the creation date.

Whilst the search engines are generally incredible at understanding the resources on the internet, giving them a clear helping hand can only be a good thing.

JSON-LD 

JSON-LD (LD = Linked Data) is just JSON with a few predefined properties, such as @context, @type and @id. It is a great structure for describing contextual information, and it can easily be serialized from (or deserialized into) classes in languages such as C#, or converted to RDF.
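Because JSON-LD is plain JSON, any JSON parser can read it. The following is a minimal sketch, assuming the Json.NET (Newtonsoft.Json) package is referenced; the sample document and its values are illustrative only.

```csharp
using System;
using Newtonsoft.Json.Linq;

public static class Program
{
    public static void Main()
    {
        // A tiny, illustrative JSON-LD document (not taken from a real page).
        const string json = @"{
            ""@context"": ""http://schema.org"",
            ""@type"": ""Blog"",
            ""about"": ""Interesting articles about application development""
        }";

        // JObject.Parse treats the text as ordinary JSON.
        JObject document = JObject.Parse(json);

        // Keywords such as @type are just property names to the parser.
        Console.WriteLine((string)document["@type"]);   // prints "Blog"
        Console.WriteLine((string)document["about"]);
    }
}
```

The point is only that no special tooling is needed to read or write the format; a full JSON-LD processor is a separate matter and is not shown here.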

The following is the JSON-LD I output on my home page to describe the intention of the site:

<script type="application/ld+json">
{
  "@context": "http://schema.org",
  "@type": "Blog",
  "author": {
    "@id": "http://uk.linkedin.com/in/seanmcalinden",
    "@type": "Person",
    "name": "Sean McAlinden",
    "description": ""
  },
  "about": "Interesting articles about application development",
  "audience": {
    "@type": "Audience",
    "audienceType": "application developers"
  }
}
</script>

Fully explaining the semantic web and JSON-LD is out of scope for this post, but the JSON above is relatively straightforward to explain.

@context: this points to a public schema which describes many "things", including types of websites such as blogs. There are a number of schemas publicly available, but the schema.org schema is perfect for my blog's requirements.

@type: this property indicates what the JSON content "is", i.e. it provides some type context. In our case the type indicates that the site is a blog.

If you navigate to http://schema.org/Blog you will find a bunch of properties which can be used to further describe the blog type. Some of these properties take primitive values and some, such as author, take complex objects with properties of their own.

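Note: schema.org names are case sensitive. Type names such as Blog start with a capital letter (so make sure http://schema.org/Blog has a capital B), while property names such as author and articleBody start with a lowercase letter.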

@id: this is a URI or URL representing an instance of the @type.

In the case of the nested author object, I have decided the main instance of me as the author can be represented by my LinkedIn profile.

Another important thing to note is the content type set on the script tag: application/ld+json.
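To tie the keywords and the content type together, here is a rough sketch that builds a small document and wraps it in the script tag. It assumes Json.NET is referenced; ToJsonLdScript is a hypothetical helper name of my own, not part of any library, and the "</" escaping is a precaution I have assumed is wanted when string values may contain markup.

```csharp
using System;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

public static class Program
{
    // Hypothetical helper: wraps a JSON-LD document in a script element
    // carrying the application/ld+json content type.
    public static string ToJsonLdScript(JObject document)
    {
        string json = document.ToString(Formatting.Indented);

        // "\/" is a valid JSON escape for "/", so this keeps the JSON valid
        // while preventing a "</script>" inside a value from closing the tag.
        json = json.Replace("</", "<\\/");

        return "<script type=\"application/ld+json\">\n" + json + "\n</script>";
    }

    public static void Main()
    {
        var blog = new JObject
        {
            ["@context"] = "http://schema.org",   // vocabulary the terms come from
            ["@type"] = "Blog",                   // what this document describes
            ["author"] = new JObject
            {
                ["@id"] = "http://uk.linkedin.com/in/seanmcalinden", // identifies the instance
                ["@type"] = "Person",
                ["name"] = "Sean McAlinden"
            }
        };

        Console.WriteLine(ToJsonLdScript(blog));
    }
}
```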

Blog Post

Blog posts are a very similar story. Here is an example taken from my site:

<script type="application/ld+json">
{
  "@context": "http://schema.org",
  "@id": "http://www.hackered.co.uk/articles/signalr-logging-exceptions-using-a-custom-hubpipelinemodule",
  "@type": "BlogPosting",
  "name": "SignalR: Logging exceptions using a custom HubPipeline Module",
  "description": "<p>When things go wrong with complex rich client applications it can be difficult to monitor and diagnose, especially when using complex client-server communication approaches such as websockets, SignalR offers a nice way to hook into its pipeline and add custom exception logging and trace handling.</p>\r\n<p>In this post I'll show a very simple custom HubPipelineModule which can log and trace exceptions.</p>",
  "url": "http://www.hackered.co.uk/articles/signalr-logging-exceptions-using-a-custom-hubpipelinemodule",
  "author": {
    "@id": "http://uk.linkedin.com/in/seanmcalinden",
    "@type": "Person",
    "name": "Sean McAlinden",
    "description": ""
  },
  "articleBody": "Article body goes here"
}
</script>

As you can see, there is quite a lot of rich information about the post. Let's take a look at the parts:

@context: we are using the same schema.org schema

@type: the type for a post is BlogPosting (see http://schema.org/BlogPosting for more details)

@id: I have chosen to use the URL of the blog post itself as the id

The other properties, such as name and description, come from the schema type, and author is the same as before.

How do I know this is working?

After implementing JSON-LD within your site/blog, wait until Google next indexes your pages, then go to Google Webmaster Tools and take a look at the structured data section. It will be filled with the rich information you have output within your JSON-LD.
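While waiting for the next crawl, a quick local sanity check can catch malformed output early. The sketch below is only a rough approximation, assuming Json.NET is referenced: it uses a simple regular expression rather than a real HTML parser, the sample HTML is made up, and it says nothing about how Google itself will interpret the page.

```csharp
using System;
using System.Text.RegularExpressions;
using Newtonsoft.Json.Linq;

public static class Program
{
    // Rough pattern for JSON-LD script blocks; a real HTML parser would be more robust.
    private static readonly Regex JsonLdScript = new Regex(
        "<script\\s+type=\"application/ld\\+json\"\\s*>(.*?)</script>",
        RegexOptions.Singleline | RegexOptions.IgnoreCase);

    public static void Main()
    {
        // Illustrative page content only.
        const string html = @"<html><head>
<script type=""application/ld+json"">
{ ""@context"": ""http://schema.org"", ""@type"": ""Blog"" }
</script>
</head></html>";

        foreach (Match match in JsonLdScript.Matches(html))
        {
            // JObject.Parse throws a JsonReaderException if the block is not valid JSON.
            JObject document = JObject.Parse(match.Groups[1].Value);

            bool hasBasics = document["@context"] != null && document["@type"] != null;
            Console.WriteLine((string)document["@type"] + ": "
                + (hasBasics ? "looks plausible" : "missing @context or @type"));
        }
        // prints "Blog: looks plausible"
    }
}
```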

My Classes

Just for completeness I'll quickly show the classes I use to serialize the JSON-LD for my blog.

I am using Json.NET to perform the serialization.

Firstly I have created a SemanticBase class which has my main properties:

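using System;
using Newtonsoft.Json;

// Properties shared by every JSON-LD object I output.
public class SemanticBase
{
    // JSON-LD keywords need explicit names because "@" is not valid in a C# identifier.
    [JsonProperty(PropertyName = "@context", Order = 0)]
    public object Context { get; set; }

    [JsonProperty(PropertyName = "@id", Order = 1)]
    public object SemanticId { get; set; }

    [JsonProperty(PropertyName = "@type", Order = 2)]
    public string SemanticType { get; set; }

    // schema.org property names are case sensitive and start with a lowercase letter.
    [JsonProperty(PropertyName = "name", Order = 3)]
    public string Name { get; set; }

    [JsonProperty(PropertyName = "description", Order = 4)]
    public string Description { get; set; }

    [JsonProperty(PropertyName = "url", Order = 5)]
    public Uri Url { get; set; }
}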

I then derived classes for posts and authors:

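// Describes a single blog post (schema.org type BlogPosting).
public class PostSemanticViewModel : SemanticBase
{
    [JsonProperty(PropertyName = "author", Order = 9)]
    public AuthorSemanticModel Author { get; set; }

    [JsonProperty(PropertyName = "articleBody", Order = 10)]
    public string ArticleBody { get; set; }
}

// The author only needs the shared base properties.
public class AuthorSemanticModel : SemanticBase
{
}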

The main thing to notice here is the JsonProperty attributes: I am using them to set the correct property names such as @context, @type and @id.
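For illustration, here is roughly how such classes might be serialized and written into the page. This is a sketch under a few assumptions: Json.NET is referenced, the classes are condensed copies that name every property explicitly in schema.org's lowercase style, the post URL and title are made-up placeholders, and ignoring null values is my own choice rather than something the blog necessarily does.

```csharp
using System;
using Newtonsoft.Json;

// Condensed copies of the classes so the example stands alone.
public class SemanticBase
{
    [JsonProperty(PropertyName = "@context", Order = 0)]
    public object Context { get; set; }

    [JsonProperty(PropertyName = "@id", Order = 1)]
    public object SemanticId { get; set; }

    [JsonProperty(PropertyName = "@type", Order = 2)]
    public string SemanticType { get; set; }

    [JsonProperty(PropertyName = "name", Order = 3)]
    public string Name { get; set; }
}

public class AuthorSemanticModel : SemanticBase
{
}

public class PostSemanticViewModel : SemanticBase
{
    [JsonProperty(PropertyName = "author", Order = 9)]
    public AuthorSemanticModel Author { get; set; }

    [JsonProperty(PropertyName = "articleBody", Order = 10)]
    public string ArticleBody { get; set; }
}

public static class Program
{
    public static void Main()
    {
        var post = new PostSemanticViewModel
        {
            Context = "http://schema.org",
            SemanticId = "http://www.hackered.co.uk/articles/example-post", // placeholder URL
            SemanticType = "BlogPosting",
            Name = "Example post",                                          // placeholder title
            Author = new AuthorSemanticModel
            {
                SemanticId = "http://uk.linkedin.com/in/seanmcalinden",
                SemanticType = "Person",
                Name = "Sean McAlinden"
            },
            ArticleBody = "Article body goes here"
        };

        // Assumption: skip null properties (e.g. the author's @context) rather than emit nulls.
        var settings = new JsonSerializerSettings { NullValueHandling = NullValueHandling.Ignore };
        string json = JsonConvert.SerializeObject(post, Formatting.Indented, settings);

        Console.WriteLine("<script type=\"application/ld+json\">");
        Console.WriteLine(json);
        Console.WriteLine("</script>");
    }
}
```

In an MVC view the same string would be written into the page head rather than to the console.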

For me it was very satisfying when I noticed my Google Webmaster Tools structured data suddenly fill with great information.