Web pages are an interesting example of both structured and unstructured data. There are specific elements one could look at for certain information, like the <title> element or other semantic elements like <article> or <section>. The problem, though, is that these elements are more like our "address" example earlier - they often contain more than just the strict data we are looking for. A title might have a prefix or suffix of the website's name. An article or section might have many other layers of nested elements that help form the site's structure. To top it off, the HTML structure can vary wildly from site to site. If you want to extract data from multiple websites, it can get very hard very fast.
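To make the title problem concrete, here is a minimal sketch using only Python's standard library. The two pages and their site names are hypothetical, invented purely for illustration; the point is that the same article title comes back wrapped differently on each site, so no single clean-up rule works everywhere.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collects the text found inside <title> elements."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Text may arrive in several chunks, so append rather than assign.
        if self._in_title:
            self.title += data


def extract_title(html):
    parser = TitleParser()
    parser.feed(html)
    return parser.title.strip()


# Hypothetical pages from two sites, each with its own naming convention.
page_a = "<html><head><title>Structured Data 101 | Example Blog</title></head></html>"
page_b = "<html><head><title>Example News - Structured Data 101</title></head></html>"

title_a = extract_title(page_a)  # site name appears as a suffix
title_b = extract_title(page_b)  # site name appears as a prefix
```

Recovering just "Structured Data 101" from both would need per-site rules for the separator and its position, which is exactly what stops scaling once many websites are involved.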
That said, there are a number of ways to embed structured data into web pages. A web page could use Microdata, RDFa, JSON-LD or Open Graph to express structured data. A page can also use several of these at the same time. Open Graph is commonly used as a method of defining details for a link preview, while the others might express more complex data like product pricing or reviews.
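As a rough illustration of a page carrying two of these formats at once, the sketch below reads Open Graph meta tags and a JSON-LD script block from the same document. The HTML sample and the StructuredDataParser class are made up for this example, and it assumes well-formed JSON inside the script element; a real extractor would need error handling and support for Microdata and RDFa as well.

```python
import json
from html.parser import HTMLParser


class StructuredDataParser(HTMLParser):
    """Hypothetical helper: collects Open Graph tags and JSON-LD blocks."""

    def __init__(self):
        super().__init__()
        self._in_json_ld = False
        self._buffer = []
        self.json_ld = []     # one parsed object per JSON-LD <script> block
        self.open_graph = {}  # e.g. {"og:title": "..."}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "script" and attrs.get("type") == "application/ld+json":
            self._in_json_ld = True
            self._buffer = []
        elif tag == "meta" and (attrs.get("property") or "").startswith("og:"):
            self.open_graph[attrs["property"]] = attrs.get("content", "")

    def handle_data(self, data):
        if self._in_json_ld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_ld:
            self._in_json_ld = False
            # Assumes the block holds valid JSON; json.loads raises otherwise.
            self.json_ld.append(json.loads("".join(self._buffer)))


# Made-up page that uses Open Graph and JSON-LD side by side.
html = """
<html><head>
<meta property="og:title" content="Example Widget">
<meta property="og:type" content="website">
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Product", "name": "Example Widget"}
</script>
</head><body></body></html>
"""

parser = StructuredDataParser()
parser.feed(html)
```

After feeding the page, parser.open_graph holds the link-preview details and parser.json_ld holds the richer product description, without any knowledge of the page's visual layout.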
Standard formats like Microdata or JSON-LD are a good start, but they only define how the data is encoded - we need a common vocabulary so we can understand the data those formats carry. One common vocabulary is Schema.org, which provides over 700 types, including types to describe people, places, products, recipes, reviews, vehicles, movies and medical devices. Using Schema.org for structured data on a website can help search engines provide richer experiences in the search results.
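The value of a shared vocabulary shows up on the consuming side: code can rely on agreed property names rather than guessing per site. The sketch below assumes a JSON-LD document that uses the Schema.org Product, Offer and AggregateRating types; the product itself and the summarise_product helper are invented for illustration. It is also simplified, since Schema.org allows "@type" and "offers" to be lists, which this does not handle.

```python
import json

# Invented product data, expressed with Schema.org types and properties.
document = json.loads("""
{
  "@context": "https://schema.org",
  "@type": "Product",
  "name": "Example Widget",
  "offers": {"@type": "Offer", "price": "19.99", "priceCurrency": "USD"},
  "aggregateRating": {"@type": "AggregateRating", "ratingValue": "4.5", "reviewCount": "12"}
}
""")


def summarise_product(node):
    """Hypothetical helper: pull a few well-known fields from a Product node."""
    if node.get("@type") != "Product":
        return None
    offer = node.get("offers", {})
    rating = node.get("aggregateRating", {})
    return {
        "name": node.get("name"),
        "price": offer.get("price"),
        "currency": offer.get("priceCurrency"),
        "rating": rating.get("ratingValue"),
    }


summary = summarise_product(document)
```

Because the property names come from the vocabulary rather than from any one site, the same helper would work on any page that describes its products this way.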
Summary
Structured data, through standardising expected properties and value formats, makes the sharing and processing of data easier. Web pages in particular benefit from encoding structured data in their mark-up where it can be used by search engines and other tools.