

WebNLG Dataset Summary

This repository presents the evolution of the WebNLG corpus.

Each folder contains the same data in two formats: XML and JSON.

  1. release_v2

    It is the latest release.

    It includes release_v1 plus the test data (seen categories) from the WebNLG challenge.

    We split it into train/dev/test, ensuring equal representation of DBpedia categories and tripleset sizes.

    Tree shapes and types (sibling, chain, mixed) were added for each input RDF tree.

  2. release_v2_constrained

    It has the same data as release_v2.

    Its train/dev/test split is more challenging: it ensures that no triple occurring in train/dev is present in test (more info in the INLG 2018 paper below).

  3. release_v1

    It matches the "Final Release (Larger Dataset)" on the challenge website.

    It doesn't include test data (seen categories) from the challenge.

    No split into train/dev/test was provided.

    It covers 15 DBpedia categories.

  4. webnlg_challenge_2017

    It contains the data used in the WebNLG Challenge 2017.

    Covers 10 DBpedia categories (the City category only partially).
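
The property behind release_v2_constrained (no triple from train/dev appears in test) can be checked with a few lines of Python. This is only a minimal sketch, not the script used to build the split. It assumes that each triple has already been read from the XML or JSON files as a `subject | predicate | object` string; the helper names (`parse_triple`, `collect_triples`, `leaked_triples`) and the sample triples are hypothetical and serve only as an illustration.

```python
from typing import Iterable, Set, Tuple

Triple = Tuple[str, str, str]


def parse_triple(text: str) -> Triple:
    """Parse a 'subject | predicate | object' string into a 3-tuple.

    Assumption: fields are separated by ' | ' and contain no such separator.
    """
    parts = [part.strip() for part in text.split(" | ")]
    if len(parts) != 3:
        raise ValueError(f"expected 3 fields, got {len(parts)}: {text!r}")
    return parts[0], parts[1], parts[2]


def collect_triples(triplesets: Iterable[Iterable[str]]) -> Set[Triple]:
    """Flatten a collection of triplesets into a set of unique triples."""
    return {parse_triple(t) for tripleset in triplesets for t in tripleset}


def leaked_triples(
    train_dev: Iterable[Iterable[str]], test: Iterable[Iterable[str]]
) -> Set[Triple]:
    """Return the triples that occur both in train/dev and in test.

    An empty result means the constrained-split property holds.
    """
    return collect_triples(train_dev) & collect_triples(test)


# Toy data for illustration only; not taken from the release files.
train_dev = [
    ["Alan_Bean | occupation | Test_pilot",
     "Alan_Bean | nationality | United_States"],
    ["Aarhus_Airport | cityServed | Aarhus"],
]
test_ok = [["Buzz_Aldrin | occupation | Fighter_pilot"]]
test_bad = [["Alan_Bean | occupation | Test_pilot",
             "Buzz_Aldrin | nationality | United_States"]]

print(leaked_triples(train_dev, test_ok))   # no shared triples
print(leaked_triples(train_dev, test_bad))  # one shared triple
```

Comparing whole (subject, predicate, object) tuples keeps the check strict: two triples that share only an entity or only a predicate are not counted as overlap.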





