Thursday, 21 April 2011

DBpedia facts and figures

Content of DBpedia's data set

Every resource in DBpedia (URIs of the form http://dbpedia.org/page/Name) corresponds directly to an English Wikipedia article (of the form http://en.wikipedia.org/wiki/Name).
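Because both URL forms share the same Name segment, converting between them is a plain prefix swap. The Python sketch below assumes exactly the two URL shapes given above. The helper names are hypothetical, and titles containing special characters may need extra percent-encoding handling that is not shown here.

```python
DBPEDIA_PAGE = "http://dbpedia.org/page/"
WIKIPEDIA_EN = "http://en.wikipedia.org/wiki/"


def _swap_prefix(url: str, old: str, new: str) -> str:
    # Assumption: both URL forms share the same Name segment verbatim.
    if not url.startswith(old) or len(url) == len(old):
        raise ValueError(f"expected a URL of the form {old}Name, got {url!r}")
    return new + url[len(old):]


def dbpedia_to_wikipedia(url: str) -> str:
    """Hypothetical helper: DBpedia page URL -> English Wikipedia URL."""
    return _swap_prefix(url, DBPEDIA_PAGE, WIKIPEDIA_EN)


def wikipedia_to_dbpedia(url: str) -> str:
    """Hypothetical helper: English Wikipedia URL -> DBpedia page URL."""
    return _swap_prefix(url, WIKIPEDIA_EN, DBPEDIA_PAGE)


print(dbpedia_to_wikipedia("http://dbpedia.org/page/Berlin"))
# http://en.wikipedia.org/wiki/Berlin
```

"Berlin" is only an example name; any Name segment is handled the same way, and a URL that does not start with the expected prefix raises a ValueError instead of producing a wrong link.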
  • DBpedia derives its ontology from Wikipedia.
  • DBpedia's data set describes over 3.5 million "things" with over half a billion "facts" as of January 2011.
  • 1.67 million of these "things" are classified in a consistent ontology, including:
    • 364,000 persons
    • 462,000 places (including 340,000 populated places)
    • 99,000 music albums
    • 54,000 films
    • 17,000 video games
    • 148,000 organizations (including 35,000 companies and 34,000 educational institutions)
    • 169,000 species
    • 5,200 diseases
  • DBpedia uses the Resource Description Framework (RDF) to publish data extracted from Wikipedia.
  • Development toolkits for processing DBpedia data are available for many programming languages.
  • The DBpedia data set features:
    • labels and abstracts for these 3.5 million things in up to 97 different languages
    • 1,850,000 links to images
    • 5,900,000 links to external web pages
    • 6,500,000 external links into other RDF datasets
    • 633,000 Wikipedia categories
    • 2,900,000 YAGO categories
  • The knowledge base consists of over 672 million pieces of information (RDF triples), of which:
    • 286 million were extracted from the English edition of Wikipedia
    • 386 million were extracted from other language editions
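Each of these "pieces of information" is an RDF triple: a subject, a predicate and an object. The Python sketch below models triples and prints them in the line-based N-Triples syntax. It is an illustration, not DBpedia's extraction code: the resource IRI http://dbpedia.org/resource/Berlin is an example, and using rdfs:label for multilingual labels and foaf:isPrimaryTopicOf for the Wikipedia link is an assumption about how such facts can be expressed.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class IRI:
    value: str

    def to_nt(self) -> str:
        # IRIs are written in angle brackets in N-Triples
        return f"<{self.value}>"


@dataclass(frozen=True)
class Literal:
    text: str
    lang: Optional[str] = None  # language tag such as "en" or "de"; None = untagged

    def to_nt(self) -> str:
        # Escape backslashes first, then quotes and line breaks
        escaped = (
            self.text.replace("\\", "\\\\")
            .replace('"', '\\"')
            .replace("\n", "\\n")
            .replace("\r", "\\r")
        )
        suffix = f"@{self.lang}" if self.lang else ""
        return f'"{escaped}"{suffix}'


@dataclass(frozen=True)
class Triple:
    subject: IRI
    predicate: IRI
    obj: Union[IRI, Literal]

    def to_nt(self) -> str:
        # One triple per line, terminated by " ."
        return f"{self.subject.to_nt()} {self.predicate.to_nt()} {self.obj.to_nt()} ."


# Assumed vocabulary choices for this illustration
RDFS_LABEL = IRI("http://www.w3.org/2000/01/rdf-schema#label")
FOAF_IS_PRIMARY_TOPIC_OF = IRI("http://xmlns.com/foaf/0.1/isPrimaryTopicOf")

berlin = IRI("http://dbpedia.org/resource/Berlin")  # example resource
facts = [
    Triple(berlin, RDFS_LABEL, Literal("Berlin", "en")),  # English label
    Triple(berlin, RDFS_LABEL, Literal("Berlin", "de")),  # German label
    Triple(berlin, FOAF_IS_PRIMARY_TOPIC_OF, IRI("http://en.wikipedia.org/wiki/Berlin")),
]

for t in facts:
    print(t.to_nt())
```

The language tags are what let one resource carry labels and abstracts in many languages side by side. Because N-Triples writes one triple per line, counting the triple lines of a dump gives its size; the 286 million English and 386 million other-language triples add up to the 672 million total.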