Monday, 18 April 2011

Back to Basics: XML

So I began my day today with more reading on XML. This led me to look closer at DTDs. In fact, I just looked at the HTML 4 doctype closely for the first time. Kind of funny if you consider that it's the first line added to well-formed markup and that I've been using it religiously for years.

I never realized CDATA and PCDATA were considered building blocks of XML (and HTML) documents. Apparently there are more than just elements ;)
  • Elements
  • Attributes
  • Entities
  • CDATA (character data)
  • PCDATA (parsed character data)
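
To see these building blocks side by side, here is a minimal sketch using Python's standard xml.etree.ElementTree. The sample document and its names (note, to, body, lang, author) are made up for illustration. It assumes a CDATA section is a fair stand-in for "character data the parser leaves alone"; note that ElementTree reads the internal DTD but does not validate against it.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document: every name in it is invented for this sketch.
DOC = """<!DOCTYPE note [
  <!ELEMENT note (to, body)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
  <!ATTLIST note lang CDATA #REQUIRED>
  <!ENTITY author "Jane">
]>
<note lang="en">
  <to>&author;</to>
  <body><![CDATA[1 < 2 and <b>this is not a tag</b>]]></body>
</note>"""

root = ET.fromstring(DOC)
to = root.find("to")
body = root.find("body")

print(root.tag)     # the element name
print(root.attrib)  # the attributes, as a dict
print(to.text)      # PCDATA: parsed, so the &author; entity is expanded
print(body.text)    # CDATA section: the markup-looking text is kept literally
print(len(body))    # number of child elements found inside <body>
```

The thing to notice is that the `<b>` inside the CDATA section never becomes an element, while the `&author;` entity inside ordinary PCDATA is replaced by its declared value.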
I originally only wanted to learn how to scrape data sets from Wikipedia, but I've realized now that to get a firm understanding of information retrieval, the logical order of study (in this case) is:
  • XML (Come on, you know this one)
  • RDF (Resource Description Framework)
  • SPARQL (SPARQL Protocol and RDF Query Language)
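
To get a feel for why the order makes sense, here is a rough sketch of how the three layers stack up: RDF can be written as XML, each statement is a (subject, predicate, object) triple, and a query is essentially a triple pattern with blanks. The example.org URIs and the read_triples and match helpers are hypothetical, and the pattern matcher only imitates the idea behind a SPARQL query; it is not SPARQL and handles only the simplest RDF/XML shape.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Hypothetical RDF/XML snippet; the example.org vocabulary is invented.
RDF_XML = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ex="http://example.org/terms#">
  <rdf:Description rdf:about="http://example.org/WeavingTheWeb">
    <ex:author>Tim Berners-Lee</ex:author>
    <ex:year>1999</ex:year>
  </rdf:Description>
</rdf:RDF>"""


def read_triples(xml_text):
    """Turn simple rdf:Description blocks into (subject, predicate, object) tuples."""
    root = ET.fromstring(xml_text)
    triples = []
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        subject = desc.get(f"{{{RDF_NS}}}about")
        for prop in desc:
            # ElementTree spells qualified names as "{namespace}local";
            # joining the two parts gives the predicate URI.
            predicate = prop.tag[1:].replace("}", "", 1)
            triples.append((subject, predicate, prop.text))
    return triples


def match(triples, s=None, p=None, o=None):
    """Return the triples that fit a pattern; None acts as a wildcard."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


TRIPLES = read_triples(RDF_XML)
authors = match(TRIPLES, p="http://example.org/terms#author")
print(authors)
```

A real setup would presumably use a proper RDF library and a SPARQL endpoint, but the wildcard pattern above is roughly what a `SELECT ... WHERE { ?s ex:author ?o }` query expresses.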
I've always been interested in the Semantic Web. Finally, I have some goals that will keep me motivated to dig into Web 3.0!
"If HTML and the Web made all the online documents look like one huge book, RDF, schema, and inference languages will make all the data in the world look like one huge database."
Tim Berners-Lee, Weaving the Web, 1999