

NAME
       HTML::HTML5::Outline - implementation of the HTML5 Outline algorithm


SYNOPSIS
               use JSON;
               use HTML::HTML5::Outline;

               my $html = <<'HTML';
               <!doctype html>
               <h1>Good Morning</h1>
               HTML

               my $outline = HTML::HTML5::Outline->new($html);
               print to_json($outline->to_hashref, {pretty=>1,canonical=>1});


DESCRIPTION
       This is an implementation of the HTML5 Outline algorithm, as defined in
       the HTML5 specification.

       The module can output a JSON-friendly hashref, or an RDF model.

   Constructor
       ·   "HTML::HTML5::Outline->new($html, %options)"

           Construct a new outline. $html is the HTML to generate an outline
           from, either as an HTML or XHTML string, or as an
           XML::LibXML::Document object.

           Options:

           ·   default_language - default language to assume text is in when
               no lang/xml:lang attribute is available. e.g. 'en-gb'.

           ·   element_subjects - advanced feature for sharing
               element-to-subject mappings with RDF::RDFa::Parser. See USE
               WITH RDF::RDFA::PARSER for an example.

           ·   microformats - support "<ul class="xoxo">", "<ol class="xoxo">"
               and "<whatever class="figure">" as sectioning elements (like
               "<section>", "<figure>", etc).  Boolean, defaults to false.

           ·   parser - 'html' (default) or 'xml' - choose the parser to use
               for XHTML/HTML. If the constructor is passed an
               XML::LibXML::Document, this is ignored.

           ·   suppress_collections - suppresses rdf:List structures in the
               RDF output. RDF output - especially in Turtle format - looks
               somewhat nicer without them, but if you care about the order
               of headings and sections, then you'll want them. Boolean,
               defaults to false.

           ·   uri - the document URI for resolving relative URI references.
               Only really used by the RDF output.
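
           The options above can be combined in a single constructor call.
           The following is a minimal sketch, assuming the module is
           installed; the markup and the example.com URI are made up for
           illustration and are not part of the module.

```perl
use strict;
use warnings;
use HTML::HTML5::Outline;

# Illustrative markup only; any HTML or XHTML string will do.
my $html = <<'HTML';
<!doctype html>
<h1>Release Notes</h1>
<ul class="xoxo"><li>First item</li></ul>
HTML

my $outline = HTML::HTML5::Outline->new(
    $html,
    default_language => 'en-gb',  # assumed when no lang/xml:lang is present
    microformats     => 1,        # treat class="xoxo" lists as sectioning
    parser           => 'html',   # the default, spelled out for clarity
    uri              => 'http://example.com/notes',  # hypothetical URI
);

# Fetch the nested data structure and report what kind of reference it is.
my $data = $outline->to_hashref;
print ref($data), "\n";
```

           Options that are left out keep the defaults listed above.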

   Object Methods
       ·   "to_hashref"

           Returns data as a nested hashref/arrayref structure. Dump it as
           JSON and you'll figure out the format pretty easily.

       ·   "to_rdf"

           Returns data as an RDF::Trine::Model. Requires RDF::Trine to be
           installed; otherwise this method won't exist.

       ·   "primary_outlinee"

           Returns an HTML::HTML5::Outline::Outlinee object representing the
           outline for the page.
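
       Since the layout of the "to_hashref" result is not spelled out here,
       one way to explore it is to list every key path it contains. The
       sketch below is a generic walker that makes no assumption about the
       key names; "key_paths" is a hypothetical helper, not part of the
       module, and it is run here on stand-in data.

```perl
use strict;
use warnings;

# List every key path in a nested hashref/arrayref structure.
# Scalars end the recursion and contribute no further paths.
sub key_paths {
    my ($node, $prefix) = @_;
    $prefix = '' unless defined $prefix;
    my @paths;
    if (ref $node eq 'HASH') {
        for my $key (sort keys %$node) {
            my $path = length $prefix ? "$prefix.$key" : $key;
            push @paths, $path, key_paths($node->{$key}, $path);
        }
    }
    elsif (ref $node eq 'ARRAY') {
        for my $i (0 .. $#$node) {
            push @paths, key_paths($node->[$i], $prefix . "[$i]");
        }
    }
    return @paths;
}

# Stand-in data; with the module installed, pass $outline->to_hashref.
my $sample = { title => 'Demo', items => [ { name => 'a' }, { name => 'b' } ] };
print "$_\n" for key_paths($sample);
```

       Sorting the keys keeps the listing stable between runs, in the same
       spirit as the "canonical" flag used in the synopsis.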

   Class Methods
       ·   "has_rdf"

           Indicates whether the "to_rdf" object method exists.
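
       Because "to_rdf" only exists when RDF::Trine is installed, callers
       may want to check "has_rdf" first. A minimal sketch, assuming the
       module is installed; the example.com URI is hypothetical, and "size"
       is the statement-count method of RDF::Trine::Model.

```perl
use strict;
use warnings;
use HTML::HTML5::Outline;

my $outline = HTML::HTML5::Outline->new(
    "<!doctype html>\n<h1>Good Morning</h1>\n",
    uri => 'http://example.com/',   # hypothetical document URI
);

if (HTML::HTML5::Outline->has_rdf) {
    # RDF::Trine is available, so to_rdf exists.
    my $model = $outline->to_rdf;
    printf "RDF model holds %d statements\n", $model->size;
}
else {
    # Fall back to the plain data structure.
    my $data = $outline->to_hashref;
    print "RDF support unavailable; got a ", ref($data), " reference\n";
}
```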


USE WITH RDF::RDFA::PARSER
       This module produces RDF data where many of the resources described are
       HTML elements. RDFa data typically does not, but RDF::RDFa::Parser does
       also support some extensions to RDFa which do (e.g. support for the
       "cite" and "role" attributes). It's useful to combine the RDF data from
       each, and RDF::RDFa::Parser 1.093 and upwards contains a few shims to
       make this possible.

       Without further ado...

               use HTML::HTML5::Outline;
               use RDF::RDFa::Parser 1.093;
               use RDF::TrineShortcuts;

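               # $html_source (the markup) and $base_url are placeholders
               # that the caller is assumed to have set beforehand.
               my $rdfa = RDF::RDFa::Parser->new(
                       $html_source,
                       $base_url,
                       RDF::RDFa::Parser::Config->new(
                               'html5', '1.1',
                               role_attr     => 1,
                               cite_attr     => 1,
                               longdesc_attr => 1,
                               ),
                       );
               $rdfa->consume;

               my $outline = HTML::HTML5::Outline->new(
                       $html_source,
                       uri              => $rdfa->uri,
                       element_subjects => $rdfa->element_subjects,
                       );

               # Merging two graphs is pretty complicated in RDF::Trine
               # but a little easier with RDF::TrineShortcuts...
               my $combined = rdf_parse();
               rdf_parse($rdfa->graph,     model => $combined);
               rdf_parse($outline->to_rdf, model => $combined);

               # Map each prefix to its namespace URI.
               my $NS = {
                       dc    => '',
                       o     => '',
                       type  => '',
                       xs    => '',
                       xhv   => '',
                       };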

               print rdf_string($combined => 'Turtle', namespaces => $NS);


SEE ALSO
       HTML::HTML5::Outline::RDF, HTML::HTML5::Outline::Outlinee,
       HTML::HTML5::Parser, HTML::HTML5::Sanity.


AUTHOR
       Toby Inkster


ACKNOWLEDGEMENTS
       This module is a fork of the document structure parser from Swignition.

       That in turn includes the following credits: thanks to Ryan King and
       Geoffrey Sneddon for pointing me towards [the HTML5] algorithm. I also
       used Geoffrey's Python implementation as a crib sheet to help me figure
       out what was supposed to happen when the HTML5 spec was ambiguous.


COPYRIGHT AND LICENCE
       Copyright (C) 2008-2011 by Toby Inkster

       This library is free software; you can redistribute it and/or modify it
       under the same terms as Perl itself.
