NAME

       hxpipe - convert XML file to a format easier to parse with Perl or AWK

SYNOPSIS

       hxpipe [ -l ] [ -- ] [ file-or-URL ]

DESCRIPTION

       hxpipe  parses  an  HTML  or  XML  file  and  outputs  a  line-oriented
       representation of it that is well suited to further processing with AWK
       or  similar tools. The format is similar to the ESIS (Element Structure
       Information Set) that is output by nsgmls/onsgmls.

       The reverse operation, converting back to mark-up, is performed by  the
       hxunpipe program.

       The output format is as follows:

       <!--comment-->
                 Comments are output as

                     *comment

                 I.e., a single line starting with "*" followed by the text of
                 the comment. Line feeds, carriage returns  and  tabs  in  the
                 text  are  written as "\n", "\r" and "\t", respectively. Text
                 that looks like a numerical character entity is written  with
                 the "&" replaced by "\".  The line ends with a line feed.

                 Note  that  onsgmls  outputs  comments  starting  with  a "_"
                 instead of a "*" and doesn't replace  the  "&"  of  numerical
                 character  entities  by "\" (and by default it omits comments
                 altogether).
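
                 The escaping described above can be reversed in a few lines
                 of Python. This is a minimal sketch, not part of hxpipe: the
                 name unescape is hypothetical, and only the escapes listed
                 here are handled. Any other backslash is left untouched,
                 which is an assumption, since this page does not say how a
                 literal backslash is written.

```python
import re

# Escapes documented for hxpipe output: \n, \r, \t, and "\#..." standing in
# for the "&#..." of a numerical character entity (decimal or hexadecimal).
_SIMPLE = {"n": "\n", "r": "\r", "t": "\t"}
_ESCAPE = re.compile(r"\\(?:([nrt])|(#(?:[0-9]+|[xX][0-9A-Fa-f]+)))")


def unescape(text):
    """Undo the documented escapes in one payload (the part after the
    leading marker character of an output line)."""
    def replace(match):
        if match.group(1) is not None:
            return _SIMPLE[match.group(1)]
        # Restore the "&" that was replaced by a backslash.
        return "&" + match.group(2)

    return _ESCAPE.sub(replace, text)
```

                 For example, unescape applied to the payload of a comment
                 line turns the two characters "\" and "n" back into a line
                 feed, and "\#233;" back into "&#233;".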

       <?processing instruction>
                 Processing instructions are output as

                     ?processing instruction

                 I.e., a single line starting with a "?" followed by the  text
                 of  the  processing  instruction.  The text is escaped as for
                 comments (see above).

       <!DOCTYPE root PUBLIC "-//foo//DTD bar//EN" "http://example.org/dtd">
                 DOCTYPEs are output as one of the following:

                     !root "-//foo//DTD bar//EN" http://example.org/dtd
                     !root "-//foo//DTD bar//EN"
                     !root "" http://example.org/dtd
                     !root ""

                 for, respectively, a DOCTYPE with (1) both a public and a
                 system identifier, (2) only a public identifier, (3) only a
                 system identifier, or (4) neither of the two. I.e., a single
                 line starting with a "!" followed by the name of the root
                 element, a space, and a possibly empty quoted string,
                 optionally followed by a space and arbitrary text. Note the
                 quotes for the public identifier and the absence of quotes
                 for the system identifier.
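
                 The four forms can be recognized with a single regular
                 expression. The following Python sketch is illustrative
                 only: parse_doctype is a hypothetical name, and it assumes
                 that the root name contains no spaces and the public
                 identifier contains no double quote.

```python
import re

# "!" + root name, a space, a quoted (possibly empty) public identifier,
# then optionally a space and the unquoted system identifier.
_DOCTYPE = re.compile(r'^!(\S+) "([^"]*)"(?: (.*))?$')


def parse_doctype(line):
    """Return (root, public_id, system_id); absent identifiers are None."""
    match = _DOCTYPE.match(line.rstrip("\n"))
    if match is None:
        raise ValueError("not a DOCTYPE line: %r" % line)
    root, public_id, system_id = match.groups()
    # An empty quoted string means there is no public identifier.
    return root, public_id or None, system_id
```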

       <elt att1="value1" att2="value2">
                 A start tag is output as

                     Aatt1 CDATA value1
                     Aatt2 CDATA value2
                     (elt

                 I.e., as zero or more lines for the attributes and  one  line
                 for  the element type. Each line for an attribute starts with
                 "A" followed by the name  of  the  attribute,  a  space,  the
                 literal  string  "CDATA",  another  space,  and the attribute
                 value. The text of the attribute  value  is  escaped  as  for
                 comments  (see  above).  The line for the element type starts
                 with "(" followed by the element type.

                 hxpipe does not read DTDs and  assumes  that  attributes  are
                 always CDATA. It never generates other types (IMPLIED, TOKEN,
                 ID, etc.), unlike onsgmls.

       </elt>    End tags are output as

                     )elt

                 I.e., as a line starting with ")"  followed  by  the  element
                 type.

       <empty att1="val1" att2="val2"/>
                 Empty elements (in XML) are output as

                     Aatt1 CDATA val1
                     Aatt2 CDATA val2
                     |empty

                 I.e.,  as  zero  or  more  lines  for attributes and one line
                 starting with "|" followed by the element type.

                 Note  that  onsgmls  never  outputs  "|".  (However,  it  can
                 optionally  output  a  line  consisting  of a single "e" just
                 before the "(" line, to indicate that the element is empty.)
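
                 Because the attribute lines come before the "(" or "|" line
                 they belong to, a consumer has to collect them until the
                 element line arrives. A minimal Python sketch of that
                 bookkeeping follows; the names parse_attribute and elements
                 are hypothetical, and attribute values are returned still
                 escaped.

```python
def parse_attribute(line):
    """Split an 'Aname CDATA value' line into (name, value)."""
    # Drop the leading "A", then split off the name and the type keyword;
    # the value is everything that remains and may itself contain spaces.
    parts = line.rstrip("\n")[1:].split(" ", 2)
    name = parts[0]
    value = parts[2] if len(parts) > 2 else ""
    return name, value


def elements(lines):
    """Yield (element_type, attributes, is_empty) for every "(" or "|" line.

    Attribute lines are collected until the element line that follows them.
    All other lines (text, end tags, comments, ...) are skipped.
    """
    pending = {}
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("A"):
            name, value = parse_attribute(line)
            pending[name] = value
        elif line.startswith("(") or line.startswith("|"):
            yield line[1:], pending, line.startswith("|")
            pending = {}
```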

       text      Text is output as

                     -text

                 I.e., as a single line starting  with  a  "-".  The  text  is
                 escaped as for comments (see above).
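
                 Since the first character of every output line identifies
                 its type, a consumer can dispatch on that character alone.
                 The Python sketch below is one possible way to do so; the
                 names KINDS and classify and the labels in the table are
                 made up for illustration. It also covers the "L" lines
                 produced by the -l option.

```python
# First character of an output line -> kind of record.
KINDS = {
    "*": "comment",
    "?": "processing instruction",
    "!": "doctype",
    "A": "attribute",
    "(": "start tag",
    ")": "end tag",
    "|": "empty element",
    "-": "text",
    "L": "line number",
}


def classify(line):
    """Return (kind, payload) for one line of hxpipe output."""
    line = line.rstrip("\n")
    kind = KINDS.get(line[:1])
    if kind is None:
        raise ValueError("unrecognized line: %r" % line)
    # The payload is everything after the marker character, still escaped.
    return kind, line[1:]
```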

       line numbers
                 When  the -l option is in effect, hxpipe will intersperse the
                 output with lines of the form

                     L12

                 where "12" is replaced with the line  number  in  the  source
                 where the next output came from.
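
                 A consumer can use these lines to attach a source position
                 to every other line, for instance to report errors. A small
                 Python sketch, with the hypothetical name with_line_numbers:

```python
def with_line_numbers(lines):
    """Yield (source_line, output_line) pairs from 'hxpipe -l' output.

    "L" lines are consumed and not yielded. source_line is None until the
    first "L" line has been seen.
    """
    current = None
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("L") and line[1:].isdigit():
            current = int(line[1:])
        else:
            yield current, line
```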

       hxpipe does not normalize the input and does not add missing tags. It is
       thus possible that there are unequal numbers of "(" and ")"  lines.  If
       it is important that every start tag is matched by an end tag, pipe the
       input through hxnormalize -x first.
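
       Whether a given output is balanced can be checked with a stack. The
       Python sketch below is only an illustration; check_balance is a
       hypothetical name, and the strategy of reporting an end tag that does
       not match the innermost open element, without closing anything, is one
       choice among several.

```python
def check_balance(lines):
    """Return a list of problems found in the "(" and ")" lines."""
    stack = []
    problems = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("("):
            stack.append(line[1:])
        elif line.startswith(")"):
            name = line[1:]
            if stack and stack[-1] == name:
                stack.pop()
            else:
                problems.append("unexpected end tag: " + name)
    # Whatever is still open at the end was never closed.
    problems.extend("unclosed start tag: " + name for name in reversed(stack))
    return problems
```

       An empty list means that every start tag was matched by an end tag.
       Lines starting with "|" need no end tag and are ignored.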

OPTIONS

       The following options are supported:

       -l        Add "L" lines to the output to indicate the line  numbers  in
                 the source.

OPERANDS

       The following operand is supported:

       file-or-URL
                 The name or URL of an HTML or XML file. If absent, standard
                 input is read instead.

EXIT STATUS

       The following exit values are returned:

       0         Successful completion.

       > 0       An error occurred in the parsing of the  HTML  file.   hxpipe
                 will try to correct the error and produce output anyway.
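
       Because output is still produced when the exit value is greater than
       0, a calling program may want both the output and the status. The
       following Python sketch shows one way to get them. The function
       run_hxpipe and its parameters are hypothetical, and it assumes that
       hxpipe is on the PATH and that its output can be decoded as UTF-8.

```python
import subprocess


def run_hxpipe(source, line_numbers=False, command=("hxpipe",)):
    """Run hxpipe on a file name or URL.

    Returns (output_lines, exit_status). The output is returned even when
    the status is non-zero.
    """
    argv = list(command)
    if line_numbers:
        argv.append("-l")
    # "--" ends the options, so a name starting with "-" is safe.
    argv += ["--", source]
    result = subprocess.run(
        argv,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        encoding="utf-8",   # assumption: output is UTF-8
        errors="replace",
    )
    lines = result.stdout.split("\n")
    if lines and lines[-1] == "":
        lines.pop()         # drop the empty item after the final line feed
    return lines, result.returncode
```

       A caller could, for example, log a warning when the status is
       non-zero and then process the lines as usual.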

ENVIRONMENT

       To  use a proxy to retrieve remote files, set the environment variables
       http_proxy and ftp_proxy.  E.g., http_proxy="http://localhost:8080/"

BUGS

       The error recovery for incorrect HTML is primitive. hxpipe can
       currently only retrieve remote files over HTTP. It doesn't handle
       password-protected files, nor files whose content depends on HTTP
       "cookies."

SEE ALSO

       hxunpipe(1), onsgmls(1).


