
FW: DOCBOOK-APPS: sgml auto-indenter



-----Original Message-----
From: Kevin M. Dunn [mailto:kdunn@hsc.edu]
Sent: Sunday, November 26, 2000 12:22 PM
To: docbook-apps@lists.oasis-open.org
Subject: DOCBOOK-APPS: sgml auto-indenter


Several people have discussed using tidy to indent SGML and XML
sources. It didn't work for my documents, because tidy did not
recognize my entities. Rather than fix tidy, I just wrote a Perl
script to indent anything with SGML-style tags. Only non-empty
elements (those whose end tag appears somewhere in the file) are
indented, and text is wrapped at 80 characters per line (easily
changed via $jl). Try it out if you like, and let me know what needs
fixing. I am running Perl under Red Hat 6.1.
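The wrapping described above is a greedy fill: words are added to the
current line until the next one would push it past the width, and then
a new, indented line is started. A minimal standalone sketch of that
idea follows; the sub name fill_words and its arguments are
illustrative assumptions, not part of sb, and it only approximates the
script's own justify() bookkeeping.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of sb): greedy line filling.
# Words are appended to the current line until the next word would make
# it longer than $width; then a new line is started, prefixed by $indent.
# A single word longer than $width still gets a line of its own.
sub fill_words {
    my ($text, $width, $indent) = @_;
    my @lines;
    my $line = '';
    for my $word (split ' ', $text) {
        my $candidate = $line eq '' ? "$indent$word" : "$line $word";
        if ($line ne '' && length($candidate) > $width) {
            push @lines, $line;         # current line is full
            $line = "$indent$word";     # start the next one
        }
        else {
            $line = $candidate;
        }
    }
    push @lines, $line if $line ne '';
    return @lines;
}

# Demo: wrap a sentence to 16 columns with a two-space indent.
print "$_\n" for fill_words("the quick brown fox jumps over the lazy dog", 16, "  ");
```

With a width of 16 the demo prints four lines, the longest being
"  jumps over the" at exactly 16 characters; changing the width is just
a matter of passing a different number, which is the same role $jl
plays in the script.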

Known problems: it will break line-specific environments, i.e. anything
where line breaks matter, such as DocBook's <programlisting> or
<screen>. So far the script is quite general: it does not recognize
specific tags, so it could be used for any XML or SGML, not just
DocBook. Is there any way to recognize literal text independently of
the DTD? Leading whitespace, for example? Trailing whitespace? Or I
could indent tags only and leave all non-tag text unjustified and
unindented.
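The last option, indenting tags only, can be sketched without touching
line breaks at all: re-indent only the lines that begin with a tag and
copy every other line through unchanged. This is a minimal sketch
under the assumption that tags already start their own lines wherever
you want them indented; the sub name reindent_tag_lines is
hypothetical. A line inside a verbatim environment that happens to
begin with a tag would still be re-indented.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical "indent tags only" variant (not part of sb). Line breaks
# are never added or removed: a line that starts with a tag is
# re-indented by nesting depth, and every other line is copied verbatim.
sub reindent_tag_lines {
    my ($doc) = @_;

    # Pass 1 (same rule as sb): names that have an end tag somewhere.
    my %has_end;
    $has_end{$1} = 1 while $doc =~ m{</([^\s>]+)}g;

    my $depth = 0;
    my @out;
    for my $line (split /\n/, $doc, -1) {
        (my $body = $line) =~ s/^\s+//;
        if ($body =~ /^</) {
            # A line that opens with an end tag is printed one level out.
            my $d = $body =~ m{^</} ? $depth - 1 : $depth;
            $d = 0 if $d < 0;
            push @out, ('  ' x $d) . $body;
        }
        else {
            push @out, $line;            # text (or blank) line: untouched
        }
        # Update the depth from every start/end tag on this line;
        # <!...> and <?...?> markup does not match the pattern.
        while ($body =~ m{<(/?)([^\s>/!?]+)}g) {
            if ($1) { $depth-- if $depth > 0 }
            elsif ($has_end{$2}) { $depth++ }
        }
    }
    return join "\n", @out;
}

# Demo on a small document; the literal text keeps its spacing.
print reindent_tag_lines("<article>\n<para>\nSome   literal\n   text\n</para>\n</article>\n");
```

Because text lines are never reflowed or indented, internal spacing
such as "Some   literal" survives exactly, at the cost of leaving text
at whatever column it started in.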
----Cut Here------ 
#!/usr/bin/perl -w
#
# sb: the sgml beautifier
# indents non-empty sgml tags
# usage: sb filename, sb < filename, or command | sb
# author: Kevin M. Dunn (kdunn@hsc.edu)
# license: anyone is free to use this for any purpose whatever
#
$jl = 80; # text will be wrapped (filled) to 80 characters/line
$nl = 0;
$sp = 0;
$newline = ""; # hack to prevent extraneous blank first line
$space[0] = "";
separate_tags();
get_tags();
indent_tags();
unlink ("$$.tmp"); # remove temporary file
print "\n"; # add final newline to output
sub separate_tags {
  # put every tag on a line of its own in a temporary file
  open(FILETMP, ">$$.tmp") or die "sb: cannot write $$.tmp: $!\n";
  while (<>){
    $_ =~ s/</\n</g;
    $_ =~ s/>/>\n/g;
    print FILETMP "$_";
  }
  close(FILETMP);
}
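sub get_tags {
  # first pass: remember every element name that has an end tag,
  # so that only those start tags increase the indentation
  open(FILETMP, "$$.tmp") or die "sb: cannot read $$.tmp: $!\n";
  while (<FILETMP>){
    $word = $_;
    $word =~ s/[> ].*//;
    chomp($word);
    if ( $word =~ /^<\/.*/ ){
      $tag2{$word} = 1; # e.g. "</para"
      $word =~ s/\///;
      $tag1{$word} = 1; # e.g. "<para"
    }
  }
  close(FILETMP);
}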
sub indent_tags {
  open(FILETMP, "$$.tmp") or die "sb: cannot read $$.tmp: $!\n";
  while (<FILETMP>){
    chomp($_);
    $word = $_;
    $word =~ s/[> ].*//; # "<para role=x>" becomes "<para"
    if ( $tag1{$word} ){ # start tag of an element that has an end tag
      print "$newline$space[$sp]$_";
      $newline = "\n"; # no blank first line even if the file starts here
      $nl = $jl; # force new line on next line of input
      $sp++;
      if ( ! $space[$sp] ){
        $space[$sp] = $space[$sp-1] . "  ";
      }
    }
    elsif ( $tag2{$word} ){ # end tag: outdent before printing
      $sp-- if $sp > 0; # a stray end tag must not make $sp negative
      print "$newline$space[$sp]$_";
      $newline = "\n";
      $nl = $jl; # force new line on next line of input
    }
    elsif ( $word =~ /<.*/ ) { # empty tag, comment, DOCTYPE, etc.
      print "$newline$space[$sp]$_";
      $newline = "\n"; # hack to prevent extraneous blank first line
      $nl = $jl; # force new line on next line of input
    }
    elsif ( length($_) > 0 ) {
      justify();
    }
  }
  close(FILETMP);
}
sub justify {
  @words = split;
  $nw = @words;
  for ($i = 0; $i < $nw; $i++ ){
    $ll += length($words[$i]) + 1 + $nl; # line length if this word is added
    if ($ll < $jl){ # if short enough, print it
      print "$words[$i] ";
      $nl = 0;
    }
    else { # if line is too long, start a new one
      print "\n$space[$sp]$words[$i] ";
      $nl = 0;
      $ll = length($space[$sp] . $words[$i]) + 1;
    }
  }
  $newline = "\n" if $nw > 0; # text was printed, so the next tag needs a newline
}
----Cut Here------ 
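Because sb treats a start tag as non-empty only if an end tag with
exactly the same name appears somewhere in the file, unbalanced markup
makes the indentation drift: a stray end tag, an omitted end tag for an
element whose end tag does appear elsewhere, or <PARA> closed by
</para>. A quick pre-flight check can apply the same rule and report
where the nesting breaks. The helper below, check_nesting, is a
hypothetical sketch and not part of sb; like sb, it compares names
case-sensitively.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical pre-flight check (not part of sb). It applies sb's rule
# (a start tag nests only if an end tag with exactly the same name occurs
# somewhere in the file) and reports where the explicit tags do not
# balance. Names are compared case-sensitively, as sb's hash lookups are.
sub check_nesting {
    my ($doc) = @_;
    my %has_end;
    $has_end{$1} = 1 while $doc =~ m{</([^\s>]+)}g;

    my (@stack, @problems);
    while ($doc =~ m{<(/?)([^\s>/!?]+)}g) {
        my ($is_end, $name) = ($1, $2);
        if (!$is_end) {
            push @stack, $name if $has_end{$name};
        }
        elsif (grep { $_ eq $name } @stack) {
            # Pop back to the matching start tag; anything skipped was left open.
            while ((my $top = pop @stack) ne $name) {
                push @problems, "<$top> not closed before </$name>";
            }
        }
        else {
            push @problems, "stray </$name>";
        }
    }
    push @problems, "<$_> never closed" for @stack;
    return @problems;
}

# Demo: the second <para> is never closed explicitly.
print "$_\n" for check_nesting("<sect><para>a</para><para>b</sect>");
```

An empty result means the explicit tags balance under sb's rule, so
the indentation should return to the left margin at the end of the
file.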
-- 
Kevin M. Dunn
kdunn@hsc.edu
Department of Chemistry
Hampden-Sydney College
HSC, VA 23943
(804) 223-6181
(804) 223-6374 (Fax)
  


--  
To UNSUBSCRIBE, email to ldp-discuss-request@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmaster@lists.debian.org

