by Thomas Leitner a.k.a. gettalong
a versatile PDF creation and manipulation library for Ruby
a standalone application for performing the most common PDF tasks like merging files
designed with ease of use and performance in mind
require 'hexapdf'
doc = HexaPDF::Document.new
info = doc.trailer.info
info[:Title] = 'This is the PDF document title'
info[:CreationDate] = Time.now
doc.catalog[:PageLayout] = :SinglePage
doc.catalog[:NeedsRendering] = true
HexaPDF::Object wraps all the information (oid, gen, value, stream) in a separate HexaPDF::PDFData object.
HexaPDF::PDFData increases memory usage but provides “convertibility”.
They rely on a base type but use special semantics.
→ Use automatic, behind-the-scenes conversion
require 'hexapdf'
require 'stringio'
doc = HexaPDF::Document.new
doc.trailer.info[:CreationDate] = Time.now
doc.trailer.info[:UnknownField] = Time.now
out = StringIO.new
doc.write(out)
doc = HexaPDF::Document.new(io: out)
info = doc.trailer.info
p info.data.value[:CreationDate] # => "D:20180826074144+02'00'"
p info[:CreationDate] # => 2018-08-26 07:41:44 +0200
p info.data.value[:CreationDate] # => 2018-08-26 07:41:44 +0200
p info[:UnknownField] # => "D:20180826074144+02'00'"
See lib/hexapdf/dictionary_fields.rb and HexaPDF::Dictionary#[]
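The conversion on first access shown above can be sketched in plain Ruby. This is only an illustration of the idea, not HexaPDF's code: the names LazyDict, FIELD_TYPES and parse_pdf_date are hypothetical, and only date strings of the form D:YYYYMMDDHHmmSS+HH'mm' are handled.

```ruby
# Hypothetical sketch, not HexaPDF's implementation.
class LazyDict
  # Fields with known special semantics; unknown keys are left untouched.
  FIELD_TYPES = {CreationDate: :date, ModDate: :date}.freeze

  DATE_RE = /\AD:(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})([+-])(\d{2})'(\d{2})'?\z/

  # The raw, shared data (plays the role of the PDFData value).
  attr_reader :value

  def initialize(value)
    @value = value
  end

  # Converts on first access and stores the result back into the raw hash.
  def [](key)
    raw = @value[key]
    if FIELD_TYPES[key] == :date && raw.kind_of?(String)
      @value[key] = parse_pdf_date(raw) || raw
    else
      raw
    end
  end

  private

  # Returns a Time for a well-formed PDF date string, nil otherwise.
  def parse_pdf_date(str)
    m = DATE_RE.match(str)
    return nil unless m
    Time.new(m[1].to_i, m[2].to_i, m[3].to_i, m[4].to_i, m[5].to_i, m[6].to_i,
             "#{m[7]}#{m[8]}:#{m[9]}")
  end
end

dict = LazyDict.new({CreationDate: "D:20180826074144+02'00'",
                     UnknownField: "D:20180826074144+02'00'"})
p dict.value[:CreationDate].class # String before the first access
p dict[:CreationDate].class       # Time
p dict.value[:CreationDate].class # Time, the converted value was stored
p dict[:UnknownField].class       # String, no field definition applies
```

Storing the converted value back means the conversion cost is paid at most once per field, and only for fields that are actually read.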
Dictionaries usually have a :Type key specifying their type, but sometimes the type is only implicitly known through their location.
The HexaPDF::PDFData object is wrapped in a “throwaway” object based on the type that provides additional functionality.
class HexaPDF::Type::Info < Dictionary
  define_type :XXInfo
  define_field :Title, type: String, version: '1.1'
  define_field :Author, type: String
  define_field :Subject, type: String, version: '1.1'
  define_field :Keywords, type: String, version: '1.1'
  define_field :Creator, type: String
  define_field :Producer, type: String
  define_field :CreationDate, type: PDFDate
  define_field :ModDate, type: PDFDate
  define_field :Trapped, type: Symbol, version: '1.3'
end
HexaPDF::GlobalConfiguration['object.type_map'][:XXInfo] = 'HexaPDF::Type::Info'
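How a type map with class names stored as strings might be used for wrapping can be sketched as follows. Everything in the Sketch module is hypothetical and much simpler than the real library; it only illustrates resolving the class at wrap time and falling back to a generic dictionary.

```ruby
# Hypothetical sketch of a type map with lazily resolved class names.
module Sketch
  class Dictionary
    attr_reader :data

    def initialize(data)
      @data = data
    end

    def [](key)
      @data[key]
    end
  end

  # A type-specific wrapper that adds convenience methods.
  class Info < Dictionary
    def title
      self[:Title]
    end
  end

  # Class names are stored as strings, so a class is only looked up when
  # an object of that type is actually wrapped.
  TYPE_MAP = {XXInfo: 'Sketch::Info'}

  # Wraps the shared data hash in a cheap, "throwaway" object.
  # +type+ can be passed explicitly when it is only known from the location.
  def self.wrap(data, type: data[:Type])
    klass = TYPE_MAP.key?(type) ? Object.const_get(TYPE_MAP[type]) : Dictionary
    klass.new(data)
  end
end

data = {Title: 'My Hello World'}         # no :Type key present
info = Sketch.wrap(data, type: :XXInfo)  # type implied by where the object lives
p info.class              # Sketch::Info
p info.title              # "My Hello World"
p Sketch.wrap(data).class # Sketch::Dictionary, the generic fallback
```

Because the wrapper only holds a reference to the data, several wrappers of different classes can share the same underlying hash.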
class HexaPDF::Type::PageTreeNode < Dictionary
  define_type :Pages
  define_field :Type, type: Symbol, required: true, default: type
  define_field :Parent, type: Dictionary, indirect: true
  define_field :Kids, type: Array, required: true, default: []
  define_field :Count, type: Integer, required: true, default: 0

  # Additional functionality provided by the type (method bodies omitted).
  def page_count; end
  def page(index); end
  def insert_page(index, page); end
  def add_page(page); end
  def delete_page(page); end
  def each_page; end
end
class HexaPDF::Type::Trailer
  def perform_validation
    super
    unless value[:ID]
      msg = if value[:Encrypt]
              "ID field is required when an Encrypt dictionary is present"
            else
              "ID field should always be set"
            end
      yield(msg, true)
      set_random_id
    end
    unless value[:Root]
      yield("A PDF document must have a Catalog dictionary", true)
      value[:Root] = document.add(Type: :Catalog)
      value[:Root].validate {|message, correctable| yield(message, correctable) }
    end
    if value[:Encrypt] && (!document.security_handler ||
                           !document.security_handler.encryption_key_valid?)
      yield("Encryption key doesn't match encryption dictionary", false)
    end
  end
end
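The pattern used above, reporting each problem by yielding a message together with a "correctable" flag and fixing correctable problems on the spot, can be sketched on its own. SketchTrailer, its key_valid option and the placeholder ID are hypothetical stand-ins for illustration only.

```ruby
# Hypothetical sketch of the validation protocol: problems are reported by
# yielding (message, correctable); correctable ones are fixed immediately.
class SketchTrailer
  attr_reader :value

  def initialize(value, key_valid: false)
    @value = value
    @key_valid = key_valid
  end

  # Returns true if no uncorrectable problem was found. An optional block
  # receives every reported problem.
  def validate(&block)
    valid = true
    perform_validation do |message, correctable|
      valid = false unless correctable
      block&.call(message, correctable)
    end
    valid
  end

  private

  def perform_validation
    unless value[:ID]
      yield("ID field should always be set", true)
      value[:ID] = %w[placeholder-id placeholder-id] # stand-in for a random ID
    end
    if value[:Encrypt] && !@key_valid
      yield("Encryption key doesn't match encryption dictionary", false)
    end
  end
end

trailer = SketchTrailer.new({Encrypt: {}})
ok = trailer.validate do |message, correctable|
  puts "#{correctable ? 'corrected' : 'error'}: #{message}"
end
p ok                      # false, one problem could not be corrected
p trailer.value.key?(:ID) # true, the missing ID was filled in
```

Yielding instead of raising lets the caller decide what to do with each problem, for example collecting all messages or stopping at the first uncorrectable one.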
HexaPDF::PDFData concept
HexaPDF::PDFData objects lead to higher memory usage
→ Don’t let obvious disadvantages discourage you from trying out new things!
Parsing and serialization classes
s = HexaPDF::Serializer.new
p s.serialize(Time.now) # => "(D:20180826080743+02'00')"
→ Also used for PDF content streams.
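To illustrate what serializing Ruby objects into PDF syntax involves, here is a minimal sketch. SketchSerializer is hypothetical and deliberately incomplete (no floats, streams or indirect references); it is not how HexaPDF::Serializer is implemented.

```ruby
# Hypothetical sketch of class-based dispatch for serializing Ruby objects
# into PDF syntax; many object types are left out.
class SketchSerializer
  def serialize(obj)
    case obj
    when nil then 'null'
    when true, false then obj.to_s
    when Integer then obj.to_s
    when Symbol then "/#{obj}"
    when String then "(#{escape(obj)})"
    when Time then serialize(pdf_date(obj))
    when Array then "[#{obj.map {|o| serialize(o) }.join(' ')}]"
    when Hash
      entries = obj.map {|k, v| "#{serialize(k)} #{serialize(v)}" }
      "<<#{entries.join(' ')}>>"
    else
      raise ArgumentError, "Cannot serialize #{obj.class}"
    end
  end

  private

  # Escapes parentheses and backslashes inside a literal string.
  def escape(str)
    str.gsub(/[()\\]/) {|c| "\\#{c}" }
  end

  # Formats a Time as a PDF date string such as D:20180826080743+02'00'.
  def pdf_date(time)
    offset = time.strftime('%z') # e.g. "+0200"
    "D:#{time.strftime('%Y%m%d%H%M%S')}#{offset[0, 3]}'#{offset[3, 2]}'"
  end
end

s = SketchSerializer.new
p s.serialize(Time.new(2018, 8, 26, 8, 7, 43, "+02:00")) # => "(D:20180826080743+02'00')"
p s.serialize({Type: :Catalog, Kids: [1, nil, true]})    # => "<</Type /Catalog /Kids [1 null true]>>"
```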
PDF filter implementation
source = HexaPDF::Filter.source_from_string('My String')
source = HexaPDF::Filter::ASCII85Decode.encoder(source)
HexaPDF::Filter.string_from_source(source) # => "9mIj[FE2)5B)~>"
→ Also used by the PNG parsing code.
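A chain of sources like the one above lends itself to streaming, where each stage pulls chunks from the previous one on demand. The sketch below shows one way to build such a chain with fibers. SketchFilter and its methods are hypothetical, and a simple hex encoder stands in for a real PDF filter.

```ruby
# Hypothetical sketch of a streaming filter pipeline built on fibers.
module SketchFilter
  # Wraps a string in a fiber that hands it out in small chunks.
  def self.source_from_string(str, chunk_size: 4)
    Fiber.new do
      pos = 0
      while pos + chunk_size < str.length
        Fiber.yield(str[pos, chunk_size])
        pos += chunk_size
      end
      str[pos..-1] # the fiber's return value is the final chunk
    end
  end

  # Drains a source fiber and concatenates all chunks.
  def self.string_from_source(source)
    result = +''
    while source.alive? && (chunk = source.resume)
      result << chunk
    end
    result
  end

  # Hex-encodes each chunk as it arrives and ends with the ">" marker.
  def self.hex_encoder(source)
    Fiber.new do
      while source.alive? && (chunk = source.resume)
        Fiber.yield(chunk.unpack1('H*'))
      end
      '>'
    end
  end
end

source = SketchFilter.source_from_string('My String')
source = SketchFilter.hex_encoder(source)
p SketchFilter.string_from_source(source) # => "4d7920537472696e67>"
```

No stage holds more than one chunk at a time, so stacking several encoders or decoders does not require the whole stream in memory.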
Encrypting and decrypting PDFs is almost entirely a tacked-on layer.
$ rake test
Run options: --seed 5344

# Running:

.......|SNIP 100s more dots|..............................

Finished in 2.093626s, 885.5449 runs/s, 13857.2973 assertions/s.

1854 runs, 29012 assertions, 0 failures, 0 errors, 0 skips
Coverage report generated. 9032 / 9032 LOC (100%) covered.
require 'hexapdf'
doc = HexaPDF::Document.new
doc.trailer.info[:CreationDate] = Time.now
doc.trailer.info[:Title] = "My Hello World"
canvas = doc.pages.add.canvas
canvas.font('Helvetica', size: 100)
canvas.text("Hello World!", at: [20, 400])
doc.write("hello-world.pdf")
Yes, it does.
HexaPDF vs ? - file size optimization
Black HexaPDF, orange pdftk (GCJ), blue QPDF (C++)
Generate readable (because most of PDF is in ASCII format) but compact output
Use best compression available
hexapdf optimize produces smaller files than pdftk and qpdf
HexaPDF vs ? - raw text output
Black HexaPDF, orange Prawn
HexaPDF vs ? - line wrapping
Black HexaPDF, orange Prawn, blue reportlab, green tcpdf
Code samples and comparisons
Raw text benchmark
Line wrapping benchmark
Image centering and stitching scripts
Complex text fitting
hexapdf application
cmdparse library for command-style interface
Merging PDF files (comparison with pdftk)
Modifying a PDF file (selecting and optionally rotating pages)
Batch execution
Text layout using classes like Paragraph, Table, …
AcroForm support
Document outlines (i.e. bookmarks)
More commands for the CLI
…
HexaPDF is a complete PDF solution, written in pure Ruby
Uses orthogonality, lazy loading and iterative design
Designed with ease of use, performance and low memory usage in mind
Homepage and documentation at https://hexapdf.gettalong.org
Licensed under the AGPL, commercial licenses available at https://gettalong.at/hexapdf
What questions do you have?