by Thomas Leitner a.k.a. gettalong
Prawn
pdf-reader
pdf-inspector
combine_pdf
origami
(originally intended for auditing PDFs)
Prawn
pdf-reader
pdf-inspector
combine_pdf
origami
They only implement a subset of PDF functionality.
until
a versatile PDF creation and manipulation library for Ruby
a standalone application for performing the most common PDF tasks like merging files
designed with ease of use and performance in mind
require 'hexapdf'
doc = HexaPDF::Document.new
canvas = doc.pages.add.canvas
canvas.font('Helvetica', size: 100)
canvas.text("Hello World!", at: [20, 400])
doc.write("hello-world.pdf")
require 'hexapdf'
doc = HexaPDF::Document.new
info = doc.trailer.info
info[:Title] = 'This is the PDF document title'
info[:CreationDate] = Time.now
doc.catalog[:PageLayout] = :SinglePage
doc.catalog[:NeedsRendering] = true
require 'hexapdf'
require 'stringio'
doc = HexaPDF::Document.new
doc.trailer.info[:CreationDate] = Time.now
doc.trailer.info[:UnknownField] = Time.now
out = StringIO.new
doc.write(out)
doc = HexaPDF::Document.new(io: out)
info = doc.trailer.info
p info.data.value[:CreationDate] # => "D:20180826074144+02'00'"
p info[:CreationDate] # => 2018-08-26 07:41:44 +0200
p info.data.value[:CreationDate] # => 2018-08-26 07:41:44 +0200
p info[:UnknownField] # => "D:20180826074144+02'00'"
See lib/hexapdf/dictionary_fields.rb
and HexaPDF::Dictionary#[]
class HexaPDF::Type::Info < Dictionary
define_type :XXInfo
define_field :Title, type: String, version: '1.1'
define_field :Author, type: String
define_field :Subject, type: String, version: '1.1'
define_field :Keywords, type: String, version: '1.1'
define_field :Creator, type: String
define_field :Producer, type: String
define_field :CreationDate, type: PDFDate
define_field :ModDate, type: PDFDate
define_field :Trapped, type: Symbol, version: '1.3'
end
HexaPDF::GlobalConfiguration['object.type_map'][:XXInfo] =
'HexaPDF::Type::Info'
class HexaPDF::Type::PageTreeNode < Dictionary
define_type :Pages
define_field :Type, type: Symbol, required: true, default: type
define_field :Parent, type: Dictionary, indirect: true
define_field :Kids, type: Array, required: true, default: []
define_field :Count, type: Integer, required: true, default: 0
def page_count
def page(index)
def insert_page(index, page)
def add_page(page)
def delete_page(page)
def each_page
end
class HexaPDF::Type::Trailer
def perform_validation
super
unless value[:ID]
msg = if value[:Encrypt]
"ID field is required when an Encrypt dictionary is present"
else
"ID field should always be set"
end
yield(msg, true)
set_random_id
end
unless value[:Root]
yield("A PDF document must have a Catalog dictionary", true)
value[:Root] = document.add(Type: :Catalog)
value[:Root].validate {|message, correctable| yield(message, correctable) }
end
if value[:Encrypt] && (!document.security_handler ||
!document.security_handler.encryption_key_valid?)
yield("Encryption key doesn't match encryption dictionary", false)
end
end
end
require 'hexapdf'
s = HexaPDF::Serializer.new
p s.serialize(Time.now) # => "(D:20180826080743+02'00')"
source = HexaPDF::Filter.source_from_string('My String')
source = HexaPDF::Filter::ASCII85Decode.encoder(source)
p HexaPDF::Filter.string_from_source(source) # => "9mIj[FE2)5B)~>"
$ rake test Run options: --seed 5344 # Running: .......|SNIP 100s more dots|.............................. Finished in 2.093626s, 885.5449 runs/s, 13857.2973 assertions/s. 1854 runs, 29012 assertions, 0 failures, 0 errors, 0 skips Coverage report generated. 9032 / 9032 LOC (100%) covered.
HexaPDF vs ? - file size optimization
Black HexaPDF, orange pdftk (GCJ), blue QPDF (C++)
HexaPDF vs ? - raw text output
Black HexaPDF, orange Prawn
HexaPDF vs ? - line wrapping
Black HexaPDF, orange Prawn, blue reportlab, green tcpdf
Black HexaPDF, orange pdftk (GCJ), blue QPDF (C++)
Generate readable (because most of PDF is in ASCII format) but compact output
Use best compression available
hexapdf optimize
produces smaller files than pdftk and qpdf
hexapdf
application
cmdparse
library for command-style interface
Merging PDF files (comparison with pdftk
)
Modifying a PDF file (selecting and optionally rotating pages)
Batch execution
Code samples and comparisons
Raw text benchmark
Line wrapping benchmark
Image centering and stitching scripts
Complex text fitting
Text layout using classes like Paragraph
, Table
, …
AcroForm support
Document outlines (i.e. bookmarks)
More commands for the CLI
…
HexaPDF is a complete PDF solution, written in pure Ruby
hexapdf
binary for most quick PDF tasks like merging
Homepage and documentation at https://hexapdf.gettalong.org
Licensed under the AGPL, commercial licenses available at https://gettalong.at/hexapdf
What questions do you have?