Provides low-level access to the objects in a PDF file via a hash-like object.
A PDF file can be viewed as a large hash map. It is a series of objects stored at precise byte offsets, and a table that maps object IDs to byte offsets. Given an object ID, looking up an object is an O(1) operation.
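To make the hash-map analogy concrete, here is a minimal sketch in plain Ruby that does not use PDF::Reader at all. The names TOY_CHUNKS, TOY_XREF, TOY_IO and read_toy_object are invented for illustration, and the "parsing" is deliberately naive; a real cross-reference table is read from the file itself.

```ruby
require "stringio"

# Hypothetical toy data: three objects laid out back to back, as in a PDF body.
TOY_CHUNKS = {
  1 => "1 0 obj\n3469\nendobj\n",
  2 => "2 0 obj\n(hello)\nendobj\n",
  3 => "3 0 obj\n[1 2 3]\nendobj\n"
}

# Build the "file" and record the byte offset at which each object starts.
TOY_DATA = +""
TOY_XREF = {}
TOY_CHUNKS.each do |id, chunk|
  TOY_XREF[id] = TOY_DATA.bytesize
  TOY_DATA << chunk
end
TOY_IO = StringIO.new(TOY_DATA)

# Look up an object by ID: one table lookup, one seek, no scanning of the file.
def read_toy_object(id)
  offset = TOY_XREF[id]
  return nil if offset.nil?
  TOY_IO.seek(offset)
  TOY_IO.gets        # skip the "N 0 obj" header line
  TOY_IO.gets.chomp  # the object body, returned as raw text in this sketch
end
```

The cost of a lookup here does not depend on how many objects the file holds, which is the property the description above relies on.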
Each PDF object can be mapped to a Ruby object, so passing an object ID to the [] method returns a Ruby representation of that object.
The class behaves much like a standard Ruby hash, including the use of the Enumerable mixin. The key difference is that there is no []= method: the hash is read-only.
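As a rough illustration of what "hash-like, Enumerable, and read-only" means in practice, the sketch below defines a small stand-alone class. ReadOnlyTable and TOY_TABLE are hypothetical names and are not part of PDF::Reader; the point is only that defining [] and each, and omitting []=, is enough to get this behaviour.

```ruby
# Hypothetical stand-in for a read-only, hash-like container.
class ReadOnlyTable
  include Enumerable

  def initialize(pairs)
    @pairs = pairs.dup.freeze
  end

  # Reading is supported; there is intentionally no []= method.
  def [](key)
    @pairs[key]
  end

  # Enumerable only needs #each; map, select, to_a and friends come for free.
  def each
    return enum_for(:each) unless block_given?
    @pairs.each { |key, value| yield key, value }
  end
end

TOY_TABLE = ReadOnlyTable.new({ 1 => "catalog", 2 => "pages" })
```

Calling TOY_TABLE[1] returns "catalog", while an assignment such as TOY_TABLE[1] = "x" raises NoMethodError because the writer method does not exist.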
    h = PDF::Reader::ObjectHash.new("somefile.pdf")
    h[1]
    => 3469
    h[PDF::Reader::Reference.new(1,0)]
    => 3469
Creates a new ObjectHash object. Input can be a string with a valid filename or an IO-like object.
Valid options:
:password - the user password to decrypt the source PDF
    # File lib/pdf/reader/object_hash.rb, line 41
    def initialize(input, opts = {})
      @io          = extract_io_from(input)
      @pdf_version = read_version
      @xref        = PDF::Reader::XRef.new(@io)
      @trailer     = @xref.trailer
      @cache       = PDF::Reader::ObjectCache.new
      @sec_handler = build_security_handler(opts)
    end
Access an object from the PDF. key can be an int or a PDF::Reader::Reference object.
If an int is used, the object with that ID and a generation number of 0 will be returned.
If a PDF::Reader::Reference object is used the exact ID and generation number can be specified.
    # File lib/pdf/reader/object_hash.rb, line 71
    def [](key)
      return default if key.to_i <= 0
      unless key.is_a?(PDF::Reader::Reference)
        key = PDF::Reader::Reference.new(key.to_i, 0)
      end
      if @cache.has_key?(key)
        @cache[key]
      elsif xref[key].is_a?(Fixnum)
        buf = new_buffer(xref[key])
        @cache[key] = decrypt(key, Parser.new(buf, self).object(key.id, key.gen))
      elsif xref[key].is_a?(PDF::Reader::Reference)
        container_key = xref[key]
        object_streams[container_key] ||= PDF::Reader::ObjectStream.new(object(container_key))
        @cache[key] = object_streams[container_key][key.id]
      end
    rescue InvalidObjectError
      return default
    end
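The key handling at the top of [] can be sketched on its own. In the snippet below, SketchRef is a hypothetical Struct standing in for PDF::Reader::Reference (with a to_i that returns the ID, as the listing above assumes), and normalize_key is an invented helper name; returning nil for an invalid key is a simplification of returning the hash's default.

```ruby
# Hypothetical stand-in for PDF::Reader::Reference.
SketchRef = Struct.new(:id, :gen) do
  def to_i
    id
  end
end

# Turn an int (or anything responding to to_i) into a generation-0 reference,
# and pass real references through untouched.
def normalize_key(key)
  return nil if key.to_i <= 0          # non-positive IDs are never valid
  return key if key.is_a?(SketchRef)   # keep the caller's exact generation
  SketchRef.new(key.to_i, 0)
end
```

With this in place, normalize_key(1) and normalize_key(SketchRef.new(1, 0)) produce equal keys, which is why both forms of lookup in the example at the top of this page return the same object.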
Recursively dereferences the object referred to by key. If key is not a PDF::Reader::Reference, the key is returned unchanged.
    # File lib/pdf/reader/object_hash.rb, line 103
    def deref!(key)
      case object = deref(key)
      when Hash
        {}.tap { |hash|
          object.each do |k, value|
            hash[k] = deref!(value)
          end
        }
      when PDF::Reader::Stream
        object.hash = deref!(object.hash)
        object
      when Array
        object.map { |value| deref!(value) }
      else
        object
      end
    end
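The effect of recursive dereferencing is easiest to see on a small in-memory table. The sketch below is a simplified model, not the library's code: DerefRef, DEREF_OBJECTS, toy_deref and toy_deref! are hypothetical names, streams are left out, and the toy table deliberately contains no cycles, since naive recursion like this would never terminate on a reference loop.

```ruby
# Hypothetical stand-in for a reference to another object.
DerefRef = Struct.new(:id)

# Toy object table: values may contain references to other entries.
DEREF_OBJECTS = {
  1 => { :Type => :Catalog, :Pages => DerefRef.new(2) },
  2 => { :Type => :Pages, :Kids => [DerefRef.new(3)], :Count => 1 },
  3 => { :Type => :Page, :Rotate => 0 }
}

# Resolve a single level: references are looked up, anything else passes through.
def toy_deref(value)
  value.is_a?(DerefRef) ? DEREF_OBJECTS[value.id] : value
end

# Resolve every level: walk into hashes and arrays and replace nested references.
def toy_deref!(value)
  case object = toy_deref(value)
  when Hash
    object.each_with_object({}) { |(k, v), out| out[k] = toy_deref!(v) }
  when Array
    object.map { |v| toy_deref!(v) }
  else
    object
  end
end
```

Calling toy_deref!(DerefRef.new(1)) yields a plain nested structure with no references left in it, while a non-reference such as 42 comes back unchanged.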
Iterates over each key/value pair, just like a Ruby hash.
    # File lib/pdf/reader/object_hash.rb, line 146
    def each(&block)
      @xref.each do |ref|
        yield ref, self[ref]
      end
    end
Iterates over each key, just like a Ruby hash.
    # File lib/pdf/reader/object_hash.rb, line 155
    def each_key(&block)
      each do |id, obj|
        yield id
      end
    end
Iterates over each value, just like a Ruby hash.
    # File lib/pdf/reader/object_hash.rb, line 163
    def each_value(&block)
      each do |id, obj|
        yield obj
      end
    end
Returns true if there are no objects in this file.
    # File lib/pdf/reader/object_hash.rb, line 178
    def empty?
      size == 0 ? true : false
    end

    # File lib/pdf/reader/object_hash.rb, line 258
    def encrypted?
      trailer.has_key?(:Encrypt)
    end
Access an object from the PDF. key can be an int or a PDF::Reader::Reference object.
If an int is used, the object with that ID and a generation number of 0 will be returned.
If a PDF::Reader::Reference object is used the exact ID and generation number can be specified.
local_default is the object that will be returned if the requested key doesn't exist.
    # File lib/pdf/reader/object_hash.rb, line 133
    def fetch(key, local_default = nil)
      obj = self[key]
      if obj
        return obj
      elsif local_default
        return local_default
      else
        raise IndexError, "#{key} is invalid" if key.to_i <= 0
      end
    end
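The branching in the listing above has three distinct outcomes, which the following sketch reproduces against a plain Ruby Hash. TOY_FETCH_TABLE and toy_fetch are hypothetical names and the lookup is simplified; only the control flow mirrors the listing. Note the contrast with Ruby's own Hash#fetch, which raises KeyError for any missing key when no default is given.

```ruby
# Hypothetical toy table standing in for the objects of a file.
TOY_FETCH_TABLE = { 1 => "first object" }

def toy_fetch(key, local_default = nil)
  obj = TOY_FETCH_TABLE[key.to_i]
  if obj
    obj                                    # 1. the object exists: return it
  elsif local_default
    local_default                          # 2. missing, but a default was supplied
  elsif key.to_i <= 0
    raise IndexError, "#{key} is invalid"  # 3. missing and the key can never be valid
  end                                      # otherwise: missing, valid-looking key, nil
end
```

In this model a missing but positive key with no default quietly returns nil rather than raising, which follows from the final conditional in the listing.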
Returns true if the specified key exists in the file. key can be an int or a PDF::Reader::Reference.
    # File lib/pdf/reader/object_hash.rb, line 185
    def has_key?(check_key)
      # TODO update from O(n) to O(1)
      each_key do |key|
        if check_key.kind_of?(PDF::Reader::Reference)
          return true if check_key == key
        else
          return true if check_key.to_i == key.id
        end
      end
      return false
    end
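One consequence of the listing above is that the two key types are matched differently: a reference must match both ID and generation, while an int matches on ID alone. The sketch below models that with a plain array of keys; KeyRef, TOY_KEYS and toy_has_key? are hypothetical names used only for illustration.

```ruby
# Hypothetical stand-in for PDF::Reader::Reference.
KeyRef = Struct.new(:id, :gen)

# Toy key list: object 1 at generation 0, object 2 at generation 1.
TOY_KEYS = [KeyRef.new(1, 0), KeyRef.new(2, 1)]

def toy_has_key?(check_key)
  TOY_KEYS.any? do |key|
    if check_key.is_a?(KeyRef)
      check_key == key             # exact match on ID and generation
    else
      check_key.to_i == key.id     # ints ignore the generation number
    end
  end
end
```

Under this model toy_has_key?(2) is true even though object 2 only exists at generation 1, whereas toy_has_key?(KeyRef.new(2, 0)) is false.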
Returns true if the specified value exists in the file.
    # File lib/pdf/reader/object_hash.rb, line 202
    def has_value?(value)
      # TODO update from O(n) to O(1)
      each_value do |obj|
        return true if obj == value
      end
      return false
    end
Returns an array of all keys in the file.
    # File lib/pdf/reader/object_hash.rb, line 217
    def keys
      ret = []
      each_key { |k| ret << k }
      ret
    end
Returns the type of object a ref points to.
    # File lib/pdf/reader/object_hash.rb, line 51
    def obj_type(ref)
      self[ref].class.to_s.to_sym
    rescue
      nil
    end
If key is a PDF::Reader::Reference object, look up the corresponding object in the PDF and return it. Otherwise return key untouched.
    # File lib/pdf/reader/object_hash.rb, line 95
    def object(key)
      key.is_a?(PDF::Reader::Reference) ? self[key] : key
    end
Returns an array of PDF::Reader::References. Each reference in the array points to a Page object, one for each page in the PDF. The first reference is page 1, the second reference is page 2, etc.
Useful for apps that want to extract data from specific pages.
    # File lib/pdf/reader/object_hash.rb, line 253
    def page_references
      root = fetch(trailer[:Root])
      @page_references ||= get_page_objects(root[:Pages]).flatten
    end
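The helper get_page_objects is not shown on this page, so the following is only a guess at the kind of traversal involved: a depth-first walk over a page tree that collects references to leaf Page nodes in document order. PageRef, PAGE_TREE and toy_page_references are hypothetical names, and the toy tree is a plain Ruby Hash rather than a parsed PDF.

```ruby
# Hypothetical stand-in for a reference to a page tree node.
PageRef = Struct.new(:id)

# Toy page tree: node 1 is the root, node 2 is an intermediate Pages node.
PAGE_TREE = {
  1 => { :Type => :Pages, :Kids => [PageRef.new(2), PageRef.new(5)] },
  2 => { :Type => :Pages, :Kids => [PageRef.new(3), PageRef.new(4)] },
  3 => { :Type => :Page },
  4 => { :Type => :Page },
  5 => { :Type => :Page }
}

# Depth-first walk: leaves contribute themselves, inner nodes contribute
# the pages found under each kid, in order.
def toy_page_references(ref)
  node = PAGE_TREE[ref.id]
  return [ref] if node[:Type] == :Page
  node[:Kids].flat_map { |kid| toy_page_references(kid) }
end
```

Here toy_page_references(PageRef.new(1)) returns references to objects 3, 4 and 5, so index 0 of the result corresponds to page 1, index 1 to page 2, and so on.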
    # File lib/pdf/reader/object_hash.rb, line 262
    def sec_handler?
      !!sec_handler
    end
Returns the number of objects in the file. An object with multiple generations is counted once.
    # File lib/pdf/reader/object_hash.rb, line 171
    def size
      xref.size
    end
Returns true if the supplied reference points to an object with a stream.
    # File lib/pdf/reader/object_hash.rb, line 58
    def stream?(ref)
      self.has_key?(ref) && self[ref].is_a?(PDF::Reader::Stream)
    end
Returns an array of arrays. Each sub-array contains a key/value pair.
    # File lib/pdf/reader/object_hash.rb, line 239
    def to_a
      ret = []
      each do |id, obj|
        ret << [id, obj]
      end
      ret
    end

    # File lib/pdf/reader/object_hash.rb, line 211
    def to_s
      "<PDF::Reader::ObjectHash size: #{self.size}>"
    end
Returns an array of all values in the file.
    # File lib/pdf/reader/object_hash.rb, line 225
    def values
      ret = []
      each_value { |v| ret << v }
      ret
    end
Returns an array of all values from the specified keys.
    # File lib/pdf/reader/object_hash.rb, line 233
    def values_at(*ids)
      ids.map { |id| self[id] }
    end