./print/qpdf, Structural, content-preserving transformations on PDF files

[ CVSweb ] [ Homepage ] [ RSS ] [ Required by ] [ Add to tracker ]

Branch: CURRENT, Version: 8.0.0, Package name: qpdf-8.0.0, Maintainer: pkgsrc-users

QPDF is a command-line program that does structural, content-preserving
transformations on PDF files. It could have been called something like
pdf-to-pdf. It also provides many useful capabilities to developers of
PDF-producing software or for people who just want to look at the innards of a
PDF file to learn more about how they work.

QPDF is capable of creating linearized (also known as web-optimized) files and
encrypted files. It is also capable of converting PDF files with object streams
(also known as compressed objects) to files with no compressed objects or to
generate object streams from files that don't have them (or even those that
already do). QPDF also supports a special mode designed to allow you to edit the
content of PDF files in a text editor. For more details, please see the
documentation links below.

QPDF includes support for merging and splitting PDFs through the ability to copy
objects from one PDF file into another and to manipulate the list of pages in a
PDF file. The QPDF library also makes it possible for you to create PDF files
from scratch. In this mode, you are responsible for supplying all the contents
of the file, while the QPDF library takes care off all the syntactical
representation of the objects, creation of cross references tables and, if you
use them, object streams, encryption, linearization and other syntactic details.

Required to run:
[graphics/jpeg] [lang/perl5] [devel/pcre]

Required to build:

Master sites:

SHA1: 5fc59652c6c2742a4b115530163f342822e07dcd
RMD160: 74a4dcbbfaa68210aebdbf986f4aac2f00b989a8
Filesize: 7760.989 KB

Version history: (Expand)

CVS history: (Expand)

   2018-02-27 13:37:20 by Ryo ONODERA | Files touched by this commit (4) | Package updated
Log message:
Update to 8.0.0

2018-02-25  Jay Berkenbilt  <ejb@ql.org>

	* 8.0.0: release

2018-02-17  Jay Berkenbilt  <ejb@ql.org>

	* Fix QPDFObjectHandle::getUTF8Val() to properly handle strings
	that are encoded with PDF Doc Encoding. Fixes #179.

	* Add qpdf_check_pdf to the "C" API. This method just attempts to
	read the entire file and produce no output, making possible to
	assess whether the file has any errors that qpdf can detect.

	* Major enhancements to handling of type errors within the qpdf
	library. This fix is intended to eliminate those annoying cases
	where qpdf would exit with a message like "operation for
	dictionary object attemped on object of wrong type" without
	providing any context. Now qpdf keeps enough context to be able to
	issue a proper warning and to handle such conditions in a sensible
	way. This should greatly increase the number of bad files that
	qpdf can recover, and it should make it much easier to figure out
	what's broken when a file contains errors.

	* Error message fix: replace "file position" with "offset" in
	error messages that report lexical or parsing errors. Sometimes
	it's an offset in an object stream or a content stream rather than
	a file position, so this makes the error message less confusing in
	those cases. It still requires some knowledge to find the exact
	position of the error, since when it's not a file offset, it's
	probably an offset into a stream after uncompressing it.

	* Error message fix: correct some cases in which the object that
	contained a lexical error was omitted from the error message.

	* Error message fix: improve file name in the error message when
	there is a parser error inside an object stream.

2018-02-11  Jay Berkenbilt  <ejb@ql.org>

	* Add QPDFObjectHandle::filterPageContents method to provide a
	different interface for applying token filters to page contents
	without modifying the ultimate output.

2018-02-04  Jay Berkenbilt  <ejb@ql.org>

        * Changes listed on today's date are numerous and reflect
	significant enhancements to qpdf's lexical layer. While many
	nuances are discussed and a handful of small bugs were fixed, it
	should be emphasized that none of these issues have any impact on
	any output or behavior of qpdf under "normal" operation. There are
	some changes that have an effect on content stream normalization
	as with qdf mode or on code that interacts with PDF files
	lexically using QPDFTokenizer. There are no incompatible changes
	for normal operation. There are a few changes that will affect the
	exact error messages issued on certain bad files, and there is a
	small non-compatible enhancement regarding the behavior of
	manually constructed QPDFTokenizer::Token objects. Users of the
	qpdf command line tool will see no changes other than the addition
	of a new command-line flag and possibly some improved error

	* Significant lexer (tokenizer) enhancements. These are changes to
	the QPDFTokenizer class. These changes are of concern only to
	people who are operating with PDF files at the lexical layer using
	qpdf. They have little or no impact on most high-level interfaces
	or the command-line tool.

	New token types tt_space and tt_comment to recognize whitespace
	and comments. this makes it possible to tokenize a PDF file or
	stream and preserve everything about it.

	For backward compatibility, space and comment tokens are not
	returned by the tokenizer unless QPDFTokenizer.includeIgnorable()
	is called.

	Better handling of null bytes. These are now included in space
	tokens rather than being their own "tt_word" tokens. This should
	have no impact on any correct PDF file and has no impact on
	output, but it may change offsets in some error messages when
	trying to parse contents of bad files. Under default operation,
	qpdf does not attempt to parse content streams, so this change is
	mostly invisible.

	Bug fix to handling of bad tokens at ends of streams. Now, when
	allowEOF() has been called, these are treated as bad tokens
	(tt_bad or an exception, depending on invocation), and a
	separate tt_eof token is returned. Before the bad token
	contents were returned as the value of a tt_eof token. tt_eof
	tokens are always empty now.

	Fix a bug that would, on rare occasions, report the offset in an
	error message in the wrong space because of spaces or comments
	adjacent to a bad token.

	Clarify in comments exactly where the input source is positioned
	surrounding calls to readToken and getToken.

	* Add a new token type for inline images. This token type is only
	returned by QPDFTokenizer immediately following a call to
	expectInlineImage(). This change includes internal refactoring of
	a handful of places that all separately handled inline images, The
	logic of detecting inline images in content streams is now handled
	in one place in the code. Also we are more flexible about what
	characters may surround the EI operator that marks the end of an
	inline image.

	* New method QPDFObjectHandle::parsePageContents() to improve upon
	QPDFObjectHandle::parseContentStream(). The parseContentStream
	method used to operate on a single content stream, but was fixed
	to properly handle pages with contents split across multiple
	streams in an earlier release. The new method parsePageContents()
	can be called on the page object rather than the value of the
	page dictionary's /Contents key. This removes a few lines of
	boiler-plate code from any code that uses parseContentStream, and
	it also enables creation of more helpful error messages if
	problems are encountered as the error messages can include
	information about which page the streams come from.

	* Update content stream parsing example
	(examples/pdf-parse-content.cc) to use new
	QPDFObjectHandle::parsePageContents() method in favor of the older
	QPDFObjectHandle::parseContentStream() method.

	* Bug fix: change where the trailing newline is added to a stream
	in QDF mode when content normalization is enabled (the default for
	QDF mode). Before, the content normalizer ensured that the output
	ended with a trailing newline, but this had the undesired side
	effect of including the newline in the stream data for purposes of
	length computation. QPDFWriter already appends a newline without
	counting in length for better readability. Ordinarily this makes
	no difference, but in the rare case of a page's contents being
	split in the middle of a token, the old behavior could cause the
	extra newline to be interprted as part of the token. This bug
	could only be triggered in qdf mode, which is a mode intended for
	manual inspection of PDF files' contents, so it is very unlikely
	to have caused any actual problems for people using qpdf for
	production use. Even if it did, it would be very unusual for a PDF
	file to actually be adversely affected by this issue.

	* Add support for coalescing a page's contents into a single
	stream if they are represented as an array of streams. This can be
	performed from the command line using the --coalesce-contents
	option. Coalescing content streams can simplify things for
	software that wants to operate on a page's content streams without
	having to handle weird edge cases like content streams split in
	the middle of tokens. Note that
	QPDFObjectHandle::parsePageContents and
	QPDFObjectHandle::parseContentStream already handled split content
	streams. This is mainly to set the stage for new methods of
	operating on page contents. The new method
	QPDFObjectHandle::pipeContentStreams will pipe all of a page's
	content streams though a single pipeline. The new method
	QPDFObjectHandle.coalesceContentStreams, when called on a page
	object, will do nothing if the page's contents are a single
	stream, but if they are an array of streams, it will replace the
	page's contents with a single stream whose contents are the
	concatenation of the original streams.

	* A few library routines throw exceptions if called on non-page
	objects. These constraints have been relaxed somewhat to make qpdf
	more tolerant of files whose page dictionaries are not properly
	marked as such. Mostly exceptions about page operations being
	called on non page objects will only be thrown in cases where the
	operation had no chance of succeeding anyway. This change has no
	impact on any default mode operations, but it could allow
	applications that use page-level APIs in QPDFObjectHandle to be
	more tolerant of certain types of damaged files.

	* Add QPDFObjectHandle::TokenFilter class and methods to use it to
	perform lexical filtering on content streams. You can call
	QPDFObjectHandle::addTokenFilter on stream object, or you can call
	the higher level QPDFObjectHandle::addContentTokenFilter on a page
	object to cause the stream's contents to passed through a token
	filter while being retrieved by QPDFWriter or any other consumer.
	For details on using TokenFilter, please see comments in

	* Enhance the string, type QPDFTokenizer::Token constructor to
	initialize a raw value in addition to a value. Tokens have a
	value, which is a canonical representation, and a raw value. For
	all tokens except strings and names, the raw value and the value
	are the same. For strings, the value excludes the outer delimiters
	and has non-printing characters normalized. For names, the value
	resolves non-printing characters. In order to better facilitate
	token filters that mostly preserve contents and to enable
	developers to be mostly unconcerned about the nuances of token
	values and raw values, creating string and name tokens now
	properly handles this subtlety of values and raw values. When
	constructing string tokens, take care to avoid passing in the
	outer delimiters. This has always been the case, but it is now
	clarified in comments in QPDFObjectHandle.hh::TokenFilter. This
	has no impact on any existing code unless there's some code
	somewhere that was relying on Token::getRawValue() returning an
	empty string for a manually constructed token. The token class's
	operator== method still only looks at type and value, not raw
	value. For example, string tokens for <41> and (A) would still be
	equal because both are representations of the string "A".

	* Add QPDFObjectHandle::isDataModified method. This method just
	returns true if addTokenFilter has been called on the stream. It
	enables a caller to determine whether it is safe to optimize away
	piping of stream data in cases where the input and output are
	expected to be the same. QPDFWriter uses this internally to skip
	the optimization of not re-compressing already compressed streams
	if addTokenFilter has been called. Most developers will not have
	to worry about this as it is used internally in the library in the
	places that need it. If you are manually retrieving stream data
	with QPDFObjectHandle::getStreamData or
	QPDFObjectHandle::pipeStreamData, you don't need to worry about
	this at all.

	* Provide heavily annoated examples/pdf-filter-tokens.cc example
	that illustrates use of some simple token filters.

	* When normalizing content streams, as in qdf mode, issue warning
	about bad tokens. Content streams are only normalized when this is
	explicitly requested, so this has no impact on normal operation.
	However, in qdf mode, if qpdf detects a bad token, it means that
	either there's a bug in qpdf's lexer, that the file is damaged, or
	that the page's contents are split in a weird way. In any of those
	cases, qpdf could potentially damage the stream's contents by
	replacing carrige returns with newlines or otherwise messing with
	spaces. The mostly likely case of this would be an inline image's
	compressed data being divided across two streams and having the
	compressed data in the second stream contain a carriage return as
	part of its binary data. If you are using qdf mode just to look at
	PDF files in text editors, this usually doesn't matter. In cases
	of contents split across multiple streams, coalescing streams
	would eliminate the problem, so the warning mentions this. Prior
	to this enhancement, the chances of qdf mode writing incorrect
	data were already very low. This change should make it nearly
	impossible for qdf mode to unknowingly write invalid data.

2018-02-04  Jay Berkenbilt  <ejb@ql.org>

	* Add QPDFWriter::setLinearizationPass1Filename method and
	--linearize-pass1 command line option to allow specification of a
	file into which QPDFWriter will write its intermediate
	linearization pass 1 file. This is useful only for debugging qpdf.
	qpdf creates linearized files by computing the output in two
	passes. Ordinarily the first pass is discarded and not written
	anywhere. This option allows it to be inspected.
   2018-02-23 07:25:23 by Adam Ciarcinski | Files touched by this commit (2) | Package updated
Log message:
qpdf: updated to 7.1.1

7.1.1: release
* Bug fix: properly linearize files whose /ID has a length of
  other than 16 bytes.
* Rename some test files to avoid files with three dots in their
* Fix various build and compilation issues on some platforms and
* Fix a few typos and clarify a few comments in header files.
   2018-02-05 14:39:05 by Ryo ONODERA | Files touched by this commit (2) | Package updated
Log message:
Update to 7.1.0

2018-01-14  Jay Berkenbilt  <ejb@ql.org>

	* 7.1.0: release

	* Allow raw encryption key to be specified in libary and command
	line with the QPDF::setPasswordIsHexKey method and
	--password-is-hex-key option. Allow encryption key to be displayed
	with --show-encryption-key option. Thanks to Didier Stevens
	<didier.stevens@gmail.com> for the idea and contribution of one
	implementation of this idea. See his blog post at
	https://blog.didierstevens.com/2017/12/ … fs-part-3/
	for a discussion of using this for cracking encrypted PDFs. I hope
	that a future release of qpdf will include some additional
	recovery options that may also make use of this capability.

2018-01-13  Jay Berkenbilt  <ejb@ql.org>

	* Fix lexical error: the PDF specification allows floating point
	numbers to end with ".". Fixes #165.

	* Fix link order in the build to avoid conflicts when building
	from source while an older version of qpdf is installed. Fixes #158.

	* Add support for TIFF predictor for LZW and Flate streams. Now
	all predictor functions are supported. Fixes #171.

2017-12-25  Jay Berkenbilt  <ejb@ql.org>

	* Clarify documentation around options that control parsing but
	not output creation. Two options: --suppress-recovery and
	--ignore-xref-streams, were documented in the "Advanced
	Transformation Options" section of the manual and --help output
	even though they are not related to output. These are now
	described in a separate section called "Advanced Parsing Options."

	* Implement remaining PNG filters for decode. Prior versions could
	decode only the "up" filter. Now all PNG filters (sub, up,
	average, Paeth, optimal) are supported for decoding. Thanks to
	Tobias Hoffmann for providing a test PDF file that has images with
	all PNG filters along with different numbers of bits per sample
	and samples per pixel, and thanks to Casey Rojas for providing
	implementations of the remaining PNG filters.

	The implementation of the remaining PNG filters changed the
	interface to the private Pl_PNGFilter class, but this class's
	header file is not in the installation, and there is no public
	interface to the class. Within the library, the class is never
	allocated on the stack; it is only ever dynamically allocated. As
	such, this does not actually break binary compatibility of the
   2017-10-03 10:33:45 by Leonardo Taccari | Files touched by this commit (3) | Package updated
Log message:
qpdf: Delete reference to libjpeg in the libqpdf `.pc' file

graphics/jpeg does not provide any `.pc' file this will lead in failures when
invoking pkg-config agaisnt `libqpdf'. Adjust `Requires.private:' in
libqpdf.pc.in to omit libjpeg.

Bump PKGREVISION since the new installed libqpdf.pc file will fix packages that
depends on print/qpdf.

OK by <pgoyette> and <ryoon>, thanks!
   2017-09-29 23:11:40 by Thomas Klausner | Files touched by this commit (2)
Log message:
qpdf: add missing jpeg dependency. Add missing files to PLIST.
   2017-09-28 14:57:20 by Thomas Klausner | Files touched by this commit (1) | Package updated
Log message:
qpdf: Update LICENSE for 7.0.0
   2017-09-28 14:50:36 by Ryo ONODERA | Files touched by this commit (2) | Package updated
Log message:
Update to 7.0.0

2017-09-15  Jay Berkenbilt  <ejb@ql.org>

	* 7.0.0: release

2017-09-12  Jay Berkenbilt  <ejb@ql.org>

	* Relicense qpdf under version 2.0 of the Apache License rather
	than version 2.0 of the Artistic License. Both are fine, but the
	Apache License is in more widespread use, and I like it a little
	better than Artistic-2.0. It is my intention that there be no
	change in what you can or can't do with qpdf. Versions of qpdf
	prior to version 7 were released under the terms of version 2.0 of
	the Artistic License. At your option, you may continue to consider
	qpdf to be licensed under those terms. Please see the manual for
	additional information.

	* Improve the error message that is issued when QPDFWriter
	encounters a stream that can't be decoded. In particular, mention
	that the stream will be copied without filtering to avoid data

	* Add new methods to the C API to correspond to new additions to
	- qpdf_set_compress_streams
	- qpdf_set_decode_level
	- qpdf_set_preserve_unreferenced_objects
	- qpdf_set_newline_before_endstream

2017-08-25  Jay Berkenbilt  <ejb@ql.org>

	* Re-implement parser iteratively to avoid stack overflow on very
	deeply nested arrays and dictionaries. Fixes #146.

	* Detect infinite loop while finding additional xref tables. Fixes

2017-08-22  Jay Berkenbilt  <ejb@ql.org>

	* 7.0.b1: release

	* Convert all README files to markdown. Names changed as follows:
	  - README --> README.md
	  - README.hardening --> README-hardening.md
	  - README.maintainer --> README-maintainer.md
	  - README-what-to-download.txt --> README-what-to-download.md
	  - README-windows.txt --> README-windows.md
	  The file README-windows-install.txt remains a text file.

2017-08-21  Jay Berkenbilt  <ejb@ql.org>

	* Add support for writing PCLm files. Most of the work was done by
	Sahil Arora <sahilarora.535@gmail.com> as part of a Google Summer
	of Code project in 2017. PCLm support is useful only for clients
	that specifically know how to create PCLm files. Support in qpdf
	is just for ensuring that objects are written in the correct order
	and for including some additional material in the output that is
	required by the PCLm standard.

2017-08-19  Jay Berkenbilt  <ejb@ql.org>

	* Remove --precheck-streams. This is enabled by default now
	without any efficiency cost. This feature was never released.

	* Update pdf-create example to illustrate use of additional image
	compression filters.

	* Add support for /RunLengthDecode and /DCTDecode:
	  - New pipeline types Pl_RunLength and Pl_DCT
	  - New command-line flags --compress-streams and --decode-level
	    to replace/enhance --stream-data
	  - New QPDFWriter::setCompressStreams and
 	    QPDFWriter::setDecodeLevel methods
	  Please see documentation, header files, and help messages for
	  details on these new features.

2017-08-12  Jay Berkenbilt  <ejb@ql.org>

	* Add QPDFObjectHandle::rotatePage to apply rotation to a page
	object. Add --rotate option to qpdf to specify page rotation from
	the command line.

	* Provide --verbose option that causes qpdf to print an indication
	of what files it is writing.

	* Change --single-pages to --split-pages and make it take an
	optional argument specifying the number of pages per file.

2017-08-11  Jay Berkenbilt  <ejb@ql.org>

	* Fix --newline-before-endstream to always add a newline before
	endstream even if the last character was already a newline. This
	is actually what's required by PDF/A. Fixes #133.

	* Handle encrypted files whose encryption parameters are too
	short. Fixes #96.

2017-08-10  Jay Berkenbilt  <ejb@ql.org>

	* Remove dependency on libpcre.

	* Be more forgiving of certain types of errors in the xref table
	that don't interfere with interpreting the table.

	* Remove unused "tracing" parameter from PointerHolder's
	(T*, bool) constructor. This change breaks source code
	compatibility, but since this argument to PointerHolder has not
	used for a long time and the presence of a boolean parameter in
	the primary constructor makes it too easy to use that by mistake
	when trying to use PointerHolder for arrays, it seems like it's
	finally time to take it out. If you have a compile error because
	of this change, please check to see whether you intended to use
	the (bool, T*) version of the constructor instead. If not, just
	remove the second parameter.

2017-08-09  Jay Berkenbilt  <ejb@ql.org>

	* When recovering stream length, find endobj without endstream as
	well as just looking for endstream. Be a little more lax about
	where we allow it to be found.

2017-08-05  Jay Berkenbilt  <ejb@ql.org>

	* Add --single-pages option to cause output to be written to a
	separate file for each page rather than one big file.

	* Process --pages options earlier so that certain inspection
	options, like --show-pages, can show the state after the merging

2017-08-02  Jay Berkenbilt  <ejb@ql.org>

	* Fix off-by-one error in parsing pages options. Fixes #129.

2017-07-29  Jay Berkenbilt  <ejb@ql.org>

	* Support @filename and @- in the qpdf command-line tool to read
	command-line arguments, one per line, from the named file. @-
	reads from standard input. Fixes #16.

	* Detect when input file and output file are the same and exit to
	avoid overwriting and losing input file. Fixes #29.

	* When passing multiple inspection arguments, run --check first,
	and defer exit until after all the checks have been run. This
	makes it possible to force operations such as --show-xref to be
	delayed until after recovery attempts have been made. For example,
	if you have a file with a syntactically valid xref table that has
	some offsets that are incorrect, running qpdf --check --show-xref
	on that file will first recover the xref and the dump the
	recovered xref, while just running qpdf --show-xref will show the
	xref table as present in the file. Fixes #42.

	* When recovering stream length, indicate the recovered length.
	Fixes #44.

	* Add --newline-before-endstream command-line option and
	setNewlineBeforeEndstream method to QPDFWriter. This forces qpdf
	to always add a newline before the endstream keyword. It is a
	necessary but not sufficient condition for PDF/A compliance. Fixes

	* Handle zlib data errors when decoding streams. Fixes #106.

	* Improve handling of files where the "stream" keyword is not
	followed by proper line terminators. Fixes #104.

	* Fix content stream parsing to handle cases of structures within
	the stream split across stream boundaries. Fixes #73.

2017-07-28  Jay Berkenbilt  <ejb@ql.org>

	* Add --preserve-unreferenced command-line option and
	setPreserveUnreferencedObjects method to QPDFWriter. This option
	causes QPDFWriter to write all objects from the input file to the
	output file regardless of whether the objects are referenced.
	Objects are written to the output file in numerical order from the
	input file. This option has no effect for linearized files.

2017-07-27  Jay Berkenbilt  <ejb@ql.org>

	* Add --precheck-streams command-line option and setStreamPrecheck
	method to QPDFWriter to tell QPDFWriter to attempt decoding a
	stream fully before deciding whether to filter it or not.

	* Recover gracefully from streams that aren't filterable because
	the filter parameters are invalid in the stream dictionary or the
	dictionary itself is invalid.

	* Significantly improve recoverability from invalid qpdf objects.
	Most conditions in basic object parsing that used to cause qpdf to
	exit are now warnings. There are still many more opportunities for
	improvements of this sort beyond just object parsing.

2017-07-26  Jay Berkenbilt  <ejb@ql.org>

	* Fixes to infinite loops below also fix problems reported in
	other issues and cover CVE-2017-11624, CVE-2017-11625,
	CVE-2017-11626, and CVE-2017-11627.

	* Don't attempt to interpret syntactic keywords (like R and
	endobj) found while parsing content streams.

	* Detect infinite loops while resolving objects. This could happen
	if something inside an object that had to be resolved during
	parsing, such as a stream length, recursively referenced the
	object being resolved.

	* CVE-2017-9208: Handle references to and appearance of object 0
	as a special case. Object 0 is not allowed, and qpdf was using it
	internally to represent direct objects.

	* CVE-2017-9209: Fix infinite loop caused by attempting to
	reconstruct the xref table while already in the process of
	reconstructing the xref table.

	* CVE-2017-9210: Fix infinite loop caused by attempting to unparse
	an object for inclusion in the text of an exception.
   2016-07-09 08:39:18 by Thomas Klausner | Files touched by this commit (1068) | Package updated
Log message:
Bump PKGREVISION for perl-5.24.0 for everything mentioning perl.