./textproc/py-pdf, Pure-python PDF library


Branch: CURRENT, Version: 5.3.1, Package name: py312-pdf-5.3.1, Maintainer: pkgsrc-users

pypdf is a free and open-source pure-python PDF library capable of
splitting, merging, cropping, and transforming the pages of PDF
files. It can also add custom data, viewing options, and passwords
to PDF files. pypdf can retrieve text and metadata from PDFs as
well.


Filesize: 4894.38 KB


CVS history:

   2025-03-03 14:06:43 by Adam Ciarcinski | Files touched by this commit (2) | Package updated
Log message:
py-pdf: updated to 5.3.1

Bug Fixes (BUG)
- Use the correct name StandardEncoding for the predefined cmap
- Handle inline images containing `EI ` sequences
- Fix check box value which should be name object
- Fix stream position on inline image fallback extraction
- Fix object count for incremental writer

Robustness (ROB)
- Avoid index errors on empty lines in xref table
- Improve handling of LZW decoder table overflow
- Ignore non-numbers for width when building font width map
- Avoid negative seek values when reading partially broken files
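The xref fix above concerns classic cross-reference tables. These consist of subsection headers (first object number, entry count), each followed by one entry per object: a 10-digit byte offset, a 5-digit generation number and an `n` (in use) or `f` (free) flag. The sketch below is a simplified, hypothetical parser (`parse_xref` is not part of pypdf). It only shows why a blank line has to be skipped before the line is split into fields and indexed.

```python
def parse_xref(text: str) -> dict[int, tuple[int, int, bool]]:
    """Hypothetical sketch: map object number -> (offset, generation, in_use)."""
    entries: dict[int, tuple[int, int, bool]] = {}
    next_num = 0
    for raw in text.splitlines():
        parts = raw.split()
        if not parts:
            continue  # empty line: skip it instead of indexing parts[0]
        if parts[0] == "xref":
            continue
        if parts[0] == "trailer":
            break
        if len(parts) == 2:
            # Subsection header: first object number and number of entries.
            next_num = int(parts[0])
        elif len(parts) == 3:
            # Entry: byte offset, generation number, 'n' (in use) or 'f' (free).
            offset, generation, kind = parts
            entries[next_num] = (int(offset), int(generation), kind == "n")
            next_num += 1
        else:
            raise ValueError(f"unexpected xref line: {raw!r}")
    return entries
```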

Documentation (DOC)
- Fixed PageObject.images example usage for replacing image
   2025-02-23 21:44:52 by Thomas Klausner | Files touched by this commit (2)
Log message:
py-pdf: adapt for flit_core 3.11.

   2025-02-12 13:12:49 by Adam Ciarcinski | Files touched by this commit (3) | Package updated
Log message:
py-pdf: updated to 5.3.0

Version 5.3.0, 2025-02-09

New Features (ENH)
- Handle attachments in /Kids and provide object-oriented API

Bug Fixes (BUG)
- Handle annotations being None on merging

Robustness (ROB)
- Prevent excessive layout mode text output from Type3 fonts

Documentation (DOC)
- stefan6419846 becomes BDFL of pypdf
- Tidy the visitor function description

Developer Experience (DEV)
- Remove ignoring multiple Ruff rules
- Remove unused mutmut configuration

Testing (TST)
- Fix warning assertions to use `pytest.warns()`
   2025-01-27 15:00:11 by Adam Ciarcinski | Files touched by this commit (2) | Package updated
Log message:
py-pdf: updated to 5.2.0

Version 5.2.0, 2025-01-26

Deprecations (DEP)
- Deprecate with replacement CCITParameters
- Correct deprecation of interiour_color

New Features (ENH)
- Support alternative (U)F names for embedded file retrieval
- Adding support for reading .metadata.keywords

Bug Fixes (BUG)
- Handle further Tf operators in text extraction layout mode
- Ensure `add_metadata` can deal with `_info = None`
- Handle IndirectObject in CCITTFaxDecode filter
- Handle chained colorspace for inline images when no filter is set
- Avoid extracting inline images twice and dropping other operators
- Fixed reference of value with `str.__new__` in TextStringObject
- Handle indirect objects in font width calculations
- Title sometimes is bytes and not str
- Fix undefined variable for text extraction (regression)
- Don't close stream passed to PdfWriter.write()

Robustness (ROB)
- Handle zero height fonts when extracting text
- Deal with content streams not containing streams
- Gracefully handle some text operators when the operands are missing
- Fall back to non-Adobe Ascii85 format for missing end markers
- Ignore odd-length strings when processing cmap lines
- Skip annotation destination being NullObject in PdfWriter
- Skip destination page being None in PdfWriter
- Fix infinite loop case when reading null objects within an Array
- Fixing infinite loop in ArrayObject read_from_stream
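Python's standard `base64` module can illustrate the Ascii85 fallback, because it supports both the Adobe framing (`<~ ... ~>`) and the bare format. This sketches the idea only and is not pypdf's filter code. `decode_ascii85` is a hypothetical name.

```python
import base64


def decode_ascii85(data: bytes) -> bytes:
    """Decode Ascii85 data, tolerating a missing '~>' end-of-data marker."""
    data = data.strip()
    try:
        # Adobe framing: requires the trailing '~>' and strips an optional '<~'.
        return base64.a85decode(data, adobe=True)
    except ValueError:
        # No end marker: drop an optional start marker and decode the bare data.
        if data.startswith(b"<~"):
            data = data[2:]
        return base64.a85decode(data)
```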

Documentation (DOC)
- Add note about default line colors

Developer Experience (DEV)
- Remove ignoring Ruff rule PGH004
- Tidy ignore array in tool.ruff.lint
- Move Windows CI to Python 3.13
- Move to Ubuntu 22.04

Maintenance (MAINT)
- Fix formatting of warning message and include exception message
- Narrow return type for `ContentStream.operations`

Testing (TST)
- Fix image similarity for upcoming Ubuntu 24.04
- Replace broken Apache Tika Corpora urls

Code Style (STY)
- Add form feed to WHITESPACES
- Lots of small internal changes
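The PDF specification defines six white-space characters: NUL, horizontal tab, line feed, form feed, carriage return and space. The hypothetical tokenizer below splits only on these. Python's `bytes.split()` is not a substitute, because it treats vertical tab as white space and does not treat NUL as white space. Real PDF tokenizing must also handle delimiter characters, which this sketch ignores.

```python
# NUL, HT, LF, FF, CR and SP: the PDF white-space characters.
PDF_WHITESPACES = b"\x00\t\n\x0c\r "


def split_tokens(data: bytes) -> list[bytes]:
    """Hypothetical sketch: split on PDF white space only (delimiters ignored)."""
    tokens: list[bytes] = []
    current = bytearray()
    for byte in data:
        if byte in PDF_WHITESPACES:
            if current:
                tokens.append(bytes(current))
                current.clear()
        else:
            current.append(byte)
    if current:
        tokens.append(bytes(current))
    return tokens
```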
   2024-11-04 18:58:39 by Thomas Klausner | Files touched by this commit (3) | Package updated
Log message:
py-pdf: update to 5.1.0.

## Version 5.1.0, 2024-10-27

### New Features (ENH)
- Add `layout_mode_font_height_weight` argument to `PageObject.extract_text()`

### Bug Fixes (BUG)
- Fix font specifier for FreeText annotation (#2893)
- Line breaks are not generated due to incorrect calculation of text leading (#2890)
- Improve handling of spaces in text extraction (#2882)

### Robustness (ROB)
- Soft failure for flate encode image mode 1 with wrong LUT size (#2900)

### Documentation (DOC)
- Use latest package versions (#2907)
- Correct example of reading FileAttachment annotation (#2906)

### Developer Experience (DEV)
- Update pinned requirements (#2918)
- Make make_release.py compatible with Windows environment (#2894)

### Maintenance (MAINT)
- Remove references to outdated Python versions (#2919)
- Generalize the method of obtaining space_code (#2891)
- Unnecessary character mapping process (#2888)
- New LZW decoding implementation (#2887)
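The LZWDecode filter reads variable-width codes of 9 to 12 bits, most significant bit first. Code 256 clears the table, code 257 marks the end of data, and with the default `/EarlyChange 1` the code width grows one code early. The decoder below is a compact sketch of that scheme, not pypdf's new implementation. `lzw_decode` is a hypothetical name. When the 4096-entry table is full, the sketch stops adding entries, which is one way to cope with the table overflow mentioned in the 5.3.1 notes.

```python
def lzw_decode(data: bytes, early_change: int = 1) -> bytes:
    """Sketch of PDF LZWDecode: 9-12 bit codes, MSB first."""
    clear_code, eod_code, max_entries = 256, 257, 4096

    def fresh_table() -> list[bytes]:
        # Entries 0-255 are single bytes; 256 and 257 are control codes.
        return [bytes([i]) for i in range(256)] + [b"", b""]

    table = fresh_table()
    width = 9
    previous = None  # previously emitted sequence (bytes) or None after a clear
    out = bytearray()
    bit_buffer = 0
    bit_count = 0

    for byte in data:
        bit_buffer = (bit_buffer << 8) | byte
        bit_count += 8
        while bit_count >= width:
            bit_count -= width
            code = (bit_buffer >> bit_count) & ((1 << width) - 1)
            bit_buffer &= (1 << bit_count) - 1  # keep only the unread bits
            if code == clear_code:
                table, width, previous = fresh_table(), 9, None
                continue
            if code == eod_code:
                return bytes(out)
            if code < len(table):
                entry = table[code]
            elif code == len(table) and previous is not None:
                # Code not in the table yet: previous sequence + its first byte.
                entry = previous + previous[:1]
            else:
                raise ValueError(f"invalid LZW code {code}")
            out += entry
            if previous is not None and len(table) < max_entries:
                table.append(previous + entry[:1])
                # Widen codes when the table is about to outgrow the width.
                if len(table) + early_change >= (1 << width) and width < 12:
                    width += 1
            previous = entry
    return bytes(out)  # tolerate a missing end-of-data code
```

For example, `lzw_decode(bytes.fromhex("800B6050220C0C8501"))` returns `b"-----A---B"`, the worked example from the LZWDecode section of the PDF specification.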

### Testing (TST)
- Add LzwCodec for encoding (#2883)

### Code Style (STY)
- Capitalize error messages (#2903)
- Modify error messages in PdfWriter (#2902)
   2024-10-11 14:43:58 by Adam Ciarcinski | Files touched by this commit (2) | Package updated
Log message:
py-pdf: updated to 5.0.1

Version 5.0.1, 2024-09-29

New Features (ENH)
- Add `full` parameter to PdfWriter constructor

Bug Fixes (BUG)
- Update pyproject.toml with minimum Python version of 3.8
- Cope with unbalanced delimiters in dictionary object
- Cope with encoding with too many differences
- Missing spaces in extract_text() method
- Tolerate truncated files and no warning when jumping startxref

Robustness (ROB)
- Repair PDF with invalid Root object
- Continue parsing dictionary object when error is detected
- Merge documents with invalid pages in named destinations
- Tolerate comments in arrays
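In PDF syntax, a `%` outside a string starts a comment that runs to the end of the line, and such comments may appear between array elements. Below is a minimal, hypothetical sketch that strips comments before splitting a numeric array. It does not handle strings, names or nested arrays.

```python
import re

# A '%' starts a comment that runs to the end of the line (outside strings).
COMMENT = re.compile(rb"%[^\r\n]*")


def parse_number_array(data: bytes) -> list[float]:
    """Hypothetical helper: parse a PDF array of plain numbers, ignoring comments."""
    body = COMMENT.sub(b" ", data).strip()
    if not (body.startswith(b"[") and body.endswith(b"]")):
        raise ValueError("expected a PDF array delimited by [ and ]")
    return [float(token.decode("ascii")) for token in body[1:-1].split()]
```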

Developer Experience (DEV)
- Use latest Python version for benchmarking

Maintenance (MAINT)
- Add tests to source distributions
- Refactor _update_field_annotation
   2024-09-22 18:19:49 by Thomas Klausner | Files touched by this commit (2) | Package updated
Log message:
py-pdf: update to 5.0.0.

## Version 5.0.0, 2024-09-15

This version drops support for Python 3.7 (not maintained since July 2023),
PdfMerger (use PdfWriter instead) and AnnotationBuilder (use annotations
instead).

### Deprecations (DEP)
- Remove the deprecated PdfMerger and AnnotationBuilder classes and other
deprecations cleanup (#2813)
- Drop Python 3.7 support (#2793)

### New Features (ENH)
- Add capability to remove /Info from PDF (#2820)
- Add incremental capability to PdfWriter (#2811)
- Add UniGB-UTF16 encodings (#2819)
- Accept utf strings for metadata (#2802)
- Report PdfReadError instead of RecursionError (#2800)
- Compress PDF files merging identical objects (#2795)

### Bug Fixes (BUG)
- Fix sheared image (#2801)

### Robustness (ROB)
- Robustify .set_data() (#2821)
- Raise PdfReadError when missing /Root in trailer (#2808)
- Fix extract_text() issues on damaged PDFs (#2760)
- Handle images with empty data when processing an image from bytes (#2786)

### Developer Experience (DEV)
- Fix coverage uploads (#2832)
- Test against Python 3.13 (#2776)
   2024-07-17 05:50:23 by Adam Ciarcinski | Files touched by this commit (3) | Package updated
Log message:
py-pdf: updated to 4.3.0

Version 4.3.0, 2024-06-23

New Features (ENH)
- Accept ETen-B5 and UniCNS-UTF16 encodings
- Add decode_as_image() to ContentStreams
- Context manager for PdfReader
- Add capability to set font and size in fields
- Allow to pass input file without named argument

Bug Fixes (BUG)
- Fix deprecation for Ressources when using old constants
- Fix images issue 4 bits encoding and LUT starting with UTF16_BOM
- Reading large compressed images takes huge time to process
- Highlighted Text Cannot Be Printed
- Fix UnboundLocalError on malformed pdf

Robustness (ROB)
- Cope with missing Standard 14 fonts in fields
- Improve inline image extraction
- Cope with loops in Fields tree
- Discard /I in choice fields for compatibility with Acrobat
- Cope with some issues in pillow
- Cope with some image extraction issues

Documentation (DOC)
- Various improvements on docstrings and examples

Maintenance (MAINT)
- Deprecate interiour_color with replacement interior_color
- Add deprecate_with_replacement to PdfWriter.find_bookmark
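The `interiour_color` rename follows a common deprecate-with-replacement pattern. The old keyword is still accepted for a while, triggers a warning, and is forwarded under the new name. pypdf uses its own internal helpers for this. The decorator below is a hypothetical, generic sketch of the pattern, and `make_square` is an illustrative stand-in, not a pypdf function.

```python
import functools
import warnings
from typing import Any, Callable


def renamed_kwarg(old: str, new: str) -> Callable:
    """Hypothetical decorator: accept a deprecated keyword, warn, forward it as `new`."""
    def decorator(func: Callable) -> Callable:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            if old in kwargs:
                if new in kwargs:
                    raise TypeError(f"{func.__name__}() got both {old!r} and {new!r}")
                warnings.warn(
                    f"{old} is deprecated, use {new} instead.",
                    DeprecationWarning,
                    stacklevel=2,
                )
                kwargs[new] = kwargs.pop(old)
            return func(*args, **kwargs)
        return wrapper
    return decorator


@renamed_kwarg("interiour_color", "interior_color")
def make_square(rect, interior_color=None):
    """Hypothetical annotation factory used only to exercise the decorator."""
    return {"rect": rect, "interior_color": interior_color}
```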

Code Style (STY)
- Change Link to be a non-markup annotation