./textproc/py-pdf, Pure-python PDF library



Branch: CURRENT, Version: 4.2.0nb1, Package name: py311-pdf-4.2.0nb1, Maintainer: pkgsrc-users
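
The package name above packs several pieces of information into one string: a Python-version prefix (py311), the package base name, the upstream version (4.2.0), and a pkgsrc revision suffix (nb1, which corresponds to PKGREVISION=1). As a rough illustration, the sketch below splits such a name into its parts. The helper `parse_pkgname` and the `PkgName` tuple are hypothetical names invented for this example and are not part of pkgsrc; the regular expression only covers the simple shape shown on this page.

```python
import re
from typing import NamedTuple


class PkgName(NamedTuple):
    """Hypothetical container for the parts of a pkgsrc package name."""
    base: str       # e.g. "py311-pdf"
    version: str    # upstream version, e.g. "4.2.0"
    revision: int   # PKGREVISION, 0 when there is no "nb" suffix


# Assumption: the version is the text after the last hyphen, optionally
# followed by "nb<digits>" for the pkgsrc revision.
_PKG_RE = re.compile(r"^(?P<base>.+)-(?P<version>[^-]+?)(?:nb(?P<rev>\d+))?$")


def parse_pkgname(pkgname: str) -> PkgName:
    """Split a name such as 'py311-pdf-4.2.0nb1' into base, version, revision."""
    match = _PKG_RE.match(pkgname)
    if match is None:
        raise ValueError(f"not a recognizable package name: {pkgname!r}")
    return PkgName(match["base"], match["version"], int(match["rev"] or 0))


if __name__ == "__main__":
    print(parse_pkgname("py311-pdf-4.2.0nb1"))
```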

pypdf is a free and open-source pure-python PDF library capable of
splitting, merging, cropping, and transforming the pages of PDF
files. It can also add custom data, viewing options, and passwords
to PDF files. pypdf can retrieve text and metadata from PDFs as
well.


Filesize: 281.688 KB


CVS history:


   2024-04-14 10:58:05 by Thomas Klausner | Files touched by this commit (1)
Log message:
py-pdf: fix depends for Python 3.10

Also needs py-typing-extensions there.

Bump PKGREVISION.
   2024-04-10 13:19:09 by Adam Ciarcinski | Files touched by this commit (3) | Package updated
Log message:
py-pdf: updated to 4.2.0

Version 4.2.0, 2024-04-07

New Features (ENH)
- Allow multiple charsets for NameObject.read_from_stream
- Add support for /Kids in page labels
- Allow to update fields on many pages
- Tolerate PDF with invalid xref pointed objects
- Add Enforce from PDF2.0 in viewer_preferences
- Add += and -= operators to ArrayObject

Bug Fixes (BUG)
- Fix merge_page sometimes generating unknown operator 'QQ'
- Fix fields update where annotations are kids of field
- Process CMYK images without a filter correctly
- Extract text in layout mode without finding resources
- Prevent recursive loop in some PDF files

Robustness (ROB)
- Tolerate "truncated" xref
- Replace error by warning for EOD in RunLengthDecode/ASCIIHexDecode
- Rebuild xref table if one entry is invalid
- Robustify stream extraction

Documentation (DOC)
- Update release process for latest changes
- Encryption/decryption: Clone document instead of copying all pages
- Minor improvements
- Update annotation list
- Update references and formatting
- Correct threads reference, plus minor changes
- Minor readability increases
- Simplify PaperSize examples
- Minor improvements

Developer Experience (DEV)
- Remove unused dependencies
- Remove page labels PR link from message
- Fix changelog generator regarding whitespace and handling of "Other" group
- Add REL to known PR prefixes
- Release using the REL commit instead of git tag
- Unify code between PdfReader and PdfWriter
- Bump softprops/action-gh-release from 1 to 2

Maintenance (MAINT)
- Ressources → Resources (and internal name childs)
- Fix typos found by codespell
- Update Read the Docs configuration
- Add root_object, _info and _ID to PdfReader

Testing (TST)
- Allow loading truncated images if required
- Fix download issues from
- Improve test_get_contents_from_nullobject to show real use-case
- Add missing test annotations
   2024-03-11 14:18:00 by Thomas Klausner | Files touched by this commit (2) | Package updated
Log message:
py-pdf: update to 4.1.0.

## Version 4.1.0, 2024-03-03

Generating name objects (`NameObject`) without a leading slash
is considered deprecated now. Previously, just a plain warning
would be logged, leading to possibly invalid PDF files. According
to our deprecation policy, this will log a *DeprecationWarning*
for now.
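
The general pattern described above, where a malformed name triggers a *DeprecationWarning* instead of a plain log message, can be sketched with the standard `warnings` module. This is not pypdf's implementation: `make_name` is a hypothetical stand-in written for this illustration, and the warning text is invented.

```python
import warnings


def make_name(value: str) -> str:
    """Hypothetical stand-in for creating a PDF name object.

    PDF names must start with "/". A value without the leading slash is
    still accepted here, but the caller is warned that this is deprecated.
    """
    if not value.startswith("/"):
        warnings.warn(
            f"name {value!r} lacks a leading slash; this is deprecated",
            DeprecationWarning,
            stacklevel=2,  # point the warning at the caller's line
        )
    return value


if __name__ == "__main__":
    # Python hides many DeprecationWarnings by default; show them all here.
    warnings.simplefilter("always")
    make_name("/Type")  # well-formed, no warning
    make_name("Type")   # emits a DeprecationWarning
```

Callers who want to catch such problems early can turn the warning into an error, for example by running Python with `-W error::DeprecationWarning`.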

### New Features (ENH)
- Add get_pages_from_field (#2494)
- Add reattach_fields function (#2480)
- Automatic access to pointed object for IndirectObject (#2464)

### Bug Fixes (BUG)
- Missing error on name without leading / (#2387)
- encode_pdfdocencoding() always returns bytes (#2440)
- BI in text content identified as image tag (#2459)

### Robustness (ROB)
- Missing basefont entry in type 3 font (#2469)

### Documentation (DOC)
- Improve lossless compression example (#2488)
- Amend robustness documentation (#2479)

### Developer Experience (DEV)
- Fix changelog for UTF-8 characters (#2462)

### Maintenance (MAINT)
- Add _get_page_number_from_indirect in writer (#2493)
- Remove user assignment for feature requests (#2483)
- Remove reference to old 2.0.0 branch (#2482)

### Testing (TST)
- Fix benchmark failures (#2481)
- Broken test due to expired test file URL (#2468)
- Resolve file naming conflict in test_iss1767 (#2445)
   2024-02-19 07:09:44 by Adam Ciarcinski | Files touched by this commit (2) | Package updated
Log message:
py-pdf: updated to 4.0.2

Version 4.0.2, 2024-02-18

Bug Fixes (BUG)
-  Use NumberObject for /Border elements of annotations
   2024-01-28 18:33:44 by Adam Ciarcinski | Files touched by this commit (2) | Package updated
Log message:
py-pdf: updated to 4.0.1

Version 4.0.1, 2024-01-28

Bug Fixes (BUG)
- layout mode text extraction ZeroDivisionError

Testing (TST)
- Skip tests using fpdf2 if it's not installed
   2024-01-21 21:28:37 by Thomas Klausner | Files touched by this commit (3) | Package updated
Log message:
py-pdf: update to 4.0.0.

## Version 4.0.0, 2024-01-19

### Deprecations (DEP)
-  Drop Python 3.6 support (#2369)
-  Remove deprecated code (#2367)
-  Remove deprecated XMP properties (#2386)

### New Features (ENH)
-  Add "layout" mode for text extraction (#2388)
-  Add Jupyter Notebook integration for PdfReader (#2375)
-  Improve/rewrite PDF permission retrieval (#2400)

### Bug Fixes (BUG)
-  PdfWriter.add_uri was setting the wrong type (#2406)
-  Add support for GBK2K cmaps (#2385)

### Maintenance (MAINT)
-  Return None instead of -1 when page is not attached (#2376)
-  Complete FileSpecificationDictionaryEntries constants (#2416)
-  Replace warning with logging.error (#2377)
   2023-12-28 19:46:28 by Adam Ciarcinski | Files touched by this commit (2) | Package updated
Log message:
py-pdf: updated to 3.17.4

3.17.4

Bug Fixes (BUG)
-  Handle IndirectObject as image filter
   2023-12-18 10:40:09 by Adam Ciarcinski | Files touched by this commit (2) | Package updated
Log message:
py-pdf: updated to 3.17.3

Version 3.17.3, 2023-12-17

Robustness (ROB)
-  Out-of-bounds issue in handle_tj (text extraction)

Developer Experience (DEV)
-  Make make_release.py easier to configure

Maintenance (MAINT)
-  Bump actions/download-artifact from 3 to 4