pkgsrc.se | The NetBSD package collection

Subject: CVS commit: pkgsrc/math/py-pandas
From: Adam Ciarcinski
Date: 2023-08-28 12:34:02
Message id: 20230828103402.E05CEFBDB@cvs.NetBSD.org
Log Message:
py-pandas: updated to 2.0.3

2.0.3

Fixed regressions

Bug in Timestamp.weekday() returning incorrect results before '0000-02-29' (GH53738)
Fixed performance regression in merging on datetime-like columns (GH53231)
Fixed regression when DataFrame.to_string() creates extra space for string dtypes (GH52690)
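Two of the 2.0.3 regressions above can be exercised with a short script. This is a sketch assuming pandas 2.0.3 or later is installed. The inputs are made up for illustration and are not the upstream reproducers from GH52690 or GH53231.

```python
import pandas as pd

# GH52690: after the fix, a "string" dtype column is expected to render exactly
# like the same values stored with object dtype (no extra padding).
as_object = pd.DataFrame({"name": ["a", "bb", "ccc"]})
as_string = as_object.astype("string")
print(as_string.to_string())
same_rendering = as_string.to_string() == as_object.to_string()

# GH53231 was a speed regression, so results never changed; this only shows the
# kind of operation affected: merging on a datetime64 key column.
left = pd.DataFrame(
    {"day": pd.date_range("2023-01-01", periods=3, freq="D"), "a": [1, 2, 3]}
)
right = pd.DataFrame(
    {"day": pd.date_range("2023-01-02", periods=3, freq="D"), "b": [10, 20, 30]}
)
merged = left.merge(right, on="day", how="inner")
print(merged)
```

Comparing against the object-dtype rendering, rather than a hard-coded string, keeps the check independent of column-width details.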

Bug fixes

Bug in DataFrame.convert_dtypes() and Series.convert_dtypes() when trying to convert ArrowDtype with dtype_backend="numpy_nullable" (GH53648)
Bug in RangeIndex.union() when using sort=True with another RangeIndex (GH53490)
Bug in Series.reindex() where expanding a non-nanosecond datetime or timedelta Series would not fill with NaT correctly (GH53497)
Bug in read_csv() when defining dtype with bool[pyarrow] for the "c" and "python" engines (GH53390)
Bug in Series.str.split() and Series.str.rsplit() with expand=True for ArrowDtype with pyarrow.string (GH53532)
Bug in indexing methods (e.g. DataFrame.__getitem__()) where taking the entire DataFrame/Series would raise an OverflowError when Copy-on-Write was enabled and the length of the array was over the maximum size a 32-bit integer can hold (GH53616)
Bug when constructing a DataFrame with columns of an ArrowDtype with a pyarrow.dictionary type that reindexes the data (GH53617)
Bug where indexing a DataFrame or Series with an Index with a timestamp ArrowDtype would raise an AttributeError (GH53644)

2.0.2

Fixed regressions

Fixed performance regression in GroupBy.apply() (GH53195)
Fixed regression in merge() on Windows when dtype is np.intc (GH52451)
Fixed regression in read_sql() dropping columns with duplicated column names (GH53117)
Fixed regression in DataFrame.loc() losing MultiIndex name when enlarging the object (GH53053)
Fixed regression in DataFrame.to_string() printing a backslash at the end of the first row of data, instead of headers, when the DataFrame doesn’t fit the line width (GH53054)
Fixed regression in MultiIndex.join() returning levels in wrong order (GH53093)

Bug fixes

Bug in arrays.ArrowExtensionArray incorrectly assigning dict instead of list for .type with pyarrow.map_ and raising a NotImplementedError with pyarrow.struct (GH53328)
Bug in api.interchange.from_dataframe() raising IndexError on empty categorical data (GH53077)
Bug in api.interchange.from_dataframe() returning DataFrames of incorrect sizes when called on slices (GH52824)
Bug in api.interchange.from_dataframe() unnecessarily raising on bitmasks (GH49888)
Bug in merge() when merging on datetime columns with different resolutions (GH53200)
Bug in read_csv() raising OverflowError with engine="pyarrow" and parse_dates set (GH53295)
Bug in to_datetime() inferring format to contain "%H" instead of "%I" if the date contained “AM” / “PM” tokens (GH53147)
Bug in DataFrame.convert_dtypes() ignoring convert_* keywords when set to False and dtype_backend="pyarrow" (GH52872)
Bug in DataFrame.convert_dtypes() losing timezone for tz-aware dtypes and dtype_backend="pyarrow" (GH53382)
Bug in DataFrame.sort_values() raising for PyArrow dictionary dtype (GH53232)
Bug in Series.describe() treating pyarrow-backed timestamps and timedeltas as categorical data (GH53001)
Bug in Series.rename() not making a lazy copy when Copy-on-Write is enabled and a scalar is passed to it (GH52450)
Bug in pd.array() raising for a NumPy array and pa.large_string or pa.large_binary (GH52590)
Bug in DataFrame.__getitem__() not preserving dtypes for MultiIndex partial keys (GH51895)
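Several of the 2.0.2 bug fixes concern plain NumPy dtypes and can be illustrated in one script. A sketch assuming pandas 2.0.2 or later and NumPy; the data is invented for illustration and does not reproduce the GitHub issues verbatim.

```python
import numpy as np
import pandas as pd

# GH53200: merge on datetime keys stored at different resolutions (s vs ns).
left = pd.DataFrame(
    {"t": np.array(["2023-01-01", "2023-01-02"], dtype="datetime64[s]"), "a": [1, 2]}
)
right = pd.DataFrame(
    {"t": np.array(["2023-01-02", "2023-01-03"], dtype="datetime64[ns]"), "b": [3, 4]}
)
merged = left.merge(right, on="t")
print(merged)

# GH53147: 12-hour times with AM/PM should be parsed with %I, so 01:30 PM is hour 13.
parsed = pd.to_datetime(["2023-01-02 01:30 PM", "2023-01-02 11:15 AM"])
print(parsed)

# GH51895: selecting a partial key from MultiIndex columns keeps per-column dtypes.
cols = pd.MultiIndex.from_tuples([("a", "x"), ("a", "y"), ("b", "z")])
frame = pd.DataFrame([[1, 2.5, "s"]], columns=cols)
sub = frame["a"]
print(sub.dtypes)
```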

2.0.1

Fixed regressions

Fixed regression for subclassed Series when constructing from a dictionary (GH52445)
Fixed regression in SeriesGroupBy.agg() failing when grouping with categorical data, multiple groupings, as_index=False, and a list of aggregations (GH52760)
Fixed regression in DataFrame.pivot() changing Index name of input object (GH52629)
Fixed regression in DataFrame.resample() raising on a DataFrame with no columns (GH52484)
Fixed regression in DataFrame.sort_values() not resetting the index when the DataFrame is already sorted and ignore_index=True (GH52553)
Fixed regression in MultiIndex.isin() raising TypeError for a generator (GH52568)
Fixed regression in Series.describe() showing a RuntimeWarning for extension dtype Series with one element (GH52515)
Fixed regression when adding a new column to a DataFrame when DataFrame.columns was a RangeIndex and the new key was hashable but not a scalar (GH52652)
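A few of the 2.0.1 regressions can be exercised directly. This is a sketch assuming pandas 2.0.1 or later. `TaggedSeries` is a hypothetical subclass defined only for this example, following the usual pattern of overriding `_constructor`; the other inputs are likewise illustrative.

```python
import pandas as pd


class TaggedSeries(pd.Series):
    """Hypothetical minimal Series subclass, used only for illustration."""

    @property
    def _constructor(self):
        return TaggedSeries


# GH52445: constructing a Series subclass directly from a dict.
tagged = TaggedSeries({"a": 1, "b": 2})

# GH52553: ignore_index=True must reset the index even if already sorted.
df = pd.DataFrame({"v": [1, 2, 3]}, index=[10, 20, 30])
resorted = df.sort_values("v", ignore_index=True)

# GH52568: MultiIndex.isin() accepting a generator of tuples.
mi = pd.MultiIndex.from_tuples([(1, "x"), (2, "y")])
mask = mi.isin(t for t in [(1, "x")])

# GH52629: pivot() without index= must not rename the caller's index.
wide_src = pd.DataFrame(
    {"col": ["x", "y"], "val": [1, 2]}, index=pd.Index([10, 20], name="row")
)
wide = wide_src.pivot(columns="col", values="val")
print(wide)
```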

Bug fixes

Bug in Series.dt.days overflowing int32 for large numbers of days (GH52391)
Bug in arrays.DatetimeArray constructor returning an incorrect unit when passed a non-nanosecond numpy datetime array (GH52555)
Bug in ArrowExtensionArray with duration dtype overflowing when constructed from data containing numpy NaT (GH52843)
Bug in Series.dt.round() raising a ZeroDivisionError when passed a freq of equal or higher resolution than the Series (GH52761)
Bug in Series.median() with ArrowDtype returning an approximate median (GH52679)
Bug in api.interchange.from_dataframe() unnecessarily raising on categorical dtypes (GH49889)
Bug in api.interchange.from_dataframe() unnecessarily raising on large string dtypes (GH52795)
Bug in pandas.testing.assert_series_equal() where check_dtype=False would still raise for datetime or timedelta types with different resolutions (GH52449)
Bug in read_csv() casting PyArrow datetimes to NumPy when dtype_backend="pyarrow" and parse_dates is set, causing a performance bottleneck (GH52546)
Bug in to_datetime() and to_timedelta() when trying to convert numeric data with an ArrowDtype (GH52425)
Bug in to_numeric() with errors='coerce' and dtype_backend='pyarrow' with ArrowDtype data (GH52588)
Bug in ArrowDtype.__from_arrow__() not respecting an explicitly given dtype (GH52533)
Bug in DataFrame.describe() not respecting ArrowDtype in include and exclude (GH52570)
Bug in DataFrame.max() and related methods always casting different Timestamp resolutions to nanoseconds (GH52524)
Bug in Series.describe() not returning an ArrowDtype with pyarrow.float64 type for numeric data (GH52427)
Bug in Series.dt.tz_localize() incorrectly localizing timestamps with ArrowDtype (GH52677)
Bug in arithmetic between np.datetime64 and np.timedelta64 NaT scalars with units always returning nanosecond resolution (GH52295)
Bug in logical and comparison operations between ArrowDtype and numpy masked types (e.g. "boolean") (GH52625)
Fixed bug in merge() when merging with ArrowDtype on one side and a NumPy dtype on the other side (GH52406)
Fixed segfault in Series.to_numpy() with null[pyarrow] dtype (GH52443)
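The resolution-related fixes in 2.0.1 can be illustrated with second-resolution NumPy data. A sketch assuming pandas 2.0.1 or later and NumPy; 2**31 days is chosen only because it no longer fits in int32, and second resolution is used because that many days overflows the nanosecond range.

```python
import numpy as np
import pandas as pd

# GH52391: a day count of 2**31 does not fit in int32; stored here at second resolution.
big = pd.Series(np.array([2**31 * 86_400], dtype="timedelta64[s]"))
days = big.dt.days
print(days)

# GH52449: with check_dtype=False, equal instants at different resolutions compare equal.
at_s = pd.Series(np.array(["2023-01-01"], dtype="datetime64[s]"))
at_ns = pd.Series(np.array(["2023-01-01"], dtype="datetime64[ns]"))
pd.testing.assert_series_equal(at_s, at_ns, check_dtype=False)

# GH52761: rounding second-resolution data to a finer freq ("ms") should leave the
# values unchanged instead of raising ZeroDivisionError.
ts = pd.Series(np.array(["2023-01-01T00:00:01"], dtype="datetime64[s]"))
rounded = ts.dt.round("ms")
print(rounded)
```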

Other

A DataFrame created from an empty dict had columns of dtype object; they are now a RangeIndex (GH52404)
A Series created from an empty dict had an index of dtype object; it is now a RangeIndex (GH52404)
Implemented Series.str.split() and Series.str.rsplit() for ArrowDtype with pyarrow.string (GH52401)
Implemented most str accessor methods for ArrowDtype with pyarrow.string (GH52401)
Supplying a non-integer hashable key that tests False in api.types.is_scalar() now raises a KeyError for RangeIndex.get_loc(), like it does for Index.get_loc(). Previously it raised an InvalidIndexError (GH52652).
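The behaviour changes listed under "Other" can be checked as follows. A sketch assuming pandas 2.0.1 or later; `raises_key_error` is a hypothetical helper written for this example, not a pandas API.

```python
import pandas as pd

# GH52404: empty-dict constructors now produce a RangeIndex, not an object Index.
empty_frame = pd.DataFrame({})
empty_series = pd.Series({})
print(type(empty_frame.columns).__name__, type(empty_series.index).__name__)


def raises_key_error(index, key):
    """Hypothetical helper: True if index.get_loc(key) raises KeyError."""
    try:
        index.get_loc(key)
    except KeyError:
        return True
    return False


# GH52652: a hashable but non-scalar key (a tuple) raises KeyError from
# RangeIndex.get_loc(), matching what Index.get_loc() does.
range_raises = raises_key_error(pd.RangeIndex(3), (0, 1))
plain_raises = raises_key_error(pd.Index([0, 1, 2]), (0, 1))
print(range_raises, plain_raises)
```

Any exception other than KeyError propagates out of the helper, so a return value of True really does mean a KeyError was raised.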
Files:
Revision	Action	file
1.47	modify	pkgsrc/math/py-pandas/Makefile
1.22	modify	pkgsrc/math/py-pandas/PLIST
1.33	modify	pkgsrc/math/py-pandas/distinfo
1.3	modify	pkgsrc/math/py-pandas/patches/patch-pandas___libs_window_aggregations.pyx