Log message:
py-pandas: Update to 1.3.4
Changelog:
What's new in 1.3.4 (October 17, 2021)
These are the changes in pandas 1.3.4. See Release notes for a full changelog
including other versions of pandas.
-------------------------------------------------------------------------------
Fixed regressions
* Fixed regression in DataFrame.convert_dtypes() incorrectly converting byte
strings to strings (GH43183)
* Fixed regression in GroupBy.agg() where it was failing silently with mixed
data types along axis=1 and MultiIndex (GH43209)
* Fixed regression in merge() with integer and NaN keys failing with outer
merge (GH43550)
* Fixed regression in DataFrame.corr() raising ValueError with method=
"spearman" on 32-bit platforms (GH43588)
* Fixed performance regression in MultiIndex.equals() (GH43549)
* Fixed performance regression in GroupBy.first() and GroupBy.last() with
StringDtype (GH41596)
* Fixed regression in Series.cat.reorder_categories() failing to update the
categories on the Series (GH43232)
* Fixed regression in Series.cat.categories() setter failing to update the
categories on the Series (GH43334)
* Fixed regression in read_csv() raising UnicodeDecodeError exception when
memory_map=True (GH43540)
* Fixed regression in DataFrame.explode() raising AssertionError when column
is any scalar which is not a string (GH43314)
* Fixed regression in Series.aggregate() attempting to pass args and kwargs
multiple times to the user supplied func in certain cases (GH43357)
* Fixed regression when iterating over a DataFrame.groupby.rolling object
causing the resulting DataFrames to have an incorrect index if the input
groupings were not sorted (GH43386)
* Fixed regression in DataFrame.groupby.rolling.cov() and
DataFrame.groupby.rolling.corr() computing incorrect results if the input
groupings were not sorted (GH43386)
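As a rough illustration of the DataFrame.explode() fix listed above (GH43314),
the sketch below explodes a column whose label is the integer 0 rather than a
string. The frame contents are invented for the example; only the explode()
call itself comes from the pandas API.

```python
import pandas as pd

# Hypothetical frame whose column labels are integers, not strings.
df = pd.DataFrame({0: [[1, 2], [3]], 1: ["a", "b"]})

# Exploding on the non-string label 0 is expected to work in 1.3.4 and later.
exploded = df.explode(0)
print(exploded)
```

Each list element gets its own row, and the original row label is repeated for
elements that came from the same list.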
-------------------------------------------------------------------------------
Bug fixes
* Fixed bug in pandas.DataFrame.groupby.rolling() and
pandas.api.indexers.FixedForwardWindowIndexer leading to segfaults and
window endpoints being mixed across groups (GH43267)
* Fixed bug in GroupBy.mean() with datetimelike values including NaT values
returning incorrect results (GH43132)
* Fixed bug in Series.aggregate() not passing the first args to the user
supplied func in certain cases (GH43357)
* Fixed memory leaks in Series.rolling.quantile() and Series.rolling.median()
(GH43339)
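A minimal sketch of the GroupBy.mean() case mentioned above (GH43132),
assuming a made-up frame with one missing timestamp. With the fix, NaT should
be treated as missing and left out of the per-group mean.

```python
import pandas as pd

# Illustrative data: group "a" contains one NaT value.
df = pd.DataFrame(
    {
        "group": ["a", "a", "b"],
        "when": pd.to_datetime(["2021-01-01", None, "2021-01-03"]),
    }
)

# The mean of each group is computed over the non-missing timestamps only.
means = df.groupby("group")["when"].mean()
print(means)
```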
-------------------------------------------------------------------------------
Other
* The minimum version of Cython needed to compile pandas is now 0.29.24 (
GH43729)
What's new in 1.3.3 (September 12, 2021)
These are the changes in pandas 1.3.3. See Release notes for a full changelog
including other versions of pandas.
-------------------------------------------------------------------------------
Fixed regressions
* Fixed regression in DataFrame constructor failing to broadcast for defined
Index and len one list of Timestamp (GH42810)
* Fixed regression in GroupBy.agg() incorrectly raising in some cases (
GH42390)
* Fixed regression in GroupBy.apply() where nan values were dropped even with
dropna=False (GH43205)
* Fixed regression in GroupBy.quantile() which was failing with pandas.NA (
GH42849)
* Fixed regression in merge() where on columns with ExtensionDtype or bool
data types were cast to object in right and outer merge (GH40073)
* Fixed regression in RangeIndex.where() and RangeIndex.putmask() raising
AssertionError when result did not represent a RangeIndex (GH43240)
* Fixed regression in read_parquet() where the fastparquet engine would not
work properly with fastparquet 0.7.0 (GH43075)
* Fixed regression in DataFrame.loc.__setitem__() raising ValueError when
setting array as cell value (GH43422)
* Fixed regression in is_list_like() where objects with __iter__ set to None
would be identified as iterable (GH43373)
* Fixed regression in DataFrame.__getitem__() raising error for slice of
DatetimeIndex when index is non monotonic (GH43223)
* Fixed regression in Resampler.aggregate() when used after column selection
would raise if func is a list of aggregation functions (GH42905)
* Fixed regression in DataFrame.corr() where Kendall correlation would
produce incorrect results for columns with repeated values (GH43401)
* Fixed regression in DataFrame.groupby() where aggregation on columns with
object types dropped results on those columns (GH42395, GH43108)
* Fixed regression in Series.fillna() raising TypeError when filling float
Series with list-like fill value having a dtype which couldn't cast
losslessly (like float32 filled with float64) (GH43424)
* Fixed regression in read_csv() raising AttributeError when the file handle
is a tempfile.SpooledTemporaryFile object (GH43439)
* Fixed performance regression in
core.window.ewm.ExponentialMovingWindow.mean() (GH42333)
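The is_list_like() entry above (GH43373) can be sketched as follows. The class
name NotIterable is hypothetical; is_list_like is imported from
pandas.api.types.

```python
from pandas.api.types import is_list_like


class NotIterable:
    """Hypothetical class that opts out of iteration."""

    # Setting __iter__ to None tells Python the object is not iterable.
    __iter__ = None


obj_result = is_list_like(NotIterable())
list_result = is_list_like([1, 2, 3])
str_result = is_list_like("abc")
print(obj_result, list_result, str_result)
```

With the regression fixed, only the real list should be reported as list-like.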
-------------------------------------------------------------------------------
Performance improvements
* Performance improvement for DataFrame.__setitem__() when the key or value
is not a DataFrame, or key is not list-like (GH43274)
-------------------------------------------------------------------------------
Bug fixes
* Fixed bug in DataFrameGroupBy.agg() and DataFrameGroupBy.transform() with
engine="numba" where index data was not being correctly passed into func
(GH43133)
What's new in 1.3.2 (August 15, 2021)
These are the changes in pandas 1.3.2. See Release notes for a full changelog
including other versions of pandas.
-------------------------------------------------------------------------------
Fixed regressions
* Performance regression in DataFrame.isin() and Series.isin() for nullable
data types (GH42714)
* Regression in updating values of Series using boolean index, created by
using DataFrame.pop() (GH42530)
* Regression in DataFrame.from_records() with empty records (GH42456)
* Fixed regression in DataFrame.shift() where TypeError occurred when
shifting DataFrame created by concatenation of slices and fills with values
(GH42719)
* Regression in DataFrame.agg() when the func argument returned lists and
axis=1 (GH42727)
* Regression in DataFrame.drop() does nothing if MultiIndex has duplicates
and indexer is a tuple or list of tuples (GH42771)
* Fixed regression where read_csv() raised a ValueError when parameters names
and prefix were both set to None (GH42387)
* Fixed regression in comparisons between Timestamp object and datetime64
objects outside the implementation bounds for nanosecond datetime64 (
GH42794)
* Fixed regression in Styler.highlight_min() and Styler.highlight_max() where
pandas.NA was not successfully ignored (GH42650)
* Fixed regression in concat() where copy=False was not honored in axis=1
Series concatenation (GH42501)
* Regression in Series.nlargest() and Series.nsmallest() with nullable
integer or float dtype (GH42816)
* Fixed regression in Series.quantile() with Int64Dtype (GH42626)
* Fixed regression in Series.groupby() and DataFrame.groupby() where
supplying the by argument with a Series named with a tuple would
incorrectly raise (GH42731)
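Two of the nullable-dtype regressions above (GH42626, GH42816) can be
exercised with a short sketch like the one below. The values are arbitrary and
chosen only so that the result is easy to check by hand.

```python
import pandas as pd

# Nullable integer Series with one missing value.
s = pd.Series([3, 1, pd.NA, 2], dtype="Int64")

# The quantile is computed over the non-missing values 1, 2 and 3.
median = s.quantile(0.5)

# nlargest() returns the two largest non-missing values.
top = s.nlargest(2)
print(median)
print(top)
```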
-------------------------------------------------------------------------------
Bug fixes
* Bug in read_excel() modifies the dtypes dictionary when reading a file with
duplicate columns (GH42462)
* 1D slices over extension types turn into N-dimensional slices over
ExtensionArrays (GH42430)
* Fixed bug in Series.rolling() and DataFrame.rolling() not calculating
window bounds correctly for the first row when center=True and window is an
offset that covers all the rows (GH42753)
* Styler.hide_columns() now hides the index name header row as well as column
headers (GH42101)
* Styler.set_sticky() has amended CSS to control the column/index names and
ensure the correct sticky positions (GH42537)
* Bug in de-serializing datetime indexes in PYTHONOPTIMIZED mode (GH42866)
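For the rolling() fix above (GH42753), a small sketch with an offset window
wide enough to cover every row may help. The data and the "10D" window are
assumptions chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {"A": [0, 1, 2, 3, 4]},
    index=pd.date_range("2020-01-01", periods=5, freq="D"),
)

# A centered 10-day window spans all five daily rows, including for the
# first row, so every window should average the full column.
result = df.rolling("10D", center=True).mean()
print(result)
```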
What's new in 1.3.1 (July 25, 2021)
These are the changes in pandas 1.3.1. See Release notes for a full changelog
including other versions of pandas.
-------------------------------------------------------------------------------
Fixed regressions
* Pandas could not be built on PyPy (GH42355)
* DataFrame constructed with an older version of pandas could not be
unpickled (GH42345)
* Performance regression in constructing a DataFrame from a dictionary of
dictionaries (GH42248)
* Fixed regression in DataFrame.agg() dropping values when the DataFrame had
an Extension Array dtype, a duplicate index, and axis=1 (GH42380)
* Fixed regression in DataFrame.astype() changing the order of noncontiguous
data (GH42396)
* Performance regression in DataFrame in reduction operations requiring
casting such as DataFrame.mean() on integer data (GH38592)
* Performance regression in DataFrame.to_dict() and Series.to_dict() when
orient argument one of 'records', 'dict', or 'split' (GH42352)
* Fixed regression in indexing with a list subclass incorrectly raising
TypeError (GH42433, GH42461)
* Fixed regression in DataFrame.isin() and Series.isin() raising TypeError
with nullable data containing at least one missing value (GH42405)
* Regression in concat() between objects with bool dtype and integer dtype
casting to object instead of to integer (GH42092)
* Bug in Series constructor not accepting a dask.Array (GH38645)
* Fixed regression for SettingWithCopyWarning displaying incorrect stacklevel
(GH42570)
* Fixed regression for merge_asof() raising KeyError when one of the by
columns is in the index (GH34488)
* Fixed regression in to_datetime() returning pd.NaT for inputs that produce
duplicated values, when cache=True (GH42259)
* Fixed regression in SeriesGroupBy.value_counts() that resulted in an
IndexError when called on a Series with one row (GH42618)
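The list-subclass indexing regression above (GH42433, GH42461) can be sketched
like this. ColumnList is a hypothetical subclass defined only for the example.

```python
import pandas as pd


class ColumnList(list):
    """Hypothetical list subclass used only for this illustration."""


df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})

# Selecting columns with a list subclass should behave like a plain list.
subset = df[ColumnList(["a", "c"])]

# The same applies to label-based row selection with loc.
rows = df.loc[ColumnList([0, 1]), "b"]
print(subset)
print(rows)
```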
-------------------------------------------------------------------------------
Bug fixes
* Fixed bug in DataFrame.transpose() dropping values when the DataFrame had
an Extension Array dtype and a duplicate index (GH42380)
* Fixed bug in DataFrame.to_xml() raising KeyError when called with index=
False and an offset index (GH42458)
* Fixed bug in Styler.set_sticky() not handling index names correctly for
single index columns case (GH42537)
* Fixed bug in DataFrame.copy() failing to consolidate blocks in the result (
GH42579)
What's new in 1.3.0 (July 2, 2021)
These are the changes in pandas 1.3.0. See Release notes for a full changelog
including other versions of pandas.
Warning
When reading new Excel 2007+ (.xlsx) files, the default argument engine=None to
read_excel() will now result in using the openpyxl engine in all cases when the
option io.excel.xlsx.reader is set to "auto". Previously, some cases would use
the xlrd engine instead. See What's new 1.2.0 for background on this change.
-------------------------------------------------------------------------------
Enhancements
-------------------------------------------------------------------------------
Custom HTTP(s) headers when reading csv or json files
When reading from a remote URL that is not handled by fsspec (e.g. HTTP and
HTTPS) the dictionary passed to storage_options will be used to create the
headers included in the request. This can be used to control the User-Agent
header or send other custom headers (GH36688). For example:
In [1]: headers = {"User-Agent": "pandas"}
In [2]: df = pd.read_csv(
...: "https://download.bls.gov/pub/time.series/cu/cu.item",
...: sep="\t",
...: storage_options=headers
...: )
...:
-------------------------------------------------------------------------------
Read and write XML documents
We added I/O support to read and render shallow versions of XML documents with
read_xml() and DataFrame.to_xml(). Using lxml as parser, both XPath 1.0 and
XSLT 1.0 are available. (GH27554)
In [1]: xml = """<?xml version='1.0' encoding='utf-8'?>
...: <data>
...: <row>
...: <shape>square</shape>
...: <degrees>360</degrees>
...: <sides>4.0</sides>
...: </row>
...: <row>
...: <shape>circle</shape>
...: <degrees>360</degrees>
...: <sides/>
...: </row>
...: <row>
...: <shape>triangle</shape>
...: <degrees>180</degrees>
...: <sides>3.0</sides>
...: </row>
...: </data>"""
In [2]: df = pd.read_xml(xml)
In [3]: df
Out[3]:
shape degrees sides
0 square 360 4.0
1 circle 360 NaN
2 triangle 180 3.0
In [4]: df.to_xml()
Out[4]:
<?xml version='1.0' encoding='utf-8'?>
<data>
<row>
<index>0</index>
<shape>square</shape>
<degrees>360</degrees>
<sides>4.0</sides>
</row>
<row>
<index>1</index>
<shape>circle</shape>
<degrees>360</degrees>
<sides/>
</row>
<row>
<index>2</index>
<shape>triangle</shape>
<degrees>180</degrees>
<sides>3.0</sides>
</row>
</data>
For more, see Writing XML in the user guide on IO tools.
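The example above relies on lxml. As a hedged alternative, the sketch below
passes parser="etree" so that only the Python standard library parser is
needed, and wraps the document in StringIO, which read_xml() accepts as a
file-like object. The two-row document is a shortened stand-in for the one
shown above.

```python
from io import StringIO

import pandas as pd

xml = """<data>
  <row><shape>square</shape><degrees>360</degrees><sides>4.0</sides></row>
  <row><shape>circle</shape><degrees>360</degrees><sides/></row>
</data>"""

# Parse with the standard library backend instead of lxml.
df = pd.read_xml(StringIO(xml), parser="etree")

# Write back out without the index column, again using the etree backend.
out = df.to_xml(index=False, parser="etree")
print(df)
print(out)
```

Note that XSLT support and full XPath 1.0 expressions depend on lxml, so this
variant only suits simple, shallow documents.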
-------------------------------------------------------------------------------
Styler enhancements
We provided some focused development on Styler. See also the Styler
documentation which has been revised and improved (GH39720, GH39317, GH40493).
+ The method Styler.set_table_styles() can now accept more natural CSS
language for arguments, such as 'color:red;' instead of [('color',
'red')] (GH39563)
+ The methods Styler.highlight_null(), Styler.highlight_min(), and
Styler.highlight_max() now allow custom CSS highlighting instead of the
default background coloring (GH40242)
+ Styler.apply() now accepts functions that return an ndarray when axis=
None, making it now consistent with the axis=0 and axis=1 behavior (
GH39359)
+ When incorrectly formatted CSS is given via Styler.apply() or
Styler.applymap(), an error is now raised upon rendering (GH39660)
+ Styler.format() now accepts the keyword argument escape for optional
HTML and LaTeX escaping (GH40388, GH41619)
+ Styler.background_gradient() has gained the argument gmap to supply a
specific gradient map for shading (GH22727)
+ Styler.clear() now clears Styler.hidden_index and Styler.hidden_columns
as well (GH40484)
+ Added the method Styler.highlight_between() (GH39821)
+ Added the method Styler.highlight_quantile() (GH40926)
+ Added the method Styler.text_gradient() (GH41098)
+ Added the method Styler.set_tooltips() to allow hover tooltips; this
can be used to enhance interactive displays (GH21266, GH40284)
+ Added the parameter precision to the method Styler.format() to control
the display of floating point numbers (GH40134)
+ Styler rendered HTML output now follows the w3 HTML Style Guide (
GH39626)
+ Many features of the Styler class are now either partially or fully
usable on a DataFrame with non-unique indexes or columns (GH41143)
+ One has greater control of the display through separate sparsification
of the index or columns using the new styler options, which are also
usable via option_context() (GH41142)
+ Added the option styler.render.max_elements to avoid browser overload
when styling large DataFrames (GH40712)
+ Added the method Styler.to_latex() (GH21673, GH42320), which also
allows some limited CSS conversion (GH40731)
+ Added the method Styler.to_html() (GH13379)
+ Added the method Styler.set_sticky() to make index and column headers
permanently visible in scrolling HTML frames (GH29072)
-------------------------------------------------------------------------------
DataFrame constructor honors copy=False with dict
When passing a dictionary to DataFrame with copy=False, a copy will no longer
be made (GH32960).
In [3]: arr = np.array([1, 2, 3])
In [4]: df = pd.DataFrame({"A": arr, "B": arr.copy()}, copy=False)
In [5]: df
Out[5]:
A B
0 1 1
1 2 2
2 3 3
df["A"] remains a view on arr:
In [6]: arr[0] = 0
In [7]: assert df.iloc[0, 0] == 0
The default behavior when not passing copy will remain unchanged, i.e. a copy
will be made.
-------------------------------------------------------------------------------
PyArrow backed string data type
We've enhanced the StringDtype, an extension type dedicated to string data. (
GH39908)
It is now possible to specify a storage keyword option to StringDtype. Use
pandas options or specify the dtype using dtype='string[pyarrow]' to allow the
StringArray to be backed by a PyArrow array instead of a NumPy array of Python
objects.
The PyArrow backed StringArray requires pyarrow 1.0.0 or greater to be
installed.
Warning
string[pyarrow] is currently considered experimental. The implementation and
parts of the API may change without warning.
In [8]: pd.Series(['abc', None, 'def'], dtype=pd.StringDtype(storage="pyarrow"))
Out[8]:
0 abc
1 <NA>
2 def
dtype: string
You can use the alias "string[pyarrow]" as well.
In [9]: s = pd.Series(['abc', None, 'def'], dtype="string[pyarrow]")
In [10]: s
Out[10]:
0 abc
1 <NA>
2 def
dtype: string
You can also create a PyArrow backed string array using pandas options.
In [11]: with pd.option_context("string_storage", "pyarrow"):
....: s = pd.Series(['abc', None, 'def'], dtype="string")
....:
In [12]: s
Out[12]:
0 abc
1 <NA>
2 def
dtype: string
The usual string accessor methods work. Where appropriate, the return type of
the Series or columns of a DataFrame will also have string dtype.
In [13]: s.str.upper()
Out[13]:
0 ABC
1 <NA>
2 DEF
dtype: string
In [14]: s.str.split('b', expand=True).dtypes
Out[14]:
0 string
1 string
dtype: object
String accessor methods returning integers will return a value with Int64Dtype
In [15]: s.str.count("a")
Out[15]:
0 1
1 <NA>
2 0
dtype: Int64
-------------------------------------------------------------------------------
Centered datetime-like rolling windows
When performing rolling calculations on DataFrame and Series objects with a
datetime-like index, a centered datetime-like window can now be used (GH38780).
For example:
In [16]: df = pd.DataFrame(
....: {"A": [0, 1, 2, 3, 4]}, index=pd.date_range("2020", periods=5, freq="1D")
....: )
....:
In [17]: df
Out[17]:
A
2020-01-01 0
2020-01-02 1
2020-01-03 2
2020-01-04 3
2020-01-05 4
In [18]: df.rolling("2D", center=True).mean()
Out[18]:
A
2020-01-01 0.5
2020-01-02 1.5
2020-01-03 2.5
2020-01-04 3.5
2020-01-05 4.0
-------------------------------------------------------------------------------
Other enhancements
* DataFrame.rolling(), Series.rolling(), DataFrame.expanding(), and
Series.expanding() now support a method argument with a 'table' option that
performs the windowing operation over an entire DataFrame. See Window
Overview for performance and functional benefits (GH15095, GH38995)
* ExponentialMovingWindow now supports an online method that can perform mean
calculations in an online fashion. See Window Overview (GH41673)
* Added MultiIndex.dtypes() (GH37062)
* Added end and end_day options for the origin argument in
DataFrame.resample() (GH37804)
* Improved error message when usecols and names do not match for read_csv()
and engine="c" (GH29042)
* Improved consistency of error messages when passing an invalid win_type
argument in Window methods (GH15969)
* read_sql_query() now accepts a dtype argument to cast the columnar data
from the SQL database based on user input (GH10285)
* read_csv() now raises a ParserWarning if the length of header or given names
does not match the length of data when usecols is not specified (GH21768)
* Improved integer type mapping from pandas to SQLAlchemy when using
DataFrame.to_sql() (GH35076)
* to_numeric() now supports downcasting of nullable ExtensionDtype objects (
GH33013)
* Added support for dict-like names in MultiIndex.set_names and
MultiIndex.rename (GH20421)
* read_excel() can now auto-detect .xlsb files and older .xls files (GH35416,
GH41225)
* ExcelWriter now accepts an if_sheet_exists parameter to control the
behavior of append mode when writing to existing sheets (GH40230)
* Rolling.sum(), Expanding.sum(), Rolling.mean(), Expanding.mean(),
ExponentialMovingWindow.mean(), Rolling.median(), Expanding.median(),
Rolling.max(), Expanding.max(), Rolling.min(), and Expanding.min() now
support Numba execution with the engine keyword (GH38895, GH41267)
* DataFrame.apply() can now accept NumPy unary operators as strings, e.g.
df.apply("sqrt"), which was already the case for Series.apply() (GH39116)
* DataFrame.apply() can now accept non-callable DataFrame properties as
strings, e.g. df.apply("size"), which was already the case for
Series.apply() (GH39116)
* DataFrame.applymap() can now accept kwargs to pass on to the user-provided
func (GH39987)
* Passing a DataFrame indexer to iloc is now disallowed for
Series.__getitem__() and DataFrame.__getitem__() (GH39004)
* Series.apply() can now accept list-like or dictionary-like arguments that
aren't lists or dictionaries, e.g. ser.apply(np.array(["sum", "mean"])),
which was already the case for DataFrame.apply() (GH39140)
* DataFrame.plot.scatter() can now accept a categorical column for the
argument c (GH12380, GH31357)
* Series.loc() now raises a helpful error message when the Series has a
MultiIndex and the indexer has too many dimensions (GH35349)
* read_stata() now supports reading data from compressed files (GH26599)
* Added support for parsing ISO 8601-like timestamps with negative signs to
Timedelta (GH37172)
* Added support for unary operators in FloatingArray (GH38749)
* RangeIndex can now be constructed by passing a range object directly e.g.
pd.RangeIndex(range(3)) (GH12067)
* Series.round() and DataFrame.round() now work with nullable integer and
floating dtypes (GH38844)
* read_csv() and read_json() expose the argument encoding_errors to control
how encoding errors are handled (GH39450)
* GroupBy.any() and GroupBy.all() use Kleene logic with nullable data types (
GH37506)
* GroupBy.any() and GroupBy.all() return a BooleanDtype for columns with
nullable data types (GH33449)
* GroupBy.any() and GroupBy.all() no longer raise with object data containing
pd.NA even when skipna=True (GH37501)
* GroupBy.rank() now supports object-dtype data (GH38278)
* Constructing a DataFrame or Series with the data argument being a Python
iterable that is not a NumPy ndarray consisting of NumPy scalars will now
result in a dtype with a precision the maximum of the NumPy scalars; this
was already the case when data is a NumPy ndarray (GH40908)
* Add keyword sort to pivot_table() to allow non-sorting of the result (
GH39143)
* Add keyword dropna to DataFrame.value_counts() to allow counting rows that
include NA values (GH41325)
* Series.replace() will now cast results to PeriodDtype where possible
instead of object dtype (GH41526)
* Improved error message in corr and cov methods on Rolling, Expanding, and
ExponentialMovingWindow when other is not a DataFrame or Series (GH41741)
* Series.between() can now accept left or right as arguments to inclusive to
include only the left or right boundary (GH40245)
* DataFrame.explode() now supports exploding multiple columns. Its column
argument now also accepts a list of str or tuples for exploding on multiple
columns at the same time (GH39240)
* DataFrame.sample() now accepts the ignore_index argument to reset the index
after sampling, similar to DataFrame.drop_duplicates() and
DataFrame.sort_values() (GH38581)
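A few of the enhancements in the list above can be tried together in one short
sketch. The data is invented for illustration; the calls shown are
Series.between() with inclusive="left", DataFrame.explode() on a list of
columns, DataFrame.sample() with ignore_index=True, and RangeIndex built from
a range object.

```python
import pandas as pd

# Series.between(): include only the left boundary (GH40245).
s = pd.Series([1, 2, 3, 4])
left_only = s.between(2, 4, inclusive="left")

# DataFrame.explode(): explode two list-valued columns together (GH39240).
df = pd.DataFrame({"A": [[1, 2], [3]], "B": [["x", "y"], ["z"]], "C": [10, 20]})
exploded = df.explode(["A", "B"], ignore_index=True)

# DataFrame.sample(): reset the index of the sampled rows (GH38581).
sampled = df.sample(n=2, random_state=0, ignore_index=True)

# RangeIndex: construct directly from a range object (GH12067).
idx = pd.RangeIndex(range(3))

print(left_only.tolist())
print(exploded)
print(sampled.index.tolist())
print(idx)
```

When exploding several columns at once, the lists in each row are expected to
have matching lengths, as they do here.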
-------------------------------------------------------------------------------
Notable bug fixes
These are bug fixes that might have notable behavior changes.
-------------------------------------------------------------------------------
Categorical.unique now always maintains same dtype as original
Previously, when calling Categorical.unique() with categorical data, unused
categories in the new array would be removed, making the dtype of the new array
different than the original (GH18291)
As an example of this, given:
In [19]: dtype = pd.CategoricalDtype(['bad', 'neutral', 'good'], ordered=True)
In [20]: cat = pd.Categorical(['good', 'good', 'bad', 'bad'], dtype=dtype)
In [21]: original = pd.Series(cat)
In [22]: unique = original.unique()
Previous behavior:
In [1]: unique
['good', 'bad']
Categories (2, object): ['bad' < 'good']
In [2]: original.dtype == unique.dtype
False
New behavior:
In [23]: unique
Out[23]:
['good', 'bad']
Categories (3, object): ['bad' < 'neutral' < 'good']
In [24]: original.dtype == unique.dtype
Out[24]: True
-------------------------------------------------------------------------------
Preserve dtypes in DataFrame.combine_first()
DataFrame.combine_first() will now preserve dtypes (GH7509)
In [25]: df1 = pd.DataFrame({"A": [1, 2, 3], "B": [1, 2, 3]}, index=[0, 1, 2])
In [26]: df1
Out[26]:
A B
0 1 1
1 2 2
2 3 3
In [27]: df2 = pd.DataFrame({"B": [4, 5, 6], "C": [1, 2, 3]}, index=[2, 3, 4])
In [28]: df2
Out[28]:
B C
2 4 1
3 5 2
4 6 3
In [29]: combined = df1.combine_first(df2)
Previous behavior:
In [1]: combined.dtypes
Out[1]:
A float64
B float64
C float64
dtype: object
New behavior:
In [30]: combined.dtypes
Out[30]:
A float64
B int64
C float64
dtype: object
-------------------------------------------------------------------------------
Groupby methods agg and transform no longer change return dtype for callables
Previously the methods DataFrameGroupBy.aggregate(), SeriesGroupBy.aggregate(),
DataFrameGroupBy.transform(), and SeriesGroupBy.transform() might cast the
result dtype when the argument func is callable, possibly leading to
undesirable results (GH21240). The cast would occur if the result is numeric
and casting back to the input dtype does not change any values as measured by
np.allclose. Now no such casting occurs.
In [31]: df = pd.DataFrame({'key': [1, 1], 'a': [True, False], 'b': [True, True]})
In [32]: df
Out[32]:
key a b
0 1 True True
1 1 False True
Previous behavior:
In [5]: df.groupby('key').agg(lambda x: x.sum())
Out[5]:
a b
key
1 True 2
New behavior:
In [33]: df.groupby('key').agg(lambda x: x.sum())
Out[33]:
a b
key
1 1 2
-------------------------------------------------------------------------------
float result for GroupBy.mean(), GroupBy.median(), and GroupBy.var()
Previously, these methods could result in different dtypes depending on the
input values. Now, these methods will always return a float dtype. (GH41137)
In [34]: df = pd.DataFrame({'a': [True], 'b': [1], 'c': [1.0]})
Previous behavior:
In [5]: df.groupby(df.index).mean()
Out[5]:
a b c
0 True 1 1.0
New behavior:
In [35]: df.groupby(df.index).mean()
Out[35]:
a b c
0 1.0 1.0 1.0
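The example above covers GroupBy.mean(). A similar sketch for GroupBy.median()
and GroupBy.var() on integer input is given below, using made-up data. Both
results are expected to come back as float64 even when the values happen to be
whole numbers.

```python
import pandas as pd

df = pd.DataFrame({"key": ["x", "x", "y", "y"], "val": [1, 3, 5, 5]})
grouped = df.groupby("key")["val"]

# Integer input, float output for both reductions.
medians = grouped.median()
variances = grouped.var()
print(medians.dtype, variances.dtype)
```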
-------------------------------------------------------------------------------
Try operating inplace when setting values with loc and iloc
When setting an entire column using loc or iloc, pandas will try to insert the
values into the existing data rather than create an entirely new array.
In [36]: df = pd.DataFrame(range(3), columns=["A"], dtype="float64")
In [37]: values = df.values
In [38]: new = np.array([5, 6, 7], dtype="int64")
In [39]: df.loc[[0, 1, 2], "A"] = new
In both the new and old behavior, the data in values is overwritten, but in the
old behavior the dtype of df["A"] changed to int64.
Previous behavior:
In [1]: df.dtypes
Out[1]:
A int64
dtype: object
In [2]: np.shares_memory(df["A"].values, new)
Out[2]: False
In [3]: np.shares_memory(df["A"].values, values)
Out[3]: False
In pandas 1.3.0, df continues to share data with values
New behavior:
In [40]: df.dtypes
Out[40]:
A float64
dtype: object
In [41]: np.shares_memory(df["A"], new)
Out[41]: False
In [42]: np.shares_memory(df["A"], values)
Out[42]: True
-------------------------------------------------------------------------------
Never operate inplace when setting frame[keys] = values
When setting multiple columns using frame[keys] = values new arrays will
replace pre-existing arrays for these keys, which will not be over-written (
GH39510). As a result, the columns will retain the dtype(s) of values, never
casting to the dtypes of the existing arrays.
In [43]: df = pd.DataFrame(range(3), columns=["A"], dtype="float64")
In [44]: df[["A"]] = 5
In the old behavior, 5 was cast to float64 and inserted into the existing array
backing df:
Previous behavior:
In [1]: df.dtypes
Out[1]:
A float64
In the new behavior, we get a new array, and retain an integer-dtyped 5:
New behavior:
In [45]: df.dtypes
Out[45]:
A int64
dtype: object
-------------------------------------------------------------------------------
Consistent casting with setting into Boolean Series
Setting non-boolean values into a Series with dtype=bool now consistently casts
to dtype=object (GH38709)
In [46]: orig = pd.Series([True, False])
In [47]: ser = orig.copy()
In [48]: ser.iloc[1] = np.nan
In [49]: ser2 = orig.copy()
In [50]: ser2.iloc[1] = 2.0
Previous behavior:
In [1]: ser
Out[1]:
0 1.0
1 NaN
dtype: float64
In [2]: ser2
Out[2]:
0 True
1 2.0
dtype: object
New behavior:
In [51]: ser
Out[51]:
0 True
1 NaN
dtype: object
In [52]: ser2
Out[52]:
0 True
1 2.0
dtype: object
-------------------------------------------------------------------------------
GroupBy.rolling no longer returns grouped-by column in values
The group-by column will now be dropped from the result of a groupby.rolling
operation (GH32262)
In [53]: df = pd.DataFrame({"A": [1, 1, 2, 3], "B": [0, 1, 2, 3]})
In [54]: df
Out[54]:
A B
0 1 0
1 1 1
2 2 2
3 3 3
Previous behavior:
In [1]: df.groupby("A").rolling(2).sum()
Out[1]:
A B
A
1 0 NaN NaN
1 2.0 1.0
2 2 NaN NaN
3 3 NaN NaN
New behavior:
In [55]: df.groupby("A").rolling(2).sum()
Out[55]:
B
A
1 0 NaN
1 1.0
2 2 NaN
3 3 NaN
-------------------------------------------------------------------------------
Removed artificial truncation in rolling variance and standard deviation
Rolling.std() and Rolling.var() will no longer artificially truncate results
that are less than ~1e-8 and ~1e-15 respectively to zero (GH37051, GH40448,
GH39872).
However, floating point artifacts may now exist in the results when rolling
over larger values.
In [56]: s = pd.Series([7, 5, 5, 5])
In [57]: s.rolling(3).var()
Out[57]:
0 NaN
1 NaN
2 1.333333e+00
3 4.440892e-16
dtype: float64
-------------------------------------------------------------------------------
GroupBy.rolling with MultiIndex no longer drops levels in the result
GroupBy.rolling() will no longer drop levels of a DataFrame with a MultiIndex
in the result. This can lead to a perceived duplication of levels in the
resulting MultiIndex, but this change restores the behavior that was present in
version 1.1.3 (GH38787, GH38523).
In [58]: index = pd.MultiIndex.from_tuples([('idx1', 'idx2')], names=['label1', 'label2'])
In [59]: df = pd.DataFrame({'a': [1], 'b': [2]}, index=index)
In [60]: df
Out[60]:
a b
label1 label2
idx1 idx2 1 2
Previous behavior:
In [1]: df.groupby('label1').rolling(1).sum()
Out[1]:
a b
label1
idx1 1.0 2.0
New behavior:
In [61]: df.groupby('label1').rolling(1).sum()
Out[61]:
a b
label1 label1 label2
idx1 idx1 idx2 1.0 2.0
-------------------------------------------------------------------------------
Backwards incompatible API changes
-------------------------------------------------------------------------------
Increased minimum versions for dependencies
Some minimum supported versions of dependencies were updated. If installed, we
now require:
Package          Minimum Version  Required  Changed
numpy            1.17.3           X         X
pytz             2017.3           X
python-dateutil  2.7.3            X
bottleneck       1.2.1
numexpr          2.7.0                      X
pytest (dev)     6.0                        X
mypy (dev)       0.812                      X
setuptools       38.6.0                     X
For optional libraries the general recommendation is to use the latest version.
The following table lists the lowest version per library that is currently
being tested throughout the development of pandas. Optional libraries below the
lowest tested version may still work, but are not considered supported.
Package          Minimum Version  Changed
beautifulsoup4   4.6.0
fastparquet      0.4.0            X
fsspec           0.7.4
gcsfs            0.6.0
lxml             4.3.0
matplotlib       2.2.3
numba            0.46.0
openpyxl         3.0.0            X
pyarrow          0.17.0           X
pymysql          0.8.1            X
pytables         3.5.1
s3fs             0.4.0
scipy            1.2.0
sqlalchemy       1.3.0            X
tabulate         0.8.7            X
xarray           0.12.0
xlrd             1.2.0
xlsxwriter       1.0.2
xlwt             1.3.0
pandas-gbq       0.12.0
See Dependencies and Optional dependencies for more.
-------------------------------------------------------------------------------
Other API changes
* Partially initialized CategoricalDtype objects (i.e. those with categories=
None) will no longer compare as equal to fully initialized dtype objects (
GH38516)
* Accessing _constructor_expanddim on a DataFrame and _constructor_sliced on
a Series now raise an AttributeError. Previously a NotImplementedError was
raised (GH38782)
* Added new engine and **engine_kwargs parameters to DataFrame.to_sql() to
support other future 'SQL engines'. Currently we still only use
SQLAlchemy under the hood, but more engines are planned to be supported
such as turbodbc (GH36893)
* Removed redundant freq from PeriodIndex string representation (GH41653)
* ExtensionDtype.construct_array_type() is now a required method instead of
an optional one for ExtensionDtype subclasses (GH24860)
* Calling hash on non-hashable pandas objects will now raise TypeError with
the built-in error message (e.g. unhashable type: 'Series'). Previously it
would raise a custom message such as 'Series' objects are mutable, thus
they cannot be hashed. Furthermore, isinstance(<Series>,
collections.abc.Hashable) will now return False (GH40013)
* Styler.from_custom_template() now has two new arguments for template names,
and the old name was removed, because template inheritance was introduced
for better parsing (GH42053). Subclassing modifications to Styler attributes
are also needed.
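The hashing change in the list above can be observed directly. This is a small
sketch, assuming pandas 1.3 or later; the variable names are illustrative.

```python
from collections.abc import Hashable

import pandas as pd

s = pd.Series([1, 2, 3])

# hash() on a Series raises the built-in TypeError.
try:
    hash(s)
    raised = False
    message = ""
except TypeError as err:
    raised = True
    message = str(err)

# A Series is no longer reported as hashable.
is_hashable = isinstance(s, Hashable)
```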
-------------------------------------------------------------------------------
Build
* Documentation in .pptx and .pdf formats is no longer included in wheels or
source distributions. (GH30741)
-------------------------------------------------------------------------------
Deprecations
-------------------------------------------------------------------------------
Deprecated dropping nuisance columns in DataFrame reductions and
DataFrameGroupBy operations
When calling a reduction (e.g. .min, .max, .sum) on a DataFrame with
numeric_only=None (the default), columns where the reduction raises a TypeError
are silently ignored and dropped from the result.
This behavior is deprecated. In a future version, the TypeError will be raised,
and users will need to select only valid columns before calling the function.
For example:
In [62]: df = pd.DataFrame({"A": [1, 2, 3, 4], "B": pd.date_range("2016-01-01", periods=4)})
In [63]: df
Out[63]:
   A          B
0  1 2016-01-01
1  2 2016-01-02
2  3 2016-01-03
3  4 2016-01-04
Old behavior:
In [3]: df.prod()
Out[3]:
A 24
dtype: int64
Future behavior:
In [4]: df.prod()
...
TypeError: 'DatetimeArray' does not implement reduction 'prod'
In [5]: df[["A"]].prod()
Out[5]:
A 24
dtype: int64
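When the valid columns are not known in advance, one option is to select them
by dtype before reducing. This is a sketch under the assumption that only
numeric columns are wanted; select_dtypes() is an existing DataFrame method,
and the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {"A": [1, 2, 3, 4], "B": pd.date_range("2016-01-01", periods=4)}
)

# Keep only numeric columns, then reduce; the datetime column "B" is excluded
# explicitly instead of being dropped silently.
numeric = df.select_dtypes(include="number")
product = numeric.prod()
```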
Similarly, when applying a function to DataFrameGroupBy, columns on which the
function raises TypeError are currently silently ignored and dropped from the
result.
This behavior is deprecated. In a future version, the TypeError will be raised,
and users will need to select only valid columns before calling the function.
For example:
In [64]: df = pd.DataFrame({"A": [1, 2, 3, 4], "B": pd.date_range("2016-01-01", periods=4)})
In [65]: gb = df.groupby([1, 1, 2, 2])
Old behavior:
In [4]: gb.prod(numeric_only=False)
Out[4]:
    A
1   2
2  12
Future behavior:
In [5]: gb.prod(numeric_only=False)
...
TypeError: datetime64 type does not support prod operations
In [6]: gb[["A"]].prod(numeric_only=False)
Out[6]:
    A
1   2
2  12
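To reproduce the old "drop what fails" result explicitly, the columns that
support a reduction can be probed first. The helper reducible_columns below is
hypothetical (not a pandas API) and assumes unique column names; it is a sketch
rather than a recommended pattern.

```python
import pandas as pd


def reducible_columns(frame, reduction):
    """Return the column names on which the named reduction does not raise TypeError."""
    keep = []
    for name in frame.columns:
        try:
            getattr(frame[name], reduction)()
        except TypeError:
            continue  # this column does not support the reduction
        keep.append(name)
    return keep


df = pd.DataFrame(
    {"A": [1, 2, 3, 4], "B": pd.date_range("2016-01-01", periods=4)}
)
valid = reducible_columns(df, "prod")

# Select the valid columns on the groupby object before reducing.
result = df.groupby([1, 1, 2, 2])[valid].prod()
```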
-------------------------------------------------------------------------------
Other Deprecations
* Deprecated allowing scalars to be passed to the Categorical constructor
(GH38433)
* Deprecated constructing CategoricalIndex without passing list-like data
(GH38944)
* Deprecated allowing subclass-specific keyword arguments in the Index
constructor, use the specific subclass directly instead (GH14093, GH21311,
GH22315, GH26974)
* Deprecated the astype() method of datetimelike (timedelta64[ns],
datetime64[ns], DatetimeTZDtype, PeriodDtype) to convert to integer dtypes,
use values.view(...) instead (GH38544)
* Deprecated MultiIndex.is_lexsorted() and MultiIndex.lexsort_depth(), use
MultiIndex.is_monotonic_increasing() instead (GH32259)
* Deprecated keyword try_cast in Series.where(), Series.mask(),
DataFrame.where(), DataFrame.mask(); cast results manually if desired
(GH38836)
* Deprecated comparison of Timestamp objects with datetime.date objects.
Instead of e.g. ts <= mydate use ts <= pd.Timestamp(mydate) or
ts.date() <= mydate (GH36131)
* Deprecated Rolling.win_type returning "freq" (GH38963)
* Deprecated Rolling.is_datetimelike (GH38963)
* Deprecated DataFrame indexer for Series.__setitem__() and
DataFrame.__setitem__() (GH39004)
* Deprecated ExponentialMovingWindow.vol() (GH39220)
* Using .astype to convert between datetime64[ns] dtype and DatetimeTZDtype
is deprecated and will raise in a future version, use obj.tz_localize or
obj.dt.tz_localize instead (GH38622)
* Deprecated casting datetime.date objects to datetime64 when used as
fill_value in DataFrame.unstack(), DataFrame.shift(), Series.shift(), and
DataFrame.reindex(), pass pd.Timestamp(dateobj) instead (GH39767)
* Deprecated Styler.set_na_rep() and Styler.set_precision() in favor of
Styler.format() with na_rep and precision as existing and new input
arguments respectively (GH40134, GH40425)
* Deprecated Styler.where() in favor of using an alternative formulation with
Styler.applymap() (GH40821)
* Deprecated allowing partial failure in Series.transform() and
DataFrame.transform() when func is list-like or dict-like and raises
anything but TypeError; func raising anything but a TypeError will raise in
a future version (GH40211)
* Deprecated arguments error_bad_lines and warn_bad_lines in read_csv() and
read_table() in favor of argument on_bad_lines (GH15122)
* Deprecated support for np.ma.mrecords.MaskedRecords in the DataFrame
constructor, pass {name: data[name] for name in data.dtype.names} instead
(GH40363)
* Deprecated using merge(), DataFrame.merge(), and DataFrame.join() on a
different number of levels (GH34862)
* Deprecated the use of **kwargs in ExcelWriter; use the keyword argument
engine_kwargs instead (GH40430)
* Deprecated the level keyword for DataFrame and Series aggregations; use
groupby instead (GH39983)
* Deprecated the inplace parameter of Categorical.remove_categories(),
Categorical.add_categories(), Categorical.reorder_categories(),
Categorical.rename_categories(), and Categorical.set_categories(); it will be
removed in a future version (GH37643)
* Deprecated merge() producing duplicated columns through the suffixes
keyword and already existing columns (GH22818)
* Deprecated setting Categorical._codes, create a new Categorical with the
desired codes instead (GH40606)
* Deprecated the convert_float optional argument in read_excel() and
ExcelFile.parse() (GH41127)
* Deprecated behavior of DatetimeIndex.union() with mixed timezones; in a
future version both will be cast to UTC instead of object dtype (GH39328)
* Deprecated using usecols with out of bounds indices for read_csv() with
engine="c" (GH25623)
* Deprecated special treatment of lists with first element a Categorical in
the DataFrame constructor; pass as pd.DataFrame({col: categorical, ...})
instead (GH38845)
* Deprecated behavior of DataFrame constructor when a dtype is passed and the
data cannot be cast to that dtype. In a future version, this will raise
instead of being silently ignored (GH24435)
* Deprecated the Timestamp.freq attribute. For the properties that use it
(is_month_start, is_month_end, is_quarter_start, is_quarter_end,
is_year_start, is_year_end), when you have a freq, use e.g.
freq.is_month_start(ts) (GH15146)
* Deprecated construction of Series or DataFrame with DatetimeTZDtype data
and datetime64[ns] dtype. Use Series(data).dt.tz_localize(None) instead
(GH41555, GH33401)
* Deprecated behavior of Series construction with large-integer values and
small-integer dtype silently overflowing; use Series(data).astype(dtype)
instead (GH41734)
* Deprecated behavior of DataFrame construction with floating data and
integer dtype casting even when lossy; in a future version this will remain
floating, matching Series behavior (GH41770)
* Deprecated inference of timedelta64[ns], datetime64[ns], or DatetimeTZDtype
dtypes in Series construction when data containing strings is passed and no
dtype is passed (GH33558)
* In a future version, constructing Series or DataFrame with datetime64[ns]
data and DatetimeTZDtype will treat the data as wall-times instead of as
UTC times (matching DatetimeIndex behavior). To treat the data as UTC
times, use pd.Series(data).dt.tz_localize("UTC").dt.tz_convert(dtype.tz) or
pd.Series(data.view("int64"), dtype=dtype) (GH33401)
* Deprecated passing lists as key to DataFrame.xs() and Series.xs() (GH41760)
* Deprecated boolean arguments of inclusive in Series.between(); the standard
argument values are now {"left", "right", "neither", "both"} (GH40628)
* Deprecated passing arguments as positional for all of the following, with
exceptions noted (GH41485):
+ concat() (other than objs)
+ read_csv() (other than filepath_or_buffer)
+ read_table() (other than filepath_or_buffer)
+ DataFrame.clip() and Series.clip() (other than upper and lower)
+ DataFrame.drop_duplicates() (except for subset),
Series.drop_duplicates(), Index.drop_duplicates() and
MultiIndex.drop_duplicates()
+ DataFrame.drop() (other than labels) and Series.drop()
+ DataFrame.dropna() and Series.dropna()
+ DataFrame.ffill(), Series.ffill(), DataFrame.bfill(), and
Series.bfill()
+ DataFrame.fillna() and Series.fillna() (apart from value)
+ DataFrame.interpolate() and Series.interpolate() (other than method)
+ DataFrame.mask() and Series.mask() (other than cond and other)
+ DataFrame.reset_index() (other than level) and Series.reset_index()
+ DataFrame.set_axis() and Series.set_axis() (other than labels)
+ DataFrame.set_index() (other than keys)
+ DataFrame.sort_index() and Series.sort_index()
+ DataFrame.sort_values() (other than by) and Series.sort_values()
+ DataFrame.where() and Series.where() (other than cond and other)
+ Index.set_names() and MultiIndex.set_names() (except for names)
+ MultiIndex.codes() (except for codes)
+ MultiIndex.set_levels() (except for levels)
+ Resampler.interpolate() (other than method)
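A few of the replacements named in the list above can be sketched as follows.
The data and variable names are made up for illustration; the calls themselves
(pd.Timestamp, Series.between with a string inclusive, keyword arguments, and
read_csv with on_bad_lines) are the ones the deprecation notes point to.

```python
import datetime as dt
import io

import pandas as pd

# Timestamp vs. datetime.date: convert one side explicitly before comparing.
ts = pd.Timestamp("2021-07-02 12:00")
mydate = dt.date(2021, 7, 2)
on_or_after = ts >= pd.Timestamp(mydate)
same_day_or_earlier = ts.date() <= mydate

# Series.between: pass a string for inclusive instead of a boolean.
s = pd.Series([1, 2, 3, 4])
left_closed = s.between(2, 4, inclusive="left")

# Positional arguments: pass everything but the first argument by keyword.
df = pd.DataFrame({"a": [1, 1, 2], "b": [3, 3, 4]})
deduped = df.drop_duplicates(subset=["a"], keep="last")
without_b = df.drop(labels=["b"], axis="columns")

# read_csv: on_bad_lines replaces error_bad_lines and warn_bad_lines.
raw = io.StringIO("x,y\n1,2\n3,4,5\n6,7\n")
parsed = pd.read_csv(raw, on_bad_lines="skip")
```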
-------------------------------------------------------------------------------
Performance improvements
* Performance improvement in IntervalIndex.isin() (GH38353)
* Performance improvement in Series.mean() for nullable data types (GH34814)
* Performance improvement in Series.isin() for nullable data types (GH38340)
* Performance improvement in DataFrame.fillna() with method="pad" or
method="backfill" for nullable floating and nullable integer dtypes (GH39953)
* Performance improvement in DataFrame.corr() for method=kendall (GH28329)
* Performance improvement in DataFrame.corr() for method=spearman (GH40956,
GH41885)
* Performance improvement in Rolling.corr() and Rolling.cov() (GH39388)
* Performance improvement in RollingGroupby.corr(), ExpandingGroupby.corr()
and ExpandingGroupby.cov() (GH39591)
* Performance improvement in unique() for object data type (GH37615)
* Performance improvement in json_normalize() for basic cases (including
separators) (GH40035, GH15621)
* Performance improvement in ExpandingGroupby aggregation methods (GH39664)
* Performance improvement in Styler where render times are reduced by more
than 50% and now match DataFrame.to_html() (GH39972, GH39952, GH40425)
* The method Styler.set_td_classes() is now as performant as Styler.apply()
and Styler.applymap(), and even more so in some cases (GH40453)
* Performance improvement in ExponentialMovingWindow.mean() with times
(GH39784)
* Performance improvement in GroupBy.apply() when requiring the Python
fallback implementation (GH40176)
* Performance improvement in the conversion of a PyArrow Boolean array to a
pandas nullable Boolean array (GH41051)
* Performance improvement for concatenation of data with type
CategoricalDtype (GH40193)
* Performance improvement in GroupBy.cummin() and GroupBy.cummax() with
nullable data types (GH37493)
* Performance improvement in Series.nunique() with nan values (GH40865)
* Performance improvement in DataFrame.transpose(), Series.unstack() with
DatetimeTZDtype (GH40149)
* Performance improvement in Series.plot() and DataFrame.plot() with entry
point lazy loading (GH41492)
-------------------------------------------------------------------------------
Bug fixes
-------------------------------------------------------------------------------
Categorical
* Bug in CategoricalIndex incorrectly failing to raise TypeError when scalar
data is passed (GH38614)
* Bug in CategoricalIndex.reindex() failing when the Index passed was not
categorical but its values were all labels in the category (GH28690)
* Bug where constructing a Categorical from an object-dtype array of date
objects did not round-trip correctly with astype (GH38552)
* Bug in constructing a DataFrame from an ndarray and a CategoricalDtype
(GH38857)
* Bug in setting categorical values into an object-dtype column in a
DataFrame (GH39136)
* Bug in DataFrame.reindex() was raising an IndexError when the new index
contained duplicates and the old index was a CategoricalIndex (GH38906)
* Bug in Categorical.fillna() with a tuple-like category raising
NotImplementedError instead of ValueError when filling with a non-category
tuple (GH41914)
-------------------------------------------------------------------------------
Datetimelike
* Bug in DataFrame and Series constructors sometimes dropping nanoseconds
from Timestamp (resp. Timedelta) data, with dtype=datetime64[ns] (resp.
timedelta64[ns]) (GH38032)
* Bug in DataFrame.first() and Series.first() with an offset of one month
returning an incorrect result when the first day is the last day of a month
(GH29623)
* Bug in constructing a DataFrame or Series with mismatched datetime64 data
and timedelta64 dtype, or vice-versa, failing to raise a TypeError
(GH38575, GH38764, GH38792)
* Bug in constructing a Series or DataFrame with a datetime object out of
bounds for datetime64[ns] dtype or a timedelta object out of bounds for
timedelta64[ns] dtype (GH38792, GH38965)
* Bug in DatetimeIndex.intersection(), DatetimeIndex.symmetric_difference(),
PeriodIndex.intersection(), PeriodIndex.symmetric_difference() always
returning object-dtype when operating with CategoricalIndex (GH38741)
* Bug in DatetimeIndex.intersection() giving incorrect results with non-Tick
frequencies with n != 1 (GH42104)
* Bug in Series.where() incorrectly casting datetime64 values to int64
(GH37682)
* Bug in Categorical incorrectly typecasting datetime object to Timestamp
(GH38878)
* Bug in comparisons between Timestamp object and datetime64 objects just
outside the implementation bounds for nanosecond datetime64 (GH39221)
* Bug in Timestamp.round(), Timestamp.floor(), Timestamp.ceil() for values
near the implementation bounds of Timestamp (GH39244)
* Bug in Timedelta.round(), Timedelta.floor(), Timedelta.ceil() for values
near the implementation bounds of Timedelta (GH38964)
* Bug in date_range() incorrectly creating DatetimeIndex containing NaT
instead of raising OutOfBoundsDatetime in corner cases (GH24124)
* Bug in infer_freq() incorrectly failing to infer 'H' frequency of
DatetimeIndex if the latter has a timezone and crosses DST boundaries
(GH39556)
* Bug in Series backed by DatetimeArray or TimedeltaArray sometimes failing
to set the array's freq to None (GH41425)
-------------------------------------------------------------------------------
Timedelta
* Bug in constructing Timedelta from np.timedelta64 objects with
non-nanosecond units that are out of bounds for timedelta64[ns] (GH38965)
* Bug in constructing a TimedeltaIndex incorrectly accepting
np.datetime64("NaT") objects (GH39462)
* Bug in constructing Timedelta from an input string with only symbols and no
digits failing to raise an error (GH39710)
* Bug in TimedeltaIndex and to_timedelta() failing to raise when passed
non-nanosecond timedelta64 arrays that overflow when converting to
timedelta64[ns] (GH40008)
-------------------------------------------------------------------------------
Timezones
* Bug in different tzinfo objects representing UTC not being treated as
equivalent (GH39216)
* Bug in dateutil.tz.gettz("UTC") not being recognized as equivalent to other
UTC-representing tzinfos (GH39276)
-------------------------------------------------------------------------------
Numeric
* Bug in DataFrame.quantile(), DataFrame.sort_values() causing incorrect
subsequent indexing behavior (GH38351)
* Bug in DataFrame.sort_values() raising an IndexError for empty by (GH40258)
* Bug in DataFrame.select_dtypes() with include=np.number would drop numeric
ExtensionDtype columns (GH35340)
* Bug in DataFrame.mode() and Series.mode() not keeping consistent integer
Index for empty input (GH33321)
* Bug in DataFrame.rank() when the DataFrame contained np.inf (GH32593)
* Bug in DataFrame.rank() with axis=0 and columns holding incomparable types
raising an IndexError (GH38932)
* Bug in Series.rank(), DataFrame.rank(), and GroupBy.rank() treating the
most negative int64 value as missing (GH32859)
* Bug in DataFrame.select_dtypes() behaving differently on Windows and Linux
with include="int" (GH36596)
* Bug in DataFrame.apply() and DataFrame.agg() where passing the argument
func="size" would operate on the entire DataFrame instead of rows or columns
(GH39934)
* Bug in DataFrame.transform() raising a SpecificationError when passed a
dictionary and columns were missing; it now raises a KeyError instead
(GH40004)
* Bug in GroupBy.rank() giving incorrect results with pct=True and equal
values between consecutive groups (GH40518)
* Bug in Series.count() would result in an int32 result on 32-bit platforms
when argument level=None (GH40908)
* Bug in Series and DataFrame reductions with methods any and all not
returning Boolean results for object data (GH12863, GH35450, GH27709)
* Bug in Series.clip() would fail if the Series contains NA values and has
nullable int or float as a data type (GH40851)
* Bug in UInt64Index.where() and UInt64Index.putmask() with an np.int64 dtype
other incorrectly raising TypeError (GH41974)
* Bug in DataFrame.agg() not sorting the aggregated axis in the order of the
provided aggregation functions when one or more aggregation functions fail
to produce results (GH33634)
* Bug in DataFrame.clip() not interpreting missing values as no threshold
(GH40420)
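Two of the fixes above concern nullable dtypes and are easy to check. This is a
small sketch with made-up data, assuming a pandas version that includes the
fixes; the column and variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "ints": pd.array([1, 2, None], dtype="Int64"),
        "floats": pd.array([0.5, None, 1.5], dtype="Float64"),
        "labels": ["x", "y", "z"],
    }
)

# Numeric extension-dtype columns are kept when selecting np.number.
numeric = df.select_dtypes(include=np.number)

# clip() works on a nullable integer Series that contains NA values;
# the NA entry stays missing.
clipped = df["ints"].clip(lower=2)
```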
-------------------------------------------------------------------------------
Conversion
* Bug in Series.to_dict() with orient='records' not returning Python native
types (GH25969)
* Bug in Series.view() and Index.view() when converting between datetime-like
(datetime64[ns], datetime64[ns, tz], timedelta64, period) dtypes (GH39788)
* Bug in creating a DataFrame from an empty np.recarray not retaining the
original dtypes (GH40121)
* Bug in DataFrame failing to raise a TypeError when constructing from a
frozenset (GH40163)
* Bug in Index construction silently ignoring a passed dtype when the data
cannot be cast to that dtype (GH21311)
* Bug in StringArray.astype() falling back to NumPy and raising when
converting to dtype='categorical' (GH40450)
* Bug in factorize() where, when given an array with a numeric NumPy dtype
lower than int64, uint64 and float64, the unique values did not keep their
original dtype (GH41132)
* Bug in DataFrame construction with a dictionary containing an array-like
with ExtensionDtype and copy=True failing to make a copy (GH38939)
* Bug in qcut() raising error when taking Float64Dtype as input (GH40730)
* Bug in DataFrame and Series construction with datetime64[ns] data and
dtype=object resulting in datetime objects instead of Timestamp objects
(GH41599)
* Bug in DataFrame and Series construction with timedelta64[ns] data and
dtype=object resulting in np.timedelta64 objects instead of Timedelta
objects (GH41599)
* Bug in DataFrame construction when given a two-dimensional object-dtype
np.ndarray of Period or Interval objects failing to cast to PeriodDtype or
IntervalDtype, respectively (GH41812)
* Bug in constructing a Series from a list and a PandasDtype (GH39357)
* Bug in creating a Series from a range object that does not fit in the
bounds of int64 dtype (GH30173)
* Bug in creating a Series from a dict with all-tuple keys and an Index that
requires reindexing (GH41707)
* Bug in infer_dtype() not recognizing Series, Index, or array with a Period
dtype (GH23553)
* Bug in infer_dtype() raising an error for general ExtensionArray objects.
It will now return "unknown-array" instead of raising (GH37367)
* Bug in DataFrame.convert_dtypes() incorrectly raised a ValueError when
called on an empty DataFrame (GH40393)
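The factorize() fix in the list above can be checked with a short example. This
is a sketch with made-up values, assuming a pandas version that includes the
fix.

```python
import numpy as np
import pandas as pd

values = np.array([3, 7, 3, 9], dtype="int16")

# codes index into uniques; uniques keep the input's int16 dtype
# instead of being upcast to int64.
codes, uniques = pd.factorize(values)
```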
-------------------------------------------------------------------------------
Strings
* Bug in the conversion from pyarrow.ChunkedArray to StringArray when the
original had zero chunks (GH41040)
* Bug in Series.replace() and DataFrame.replace() ignoring replacements with
regex=True for StringDtype data (GH41333, GH35977)
* Bug in Series.str.extract() with StringArray returning object dtype for an
empty DataFrame (GH41441)
* Bug in Series.str.replace() where the case argument was ignored when
regex=False (GH41602)
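The Series.str.replace() fix can be illustrated with a literal pattern that
contains a regex metacharacter. This is a sketch with made-up strings, assuming
a pandas version that includes the fix.

```python
import pandas as pd

s = pd.Series(["A.b", "a.b", "axb"])

# With regex=False the pattern "a." is matched literally (the dot is not a
# wildcard), and case=False makes the literal match case-insensitive.
replaced = s.str.replace("a.", "X", case=False, regex=False)
```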
-------------------------------------------------------------------------------
Interval
* Bug in IntervalIndex.intersection() and
IntervalIndex.symmetric_difference() always returning object-dtype when
operating with CategoricalIndex (GH38653, GH38741)
* Bug in IntervalIndex.intersection() returning duplicates when at least one
of the Index objects has duplicates which are present in the other
(GH38743)
* IntervalIndex.union(), IntervalIndex.intersection(),
IntervalIndex.difference(), and IntervalIndex.symmetric_difference() now
cast to the appropriate dtype instead of raising a TypeError when operating
with another IntervalIndex with incompatible dtype (GH39267)
* PeriodIndex.union(), PeriodIndex.intersection(),
PeriodIndex.symmetric_difference(), PeriodIndex.difference() now cast to
object dtype instead of raising IncompatibleFrequency when operating with
another PeriodIndex with incompatible dtype (GH39306)
* Bug in IntervalIndex.is_monotonic(), IntervalIndex.get_loc(),
IntervalIndex.get_indexer_for(), and IntervalIndex.__contains__() when NA
values are present (GH41831)
-------------------------------------------------------------------------------
Indexing
* Bug in Index.union() and MultiIndex.union() dropping duplicate Index values
when Index was not monotonic or sort was set to False (GH36289, GH31326,
GH40862)
* Bug in CategoricalIndex.get_indexer() failing to raise InvalidIndexError
when non-unique (GH38372)
* Bug in IntervalIndex.get_indexer() when target has CategoricalDtype and
both the index and the target contain NA values (GH41934)
* Bug in Series.loc() raising a ValueError when input was filtered with a
Boolean list and values to set were a list with lower dimension (GH20438)
* Bug in inserting many new columns into a DataFrame causing incorrect
subsequent indexing behavior (GH38380)
* Bug in DataFrame.__setitem__() raising a ValueError when setting multiple
values to duplicate columns (GH15695)
* Bug in DataFrame.loc(), Series.loc(), DataFrame.__getitem__() and
Series.__getitem__() returning incorrect elements for non-monotonic
DatetimeIndex for string slices (GH33146)
* Bug in DataFrame.reindex() and Series.reindex() with timezone aware indexes
raising a TypeError for method="ffill" and method="bfill" and specified
tolerance (GH38566)
* Bug in DataFrame.reindex() with datetime64[ns] or timedelta64[ns]
incorrectly casting to integers when the fill_value requires casting to
object dtype (GH39755)
* Bug in DataFrame.__setitem__() raising a ValueError when setting on an
empty DataFrame using specified columns and a nonempty DataFrame value
(GH38831)
* Bug in DataFrame.loc.__setitem__() raising a ValueError when operating on a
unique column when the DataFrame has duplicate columns (GH38521)
* Bug in DataFrame.iloc.__setitem__() and DataFrame.loc.__setitem__() with
mixed dtypes when setting with a dictionary value (GH38335)
* Bug in Series.loc.__setitem__() and DataFrame.loc.__setitem__() raising
KeyError when provided a Boolean generator (GH39614)
* Bug in Series.iloc() and DataFrame.iloc() raising a KeyError when provided
a generator (GH39614)
* Bug in DataFrame.__setitem__() not raising a ValueError when the right hand
side is a DataFrame with wrong number of columns (GH38604)
* Bug in Series.__setitem__() raising a ValueError when setting a Series with
a scalar indexer (GH38303)
* Bug in DataFrame.loc() dropping levels of a MultiIndex when the DataFrame
used as input has only one row (GH10521)
* Bug in DataFrame.__getitem__() and Series.__getitem__() always raising
KeyError when slicing with existing strings where the Index has
milliseconds (GH33589)
* Bug in setting timedelta64 or datetime64 values into numeric Series failing
to cast to object dtype (GH39086, GH39619)
* Bug in setting Interval values into a Series or DataFrame with mismatched
IntervalDtype incorrectly casting the new values to the existing dtype
(GH39120)
* Bug in setting datetime64 values into a Series with integer-dtype
incorrectly casting the datetime64 values to integers (GH39266)
* Bug in setting np.datetime64("NaT") into a Series with DatetimeTZDtype
incorrectly treating the timezone-naive value as timezone-aware (GH39769)
* Bug in Index.get_loc() not raising KeyError when key=NaN and method is
specified but NaN is not in the Index (GH39382)
* Bug in DatetimeIndex.insert() when inserting np.datetime64("NaT") into a
timezone-aware index incorrectly treating the timezone-naive value as
timezone-aware (GH39769)
* Bug in incorrectly raising in Index.insert(), when setting a new column
that cannot be held in the existing frame.columns, or in
Series.reset_index() or DataFrame.reset_index() instead of casting to a
compatible dtype (GH39068)
* Bug in RangeIndex.append() where a single object of length 1 was
concatenated incorrectly (GH39401)
* Bug in RangeIndex.astype() where when converting to CategoricalIndex, the
categories became an Int64Index instead of a RangeIndex (GH41263)
* Bug in setting numpy.timedelta64 values into an object-dtype Series using a
Boolean indexer (GH39488)
* Bug in setting numeric values into a boolean-dtype Series using at or iat
failing to cast to object-dtype (GH39582)
* Bug in DataFrame.__setitem__() and DataFrame.iloc.__setitem__() raising
ValueError when trying to index with a row-slice and setting a list as
values (GH40440)
* Bug in DataFrame.loc() not raising KeyError when the key was not found in
MultiIndex and the levels were not fully specified (GH41170)
* Bug in DataFrame.loc.__setitem__() when setting-with-expansion incorrectly
raising when the index in the expanding axis contained duplicates (GH40096)
* Bug in DataFrame.loc.__getitem__() with MultiIndex casting to float when at
least one index column has float dtype and we retrieve a scalar (GH41369)
* Bug in DataFrame.loc() incorrectly matching non-Boolean index elements
(GH20432)
* Bug in indexing with np.nan on a Series or DataFrame with a
CategoricalIndex incorrectly raising KeyError when np.nan keys are present
(GH41933)
* Bug in Series.__delitem__() with ExtensionDtype incorrectly casting to
ndarray (GH40386)
* Bug in DataFrame.at() with a CategoricalIndex returning incorrect results
when passed integer keys (GH41846)
* Bug in DataFrame.loc() returning a MultiIndex in the wrong order if an
indexer has duplicates (GH40978)
* Bug in DataFrame.__setitem__() raising a TypeError when using a str
subclass as the column name with a DatetimeIndex (GH37366)
* Bug in PeriodIndex.get_loc() failing to raise a KeyError when given a
Period with a mismatched freq (GH41670)
* Bug in .loc.__getitem__ with a UInt64Index and negative-integer keys raising
OverflowError instead of KeyError in some cases, wrapping around to
positive integers in others (GH41777)
* Bug in Index.get_indexer() failing to raise ValueError in some cases with
invalid method, limit, or tolerance arguments (GH41918)
* Bug when slicing a Series or DataFrame with a TimedeltaIndex when passing
an invalid string raising ValueError instead of a TypeError (GH41821)
* Bug in Index constructor sometimes silently ignoring a specified dtype
(GH38879)
* Index.where() behavior now mirrors Index.putmask() behavior, i.e.
index.where(mask, other) matches index.putmask(~mask, other) (GH39412)
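The equivalence between Index.where() and Index.putmask() stated above can be
sketched as follows, using made-up values; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

idx = pd.Index([10, 20, 30])
mask = np.array([True, False, True])

# where() keeps values where mask is True; putmask() replaces values where
# its mask is True, so inverting the mask gives the same result.
kept = idx.where(mask, 0)
replaced = idx.putmask(~mask, 0)
```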
-------------------------------------------------------------------------------
Missing
* Bug in Grouper not correctly propagating the dropna argument;
DataFrameGroupBy.transform() now correctly handles missing values for
dropna=True (GH35612)
* Bug in isna(), Series.isna(), Index.isna(), DataFrame.isna(), and the
corresponding notna functions not recognizing Decimal("NaN") objects
(GH39409)
* Bug in DataFrame.fillna() not accepting a dictionary for the downcast
keyword (GH40809)
* Bug in isna() not returning a copy of the mask for nullable types, causing
any subsequent mask modification to change the original array (GH40935)
* Bug in DataFrame construction with float data containing NaN and an integer
dtype casting instead of retaining the NaN (GH26919)
* Bug in Series.isin() and MultiIndex.isin() not treating all NaNs as
equivalent if they were in tuples (GH41836)
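The Decimal("NaN") fix in the list above can be checked with a short example.
This is a sketch with made-up values, assuming a pandas version that includes
the fix.

```python
from decimal import Decimal

import pandas as pd

values = pd.Series([Decimal("NaN"), Decimal("1.5"), None])

# Decimal("NaN") is recognized as missing, alongside None.
missing = values.isna()
scalar_missing = pd.isna(Decimal("NaN"))
```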
-------------------------------------------------------------------------------
MultiIndex
* Bug in DataFrame.drop() raising a TypeError when the MultiIndex is
non-unique and level is not provided (GH36293)
* Bug in MultiIndex.intersection() duplicating NaN in the result (GH38623)
* Bug in MultiIndex.equals() incorrectly returning True when the MultiIndex
contained NaN even when they are differently ordered (GH38439)
* Bug in MultiIndex.intersection() always returning an empty result when
intersecting with CategoricalIndex (GH38653)
* Bug in MultiIndex.difference() incorrectly raising TypeError when indexes
contain non-sortable entries (GH41915)
* Bug in MultiIndex.reindex() raising a ValueError when used on an empty
MultiIndex and indexing only a specific level (GH41170)
* Bug in MultiIndex.reindex() raising TypeError when reindexing against a
flat Index (GH41707)
-------------------------------------------------------------------------------
I/O
* Bug in Index.__repr__() when display.max_seq_items=1 (GH38415)
* Bug in read_csv() not recognizing scientific notation if the argument
decimal is set and engine="python" (GH31920)
* Bug in read_csv() interpreting an NA value as a comment when the NA value
contains the comment string; fixed for engine="python" (GH34002)
* Bug in read_csv() raising an IndexError with multiple header columns and
index_col is specified when the file has no data rows (GH38292)
* Bug in read_csv() not accepting usecols with a different length than names
for engine="python" (GH16469)
* Bug in read_csv() returning object dtype when delimiter="," with usecols
and parse_dates specified for engine="python" (GH35873)
* Bug in read_csv() raising a TypeError when names and parse_dates are
specified for engine="c" (GH33699)
* Bug in read_clipboard() and DataFrame.to_clipboard() not working in WSL
(GH38527)
* Allow custom error values for the parse_dates argument of read_sql(),
read_sql_query() and read_sql_table() (GH35185)
* Bug in DataFrame.to_hdf() and Series.to_hdf() raising a KeyError when
trying to apply for subclasses of DataFrame or Series (GH33748)
* Bug in HDFStore.put() raising a wrong TypeError when saving a DataFrame
with non-string dtype (GH34274)
* Bug in json_normalize() resulting in the first element of a generator
object not being included in the returned DataFrame (GH35923)
* Bug in read_csv() applying the thousands separator to date columns when the
column should be parsed for dates and usecols is specified for
engine="python" (GH39365)
* Bug in read_excel() forward filling MultiIndex names when multiple header
and index columns are specified (GH34673)
* Bug in read_excel() not respecting set_option() (GH34252)
* Bug in read_csv() not switching true_values and false_values for nullable
Boolean dtype (GH34655)
* Bug in read_json() when orient="split" not maintaining a numeric string
index (GH28556)
* read_sql() returned an empty generator if chunksize was non-zero and the
query returned no results. Now returns a generator with a single empty
DataFrame (GH34411)
* Bug in read_hdf() returning unexpected records when filtering on
categorical string columns using the where parameter (GH39189)
* Bug in read_sas() raising a ValueError when datetimes were null (GH39725)
* Bug in read_excel() dropping empty values from single-column spreadsheets (
GH39808)
* Bug in read_excel() loading trailing empty rows/columns for some filetypes
(GH41167)
* Bug in read_excel() raising an AttributeError when the excel file had a
MultiIndex header followed by two empty rows and no index (GH40442)
* Bug in read_excel(), read_csv(), read_table(), read_fwf(), and
read_clipboard() where one blank row after a MultiIndex header with no
index would be dropped (GH40442)
* Bug in DataFrame.to_string() misplacing the truncation column when index=
False (GH40904)
* Bug in DataFrame.to_string() adding an extra dot and misaligning the
truncation row when index=False (GH40904)
* Bug in read_orc() always raising an AttributeError (GH40918)
* Bug in read_csv() and read_table() silently ignoring prefix if names and
prefix are defined, now raising a ValueError (GH39123)
* Bug in read_csv() and read_excel() not respecting the dtype for a
duplicated column name when mangle_dupe_cols is set to True (GH35211)
* Bug in read_csv() silently ignoring sep if delimiter and sep are defined,
now raising a ValueError (GH39823)
* Bug in read_csv() and read_table() misinterpreting arguments when
sys.setprofile had been previously called (GH41069)
* Bug in the conversion from PyArrow to pandas (e.g. for reading Parquet)
with nullable dtypes and a PyArrow array whose data buffer size is not a
multiple of the dtype size (GH40896)
* Bug in read_excel() would raise an error when pandas could not determine
the file type even though the user specified the engine argument (GH41225)
* Bug in read_clipboard() where copying from an Excel file shifted values
into the wrong column if there were null values in the first column (GH41108)
* Bug in DataFrame.to_hdf() and Series.to_hdf() raising a TypeError when
trying to append a string column to an incompatible column (GH41897)
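
Two of the I/O entries above are easy to check interactively. The following
is a minimal sketch, assuming pandas 1.3 or later; the sample records and the
semicolon-separated text are made up for illustration.

```python
import io

import pandas as pd


def numbered_records():
    """Yield small JSON-like dicts lazily (hypothetical sample data)."""
    for i in range(3):
        yield {"id": i, "info": {"name": f"item{i}"}}


# json_normalize() on a generator should keep the first yielded element.
frame = pd.json_normalize(numbered_records())

# Passing both sep and delimiter should be rejected instead of ignoring sep.
try:
    pd.read_csv(io.StringIO("a;b\n1;2\n"), sep=";", delimiter=";")
    conflict_error = None
except ValueError as exc:
    conflict_error = exc
```

On an affected older release, frame would be missing the row with id 0 and
conflict_error would stay None.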
-------------------------------------------------------------------------------
Period
* Comparisons of Period objects or Index, Series, or DataFrame with
mismatched PeriodDtype now behave like other mismatched-type comparisons,
returning False for equals, True for not-equal, and raising TypeError for
inequality checks (GH39274)
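
The comparison rules described above can be sketched with two PeriodIndex
objects of different frequency. This is an illustration only, assuming
pandas 1.3 or later; the dates are arbitrary.

```python
import pandas as pd

monthly = pd.period_range("2021-01", periods=2, freq="M")
daily = pd.period_range("2021-01-01", periods=2, freq="D")

# Mismatched PeriodDtype: equality is all False, inequality is all True.
equal_mask = monthly == daily
not_equal_mask = monthly != daily

# Ordering comparisons across mismatched dtypes are expected to raise.
try:
    monthly < daily
    ordering_raised = False
except TypeError:
    ordering_raised = True
```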
-------------------------------------------------------------------------------
Plotting
* Bug in plotting.scatter_matrix() raising when 2d ax argument passed (
GH16253)
* Prevent warnings when Matplotlib's constrained_layout is enabled (GH25261)
* Bug in DataFrame.plot() was showing the wrong colors in the legend if the
function was called repeatedly and some calls used yerr while others
didn't (GH39522)
* Bug in DataFrame.plot() was showing the wrong colors in the legend if the
function was called repeatedly and some calls used secondary_y and others
used legend=False (GH40044)
* Bug in DataFrame.plot.box() when dark_background theme was selected, caps
or min/max markers for the plot were not visible (GH40769)
-------------------------------------------------------------------------------
Groupby/resample/rolling
* Bug in GroupBy.agg() with PeriodDtype columns incorrectly casting results
too aggressively (GH38254)
* Bug in SeriesGroupBy.value_counts() where unobserved categories in a
grouped categorical Series were not tallied (GH38672)
* Bug in SeriesGroupBy.value_counts() where an error was raised on an empty
Series (GH39172)
* Bug in GroupBy.indices() would contain non-existent indices when null
values were present in the groupby keys (GH9304)
* Fixed bug in GroupBy.sum() causing a loss of precision by now using Kahan
summation (GH38778)
* Fixed bug in GroupBy.cumsum() and GroupBy.mean() causing loss of precision
through using Kahan summation (GH38934)
* Bug in Resampler.aggregate() and DataFrame.transform() raising a TypeError
instead of SpecificationError when missing keys had mixed dtypes (GH39025)
* Bug in DataFrameGroupBy.idxmin() and DataFrameGroupBy.idxmax() with
ExtensionDtype columns (GH38733)
* Bug in Series.resample() would raise when the index was a PeriodIndex
consisting of NaT (GH39227)
* Bug in RollingGroupby.corr() and ExpandingGroupby.corr() where the groupby
column would return 0 instead of np.nan when providing other that was
longer than each group (GH39591)
* Bug in ExpandingGroupby.corr() and ExpandingGroupby.cov() where 1 would be
returned instead of np.nan when providing other that was longer than each
group (GH39591)
* Bug in GroupBy.mean(), GroupBy.median() and DataFrame.pivot_table() not
propagating metadata (GH28283)
* Bug in Series.rolling() and DataFrame.rolling() not calculating window
bounds correctly when window is an offset and dates are in descending order
(GH40002)
* Bug in Series.groupby() and DataFrame.groupby() on an empty Series or
DataFrame would lose index, columns, and/or data types when directly using
the methods idxmax, idxmin, mad, min, max, sum, prod, and skew or using
them through apply, aggregate, or resample (GH26411)
* Bug in GroupBy.apply() where a MultiIndex would be created instead of an
Index when used on a RollingGroupby object (GH39732)
* Bug in DataFrameGroupBy.sample() where an error was raised when weights was
specified and the index was an Int64Index (GH39927)
* Bug in DataFrameGroupBy.aggregate() and Resampler.aggregate() would
sometimes raise a SpecificationError when passed a dictionary and columns
were missing; will now always raise a KeyError instead (GH40004)
* Bug in DataFrameGroupBy.sample() where column selection was not applied
before computing the result (GH39928)
* Bug in ExponentialMovingWindow when calling __getitem__ would incorrectly
raise a ValueError when providing times (GH40164)
* Bug in ExponentialMovingWindow when calling __getitem__ would not retain
com, span, alpha or halflife attributes (GH40164)
* ExponentialMovingWindow now raises a NotImplementedError when specifying
times with adjust=False due to an incorrect calculation (GH40098)
* Bug in ExponentialMovingWindowGroupby.mean() where the times argument was
ignored when engine='numba' (GH40951)
* Bug in ExponentialMovingWindowGroupby.mean() where the wrong times were
used in the case of multiple groups (GH40951)
* Bug in ExponentialMovingWindowGroupby where the times vector and values
became out of sync for non-trivial groups (GH40951)
* Bug in Series.asfreq() and DataFrame.asfreq() dropping rows when the index
was not sorted (GH39805)
* Bug in aggregation functions for DataFrame not respecting numeric_only
argument when level keyword was given (GH40660)
* Bug in SeriesGroupBy.aggregate() where using a user-defined function to
aggregate a Series with an object-typed Index causes an incorrect Index
shape (GH40014)
* Bug in RollingGroupby where as_index=False argument in groupby was ignored
(GH39433)
* Bug in GroupBy.any() and GroupBy.all() raising a ValueError when using with
nullable type columns holding NA even with skipna=True (GH40585)
* Bug in GroupBy.cummin() and GroupBy.cummax() incorrectly rounding integer
values near the int64 implementation bounds (GH40767)
* Bug in GroupBy.rank() with nullable dtypes incorrectly raising a TypeError
(GH41010)
* Bug in GroupBy.cummin() and GroupBy.cummax() computing wrong result with
nullable data types too large to roundtrip when casting to float (GH37493)
* Bug in DataFrame.rolling() returning a mean of zero for an all-NaN window
with min_periods=0 if the calculation is not numerically stable (GH41053)
* Bug in DataFrame.rolling() returning a nonzero sum for an all-NaN window
with min_periods=0 if the calculation is not numerically stable (GH41053)
* Bug in SeriesGroupBy.agg() failing to retain ordered CategoricalDtype on
order-preserving aggregations (GH41147)
* Bug in GroupBy.min() and GroupBy.max() with multiple object-dtype columns
and numeric_only=False incorrectly raising a ValueError (GH41111)
* Bug in DataFrameGroupBy.rank() with the GroupBy object's axis=0 and the
rank method's keyword axis=1 (GH41320)
* Bug in DataFrameGroupBy.__getitem__() with non-unique columns incorrectly
returning a malformed SeriesGroupBy instead of DataFrameGroupBy (GH41427)
* Bug in DataFrameGroupBy.transform() with non-unique columns incorrectly
raising an AttributeError (GH41427)
* Bug in Resampler.apply() with non-unique columns incorrectly dropping
duplicated columns (GH41445)
* Bug in Series.groupby() aggregations incorrectly returning empty Series
instead of raising TypeError on aggregations that are invalid for its
dtype, e.g. .prod with datetime64[ns] dtype (GH41342)
* Bug in DataFrameGroupBy aggregations incorrectly failing to drop columns
with invalid dtypes for that aggregation when there are no valid columns (
GH41291)
* Bug in DataFrame.rolling.__iter__() where on was not assigned to the index
of the resulting objects (GH40373)
* Bug in DataFrameGroupBy.transform() and DataFrameGroupBy.agg() with engine=
"numba" where *args were being cached with the user passed function (
GH41647)
* Bug in DataFrameGroupBy methods agg, transform, sum, bfill, ffill, pad,
pct_change, shift, ohlc dropping .columns.names (GH41497)
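
The GroupBy.sum(), GroupBy.cumsum() and GroupBy.mean() entries above mention
Kahan summation. The following pure-Python sketch shows the general
technique only; it is not the pandas implementation, and the function names
and sample data are hypothetical.

```python
import math


def naive_sum(values):
    """Plain left-to-right accumulation, with no error compensation."""
    total = 0.0
    for value in values:
        total += value
    return total


def kahan_sum(values):
    """Sum floats while carrying a running compensation for lost low bits."""
    total = 0.0
    compensation = 0.0
    for value in values:
        adjusted = value - compensation
        new_total = total + adjusted
        # (new_total - total) is what was actually added; its difference from
        # `adjusted` is the rounding error to subtract on the next step.
        compensation = (new_total - total) - adjusted
        total = new_total
    return total


samples = [0.1] * 1_000_000
naive_total = naive_sum(samples)
compensated_total = kahan_sum(samples)
reference_total = math.fsum(samples)  # accurately rounded reference
```

Comparing both totals against reference_total shows how much of the
accumulated rounding error the compensation term recovers.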
-------------------------------------------------------------------------------
Reshaping
* Bug in merge() raising error when performing an inner join with partial
index and right_index=True when there was no overlap between indices (
GH33814)
* Bug in DataFrame.unstack() with missing levels led to incorrect index names
(GH37510)
* Bug in merge_asof() propagating the right Index with left_index=True and
right_on specification instead of left Index (GH33463)
* Bug in DataFrame.join() on a DataFrame with a MultiIndex returned the wrong
result when one of the two indexes had only one level (GH36909)
* merge_asof() now raises a ValueError instead of a cryptic TypeError in case
of non-numerical merge columns (GH29130)
* Bug in DataFrame.join() not assigning values correctly when the DataFrame
had a MultiIndex where at least one dimension had dtype Categorical with
non-alphabetically sorted categories (GH38502)
* Series.value_counts() and Series.mode() now return consistent keys in
original order (GH12679, GH11227 and GH39007)
* Bug in DataFrame.stack() not handling NaN in MultiIndex columns correctly (
GH39481)
* Bug in DataFrame.apply() would give incorrect results when the argument
func was a string, axis=1, and the axis argument was not supported; now
raises a ValueError instead (GH39211)
* Bug in DataFrame.sort_values() not reshaping the index correctly after
sorting on columns when ignore_index=True (GH39464)
* Bug in DataFrame.append() returning incorrect dtypes with combinations of
ExtensionDtype dtypes (GH39454)
* Bug in DataFrame.append() returning incorrect dtypes when used with
combinations of datetime64 and timedelta64 dtypes (GH39574)
* Bug in DataFrame.append() with a DataFrame with a MultiIndex and appending
a Series whose Index is not a MultiIndex (GH41707)
* Bug in DataFrame.pivot_table() returning a MultiIndex for a single value
when operating on an empty DataFrame (GH13483)
* Index can now be passed to the numpy.all() function (GH40180)
* Bug in DataFrame.stack() not preserving CategoricalDtype in a MultiIndex (
GH36991)
* Bug in to_datetime() raising an error when the input sequence contained
unhashable items (GH39756)
* Bug in Series.explode() preserving the index when ignore_index was True and
values were scalars (GH40487)
* Bug in to_datetime() raising a ValueError when Series contains None and NaT
and has more than 50 elements (GH39882)
* Bug in Series.unstack() and DataFrame.unstack() with object-dtype values
containing timezone-aware datetime objects incorrectly raising TypeError (
GH41875)
* Bug in DataFrame.melt() raising InvalidIndexError when DataFrame has
duplicate columns used as value_vars (GH41951)
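
As a small check of the Series.explode() entry above (GH40487), the sketch
below assumes pandas 1.3 or later and uses made-up labels and values.

```python
import pandas as pd

labelled = pd.Series([10, 20], index=["a", "b"])

# With scalar values nothing is expanded, but ignore_index=True should still
# replace the original labels with a fresh 0..n-1 index.
exploded = labelled.explode(ignore_index=True)
```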
-------------------------------------------------------------------------------
Sparse
* Bug in DataFrame.sparse.to_coo() raising a KeyError with columns that are a
numeric Index without a 0 (GH18414)
* Bug in SparseArray.astype() with copy=False producing incorrect results
when going from integer dtype to floating dtype (GH34456)
* Bug in SparseArray.max() and SparseArray.min() would always return an empty
result (GH40921)
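
The SparseArray.max() and SparseArray.min() entry above can be checked with
a short sketch, assuming pandas 1.3 or later. The values are arbitrary; the
integer array uses 0 as its fill value.

```python
import pandas as pd

sparse_values = pd.arrays.SparseArray([0, 0, 3, 0, -1])

# Both reductions should return a scalar rather than an empty result.
largest = sparse_values.max()
smallest = sparse_values.min()
```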
-------------------------------------------------------------------------------
ExtensionArray
* Bug in DataFrame.where() when other is a Series with an ExtensionDtype (
GH38729)
* Fixed bug where Series.idxmax(), Series.idxmin(), Series.argmax(), and
Series.argmin() would fail when the underlying data is an ExtensionArray (
GH32749, GH33719, GH36566)
* Fixed bug where some properties of subclasses of PandasExtensionDtype were
improperly cached (GH40329)
* Bug in DataFrame.mask() where masking a DataFrame with an ExtensionDtype
raises a ValueError (GH40941)
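
To illustrate the idxmax()/idxmin()/argmax()/argmin() entry above, here is a
minimal sketch using the nullable Int64 extension dtype. It assumes pandas
1.3 or later, and the labels and values are made up.

```python
import pandas as pd

scores = pd.Series([1, 3, 2], index=["a", "b", "c"], dtype="Int64")

# Label-based and position-based extremes on ExtensionArray-backed data.
best_label = scores.idxmax()
best_position = scores.argmax()
worst_label = scores.idxmin()
worst_position = scores.argmin()
```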
-------------------------------------------------------------------------------
Styler
* Bug in Styler where the subset argument in methods raised an error for some
valid MultiIndex slices (GH33562)
* Styler rendered HTML output has seen minor alterations to support w3 good
code standards (GH39626)
* Bug in Styler where rendered HTML was missing a column class identifier for
certain header cells (GH39716)
* Bug in Styler.background_gradient() where text-color was not determined
correctly (GH39888)
* Bug in Styler.set_table_styles() where multiple elements in CSS-selectors
of the table_styles argument were not correctly added (GH34061)
* Bug in Styler where copying from Jupyter dropped the top left cell and
misaligned headers (GH12147)
* Bug in Styler.where where kwargs were not passed to the applicable callable
(GH40845)
* Bug in Styler causing CSS to duplicate on multiple renders (GH39395,
GH40334)
-------------------------------------------------------------------------------
Other
* inspect.getmembers(Series) no longer raises an AbstractMethodError
(GH38782)
* Bug in Series.where() with numeric dtype and other=None not casting to nan
(GH39761)
* Bug in assert_series_equal(), assert_frame_equal(), assert_index_equal()
and assert_extension_array_equal() incorrectly raising when an attribute
has an unrecognized NA type (GH39461)
* Bug in assert_index_equal() with exact=True not raising when comparing
CategoricalIndex instances with Int64Index and RangeIndex categories (
GH41263)
* Bug in DataFrame.equals(), Series.equals(), and Index.equals() with
object-dtype containing np.datetime64("NaT") or np.timedelta64("NaT")
(GH39650)
* Bug in show_versions() where console JSON output was not proper JSON (
GH39701)
* pandas can now compile on z/OS when using xlc (GH35826)
* Bug in pandas.util.hash_pandas_object() not recognizing hash_key, encoding
and categorize when the input object type is a DataFrame (GH41404)
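
The hash_pandas_object() entry above can be checked by hashing the same
DataFrame with two different keys. This is a sketch assuming pandas 1.3 or
later; the column contents and both keys are arbitrary.

```python
import pandas as pd

frame = pd.DataFrame({"name": ["alpha", "beta", "gamma"]})

# hash_key must encode to exactly 16 bytes; both keys here are arbitrary.
hashes_first = pd.util.hash_pandas_object(frame, hash_key="abcdefghijklmnop")
hashes_second = pd.util.hash_pandas_object(frame, hash_key="ponmlkjihgfedcba")

# If hash_key is honored for DataFrame input, the row hashes should differ.
keys_change_hashes = not hashes_first.equals(hashes_second)
```

A string column is used because the key only affects how object values are
hashed; purely numeric data hashes the same under any key.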
What's new in 1.2.5 (June 22, 2021)
These are the changes in pandas 1.2.5. See Release notes for a full changelog
including other versions of pandas.
-------------------------------------------------------------------------------
Fixed regressions
* Fixed regression in concat() between two DataFrame where one has an Index
that is all-None and the other is DatetimeIndex incorrectly raising (
GH40841)
* Fixed regression in DataFrame.sum() and DataFrame.prod() when min_count and
numeric_only are both given (GH41074)
* Fixed regression in read_csv() when using memory_map=True with a non-UTF8
encoding (GH40986)
* Fixed regression in DataFrame.replace() and Series.replace() when the
values to replace are given as a NumPy float array (GH40371)
* Fixed regression in ExcelFile() when a corrupt file is opened but not
closed (GH41778)
* Fixed regression in DataFrame.astype() with dtype=str failing to convert
NaN in categorical columns (GH41797)
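
The DataFrame.sum() entry above (GH41074) concerns passing min_count and
numeric_only together. A minimal sketch, assuming pandas 1.2.5 or later and
a made-up frame with one numeric and one string column:

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame({"amount": [1.0, np.nan], "label": ["x", "y"]})

# Only the numeric column is summed; min_count sets how many non-NA values
# are required before a sum is reported instead of NaN.
enough_values = frame.sum(numeric_only=True, min_count=1)
too_few_values = frame.sum(numeric_only=True, min_count=2)
```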