Pandas 3.0 brings unified string type and performance optimization
With version 3.0, the Python library introduces a unified string type and improves the copy-on-write method.
- Manuel Masiero
Almost three years after the last major release, version 3.0 of pandas, the Python data analysis library, is now available. Key changes include the dedicated string data type str, an improved copy-on-write method, and a new default resolution for date and time-like data. The latter defaults to microseconds instead of nanoseconds to avoid boundary errors for dates before 1678 or after 2262.
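The year limits follow from storing timestamps as signed 64-bit integer offsets from the Unix epoch (1970-01-01). The following sketch reproduces that arithmetic with Python's standard library only. It does not call pandas, and all variable names are illustrative.

```python
from datetime import datetime, timedelta

INT64_MAX = 2**63 - 1  # largest signed 64-bit integer
EPOCH = datetime(1970, 1, 1)

# Nanosecond resolution: express the int64 limit in microseconds,
# the finest unit that datetime.timedelta supports.
ns_limit = timedelta(microseconds=INT64_MAX // 1000)
earliest_ns = EPOCH - ns_limit
latest_ns = EPOCH + ns_limit
print(earliest_ns.year, latest_ns.year)

# Microsecond resolution: same integer range, but each step is 1000x longer.
SECONDS_PER_YEAR = 365.2425 * 86400  # average Gregorian year
us_span_years = INT64_MAX / 1_000_000 / SECONDS_PER_YEAR
print(round(us_span_years))
```

With nanoseconds the representable range ends in 1677 and 2262; with microseconds it extends roughly 292,000 years in either direction.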
More Efficient Data Analysis
pandas 3.0 now infers string columns as the dedicated str data type by default instead of the NumPy object dtype. If the PyArrow library is installed, pandas uses it to store these columns. The change is intended to improve performance and to handle text more efficiently than a column of arbitrary Python objects. The following example shows the difference:
# Old behavior (pandas < 3.0)
>>> import pandas as pd
>>> ser = pd.Series(["a", "b"])
>>> ser
0    a
1    b
dtype: object  # <-- NumPy object dtype

# New behavior (pandas 3.0)
>>> ser = pd.Series(["a", "b"])
>>> ser
0    a
1    b
dtype: str  # <-- new string dtype
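Code that tests for string columns by comparing the dtype with object will behave differently after the upgrade. As a minimal sketch of a check that should work before and after the switch, the snippet below uses is_string_dtype from pandas.api.types; the variable names are illustrative, and the printed dtype depends on the installed pandas version.

```python
import pandas as pd
from pandas.api.types import is_string_dtype

ser = pd.Series(["a", "b"])

# Fragile: only true while strings are stored with object dtype (pandas < 3.0).
is_object = ser.dtype == object

# More portable: accepts object-backed string columns and the new str dtype.
is_text = is_string_dtype(ser)

# The .str accessor works the same way with either dtype.
upper = ser.str.upper().tolist()

print(pd.__version__, ser.dtype, is_object, is_text, upper)
```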
With pandas 3.0, Copy-on-Write (CoW) is now the default behavior. This means that every result of an indexing operation behaves like a copy, so changes to the result no longer affect the original DataFrame.
Because chained assignment never updates the original object under CoW, the SettingWithCopyWarning is obsolete. Defensive copy() calls that only served to suppress this warning are no longer needed, which also improves performance.
# Old behavior (pandas < 3.0) - chained assignment
df["foo"][df["bar"] > 5] = 100  # This might modify df (unpredictable)
# New behavior (pandas 3.0) - must do the modification in one step (e.g. with .loc)
df.loc[df["bar"] > 5, "foo"] = 100
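A self-contained version of the single-step pattern might look like the sketch below. The DataFrame and its column names foo and bar are made up for illustration. The second half shows a column pulled out of the frame and then modified; whether that write reaches the original depends on the pandas version, so no output is claimed for it.

```python
import pandas as pd

# Hypothetical sample data, for illustration only.
df = pd.DataFrame({"foo": [1, 2, 3], "bar": [4, 6, 8]})

# One-step assignment: select rows and column in a single .loc call.
df.loc[df["bar"] > 5, "foo"] = 100
result = df["foo"].tolist()
print(result)

# Modify a column that was taken out of the frame.
col = df["bar"]
col.iloc[0] = -1

# pandas 3.0 (CoW): df keeps its original values.
# Older versions without CoW could write through to df here.
print(df["bar"].tolist())
```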
Phased Upgrade Recommended
With the new release, the pandas team has removed a number of deprecated features. It is therefore recommended to upgrade to pandas 2.3 first and make sure that the code runs there without warnings. Only then should you switch to version 3.0.
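One way to surface leftover deprecations while still on pandas 2.3 is to turn the relevant warnings into exceptions. The sketch below uses only Python's warnings module; run_strict and legacy_call are hypothetical names, and the choice of FutureWarning and DeprecationWarning as categories is an assumption about which warnings matter for a given code base.

```python
import warnings


def run_strict(func, *args, **kwargs):
    """Call func and raise if it emits a deprecation-style warning."""
    with warnings.catch_warnings():
        warnings.simplefilter("error", category=FutureWarning)
        warnings.simplefilter("error", category=DeprecationWarning)
        return func(*args, **kwargs)


def legacy_call():
    # Stand-in for code that relies on a deprecated feature.
    warnings.warn("this behavior is deprecated", FutureWarning)
    return "done"


try:
    run_strict(legacy_call)
    flagged = False
except FutureWarning as exc:
    flagged = True
    print(f"needs migration: {exc}")
```

The same effect is available for a whole script from the command line with python -W error::FutureWarning.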
pandas 3.0 can be installed via PyPI with python -m pip install --upgrade pandas==3.0.* or via conda-forge with conda install -c conda-forge pandas=3.0.
The release notes for pandas 3.0.0 describe all changes in detail. Because some of them may require code updates, the developers provide migration guides, including ones for the new string data type and for Copy-on-Write.
(wpl)