Pandas 3.0 brings unified string type and performance optimization

With version 3.0, the Python library introduces a unified string type and improves the copy-on-write method.

By Manuel Masiero

Almost three years after the last major release, version 3.0 of pandas, the Python data analysis library, is now available. Key changes include the dedicated string data type str, an improved copy-on-write method, and a new default resolution for datetime-like data. The latter now defaults to microseconds instead of nanoseconds to avoid out-of-bounds errors for dates before 1678 or after 2262.
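The 1678 and 2262 limits follow from storing timestamps as a signed 64-bit count of nanoseconds since 1970. The following back-of-the-envelope sketch in plain Python (no pandas required) shows roughly how far the range extends with nanosecond and with microsecond ticks. The helper name span_years and the mean year length used here are assumptions made for illustration only.

```python
INT64_MAX = 2**63 - 1
SECONDS_PER_YEAR = 365.2425 * 24 * 60 * 60  # mean Gregorian year (approximation)


def span_years(ticks_per_second: int) -> float:
    """Years representable on each side of 1970 for a given tick rate."""
    return INT64_MAX / ticks_per_second / SECONDS_PER_YEAR


ns_years = span_years(10**9)  # nanosecond ticks
us_years = span_years(10**6)  # microsecond ticks

print(f"nanoseconds:  about {1970 - ns_years:.0f} to {1970 + ns_years:.0f}")
print(f"microseconds: about +/- {us_years:,.0f} years around 1970")
```

With nanosecond ticks the range is only about 292 years in each direction; microsecond ticks widen it by a factor of 1,000.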

pandas 3.0 now infers string columns as the new str data type by default instead of the NumPy object dtype. If the PyArrow library is installed, pandas uses it as the backend for the new type; otherwise it falls back to NumPy object storage. The dedicated type is intended to improve performance and memory efficiency. The following example shows the difference:

# Old behavior (pandas < 3.0)
>>> ser = pd.Series(["a", "b"])
>>> ser
0    a
1    b
dtype: object  # <-- numpy object dtype

# New behavior (pandas 3.0)
>>> ser = pd.Series(["a", "b"])
>>> ser
0    a
1    b
dtype: str  # <-- new string dtype
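Code that detects text columns by comparing the dtype against object will behave differently after the switch. Here is a minimal sketch of a check that should give the same answer before and after the upgrade, assuming the existing helper pandas.api.types.is_string_dtype fits your use case (the sample data is made up):

```python
import pandas as pd
from pandas.api.types import is_string_dtype

ser = pd.Series(["a", "b"])

# The printed dtype name depends on the installed pandas version,
# so test for "holds strings" instead of comparing against object.
holds_strings = is_string_dtype(ser)

# The .str accessor is available for both the old and the new dtype.
upper = ser.str.upper().tolist()

print(pd.__version__, ser.dtype, holds_strings, upper)
```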

With pandas 3.0, Copy-on-Write (CoW) is now the default behavior. This means that the result of every indexing operation behaves like a copy, so changes to the result do not affect the original DataFrame.

Because chained assignment no longer works, the SettingWithCopyWarning is obsolete. Defensive copy() calls that only served to suppress this warning are no longer needed, which also improves performance.

# Old behavior (pandas < 3.0) - chained assignment
df["foo"][df["bar"] > 5] = 100  # This might modify df (unpredictable)

# New behavior (pandas 3.0) - must do the modification in one step (e.g. with .loc)
df.loc[df["bar"] > 5, "foo"] = 100
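The effect of Copy-on-Write can be tried on a small, made-up DataFrame. The sketch below assumes that on pandas versions before 3.0 the mode.copy_on_write option has to be switched on explicitly. On 3.0 the option is left untouched because CoW is already the default there.

```python
import pandas as pd

# Assumption: before pandas 3.0, Copy-on-Write is opt-in via this option.
if int(pd.__version__.split(".")[0]) < 3:
    pd.set_option("mode.copy_on_write", True)

df = pd.DataFrame({"foo": [1, 2, 3], "bar": [4, 6, 8]})

# Under CoW the selected column behaves like a copy:
# changing it leaves df untouched.
col = df["foo"]
col.iloc[0] = -1

# Modifying df itself is done in one step through .loc.
df.loc[df["bar"] > 5, "foo"] = 100

print(col.tolist())
print(df["foo"].tolist())
```

The column object and the DataFrame end up with different values, which is the behavior the single-step .loc form makes explicit.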

With the new release, the pandas team has removed a number of deprecated features. It is therefore recommended to upgrade to pandas 2.3 first and make sure that your code runs there without warnings. Only then should you proceed with the switch to version 3.0.
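One way to find such warnings before the switch is to turn them into errors while exercising the code on pandas 2.3. The sketch below uses only the standard warnings module. The helper run_strict and the stand-in function legacy_step are hypothetical names, and the sketch assumes that the relevant deprecations are emitted as FutureWarning or DeprecationWarning.

```python
import warnings


def run_strict(func, *args, **kwargs):
    """Call func and raise on FutureWarning or DeprecationWarning."""
    with warnings.catch_warnings():
        warnings.simplefilter("error", category=FutureWarning)
        warnings.simplefilter("error", category=DeprecationWarning)
        return func(*args, **kwargs)


def legacy_step():
    # Hypothetical stand-in for code that triggers a deprecation warning.
    warnings.warn("this behavior will change", FutureWarning)
    return "done"


try:
    run_strict(legacy_step)
    outcome = "clean"
except FutureWarning as exc:
    outcome = f"needs attention: {exc}"

print(outcome)
```

For a whole script, a similar effect is available without code changes via python -W error::FutureWarning.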

pandas 3.0 can be installed via PyPI with python -m pip install --upgrade pandas==3.0.* or via conda-forge with conda install -c conda-forge pandas=3.0.

The release notes for pandas 3.0.0 describe all changes in detail. Because some of them may require code updates, the developers provide migration guides, including ones for the new string data type and for Copy-on-Write.

(wpl)

This article was originally published in German. It was translated with technical assistance and editorially reviewed before publication.