Using C Libraries in Java 3: Complex Applications, Pitfalls, and Best Practices

The FFM API makes accessing C libraries convenient but also presents challenges. Helper functions and best practices make it manageable.

By
  • Rudolf Ziegaus

Java's Foreign Function & Memory API (FFM) is used to access code in a shared library or DLL written in a programming language like C or Rust. However, the code must meet certain prerequisites.

Rudolf Ziegaus is a software developer, Java trainer, and managing director of IO Software GmbH. His favorite topics are PKI, cryptography, and low-level programming.

This three-part series of articles uses a demo library written in C to show how a Java application calls the library's functions, what preparations are necessary, and what rules to observe.

After the first two parts introduced the most important terms and techniques for accessing native code from Java via the FFM API, this third and final part covers some special considerations.

Some applications need only part of a MemorySegment, for example when they receive a list but are interested in only a few of its elements. In such cases, it is convenient to create a view over part of the memory area instead of always working with the complete segment. This is precisely what the asSlice method does: it returns a new segment that covers the selected range of the original one.

For instance, if a MemorySegment is 64 bytes long, the last 16 bytes could be retrieved as follows:

MemorySegment segment = arena.allocate(64);
MemorySegment info = segment.asSlice(48, 16);

Here, segment covers the entire memory area. The asSlice() method returns the 16 bytes starting at byte offset 48.

The data is not copied; instead, a view of the selected area is created. If the content of the area changes (in the example, the info area), the original memory area segment also changes.
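The view semantics can be checked with a small experiment. The following is a minimal sketch, assuming Java 22 or later; the class and method names are made up for illustration. It writes one byte through the slice and reads the same byte back through the original segment.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

// Hypothetical demo class, not part of the article's demo library.
final class SliceViewDemo {

    // Writes one byte through a slice and reads it back through the parent segment.
    static byte writeThroughSlice() {
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment segment = arena.allocate(64);    // 64 zero-initialized bytes
            MemorySegment info = segment.asSlice(48, 16);  // view of bytes 48..63

            // Offset 0 of the slice is offset 48 of the parent segment.
            info.set(ValueLayout.JAVA_BYTE, 0, (byte) 42);
            return segment.get(ValueLayout.JAVA_BYTE, 48);
        }
    }

    public static void main(String[] args) {
        System.out.println(writeThroughSlice());
    }
}
```

Because both segments refer to the same memory, the value written via info is visible via segment without any copying.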

A problem arises if a segment has an incorrect length. When a native function returns a pointer such as void*, the FFM API cannot know how much memory lies behind the address, so the resulting segment has a length of 0. Accessing such a segment triggers an IndexOutOfBoundsException. Therefore, you must first give the segment the correct length, assuming it is known. The reinterpret method does this by returning a segment with the same address and the specified size.

In the following example, the native function getMemory() returns a void* pointer to a memory area. It is also known that the memory area is 100 bytes in size. You can then access the result as follows:

MemorySegment segment = (MemorySegment) method.invoke();
MemorySegment value = segment.reinterpret(100);
String result = value.getString(0);
System.out.println("Result getMemory:" + result);
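The effect can also be reproduced without the demo library. The following is a minimal sketch, assuming Java 22 or later; it uses MemorySegment.ofAddress() as a stand-in for a pointer returned by native code, and the class name is made up for illustration. Since reinterpret is a restricted method, the JVM may print a warning unless native access is enabled, for example with --enable-native-access=ALL-UNNAMED.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;

// Hypothetical demo class; ofAddress() simulates a void* returned by a C function.
final class ReinterpretDemo {

    static String readViaReinterpret() {
        try (Arena arena = Arena.ofConfined()) {
            // A NUL-terminated UTF-8 string in native memory (6 bytes for "hello").
            MemorySegment original = arena.allocateFrom("hello");

            // A segment created from a raw address has length 0, like a returned pointer.
            MemorySegment raw = MemorySegment.ofAddress(original.address());
            if (raw.byteSize() != 0) {
                throw new IllegalStateException("expected a zero-length segment");
            }

            // The size is known here, so the segment can be given its real length.
            MemorySegment sized = raw.reinterpret(original.byteSize());
            return sized.getString(0);
        }
    }

    public static void main(String[] args) {
        System.out.println(readViaReinterpret());
    }
}
```

The JVM cannot verify the size passed to reinterpret. A value that is too large allows reads and writes beyond the real memory area, which can crash the process.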

It is often unclear how many bytes a C data type occupies on a specific platform. The following code queries this information for the platform in use: it prints the names of all canonical C types known to the linker, followed by the size in bytes and the alignment of the long data type:

// Assumes a field: private final Linker linker = Linker.nativeLinker();
// Imports: java.lang.foreign.Linker, java.lang.foreign.MemoryLayout, java.util.Map
public void printTypeInfos()
{
  // Maps canonical C type names to the layouts used on the current platform
  Map<String, MemoryLayout> typeInfos = linker.canonicalLayouts();
  System.out.println("Canonical layout keys: " +
                     typeInfos.keySet());
  printTypeInfo(typeInfos, "long");
}

private void printTypeInfo(Map<String, MemoryLayout> typeInfos,
                           String type)
{
  MemoryLayout typeLayout = typeInfos.get(type);
  if (typeLayout != null)
  {
    System.out.println("C '" + type + "' layout: " + typeLayout +
                       ", size=" + typeLayout.byteSize() +
                       ", align=" + typeLayout.byteAlignment());
  }
  else
  {
    System.out.println("Data type '" + type +
                       "' is not contained in canonicalLayouts()");
  }
}

If you want to use a library across multiple operating systems, start with one operating system and, after successful implementation, check if the functions used can also be easily employed on other operating systems.

For example, in my project to access a Hardware Security Module (HSM), I found that portability can be very problematic because it depends on the available drivers and shared libraries. Accessing an HSM via OpenSC was not an issue under Linux, while under Windows, some functions could not be used at all and caused an access violation in the JVM.

If there are significant differences between platforms, one approach is to abstract the functions in a common base class and then implement them differently for each platform in derived classes.
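Such an abstraction could look roughly like the following. This is only a sketch under simplifying assumptions: all class and method names are hypothetical, only Windows and 64-bit Linux are distinguished, and the two platform-specific details shown (library file name and layout of C's long) stand in for whatever actually differs in a real project.

```java
import java.lang.foreign.ValueLayout;
import java.util.Locale;

// Hypothetical base class that hides platform differences from the rest of the code.
abstract class NativeBackend {

    // File name of the shared library for a given base name.
    abstract String libraryFileName(String baseName);

    // Layout to use for C's long on this platform.
    abstract ValueLayout cLong();

    // Simplification: everything that is not Windows is treated as 64-bit Linux.
    static NativeBackend forOs(String osName) {
        boolean windows = osName.toLowerCase(Locale.ROOT).startsWith("windows");
        return windows ? new Windows() : new Linux();
    }

    static final class Linux extends NativeBackend {
        @Override String libraryFileName(String baseName) { return "lib" + baseName + ".so"; }
        @Override ValueLayout cLong() { return ValueLayout.JAVA_LONG; } // 8 bytes
    }

    static final class Windows extends NativeBackend {
        @Override String libraryFileName(String baseName) { return baseName + ".dll"; }
        @Override ValueLayout cLong() { return ValueLayout.JAVA_INT; }  // 4 bytes
    }

    public static void main(String[] args) {
        NativeBackend backend = forOs(System.getProperty("os.name"));
        System.out.println(backend.libraryFileName("demo"));
    }
}
```

Callers work only with NativeBackend, so a function that is unusable on one platform can be replaced or disabled in the corresponding subclass without touching the calling code.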

Potential issues when accessing across platforms include:

  • different sizes of data types,
  • different sizes of structures,
  • different alignment of elements within a structure.

In such cases, access to the shared library's source code is helpful. If that's not possible, certain information about the size of data types on the target platform can be obtained using the printTypeInfos() method.

During my work with the Foreign Function & Memory API, several best practices have emerged. It is advisable to start with simple libraries and functions before tackling larger libraries or more complex functions.

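Because MethodHandle.invoke() is declared to throw Throwable, every native call has to handle it. It is advisable to catch Throwable at the call site and convert it into custom exceptions derived from RuntimeException.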
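One way to centralize this conversion is a small helper through which all calls pass. The following is a minimal sketch; NativeCalls and NativeCallException are hypothetical names, and the helper accepts any MethodHandle, so it works for downcall handles as well as for the plain handles used in main.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;

// Hypothetical helper class for invoking method handles without checked Throwable.
final class NativeCalls {

    // Custom unchecked exception that carries the original failure as its cause.
    static final class NativeCallException extends RuntimeException {
        NativeCallException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    // Invokes the handle and converts anything it throws into a NativeCallException.
    static Object call(String name, MethodHandle handle, Object... args) {
        try {
            return handle.invokeWithArguments(args);
        } catch (Throwable t) {
            throw new NativeCallException("Call to '" + name + "' failed", t);
        }
    }

    public static void main(String[] args) {
        // A handle that simply returns its int argument.
        System.out.println(call("identity", MethodHandles.identity(int.class), 7));

        // A handle that throws the exception passed to it.
        MethodHandle failing = MethodHandles.throwException(int.class, Exception.class);
        try {
            call("failing", failing, new Exception("boom"));
        } catch (NativeCallException e) {
            System.out.println(e.getMessage() + ": " + e.getCause().getMessage());
        }
    }
}
```

invokeWithArguments is slower than invokeExact. For frequently called functions, a dedicated wrapper method per function with its own try/catch around invokeExact may be the better choice.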

If you need the same functions from the library multiple times, cache Method Handles in a Map to avoid repeatedly retrieving the same information. Always pass structures by address (ValueLayout.ADDRESS) and map structures as their own classes to keep the code clear.
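A cache for method handles might be sketched as follows, assuming Java 22 or later on a 64-bit platform. HandleCache is a hypothetical class; as an example function it uses strlen from the C standard library, which the linker's default lookup usually provides, instead of a function from the demo library.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.FunctionDescriptor;
import java.lang.foreign.Linker;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.SymbolLookup;
import java.lang.foreign.ValueLayout;
import java.lang.invoke.MethodHandle;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical cache: each function is looked up and linked only once.
final class HandleCache {

    // size_t strlen(const char*), assuming a 64-bit size_t
    static final FunctionDescriptor STRLEN =
            FunctionDescriptor.of(ValueLayout.JAVA_LONG, ValueLayout.ADDRESS);

    private final Linker linker = Linker.nativeLinker();
    private final SymbolLookup lookup = linker.defaultLookup();
    private final Map<String, MethodHandle> handles = new ConcurrentHashMap<>();

    // Keyed by name only: assumes each function is always requested with the same descriptor.
    MethodHandle get(String name, FunctionDescriptor descriptor) {
        return handles.computeIfAbsent(name, n -> linker.downcallHandle(
                lookup.find(n).orElseThrow(
                        () -> new IllegalArgumentException("Symbol not found: " + n)),
                descriptor));
    }

    static long strlen(HandleCache cache, String text) {
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment cString = arena.allocateFrom(text); // NUL-terminated copy
            return (long) cache.get("strlen", STRLEN).invokeExact(cString);
        } catch (Throwable t) {
            throw new IllegalStateException("strlen failed", t);
        }
    }

    public static void main(String[] args) {
        System.out.println(strlen(new HandleCache(), "hello"));
    }
}
```

For a library of your own, the default lookup would be replaced by a SymbolLookup for that library.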

When applications need to allocate memory areas, the arena should be as high up in the hierarchy as possible, and the application should create the arena using try-with-resources to ensure automatic memory deallocation.
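The following minimal sketch illustrates this pattern, assuming Java 22 or later; the class and method names are made up. The arena is created once at the top, the helper only borrows it, and all memory allocated from it is released when the try block ends.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

// Hypothetical demo class showing who owns the arena and how long segments live.
final class ArenaScopeDemo {

    // Lower-level helper: allocates from the caller's arena instead of creating its own.
    static MemorySegment newCounter(Arena arena, int initialValue) {
        MemorySegment counter = arena.allocate(ValueLayout.JAVA_INT);
        counter.set(ValueLayout.JAVA_INT, 0, initialValue);
        return counter;
    }

    // Returns { alive inside the try block, alive after the try block }.
    static boolean[] lifetime() {
        MemorySegment counter;
        boolean aliveInside;
        try (Arena arena = Arena.ofConfined()) {   // the top level owns the arena
            counter = newCounter(arena, 7);
            aliveInside = counter.scope().isAlive();
        }                                          // arena closed, memory released
        return new boolean[] { aliveInside, counter.scope().isAlive() };
    }

    public static void main(String[] args) {
        boolean[] result = lifetime();
        System.out.println("inside: " + result[0] + ", after: " + result[1]);
    }
}
```

After the arena is closed, any access to the segment's contents fails with an IllegalStateException instead of touching freed memory.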

It is better to avoid the jextract tool. A manual implementation is easier to understand and, above all, to maintain.

If you have control over the shared library's source code, use data types like int32_t and int64_t to ensure clarity about their byte sizes.
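With fixed-width types, the Java side can describe a structure without guessing. The following is a sketch for a hypothetical C struct that is not part of the article's demo library; it assumes Java 22 or later and an ABI that aligns int64_t to 8 bytes, as the common 64-bit platforms do.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemoryLayout;
import java.lang.foreign.MemoryLayout.PathElement;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.StructLayout;
import java.lang.foreign.ValueLayout;

// Hypothetical mapping of: typedef struct { int32_t id; int64_t timestamp; } record_t;
final class RecordLayout {

    static final StructLayout LAYOUT = MemoryLayout.structLayout(
            ValueLayout.JAVA_INT.withName("id"),           // int32_t, bytes 0..3
            MemoryLayout.paddingLayout(4),                 // padding inserted by the C compiler
            ValueLayout.JAVA_LONG.withName("timestamp"));  // int64_t, bytes 8..15

    static long timestampOffset() {
        return LAYOUT.byteOffset(PathElement.groupElement("timestamp"));
    }

    // Writes a timestamp into a freshly allocated struct and reads it back.
    static long roundTrip(long timestamp) {
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment record = arena.allocate(LAYOUT);
            record.set(ValueLayout.JAVA_LONG, timestampOffset(), timestamp);
            return record.get(ValueLayout.JAVA_LONG, timestampOffset());
        }
    }

    public static void main(String[] args) {
        System.out.println("size=" + LAYOUT.byteSize() + ", timestamp offset=" + timestampOffset());
    }
}
```

The explicit padding makes the alignment rule visible: without it, timestamp would sit at offset 4, and structLayout would reject the layout because an 8-byte-aligned long cannot start there.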

For larger projects, it may be advisable to implement a base layer for accessing C functions and then add a Java access layer on top, which should contain no FFM-specific details. For very extensive projects, an additional convenience layer that encapsulates the most important use cases is recommended.

When searching for the causes of errors in the interaction between Java and C with the Foreign Function & Memory API, a few questions can help:

  • Is the correct library path specified?
  • Is it the correct shared library (32-bit or 64-bit)?
  • Can the shared library be loaded at all?
  • Do the function names match?
  • Do the parameters (number and data types) and the return value match?
  • Regarding data types: Do the sizes match? In particular, the long data type in C is critical, as its size varies across platforms: on 64-bit Windows it is 4 bytes and must be treated as ValueLayout.JAVA_INT, while on 64-bit Linux it is 8 bytes and corresponds to ValueLayout.JAVA_LONG.
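For the last question, the size of long does not have to be hard-coded per operating system. The following minimal sketch, assuming Java 22 or later and a hypothetical class name, asks the linker for the canonical layout of C's long and derives the matching Java layout from its size.

```java
import java.lang.foreign.Linker;
import java.lang.foreign.MemoryLayout;

// Hypothetical demo class that inspects C's long on the current platform.
final class CLongDemo {

    // Canonical layout of C's long as reported by the native linker.
    static MemoryLayout cLong() {
        return Linker.nativeLinker().canonicalLayouts().get("long");
    }

    // Name of the ValueLayout constant that matches the size of C's long here.
    static String matchingJavaLayout() {
        return cLong().byteSize() == 4 ? "ValueLayout.JAVA_INT" : "ValueLayout.JAVA_LONG";
    }

    public static void main(String[] args) {
        System.out.println("C long: " + cLong().byteSize() + " bytes -> " + matchingJavaLayout());
    }
}
```

The layout returned by cLong() can be used directly in a FunctionDescriptor, so the same descriptor definition works on both platforms.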

This article was originally published in German. It was translated with technical assistance and editorially reviewed before publication.