Java Bytecode Normalization for Code Similarity Analysis (ECOOP 2024 - Technical Papers)

Who

Stefan Schott, Serena Elisa Ponta, Wolfram Fischer, Jonas Klauke, Eric Bodden

Track

ECOOP 2024 Technical Papers

Time Zone

The program is currently displayed in (GMT+02:00) Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna.

The GMT offsets shown reflect the offsets at the moment of the conference.

When

Wed 18 Sep 2024 11:00 - 11:15 at EI 2 Pichelmayer - Synthesis and verification Chair(s): Peter Thiemann

Abstract

Analyzing the similarity of two code fragments has many applications, including code clone, vulnerability, and plagiarism detection. Most existing approaches for similarity analysis work on source code. However, in scenarios such as plagiarism detection, copyright violation detection, or Software Bill of Materials creation, source code is often not available, and similarity analysis therefore has to be performed on binary formats. Java bytecode is a binary format that is executable by the Java Virtual Machine and obtained by compiling Java source code. Performing similarity detection on bytecode is challenging because different compilers can compile the same source code to syntactically vastly different bytecode. In this work, we assess to what extent one can nonetheless enable similarity detection through bytecode normalization, a procedure that transforms Java bytecode into a representation that is identical for the same original source code, irrespective of the Java compiler and Java version used during compilation. Our manual study revealed 16 classes of compilation differences that various compilation environments may induce. Based on these findings, we implemented bytecode normalization in a tool, jNorm. It uses Jimple as its intermediate representation, applies common code optimizations, and transforms all classes of compilation differences to a normalized form, thus achieving a representation of the bytecode that is identical despite different compilation environments. Our evaluation, performed on more than 300 popular Java projects, shows that merely incrementing the compiler version may cause differences in 46% of all resulting bytecode files. Applying bytecode normalization removes more than 99% of these differences, making it a crucial enabler for subsequent applications of bytecode similarity analysis.
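
To make the idea of normalization concrete, here is a minimal toy sketch in Python. It is not jNorm's implementation and does not use jNorm's or Soot's API. It assumes a simplified, Jimple-like textual form in which two hypothetical compilation environments differ only in the names of generated temporaries; the statement lists, the `$`-prefixed naming pattern, and the functions `normalize` and `fingerprint` are all invented for illustration. The sketch renames temporaries canonically in order of first appearance and then compares fingerprints.

```python
import hashlib
import re

# Assumed pattern for compiler-generated temporaries in a Jimple-like
# textual form (e.g. "$stack3", "$r1"). This is an illustrative assumption,
# not a statement about how any real tool identifies locals.
_TEMP = re.compile(r"\$[A-Za-z_]\w*")


def normalize(statements):
    """Rename temporaries to $v0, $v1, ... in order of first appearance."""
    names = {}

    def rename(match):
        # Reuse the canonical name if this temporary was seen before,
        # otherwise assign the next free one.
        return names.setdefault(match.group(0), f"$v{len(names)}")

    return [_TEMP.sub(rename, s.strip()) for s in statements]


def fingerprint(statements):
    """Hash a statement list so two method bodies can be compared cheaply."""
    return hashlib.sha256("\n".join(statements).encode("utf-8")).hexdigest()


# Two hypothetical renderings of the same method body that differ only in
# the names chosen for temporaries.
compiled_a = [
    "$stack2 = new java.lang.StringBuilder",
    "specialinvoke $stack2.<init>()",
    "$stack3 = virtualinvoke $stack2.append(name)",
    "return $stack3",
]
compiled_b = [
    "$r1 = new java.lang.StringBuilder",
    "specialinvoke $r1.<init>()",
    "$r4 = virtualinvoke $r1.append(name)",
    "return $r4",
]

raw_equal = fingerprint(compiled_a) == fingerprint(compiled_b)
normalized_equal = (
    fingerprint(normalize(compiled_a)) == fingerprint(normalize(compiled_b))
)
print("equal before normalization:", raw_equal)
print("equal after normalization:", normalized_equal)
```

Real compilation differences are structural as well as cosmetic, so a practical normalizer needs one transformation per class of difference; the sketch covers only a single, deliberately simple case.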

Stefan Schott

Heinz Nixdorf Institut, Paderborn University

Germany

Serena Elisa Ponta

SAP Security Research

Wolfram Fischer

SAP Security Research

Jonas Klauke

Heinz Nixdorf Institut, Paderborn University

Germany

Eric Bodden

Heinz Nixdorf Institut, Paderborn University and Fraunhofer IEM

Germany

Session Program

Wed 18 Sep
Displayed time zone: Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna

10:30 - 12:00	Synthesis and verification (Technical Papers) at EI 2 Pichelmayer. Chair(s): Peter Thiemann (University of Freiburg, Germany)

10:30 15m Talk		Inductive Predicate Synthesis Modulo Programs (Technical Papers). Scott Wesley (Dalhousie University); Maria Christakis (TU Wien); Jorge A. Navas (Certora); Richard Trefler (University of Waterloo); Valentin Wüstholz (ConsenSys); Arie Gurfinkel (University of Waterloo)
10:45 15m Talk		Fearless Asynchronous Communications with Timed Multiparty Session Protocols (Technical Papers). Ping Hou (University of Oxford); Nicolas Lagaillardie (Imperial College London); Nobuko Yoshida (University of Oxford)
11:00 15m Talk		Java Bytecode Normalization for Code Similarity Analysis (Technical Papers). Stefan Schott (Heinz Nixdorf Institut, Paderborn University); Serena Elisa Ponta (SAP Security Research); Wolfram Fischer (SAP Security Research); Jonas Klauke (Heinz Nixdorf Institut, Paderborn University); Eric Bodden (Heinz Nixdorf Institut, Paderborn University and Fraunhofer IEM)
11:30 15m Talk		Higher-Order Specifications for Deductive Synthesis of Programs with Pointers (Technical Papers). David Young (University of Kansas, USA); Ziyi Yang (National University of Singapore); Ilya Sergey (National University of Singapore); Alex Potanin (Australian National University)
11:45 15m Talk		Matching Plans for Frame Inference in Compositional Reasoning (Technical Papers). Andreas Lööw (Imperial College London); Daniele Nantes-Sobrinho (Imperial College London); Sacha-Élie Ayoun (Imperial College London); Petar Maksimović (Imperial College London, UK); Philippa Gardner (Imperial College London)

Information for Participants

Wed 18 Sep 2024 10:30 - 12:00 at EI 2 Pichelmayer - Synthesis and verification Chair(s): Peter Thiemann

Info for room EI 2 Pichelmayer:

Map: https://tuw-maps.tuwien.ac.at/?q=CF0235

Room tech: https://raumkatalog.tiss.tuwien.ac.at/room/15717