GSoC 2021 @ Julia

Implement Escape Analysis in Julia Compiler

Xuanda Yang
University of California San Diego
th3charlie at gmail dot com
GitHub Profile
GSoC Project Page

Introduction

This is the summary page of my GSoC 2021 project, Implement Escape Analysis in Julia Compiler, mentored by Jameson Nash and Shuhei Kadowaki from the Julia language.

Motivation and Problem Description

In this project, we want to implement escape analysis inside the Julia compiler. Escape analysis is a classic compiler technique that determines the "escape states" of the variables of interest. Precisely analyzed escape states provide rich context for optimizations. Several well-known language runtimes have deployed escape analysis to boost performance, including V8 for JavaScript and the JVM for Java.

Julia is popular for combining strong dynamism with high performance, thanks to its powerful just-in-time compiler. While the existing compiler performs well in most cases, escape analysis opens up new optimization opportunities specific to the Julia language. Implementing this analysis in the Julia compiler involves two main objectives: the first is to perform a precise analysis, and the second is to utilize the analyzed information to carry out optimizations.

This blog post is therefore organized around these two topics: analysis and optimization. I will discuss the design choices we made along the way, as I find them as important as the implementation itself, especially when designing from scratch, and I will describe how we implemented the code to realize those designs. For each topic, I will provide the associated GitHub links so interested readers can access them directly. We will also discuss the remaining work and future directions based on this project, and we will end with a short summary.

Analysis

Looking back, designing the escape analysis as a Julia compiler pass was the most challenging part of the whole project. The complexity comes from the rich design choices we could make:

Starting with the analysis targets: I originally thought it would be easy and straightforward to track and analyze only the heap-allocated objects, since they are the targets of the subsequent optimization pass. However, this turned out to be impractical, because the escape state of an object depends on all the intermediate objects and values the program touches before reaching it. Tracking only the heap-allocated objects would require a lot of re-analysis, since we would fail to store those intermediate results, and it would also make inter-procedural analysis difficult. To solve this, we assign an escape state to every statement; since we operate on Julia's SSA IR, this ensures every value has a state. No re-analysis is needed when a new object comes under consideration: all objects are analyzed and ready once a function has been fully analyzed.

The next question is what kind of analysis we are looking for. Dataflow analyses are usually classified as forward or backward based on their direction, and as flow-insensitive or flow-sensitive based on whether they take control flow into account. Returning from a function is one of the most common ways for an object to escape, and returns usually happen near the end of a function. This first impression motivated us to design the analysis in the backward direction, which proved beneficial because we can propagate the state all the way back to the function arguments. As for flow sensitivity, we started with a flow-insensitive version to simplify the implementation at the cost of some precision. So our first-version algorithm is an iterative, backward, flow-insensitive algorithm that stops when every object has been assigned a converged state.
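To make the idea concrete, here is a deliberately tiny, hypothetical sketch of such an iterative backward analysis. The lattice, the `uses` encoding, and all names below are made up for illustration; the real implementation in EscapeAnalysis.jl tracks far more information (e.g. per-field escape states):

```julia
# Toy escape lattice: NoEscape < ReturnEscape < AllEscape, and the join of
# two states is simply the "more escaped" one.
@enum EscapeState NoEscape ReturnEscape AllEscape
escjoin(a::EscapeState, b::EscapeState) = max(a, b)

# uses[i] lists the SSA values that statement i uses. If statement i's result
# escapes, we conservatively propagate its state backward to those values,
# iterating until no state changes (a fixed point).
function analyze(nstmts::Int, uses::Vector{Vector{Int}}, returned::Vector{Int})
    states = fill(NoEscape, nstmts)
    for r in returned
        states[r] = ReturnEscape      # a returned value escapes via the return
    end
    changed = true
    while changed
        changed = false
        for i in nstmts:-1:1          # walk the statements backward
            for u in uses[i]
                s = escjoin(states[u], states[i])
                s == states[u] && continue
                states[u] = s
                changed = true
            end
        end
    end
    return states
end

# Toy program:  %1 = %new(MyRef, "foo");  %2 = identity(%1);  return %2
analyze(3, [Int[], [1], [2]], [3])   # all three values end up as ReturnEscape
```

The real analysis is more precise than this sketch: because it tracks object fields, returning `x.x` from a callee does not force `x` itself to escape.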

After several rounds of prototyping, trial, and error, we eventually reached the point where EscapeAnalysis.jl was born.

Working in a standalone package significantly reduces the complexity of development, as it operates separately from the internals of Julia's codebase. With this package, we quickly developed the basic functionality for analyzing Julia programs. The package was first developed by Shuhei, and I contributed a number of follow-up PRs.

To be more concrete, let us analyze an inter-procedural example using the EscapeAnalysis.jl package. In function f, we first create a mutable object that will be allocated on the heap, and we pass it to another function getx. From the function definitions we can tell that although getx escapes its argument x by returning its field, the heap-allocated object r never escapes from f, and we expect our analysis to discover this fact.

julia> mutable struct MyRef{T}; x::T; end

julia> @noinline getx(x::MyRef) = x.x
getx (generic function with 1 method)

julia> function f()
           r = MyRef("foo")
           x = getx(r)
           return sizeof(x)
       end
f (generic function with 1 method)

julia> using EscapeAnalysis

julia> @analyze_escapes f()
typeof(f)(_2::String ◌, _3::MyRef{String} ◌)
1 ✓ 1 ─ %1 = %new(MyRef{String}, "foo")::MyRef{String}
2 ◌ │   %2 = invoke Main.getx(%1::MyRef{String})::String
3 ◌ │   %3 = Core.sizeof::typeof(Core.sizeof)
4 ◌ │   %4 = (%3)(%2)::Int64
5 ◌ └──      return %4

By using the macro @analyze_escapes provided by the package, we get a formatted output of the code info in which each statement is annotated with its own escape state (✓ and ◌ in this case). The ✓ on %1 indicates that our analysis knows the heap-allocated object does not escape from the function, which provides the opportunity for the heap-to-stack optimization described later.

Optimization

In this project, we are interested in two kinds of optimizations enabled by escape analysis: heap-to-stack optimization and finalizer elision. Heap-to-stack optimization applies when a heap-allocated object is marked as not escaping its surrounding scope: we can then safely convert the heap allocation to a stack allocation, eliminating GC operations. Finalizer elision applies when an object with an associated finalizer is marked as not escaping its surrounding scope: we can then call the finalizer early, executing its code without having to wait for the GC to do it.
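To illustrate finalizer elision, here is a hand-written sketch of what the optimization would effectively do. The `Resource` type and `use_resource` function are hypothetical; `finalizer` and `finalize` are the standard Julia functions. Since `r` never escapes `use_resource`, the compiler could insert the equivalent of the explicit `finalize(r)` call below, running the cleanup immediately instead of at some later GC cycle:

```julia
mutable struct Resource
    handle::Int
end

function use_resource()
    r = Resource(42)
    finalizer(r) do obj
        println("releasing handle ", obj.handle)   # cleanup to run when r dies
    end
    result = r.handle * 2
    # `r` does not escape, so the optimization could insert this eager call,
    # which runs the registered finalizer right away:
    finalize(r)
    return result
end

use_resource()   # prints the message and returns 84
```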

For the heap-to-stack optimization, we make the most of the existing infrastructure in the Julia compiler. After the analysis reaches a converged state for a function, it performs an extra linear scan that marks every non-escaping heap allocation with a bit flag. This happens entirely on the Julia side of the compiler. The codegen part then checks each Julia allocation to see whether the no-escape flag has been set; if so, it passes the information along by turning the bit flag into LLVM metadata, ensuring the intermediate LLVM passes keep it intact. Finally, within the existing llvm-alloc-opt (LLVM allocation optimization) pass, we move the object to the stack by replacing the call to Julia's GC allocation function with a single LLVM `alloca` instruction, also rewriting the subsequent instructions to keep the LLVM IR correct.
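The Julia-side marking step can be sketched roughly as follows. The flag constant, the `isallocation` helper, and the string-based IR here are all made up for illustration; the real flag lives on the compiler's internal IR statements:

```julia
# Hypothetical flag bit that codegen would later translate into LLVM metadata
const IR_FLAG_NOESCAPE = UInt8(1) << 0

# Toy stand-in for the real "is this statement a heap allocation?" check
isallocation(stmt::String) = startswith(stmt, "%new")

# Linear scan after the analysis has converged: tag every allocation whose
# converged escape state is NoEscape.
function mark_noescape(stmts::Vector{String}, states::Vector{Symbol})
    flags = zeros(UInt8, length(stmts))
    for i in eachindex(stmts)
        if isallocation(stmts[i]) && states[i] === :NoEscape
            flags[i] |= IR_FLAG_NOESCAPE
        end
    end
    return flags
end

stmts  = ["%new(MyRef{String}, \"foo\")", "invoke getx(%1)", "return %2"]
states = [:NoEscape, :NoEscape, :NoEscape]
mark_noescape(stmts, states)   # only %1 (the allocation) receives the flag
```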

The optimization code relies on the LLVM codegen system inside the Julia compiler, so we don't separate it out as a package. Instead, we ported the analysis code from EscapeAnalysis.jl into Core.Compiler so that it works alongside the optimization code. Eventually, the whole escape analysis will be merged into the Julia compiler, and that future porting work will happen in the same spirit.

Continuing with the previous code example, now correctly handled by our analysis, we show how the optimization part converts the object from heap to stack. First, here is the LLVM code produced before our change. The key instruction is %9, which calls Julia's GC allocation function to allocate the heap object.

julia> @code_llvm f()
;  @ REPL[3]:1 within `f`
; Function Attrs: sspstrong
define i64 @julia_f_121() #0 {
top:
  %0 = alloca {}*, align 8
  %gcframe7 = alloca [3 x {}*], align 16
  %gcframe7.sub = getelementptr inbounds [3 x {}*], [3 x {}*]* %gcframe7, i64 0, i64 0
  %1 = bitcast [3 x {}*]* %gcframe7 to i8*
  call void @llvm.memset.p0i8.i32(i8* nonnull align 16 dereferenceable(24) %1, i8 0, i32 24, i1 false)
  %2 = call {}*** inttoptr (i64 140735013215221 to {}*** (i64)*)(i64 259) #7
;  @ REPL[3]:2 within `f`
; ┌ @ REPL[1]:1 within `MyRef` @ REPL[1]:1
    %3 = bitcast [3 x {}*]* %gcframe7 to i64*
    store i64 4, i64* %3, align 16
    %4 = load {}**, {}*** %2, align 8
    %5 = getelementptr inbounds [3 x {}*], [3 x {}*]* %gcframe7, i64 0, i64 1
    %6 = bitcast {}** %5 to {}***
    store {}** %4, {}*** %6, align 8
    %7 = bitcast {}*** %2 to {}***
    store {}** %gcframe7.sub, {}*** %7, align 8
    %ptls_field3 = getelementptr inbounds {}**, {}*** %2, i64 2305843009213693954
    %8 = bitcast {}*** %ptls_field3 to i8**
    %ptls_load45 = load i8*, i8** %8, align 8
    %9 = call noalias nonnull {}* @jl_gc_pool_alloc(i8* %ptls_load45, i32 1392, i32 16) #2
    %10 = bitcast {}* %9 to i64*
    %11 = getelementptr inbounds i64, i64* %10, i64 -1
    store atomic i64 4569682800, i64* %11 unordered, align 8
    %12 = bitcast {}* %9 to {}**
    store {}* inttoptr (i64 4556792584 to {}*), {}** %12, align 8
    %13 = getelementptr inbounds [3 x {}*], [3 x {}*]* %gcframe7, i64 0, i64 2
    store {}* %9, {}** %13, align 16
; └
;  @ REPL[3]:3 within `f`
  store {}* %9, {}** %0, align 8
  %14 = call nonnull {}* @j1_getx_123({}* inttoptr (i64 4553126288 to {}*), {}** nonnull %0, i32 1)
;  @ REPL[3]:4 within `f`
; ┌ @ Base.jl:179 within `sizeof`
    %15 = bitcast {}* %14 to i64*
    %16 = load i64, i64* %15, align 8
    %17 = load {}*, {}** %5, align 8
    %18 = bitcast {}*** %2 to {}**
    store {}* %17, {}** %18, align 8
; └
  ret i64 %16
}

After applying our change, the newly generated LLVM code is shown below. Notice that the call to @jl_gc_pool_alloc and all of its surrounding GC frame bookkeeping are gone; only cheap stack storage remains. This means our optimization pass successfully utilizes the analyzed facts to carry out the heap-to-stack optimization.

julia> @code_llvm f()
;  @ REPL[3]:1 within `f`
; Function Attrs: sspstrong
define i64 @julia_f_121() #0 {
top:
  %0 = alloca {}*, align 8
;  @ REPL[3]:3 within `f`
  store {}* null, {}** %0, align 8
  %1 = call nonnull {}* @j1_getx_123({}* inttoptr (i64 4626346472 to {}*), {}** nonnull %0, i32 1)
;  @ REPL[3]:4 within `f`
; ┌ @ Base.jl:179 within `sizeof`
    %2 = bitcast {}* %1 to i64*
    %3 = load i64, i64* %2, align 8
; └
  ret i64 %3
}

We leave the finalizer elision optimization as future work after the project ends.

Future Work

We believe this project demonstrates the potential of escape analysis in Julia, and we are confident it will become part of the Julia compiler in the near future. Several TODO items remain after GSoC ends.

The first is to improve the precision of the analysis. We started with a very conservative analysis that marks many objects as escaping even when they do not. Improving the precision yields better escape states and therefore improves the quality of the subsequent optimizations.

The second is to implement the finalizer elision optimization. Our analysis has since changed from flow-insensitive to flow-sensitive thanks to recent improvements by Shuhei. Flow sensitivity provides an escape state for each object at each program point, so once we spot qualifying objects with associated finalizers, we can carry out the optimization. Unlike the heap-to-stack optimization, this one would not require modifying the LLVM codegen.

The third is to merge our prototype into Julia's main development branch so it becomes a de facto part of the compiler. This requires significant engineering effort to verify that our analysis and optimizations are sound and demonstrably effective.

Summary

In this GSoC project, we designed and implemented a prototype of escape analysis and its associated optimizations in the Julia compiler. The results and feedback from community members so far have been promising. We believe this work will bring real benefits to the Julia compiler, and we will continue working on it to achieve that goal.

Acknowledgements

I would like to thank every individual and organization that made my second GSoC a fantastic experience. I would first like to thank Google and the Julia programming language community for providing me this precious opportunity.

It's hard to find words to describe my gratitude towards my two mentors Jameson Nash and Shuhei Kadowaki. I feel blessed to have them as my mentors.

Jameson is the senior member of our three-person team and provided valuable ideas when we were discussing the design choices. He knows every detail of the Julia compiler and gave super detailed guidance while I was working on the optimization pass. I still remember him teaching me how to use LLDB to fix a segfault, virtually hand in hand, over a Zoom call.

Shuhei is around my age, yet he is much smarter than me and writes high-quality code remarkably fast. Without his help, I think I would still be trapped in the swamp of designing the prototype from the first half of the project. He was also super patient, holding many one-on-one video calls to help me get familiar with the tools and the code.

I also want to thank our unofficial mentors Julian Samaroo and Valentin Churavy for joining our meetings and providing useful advice. And I want to thank my parents for supporting me in working on this project and for tolerating the noise of my keyboard at night.

Finally, I'd like to thank Miku Kanemura for being my lighthouse through the difficult times; it's good to have you around.