This post includes all the weekly reports on my GSoC 2020 project with mypy, starting from the day coding started (in my case, May 22nd, after discussion with my mentors). Each weekly report records significant progress, findings, and problems from that week.
Some useful links:
If you have any questions, please feel free to reach TH3CHARLie.
The following posts form a stack: the newest update always comes first.
This week we first introduced GetElementPtr, an op that computes the address of a field of an aggregate type. With this in place, we transformed the builtins.len primitive for lists, which was originally represented as a custom inline function, into the low-level style. We also defined both PyObject and PyVarObject to provide type information for struct-access ops. The successful merge of this op shows that our new design is expressive enough to handle ops involving structs, pointers, and so on.
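As a rough illustration of the idea (not mypyc's actual implementation), the sketch below mimics computing a field address and then reading from it, using ctypes on a struct we define ourselves. The helper names get_element_ptr and load_ssize_t are hypothetical, and the struct only imitates the usual PyVarObject field order.

```python
import ctypes


class PyVarObject(ctypes.Structure):
    # Assumption: imitates the usual CPython field order; this is our own
    # ctypes struct, not a view onto a real interpreter object.
    _fields_ = [
        ("ob_refcnt", ctypes.c_ssize_t),
        ("ob_type", ctypes.c_void_p),
        ("ob_size", ctypes.c_ssize_t),
    ]


def get_element_ptr(base: int, struct: type, field: str) -> int:
    """Hypothetical helper: address of `field` inside the struct at `base`."""
    return base + getattr(struct, field).offset


def load_ssize_t(address: int) -> int:
    """Hypothetical helper: read a Py_ssize_t-sized value from `address`."""
    return ctypes.c_ssize_t.from_address(address).value


obj = PyVarObject(1, None, 3)
size_address = get_element_ptr(ctypes.addressof(obj), PyVarObject, "ob_size")
print(load_ssize_t(size_address))
```

Reading ob_size this way is the same two-step pattern that a low-level len on a list needs: first compute the field address, then load from it.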
Besides this, we also started to clean up the registry file-by-file, eliminating the remaining old-style ops.
As a summary, this week we have the following PRs:
This week we focused on implementing the groundwork to represent C structures in mypyc, which will eventually help us transform the primitive ops that rely heavily on C macros into the new low-level style.
We first introduced the LoadMem IR to read a value from a given memory address, which we’ll need to read a struct attribute once we have computed its address. We then introduced the RStruct type to represent known CPython structures. To ensure that each structure's name is unique, we use a StructInfo to record the actual information, and all StructInfo objects are built and stored in the registry. Having a separate StructInfo gives us the potential to change the current RTuple design so that struct-like types share a unified design. To properly compute the offsets and size after alignment, we introduced a set of utility functions along with the RStruct PR.
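To make the offset and size computation concrete, here is a minimal sketch under the simplifying assumption that every field is a scalar whose alignment equals its size. The function names align_up and compute_layout are made up for this example and are not the utilities from the PR.

```python
from typing import List, Tuple


def align_up(offset: int, alignment: int) -> int:
    """Round `offset` up to the next multiple of `alignment`."""
    return (offset + alignment - 1) // alignment * alignment


def compute_layout(field_sizes: List[int]) -> Tuple[List[int], int]:
    """Return (field offsets, total size) for a C-like struct.

    Assumption: each field's alignment equals its size in bytes.
    """
    offsets = []
    current = 0
    max_alignment = 1
    for size in field_sizes:
        current = align_up(current, size)  # insert padding before the field
        offsets.append(current)
        current += size
        max_alignment = max(max_alignment, size)
    # The whole struct is padded to a multiple of its strictest alignment.
    return offsets, align_up(current, max_alignment)


# Three 8-byte fields, as in PyVarObject on a typical 64-bit build.
print(compute_layout([8, 8, 8]))  # ([0, 8, 16], 24)
# Mixed sizes show where padding appears.
print(compute_layout([1, 4, 2]))  # ([0, 4, 8], 12)
```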
As a summary, this week we have the following PRs:
This week we introduced a constant integer optimization to handle the verbosity problem introduced by low-level IR. With this optimization, we can inline all LoadInt ops in both pretty IR printing and generated C.
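The effect on pretty-printing can be sketched with a toy IR. Everything here (the tuple-based op format, the op names load_int and int_add, the pretty_print function) is hypothetical and only meant to show the substitution, not mypyc's real data structures.

```python
from typing import Dict, List, Tuple

# Hypothetical toy IR: (destination register, op name, arguments).
Op = Tuple[str, str, List[str]]


def pretty_print(ops: List[Op]) -> List[str]:
    """Format ops, replacing registers defined by load_int with their constants."""
    constants: Dict[str, str] = {
        dest: args[0] for dest, name, args in ops if name == "load_int"
    }
    lines = []
    for dest, name, args in ops:
        if name == "load_int":
            continue  # the constant is shown at its use sites instead
        shown = [constants.get(arg, arg) for arg in args]
        lines.append(f"{dest} = {name}({', '.join(shown)})")
    return lines


ops = [
    ("r0", "load_int", ["2"]),
    ("r1", "int_add", ["x", "r0"]),
]
print(pretty_print(ops))  # ['r1 = int_add(x, 2)']
```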
We also added missing signed type casts to the generated C, completely removed the C wrappers, and merged all int logical ops into the new style.
As a summary, this week we have the following PRs:
This week we made several integer-op related improvements and bugfixes. We fixed the performance regression by generating an inline compare statement when both operands are short ints. To support all logical ops, we also added support for swapping operands, negating the result, and, for comparisons other than equality, checking both operands.
For tagged ints, we now store the doubled value in the IR instead of doubling it during codegen, which could simplify other backends’ implementations.
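The following sketch illustrates the doubled representation and the inline comparison fast path. It assumes the low bit acts as the tag (clear for short ints, set for boxed ints); the function names are invented for this example, and the slow path is deliberately left out.

```python
INT_TAG = 1  # assumption: a set low bit marks a boxed (non-short) integer


def tag_short(value: int) -> int:
    """Encode a short int by doubling it, which leaves the tag bit clear."""
    return value << 1


def is_short(tagged: int) -> bool:
    """A tagged value is a short int when its low bit is clear."""
    return (tagged & INT_TAG) == 0


def tagged_less_than(left: int, right: int) -> bool:
    """Inline fast path: doubling preserves order, so compare directly."""
    if is_short(left) and is_short(right):
        return left < right
    # A real implementation would fall back to a slower generic comparison.
    raise NotImplementedError("slow path for boxed ints is not sketched here")


print(tag_short(5))                                   # 10
print(tagged_less_than(tag_short(-3), tag_short(2)))  # True
```

Because the IR already holds the doubled value, a backend can emit the comparison on the stored numbers without any extra encoding step.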
As a summary, this week we have the following PRs:
Originally mypyc used C wrapper functions to handle most integer operations, so the IR only needed to generate simple calls to these wrappers. Although the IR side can be implemented very easily this way, it is hard to generalize to other backends. This week we started to put last week’s discussion into practice: we added new low-level integer operations and used transforms and irbuild to build blocks of IR representing the old C wrappers. With these modifications, mypyc now generates much more low-level and verbose IR for integer operations while keeping the generated C code largely unchanged.
We also merged a lot of other ops.
At the end of this week, mypyc-benchmarks came online so we can monitor performance changes. Two microbenchmark regressions appeared on one of the recent commits.
As a summary, this week we have the following PRs:
Recent weeks’ progress enables us to represent most of mypyc’s existing primitives in the new low-level style. This week we focused on two sets of remaining ops: in ops and exception-related ops. in ops have a different argument order in C (and potentially other backends) compared to Python syntax, so we added support for argument reordering via an optional argument order in CallC. Exception-related ops always fail, and the old approach used an inline function to generate a false boolean following the call, which is both hard to generalize and inelegant. We introduced the ERR_ALWAYS error kind and a related transform to generate IR representing these semantics.
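Argument reordering can be pictured with a small sketch. The function emit_c_call and its ordering parameter are hypothetical names chosen for this example; PyDict_Contains is the CPython API function, which takes the dict first and the key second.

```python
from typing import List, Optional


def emit_c_call(function_name: str, args: List[str],
                ordering: Optional[List[int]] = None) -> str:
    """Build a C call expression, optionally permuting the arguments.

    `ordering[i]` is the index of the source argument placed at position i.
    """
    if ordering is not None:
        if sorted(ordering) != list(range(len(args))):
            raise ValueError("ordering must be a permutation of the argument indexes")
        args = [args[i] for i in ordering]
    return f"{function_name}({', '.join(args)})"


# Python evaluates `item in d` with the item first,
# while the C function expects the dict first.
print(emit_c_call("PyDict_Contains", ["item", "d"], ordering=[1, 0]))
# PyDict_Contains(d, item)
```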
This week we also started to discuss the inline integer operations design at Integer binary ops design discussion.
I am taking several days off from this week until next Tuesday to handle some administrative work for my graduation.
As a summary, this week we have the following PRs:
This week we focused on dealing with call_negative_bool_emit and negative_int_emit. These are two custom emit callback functions designed to handle the difference between C function return values and the expected Python values. To address this, we introduced a new error_kind variant and its corresponding branch variant, adding a new Branch op to handle the negative int comparison via the exception transform. These changes helped us solve the call_negative_bool_emit case. To handle the remaining one, we added a new Truncate op to represent the cast from a C int return value to bool explicitly at the IR level.
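The two steps can be sketched as follows, assuming a C function that returns a negative int on error and 0 or 1 otherwise. The function name and the exception type are placeholders for this example, and the one-byte mask is only one plausible way to model the truncation.

```python
def c_int_result_to_bool(c_result: int) -> bool:
    """Model the error check followed by an explicit int-to-bool truncation.

    Assumption: the C function returns a negative value on error,
    and 0 or 1 on success.
    """
    # Step 1: the branch on a negative value, as the exception transform would add.
    if c_result < 0:
        raise RuntimeError("C call reported an error")
    # Step 2: the truncation, modeled here as keeping only the low byte.
    return (c_result & 0xFF) != 0


print(c_int_result_to_bool(1))  # True
print(c_int_result_to_bool(0))  # False
```

Keeping the error check and the cast as separate steps mirrors the idea of expressing both explicitly in the IR rather than hiding them inside an emit callback.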
After these two changes, we are able to represent most of the existing primitive ops; the remaining ones are simple_emit and name_ref.
As a summary, this week we have the following PRs:
This week started with refactoring #8973. We decided to take one step at a time, so we only supported literal names and replaced the related LoadStatic usages with LoadGlobal. Based on that, we will further implement LoadAddress to handle loading addresses specifically. Both LoadGlobal and LoadAddress will support the registry-based style that is currently used by name_ref_ops.
Another topic of this week was to merge more primitive ops into CallC, since its design has become more and more mature. Some remaining ops are tricky and need special handling, such as adding new error kinds and corresponding error transforms.
On Thursday, Jukka, Sullivan and I had our first monthly meeting. We discussed name generation in mypyc and a plan to implement low-level integer/pointer arithmetic to help represent some macros and inline functions.
As a summary, this week we have the following issues and PRs:
This week we focused on representing name_ref with a new IR element, since it’s semantically different from what CallC represents. The name_ref_ops load global names, so we proposed #8973 as the first attempt. Soon, we realized that LoadStatic and LoadGlobal have similar purposes and should eventually be merged. Jukka also mentioned moving mypyc’s name generation/mangling logic from codegen to irbuild. Name generation is a little messy now, since it happens in many places.
After this refactoring, we should have a LoadGlobal IR that loads a name without knowing all the name-related details. The name generation logic for literal values is simple, while the logic when groups and modules are involved is much more complicated. So it will be the focus of next week’s first monthly meeting.
As a summary, this week we have the following issues and PRs:
Issues:
PRs:
This week we aimed to introduce more types of ops via CallC. The remaining op variants are binary/unary/custom/name_ref ops. Supporting name_ref ops requires adding new IR, so we postponed it to allow more discussion. Binary and unary ops are easy and are supported via #8929 and #8933. The challenging part of this week was supporting custom ops, which are one-shot ops that are used very differently. We picked new_dict_op to demonstrate our implementation of custom ops.
new_dict_op has three problems to handle. Firstly, it uses a custom emit callback, which is exactly what we are replacing in this project, so the solution is to split the functionality of this emit callback into two more concise CallCs and change the irbuild process accordingly. Secondly, the new dict_build_op calls a C function with variable arguments, so we added support for this in CallC. Finally, the dict_build_op’s C function also needs an integer argument. Mypyc has tagged integers and short integers but lacks a low-level integer type, so we added a new RType to represent this.
As a summary, this week we have the following issues and PRs:
In the coming week, we will focus on supporting name_ref ops.
In the first two days of the week, we reviewed and merged python/mypy#8880 into the master branch. As a summary, #8880 brings:
- The CallC IR element, abstracting high-level code that eventually maps to C function calls. Compared to the very first implementation, the reviewed version added support for error_kind and void types (via RType).
- str.join converted from PrimitiveOp to CallC, along with an IR dump test.
On Tuesday, Jukka and I had our first weekly video meeting via Zoom. We discussed naming issues in the review. He clarified how steals and is_borrowed work (related to reference counting) and the difference between self.emitter.emit and self.emit (the latter is just a wrapper). We also talked about implementing ops with different features (name reference op, top-level function calls, boolean op, etc.) to refine the design.
As a result of our discussion, I implemented python/mypy#8902 to support top-level function calls. Unlike the previous PR, which only considers method calls, this one introduces a new op registry and modifies the IR building process accordingly. Besides the main idea, the PR has some other notable points:
steals
CallC is always of ERR_MAGIC
The PR is currently under review but it should be merged soon.
Finally, as Jukka suggested, starting from #8902, I will open separate, fine-grained issues on mypyc’s tracker to better track our progress.
After discussions with both my mentors, we agreed that I am familiar with the community now, so we started the coding period a little earlier than the official date (June 1st) to buffer any unexpected delays.
On May 22nd I had the first daily sync with Jukka via Gitter. We discussed some implementation details of a new CFunctionCall to replace PrimitiveOps that simply call a C function. To prevent unexpected changes to codebases that rely on mypyc, we changed our prototype op from the frequently used list.__len__ to str.join.
Over the weekend, I finished the PR demonstrating the CFunctionCall idea: python/mypy#8880. The general idea is to have a CFunctionCall IR op that holds a C function name string and the related argument values, from which the corresponding C code can easily be generated. It therefore replaces the codegen callback style used in PrimitiveOp. We still need some kind of low-level description, LLOpDescription, so we can match the AST during irbuild.
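A minimal sketch of that idea is shown below. It is a stand-in written for this post rather than the class from the PR: the field names and the to_c method are assumptions, and registers are represented as plain strings.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CFunctionCall:
    """Toy IR op: a destination register, a C function name, and its arguments."""
    dest: str
    function_name: str
    args: List[str]

    def to_c(self) -> str:
        """Generate the C statement directly from the stored name and arguments."""
        return f"{self.dest} = {self.function_name}({', '.join(self.args)});"


call = CFunctionCall(dest="r2", function_name="PyUnicode_Join", args=["foo", "r1"])
print(call.to_c())  # r2 = PyUnicode_Join(foo, r1);
```

Since the op carries everything needed to print the call, no per-primitive emit callback is required for this simple case.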
In the following week, I expect we’d work mainly around this PR, responding to code reviews and discussing the following questions:
- str.join dumps textual output exactly like r2 = foo.join(r1), instead of r2 = PyUnicode_Join(foo, r1). I assume overloading the to_str method is the way to go, but clearly something is missing here.
- CFunctionCall accordingly.