Update compiler to use latest LLVM version #471
The plan is to generate a .ll file directly and compile and link that, to move away from the unmaintained Haskell interface to LLVM.
… convert from single assignment to true SSA.
… foreign lpvm instructions. Fix generation of llvm switches.
…pointers, represented as 'ptr' in LLVM. Add type representation in 'representation is' Wybe type declarations. Use CPointer as the type of manifest string constants. Generate an LLVM declaration for each string constant, and use that constant by name everywhere a manifest string constant appears in the code.
…ood LLVM code for many library modules.
Also document assumptions made by c_config.c
…t structures, needed to generate manifest constant strings.
Not handling OutByReference or TakeReference yet.
I am running an Ubuntu install right now and thought I would give this branch a spin because I am keen to see its completion. So I did some tinkering... I found that basically all of the tests fail, just like with the GH worker. I have found that swapping out the Still, all of the execution tests fail. This is why...
Taking the dump of what emit logs I find this at the end:
I can't run that command in my terminal because the tmp files get removed, unfortunately. But it's strange. I will continue to see if I can find out what's wrong later this weekend |
Turns out there's an empty string in the list of arguments to cc. How odd! |
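A minimal sketch of guarding against this kind of bug, written in Python purely for illustration (the helper name `build_cc_command` and the sample arguments are hypothetical, not taken from the compiler). An empty string in an argv list is still passed to `cc` as a real, empty argument, so it is worth dropping before the call:

```python
def build_cc_command(compiler, args):
    """Return an argv list with empty or whitespace-only arguments removed.

    Hypothetical helper: the real compiler assembles its cc command elsewhere.
    """
    return [compiler] + [arg for arg in args if arg.strip()]


# Sample arguments are made up; note the stray "" in the middle.
cmd = build_cc_command("cc", ["-o", "out", "", "main.o"])
print(cmd)
```

Filtering at the point where the list is built keeps the fix in one place, rather than relying on every caller to avoid producing empty arguments.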
I fixed a few issues in #472. Hopefully merging those in would fix the build. I have also found an issue on my end with one of the exec tests (
When run, this executes In LLVM this dumps
Purely speculating on the issue, I think this is to do with how floats are passed around. I recall that floats can be passed to/from functions in special registers, and perhaps when LLVM is trying to unpack a I know in the past when we handled floats->generic types we bitcast the float into an i64, and then back on the return, but this doesn't happen anymore. I think we should align the type of the generic's return ( I can attempt a patch myself, but that may take time :) |
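To illustrate the float-to-i64 bitcast round trip described above, here is a small Python sketch that uses the standard `struct` module as a stand-in for LLVM's `bitcast` between `double` and `i64`. The function names are hypothetical; the point is only that a bitcast reinterprets the same 64 bits and is not a numeric conversion:

```python
import struct


def float_to_word(x: float) -> int:
    """Reinterpret the 64 bits of a double as an unsigned 64-bit integer."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def word_to_float(w: int) -> float:
    """Reinterpret an unsigned 64-bit integer as a double."""
    return struct.unpack("<d", struct.pack("<Q", w))[0]


word = float_to_word(42.0)
print(hex(word))            # bit pattern of 42.0, not the integer 42
print(word_to_float(word))  # round trip recovers the original float
```

If one side of a call skips its half of the round trip, the other side reads a bit pattern under the wrong type, which is consistent with the kind of wrong output reported in this thread.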
More digging... I wanted to remove the branch, just in case. Here is another MWE that has a similar issue...
Output:
These should both print LLVM:
Note: If I use an
Note: changing the bool to an int like this
Prints
Note: getting rid of the generics works fine! |
…472) * attempt to fix ubuntu build; normalise spaced LLVM array type syntax * clean CConfig * one more lpvm section name * normalise tmp dir in complex tests * normalise more paths * add path to complx-test call; fix type on signum * derive path in python
Okay, so it's not just my machine. The same test case fails! |
Mine, too (AARM64). But you fixed the problems with Ubuntu, which is a huge help! Thank you! I think you're right that the problem is in converting between float and generic types. I've fixed that problem in several places, but obviously not everywhere. I remember discussing type unification with you several years ago, specifically regarding type variables, and I think the problem actually lies there. We've been doing a lot of patching up of type difficulties in LLVM generation that I think should be handled explicitly in the LPVM code. I'm thinking now that the right solution is to consider "generic(T)" to be a separate type, distinct from T, that permits automatic conversion during type checking. So passing a float (or anything else but another generic) where a generic type is expected generates an explicit conversion instruction, and similarly for conversion in the other direction. But my plan for now is just to hack the LLVM generation to fix this bug and then merge the pull request, and add this as an issue to be addressed later. |
By "hack" I assume you mean what we do at the moment:
EDIT: I'm not sure this is really a "hack". There is nothing in the LPVM type system that says this is disallowed (passing a concrete type for a generic), and it's only a requirement for LLVM. That being said, maybe we should modify the type system of LPVM to disallow this. We could introduce an I do wonder if having these casts could interfere with some optimisations, namely TCMC. Maybe we can just ignore trailing casts for TCMC, though I'm not certain this would occur often anyway. |
I've been trying to keep the LLVM generation clean and lean. By that I mean just using the known types of variables and values to determine the generated code. Unfortunately, we have cases where we generate LPVM such that a variable might have one type when assigned and a different type when used. That means I've had to track the type of a variable when it is assigned, and use that in place of the type decoration when it is used. We also have places where a value or variable of one type is passed as an argument to a proc that expects a different type as input or output. Both of these things happen in cases where one mention of the variable has AnyType as type (meaning it's generic), and another mention has a concrete type. So I don't think using AnyType as the type of a generic value is a good idea. I think having a variable or value winding up with AnyType should be considered to be an error. Instead, I think we need to use a distinct new kind of TypeSpec, something like BTW, LLVM generation does bitcasting and conversion (eg, sign extending) on every call and return. Again, I think it would be better to put this in the LPVM, so the LLVM could be pretty simple (there is some inevitable complication in the distinction between single assignment and static single assignment). |
I was hoping at LLVM generation time that the callee types would be available. If they are, then we would be able to know which arguments' types don't match the corresponding parameters'. When we marshal these mismatched arguments for a call, we can generate an extra instruction before the call to do the conversion, storing that in a tmp variable, and passing that as the argument with the expected type. For unmarshalling, instead of unpacking the struct into the corresponding registers, we can unpack it to a "correctly" typed register, and then do the conversion from that into what's expected. With this scheme, I don't think we'd need to track what type a variable is in LLVM, besides what it is already tagged as in LPVM. |
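The marshalling idea above can be sketched roughly as follows. This is Python emitting LLVM-like text for illustration only, not the compiler's actual code: `marshal_call`, the `%tmpN` naming, and the `void` signature assumed for `app` are all hypothetical, and only same-width `bitcast` conversions are handled (a real version would also need extensions, truncations and pointer casts).

```python
from itertools import count


def marshal_call(callee, args, param_types):
    """Emit conversion instructions for mismatched arguments, then the call.

    args is a list of (register, type) pairs; param_types lists the types
    the callee expects. Assumes every mismatch is a same-width bitcast.
    """
    fresh = count()
    lines, actuals = [], []
    for (name, ty), expected in zip(args, param_types):
        if ty == expected:
            actuals.append(f"{expected} {name}")
        else:
            tmp = f"%tmp{next(fresh)}"  # hypothetical temporary naming
            lines.append(f"{tmp} = bitcast {ty} {name} to {expected}")
            actuals.append(f"{expected} {tmp}")
    lines.append(f"call void @{callee}({', '.join(actuals)})")
    return lines


ir = marshal_call("app", [("%x", "double"), ("%n", "i64")], ["i64", "i64"])
print("\n".join(ir))
```

Only the mismatched argument gets a temporary, so calls whose argument types already agree with the callee's parameters produce no extra instructions.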
That's exactly what I'm doing. But because the type attached to a variable can be different where it is assigned from where it is used (and because llc chucks a wobbly if that happens in the LLVM code), I need to track the assigned types and ignore the attached ones. The type checker should ensure that all uses of a variable have the same type, but then when the variable is passed to a generic proc, we need to: (1) bind the generic type variable, so we know when outputs have the same types, and (2) handle the type conversion between the known type and the generic type. We can't just bind the type variable and then think it's OK to pass that where a generic is expected. |
I'll just note that my latest push to the branch fixes both of your examples, but this code still doesn't work:
It compiles, but prints 0.00000 instead of 42.0000. |
Interesting... For my implementation of HO, all arguments need to be coerced into ints, as though they were generically typed. This is to allow for a uniform representation with generics, as the same closure proc can be passed as a generic or concretely typed parameter. I haven't checked the implementation, but this would be where I would check first. |
Yep, I think that's the problem. The LPVM HO call looks like this:
and app looks like this:
So it looks like app is being called with a float as input, but the parameter is generic, so it's expecting an i64. |
Actually, I see I've dealt with that problem. Here's the code I'm generating for the call to app:
so I am converting the float to an i64. The problem is actually the trampoline:
which is compiled in the obvious way:
Because this is called from HO code, it will actually receive an i64 as input, which it must bitcast to a double, and must finally bitcast its double output to i64 before returning it. That's not difficult, if there's some way for me to know that it's a trampoline, so its declared parameter types do not actually reflect its input and output types. Or am I missing something? |
The trampolines should have a flag in their proc def marking them as a "closure" from memory. Each of those parameters should be an i64, and then bitcast accordingly when used inside. |
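As a rough model of what such a trampoline has to do, here is a Python sketch (names such as `make_trampoline` and `double_it` are hypothetical, and `struct` stands in for LLVM's `bitcast`). Every parameter arrives as a 64-bit word, is reinterpreted as its real type on entry, and the result is reinterpreted back to a word on exit:

```python
import struct


def float_to_word(x: float) -> int:
    """Reinterpret a double's bits as an unsigned 64-bit word."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def word_to_float(w: int) -> float:
    """Reinterpret an unsigned 64-bit word as a double."""
    return struct.unpack("<d", struct.pack("<Q", w))[0]


def make_trampoline(proc):
    """Wrap a float -> float proc so it accepts and returns 64-bit words."""
    def trampoline(word_in: int) -> int:
        arg = word_to_float(word_in)   # i64 -> double on entry
        result = proc(arg)             # call the concretely typed proc
        return float_to_word(result)   # double -> i64 on exit
    return trampoline


def double_it(x: float) -> float:
    """Hypothetical concretely typed proc used as the closure body."""
    return x * 2.0


tramp = make_trampoline(double_it)
out = word_to_float(tramp(float_to_word(21.0)))
print(out)
```

The higher-order caller only ever sees words, so the same trampoline can be passed where either a generic or a concretely typed closure is expected.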
OK, I've found the code that's checking for that and rewriting the trampoline. It's handled in a function What about the "free" variables (that are stored in the closure environment)? Are they stored as their intended types, or as generic (address) values needing the same transformations done? |
I did write that code, a while back now, and it does seem to overwrite the Param types with AnyType, but it never propagates that back into the Compiler's state. I can see you ported most of this into the LLVM module, but specifically not that bit. I should probably have documented why this was a requirement. I think that free Params are all smashed into generic types as well. This was probably done for simplicity's sake for a uniform representation. EDIT: all are cast into ints. See You might be able to optimise this into a more compact representation, but I am unsure because the first element of the closure is always a pointer (and hence word sized). So long as the environment marshalling and unmarshalling is identical, and the first element is the function reference, all should be fine. |
The new Not storing the modified LPVM back into the module is not a problem, because we're just going to generate LLVM from it and quit. The code that unpacks the closure assumes that each element is a single full word. It's extracting each element as its expected type, not as an AnyType, so I think that'll need to be fixed, too. Bitcasting something smaller than a word to a word to store it, and then reading it back as its intended size, may go wrong for the wrong endianness. |
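The endianness concern can be demonstrated with a short Python sketch using `struct`. The helper name is hypothetical, and this models only the byte layout, not the compiler's closure code: a value is widened to a full 8-byte word for storage, then read back from the same address at a narrower 4-byte size.

```python
import struct


def store_then_read_narrow(value: int, byte_order: str) -> int:
    """Store value as an 8-byte word, then read the first 4 bytes as an i32.

    byte_order is "<" for little-endian or ">" for big-endian.
    """
    slot = struct.pack(byte_order + "q", value)          # full-word store
    return struct.unpack(byte_order + "i", slot[:4])[0]  # narrow read at offset 0


little = store_then_read_narrow(42, "<")
big = store_then_read_narrow(42, ">")
print(little, big)
```

On a little-endian layout the low-order bytes come first, so the narrow read happens to work; on a big-endian layout it reads the high-order half and loses the value. Reading each element back as a full word and then converting avoids depending on byte order.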
Update trampoline parameters to have type AnyType and generated name, and generate code to convert from AnyType to the actual type of the parameter on entry, and similarly on exit for output parameters. Do similarly for closure arguments.
LGTM! Just one more request to delete the few .ll and DIFF files that are in the root dir. |
Unfortunately, it does this by generating a separate .ll file and explicitly compiling it with llc.