Skip to content

Conversation

pschachte
Copy link
Owner

Unfortunately, it does this by generating a separate .ll file and explicitly compiling it with llc.

pschachte added 30 commits May 17, 2024 16:28
The plan is to generate a .ll file directly and compile and link that,
to move away from the unmaintained Haskell interface to LLVM.
… foreign lpvm instructions. Fix generation of llvm switches.
…pointers, represented as 'ptr' in LLVM. Add type representation in 'representation is' Wybe type declarations. Use CPointer as the type of manifest string constants. Generate an LLVM declaration for each string constant, and use that constant by name everywhere a manifest string constant appears in the code.
Also document assumptions made by c_config.c
…t structures, needed to generate manifest constant strings.
Not handling OutByReference or TakeReference yet.
@jimbxb
Copy link
Collaborator

jimbxb commented Oct 11, 2024

I am running an Ubuntu install right now and thought I would give this branch a spin because I am keen to see its completion

So I did some tinkering... I found that basically all of the tests fail, just like with the GH worker.

I have found that swapping out the __LPVM.__lpvm with the MacOS __LLPM,__lpvm section name seem to get the final dumps working!

Still, all of the execution tests fail. This is why...

❯ ./wybemk -Lwybelibs test-cases/execution/string-head           
cc: error: : No such file or directory

Error detected during building outputs
cc: error: : No such file or directory

Taking the dump of what emit logs I find this at the end:

Emit: Generating final executable with command line: cc -no-pie -Wl,--gc-sections /tmp/wybe-ed4d53a612eeda26/tmpMain.o  /home/james/docs/wybe/test-cases/execution/string-head.o /home/james/docs/wybe/wybelibs/command_line.o /home/james/docs/wybe/wybelibs/wybe/_.o /home/james/docs/wybe/wybelibs/wybe/array.o /home/james/docs/wybe/wybelibs/wybe/bool.o /home/james/docs/wybe/wybelibs/wybe/c_string.o /home/james/docs/wybe/wybelibs/wybe/char.o /home/james/docs/wybe/wybelibs/wybe/comparison.o /home/james/docs/wybe/wybelibs/wybe/control.o /home/james/docs/wybe/wybelibs/wybe/count.o /home/james/docs/wybe/wybelibs/wybe/float.o /home/james/docs/wybe/wybelibs/wybe/int.o /home/james/docs/wybe/wybelibs/wybe/io.o /home/james/docs/wybe/wybelibs/wybe/list.o /home/james/docs/wybe/wybelibs/wybe/machine_word.o /home/james/docs/wybe/wybelibs/wybe/memory_management.o /home/james/docs/wybe/wybelibs/wybe/opaque_pointer.o /home/james/docs/wybe/wybelibs/wybe/phantom.o /home/james/docs/wybe/wybelibs/wybe/predicate.o /home/james/docs/wybe/wybelibs/wybe/range.o /home/james/docs/wybe/wybelibs/wybe/string.o /home/james/docs/wybe/wybelibs/wybe/cbits.o -lgc -lm -o /home/james/docs/wybe/test-cases/execution/string-head

I cant run that command in my terminal because the tmp files get removed, unforunately. But its strange. I will continue to see if I can find out whats wrong later this weeken

@jimbxb
Copy link
Collaborator

jimbxb commented Oct 12, 2024

Turns out there's an empty string in the list of arguments to cc. How odd!

@jimbxb
Copy link
Collaborator

jimbxb commented Oct 12, 2024

I fixed a few issues in #472. Hopefully merging those in would fix the build.

I also have found that an issue on my nd with on of the exec tests (type_generics.wybe). I have produced an MWE a follows that shows the issue.

def {noinline, test} mv(x:T):T = x

if { mv(1.0, _) :: 
    pass 
  | else :: !shouldnt 
}

When ran, this his executes !shouldnt!

In LLVM this dumps

define external fastcc void @"type_generics.<0>"() {
  %"tmp#4##0" = tail call fastcc {double, i1} @"type_generics.mv<0>"(double 1.0)
  %"tmp#0##0" = extractvalue {double, i1}%"tmp#4##0", 0
  %"tmp#1##0" = extractvalue {double, i1}%"tmp#4##0", 1
  br i1 %"tmp#1##0", label %if.then.0, label %if.else.0
if.then.0:
  ret void
if.else.0:
  %"tmp#3##0" = tail call fastcc i64 @"wybe.string.c_string<0>"(i64 ptrtoint( ptr @"string#2" to i64 ))
  call ccc void @error_exit(i64 ptrtoint( ptr @"cstring#1" to i64 ), i64 %"tmp#3##0")
  ret void
}

define external fastcc {i64, i1} @"type_generics.mv<0>"(i64 %"x##0") {
  %"tmp#0##0" = insertvalue {i64, i1} undef, i64 %"x##0", 0
  %"tmp#1##0" = insertvalue {i64, i1} %"tmp#0##0", i1 1, 1
  ret {i64, i1} %"tmp#1##0"
}

Purely speculating on the issue, I think this is to do with how floats are passed around. I recall that floats can be passed to/from functions in special registers, and perhaps when LLVM is trying to unpack a {double, i1}, it assumes that the double comes from one of those float registers. This was an issue when I was implementing HO, so I'm almost surprised that this hasn't arised there. Perhaps it was fixed, but I'm not sure.

I know in the past when we handled floats->generic types we bitcast the float into an i64, and then back on the return, but this doesnt happen anymore. I think we should align the type of the generic's return ({i64, i1}) to align with that of what is unpacked (we erroneously have {double,i1}). I think also the input should be bitcast into an i64 also.

I can attempt a patch myself, but that may take time :)

@jimbxb
Copy link
Collaborator

jimbxb commented Oct 12, 2024

More digging... I wanted to remove the branch, just incase. Here is another MWE that has a similar issue...

def {noinline} mv1(x0:T, ?x1:T, ?b:bool) {
    ?x1 = x0
    ?b = true
}


def {noinline} mv2(x0:T, ?b:bool, ?x1:T) {
    ?x1 = x0
    ?b = true
}

mv1(1.0, _, ?b)
!println(b)
mv2(1.0, ?b, _)
!println(b)

Output:

false
true

These should both print true, but the first one has issues...

LLVM:

define external fastcc void @"type_generics.<0>"() {
  %"tmp#8##0" = tail call fastcc {double, i1} @"type_generics.mv1<0>"(double 1.0)
  %"tmp#0##0" = extractvalue {double, i1}%"tmp#8##0", 0
  %"b##0" = extractvalue {double, i1}%"tmp#8##0", 1
  tail call fastcc void @"wybe.bool.print<0>"(i1 %"b##0")
  call ccc void @putchar(i8 10)
  %"tmp#9##0" = tail call fastcc {i1, double} @"type_generics.mv2<0>"(double 1.0)
  %"b##1" = extractvalue {i1, double}%"tmp#9##0", 0
  %"tmp#1##0" = extractvalue {i1, double}%"tmp#9##0", 1
  tail call fastcc void @"wybe.bool.print<0>"(i1 %"b##1")
  call ccc void @putchar(i8 10)
  ret void
}

define external fastcc {i64, i1} @"type_generics.mv1<0>"(i64 %"x0##0") {
  %"tmp#1##0" = insertvalue {i64, i1} undef, i64 %"x0##0", 0
  %"tmp#2##0" = insertvalue {i64, i1} %"tmp#1##0", i1 1, 1
  ret {i64, i1} %"tmp#2##0"
}

define external fastcc {i1, i64} @"type_generics.mv2<0>"(i64 %"x0##0") {
  %"tmp#1##0" = insertvalue {i1, i64} undef, i1 1, 0
  %"tmp#2##0" = insertvalue {i1, i64} %"tmp#1##0", i64 %"x0##0", 1
  ret {i1, i64} %"tmp#2##0"
}

Note: If I use an int, this all works fine and prints true twice. eg.

mv1(1, _, ?b)
!println(b)
mv1(1, _, ?b)
!println(b)

Note: changing the bool to an int like this

def {noinline} mv1(x0:T, ?x1:T, ?b:int) {
    ?x1 = x0
    ?b = 1
}

mv1(1, _, ?b)
!println(b)
mv1(1.0, _, ?b)
!println(b)

Prints

1
140575069984736

Note: getting rid of the generics works fine!

…472)

* attempt to fix ubuntu build; normalie spaced LLVM array type syntax

* clean CConfig

* one more lpvm section name

* normalise tmp dir in complex tests

* normalise more paths

* add path to complx-test call; fix type on signum

* derive path in python
@jimbxb
Copy link
Collaborator

jimbxb commented Oct 15, 2024

Okay, so it's not just my machine. The same test case fails!

@pschachte
Copy link
Owner Author

Mine, too (AARM64). But you fixed the problems with Ubuntu, which is a huge help! Thank you!

I think you're right that the problem is in converting between float and generic types. I've fixed that problem in several places, but obviously not everywhere.

I remember discussing type unification with you several years ago, specifically regarding type variables, and I think the problem actually lies there. We've been doing a lot of patching up of type difficulties in LLVM generation that I think should be handled explicitly in the LPVM code. I'm thinking now that the right solution is to consider "generic(T)" to be a separate type, distinct from T, that permits automatic conversion during type checking. So passing a float (or anything else but another generic) where a generic type is expected generates an explicit conversion instruction, and similarly for conversion in the other direction. But my plan for now is just to hack the LLVM generation to fix this bug and then merge the pull request, and add this as an issue to be addressed later.

@jimbxb
Copy link
Collaborator

jimbxb commented Oct 15, 2024

By "hack" I assume what you mean what we do at the moment:

  • When we generate a call's argument, if the Wybe types aren't the same we should be casting the argument to the expected (callee) type.
  • When we receive a returned value, we should cast it if the types don't align as well. This should be the case for all of the unpacking as well -- if the member's type differs, cast the unpacked value.

EDIT:

I'm not sure this is really a "hack". There is nothing in the LPVM type system that says this is disallowed (passing a concrete type for a generic), and it's only a requirement for LLVM.

That being said, maybe we should modify the type system of LPVM to disallow this.


We could introduce an lpvm cast instr for all of these cases. I hope the LLVM generation would be able to handle the (un)marshaling seamlessly with those in place!

I do wonder if having these casts could screw with some optimisations, namely for TCMC. Maybe we can just ignore tailing casts for TCMC, though I'm not certain this would occur often anyway.

@pschachte
Copy link
Owner Author

I've been trying to keep the LLVM generation clean and lean. By that I mean just using the known types of variables and values to determine the generated code. Unfortunately, we have cases where we generate LPVM such that a variable might have one type when assigned and a different type when used. That means I've had to track the type of a variable when it is assigned, and use that in place of the type decoration when it is used. We also have places where a value or variable of one type is passed as an argument to a proc that expects a different type as input or output. Both of these things happen in cases where one mention of the variable has AnyType as type (meaning it's generic), and another mention has a concrete type. So I don't think using AnyType as the type of a generic value is a good idea. I think having a variable or value winding up with AnyType should be considered to be an error. Instead, I think we need to use a distinct new kind of TypeSpec, something like Generic TypeSpec, for generic values. Then something like append would have arguments of type Generic AnyType, which could be unified with, say, Generic Float to unify the type variable with Float. At the LPVM level, there would be an explicit conversion between Generic Float and Float, meaning there would be two different LPVM variables with different types. That should allow us to type variables consistently, and to ensure that argument types always match the corresponding parameter types, which should allow LLVM generation to be a lot simpler.

BTW, LLVM generation does bitcasting and conversion (eg, sign extending) on every call and return. Again, I think it would be better to put this in the LPVM, so the LLVM could be pretty simple (there is some inevitable complication in the distinction between single assignment and static single assignment).

@jimbxb
Copy link
Collaborator

jimbxb commented Oct 15, 2024

I was hoping at LLVM generation time that the callee types would be available. If they are, then we would be able to know which arguments' types don't match the corresponding parameters'.

When we marshal these mismatched arguments for a call, we can generate an extra instruction before the call to do the conversion, storing that in a tmp variable, and passing that as the argument with the expected type.

For unmarshling, instead of unpacking the struct into the corresponding registers, we can unpack it to a "correctly" types register, and then do the conversion from that into what's expected.

With this schema, I don't think we'd need to track what type a variable is in LLVM, besides what it is already tagged as in LPVM.

@pschachte
Copy link
Owner Author

That's exactly what I'm doing. But because the type attached to a variable can be different where it is assigned from where it is used (and because llc chucks a wobbly if that happens in the LLVM code), I need to track the assigned types and ignore the attached ones. The type checker should ensure that all uses of a variable have the same type, but then when the variable is passed to a generic proc, we need to: (1) bind the generic type variable, so we know when outputs have the same types, and (2) handle the type conversion between the known type and the generic type. We can't just bind the type variable and then think it's OK to pass that where a generic is expected.

@pschachte
Copy link
Owner Author

I'll just note that my latest push to the branch fixes both of your examples, but this code still doesn't work:

def {noinline} app(f:(I, ?J), i:I):J = f(i)

!println(app({@=?@}, 42.0))

It compiles, but prints 0.00000 instead of 42.0000.

@jimbxb
Copy link
Collaborator

jimbxb commented Oct 17, 2024

Interesting...

For my implementation of HO, all arguments need to be coerced into ints, as though they were generically typed. This is to allow for a uniform representation with generics, as the same closure proc can be passed as a generic or concretely typed parameter.

I haven't checked the implementation, but this would be where I would check first.

@pschachte
Copy link
Owner Author

Yep, I think that's the problem. The LPVM HO call looks like this:

    foo.app<0>(foo.#anon#1<1><>:(wybe.float, ?wybe.float), 42.0:wybe.float, ?tmp#0##0:wybe.float) #0 @foo:3:10

and app looks like this:

0: foo.app<0>
app(f##0:(I, ?J) <{}; {}; {0}>, i##0:I <{}; {}; {1}>, ?#result##0:J <{}; {}; {0, 1}>)<{}; {}; {}>:
  AliasPairs: []
  InterestingCallProperties: []
    ~f##0:(I, ?J)(~i##0:I, ?#result##0:J) #0 @foo:1:40

So it looks like app is being called with a float as input, but the parameter is generic, so it's expecting an i64.

@pschachte
Copy link
Owner Author

Actually, I see I've dealt with that problem. Here's the code I'm generating for the call to app:

  %"tmp#5##0" = tail call fastcc i64 @"foo.app<0>"(i64 ptrtoint( ptr @"closure#0" to i64 ), i64 bitcast( double 42.0 to i64 ))

so I am converting the float to an i64. The problem is actually the trampoline:

proc #anon#1 > {inline} (1 calls)
1: foo.#anon#1<1>
#anon#1(anon#1#1##0:wybe.float, ?anon#1#2##0:wybe.float)<{}; {}; {}>:
  AliasPairs: []
  InterestingCallProperties: []
    foreign llvm move(~anon#1#1##0:wybe.float, ?anon#1#2##0:wybe.float) @foo:3:15

which is compiled in the obvious way:

define external fastcc double @"foo.#anon#1<1>"(i64 %"#env##0", double %"anon#1#1##0") {
  ret double %"anon#1#1##0"
}

Because this is called from HO code, it will actually receive an i64 as input, which it must bitcast to a double, and must finally bitcast its double output to i64 before returning it. That's not difficult, if there's some way for me to know that it's a trampoline, so its declared parameter types do not actually reflect its input and output types. Or am I missing something?

@jimbxb
Copy link
Collaborator

jimbxb commented Oct 18, 2024

The trampolines should have a flag in their proc def marking them as a "closure" from memory. Each of those parameters should be an i64, and then bitcast accordingly when used inside.

@pschachte
Copy link
Owner Author

OK, I've found the code that's checking for that and rewriting the trampoline. It's handled in a function closeClosure that I adapted from Blocks.hs; I assume you wrote that? It doesn't actually alter the parameter types, but it's possible I removed code to do that at some point. I'll put in code to do those transformations on entry and exit.

What about the "free" variables (that are stored in the closure environment)? Are they stored as their intended types, or as generic (address) values needing the same transformations done?

@jimbxb
Copy link
Collaborator

jimbxb commented Oct 18, 2024

I did write that code, a while back now, and it does seem to overwrite the Param types with AnyType, but it never propagates that back into the Compiler's state.

I can see you ported most of this into the LLVM module, but specifically not that bit. I should probably have documented why this was a requirement.


I think that free Params are all smashed into generic types as well. This was probably done for simplicity's sake for a uniform representation.

EDIT: all are cast into ints. See cgenArg' ArgClosure{} in Blocks.hs

You might be able to optimise this into a more compact representation, but I am unsure because the first element of the closure is always a pointer (and hence word sized).

So long as the environment marshalling and unmarshalling is identical, and the first element is the function reference, all should be fine.

@pschachte
Copy link
Owner Author

The new LLVM.hs module is a rewrite of Blocks.hs and Codegen.hs to directly generate .ll files. I pinched your closeClosure function from Blocks, but I guess I ripped out the part that changed param types to AnyType because it was causing problems. Wrong fix for the problems in LLVM generation, I guess.

Not storing the modified LPVM back into the module is not a problem, because we're just going to generate LLVM from it and quit.

The code that unpacks the closure assumes that each element is a single full word. It's extracting each element as its expected type, not as an AnyType, so I think that'll need to be fixed, too. Bitcasting something smaller than a word to a word to store it, and then reading it back as its intended size, may go wrong for the wrong endianness.

Update trampoline parameters to have type AnyType
and generated name, and generate code to convert
from AnyType to the actual type of the parameter
on entry, and similarly on exit for output
parameters.  Do similarly for closure arguments.
@jimbxb
Copy link
Collaborator

jimbxb commented Oct 18, 2024

LGTM! Just one more request to delete the few .ll and DIFF files that are in the root dir.

@pschachte pschachte merged commit 17687d8 into master Oct 18, 2024
1 check passed
@pschachte pschachte deleted the llvm-update branch October 18, 2024 09:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants