Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jit: Threads: Add simple RMW operations #280

Merged
merged 1 commit into from
Aug 12, 2024

Conversation

matetokodi
Copy link
Contributor

Add simple RMW operations: Add, Sub, And, Or, Xor, Xchg

Does not include RmwCmpxchg

src/jit/MemoryInl.h Show resolved Hide resolved
sljit_emit_op2(compiler, SLJIT_ADD, SLJIT_R1, 0, kContextReg, 0, SLJIT_IMM, OffsetOfContextField(tmp1));
}

sljit_emit_op1(compiler, SLJIT_MOV, SLJIT_R0, 0, GET_SOURCE_REG(addr.memArg.arg, instr->requiredReg(0)), 0);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

addr.memArg.arg must be a register, the second argument is not needed. Furthermore, if addr.memArg.arg is SLJIT_R1, the previous code overwrites it.

case ByteCode::I64AtomicRmw16XchgUOpcode: {
operationSize = SLJIT_MOV_U16;
size = 2;
options |= MemAddress::CheckNaturalAlignment;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would set MemAddress::CheckNaturalAlignment as the default and set it to 0 in the 8 bit case.

#endif /* SLJIT_32BIT_ARCHITECTURE */

struct sljit_label* store_failure = sljit_emit_label(compiler);
sljit_emit_atomic_load(compiler, operationSize, tmpReg, GET_SOURCE_REG(addr.memArg.arg, instr->requiredReg(0)));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: this will not work on RISCV with 8/16 bit case, but we can work on that later.

}
#endif /* SLJIT_32BIT_ARCHITECTURE */

struct sljit_label* store_failure = sljit_emit_label(compiler);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would call this a restartOnFailure. Please don't use underscore in the names.


struct sljit_label* store_failure = sljit_emit_label(compiler);
sljit_emit_atomic_load(compiler, operationSize, tmpReg, GET_SOURCE_REG(addr.memArg.arg, instr->requiredReg(0)));
sljit_emit_op1(compiler, SLJIT_MOV, dst.arg, dst.argw, tmpReg, 0);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is also something RISCV don't like. Maybe we should add a feature to keep tmpReg on success if this is not always the case (compare and exchange often updates this on fail, but not on success).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean the RISCV doc says:
An LR/SC sequence begins with an LR instruction and ends with an SC instruction. The dynamic
code executed between the LR and SC instructions can only contain instructions from the base ''I''
instruction set, excluding loads, stores, backward jumps, taken backward branches, JALR, FENCE,
and SYSTEM instructions. If the ''C'' extension is supported, then compressed forms of the
aforementioned ''I'' instructions are also permitted.

This operation might be a memory store in this case.

@zherczeg
Copy link
Collaborator

Nice patch!

@matetokodi matetokodi force-pushed the jit_threads_rmw branch 3 times, most recently from d4eb1a6 to ee7a837 Compare July 31, 2024 12:53
@matetokodi matetokodi marked this pull request as ready for review July 31, 2024 13:15
case ByteCode::I64AtomicRmw32XchgUOpcode: {
Instruction* instr = compiler->append(byteCode, Instruction::Atomic, opcode, 2, 1);
#if (defined SLJIT_32BIT_ARCHITECTURE && SLJIT_32BIT_ARCHITECTURE)
if (opcode == ByteCode::I64AtomicRmwAddOpcode
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we reorganize the code to do this without an if statement? Move these cases below the FALLTHROUGH; above, check, if info is zero (default value), and then set kIsCallback; on 32 bit mode.

#if (defined SLJIT_32BIT_ARCHITECTURE && SLJIT_32BIT_ARCHITECTURE)
static void atomicRmwAdd64(uint64_t* shared_p, uint64_t* value, uint64_t* result)
{
std::atomic<uint64_t>* shared = reinterpret_cast<std::atomic<uint64_t>*>(shared_p);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it valid to cast a pointer to a structure? If the implementation has a different layout, this could be a problem. Could we use some constructor? Something like std::atomic<uint64_t> shared(shared_p)? This way the compiler would decide how to turn a pointer to an atomic pointer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I know, there is no constructor that takes a pointer and creates an atomic pointer; the interpreter also uses reinterpret_cast:

std::atomic<T>* shared = reinterpret_cast<std::atomic<T>*>(m_buffer + (offset + addend));

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These atomic* relative functions are defined as Memory member functions, which handle the same atomic operations with memory align checks.
Is it more efficient to use these special helper functions in JITC instead of reusing the Memory's member functions?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since alignment check is already done, it should be a bit faster, but since it is only for 32 bit, and probably rarely used, we could use those as well. Btw, not all 32 bit cpus have 64 bit atomic operations (e.g. RISCV), so I have no idea how they do it.

https://github.com/Samsung/walrus/blob/main/src/runtime/Memory.h#L171

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realized we have no access to the memory structure in the callback, we get the absolute address of the memory location. Shall we add static functions into the memory to do this?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see
I think that adding more functions to get the address of the Memory would be complex too, so the current helper functions of JITC seems better :)

src/jit/MemoryInl.h Outdated Show resolved Hide resolved
src/jit/MemoryInl.h Outdated Show resolved Hide resolved
src/jit/MemoryInl.h Show resolved Hide resolved
offset = rmwOperation->offset();

Operand* operands = instr->operands();
MemAddress addr(options, instr->requiredReg(0), instr->requiredReg(1), 0);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Preloading the source value would be a good idea (the store operation also do this to reduce the load overhead).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where does the store operation do this?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#endif /* SLJIT_32BIT_ARCHITECTURE */

struct sljit_label* restartOnFailure = sljit_emit_label(compiler);
sljit_emit_atomic_load(compiler, operationSize, tmpReg, SLJIT_EXTRACT_REG(addr.memArg.arg));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about using SLJIT_TMP_DEST_REG instead of tmpReg?

@matetokodi matetokodi force-pushed the jit_threads_rmw branch 2 times, most recently from d21975c to 7095cd7 Compare August 1, 2024 09:06
Copy link
Collaborator

@zherczeg zherczeg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Patch is getting better.

@@ -230,6 +230,14 @@ static bool isFloatGlobal(uint32_t globalIndex, Module* module)

#endif /* SLJIT_32BIT_ARCHITECTURE */

#if defined(ENABLE_EXTENDED_FEATURES)
#define OPERAND_TYPE_LIST_ATOMIC \
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a theoretical question: since the name "extended features" covers everything, I would call this OPERAND_TYPE_LIST_EXTENDED. The name OTAtomicRmwI32 refers to atomic operations anyway.

sljit_emit_icall(compiler, SLJIT_CALL, SLJIT_ARGS3V(P, P, P), SLJIT_IMM, faddr);

sljit_emit_op1(compiler, SLJIT_MOV, dstArgPair.arg1, dstArgPair.arg1w, SLJIT_MEM1(kContextReg), OffsetOfContextField(tmp2) + WORD_LOW_OFFSET);
sljit_emit_op1(compiler, SLJIT_MOV, dstArgPair.arg2, dstArgPair.arg2w, SLJIT_MEM1(kContextReg), OffsetOfContextField(tmp2) + WORD_HIGH_OFFSET);
Copy link
Collaborator

@zherczeg zherczeg Aug 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another idea: we should do this only if dstArgPair.arg1 != SLJIT_MEM1(kFrameReg). Otherwise we can put the result address directly into the SLJIT_R2 register, and this copy is unnecessary.

}
#endif /* SLJIT_32BIT_ARCHITECTURE */

addr.check(compiler, operands, offset, size);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something is not right. The sourceReg is passed as the last argument of MemAddress addr(), but you always pass 0 there. How this code works:

https://github.com/Samsung/walrus/blob/main/src/jit/MemoryInl.h#L250


struct sljit_label* restartOnFailure = sljit_emit_label(compiler);
sljit_emit_atomic_load(compiler, operationSize, SLJIT_TMP_DEST_REG, SLJIT_EXTRACT_REG(addr.memArg.arg));
sljit_emit_op1(compiler, SLJIT_MOV, dst.arg, dst.argw, SLJIT_TMP_DEST_REG, 0);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please move this code after sljit_set_label. I mentioned that memory access between atomic load/store is not necessary supported. Now sljit_emit_atomic_store keeps the tmp reg value when the operation is successful, so you can safely store it.

sljit_emit_op1(compiler, SLJIT_MOV, SLJIT_MEM1(kContextReg), OffsetOfContextField(tmp1) + WORD_LOW_OFFSET, srcArgPair.arg1, srcArgPair.arg1w);
sljit_emit_op1(compiler, SLJIT_MOV, SLJIT_MEM1(kContextReg), OffsetOfContextField(tmp1) + WORD_HIGH_OFFSET, srcArgPair.arg2, srcArgPair.arg2w);

ASSERT(SLJIT_IS_REG(srcArgPair.arg1) || SLJIT_IS_IMM(srcArgPair.arg1));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This assert is unnecessary. It can be anything.

#endif /* SLJIT_32BIT_ARCHITECTURE */
JITArg src(operands + 1);
dst = JITArg(operands + 2);
srcReg = GET_SOURCE_REG(src.arg, instr->requiredReg(1));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand something. SrcReg is set to instr->requiredReg(1).

Here sourceReg is the last argument, which is set to instr->requiredReg(2).
https://github.com/Samsung/walrus/blob/main/src/jit/MemoryInl.h#L37

Does not include RmwCmpxchg

Signed-off-by: Máté Tokodi [email protected]
Copy link
Collaborator

@zherczeg zherczeg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@zherczeg
Copy link
Collaborator

@clover2123 what do you think of this patch?

Copy link
Collaborator

@clover2123 clover2123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@clover2123 clover2123 merged commit 78d0d9a into Samsung:main Aug 12, 2024
14 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants