- [Low-Level System Information] (#low-level-sys-info)
- [Processor Architecture] (#processor-architecture)
- [Data Representation] (#data-representation)
- Register Convention
- Procedure Calling Convention
- ELF Object Files
- DWARF
This ARCv3 ELF ABI specification document is
© 2020 Synopsys, Claudiu Zissulescu [email protected]
Programs intended to execute on ARCv3-based processors use the ARCv3 instruction set and the instruction encoding and semantics of the architecture.
Assume that all instructions defined by the architecture are neither privileged nor exist optionally and work as documented.
To conform to ARCv3 System V ABI, the processor must do the following:
- implement the instructions of the architecture,
- perform the specified operations,
- produce the expected results.
The ABI neither places performance constraints on systems nor specifies what instructions must be implemented in hardware. A software emulation of the architecture can conform to the ABI.
❗ Caution
Some processors might support optional or additional instructions or capabilities that do not conform to the ARCv3 ABI. Executing programs that use such instructions or capabilities on hardware that does not have the required additional capabilities results in undefined behavior.
The architecture defines an eight-bit byte, a 16-bit halfword, a 32-bit word, and a 64-bit double word. Byte ordering defines how the bytes that make up halfwords, words, and doublewords are ordered in memory.
Most-significant-byte (MSB) ordering, also called as "big-endian", means that the most-significant byte is located in the lowest addressed byte position in a storage unit (byte 0).
Least-significant-byte (LSB) ordering, also called as "little-endian", means that the least-significant byte is located in the lowest addressed byte position in a storage unit (byte 0).
ARCv3-based processors support either big-endian or little-endian byte ordering. However, this specification defines only the base-case little-endian (LSB) architecture.
ARCv3-based processors access data memory using byte addresses and generally require that all memory addresses be aligned as follows:
- 64-bit double-words are aligned to
- 64-bit boundaries on ARC64.
- 32-bit boundaries on ARC32.
- 32-bit words are aligned to 32-bit word boundaries.
- 16-bit halfwords are aligned to 16-bit halfword boundaries.
Name | ABI Mnemonic | Meaning | Available across calls? |
---|---|---|---|
r0 | r0 | Argument 1/Return 1 | No |
r1 | r1 | Argument 2/Return 2 | No |
r2 | r2 | Argument 3/Return 3 | No |
r3 | r3 | Argument 4/Return 4 | No |
r4 | r4 | Argument 5 | No |
r5 | r5 | Argument 6 | No |
r6 | r6 | Argument 7 | No |
r7 | r7 | Argument 8 | No |
r8-r13 | r8-r13 | Temporary registers | No |
r14-r26 | r14-r26 | Callee-saved registers | Yes |
r27 | fp | Temp reg/Frame Pointer | Yes |
r28 | sp | Stack pointer | Yes |
r29 | ilink | Interrupt link reg | Yes |
r30 | gp | Global/Thread pointer | Yes (can be used as GPR) |
r31 | blink | Return address | No |
r32-r57 | r32-r57 | Extension registers | No |
r58 | acc0 | The only accumulator | No |
r59 | N.A. | Reserved | No |
r60 | lp_count | Reserved | -- (Unallocatable) |
r61 | slimm | Signed-extended LIMM | -- (Unallocatable) |
r62 | zlimm | Zero-extended LIMM | -- (Unallocatable) |
r63 | pcl | Program counter | -- (Unallocatable) |
In the standard ABI, procedures should not modify the integer register
Global/Thread Pointer (gp
), because signal handlers may rely upon
their values.
The pcl
register (r63
) contains the four-byte-aligned value of the
program counter.
📝 memo
The scratch registers are not preserved across function calls. When calling an external function, the compiler assumes that registers r0 through r13 are trashed; and that or r14 through r30 are preserved. The EV processors reserve r25.
📝 AGU registers
Address-generation unit (AGU) registers are caller-saved scratch registers. These registers exist on processors configured with DSP and AGU extensions.
Name | ABI Mnemonic | Meaning | Available across calls? |
---|---|---|---|
f0 | f0 | Argument 1/Return 1 | No |
f1 | f1 | Argument 2 | No |
f2 | f2 | Argument 3 | No |
f3 | f3 | Argument 4 | No |
f4 | f4 | Argument 5 | No |
f5 | f5 | Argument 6 | No |
f6 | f6 | Argument 7 | No |
f7 | f7 | Argument 8 | No |
f8-f15 | f8-f15 | Temporary registers | No |
f16-f31 | f16-f31 | Callee saved registers | Yes* |
*: Floating-point values in callee-saved registers are only preserved across calls if they are no larger than the width of a floating-point register in the targeted ABI. Therefore, these registers can always be considered temporaries if targeting the base integer calling convention.
📝 Many other configurations of the FPU registers are possible. To keep it simple this table defines just the maximal configuration.
Argument Passing
-
The base integer calling convention provides eight argument registers, r0-r7, the first four of which are also used to return values. In builds with a reduced register set, the first four words are loaded into r0 to r3.
-
The remaining arguments are passed by storing them into the stack immediately above the stack-pointer register.
Scalars that are at most 64 bits wide are passed in a single argument register, or on the stack by value if none is available. When passed in registers, scalars narrower than 64 bits are widened according to the sign of their type up to 32 bits, then sign-extended to 64 bits.
Scalars that are 2x64 bits wide are passed in a pair of argument registers, or on the stack by value if none are available. If exactly one register is available, the low-order 64 bits are passed in the register and the high-order 64 bits are passed on the stack.
Scalars wider than 2x64 are passed by reference and are replaced in the argument list with the address.
Aggregates whose total size is no more than 64 bits are passed in a register, with the fields laid out as though they were passed in memory. If no register is available, the aggregate is passed on the stack. Aggregates whose total size is no more than 2x64 bits are passed in a pair of registers; if only one register is available, the first half is passed in a register and the second half is passed on the stack. If no registers are available, the aggregate is passed on the stack. Bits unused due to padding, and bits past the end of an aggregate whose size in bits is not divisible by 64, are undefined.
Aggregates or scalars passed on the stack are aligned to the minimum of the object alignment and the stack alignment.
Aggregates larger than 2x64 bits are passed by reference and are replaced in the argument list with the address, as are C++ aggregates with nontrivial copy constructors, destructors, or vtables.
Empty structs or union arguments or return values are ignored by C compilers which support them as a non-standard extension. This is not the case for C++, which requires them to be sized types.
Bitfields are packed in little-endian fashion. A bitfield that would span the alignment boundary of its integer type is padded to begin at the next alignment boundary. For example,
struct
{
int x : 10;
int y : 12;
}
is a 32-bit type with x
in bits 9-0, y
in bits 21-10, and
bits 31-22 undefined. By contrast,
struct
{
short x : 10;
short y : 12;
}
is a 32-bit type with x
in bits 9-0, y
in bits 27-16, and
bits 31-28 and 15-10 undefined.
Arguments passed by reference may be modified by the callee.
Floating-point reals are passed the same way as aggregates of the same size, complex floating-point numbers are passed the same way as a struct containing two floating-point reals.
In the base integer calling convention, variadic arguments are passed in the same manner as named arguments. After a variadic argument has been passed on the stack, all future arguments will also be passed on the stack.
Values are returned in the same manner as a first named argument of the same type would be passed. If such an argument would have been passed by reference, the caller allocates memory for the return value, and passes the address as an implicit first parameter.
The stack grows towards negative addresses and the stack pointer shall be aligned to a 128-bit? boundary upon procedure entry. In the standard ABI, the stack pointer must remain aligned throughout procedure execution. Non-standard ABI code must realign the stack pointer prior to invoking standard ABI procedures. The operating system must realign the stack pointer prior to invoking a signal handler; hence, POSIX signal handlers need not realign the stack pointer. In systems that service interrupts using the interruptee's stack, the interrupt service routine must realign the stack pointer if linked with any code that uses a non-standard stack-alignment discipline, but need not realign the stack pointer if all code adheres to the standard ABI.
Procedures must not rely upon the persistence of stack-allocated data whose addresses lie below the stack pointer.
When the FPU is configured the ABI changes dramatically as floating point values are passed in FPU registers rather than core registers. This means the application must be compiled with runtime libraries that were compiled similarly.
Argument passing:
-
ARC32: The first eight words (32 bytes) of arguments are loaded into registers
r0
tor7
. In builds with a reduced register set, the first four words are loaded intor0
tor3
. -
ARC64: The first eight double words (64 bytes) of arguments are loaded into registers
r0
tor7
. -
The remaining arguments are passed by storing them into the stack immediately above the stack-pointer register.
-
Floating point values are passed in
f0
tof7
when the FPU is configured and the registers are wide enough for the specified type. -
Vectors of floating point types are passed in FPU registers when those vectors and floating point types are supported by the hardware configuration chosen. They are passed in
f0
tof7
. Afterf0
tof7
are consumed, the remainder are passed on the stack as overflow parameters.
Functions return the following results:
-
Any scalar or pointer type that is 32 bits or less in size (char, short, int) is returned in
r0
(and typelong
when ARC32). -
Eight-byte integers (long long, double, and float complex) are returned in
r0
andr1
on ARC32 and just inr0
on ARC64 (and typelong
is 64-bits and returned in justr0
on ARC64). -
Results of type complex double are returned in
r0
tor3
on ARC32 andr0
andr1
on ARC64 when no FPU is configured. -
Results of type complex float are returned in
r0
andr1
when no FPU is configured. -
Results of all complex floating point types are returned in
f0
andf1
when the FPU is configured and the floating-point element type is supported by that configuration. -
Results of type struct are returned in a caller-supplied temporary variable whose address is passed in
r0
. For such functions, the arguments are shifted so that they are passed in r1 and upwards.
NOTE: When structs (also unions, arrays, and vectors), are passed by value they are passed in the core regisers until those core registers are consumed, and the remainder are passed on the stack in the argument-overflow area. It is very difficult to describe precisely. The best practice is to create lots of examples and examine the generated code.
Varargs also changes significantly as the FPU introduces two separate register save areas (one for integers and one for floating point). This means va_list now becomes a struct (similar to what has been done for x86 for years).
typedef struct {
int32_t gp_offset;
int32_t fp_offset;
void *overflow_arg_area;
void *reg_save_area;
} va_list[1];
While various different ABIs are technically possible, for software compatibility reasons it is strongly recommended to use the following default ABIs:
Type | Size (Bytes) | Alignment (Bytes)
------------|---------------|------------------
bool/_Bool | 1 | 1
char | 1 | 1
short | 2 | 2
int | 4 | 4
wchar_t | 4 | 4
wint_t | 4 | 4
long | 8 | 8
long long | 8 | 8
__int128 | 16 | 16
void * | 8 | 8
__fp16 | 2 | 2
float | 4 | 4
double | 8 | 8
long double | 16 | 16
char
is unsigned. wchar_t
is signed. wint_t
is unsigned.
_Complex
types have the alignment and layout of a struct containing two
fields of the corresponding real type (float
, double
, or long double
),
with the first field holding the real part and the second field holding
the imaginary part.
Aggregates (structures, classes, and arrays) and unions assume the alignment of their most strictly aligned component, that is, the component with the largest alignment. The size of any object, including aggregates, classes, and unions, is always a multiple of the alignment of the object. Non-bitfield members always start on byte boundaries. The size of a struct or class is the sum of the sizes of its members, including alignment padding between members. The size of a union is the size of its largest member, padded such that its size is evenly divisible by its alignment. Enumerations can be mapped to one, two, or four bytes, depending on their size. An array uses the same alignment as its elements. Structure and union objects can be packed or padded to meet size and alignment constraints:
- An entire structure or union object is aligned on the same boundary as its most strictly aligned member, though a packed structure or union need not be aligned on word boundaries.
- Each member is assigned to the lowest available offset with the appropriate alignment. Such alignment might require internal padding, depending on the previous member.
- If necessary, a structure's size is increased to make it a multiple of the structure's alignment. Such alignment might require tail padding, depending on the last member.
For detailed information on C++ classes, see "Storage Mapping for Class Objects " see (#stormap)
☝️ Examples
Structure smaller than a word:
struct {
char c;
};
No Padding:
struct {
char c;
char d;
short s;
int n;
};
Internal Padding:
struct {
char c;
short s;
};
Internal and Tail Padding:
struct {
char c;
double d;
short s;
};
Union Allocation:
union {
char c;
short s;
int j;
};
C++ class objects must be mapped in accordance with the GNU Itanium ABI
C/C++ struct and union definitions can have bitfields, defining integral objects with a specified number of bits.
Bitfields are signed unless explicitly declared as unsigned. For example, a four-bit field declared as int can hold values from -8 to 7.
bitfield_types
shows the possible widths for bitfields, where w is
maximum width (in bits).
Bit Field Type | Max Width | Range of Values |
---|---|---|
signed char | 1 to 8 | -2(w-1) to 2(w-1)-1 |
char(default signedness) | 1 to 8 | 0 to 2^w - 1 |
unsigned char | 1 to 8 | 0 to 2^w - 1 |
short | 1 to 16 | -2^(w-1) to 2^(w-1) - 1 |
unsigned short | 1 to 16 | 0 to 2^(w-1) - 1 |
int | 1 to 32 | -2^(w-1) to 2^(w-1) - 1 |
long | 1 to 32/64 | -2^(w-1) to 2^(w-1) - 1 |
enum (unless signed values) | 1 to 32 | 2^(w-1) - 1 |
unsigned int | 1 to 32 | 2^(w-1) - 1 |
unsigned long | 1 to 32/64 | 2^(w-1) - 1 |
long long int | 1 to 64 | -2^(w-1) to 2^(w-1) -1 |
unsigned long long int | 1 to 64 | 0 to 2^(w-1) -1 |
Bitfields obey the same size and alignment rules as other structure and union members, with the following additions:
-
Bitfields are allocated from most to least significant bit on big-endian implementations.
-
Bitfields are allocated from least to most significant bit on little-endian implementations.
-
The alignment that a bit field imposes on its enclosing struct or union is the same as any ordinary (non-bit) field of the same type. Thus, a bitfield of type int imposes a four-byte alignment on the enclosing struct.
-
Bitfields are packed in consecutive bytes, except if a bitfield packed in consecutive bytes crosses a byte offset B where
B % sizeof(FieldType) == 0
.
In particular:
- A bitfield of type char
must not cross a byte boundary.
- A bitfield of type short
must not cross a halfword boundary.
- A bit field of type int
must not cross a word boundary.
- Because long long ints are four-byte-aligned on ARCv3-based
processors, a bitfield of type long long
must not cross two word boundaries. Thus, field B in the
following code starts on byte 4 of the parent struct:
struct S { int A:8; long long B:60; }
You can insert padding as needed to comply with these rules.
Unnamed bitfields of non-zero length do not affect the external alignment. In all other respects, they behave the same as named bitfields. An unnamed bitfield of zero length causes alignment to occur at the next unit boundary, based on its type.
The va_list
type is void*
. A callee with variadic arguments is responsible
for copying the contents of registers used to pass variadic arguments to the
vararg save area, which must be contiguous with arguments passed on the stack.
The va_start
macro initializes its va_list
argument to point to the start
of the vararg save area. The va_arg
macro will increment its va_list
argument according to the size of the given type, taking into account the
rules about 2x64 aligned arguments being passed in "aligned" register pairs.
This section describes the layout of the stack frame and registers that must be saved by the callee prolog code.
The stack-pointer (sp) register always points to the lowest used address of the most recently allocated stack frame. The value of sp is a four-byte-aligned address on ARC32 and an eight-byte-aligned value on ARC64.
The stack-pointer register is commonly used as a base register to access stack-frame-based variables, which always have a positive offset. However, when alloca() is called, the stack-pointer register might be arbitrarily decremented after the stack frame is allocated. In such a case, the frame pointer register is used to reference stack-frame-based variables.
The frame pointer register (fp) is used when a function calls alloca() to allocate space on the stack, and stack-frame-based variables must be accessed.
The callee's prolog code saves all registers that need to be saved.
Saved values include the value of the caller's frame-pointer register,
blink
(return address) register, callee-saved registers used by the
function.
📝 Note
fp
and blink
are saved next to each other when both require
saving. Secondly, only the order in which fp
and blink
are saved
is specified by the ABI. The debugger can properly display stack
frames with proper CFA directives no matter the order in which the
registers are saved (the same currently applies to C++ exception
unwinding).
The caller's stack-pointer (sp) register does not need to be saved because the compiler is able to restore the stack pointer for each function to its original value (for example, by using an add instruction).
ARC64 stack frame generated by GNU compiler looks like this:
+-------------------------------+
| |
| incoming stack arguments |
| |
+-------------------------------+ <-- incoming stack pointer (aligned)
| |
| callee-allocated save area |
| for register varargs |
| |
+-------------------------------+ <-- arg_pointer_rtx
| |
| GPR save area |
| |
+-------------------------------+
| Return address register |
| (if required) |
+-------------------------------+
| FP (if required) |
+-------------------------------+ <-- (hard) frame_pointer_rtx
| |
| Local variables |
| |
+-------------------------------+
| outgoing stack arguments |
| |
+-------------------------------+ <-- stack_pointer_rtx (aligned)
Dynamic stack allocations such as alloca insert data after local
variables. The stack frame must be maintained using the frame pointer
(fp
) instead of the stack pointer (sp
).
-
e_ident
- EI_CLASS: Specifies the base ISA:
- ELFCLASS64: ELF-64 Object File
- ELFCLASS32: ELF-32 Object File
- EI_DATA:
- ELFDATA2LSB: If execution environment is little-endian
- ELFDATA2LSB: If execution environment is little-endian
- EI_CLASS: Specifies the base ISA:
-
e_type: Nothing ARCv3 specific.
-
e_machine: Identifies the machine this ELF file targets. Always contains:
- EM_ARC_COMPACT3_64 (253 - 0xfd) for Synopsys ARCv3 64-bit
- EM_ARC_COMPACT3 (255 - 0xff) for Synopsys ARCv3 32-bit
-
e_flags: Describes the format of this ELF file. These flags are used by the linker to disallow linking ELF files with incompatible ABIs together.
The high bits are used to select the Linux OSABI:
Value | Mnemonic | Info |
---|---|---|
0x000 | OSABI_ORIG | v2.6.35 kernel (sourceforge) |
0x200 | OSABI_V2 | v3.2 kernel (sourceforge) |
0x300 | OSABI_V3 | v3.9 kernel (sourceforge) |
0x400 | OSABI_V4 | v24.8 kernel (sourceforge) |
The sections listed are used by the system and have the types and attributes shown.
Name | Type | Attributes |
---|---|---|
.arcextmap | SHT_PROGBITS | none |
.bss | SHT_NOBITS | SHF_ALLOC + SHF_WRITE |
.ctors | SHT_PROGBITS | SHF_ALLOC |
.data | SHT_PROGBITS | SHF_ALLOC + SHF_WRITE |
.fixtable | SHT_PROGBITS | SHF_ALLOC + SHF_WRITE |
.heap | SHT_NOBITS | SHF_ALLOC + SHF_WRITE |
.initdata | SHT_PROGBITS | SHF_ALLOC |
.offsetTable | SHT_PROGBITS | SHF_ALLOC + SHF_OVERLAY_OFFSET_TABLE + SHF_INCLUDE |
.overlay | SHT_PROGBITS | SHF_ALLOC + SHF_EXECINSTR + SHF_OVERLAY + SHF_INCLUDE |
.overlayMultiLists | SHT_PROGBITS | SHF_ALLOC + SHF_INCLUDE |
.pictable | SHT_PROGBITS | SHF_ALLOC |
.rodata_in_data | SHT_PROGBITS | SHF_ALLOC + SHF_WRITE |
.sbss | SHT_NOBITS | SHF_ALLOC + SHF_WRITE |
.sdata | SHT_PROGBITS | SHF_ALLOC + SHF_WRITE |
.stack | SHT_NOBITS | SHF_ALLOC + SHF_WRITE |
.text | SHT_PROGBITS | SHF_ALLOC + SHF_EXECINST |
.tls | SHT_PROGBITS | SHF_ALLOC + SHF_WRITE |
.ucdata | SHT_PROGBITS | SHF_ALLOC + SHF_WRITE |
.vectors | SHT_PROGBITS | SHF_ALLOC + SHF_EXECINST |
To be compliant with the ARCv3 ABI, a system must support .tls
,
.sdata
, and .sbss
sections, and must recognize, but may choose to
ignore, .arcextmap
and .stack
sections.
Special features might create additional sections. For details regarding overlay-related sections see the Automated Overlay Manager User's Guide.
-
.arcextmap Debugging information relating to processor extensions
-
.bss Uninitialized variables that are not const-qualified (startup code normally sets .bss to all zeros)
-
.ctors Contains an array of functions that are called at startup to initialize elements such as C++ static variables
-
.data Static variables (local and global)
-
.fixtable Function replacement prologs
-
.heap Uninitialized memory used for the heap
-
.initdata Initialized variables and code (usually compressed) to be copied into place during run-time startup
-
.offsetTable Overlay-offset table
-
.overlay All overlays defined in the executable
-
.overlayMultiLists Token lists for functions that appear in more than one overlay group
-
.pictable Table for relocating pre-initialized data when generating position-independent code and data
-
.rodata_in_data Read-only string constants when -Hharvard or -Hccm is specified.
-
.sbss Uninitialized data, set to all zeros by startup code and directly accessible from the %gp register
-
.sdata Initialized small data, directly accessible from the %gp register, and small uninitialized variables
-
.stack Stack information
-
.text Executable code
-
.tls Thread-local data
-
.ucdata Holds data accessed using cache bypass
-
.vectors Interrupt vector table
❗ Caution
Sections that contribute to a loadable program segment must not contain overlapping virtual addresses.
ARCv3-based processors that support the Linux operating system follow the Linux conventions for dynamic linking.
Programs may use a small-data area to reduce code size by storing
small variables in the .sdata
and .sbss
sections, where such data
can be addressed using small, signed offsets from the gp
register. If the program uses small data, program startup must
initialize the gp
register to the address of symbol _SDA_BASE_
Such initialization is typically performed by the default startup
code.
The names and functions of the processor registers are described in
regs
. Compilers may map variables to a register or registers as
needed in accordance with the rules described in arg_pass
and
ret_val
, including mapping multiple variables to a single register.
Compilers may place auto variables that are not mapped into registers at fixed offsets within the function's stack frame as required, for example to obtain the variable's address or if the variable is of an aggregate type.
ARCv3 is a classical RISC architecture that has densely packed non-word sized instruction immediate values. While the linker can make relocations on arbitrary memory locations, many of the RISC-V relocations are designed for use with specific instructions or instruction sequences. RISC-V has several instruction specific encodings for PC-Relative address loading, jumps, branches and the RVC compressed instruction set.
The purpose of this section is to describe the ARCv3 specific instruction sequences with their associated relocations in addition to the general purpose machine word sized relocations that are used for symbol addresses in the Global Offset Table or DWARF meta data.
The following table provides details of the ARCv3 ELF relocations (instruction specific relocations show the instruction type in the Details column):
Enum | Hex | ELF Reloc Type | Description | Details |
---|---|---|---|---|
0 | 0x00 | R_ARC_NONE | None | |
1 | 0x01 | R_ARC_8 | Runtime relocation | word8 = S + A |
2 | 0x02 | R_ARC_16 | Runtime relocation | word16 = S + A |
3 | 0x03 | R_ARC_24 | Runtime relocation | word24 = S + A |
4 | 0x04 | R_ARC_32 | Runtime relocation | word32 = S + A |
5 | 0x05 | R_ARC_64 | Runtime relocation | like R_ARC_32 but 64-bit |
6 | 0x06 | R_ARC_B22_PCREL | PC-relative | disp22 = (S + A - P) >> 2 |
7 | 0x07 | R_ARC_H30 | Runtime relocation | word32 = (S + A) >> 2 |
8 | 0x08 | R_ARC_N8 | Runtime relocation | word8 = A - S |
9 | 0x09 | R_ARC_N16 | Runtime relocation | word16 = A - S |
10 | 0x0a | R_ARC_N24 | Runtime relocation | word24 = A - S |
11 | 0x0b | R_ARC_N32 | Runtime relocation | word32 = A - S |
12 | 0x0c | R_ARC_SDA | SDA relocation | disp9 = ME ((S+A)-_SDA_BASE_) |
13 | 0x0d | R_ARC_SECTOFF | word32 = S - SECTSTART + A | |
14 | 0x0e | R_ARC_S21H_PCREL | PC-relative | disp21h = ME ((S+A-P)>>1) |
15 | 0x0f | R_ARC_S21W_PCREL | PC-relative | disp21w = ME ((S+A-P)>>2) |
16 | 0x10 | R_ARC_S25H_PCREL | PC-relative | disp25h = ME ((S+A-P)>>1) |
17 | 0x11 | R_ARC_S25W_PCREL | PC-relative | disp25w = ME ((S+A-P)>>2) |
18 | 0x12 | R_ARC_SDA32 | SDA relocation | word32 = ME ((S+A)-_SDA_BASE_) |
19 | 0x13 | R_ARC_SDA_LDST | SDA relocation | disp9ls = ME ((S+A)-_SDA_BASE_) |
20 | 0x14 | R_ARC_SDA_LDST1 | SDA relocation | disp9ls = ME (((S+A)-_SDA_BASE_)>>1) |
21 | 0x15 | R_ARC_SDA_LDST2 | SDA relocation | disp9ls = ME (((S+A)-_SDA_BASE_)>>2) |
22 | 0x16 | R_ARC_SDA16_LD | SDA relocation | disp9s = ((S+A)-_SDA_BASE_) |
23 | 0x17 | R_ARC_SDA_LD1 | SDA relocation | disp9s = (((S+A)-_SDA_BASE_)>>1) |
24 | 0x18 | R_ARC_SDA_LD2 | SDA relocation | disp9s = (((S+A)-_SDA_BASE_)>>2) |
25 | 0x19 | R_ARC_S13_PCREL | PC-relative | disp13s = ME ((S+A-P)>>2) |
26 | 0x1a | R_ARC_W | Runtime relocation | word32 = (S+A) AND (0x03) |
27 | 0x1b | R_ARC_32_ME | Runtime relocation | word32 = ME (S + A) |
28 | 0x1c | R_ARC_N32_ME | Runtime relocation | word32 = ME (A - S) |
29 | 0x1d | R_ARC_SECTOFF_ME | word32 = ME (S - SECTSTART + A) | |
30 | 0x1e | R_ARC_SDA32_ME | SDA relocation | word32 = ME ((S+A)-_SDA_BASE_) |
31 | 0x1f | R_ARC_W_ME | Runtime relocation | word32 = ME ((S+A) AND (0x03)) |
32 | 0x20 | R_ARC_H30_ME | Runtime relocation | word32 = ME ((S + A) >> 2) |
33 | 0x21 | R_ARC_SECTOFF_U8 | Runtime relocation | disp9 = ME (S - SECTSTART + A) |
34 | 0x22 | R_ARC_SECTOFF_S9 | Runtime relocation | disp9 = ME ((S - SECTSTART + A) - 256) |
35 | 0x23 | R_AC_SECTOFF_U8 | disp9ls = ME (S - SECTSTART + A) | |
36 | 0x24 | R_AC_SECTOFF_U8_1 | disp9ls = ME ((S - SECTSTART + A)>>1) | |
37 | 0x25 | R_AC_SECTOFF_U8_2 | disp9ls = ME ((S - SECTSTART + A)>>2) | |
38 | 0x26 | R_AC_SECTOFF_S9 | disp9ls = ME ((S - SECTSTART + A) - 256) | |
39 | 0x27 | R_AC_SECTOFF_S9_1 | disp9ls = ME ((S - SECTSTART + A - 256)>>1) | |
40 | 0x28 | R_AC_SECTOFF_S9_2 | disp9ls = ME ((S - SECTSTART + A - 256)>>2) | |
41 | 0x29 | R_ARC_SECTOFF_ME_1 | word32 = ME ((S - SECTSTART + A)>>1) | |
42 | 0x2a | R_ARC_SECTOFF_ME_2 | word32 = ME ((S - SECTSTART + A)>>2) | |
43 | 0x2b | R_ARC_SECTOFF_1 | word32 = (S - SECTSTART + A)>>1 | |
44 | 0x2c | R_ARC_SECTOFF_2 | word32 = (S - SECTSTART + A)>>2 | |
45 | 0x2d | R_ARC_SDA_12 | SDA relocation | disp12s = ME ((S+A)-_SDA_BASE_) |
46 | 0x2e | R_ARC_LDI_SECTOFF1 | Runtime relocation | u7 = (S - SECTSTART + A)>>1 |
47 | 0x2f | R_ARC_LDI_SECTOFF2 | Runtime relocation | s12 = (S - SECTSTART + A)>>2 |
48 | 0x30 | R_ARC_SDA16_ST2 | SDA relocation | disp9s1 = ((S+A)-_SDA_BASE_)>>2 |
49 | 0x31 | R_ARC_32_PCREL | PC-relative (data) | word32 = (S+A-PDATA) |
50 | 0x32 | R_ARC_PC32 | PC-relative | word32 = ME (S+A-P) |
51 | 0x33 | R_ARC_GOTPC32 | PC-relative GOT reference | word32 = ME (GOT + G + A - P) |
52 | 0x34 | R_ARC_PLT32 | PC-relative (PLT) | word32 = ME (L+A-P) |
53 | 0x35 | R_ARC_COPY | Runtime relocation | must be in executable |
54 | 0x36 | R_ARC_GLOB_DAT | GOT relocation | word32= S |
55 | 0x37 | R_ARC_JMP_SLOT | Runtime relocation | word32 = ME(S) |
56 | 0x38 | R_ARC_RELATIVE | Runtime relocation | word32 = ME(B+A) |
57 | 0x39 | R_ARC_GOTOFF | GOT relocation | word32 = ME(S+A-GOT) |
58 | 0x3a | R_ARC_GOTPC | PC-relative (GOT) | word32 = ME(GOT_BEGIN - P) |
59 | 0x3b | R_ARC_GOT32 | GOT relocation | word32 = (G + A) |
60 | 0x3c | R_ARC_S21W_PCREL_PLT | PC-relative (PLT) | disp21w = ME ((L+A-P)>>2) |
61 | 0x3d | R_ARC_S25H_PCREL_PLT | PC-relative (PLT) | disp25h = ME ((L+A-P)>>1) |
62 | 0x3e | R_ARC_SPE_SECTOFF | Unknown | u11 = ((S - + A) >> 2) |
63 | 0x3f | R_ARC_JLI_SECTOFF | JLI relocation | jli = ((S-JLI)>>2) |
64 | 0x40 | R_ARC_AON_TOKEN_ME | Automatic Overlay Manager | |
65 | 0x41 | R_ARC_AON_TOKEN | Automatic Overlay Manager | |
66 | 0x42 | R_ARC_TLS_DTPMOD | TLS relocation (dynamic) | word32 |
67 | 0x43 | R_ARC_TLS_DTPOFF | TLS relocation | word32 = ME (S - FINAL_SECTSTART + A) |
68 | 0x44 | R_ARC_TLS_TPOFF | TLS relocation (dynamic) | word32 |
69 | 0x45 | R_ARC_TLS_GD_GOT | TLS relocation | word32 = ME(G + GOT - P) |
70 | 0x46 | R_ARC_TLS_GD_LD | TLS relocation | Legacy |
71 | 0x47 | R_ARC_TLS_GD_CALL | TLS relocation | Legacy |
72 | 0x48 | R_ARC_TLS_IE_GOT | TLS reloaction | word32 = ME (G+GOT-P) |
73 | 0x49 | R_ARC_TLS_DTPOFF_S9 | TLS relocation | Legacy |
74 | 0x4a | R_ARC_TLS_LE_S9 | TLS relocation | Legacy |
75 | 0x4b | R_ARC_TLS_LE_32 | TLS relocation | word32 = ME(S+A+TLS_TBSS-TLS_REL) |
76 | 0x4c | R_ARC_S25W_PCREL_PLT | PC-relative (PLT) | disp25w = ME ((L+A-P)>>2) |
77 | 0x4d | R_ARC_S21H_PCREL_PLT | PC-relative (PLT) | disp21h = ME ((L+A-P)>>1) |
78 | 0x4e | R_ARC_NPS_CMEM16 | NPS relocation | bits16 = ME (S+A) |
79 | 0x4f | R_ARC_S9H_PCREL | PC-relative | bits9 = ME ( ( ( ( S + A ) - P ) >> 1 ) ) ) |
80 | 0x50 | R_ARC_S7H_PCREL | PC-relative | bits7 = (( S + A ) - P ) >> 1 |
81 | 0x51 | R_ARC_S8H_PCREL | PC-relative | disp8h = (( S + A ) - P ) >> 1 |
82 | 0x52 | R_ARC_S10H_PCREL | PC-relative | bits10 = (( S + A ) - P ) >> 1 |
83 | 0x53 | R_ARC_S13H_PCREL | PC-relative | bits13 = ME ( ( ( ( S + A ) - P ) >> 1 ) ) ) |
84 | 0x54 | R_ARC_ALIGN | Alignment statement | |
85 | 0x55 | R_ARC_ADD8 | 8-bit label addition | word8 = S + A |
86 | 0x56 | R_ARC_ADD16 | 16-bit label addition | word16 = S + A |
87 | 0x57 | R_ARC_SUB8 | 8-bit label subtraction | word8 = S - A |
88 | 0x58 | R_ARC_SUB16 | 16-bit label subtraction | word16 = S - A |
89 | 0x59 | R_ARC_SUB32 | 32-bit label subtraction | word32 = S - A |
90 | 0x5a | R_ARC_LO32 | Absolute address | word32 = (S + A) & 0xffffffff |
91 | 0x5b | R_ARC_HI32 | Absolute address | word32 = (S + A) >> 32 |
92 | 0x5c | R_ARC_LO32_ME | Absolute address | word32 = ME ((S + A) & 0xffffffff) |
93 | 0x5d | R_ARC_HI32_ME | Absolute address | word32 = ME ((S + A) >> 32) |
94 | 0x5e | R_ARC_N64 | Absolute address | word64 = *P - (S + A) |
95 | 0x5f | R_ARC_SDA_LDST3 | SDA relocation | disp9ls = (S + A - _SDA_BASE_) >> 3 |
96 | 0x60 | R_ARC_NLO32 | Absolute address | word32 = *P - ((S+A) & 0xffffffff) |
97 | 0x61 | R_ARC_NLO32_ME | Absolute address | word32 = ME(*P - ((S+A) & 0xffffffff)) |
98 | 0x62 | R_ARC_PCLO32_ME_2 | PC-relative address | word32 = ME ((S + A - P ) >> 2) |
99 | 0x63 | R_ARC_PLT34 | PC-relative (PLT) | word32 = ME ((L + A - P ) >> 2) |
100 | 0x64 | R_ARC_JLI64_SECTOFF | JLI offset | u10 = ((S - ) + A) >> 2 |
101 | 0x65 | R_ARC_S25W_PCREL_WCALL | PC-relative (weak) | disp25w = (S + A - P) >> 2 |
102 | 0x66 | R_ARC_S32_PCREL_ME | PC-relative | word32 = (S + A) - ((P-4) & ~3) |
103 | 0x67 | R_ARC_N32W | Absolute address | |
104 | 0x68 | R_ARC_N32W_ME | Absolute address | |
105 | 0x69 | R_ARC_NLO32W | Absolute address | |
106 | 0x6a | R_ARC_NLO32W_ME | Absolute address | |
192-255 | Reserved | Reserved for nonstandard ABI extensions |
- ARCv3: Conflicting or duplicated relocations, needs to be resolved.
Nonstandard extensions are free to use relocation numbers 192-255 for any purpose. These relocations may conflict with other nonstandard extensions.
This document specifies several types of relocatable fields used by relocations.
-
bits8 Specifies 8 bits of data in a separate byte.
-
bits16 Specifies 16 bits of data in a separate byte.
-
bits24 Specifies 24 bits of data in a separate byte.
-
disp7u Secifies a seven-bit unsigned displacement within a 16-bit instruction word, with bits 2-0 of the instruction stored in bits 2-0 and bits 6-3 of the instruction stored in bits 7-4.
-
disp9 Specifies a nine-bit signed displacement within a 32-bit instruction word.
-
disp9ls Specifies a nine-bit signed displacement within a 32-bit instruction word.
-
disp9s Specifies a 9-bit signed displacement within a 16-bit instruction word.
-
disp10u Specifies a 10-bit unsigned displacement within a 16-bit instruction word.
-
disp13s Specifies a signed 13-bit displacement within a 16-bit instruction word. The displacement is to a 32-bit-aligned location and thus bits 0 and 1 of the displacement are not explicitly stored.
-
disp21h Specifies a 21-bit signed displacement within a 32-bit instruction word. The displacement is to a halfword-aligned target location, and thus bit 0 of the displacement is not explicitly stored. Note that the 32-bit instruction containing this relocation field may be either 16-bit-aligned or 32-bit-aligned.
-
disp21w Specifies a signed 21-bit displacement within a 32-bit instruction word. The displacement is to a 32-bit-aligned target location, and thus bits 0 and 1 of the displacement are not explicitly stored. Note that the 32-bit instruction containing this relocation field may be either 16-bit-aligned or 32-bit-aligned.
-
disp25h Specifies a 25-bit signed displacement within a 32-bit instruction word. The displacement is to a halfword-aligned target location, and thus bit 0 is not explicitly stored. Note that the 32-bit instruction containing this relocation field may be either 16-bit-aligned or 32-bit-aligned.
-
disp25w Specifies a 25-bit signed displacement within a 32-bit instruction word. The displacement is to a 32-bit-aligned target location, and thus bits 0 and 1 are not explicitly stored. Note that the 32-bit instruction containing this relocation field may be either 16-bit-aligned or 32-bit-aligned.
-
disps9 Specifies a nine-bit signed displacement within a 16-bit instruction word. The displacement is to a 32-bit-aligned location, and thus bits 0 and 1 of the displacement are not explicitly stored. This means that effectively the field is bits 10-2, stored at 8-0.
-
disps12 Specifies a twelve-bit signed displacement within a 32-bit instruction word. The high six bits are in 0-5, and the low six bits are in 6-11.
-
word32 Specifies a 32-bit field occupying four bytes, the alignment of which is four bytes unless otherwise specified.
-
word32me (Little-Endian) Specifies a 32-bit field in middle-endian Storage. Bits 31..16 are stored first, and bits 15..0 are stored adjacently. The individual halfwords are stored in the native endian orientation of the machine.
-
word32me (Big-Endian) Specifies a 32-bit field in middle-endian Storage. Bits 31..16 are stored first, and bits 15..0 are stored adjacently. The individual halfwords are stored in the native endian orientation of the machine.
The following table provides details on the variables used in address calculation:
Variable | Description |
---|---|
A | Addend field in the relocation entry associated with the symbol |
B | The base address at which a shared object has been loaded into memory during execution |
S | Base address of a shared object loaded into memory |
G | The offset into the global offset table at which the address of the relocated symbol will reside during execution |
GOT | Offset of the symbol into the GOT (Global Offset Table) |
L | The place (section offset or address) of the PLT entry for a symbol |
MES | Middle-Endian Storage |
P | The place (section offset or address) of the storage unit being relocated (computed using r_offset). |
S | Value of the symbol in the symbol table |
SECTSTART | Start of the current section. Used in calculating offset types. |
_SDA_BASE | Base of the small-data area. |
JLI | Base of the JLI table. |
A relocation entry's r_offset value designates the offset or virtual address of the first byte of the field to be relocated. The relocation type specifies which bits to change and how to calculate their values. The ARCv3 architecture uses only Elf32_Rela relocation entries. The addend is contained in the relocation entry. Any data from the field to be relocated is discarded. In all cases, the addend and the computed result use the same byte order.
64-bit absolute addresses are loaded with a pair of instructions which
have an associated pair of relcations:
R_ARC_LO32_ME
and R_ARC_HI32_ME
.
The R_ARC_HI32_ME
refers to MOVHL
or ADDHL
instructions
containing the high 32-bits fo be reloacted to an absoute symbol
address. The movhl
instruction is followed by any 64-bit type
instruction which accepts zero extension of the 32-bits immediate
field with an R_ARC_LO32_ME
relocation. The address of pair of
reloactions are calculated like this:
hi32 = (symbol_address >> 32);
lo32 = (symbol_address & 0xffffffff);
The following assemby and relocations show loading an absolute address:
movhl_s r0,@symbol # R_ARC_HI32_ME (symbol)
orl_s r0,r0,@symbol # R_ARC_LO32_ME (symbol)
or alternatively:
movhl_s r0,@symbol@hi # R_ARC_HI32_ME (symbol)
orl_s r0,r0,@symbol@lo # R_ARC_LO32_ME (symbol)
For position independent code in dynamically linked objects, each shared object contains a GOT (Global Offset Table) which contains addresses of global symbols (objects and functions) referred to by the dynamically linked shared object. The GOT in each shared library is filled in by the dynamic linker during program loading, or on the first call to extern functions.
To avoid runtime relocations within the text segment of position independent code the GOT is used for indirection. Instead of code loading virtual addresses directly, as can be done in static code, addresses are loaded from the GOT. The allows runtime binding to external objects and functions at the expense of a slightly higher runtime overhead for access to extern objects and functions.
The dynamic linker may choose different memory segment addresses for the same shared object in different programs; it may even choose different library addresses for different executions of the same program. Nonetheless, memory segments do not change addresses after the process image is established. As long as a process exists, its memory segments reside at fixed virtual addresses.
The global offset table normally resides in the .got ELF section in an
executable or shared object. The symbol _GLOBAL_OFFSET_TABLE_
can be
used to access the table. This symbol can reside in the middle of the
.got section, allowing both positive and negative subscripts into the
array of addresses.
The entry at _GLOBAL_OFFSET_TABLE_[0]
is reserved for the address of
the dynamic structure, referenced with the symbol _DYNAMIC
. This
allows the dynamic linker to find its dynamic structure prior to the
processing of the relocations.
The entry at _GLOBAL_OFFSET_TABLE_[1]
is reserved for use by the
dynamic loader. The entry at _GLOBAL_OFFSET_TABLE_[2]
is reserved to
contain a dynamic the lazy symbol-resolution entry point.
If no explicit .pltgot is used, _GLOBAL_OFFSET_TABLE_[3 .. 3+F]
are
used for resolving function references, and
_GLOBAL_OFFSET_TABLE_[3+F+1 .. last]
are for resolving data
references. Addressability to the global offset table (GOT) is
accomplished using direct PC-relative addressing. There is no need for a
function to materialize an explicit base pointer to access the GOT.
GOT-based variables can be referenced directly using a single eight-byte
long-intermediate-operand instruction:
ld rdest,[pcl,varname@gotpc]
Similarly, the address of the GOT can be computed relative to the PC:
add rdest,pcl, _GLOBAL_OFFSET_TABLE_ @gotpc
This add instruction relies on the universal placement of the address of the _DYNAMIC variable at location 0 of the GOT.
References to the address of a function from an executable file or shared object and the shared objects associated with it might not resolve to the same value. References from within shared objects are normally resolved by the dynamic linker to the virtual address of the function itself. References from within the executable file to a function defined in a shared object are normally resolved by the link editor to the address of the procedure linkage table entry for that function within the executable file.
To allow comparisons of function addresses to work as expected, if an executable file references a function defined in a shared object, the link editor places the address of the PLT entry for that function in the associated symbol-table entry. The dynamic linker treats such symbol-table entries specially. If the dynamic linker is searching for a symbol, and encounters a symbol-table entry for that symbol in the executable file, it normally follows the rules below.
If the st_shndx member of the symbol-table entry is not SHN_UNDEF, the dynamic linker has found a definition for the symbol and uses its st_value member as the symbol's address. If the st_shndx member is SHN_UNDEF and the symbol is of type STT_FUNC and the st_value member is not zero, the dynamic linker recognizes the entry as special and uses the st_value member as the symbol's address.
Otherwise, the dynamic linker considers the symbol to be undefined within the executable file and continues processing.
Some relocations are associated with PLT entries. These entries are used for direct function calls rather than for references to function addresses. These relocations are not treated in the special way described above because the dynamic linker must not redirect procedure linkage table entries to point to themselves.
The PLT (Program Linkage Table) exists to allow function calls between dynamically linked shared objects. Each dynamic object has its own GOT (Global Offset Table) and PLT (Program Linkage Table).
The first entry of a shared object PLT is a special entry that calls
_dl_runtime_resolve
to resolve the GOT offset for the called
function. The _dl_runtime_resolve
function in the dynamic loader
resolves the GOT offsets lazily on the first call to any function,
except when LD_BIND_NOW
is set in which case the GOT entries are
populated by the dynamic linker before the executable is started. Lazy
resolution of GOT entries is intended to speed up program loading by
deferring symbol resolution to the first time the function is
called. The first entry in the PLT is:
1: ld r11,[pcl, _DYNAMIC@GOTPC+4]
ld r10,[pcl, _DYNAMIC@GOTPC+8]
j [r10]
Subsequent function entry stubs in the PLT load a function pointer
from the GOT. On the first call to a function, the entry redirects to
the first PLT entry which calls _dl_runtime_resolve
and fills in the
GOT entry for subsequent calls to the function:
1: ld r12,[pcl,func@gotpc]
j.d [r12]
mov r12,pcl
When executed, the PLT code loads the actual address of the function into r12 from the GOT. It then jumps through r12 to its destination. As it jumps, the delay-slot instruction loads r12 with the current value of the 32-bit-aligned PC address for identification. The PLT-entry PC address identifies the function called by allowing the lazy loader to calculate the index into the PLT, which also corresponds to the index of the relocation in the .rela.plt relocation section. The writable GOT or PLTGOT entry is initialized by the dynamic linker when the object is first loaded into memory. At first it is initialized to the special code stub that saves the volatile registers and calls the dynamic linker. The first time the function is called, the dynamic linker loads, links, and resolves the GOT or PLTGOT entry to point to the actual loaded function for subsequent calls.
The first entry in the PLT is reserved and is used as a reference to transfer control to the dynamic linker. At program load time, each GOT or PLTGOT entry is set to PLT[0], which is a hard-coded jump to the dynamic-link stub routine.
A relocation table (.rela.plt) is associated with the PLT. The DT_JMPREL entry in the dynamic section gives the location of the first relocation entry. The relocation table's entries parallel the PLT entries in a one-to-one correspondence. That is, relocation table entry 1 applies to PLT entry 1, and so on. The relocation type for each entry is R_ARC_JMP_SLOT. The relocation offset shall specify the address of the GOT or PLTGOT entry associated with the function, and the symbol table index shall reference the function's symbol in the .dynsym symbol table. The dynamic linker locates the symbol referenced by the R_ARC_JMP_SLOT relocation. The value of the symbol is the address of the first instruction of the function's PLT entry.
The dynamic linker can resolve the procedure linkage table relocations lazily, resolving them only when they are needed. Doing so might reduce program startup time. The LD_BIND_NOW environment variable can change dynamic linking behavior. If its value is non-null, the dynamic linker resolves the function call binding at load time, before transferring control to the program. That is, the dynamic linker processes relocation entries of type R_ARC_JMP_SLOT during process initialization. Otherwise, the dynamic linker evaluates procedure linkage table entries lazily, delaying symbol resolution and relocation until the first execution of a table entry.
Lazy binding generally improves overall application performance because unused symbols do not incur the dynamic-linking overhead. Nevertheless, some situations make lazy binding undesirable for some applications: The initial reference to a shared object function takes longer than subsequent calls because the dynamic linker intercepts the call to resolve the symbol, and some applications cannot tolerate such unpredictability.
If an error occurs and the dynamic linker cannot resolve the symbol, the dynamic linker terminates the program. Under lazy binding, this might occur at arbitrary times. Some applications cannot tolerate such unpredictability. By turning off lazy binding, the dynamic linker forces the failure to occur during process initialization, before the application receives control.
64-bit PC-relative relocations for symbol for symbol addresses on
sequences of instructions such as the ADDHL+ADDL
instruction pair,
and have an associated pair of relocations: R_ARC_PCREL_HI32_ME
plus
R_ARC_PCREL_LO323_ME
relocations.
The R_ARC_PCREL_HI32_ME
relocation referes to an ADDHL
instruction
containing the high 32-bits to be relocated to a symbol relative to
the program counter address of the ADDHL
instruction. This
instruction is followed by any instruction working on 64-bit datum and
sign-extending the 32-bit immediate field such as ADDL
instruction
with an R_ARC_PCREL_LO32_ME
relocation.
The R_ARC_PCREL_LO32_ME
relocation needs to resolve the lower 32-bit
of the symbol relative to the program counter address of the ADDHL
instruction which holds the higher 32-bits. Resolving the lower part
needs to know the relative offset between the ADDL
instruction and
ADDHL
instruction which will be used for correcting the lower
value. The addresses for pair of relocations are calculated like this:
hi32 = (symbol_address - PCL) >> 32 + ((symbol_address - PCL) >> 31 & 1);
lo32 = (symbol_address + hi32_off - PCL) & 0xffffffff;
hi32_off = lo32_reloc_address - hi32_reloc_address;
Here is an example assembler showing the relocation types:
label:
addhl_s r0,PCL,@symbol@pcl # R_ARC_PCREL_HI32 (symbol)
...
addl_s r0,r0,@symbol@pcl + (. - @label) # R_ARC_PCREL_LO32 (symbol)
For the case when the linker relaxation is in place the
R_ARC_PCREL_OFFSET
relocation to go in pair with R_ARC_PCREL_LO32
reloc to resolve hi32_off
at link-time:
hi32_off = (@label_address & -3 + 2) - PCL;
Here is an example assembler showing the all three relocation types:
label:
addhl_s r0,PCL,@symbol@pcl # R_ARC_PCREL_HI32 (symbol)
...
addl_s r0,r0,@symbol@pcl - @label@off # R_ARC_PCREL_LO32 (symbol)
# R_ARC_PCREL_OFFSET (label)
ARCv3 adopts the ELF Thread Local Storage Model in which ELF objects
define .tbss
and .tdata
sections and PT_TLS
program headers that
contain the TLS "initialization images" for new threads. The .tbss
and .tdata
sections are not referenced directly like regular
segments, rather they are copied or allocated to the thread local
storage space of newly created threads. See
https://www.akkadia.org/drepper/tls.pdf.
In The ELF Thread Local Storage Model, TLS offsets are used instead of
pointers. The ELF TLS sections are initialization images for the
thread local variables of each new thread. A TLS offset defines an
offset into the dynamic thread vector which is pointed to by the TCB
(Thread Control Block) held in the gp
register.
There are various thread local storage models for statically allocated or dynamically allocated thread local storage. The following table lists the thread local storage models:
Mnemonic | Model | Compiler flags |
---|---|---|
TLS LE | Local Exec | -ftls-model=local-exec |
TLS IE | Initial Exec | -ftls-model=initial-exec |
TLS LD | Local Dynamic | -ftls-model=local-dynamic |
TLS GD | Global Dynamic | -ftls-model=global-dynamic |
The program linker in the case of static TLS or the dynamic linker in the case of dynamic TLS allocate TLS offsets for storage of thread local variables.
Local exec is a form of static thread local storage. This model is used when static linking as the TLS offsets are resolved during program linking.
- Compiler flag
-ftls-model=local-exec
- Variable attribute:
__thread int i __attribute__((tls_model("local-exec")));
Initial exec is is a form of static thread local storage that can be
used in shared libraries that use thread local storage. TLS
relocations are performed at load time. dlopen
calls to libraries
that use thread local storage may fail when using the initial exec
thread local storage model as TLS offsets must all be resolved at load
time. This model uses the GOT to resolve TLS offsets.
- Compiler flag
-ftls-model=initial-exec
- Variable attribute:
__thread int i __attribute__((tls_model("initial-exec")));
ARCv3 local dynamic and global dynamic TLS models generate equivalent
object code. The Global dynamic thread local storage model is used
for PIC Shared libraries and handles the case where more than one
library uses thread local variables, and additionally allows libraries
to be loaded and unloaded at runtime using dlopen
. In the global
dynamic model, application code calls the dynamic linker function
__tls_get_addr
to locate TLS offsets into the dynamic thread vector
at runtime.
- Compiler flag
-ftls-model=global-dynamic
- Variable attribute:
__thread int i __attribute__((tls_model("global-dynamic")));
Example assembler load and store of a thread local variable i
using the
la.tls.gd
pseudoinstruction, with the emitted TLS relocations in comments:
In the Global Dynamic model, the runtime library provides the __tls_get_addr
function:
extern void *__tls_get_addr (tls_index *ti);
where the type tls index are defined as:
typedef struct
{
unsigned long int ti_module;
unsigned long int ti_offset;
} tls_index;
OS ABI consists of system calls provided by Linux kernel and call upon by user space library code.
-
ABI is similar to a regular function call in terms of arguments passing semantics. For example, 64-bit data in register pairs.
-
Up to eight arguments allowed in registers
r0
tor7
. -
Syscall number must be passed in register
r8
. -
Syscall return value is returned back in
r0
. -
All registers except
r0
are preserved by kernel across the Syscall.
The current Linux OS ABI (v4.8 kernel onwards) is ABIv4. For information on the ABI versions, see https://github.com/foss-for-synopsys-dwc-arc-processors/linux/wiki/ARC-Linux-Syscall-ABI-Compatibility
Dwarf Number | Register Name | Description |
---|