
Basilisk II Core Emulation Analysis

Ricky Zhang edited this page Aug 21, 2017 · 43 revisions


"Emulation core" refers to the logic that translates M68k CPU guest instructions into non-M68k host instructions (Intel x86, AMD64, ARM and PPC). Besides CPU core emulation, a separate page describes 68k Macintosh peripheral hardware emulation such as the timer, Ethernet and audio. Most of the following analysis is based on a study of an AMD64 Linux host, but it should apply to other host architectures and operating systems.

History

The facts described below are purely based on:
If it is on the Internet it must be true.

Basilisk II CPU emulation was started by Christian Bauer. The initial source code has its roots in another M68k emulation project, the Amiga emulator UAE. A diff between the early commit 2bebaceabc7646d in the macemu git repo and the UAE v0.8.10 source code shows that the files build68k.c, cpuopti.c, gencpu.c and table68k are nearly identical to those in UAE. For further reading on UAE, see the UAE People section in the WinUAE documentation.

Based on the commit history, Gwénolé Beauchesne is the key contributor to Basilisk II CPU emulation. He added JIT translation (a.k.a. dynamic binary translation) to speed up emulation. TODO -- Add more after reading the JIT code.

Source Code

For non-M68k hosts, the CPU emulation source code is under the src/uae_cpu folder.

TODO -- Add overview of Glue/Adapter, UAE CPU, FPU and JIT.

Addressing

Background

There are two different perspectives in terms of memory addressing in emulation.

The first one is from the host OS point of view. An emulation program such as Basilisk II runs as an application in ring 3 user space. Most modern host CPUs, such as Intel x86, AMD64, PPC and ARM, contain an MMU, which may provide segmentation and paging. Most modern host OSes, such as Linux, Mac OS X and Windows, use virtual addressing instead of direct physical addressing of memory.

On a 32-bit CPU, a process can in theory address up to 2^32 bytes (4GB) of virtual memory. A 64-bit CPU can in theory address up to 2^64 bytes, far more than any current machine has installed. However, this doesn’t mean that applications can use an arbitrary virtual address. What is usable depends on the CPU architecture and the host OS implementation. For example, 32-bit Linux by default sets aside the lower 3GB for user space and the upper 1GB for kernel space [1].

The second perspective is from the guest Macintosh OS point of view. In theory, the guest OS doesn’t know if it is running under a physical M68k CPU or an emulated CPU provided by BII. Therefore, BII needs to provide memory address mapping between the guest OS and BII's user space memory in the host OS when executing translated instructions.

According to the Wikipedia page on the M68k series [2], only the 68030 and above have a built-in paged MMU. In addition, Apple added virtual memory support in System 7. TODO -- Investigate whether BII emulates the PMMU. Try to enable virtual memory in the memory manager under the control panel.

Basilisk II provides three types of address mapping: direct addressing, real addressing and virtual addressing. By default, the configure script generated by the GNU autotools determines the proper address mapping strategy for you. If you know better than the automatic detection, you can override it by passing the --enable-addressing option to the ./configure script. It accepts the values direct, real and banks (note that banks refers to virtual addressing). You can also see the selected addressing mode in the summary printed after running ./configure:

...
Assembly optimizations ................. : x86-64
Addressing mode ........................ : direct
Bad memory access recovery type ........ : siginfo
...

Addressing Strategy Selection

It is interesting to analyze the configure.ac script to figure out how the optimal addressing strategy is automatically determined. The selection logic depends on several tests of the host OS and platform.

  1. Check whether the OS supports VOSF, a.k.a. Video on Segmentation Fault (for the technical details of VOSF, see [3]). VOSF support in turn depends on whether the OS supports a segmentation fault signal handler. This is determined by several compilation tests against the sources in the src/CrossPlatform folder, each controlled by macros. If any of the tests below passes under Linux/Mac OS X, the variable CAN_VOSF is set to ‘yes’.
/*Test 1: Check if OS supports extended signal handlers via asm*/
#ifdef HAVE_UNISTD_H
#include <unistd.h>
#endif
#define HAVE_ASM_UCONTEXT 1
#define HAVE_SIGINFO_T 1
#define CONFIGURE_TEST_SIGSEGV_RECOVERY
#include "../CrossPlatform/vm_alloc.cpp"
#include "../CrossPlatform/sigsegv.cpp"


/*Test 2: Check if OS supports extended signal handlers*/
#ifdef HAVE_UNISTD_H
#include <unistd.h>
#endif
#define HAVE_SIGINFO_T 1
#define CONFIGURE_TEST_SIGSEGV_RECOVERY
#include "../CrossPlatform/vm_alloc.cpp"
#include "../CrossPlatform/sigsegv.cpp"


/*Test 3: Check if OS supports the hack*/
#define HAVE_SIGCONTEXT_SUBTERFUGE 1
#define CONFIGURE_TEST_SIGSEGV_RECOVERY
#include "../CrossPlatform/vm_alloc.cpp"
#include "../CrossPlatform/sigsegv.cpp"
  2. Check whether the OS allows page zero mapping in user space, or supports another related page zero hack. Because of NULL pointer dereference security concerns, modern Linux kernels disable page zero mapping by default, but Mac OS X still supports it.

In the end, the configure script uses the result of the tests:

  • If the host is a native M68k CPU, it uses real addressing.
  • If page zero mapping is supported and the host is able to do VOSF, it uses real addressing.
  • If page zero mapping is not supported, but it is able to do VOSF, it uses direct addressing.
  • Otherwise, it uses memory banks a.k.a virtual addressing.

See the parts of configure.ac that determine the addressing mode.
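
The decision list above can be sketched as a small function. This is only an illustration of the rules as stated in the list: the real logic is shell code in configure.ac, and the enum and function names here are hypothetical.

```cpp
// Hypothetical names; this mirrors the bullet list in the text,
// not the actual shell logic in configure.ac.
enum class AddressingMode { Real, Direct, Banks };

AddressingMode select_addressing(bool native_m68k,
                                 bool can_map_page_zero,
                                 bool can_vosf) {
    // A native M68k host needs no translation at all.
    if (native_m68k) return AddressingMode::Real;
    // Page zero mapping plus VOSF allows guest address == host address.
    if (can_vosf && can_map_page_zero) return AddressingMode::Real;
    // VOSF alone is enough for the fixed-offset scheme.
    if (can_vosf) return AddressingMode::Direct;
    // Fallback: memory banks, a.k.a. virtual addressing.
    return AddressingMode::Banks;
}
```

With the inputs described later for a default modern Linux host (VOSF available, page zero mapping disabled), this sketch yields direct addressing, which matches the configure summary shown earlier.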

Common Data Structure and Utilities

Common data structures defined in src/uae_cpu/cpu_emulation.h

// RAM and ROM pointers
uint32 RAMBaseMac = 0;		// RAM base (Mac address space) gb-- initializer is important
uint8 *RAMBaseHost;			// RAM base (host address space)
uint32 RAMSize;				// Size of RAM
uint32 ROMBaseMac;			// ROM base (Mac address space)
uint8 *ROMBaseHost;			// ROM base (host address space)
uint32 ROMSize;				// Size of ROM

Common utility functions defined in src/uae_cpu/cpu_emulation.h

static inline uint32 ReadMacInt32(uint32 addr) {return get_long(addr);}
static inline uint32 ReadMacInt16(uint32 addr) {return get_word(addr);}
static inline uint32 ReadMacInt8(uint32 addr) {return get_byte(addr);}
static inline void WriteMacInt32(uint32 addr, uint32 l) {put_long(addr, l);}
static inline void WriteMacInt16(uint32 addr, uint32 w) {put_word(addr, w);}
static inline void WriteMacInt8(uint32 addr, uint32 b) {put_byte(addr, b);}
static inline uint8 *Mac2HostAddr(uint32 addr) {return get_real_address(addr);}
static inline uint32 Host2MacAddr(uint8 *addr) {return get_virtual_address(addr);}

static inline void *Mac_memset(uint32 addr, int c, size_t n) {return memset(Mac2HostAddr(addr), c, n);}
static inline void *Mac2Host_memcpy(void *dest, uint32 src, size_t n) {return memcpy(dest, Mac2HostAddr(src), n);}
static inline void *Host2Mac_memcpy(uint32 dest, const void *src, size_t n) {return memcpy(Mac2HostAddr(dest), src, n);}
static inline void *Mac2Mac_memcpy(uint32 dest, uint32 src, size_t n) {return memcpy(Mac2HostAddr(dest), Mac2HostAddr(src), n);}
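
One detail these accessors have to deal with is byte order: the M68k is big-endian, while hosts such as x86-64 are little-endian, so a guest 32-bit value cannot simply be loaded through a host pointer cast. The following is a minimal sketch of byte-order-independent accessors. The function names are hypothetical and this is not the actual get_long/put_long implementation from the UAE core.

```cpp
#include <cstdint>

// Hypothetical helpers, not the real get_long/put_long.
// They read and write a big-endian 32-bit value byte by byte,
// so the result is the same on any host byte order.
inline std::uint32_t read_be32(const std::uint8_t *p) {
    return (std::uint32_t(p[0]) << 24) | (std::uint32_t(p[1]) << 16) |
           (std::uint32_t(p[2]) << 8)  |  std::uint32_t(p[3]);
}

inline void write_be32(std::uint8_t *p, std::uint32_t v) {
    p[0] = std::uint8_t(v >> 24);  // most significant byte first
    p[1] = std::uint8_t(v >> 16);
    p[2] = std::uint8_t(v >> 8);
    p[3] = std::uint8_t(v);
}
```

A helper in the style of ReadMacInt32 could then be thought of as read_be32 applied to the host pointer that Mac2HostAddr returns for the guest address.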

Direct Addressing

Host memory pre-allocation function in direct addressing

This allocates memory in the host OS for the guest OS's RAM and ROM.

TODO -- Change the following into sequence diagram later.

vm_acquire_mac(RAMSize + 0x100000)  -- why ROM size only 1MB 0x100000?
=>
vm_acquire(size, VM_MAP_DEFAULT | VM_MAP_32BIT) -- from CrossPlatform VM allocator wrapper.
=>
vm_acquire calls the OS-specific user space memory allocation function and sets the read/write flags.

For example, on Linux the vm_acquire function calls mmap [4] to map the /dev/zero file at the starting address 0x10000000 (NOTE: that is, the host virtual address starts at 256MB) with the requested size. This creates a chunk of zero-initialized memory in the host OS. Then vm_acquire calls mprotect to give the allocated virtual memory read/write permission.
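
The mmap/mprotect sequence can be sketched as follows. This is not the real vm_acquire: the function names are hypothetical, and it uses an anonymous mapping (MAP_ANONYMOUS) rather than mapping /dev/zero, which gives the same zero-filled memory on Linux with less setup.

```cpp
#include <sys/mman.h>
#include <cstddef>

// Hypothetical helper, not the real vm_acquire (Linux/POSIX only).
// Reserves `size` bytes of zero-filled memory, preferably near `hint`,
// then makes the region readable and writable.
void *acquire_guest_memory(void *hint, std::size_t size) {
    // Without MAP_FIXED the kernel treats `hint` as a preference only.
    void *p = mmap(hint, size, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) return nullptr;
    // Second step: grant read/write permission on the mapping.
    if (mprotect(p, size, PROT_READ | PROT_WRITE) != 0) {
        munmap(p, size);
        return nullptr;
    }
    return p;
}

void release_guest_memory(void *p, std::size_t size) {
    if (p != nullptr) munmap(p, size);
}
```

Passing reinterpret_cast<void *>(0x10000000) as the hint asks for the 256MB starting address mentioned above, but the kernel is free to place the mapping elsewhere.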

Mapping in direct addressing

The mapping of direct addressing between host and guest is simple and efficient.

Host: RAM starts at whatever address the vm_acquire_mac function returns. Let’s denote it as RAMBaseHost. ROM starts at RAMBaseHost + RAMSize.

Guest: RAM starts at 0. ROM starts at RAMSize.

As you can easily tell, the mapping between host and guest applies a fixed difference -- RAMBaseHost. In my tests, I found that 64-bit Linux and Linux on ARMv7 use direct addressing. You can run the pmap command on the BasiliskII process to confirm the code analysis against actual memory usage at runtime. In my BII preferences, 1GB of RAM is specified. According to pmap, 1GB of memory is allocated with the read/write flags, starting at virtual address 0x10000000.

[Ricky@gtx Unix]$ cat ~/.basilisk_ii_prefs | grep ramsize
ramsize 1073741824
[Ricky@gtx Unix]$ ps -A | grep BasiliskII
27151 pts/1    00:00:01 BasiliskII
[Ricky@gtx Unix]$ pmap 27151
27151:   ./BasiliskII
0000000010000000 1048640K rw---   [ anon ]
0000000050010000   1896K r----   [ anon ]
0000000078000000   2160K rwx-- BasiliskII
000000007821c000    544K rwx--   [ anon ]
000000007a1ef000   2920K rw---   [ anon ]
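
The fixed-offset mapping described above can be sketched in a few lines. The struct and member names are hypothetical; only the roles of RAMBaseHost and RAMSize come from the text, and the real translation is done by get_real_address and get_virtual_address.

```cpp
#include <cstdint>

// Hypothetical model of direct addressing.
struct DirectMap {
    std::uint8_t *ram_base_host;  // plays the role of RAMBaseHost
    std::uint32_t ram_size;       // plays the role of RAMSize

    // Guest -> host: add the fixed base.
    std::uint8_t *to_host(std::uint32_t mac_addr) const {
        return ram_base_host + mac_addr;
    }
    // Host -> guest: subtract the same base.
    std::uint32_t to_mac(const std::uint8_t *host_addr) const {
        return static_cast<std::uint32_t>(host_addr - ram_base_host);
    }
    // Guest ROM starts right after guest RAM.
    std::uint32_t rom_base_mac() const { return ram_size; }
};
```

Both directions cost a single addition or subtraction, which is why this scheme is cheap compared to the bank lookup used by virtual addressing.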

Real Addressing

Host memory pre-allocation function in real addressing

Real addressing requires the host CPU to be able to access host memory from 0x0000 to 0x2000. Basilisk II, as a user space application, needs to relocate its text segment and all its data segments so that they do not conflict with the pre-allocated guest OS memory. The trick is done by a linker script [5] defined in the src/Unix/ldscripts folder. See the -T option in the link command:

...
g++ -o BasiliskII -Wl,-T,ldscripts/linux-x86_64.ld 	obj/main.o obj/prefs.o obj/prefs_items.o obj/sys_unix.o obj/rom_patches.o …
...

Very few host OS and architecture combinations can run in real addressing mode. But at the time, real addressing was by far the fastest address mapping scheme in emulation. That's why the original programmers tried very hard to overcome the page zero problem. We will explore this later on a modern AMD64 Linux host.

The following allocates memory in the host OS for the guest OS's RAM and ROM in a contiguous region:

TODO -- Change the following into sequence diagram later.

vm_acquire_mac_fixed(0, RAMSize + 0x100000)
=>
vm_acquire_fixed(addr, size, VM_MAP_DEFAULT | VM_MAP_32BIT)
=>
vm_acquire_fixed calls the OS-specific user space memory allocation function and sets the read/write flags.

Mapping in real addressing

Compared to direct addressing, real addressing is even more straightforward, with zero overhead in address translation. An address in the guest CPU maps to the same address in the host CPU, so no address mapping is needed.

A native M68k host can use real addressing. But just for fun, let’s experiment by tricking Basilisk II into real addressing mode on a modern Linux host.

First, we explicitly set vm.mmap_min_addr to 0 via sysctl in Linux so that we can map page zero in user space.

[root@gtx vm]# echo 0 > /proc/sys/vm/mmap_min_addr

Second, run configure:

./configure --enable-sdl-video --enable-sdl-audio --disable-jit-compiler --with-x --with-gtk --enable-addressing=real

However, the summary shows that the addressing mode is memory banks instead. We already know that Linux can use VOSF, so it must be the test that sets the $ac_cv_can_map_lm variable that failed. I extracted the test program from configure.ac.

I ran the following to compile and execute the test program, and got an illegal instruction error instead of a segfault. Here is the test program source code, and below are the compilation and test results:

g++ -o conftest -g -O2 -I/usr/include/SDL -D_GNU_SOURCE=1 -D_REENTRANT    conftest.cpp -lm -lrt -lrt  -lSDL -lpthread
./conftest
Illegal instruction (core dumped)

Here is the more interesting finding. I ran gdb and found that the program failed at lm[0] = ‘z’. After disassembling the binary, I found that GCC optimization is at fault.

.text:00000000004006C0                 call    _Z16vm_acquire_fixedPvmi ; vm_acquire_fixed(void *,ulong,int)
.text:00000000004006C5                 test    eax, eax
.text:00000000004006C7                 js      short loc_4006D3
.text:00000000004006C9                 mov     byte ptr ds:0, 0
.text:00000000004006D1                 ud2

GCC knows that lm is a NULL pointer and treats dereferencing it as undefined behavior, so it does not emit the intended assignment. That’s why you see the ud2 undefined instruction, which raises an invalid opcode exception.

If you remove the -O2 optimization flag from the compilation, you get a successful run.

g++ -o conftest -g -I/usr/include/SDL -D_GNU_SOURCE=1 -D_REENTRANT    conftest.cpp -lm -lrt -lrt  -lSDL -lpthread
./conftest

Here is the assembly from the unoptimized binary:

.text:0000000000400A8F loc_400A8F:                             ; CODE XREF: main+1Aj
.text:0000000000400A8F                 mov     edx, 2
.text:0000000000400A94                 mov     esi, 2000h
.text:0000000000400A99                 mov     edi, 0
.text:0000000000400A9E                 call    _Z16vm_acquire_fixedPvmi ; vm_acquire_fixed(void *,ulong,int)
.text:0000000000400AA3                 shr     eax, 1Fh
.text:0000000000400AA6                 test    al, al
.text:0000000000400AA8                 jz      short loc_400AB4
.text:0000000000400AAA                 mov     edi, 1          ; status
.text:0000000000400AAF                 call    _exit
.text:0000000000400AB4 ; ---------------------------------------------------------------------------
.text:0000000000400AB4
.text:0000000000400AB4 loc_400AB4:                             ; CODE XREF: main+3Fj
.text:0000000000400AB4                 mov     rax, [rbp+var_8]
.text:0000000000400AB8                 mov     byte ptr [rax], 7Ah ; Ricky comment - 7Ah is ‘z’ ASCII
.text:0000000000400ABB                 mov     rax, [rbp+var_8]

Therefore, to work around this, you need to pass the -O0 compile flag to disable GCC optimization.

CFLAGS=-O0;CPPFLAGS=-O0 ./configure --enable-sdl-video --enable-sdl-audio --disable-jit-compiler --with-x --with-gtk --enable-addressing=real
…
Assembly optimizations ................. : x86-64
Addressing mode ........................ : real
Bad memory access recovery type ........ : siginfo


...

Then build and run BasiliskII in real addressing mode:

[Ricky@gtx zero_mem]$ ps -A | grep BasiliskII
30607 pts/2    00:00:00 BasiliskII
[Ricky@gtx zero_mem]$ pmap 30607
30607:   ./BasiliskII
0000000000000000 1048576K rw---   [ anon ]
0000000040425000     64K rw---   [ anon ]
0000000040435000   1896K r----   [ anon ]

Finally, with this workaround, BasiliskII can run in real addressing mode.
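
As a possible alternative to disabling optimization, a page zero probe can avoid handing the compiler a literal NULL pointer. The sketch below is hypothetical: it is not the test program embedded in configure.ac, it calls mmap directly instead of vm_acquire_fixed, and it is Linux-specific. It writes through the pointer returned by mmap, whose value is only known at run time.

```cpp
#include <sys/mman.h>
#include <cstddef>

// Hypothetical probe, not the configure.ac test program.
// Returns true if the first `size` bytes of the address space can be
// mapped and written from user space.
bool can_map_low_memory(std::size_t size) {
    // With MAP_FIXED and a null address, Linux is asked for a mapping
    // that starts at address 0.
    void *p = mmap(nullptr, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
    if (p == MAP_FAILED) return false;  // e.g. vm.mmap_min_addr > 0
    // The compiler cannot prove at compile time that `p` is null,
    // so this store is not a constant null dereference.
    volatile char *lm = static_cast<volatile char *>(p);
    lm[0] = 'z';
    bool ok = (lm[0] == 'z');
    munmap(p, size);
    return ok;
}
```

On a default Linux configuration this is expected to return false for an unprivileged process; after setting vm.mmap_min_addr to 0 as shown earlier, it should be able to return true.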

Virtual Addressing

Host memory pre-allocation function in virtual addressing

Host memory pre-allocation in virtual addressing is the same as in direct addressing. The difference lies in the initialization of memory banks, a data structure unique to virtual addressing.

TODO -- Change the following into sequence diagram later.

InitAll(vmdir) -- from src/main.cpp
=>
Init680x0() -- from src/uae_cpu/basilisk_glue.cpp. Specify RAMBaseMac=0, ROMBaseMac address based on ROM type.
=>
memory_init(void) -- from src/uae_cpu/memory.cpp. It initializes the mem_banks array and computes the address differences. See details below.

Inside the memory_init function, it computes the address differences between host and guest for RAM, ROM and the framebuffer.

In addition, it initializes the mem_banks array, which is an array of structs, each containing a group of memory read/write function pointers.

We will revisit mem_banks in the next section.

Mapping in virtual addressing

The mapping mechanism of virtual addressing, a.k.a. memory banks, is the most complicated of the three addressing modes.

Let’s first review some basic facts in host and guest OS.

Host: as in direct addressing, RAM starts at whatever address the vm_acquire_mac function returns. Let’s denote it as RAMBaseHost. ROM starts at RAMBaseHost + RAMSize.

Guest: RAM (RAMBaseMac) starts at 0. ROM (ROMBaseMac) starts at an address based on the ROM type.

Now we come back to the discussion of the mem_banks array data structure.

Among M68k CPUs, the 68020 and above support 32-bit addressing, while the 68010 and below support 24-bit addressing. In the Basilisk II implementation, virtual addressing divides the address space into equal-sized banks. The size of each bank is 2^16 bytes = 64 KB.

In 32-bit addressing, the upper 16 bits of the address (bit 32 down to bit 17, counting from 1), i.e. the virtual address shifted right by 16, are used as the bank index. The maximum number of banks is therefore also 2^16 = 65536. Each bank can be used as RAM, ROM or framebuffer. For each bank, Basilisk II assigns a different read/write policy during host memory initialization. For example, ROM is read-only, and the mappings of RAM and ROM between host and guest use different address differences. See details in src/uae_cpu/memory.h.
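
The bank arithmetic can be written out as a few constant expressions. The names below are hypothetical and are not taken from memory.h. The 24-bit variant simply masks off the top byte before indexing, which is an assumption about how a 24-bit address would be reduced, not a statement about the Basilisk II code.

```cpp
#include <cstdint>

// Hypothetical names for the arithmetic in the text.
constexpr unsigned kBankShift = 16;
constexpr std::uint32_t kBankSize = 1u << kBankShift;          // 64 KB
constexpr std::uint32_t kBankCount = 1u << (32 - kBankShift);  // 65536

// Upper 16 bits select the bank, lower 16 bits are the offset inside it.
constexpr std::uint32_t bank_index(std::uint32_t addr) {
    return addr >> kBankShift;
}
constexpr std::uint32_t bank_offset(std::uint32_t addr) {
    return addr & (kBankSize - 1);
}

// Assumption: with 24-bit addressing only the low 24 bits are significant,
// which leaves 2^24 / 2^16 = 256 distinct banks.
constexpr std::uint32_t bank_index_24(std::uint32_t addr) {
    return (addr & 0x00FFFFFFu) >> kBankShift;
}

static_assert(kBankCount == 65536, "a 32-bit space holds 65536 banks of 64 KB");
```

For example, guest address 0x00012345 falls into bank 1 at offset 0x2345.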

Host memory initialization is done in the memory_init function. First it initializes the whole mem_banks array with a default struct of dummy memory access function pointers. Then, based on the RAM, frame buffer and ROM locations, and on whether 24-bit or 32-bit addressing is in use, it fills the mem_banks array with the corresponding memory access logic.

The Basilisk II virtual addressing design is not a software PMMU emulation [6], although it looks similar to one-level memory paging. It is just the way Basilisk II manages the mapping between guest and host. I doubt that you can enable the virtual memory option in the System 7 control panel. TODO test this in Basilisk II. I have a hunch that the Basilisk II emulation core doesn’t translate 68030 PMMU-related instructions such as PTEST, PLOAD, PFLUSH and PFLUSHA. TODO come back to this later when reading the BII CPU instruction emulation code.

By default, Mac OS X on modern Intel hardware uses this mode, and you can also force Linux to use virtual addressing via configure. As you can see, virtual addressing comes with additional overhead, but I can hardly tell the performance difference on an Intel Core i7‑2700K 3.5 GHz CPU.
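
The initialization pattern just described (fill everything with dummy handlers, then overwrite the banks that back RAM and ROM) can be sketched as below. This is a simplified, hypothetical model: the struct layout, the names and the byte-only accessors are illustrative and do not reproduce the actual definitions in memory.h. It assumes that guest base addresses and sizes are multiples of 64 KB.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-in for one mem_banks entry.
struct Bank {
    std::uint8_t (*read8)(const Bank &, std::uint32_t offset);
    void (*write8)(const Bank &, std::uint32_t offset, std::uint8_t value);
    std::uint8_t *host;  // host memory backing this bank, or nullptr
};

// Dummy handlers for unmapped banks: reads return 0, writes are dropped.
std::uint8_t dummy_read8(const Bank &, std::uint32_t) { return 0; }
void dummy_write8(const Bank &, std::uint32_t, std::uint8_t) {}
std::uint8_t mem_read8(const Bank &b, std::uint32_t off) { return b.host[off]; }
void ram_write8(const Bank &b, std::uint32_t off, std::uint8_t v) { b.host[off] = v; }

class BankedMemory {
public:
    // Step 1: every bank starts out with the dummy handlers.
    BankedMemory() : banks_(65536, Bank{dummy_read8, dummy_write8, nullptr}) {}

    // Step 2: overwrite the banks that back RAM (writable) and ROM (read-only).
    void map_ram(std::uint32_t mac_base, std::uint8_t *host, std::uint32_t size) {
        map(mac_base, host, size, ram_write8);
    }
    void map_rom(std::uint32_t mac_base, std::uint8_t *host, std::uint32_t size) {
        map(mac_base, host, size, dummy_write8);
    }

    // Every access goes through the bank selected by the upper 16 bits.
    std::uint8_t read8(std::uint32_t addr) const {
        const Bank &b = banks_[addr >> 16];
        return b.read8(b, addr & 0xFFFF);
    }
    void write8(std::uint32_t addr, std::uint8_t v) {
        const Bank &b = banks_[addr >> 16];
        b.write8(b, addr & 0xFFFF, v);
    }

private:
    using WriteFn = void (*)(const Bank &, std::uint32_t, std::uint8_t);
    void map(std::uint32_t mac_base, std::uint8_t *host, std::uint32_t size, WriteFn w) {
        for (std::uint32_t off = 0; off < size; off += 0x10000)
            banks_[(mac_base + off) >> 16] = Bank{mem_read8, w, host + off};
    }
    std::vector<Bank> banks_;
};
```

The extra table lookup and indirect call on every access are where the additional overhead of this mode comes from, in contrast to the single addition used by direct addressing.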

Static Analysis

TODO

Dynamic Analysis

TODO

Bibliography

  1. Virtual Memory and Linux
  2. Wikipedia: Motorola 68000 series, Feature map
  3. Explanation on VOSF from Basilisk II devel mailing list
  4. Linux mmap function
  5. Linker Script Guide
  6. Assembly Language Programming for the 68000 Family, Chapter 13 68030, Section Paged Memory Management