
EMMY

EMMY is yet another x86 CPU / PC system emulator program (yaxcpsep), targeted for OS development. EMMY incorporates these features:

On-the-fly translation of instructions for speed
Contains a GDB-like debugger for source-level debugging

There are currently these restrictions on it:

Requires a modern Linux system as the host. I used a kernel 2.4.9 system for development. It uses the clone call to create a thread that runs the select call.
It only has an 80x25 character-cell VGA-compatible screen, no graphics. I plan to import the Bochs VGA stuff someday.

EMMY provides this environment:

A generic PC-style motherboard with RAM, CMOS timer, 8259 PICs, IOAPIC, ISA and PCI bus
BIOS
Generic ATA disk and ATAPI CD, with PIO controller
Floppy drive & 8237 DMA controller
Character-cell 80x25 VGA screen and ASCII-only keyboard
SMP x86 CPUs (Pentium Pros, FPU but no MMX, etc)
Realtek 8139 Ethernet card & PCI BIOS

Anyway, besides running OZONE, it will run Linux 2.4.20 and XEN 1.2.

EMMY is released under the GPL. You can download it here as a bzip2 tar saveset. It is approximately 380K. Rev date: July 27, 2004

Once you download it, unpack it and do a make. The make requires bcc and as86 to compile the BIOS (imported from Bochs). If your system does not have them, they are in the Dev86src-0.16.0.tar.gz saveset. There is also an included BIOS that uses gas to compile (as an alternative, see below).

Here is some documentation on Emmy. It is included as emmy.doc in the distribution saveset.

BIOS

The BIOS stuff is a bit of a mess. I originally started with the Bochs BIOS and made a few changes. It is still here, in emmy_rombios_old.c. It requires bcc and as86 to compile and assemble. I wrote a replacement BIOS in emmy_rombios_new.s that just requires the standard gas (GNU assembler) to assemble. It is not as comprehensive as the Bochs BIOS but is sufficient to run GRUB, boot protected mode OS's, and will even run MS-DOS.

Anyway, the makefile will try to compile both emmy_rombios_old and emmy_rombios_new. It will also set an emmy_rombios.bin softlink to whichever was built most recently. If you want to use one or the other, point the softlink manually at either emmy_rombios_old.bin or emmy_rombios_new.bin.

Along with that, there is the file emmy_vgabios.s that is the video BIOS. I could not use the Bochs BIOS at all (as the licensing terms state it is for use with Bochs only), so I wrote one myself. Of course, as of today, it is text only, but is sufficient for MS-DOS and booting protected mode OS's.

How the translator works

Internally, Emmy has a struct that represents the state of the CPU, consisting of registers and the like. The translated code consists of instructions which modify this struct. For example, a register-to-register add instruction translates to code which adds two elements of the struct, placing the result in the struct, then modifying the struct location for the eflags.

      For example, an 'addl %ebx,%eax' translates to:

         movl emulated_ebx(%ebx),%ecx		# get source register
         addl %ecx,emulated_eax(%ebx)		# add to destination register
         pushfl					# save resultant flags
         andw $0xF700,emulated_eflags(%ebx)	# clear OSZAPC bits in emulated eflags
         popl %eax				# get resultant flags
         andb $0x08,%ah				# filter just leaving OSZAPC bits
         orw  %ax,emulated_eflags(%ebx)		# stuff them in emulated eflags

If the instruction is followed by something that sets eflags to something else, the eflags update above gets wiped out, so all you usually end up with is just the movl and the addl.
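The same register-to-register add can be written out in C to show what the translated code does to the struct. This is only a minimal sketch: the `cpu_state` layout and the function names are made up for illustration and are not Emmy's, and the flags are computed by hand here, whereas the translated code borrows them from the host CPU with pushfl.

```c
#include <stdint.h>

/* Hypothetical CPU state; the field names are illustrative, not Emmy's. */
typedef struct {
    uint32_t eax, ebx;
    uint32_t eflags;
} cpu_state;

/* x86 arithmetic flag bits; FLAG_OSZAPC is the OR of all six. */
enum {
    FLAG_CF = 0x0001, FLAG_PF = 0x0004, FLAG_AF = 0x0010,
    FLAG_ZF = 0x0040, FLAG_SF = 0x0080, FLAG_OF = 0x0800,
    FLAG_OSZAPC = 0x08D5
};

/* Compute the OSZAPC flags that a 32-bit 'add dst,src' produces. */
static uint32_t add32_flags(uint32_t dst, uint32_t src, uint32_t res)
{
    uint32_t f = 0;
    uint32_t p = res & 0xFFu;          /* parity is taken over the low byte */
    p ^= p >> 4; p ^= p >> 2; p ^= p >> 1;

    if (res < dst)                 f |= FLAG_CF;  /* unsigned carry out */
    if (!(p & 1u))                 f |= FLAG_PF;  /* even number of 1 bits */
    if ((dst ^ src ^ res) & 0x10u) f |= FLAG_AF;  /* carry out of bit 3 */
    if (res == 0)                  f |= FLAG_ZF;
    if (res & 0x80000000u)         f |= FLAG_SF;
    if (~(dst ^ src) & (dst ^ res) & 0x80000000u)
        f |= FLAG_OF;                             /* signed overflow */
    return f;
}

/* Emulate 'addl %ebx,%eax' on the struct: update eax, then merge flags. */
static void emulate_addl_ebx_eax(cpu_state *cpu)
{
    uint32_t dst = cpu->eax, src = cpu->ebx;
    uint32_t res = dst + src;

    cpu->eax = res;
    cpu->eflags = (cpu->eflags & ~(uint32_t)FLAG_OSZAPC)
                | add32_flags(dst, src, res);
}
```

Bits outside OSZAPC (such as IF or IOPL) pass through the merge untouched, which is the same effect the andw/orw pair has in the listing above.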

      A translation of 'movl %eax,0x12345678' would be:

         movl  $0x12345678,%edi			# get linear address in %edi
         pushl emulated_eax(%ebx)		# push value to write
         movl  $emmy_x86_x_writelin_wrap,%eax	# point to wrapper routine
         pushl $4				# writing a long to memory
         pushl %edi				# push its address
         pushl %ebx				# push CPU struct pointer
         call  *%eax				# call wrapper routine
         addl  $16,%esp				# pop call args from stack
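
Read as a C call, the four pushes in the listing above line up with a four-argument function, since the usual x86 C calling convention pushes arguments right to left. The sketch below is a guess at that shape only: the real prototype of emmy_x86_x_writelin_wrap is not shown here, and the struct, the tiny RAM array, and the address wrap-around are stand-ins invented for the example (the real routine presumably handles paging and protection).

```c
#include <stdint.h>

#define GUEST_RAM_SIZE 0x1000u

/* Hypothetical CPU struct holding a small stand-in for guest memory. */
typedef struct {
    uint8_t ram[GUEST_RAM_SIZE];
} cpu_state;

/*
 * Assumed parameter order, matching the pushes read bottom to top:
 * CPU struct pointer, linear address, size in bytes, value.
 * Stores 'size' bytes of 'value' in little-endian order.
 */
static void writelin_wrap(cpu_state *cpu, uint32_t addr,
                          uint32_t size, uint32_t value)
{
    for (uint32_t i = 0; i < size && i < 4u; i++) {
        /* Wrapping the address is a simplification for this sketch only. */
        cpu->ram[(addr + i) % GUEST_RAM_SIZE] = (uint8_t)(value >> (8u * i));
    }
}
```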

When Emmy comes to execute some code in a page for the first time, it makes some checks first before attempting to translate. It will only attempt translation if all the below are YES. This is how most OSs run, so I decided not to generate code that is not used in the common case. If any of the tests fail, the interpreter (Emmy_X86::i_interp) is called to execute instructions until the tests succeed.

Is the code located in the general RAM array (not in some device like BIOS ROM)?
This must be true as the translator will write-protect the page before attempting to translate instructions. Then, when anything tries to write to the page, all translations will be discarded.
Is the CPU in 32-bit mode?
This assumption is checked so that the translator knows that opcode 01 without any prefix is an ADDL, not an ADDW. Otherwise, it would have to output twice the code, plus code to test which set to use, and only one set would ever be used anyway.
Is tracing turned off (EFLAGS<TF>=0)?
If it were allowed to be either way, the translated code would have to test the bit and cause an exception.
Is the CPU executing in ring 0 or ring 3 (not 1 or 2)?
This makes testing stuff like memory protection easy for the translated code.
Is the IOPL ring 0 or ring 3 (not 1 or 2)?
Likewise for testing whether an IO instruction is permitted or not.
Are there no hardware debugging registers active?
Otherwise, the translator would have to emit code to test debug registers.
Do the CS, DS, ES, SS registers all indicate flat mode?
This is assumed so the translator doesn't have to emit code to add the segment base (which is almost always zero), and test against a limit which is almost always the max possible. We don't require this of FS and GS, as OSs sometimes use these registers, and the translator won't translate any instruction that uses them.
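
The checklist above boils down to a single predicate. A minimal sketch in C, assuming a made-up `xlate_precond` summary struct (Emmy's real state layout and function names are not shown here); only the EFLAGS bit positions (TF is bit 8, IOPL is bits 12 and 13) are architectural facts:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical summary of the state the checks look at. */
typedef struct {
    bool     code_in_ram;        /* code page is in the general RAM array */
    bool     mode_32bit;         /* CPU is in 32-bit mode */
    uint32_t eflags;             /* emulated EFLAGS */
    unsigned cpl;                /* current ring, 0..3 */
    bool     debug_regs_active;  /* any hardware breakpoint enabled */
    bool     cs_flat, ds_flat, es_flat, ss_flat;
} xlate_precond;

#define EFLAGS_TF          0x00000100u  /* trap flag, bit 8 */
#define EFLAGS_IOPL_SHIFT  12           /* IOPL occupies bits 12..13 */
#define EFLAGS_IOPL_MASK   0x3u

static bool ring_is_0_or_3(unsigned ring)
{
    return ring == 0 || ring == 3;
}

/* True only when every precondition for translation holds. */
static bool can_translate(const xlate_precond *s)
{
    unsigned iopl = (s->eflags >> EFLAGS_IOPL_SHIFT) & EFLAGS_IOPL_MASK;

    return s->code_in_ram
        && s->mode_32bit
        && !(s->eflags & EFLAGS_TF)
        && ring_is_0_or_3(s->cpl)
        && ring_is_0_or_3(iopl)
        && !s->debug_regs_active
        && s->cs_flat && s->ds_flat && s->es_flat && s->ss_flat;
}
```

FS and GS are deliberately absent from the predicate, matching the note above that they are not required to be flat.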

When all the above conditions are satisfied, it will start generating translated code. It will continue translating until one of these conditions happens:

Some instruction that can break one of the above assumptions is executed (like an IRET or a seg FS/GS prefix). However, it will translate stuff like POP %DS, with code to then check to make sure it's still flat, exiting the translation if not. I did this as these are mindlessly present in flat interrupt service handlers.
It reaches the end of the page or branches off the page. This includes things like RET and 'computed' jumps.
Any instruction which spans a page boundary is not translated (something might overwrite the second half without overwriting the first half).
Many of the MOV %Rn,%CRn instructions and stuff like that won't translate; they are deferred to the interpreter. But MOV %CRn,%Rn translates, as executing it has no side effects, and it is especially useful in a pagefault handler for reading CR2.
Floating-point instructions are not translated (I'm too lazy).

The 'epilog' of a translation will clean things up and return to the interpreter. The interpreter will always interpret the next instruction (even if there is a known translation). Thus the instruction that stopped the translation will be interpreted. This is necessary because of the case where a POPFL was translated, but the translator saw that it would result in an assumption being broken (like setting IOPL 1 or 2).

When the translation of a page is performed, it starts at the instruction it is given and proceeds until one of the above happens. The address of each instruction's translation is stored in an array of 4096 pointers, one per byte offset in the page. This is how the translation can be jumped to arbitrarily in the middle of a sequence of instructions (like for looping). The array is initialized to all zeroes. When the translation of a string of instructions on that page is successful, the address of the translation for each individual instruction is stored in its corresponding index in the array. If the translator determines that it cannot translate an instruction (like an IRET), it will put a one in the array to indicate the instruction must be interpreted. This prevents the stepper from trying to translate the untranslatable over and over again.
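
The three states of an array slot (zero, one, or a real address) can be sketched in C as follows. The type and function names are invented for the example; only the 4096-entry layout and the meaning of zero and one come from the description above.

```c
#include <stdint.h>

#define PAGE_SIZE 4096u

/* Hypothetical per-page table: one slot per byte offset in the code page. */
typedef struct {
    void *slot[PAGE_SIZE];
} page_xlate;

typedef enum { SLOT_UNTRIED, SLOT_INTERPRET, SLOT_TRANSLATED } slot_kind;

/* Record the translated code address for the instruction at linaddr. */
static void set_translation(page_xlate *pg, uint32_t linaddr, void *code)
{
    pg->slot[linaddr & (PAGE_SIZE - 1u)] = code;
}

/* Store the marker value 1: this instruction must always be interpreted. */
static void mark_untranslatable(page_xlate *pg, uint32_t linaddr)
{
    pg->slot[linaddr & (PAGE_SIZE - 1u)] = (void *)(uintptr_t)1;
}

/* Classify a slot; *code_out is written only for SLOT_TRANSLATED. */
static slot_kind lookup_translation(const page_xlate *pg, uint32_t linaddr,
                                    void **code_out)
{
    void *s = pg->slot[linaddr & (PAGE_SIZE - 1u)];

    if ((uintptr_t)s == 0) return SLOT_UNTRIED;    /* never attempted */
    if ((uintptr_t)s == 1) return SLOT_INTERPRET;  /* known untranslatable */
    *code_out = s;
    return SLOT_TRANSLATED;
}
```

Because the table is indexed by byte offset, a branch into the middle of an already translated run costs one array lookup.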

This array and the corresponding translation buffers remain in Emmy's memory until the codepage they came from is overwritten. Any write to a page will discard all translations for the page. This is necessary because once the original page is write-enabled, it is 'fair game' for any modifications, and the common case is that the page is being re-used for another process. The far less common case is for self-modifying code or pages with mixed code and read/write data.
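
The discard-on-write rule might look like this in C. It is a sketch under assumptions: the `guest_page` struct, its `write_protected` flag, and both function names are hypothetical, and the real emulator would also free the translation buffers.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE 4096u

/* Hypothetical guest page with its translation table attached. */
typedef struct {
    uint8_t data[PAGE_SIZE];   /* guest RAM contents */
    void   *xlate[PAGE_SIZE];  /* per-offset translation addresses */
    bool    write_protected;   /* set while translations exist */
} guest_page;

/* Forget every translation for the page and make it writable again. */
static void discard_translations(guest_page *pg)
{
    for (uint32_t i = 0; i < PAGE_SIZE; i++) {
        pg->xlate[i] = 0;
    }
    pg->write_protected = false;
}

/* Any write to a protected page throws away all of its translations. */
static void guest_write_byte(guest_page *pg, uint32_t offset, uint8_t value)
{
    if (pg->write_protected) {
        discard_translations(pg);
    }
    pg->data[offset & (PAGE_SIZE - 1u)] = value;
}
```

Discarding the whole page rather than only the overwritten instruction keeps the write path trivial, which fits the common case of a page being reused for another process.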

Instructions that access registers are translated to instructions that access members of the CPU struct. Instructions that access memory and IO ports are translated to calls to subroutines back to the main emulator code. Any exceptions (like divide-by-zero or pagefault) are done via subroutine calls back to the emulator main code which does a longjmp to wipe the emulator's stack.
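
The longjmp path for exceptions can be sketched with plain setjmp/longjmp. Everything named here is hypothetical (the struct, `raise_exception`, `write_byte_wrap`, and the toy rule that any address beyond a 4096-byte RAM array pagefaults); it only illustrates how a wrapper called from translated code can abandon the call stack and land back in the emulator's main code.

```c
#include <setjmp.h>
#include <stdint.h>

#define GUEST_RAM_SIZE 4096u
#define VEC_PAGE_FAULT 14        /* x86 pagefault vector */

/* Hypothetical CPU struct carrying the unwind point. */
typedef struct {
    jmp_buf  unwind;             /* where exceptions land */
    int      pending_vector;     /* vector being raised, or -1 */
    uint32_t cr2;                /* faulting linear address */
} cpu_state;

static uint8_t guest_ram[GUEST_RAM_SIZE];

/* Record the vector, then wipe the stack back to the setjmp point. */
static void raise_exception(cpu_state *cpu, int vector)
{
    cpu->pending_vector = vector;
    longjmp(cpu->unwind, 1);
}

/* Wrapper of the kind translated code would call to store a byte. */
static void write_byte_wrap(cpu_state *cpu, uint32_t addr, uint8_t value)
{
    if (addr >= GUEST_RAM_SIZE) {        /* toy stand-in for a pagefault */
        cpu->cr2 = addr;
        raise_exception(cpu, VEC_PAGE_FAULT);
    }
    guest_ram[addr] = value;
}

/* Main-code side: returns -1 on success or the exception vector. */
static int run_one_write(cpu_state *cpu, uint32_t addr, uint8_t value)
{
    cpu->pending_vector = -1;
    if (setjmp(cpu->unwind) != 0) {
        return cpu->pending_vector;      /* arrived here via longjmp */
    }
    write_byte_wrap(cpu, addr, value);
    return -1;
}
```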

Backward branches are translated a bit strangely. Since the emulator can't step IO devices and check for control-C while it's executing translated code, an infinite loop in translated code could prevent it from ever gaining control. To prevent this, the translation for a backward branch tests for control-C or general timeout. If there is an infinite loop, the translation will time out (after 1 to 2 seconds) and will be flagged to exit back to the interpreter. The interpreter will do things like check for control-C and IO interrupts, etc. If there is nothing special, it will automatically resume the translated code, where it is free to continue in the infinite loop until the next timeout or control-C.

      So a 'jne backward' translates to:

         movb emulated_eflags(%ebx),%ah		# test the condition
         sahf
         je   9f				# just stay in translated code if we are making forward progress
         cmpb $0,emmy_x86_xlated_stop		# time to go backward, check for timeout or control-C
         je   translated_address_for_backward	# if no timeout or control-C, jump backward
         jmp  return_to_interpreter		# got a timeout or control-C, return to interpreter
      9:
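
In C terms, the backward-branch check amounts to polling a stop flag once per loop iteration. The sketch below models a translated 'decl %ecx ; jne backward' loop; the function and enum names are invented, and `xlated_stop` stands in for the emmy_x86_xlated_stop byte in the listing above (how that byte gets set is not modelled here).

```c
#include <signal.h>
#include <stdint.h>

/* Stand-in for emmy_x86_xlated_stop: set on timeout or control-C. */
static volatile sig_atomic_t xlated_stop;

typedef enum { LOOP_FINISHED, LOOP_INTERRUPTED } loop_exit;

/* Models translated code for:  1: decl %ecx ; jne 1b  */
static loop_exit run_translated_loop(uint32_t *ecx)
{
    for (;;) {
        (*ecx)--;
        if (*ecx == 0) {
            return LOOP_FINISHED;      /* branch not taken: fall through */
        }
        if (xlated_stop) {
            return LOOP_INTERRUPTED;   /* hand control back to the interpreter */
        }
        /* flag clear: take the backward branch */
    }
}
```

The flag is only tested when the branch would actually go backward, so straight-line code and forward progress pay nothing for the check.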

GDB

EMMY originally used a modified GDB v6.0. You can download it here as a bzip2 tar saveset. It is approx 6.5 Meg (mostly GDB stuff). Rev date: June 9, 2004

I decided to discontinue using GDB as, as you can see, it is very large. Since I wrote a symbolic debugger for OZONE anyway, it was not much extra work to interface it to Emmy. And now I have a symbolic debugger whose workings I actually understand.

For space considerations, I did not include the whole GDB 6.0 source tree, only the part that is GDB itself is included above. If you want the whole thing, you can get gdb-6.0.tar.gz from GNU.ORG. If you do that, unpack it first, configure it for native Linux use, then overlay my gdb-6.0/gdb... stuff on top of it and make as above. My makefile creates a library called libgdb.a that the Emmy makefile links to.

Once you download it, unpack it. There are two makes you have to do:

cd emmy/gdb-6.0/gdb ; make
cd ../.. ; make

The second make requires bcc and as86 to compile the BIOS (imported from Bochs). If your system does not have them, they are in the Dev86src-0.16.0.tar.gz saveset.

Changes as of July 27, 2004

emmy_x86_i_fpu.c:

  1) fix some fpu load and store datatypes

Changes as of July 26, 2004

emmy_atadisk.c:

  1) up to 4G drives
  2) print out default CHS values
  3) no interrupt after last data read
  4) no interrupt before first data acceptance
  5) check for write to read-only media

emmy_atapicd.c:

  1) direct connect to raw disks/CD's
  2) print media size out
  3) accept 'prevent/allow medium removal' command

emmy_pc.c:

  1) display CPU status when translation loops for a second

emmy_pcmobo.c,.h:

  1) better modelling of 'default configuration' using IOAPIC and 8259s
     it previously required an EOI to local APIC when using 8259s via local APIC virtual wire mode

emmy_pit8254.c,.h:

  1) emulate port 61 (speaker control) and timer #2

emmy_rombios_new.s:

  1) support int 41 & 46 vector hard disk geometry data
  2) fixed comments so it will assemble with gas 2.13

emmy_rtc146818.c,.h:

  1) emulate alarm interrupts

emmy_vgabios.s:

  1) accept AH=01 (set text-mode cursor shape) calls as a nop
  2) fix repeat count on AH=09 call to fill screen

emmy_x86.c:

  1) parametrize CPUID vendor name (status cpu0 vend_AuthenticAMD or whatever)
  2) move translated divide-error traps from explicit checking to catching signal
  3) implement 'semi-flat' translation (ie, base 0, all limits page-aligned and equal) (for Xen)
  4) fixes for IOAPIC/LAPIC/8259 stuff mentioned above

emmy_x86.h:

  1) 'semi-flat' mode changes
  2) parametrize CPUID vendor name
  3) task/call gate routine prototypes
  4) fixed to compile on gcc 3.3.2

emmy_x86_exception.c:

  1) fixed to compile on gcc 3.3.2
  2) task gate support

emmy_x86_i_fpu.c:

  1) fixed push to fpu stack macro, didn't always mark register occupied

emmy_x86_i_interp.c:

  1) changed var from eflags to eflagx so it can't get confused with class element 'eflags'
  2) task gates, call gates, etc
  3) treat AMD's PREFETCH as a NOP

emmy_x86_i_macs.h:

  1) NEG instruction not setting condition codes correctly
     (changed var from eflags to eflagx so it can't get confused with class element 'eflags')
  2) fixed signed idiv overflow checking
  3) call/task gate implementation
  4) IORD/IOWR macros for consistent IO error handling

emmy_x86_internal.h:

  1) parameterize CPUID vendor name string
  2) add PGE flag to CPUID capabilities mask
  3) add I_USER macro for consistent user/system mode checking
  4) add IORD and IOWR macros for consistent IO error handling
  5) call/task gate implementation

emmy_x86_lapic.c:

  1) just return error status, don't mcheck illegal writes
  2) fixes for IOAPIC/LAPIC/8259 stuff mentioned above

emmy_x86_memory.c:

  1) fixed to compile on gcc 3.3.2
  2) implement 'semi-flat' mode checks
  3) fixed bogus user/system access checks

emmy_x86_modlin_wraps.c:

  1) changed var from eflags to eflagx so it can't get confused with class element 'eflags'

emmy_x86_stsbpt.c:

  1) added pseudo-register VEND to set and display CPUID vendor name
  2) fixed to compile on gcc 3.3.2

emmy_x86_x_wraps.c:

  1) macroize in&out for consistent error checking
  2) implement 'semi-flat' mode checks
  3) changed var from eflags to eflagx so it can't get confused with class element 'eflags'

emmy_x86_x_xlate.c:

  1) divide error checking moved to signal handler
  2) changed var from eflags to eflagx so it can't get confused with class element 'eflags'
  3) treat AMD's PREFETCH as a NOP
  4) fixed to compile on gcc 3.3.2

debug/common/cmd_file.c:

  1) text size and address removed (meaningless in some cases)

debug/common/findfinalstyp.c:

  1) process N_EXCL but still doesn't work right

debug/common/findfuncparam.c:

  1) var's type can be defined in module other than what is referencing it

debug/common/findfunctionname.c:

  1) move function names into their own list as they are not always in address order within modules

debug/common/findsourceline.c:

  1) move function names into their own list as they are not always in address order within modules

debug/common/findstypstr.c:

  1) process N_EXCL but still doesn't work right

debug/common/findvariable.c:

  1) couldn't find some variables

debug/common/setcurthread.c:

  1) fixed types in print statements

debug/o_emmy/readwritemem.c:

  1) fixed error messages

debug/x_elf/readexefile.c:

  1) read functions into separate list sorted by ascending address for better searching
     (they aren't always listed in module by ascending address)