EMMY is yet another x86 CPU / PC system emulator program (yaxcpsep), targeted for OS development. EMMY incorporates these features:
Anyway, besides running OZONE, it will run Linux 2.4.20 and XEN 1.2.
EMMY is released under the GPL. You can download it here as a bzip2 tar saveset. It is approximately 380K. Rev date: July 27, 2004
Once you download it, unpack it and do a make. The make requires bcc and as86 to compile the BIOS (imported from Bochs). If your system does not have them, they are in the Dev86src-0.16.0.tar.gz saveset. There is also an included BIOS that uses gas to compile (as an alternative, see below).
Here is some documentation on Emmy. It is included as emmy.doc in the distribution saveset.
The BIOS situation is a bit of a mess. I originally started with the Bochs BIOS and made a few changes. It is still here, in emmy_rombios_old.c. It requires bcc and as86 to compile and assemble. I wrote a replacement BIOS in emmy_rombios_new.s that just requires the standard gas (GNU assembler) to assemble. It is not as comprehensive as the Bochs BIOS but is sufficient to run GRUB, boot protected mode OS's, and will even run MS-DOS.
Anyway, the makefile will try to compile both emmy_rombios_old and emmy_rombios_new. It will also set an emmy_rombios.bin softlink to whichever was the latest to build. If you want to use one or the other, point the softlink manually at either emmy_rombios_old.bin or emmy_rombios_new.bin.
Along with that, there is the file emmy_vgabios.s that is the video BIOS. I could not use the Bochs BIOS at all (as the licensing terms state it is for use with Bochs only), so I wrote one myself. Of course, as of today, it is text only, but is sufficient for MS-DOS and booting protected mode OS's.
Internally, Emmy has a struct that represents the state of the CPU, consisting of registers and the like. The translated code consists of instructions which modify this struct. For example, a register-to-register add instruction translates to code which adds two elements of the struct, placing the result in the struct, then modifying the struct location for the eflags.
For example, an 'addl %ebx,%eax' translates to:

    movl   emulated_ebx(%ebx),%ecx         # get source register
    addl   %ecx,emulated_eax(%ebx)         # add to destination register
    pushfl                                 # save resultant flags
    andw   $0xF700,emulated_eflags(%ebx)   # clear OSZAPC bits in emulated eflags
    popl   %eax                            # get resultant flags
    andb   $0x08,%ah                       # filter just leaving OSZAPC bits
    orw    %ax,emulated_eflags(%ebx)       # stuff them in emulated eflags

If the instruction is followed by something that sets eflags to something else, the eflags update above gets wiped out, so all you usually end up with is just the movl and the addl.
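To make the effect of that translation concrete, here is a rough C sketch of the same operation: add one struct member to another, then replace only the OSZAPC bits of the emulated eflags. The struct layout and the names CpuState and emulate_addl are assumptions made for illustration, not Emmy's actual definitions, and the flags are computed by hand here rather than captured from the host with pushfl.

```c
#include <stdint.h>

/* Hypothetical CPU state; field names are illustrative, not Emmy's. */
typedef struct {
    uint32_t eax, ebx, ecx, edx;
    uint32_t eflags;
} CpuState;

enum {
    FLAG_CF = 1u << 0, FLAG_PF = 1u << 2, FLAG_AF = 1u << 4,
    FLAG_ZF = 1u << 6, FLAG_SF = 1u << 7, FLAG_OF = 1u << 11,
    FLAGS_OSZAPC = FLAG_CF | FLAG_PF | FLAG_AF | FLAG_ZF | FLAG_SF | FLAG_OF
};

/* PF is set when the low byte of the result has an even number of 1 bits. */
static int low_byte_parity_even(uint32_t v) {
    v &= 0xFF;
    v ^= v >> 4;
    v ^= v >> 2;
    v ^= v >> 1;
    return (v & 1) == 0;
}

/* Emulate a 32-bit register-to-register add: *dst += src, updating OSZAPC. */
static void emulate_addl(CpuState *cpu, uint32_t *dst, uint32_t src) {
    uint32_t a = *dst;
    uint32_t r = a + src;
    uint32_t f = 0;

    if (r < a)                                f |= FLAG_CF; /* unsigned carry out */
    if (low_byte_parity_even(r))              f |= FLAG_PF;
    if ((a ^ src ^ r) & 0x10)                 f |= FLAG_AF; /* carry out of bit 3 */
    if (r == 0)                               f |= FLAG_ZF;
    if (r & 0x80000000u)                      f |= FLAG_SF;
    if (~(a ^ src) & (a ^ r) & 0x80000000u)   f |= FLAG_OF; /* signed overflow */

    *dst = r;
    /* Clear the old OSZAPC bits, keep everything else, merge the new ones. */
    cpu->eflags = (cpu->eflags & ~(uint32_t)FLAGS_OSZAPC) | f;
}
```

A call such as emulate_addl(&cpu, &cpu.eax, cpu.ebx) corresponds to the 'addl %ebx,%eax' above. Bits outside OSZAPC (the interrupt flag, for instance) are left untouched, which is the point of the mask-and-merge step.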
A translation of 'movl %eax,0x12345678' would be:

    movl   $0x12345678,%edi                # get linear address in %edi
    pushl  emulated_eax(%ebx)              # push value to write
    movl   $emmy_x86_x_writelin_wrap,%eax  # point to wrapper routine
    pushl  $4                              # writing a long to memory
    pushl  %edi                            # push its address
    pushl  %ebx                            # push CPU struct pointer
    call   *%eax                           # call wrapper routine
    addl   $16,%esp                        # pop call args from stack
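The four pushes above amount to a C-style call taking (CPU pointer, linear address, size, value). The sketch below shows what a wrapper with that argument order could look like. It is only an illustration: the Machine type, the flat 64K guest memory and the name write_linear are hypothetical, and the real emmy_x86_x_writelin_wrap presumably also handles paging, device memory and faults.

```c
#include <stdint.h>

#define GUEST_MEM_SIZE 65536u   /* assumption: tiny flat guest memory */

/* Hypothetical machine state holding the guest memory. */
typedef struct {
    uint8_t mem[GUEST_MEM_SIZE];
} Machine;

/*
 * Write 'size' bytes (1, 2 or 4) of 'value' at linear address 'addr',
 * little-endian as on x86. Returns 0 on success, -1 if the access is
 * out of range (a real emulator would raise a fault instead).
 */
static int write_linear(Machine *m, uint32_t addr, uint32_t size, uint32_t value) {
    if (size != 1 && size != 2 && size != 4)
        return -1;
    if (addr > GUEST_MEM_SIZE - size)
        return -1;
    for (uint32_t i = 0; i < size; i++)
        m->mem[addr + i] = (uint8_t)(value >> (8 * i));
    return 0;
}
```

Routing every memory access through one routine like this keeps the translated code small and leaves all address checking in one place.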
When Emmy comes to execute some code in a page for the first time, it makes some checks first before attempting to translate. It will only attempt translation if all the below are YES. This is how most OSs run, so I decided not to generate code that is not used in the common case. If any of the tests fail, the interpreter (Emmy_X86::i_interp) is called to execute instructions until the tests succeed.
When all the above conditions are satisfied, it will start generating translated code. It will continue translating until one of these conditions happens:
When the translation of a page is performed, it starts at the instruction it is given and proceeds until one of the above happens. The address of each instruction's translation is stored in an array of 4096 pointers, one per byte offset in the page. This is how the translation can be jumped to arbitrarily in the middle of a sequence of instructions (like for looping). The array is initialized to all zeroes. When the translation of a string of instructions on that page is successful, the address of the translation for each individual instruction is stored in its corresponding index in the array. If the translator determines that it cannot translate an instruction (like an IRET), it will put a one in the array to indicate the instruction must be interpreted. This prevents the stepper from trying to translate the untranslatable over and over.
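The lookup described above can be sketched in C roughly as follows. The type and function names (PageXlations, classify) are made up for this example, and the slots are stored as integers wide enough to hold a code address so the 0 and 1 markers can be compared without pointer casts.

```c
#include <stdint.h>

#define PAGE_SIZE 4096

#define SLOT_UNTRANSLATED   ((uintptr_t)0)  /* nothing known yet: try to translate */
#define SLOT_MUST_INTERPRET ((uintptr_t)1)  /* translator gave up: always interpret */

/* One slot per byte offset in the code page (hypothetical layout). */
typedef struct {
    uintptr_t slot[PAGE_SIZE];
} PageXlations;

typedef enum { ACT_TRANSLATE, ACT_INTERPRET, ACT_RUN } Action;

/* Decide what to do with the instruction at 'eip' on this page. */
static Action classify(const PageXlations *p, uint32_t eip) {
    uintptr_t s = p->slot[eip & (PAGE_SIZE - 1)];
    if (s == SLOT_UNTRANSLATED)
        return ACT_TRANSLATE;
    if (s == SLOT_MUST_INTERPRET)
        return ACT_INTERPRET;
    return ACT_RUN;   /* any other value is the address of translated code */
}
```

Because every instruction start has its own slot, a branch into the middle of an already translated sequence is just another array lookup.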
This array and the corresponding translation buffers remain in Emmy's memory until the codepage they came from is overwritten. Any write to a page will discard all translations for the page. This is necessary because once the original page is write-enabled, it is 'fair game' for any modifications, and the common case is that the page is being re-used for another process. The far less common case is for self-modifying code or pages with mixed code and read/write data.
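As a sketch of that discard-on-write rule, the following C fragment throws away everything known about a page as soon as any byte of it is written. The names and the per-page code buffer are assumptions for illustration only, and how Emmy actually detects the write (for example by write-protecting translated pages) is not shown.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_SIZE 4096

/* Hypothetical per-page translation record. */
typedef struct {
    uintptr_t slot[PAGE_SIZE];   /* 0 = untranslated, 1 = interpret, else code address */
    unsigned char *code_buf;     /* buffer holding translated code, NULL if none */
} PageXlations;

/* Forget all translations for one page. */
static void discard_translations(PageXlations *p) {
    free(p->code_buf);
    p->code_buf = NULL;
    memset(p->slot, 0, sizeof p->slot);   /* back to "all zeroes" */
}

/* Called on any guest write; only the page containing the address is affected. */
static void on_guest_write(PageXlations *pages, size_t npages, uint32_t phys_addr) {
    size_t page = phys_addr / PAGE_SIZE;
    if (page < npages)
        discard_translations(&pages[page]);
}
```

Dropping the whole page is coarse, but it matches the common case named above, where a written page is about to be reused for something else entirely.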
Instructions that access registers are translated to instructions that access members of the CPU struct. Instructions that access memory and IO ports are translated to calls to subroutines back to the main emulator code. Any exceptions (like divide-by-zero or pagefault) are done via subroutine calls back to the emulator main code which does a longjmp to wipe the emulator's stack.
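The longjmp technique for exceptions can be illustrated with a small, self-contained C sketch. All names here (fault_unwind, raise_fault, run_div) are invented for the example. The idea is that a fault detected deep inside a helper abandons whatever was on the stack and lands back at one known point in the main loop.

```c
#include <setjmp.h>

/* Where the main loop resumes after a fault (hypothetical name). */
static jmp_buf fault_unwind;

enum { FAULT_NONE = 0, FAULT_DIVIDE = 1 };

/* Called from anywhere below the setjmp; never returns to its caller. */
static void raise_fault(int fault) {
    longjmp(fault_unwind, fault);
}

/* A helper that may fault, standing in for a call made by translated code. */
static unsigned emulated_div(unsigned a, unsigned b) {
    if (b == 0)
        raise_fault(FAULT_DIVIDE);
    return a / b;
}

/* Runs one divide; returns the fault code, or FAULT_NONE with *out set. */
static int run_div(unsigned a, unsigned b, unsigned *out) {
    switch (setjmp(fault_unwind)) {
    case 0:                       /* normal path: setjmp just armed */
        *out = emulated_div(a, b);
        return FAULT_NONE;
    case FAULT_DIVIDE:            /* arrived here via longjmp */
        return FAULT_DIVIDE;
    default:
        return -1;
    }
}
```

In an emulator the code after the longjmp would deliver the exception to the guest. Here it only reports which fault occurred.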
Backward branches are translated a bit strangely. Since the emulator can't step IO devices and check for control-C while it's executing translated code, an infinite loop in translated code could prevent it from ever gaining control. To prevent this, the translation for a backward branch tests for control-C or a general timeout. If there is an infinite loop, the translation will time out (after 1 to 2 seconds) and will be flagged to exit back to the interpreter. The interpreter will do things like check for control-C and IO interrupts, etc. If there is nothing special, it will automatically resume the translated code, where it is free to continue in the infinite loop until the next timeout or control-C.
So a 'jne backward' translates to:

    movb   emulated_eflags(%ebx),%ah         # test the condition
    sahf
    je     9f                                # just stay in translated code if we are making forward progress
    cmpb   $0,emmy_x86_xlated_stop           # time to go backward, check for timeout or control-C
    je     translated_address_for_backward   # if no timeout or control-C, jump backward
    jmp    return_to_interpreter             # got a timeout or control-C, return to interpreter
9:
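The same control flow can be written as a C loop, which may make the three exits easier to see. This is a sketch only: xlated_stop stands in for emmy_x86_xlated_stop, it is assumed to be set asynchronously by a timer or control-C handler, and the countdown loop is an invented stand-in for guest code ending in 'jne backward'.

```c
#include <signal.h>

/* Set asynchronously (timer or control-C handler) to request an exit. */
static volatile sig_atomic_t xlated_stop;

typedef enum { LOOP_FINISHED, RETURN_TO_INTERPRETER } LoopExit;

/* Stand-in for translated guest code: "decrement; jne backward". */
static LoopExit run_countdown(unsigned *counter) {
    for (;;) {
        (*counter)--;                      /* loop body */
        if (*counter == 0)
            return LOOP_FINISHED;          /* branch not taken: fall through (label 9) */
        if (xlated_stop)
            return RETURN_TO_INTERPRETER;  /* timeout or control-C pending */
        /* otherwise take the backward branch and go around again */
    }
}
```

The flag is only tested when the branch is actually going backward, so straight-line code and loop exits pay nothing for it.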
EMMY originally used a modified GDB v6.0. You can download it here as a bzip2 tar saveset. It is approx 6.5 Meg (mostly GDB stuff). Rev date: June 9, 2004
I decided to discontinue using GDB because, as you can see, it is very large. Since I wrote a symbolic debugger for OZONE anyway, it was not much extra work to interface it to Emmy. And now I have a symbolic debugger whose workings I actually understand.
For space considerations, I did not include the whole GDB 6.0 source tree, only the part that is GDB itself is included above. If you want the whole thing, you can get gdb-6.0.tar.gz from GNU.ORG. If you do that, unpack it first, configure it for native Linux use, then overlay my gdb-6.0/gdb... stuff on top of it and make as above. My makefile creates a library called libgdb.a that the Emmy makefile links to.
Once you download it, unpack it. There are two makes you have to do:
emmy_x86_i_fpu.c: 1) fix some fpu load and store datatypes
emmy_atadisk.c:
    1) up to 4G drives
    2) print out default CHS values
    3) no interrupt after last data read
    4) no interrupt before first data acceptance
    5) check for write to read-only media
emmy_atapicd.c:
    1) direct connect to raw disks/CD's
    2) print media size out
    3) accept 'prevent/allow medium removal' command
emmy_pc.c:
    1) display CPU status when translation loops for a second
emmy_pcmobo.c,.h:
    1) better modelling of 'default configuration' using IOAPIC and 8259s
       it previously required an EOI to local APIC when using 8259s via local APIC virtual wire mode
emmy_pit8254.c,.h:
    1) emulate port 61 (speaker control) and timer #2
emmy_rombios_new.s:
    1) support int 41 & 46 vector hard disk geometry data
    2) fixed comments so it will assemble with gas 2.13
emmy_rtc146818.c,.h:
    1) emulate alarm interrupts
emmy_vgabios.s:
    1) accept AH=01 (set text-mode cursor shape) calls as a nop
    2) fix repeat count on AH=09 call to fill screen
emmy_x86.c:
    1) parametrize CPUID vendor name (status cpu0 vend_AuthenticAMD or whatever)
    2) move translated divide-error traps from explicit checking to catching signal
    3) implement 'semi-flat' translation (ie, base 0, all limits page-aligned and equal) (for Xen)
    4) fixes for IOAPIC/LAPIC/8259 stuff mentioned above
emmy_x86.h:
    1) 'semi-flat' mode changes
    2) parametrize CPUID vendor name
    3) task/call gate routine prototypes
    4) fixed to compile on gcc 3.3.2
emmy_x86_exception.c:
    1) fixed to compile on gcc 3.3.2
    2) task gate support
emmy_x86_i_fpu.c:
    1) fixed push to fpu stack macro, didn't always mark register occupied
emmy_x86_i_interp.c:
    1) changed var from eflags to eflagx so it can't get confused with class element 'eflags'
    2) task gates, call gates, etc
    3) treat AMD's PREFETCH as a NOP
emmy_x86_i_macs.h:
    1) NEG instruction not setting condition codes correctly (changed var from eflags to eflagx so it can't get confused with class element 'eflags')
    2) fixed signed idiv overflow checking
    3) call/task gate implementation
    4) IORD/IOWR macros for consistent IO error handling
emmy_x86_internal.h:
    1) parameterize CPUID vendor name string
    2) add PGE flag to CPUID capabilities mask
    3) add I_USER macro for consistent user/system mode checking
    4) add IORD and IOWR macros for consistent IO error handling
    5) call/task gate implementation
emmy_x86_lapic.c:
    1) just return error status, don't mcheck illegal writes
    2) fixes for IOAPIC/LAPIC/8259 stuff mentioned above
emmy_x86_memory.c:
    1) fixed to compile on gcc 3.3.2
    2) implement 'semi-flat' mode checks
    3) fixed bogus user/system access checks
emmy_x86_modlin_wraps.c:
    1) changed var from eflags to eflagx so it can't get confused with class element 'eflags'
emmy_x86_stsbpt.c:
    1) added pseudo-register VEND to set and display CPUID vendor name
    2) fixed to compile on gcc 3.3.2
emmy_x86_x_wraps.c:
    1) macroize in&out for consistent error checking
    2) implement 'semi-flat' mode checks
    3) changed var from eflags to eflagx so it can't get confused with class element 'eflags'
emmy_x86_x_xlate.c:
    1) divide error checking moved to signal handler
    2) changed var from eflags to eflagx so it can't get confused with class element 'eflags'
    3) treat AMD's PREFETCH as a NOP
    4) fixed to compile on gcc 3.3.2
debug/common/cmd_file.c:
    1) text size and address removed (meaningless in some cases)
debug/common/findfinalstyp.c:
    1) process N_EXCL but still doesn't work right
debug/common/findfuncparam.c:
    1) var's type can be defined in module other than what is referencing it
debug/common/findfunctionname.c:
    1) move function names into their own list as they are not always in address order within modules
debug/common/findsourceline.c:
    1) move function names into their own list as they are not always in address order within modules
debug/common/findstypstr.c:
    1) process N_EXCL but still doesn't work right
debug/common/findvariable.c:
    1) couldn't find some variables
debug/common/setcurthread.c:
    1) fixed types in print statements
debug/o_emmy/readwritemem.c:
    1) fixed error messages
debug/x_elf/readexefile.c:
    1) read functions into separate list sorted by ascending address for better searching (they aren't always listed in module by ascending address)