From:	 "Rick A. Hohensee" <rickh@Capaccess.org>
To:	 cola@stump.algebra.com
Subject: osimplay, formerly shasm, is now beta
Date:	 Sat, 05 Jan 2002 04:25:06 -0500
Cc:	 lwn@lwn.net

osimplay, formerly shasm, is an x86 macro-assembler, "mid-level-language",
or "compembler". It is implemented entirely in GNU Bash 2 without
dependence on any external utils. Coverage is roughly 386, real and pmode,
no FPU, with Linux syscalls. osimplay has simple analogues of a nice set
of C and Forth features, and some unique features such as the "xray"
jump-table construct, without creating any syntactic seam between
high-level and low-level. There is no asm("") or CODE/ENDCODE. osimplay
can now build working examples of small Linux ELF executables, and a
bootsector, and the sources are included. osimplay is thus at beta
development level. It's reasonably usable, and the bugs that arise may
now be small enough to not always require the author to fix, although I
would love to know about them. This version of osimplay is public domain.
Programmers and would-be programmers who enjoy having their assumptions
challenged should find osimplay amusing.
ftp://ftp.gwdg.de/pub/cLIeNUX/interim/osimplay.tgz
rickh@capaccess.org             Rick Hohensee, sole author



long blurb......................................

asmacs begat shasm begat osimpa begat osimplay, and I'm saying


                   osimplay is now beta.


asmacs
 was just a bunch of m4 macros for Gas that simply transliterated Intel
opcode and register names to names I consider massively clearer and/or
more convenient. Intel MOVx is = in osimplay, and LMSW is
loadmachinestatusword. = is about 25% of most code, and I believe there's
one occurrence of LMSW in Linux, and I think that's there out of nostalgia.
Main register names in osimplay are A, B, C, D, SP, BP, SI and DI. I found
asmacs very helpful, and this simple renaming remains the big win in
osimplay. High-level languages have frozen the evolution of assemblers,
and some catch-up is about 35 years overdue.

shasm
 got rid of most of the need for sized register names like A - AX - AL
with "byte" and "cell" keywords. The cell concept also hides some
fundamental machine information elegantly, and thus was seen well before
shasm (by 1970 or so) in Forth and BCPL, and is very helpful with the fact
that a 386 is two different size machines, 16 bit rmode and 32 bit pmode.
The concept may be "forward-compatible" to IA64 also, but I don't know
that architecture. shasm also allows source/dest or dest/source (AT&T or
Intel) syntaxes by expanding the usual "," arguments-delimiter to "to",
"from" or "with". shasm got Slashdotted before it could really produce
much working 386 code, but it did produce some shortly thereafter. shasm
and its existing subsequent versions are 100% GNU Bash 2 shell scripts.
That's right, just a recent sh. No dd, sed, etc. "Installing", running,
and reading some operator-specific osimplay help on Linux/Bash is...

        tar xzvf osimplay.tgz
        cd osimplay_
        . osimplay
        = h
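
Two of the shasm ideas described above, the mode-dependent "cell" and the
"to"/"from" operand delimiters, can be sketched in plain Bash. This is only
an illustration of the idea, not osimplay's own code: cell_bytes and
operands are hypothetical names, and "with" is left out because this
announcement does not say how it orders its operands.

```shell
# Hypothetical helpers, NOT taken from osimplay; they only illustrate
# the "cell" and word-delimiter ideas described in the text.

# A cell is the machine's natural word: 16 bits (2 bytes) in real mode,
# 32 bits (4 bytes) in protected mode.
cell_bytes () {
    case $1 in
        rmode) echo 2 ;;
        pmode) echo 4 ;;
        *)     echo "cell_bytes: unknown mode '$1'" >&2; return 1 ;;
    esac
}

# Normalise "X to Y" and "X from Y" into one source/destination pair,
# so either operand order can be written in the source.
operands () {
    local first=$1 word=$2 second=$3
    case $word in
        to)   echo "src=$first dst=$second" ;;   # source written first
        from) echo "src=$second dst=$first" ;;   # destination written first
        *)    echo "operands: unknown delimiter '$word'" >&2; return 1 ;;
    esac
}

cell_bytes pmode     # prints: 4
operands A to B      # prints: src=A dst=B
operands B from A    # prints: src=A dst=B
```

Both spellings of the move resolve to the same pair, which is the point of
the word delimiters: the reader picks the order, and the word says which
operand is which.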

osimpa
 was shasm+enthusiasm. osimpa added various rustic imitations of C and
Forth constructs to shasm, and a couple features I suspect are unique,
without losing seamless access to assembly. A seam is typified by the
asm("") seam between C and Gas in the GNU toolchain. osimpa features
include: "allot", data "clump"s, "print", "text", "Linux" (syscalls),
"entrance" procedures, "heap" (like .bss), "ELF" (executables only) and so
on. In the course of adding all that featurism, shasm real mode support
was broken, but writing small Linux utilities became almost convenient.

Deliberately avoided, to remain an assembler: data types, structured flow
control abstractions like DO/WHILE/FOR/ELSE, and of course there are no
Obstacle-Oriented Programming techniqueMethodMechanism()s. Although I
don't do IF/ELSE/ENDIF and so on, osimpa "when" conditional branches are
pretty nice for what they are, and osimpa has real execution arrays (jump
tables, not heavily tested).
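
For readers who have not met the term, an execution array (jump table) is a
table of routines indexed by a selector. The sketch below shows the idea in
plain Bash. It is not osimplay syntax, which this announcement does not
show, and the names on_zero, on_one, on_two, table and dispatch are all
hypothetical.

```shell
# Hypothetical handlers; the names are illustrative only.
on_zero () { echo "zero"; }
on_one  () { echo "one"; }
on_two  () { echo "two"; }

# The table: index N holds the name of the routine to run for selector N.
table=(on_zero on_one on_two)

dispatch () {
    local n=$1
    # Bounds-check the selector, then call through the table.
    # A non-numeric selector fails the first test and is rejected too.
    if [ "$n" -ge 0 ] 2>/dev/null && [ "$n" -lt "${#table[@]}" ]; then
        "${table[$n]}"
    else
        echo "dispatch: selector out of range" >&2
        return 1
    fi
}

dispatch 1    # prints: one
```

In an assembler the table would hold code addresses and the call would be
an indirect jump, so selecting a routine costs one index operation instead
of a chain of compares.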

osimplay
 means writing operating systems is simply childsplay. That is hype, and
is thus deliberately outrageous, but there's a sliver of truth to it. It
should make playing with OS design easier. osimplay can build anything
from a Linux console text editor (a fair wad of the beginnings of one are
included) to a mode-changing bootsector (also included, working). In
other words, real mode is fixed, pmode is almost convenient, and thus
osimplay probably does merit the term "beta".

Result.
 Even high-level languages as low-level as C or Forth work from some
abstraction back to the machine. osimplay is pure bottom-up, being an
attempt at a Forth for one-stack machines. There are two areas where I
believe this has been worth the effort.

Systems programming suffers at the machine/abstraction seam, and there is
no such seam in osimplay. That seam is normally considered the cost of
portability, but I believe that cost can be greatly reduced in an
assembler-like model closer to the machine than C, and besides, there's
plenty of 386s out there.

I also suspect that osimplay is relatively easy to learn, particularly to
self-teach. No pointers (C), no stack-dancing (Forth), no REPxx (x86),
fairly interactive ...

An area where it hasn't been worth the effort is in runtime performance. C
is impressive, even on x86, which isn't a PDP-11. Even if I can beat Gcc,
it's not usually by much, but certain areas (switch/case, recursion, very
finely factored code...) still bear a closer look. Conversely, it's not so
hard to get close to C in assembly in most cases either. Optimized Gcc is
good, but unoptimized Gcc can be pretty, uh, amusing.

Beyond,
 osimplay looks pretty CPU-independent and, I believe, could be
completely portable (across commodity desktop CPUs) with a few more
tricks. The great genius of C is good portability with excellent
performance. Everything else about C is minor, including some mistakes.
The same is achievable much more simply, even via a shell script. One
lesson of Forth is that simplicity is robust.

I can't find the quote on Google, but I believe Rob Pike once told me in
9fans that UNIX naming tradition is horrid. Whether Mr. Plan 9 said so or
not, it is. Linux people are repulsed and enraged by my fits of
neologistic frenzy. Forth people obsess over names. There is excellent
reason for the latter. Bad names don't matter to machines, but frequently
cause humans to write dysfunctional, often totally self-extraneous code,
and this effect is self-compounding, and I believe people don't appreciate
how bad the situation is. To put it positively, I believe renaming is
currently a huge opportunity in computing, starting with assembly, which
is the point at which names start to matter. So go get osimplay before I
decide the name is wrong again :o) It's a script, so feel free to decide
the names are all wrong :o)

beyond beyond,
 C claims portability by only modeling the execution engine of the CPU in
the core of the language. Forth also. It would be nice if more operating
system mechanism was part of a standard portable language. I personally
don't know of such a language with systems-grade performance, and if it
exists I doubt it's very general. A compembler can help investigate that,
even one written in a unix sh. osimplay is now a distinct language
independent of implementation. Not too distinct though; most of it
shouldn't be too alien to good programmers, other than the basic fact that
in the current implementation your assembler source is a shell script.

ftp://ftp.gwdg.de/pub/cLIeNUX/interim/osimplay.tgz

and browse the cLIeNUX dirs above that :o)

That version of osimplay is public domain.

Rick Hohensee
rickh@capaccess.org
http://linux01.gwdg.de/~rhohen