Writing Hello World for the IBM 1401
You might have seen this video about compiling a FORTRAN II program on the IBM 1401. But the 1401 was mostly a teaching aid for FORTRAN. It can barely do it (in fact, it is a miracle it can do it at all) and most of the time you would never program it like that. Instead, the 1401 was designed to be programmed directly in machine code or with the help of what we would call an assembler today, which IBM called Autocoder at the time.
And now you can write and run your own program for the 1401, since we've implemented a 1401 emulator and an Autocoder compiler on the PC (and Mac and Linux). In this article, I will explain how to write a simple "Hello World" program. Except it's anything but simple.
ROPE: IBM 1401 Programming and Emulation Environment
Anyone can write and run IBM 1401 programs on a PC (Windows, Mac or Linux) thanks to the IBM 1401 ROPE compiler and emulator. In fact, that's how we develop and test our programs too, before punching them on real cards and running them on the real 1401. You'll need:
- ROPE 2018-1-30.zip : Note that there is a big annoying quirk: the File->Open... menu does not work. If you click the Browse... button in the editor window instead, that one will work. Don't ask... The source code for the ROPE project is here: https://github.com/lucaseverini/ROPE, in case you want to improve it.
- IBM 1401 Programs.zip : A library of example programs that you can compile and run using ROPE
The ROPE environment looks like this:
The window at the left is your source code editing window. That's where you edit your source code, set the Autocoder compiler options, and finally click Assemble to compile your program using an emulation of the original IBM Autocoder (written by Van Snyder, of FORTRAN standardization committee fame). If this succeeds, it will open the window you see at the right, which is the IBM 1401 emulator window. It shows the compiled code, which you can then run or single-step through. You can even see an entire live image of the 16k memory content. You can also set up files to act as simulated card input data for your program, others to act as simulated output card data, or do the same for magnetic tape input and output. Finally, the printer output, which usually is your program output (unless your program outputs a tape file or punches a stack of virtual cards), can be seen on the green-bar paper in the bottom-left window.
IBM 1401 Programming References
Here are some essential references to help in programming the IBM 1401:
- http://bitsavers.org/pdf/ibm/1401/Programming_the_1401_1962.pdf: This is a tutorial on 1401 programming. I recommend starting with this if you want to write code.
- http://bitsavers.org/pdf/ibm/1401/A24-1403-1401_Reference_Apr62.pdf: This describes all of the instructions of the 1401. It includes some of the optional instructions not covered in the previous book.
- http://bitsavers.org/pdf/ibm/1401/C24-3319-0_Autocoder_on_Tape_Laguage_Specifications_and_Operating_Procedures_Nov64.pdf: This is the reference for Autocoder, which is the assembly compiler you'll use in ROPE to write and compile your program.
- IBM 1401 Coding Techniques TIE-0064.pdf: The juicy stuff is in neither of the two official references above. Programming the 1401 efficiently requires some dirty tricks like self-modifying code and using the SBR instruction's side effects for incrementing addresses. Even simple loops are unobvious. All these dirty but essential tricks of practical programming are hidden away in app notes. My favorite one is this TIE-0064.
For even more info, dive into http://ibm-1401.info/1401SoftwDevel.html for an exhaustive list of software development documents.
IBM 1401 Architecture Crash Course
In order to program in machine code or assembly, you need to understand the CPU architecture. However, the IBM 1401 architecture will feel completely alien to a modern-day assembly programmer. First of all, the 1401 is not even a binary machine! Instead, it operates on variable-length fields of 6-bit BCD-coded characters, going through each character in turn. Although there is no concept of a (yet to be invented) byte, the memory width is 8 bits, but mostly by accident. There are 6 bits for the BCD character (called 1, 2, 4, 8, A and B, described further below), one special bit called a Word Mark (WM) that delimits the end of a character field and also indicates instruction characters, and one parity bit called C, for error checking. There are no accumulator registers. The memory addressing scheme is so contrived that you can't use normal address arithmetic. And some of the instructions that would seem essential are actually optional, and therefore hidden in the depths of the documentation.
BCD and the punch card encoding mess
The IBM 1401 does not use our familiar binary encoding scheme. Instead it uses a BCD character coding, derived from the character representation on a 12-row punch card, which it manages to collapse into just 6 bits. So our first stop is to look at an IBM-coded punched card. Or more precisely, at the several different versions thereof - herein lies the mess. Let's start with the card generated by the then-popular IBM 026 punch, equipped with the FORTRAN option.
You can tell just by looking at the card how the punch coding evolved. The first cards, invented by Herman Hollerith, had only 10 rows, numbered 0 to 9, which could easily be used for numbers. You just punched the appropriate number in the column, as shown in the 10 left columns of the punch card above. Then letter coding was added, and two new punch positions called "zone punches" appeared as two extra rows at the top of the card: they are called zone 12 and zone 11. And the 0 row became dual use: punched by itself, it still means number 0, as in its original use. But when combined with another punch in the same column, it is used as a third zone punch called zone 0. Also, notice that the order of punches has now become quite illogical. Going from top to bottom, it is: 12-11-0-1-2-3-4-5-6-7-8-9. Letters A through I use two punches: one punch in zone 12 and a number punch from 1 to 9. Similarly, J through R use zone 11 plus a 1 to 9 punch, and S through Z use zone 0 plus a 2 to 9 punch. If you don't punch anything, it represents a blank (a space, as we'd call it now).
But that's not all. They soon needed to add special characters, but had already run out of dual punch codes. So triple punch codes were invented. You see them towards the right of the card. These are the ones that cause trouble. This particular IBM 026 punch can punch 10 triple-punch special characters. They include ( ) + and ', which are very necessary to program in FORTRAN and make a program much easier to read in Autocoder. The result is known as the IBM BCD FORTRAN character set, and is the preferred character set to use when programming in Autocoder. But that's not the only one used.
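The zone-plus-digit rule for letters is easy to write down as code. Here is a minimal Python sketch of it, covering only blanks, digits and upper-case letters. The function name `hollerith_punches` is my own invention for illustration, and the triple-punch special characters are deliberately left out since they depend on which punch option you have. Note that S through Z land on digit rows 2 to 9 (there are only eight letters in that group), which is the standard Hollerith assignment.

```python
def hollerith_punches(ch):
    """Return the rows punched in one card column for a single character.

    Rows are named as on the card: "12", "11", "0", "1" ... "9".
    Only blank, digits and upper-case letters are handled in this sketch.
    """
    if len(ch) != 1:
        raise ValueError("expected exactly one character")
    if ch == " ":
        return []                       # blank column: nothing punched
    if ch in "0123456789":
        return [ch]                     # a digit is one punch in its own row
    groups = (
        ("12", "ABCDEFGHI", 1),         # zone 12 + digit punch 1..9
        ("11", "JKLMNOPQR", 1),         # zone 11 + digit punch 1..9
        ("0", "STUVWXYZ", 2),           # zone 0  + digit punch 2..9
    )
    for zone, letters, first_digit in groups:
        if ch in letters:
            return [zone, str(letters.index(ch) + first_digit)]
    raise ValueError("character not covered by this sketch: %r" % ch)


if __name__ == "__main__":
    for ch in "HELLO WORLD":
        print(repr(ch), hollerith_punches(ch))
```

Running it on "HELLO WORLD" lists the punches column by column, for example H comes out as zone 12 plus row 8.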
There was another (even more) popular option of the IBM 026 used with the IBM 1401, the Commercial option. It punches like this:
It is the same as the FORTRAN codes, except for the three-punch special characters. It uses the same triple-punch combinations, but some are assigned to a different printed character. Many are weird IBM-only characters that don't even have a modern ASCII equivalent. Also, it does not have parentheses, nor does it have a + (it puts & there instead). This is known as the IBM BCD Commercial character set.
To make things even more complicated, the next punch generation, the IBM 029, tried to remedy this annoying situation by extending the number of 3-punch combinations and including the previous FORTRAN, Commercial, and some new characters all in one coding. This is known as EBCD (Extended BCD), and it resulted in one more incompatible shuffling of the 3-punch characters:
Eventually, more and more multi-punch codes were added, until all 256 different combinations were coded, leading to the infamously illogical EBCDIC (eb-see-dic) encoding. IBM stuck with EBCDIC for far too long, well after ASCII, which was derived from communication standards, had cleaned up the mess. This was obviously because they wanted to maintain backwards compatibility with their historical punched-card machine base.
A and H character chains
The IBM 1401 coped with the dual punch standards (FORTRAN and Commercial) by providing two sets of characters for the 1403 printer. The printer character sets come in the form of a chain, so the printer could be equipped with either the so-called A-Chain (Commercial set) or the H-Chain (FORTRAN set). They follow the IBM 026 punch Commercial or FORTRAN representation. But sure enough, it adds some more characters of its own. The 1401 also defines some special characters that we'll later find in the 029, but here they can't be printed, nor are they quite punched like in the 029. Just to annoy a little.
IBM 1401 internal BCD character coding
The IBM 1401 represents the 12-punch encoding described above using 6 BCD bits + 1 parity bit + 1 Word Mark bit. Thus the IBM 1401 memory has words of 8 bits, although they don't represent a binary byte as we are used to. In order, the bits are called: CBA8421M. You can see the 8 bits of register B all proudly lit in the picture of the 1401 front panel below:
C - parity bit
B - zone B bit
A - zone A bit
8 - weight 8 BCD bit
4 - weight 4 BCD bit
2 - weight 2 BCD bit
1 - weight 1 BCD bit
M - Word Mark bit
The conversion from punch code to the 6 main data bits, the AB8421 bits, goes as follows:
- Punches 1 to 9 are represented in usual binary by bits 8421. For example, Punch 9 is bit 8 and bit 1 (1001). Punch 6 is bit 4 and bit 2 (0110).
- Punch 0 appearing all on its own, representing number 0, is treated weirdly. It is represented internally by bit 8 and bit 2 (1010), and NOT by (0000) as you'd expect.
- Blank (no punches at all) is represented by (0000). This is NOT zero!
- Zone punches 0 (i.e. punch 0 in conjunction with any other punch), 11 and 12: These zone punches are coded in zone bits A and B. Zone 0 is represented by bit A. Zone 11 is represented by bit B. Zone 12 is represented by both bits A and B together. If there are no zone punches, both A and B bits are off. Note that there is no way to represent two of the zone punches appearing simultaneously! Fortunately such a case never happens in the IBM 026 punching scheme.
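The conversion rules above can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the names `punches_to_bcd` and `bit_names` are made up, the six bits are packed into an integer in the order B A 8 4 2 1, and only blank, a lone digit, or one zone punch plus one digit punch from 1 to 9 are handled (no triple punches).

```python
# Bit masks for the six data bits, packed in the order B A 8 4 2 1.
B_BIT = 0b100000
A_BIT = 0b010000
ZONE_BITS = {"0": A_BIT, "11": B_BIT, "12": A_BIT | B_BIT}
DIGIT_ROWS = ("1", "2", "3", "4", "5", "6", "7", "8", "9")


def punches_to_bcd(punches):
    """Convert punched rows, e.g. ["12", "1"], to a 6-bit BA8421 value."""
    rows = list(punches)
    if not rows:
        return 0b000000                 # blank: no bits set, NOT zero
    if rows == ["0"]:
        return 0b001010                 # lone 0 punch: bits 8 and 2
    value = 0
    zone_seen = False
    digit_seen = False
    for row in rows:
        if row in ZONE_BITS:
            if zone_seen:
                raise ValueError("two zone punches cannot be represented")
            value |= ZONE_BITS[row]
            zone_seen = True
        elif row in DIGIT_ROWS:
            if digit_seen:
                raise ValueError("multiple digit punches are outside this sketch")
            value |= int(row)           # plain binary in bits 8 4 2 1
            digit_seen = True
        else:
            raise ValueError("unknown row: %r" % row)
    return value


def bit_names(value):
    """Spell out which of the bits B A 8 4 2 1 are set, e.g. 'BA1'."""
    names = (("B", 32), ("A", 16), ("8", 8), ("4", 4), ("2", 2), ("1", 1))
    return "".join(name for name, mask in names if value & mask)


if __name__ == "__main__":
    print(bit_names(punches_to_bcd(["12", "1"])))   # letter A: zone 12 + 1
    print(bit_names(punches_to_bcd(["0"])))         # the number 0
```

So the letter A (zone 12 plus punch 1) comes out as bits B, A and 1, while the number 0 comes out as bits 8 and 2.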
Two more bits are then added:
- The all important Bit M (Word Mark) will be explained later. It is a unique bit that flags that the character represents either an instruction, or the end of a data field. There are special instructions used to manage this bit.
- Bit C (parity) is then calculated by the machine itself. It will be set or cleared so that the resulting parity of a whole 8 bit word (including the Word Mark and C bit) is odd. A valid 8-bit character in memory should always have an odd number of bits set. The machine continuously checks for parity of characters it reads from memory and in any of its registers. If an even parity is detected, it knows something went wrong - probably a hardware fault - and the machine immediately stops. Actually, in the picture above, taken during a hardware debug session, the register B has all the bits set. Which is even parity. We have a big bad parity error! The C bit should be off. You can't see it in the picture, but the machine has detected the problem already. There is a bright red error light on top of the panel and the machine has stopped.
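To make the odd-parity rule concrete, here is a small Python sketch. It is my own illustration, not how the hardware does it: the function names are hypothetical, and the 8 bits are packed into an integer in the CBA8421M order given above, with C as the top bit and M as the bottom bit.

```python
def with_odd_parity(data6, word_mark=False):
    """Build an 8-bit word C B A 8 4 2 1 M, choosing C for odd parity.

    data6 is the 6-bit BA8421 value; word_mark sets the M bit.
    """
    if not 0 <= data6 < 64:
        raise ValueError("data6 must fit in 6 bits")
    low7 = (data6 << 1) | (1 if word_mark else 0)    # B A 8 4 2 1 M
    ones = bin(low7).count("1")
    c_bit = 1 if ones % 2 == 0 else 0                # make the total count odd
    return (c_bit << 7) | low7


def parity_ok(word8):
    """True when the 8-bit word has an odd number of bits set."""
    return bin(word8 & 0xFF).count("1") % 2 == 1


if __name__ == "__main__":
    blank = with_odd_parity(0b000000)
    print(format(blank, "08b"), parity_ok(blank))    # blank still carries the C bit
    print(parity_ok(0b11111111))                     # all 8 bits lit: even, so an error
```

Note how a blank, which has no data bits at all, still ends up with the C bit set, and how a word with all 8 bits lit fails the check, just like in the picture.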
Now you should be able to understand the column "BCD code" in the printer chain table that was shown previously. It is simply the memory bit combination that represents the character, using the CBA8421 bit naming, and the method of punch card to memory representation conversion outlined above.
Memory addressing is odd and requires special mention. Fortunately, you rarely have to deal with it directly when writing in Autocoder, which takes addresses in natural number form and translates them into the 3 BCD character coded form.
The complication arises because memory address codes in an instruction are constrained to a field of only 3 BCD characters in length. For the first 1,000 locations, you use the number of the location and the meaning is straightforward: 000 is the first location and 999 is the last one.
But the 1401 originally shipped with 4,000 locations of memory. How would you address these while staying with BCD codes (remember, this is the only thing the 1401 can deal with; it has no concept of binary addresses or numbers)? By adding extra A and B bits of course, which result in zone punches on the cards, corresponding to characters that bear no relationship to the number they represent, unless you are very fluent in Hollerith code. So for accessing 1000 to 1999, you add zone bit A to the hundreds digit (zone punch 0 on the card), for 2000 to 2999 you add zone bit B (zone punch 11 on the card), and for 3000 to 3999 you add bits A and B (zone 12 punch on the card). You therefore end up with an address that looks like below, with one alphanumerical character in the hundreds, and two decimal characters for the tens and units.
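As a rough illustration of the scheme just described, here is a Python sketch that turns a decimal address from 0 to 3999 into its three 6-bit characters. The function name `encode_address` and the integer packing (B A 8 4 2 1) are my own choices, and I am assuming that a 0 digit inside an address is stored like any other 0, that is as bits 8 and 2. Addresses above 3999 are not covered here.

```python
A_BIT = 0b010000
B_BIT = 0b100000
ZERO = 0b001010          # the digit 0 is stored as bits 8 and 2


def digit_bits(digit):
    """6-bit code for one decimal digit, with no zone bits."""
    return ZERO if digit == 0 else digit


def encode_address(addr):
    """Return (hundreds, tens, units) as 6-bit BA8421 values for 0..3999."""
    if not 0 <= addr <= 3999:
        raise ValueError("this sketch only covers addresses 0 to 3999")
    thousands, rest = divmod(addr, 1000)
    hundreds, rest = divmod(rest, 100)
    tens, units = divmod(rest, 10)
    # The thousands are folded into the zone bits of the hundreds character.
    zone = (0, A_BIT, B_BIT, A_BIT | B_BIT)[thousands]
    return (zone | digit_bits(hundreds), digit_bits(tens), digit_bits(units))


if __name__ == "__main__":
    for addr in (0, 999, 1234, 2100, 3999):
        print(addr, [format(c, "06b") for c in encode_address(addr)])
```

For example, address 1234 gets bit A plus the digit 2 in the hundreds position, which by the zone 0 + 2 rule is the letter S, so it reads as S34.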