Writing Hello World for the IBM 1401
You might have seen this video about compiling a FORTRAN II program on the IBM 1401. But FORTRAN on the 1401 was mostly a teaching aid. The machine can barely run it (in fact, it is a miracle it can do it at all), and most of the time you would never program it like that. Instead, the 1401 was designed to be programmed directly in machine code, or with the help of what we would today call an assembler, which IBM called Autocoder at the time.
And now you can write and run your own programs for the 1401, since we've implemented a 1401 emulator and an Autocoder compiler on the PC (and Mac and Linux). In this article, I will explain how to write a simple "Hello World" program. Except it's anything but simple.
ROPE: IBM 1401 Programming and Emulation Environment
Anyone can write and run IBM 1401 programs on a PC (Windows, Mac or Linux) thanks to the IBM 1401 ROPE compiler and emulator. In fact, that's how we develop and test our programs too, before punching them on real cards and running them on the real 1401. You'll need:
ROPE 2018-1-30.zip : Note that there is a big, annoying quirk: the File->Open... menu does not work. However, the Browse... button in the editor window does work. Don't ask... The source code for the ROPE project is here: https://github.com/lucaseverini/ROPE, in case you want to improve it.
IBM 1401 Programs.zip : A library of example programs that you can compile and run using ROPE
The ROPE environment looks like this:
The window on the left is the source code editing window. That's where you edit your source code, set the Autocoder compiler options, and finally click Assemble to compile your program using an emulation of the original IBM Autocoder (written by Van Snyder, of FORTRAN standardization committee fame). If this succeeds, it opens the window you see on the right, which is the IBM 1401 emulator window. It shows the compiled code, which you can then run or single-step through. You can even see a live image of the entire 16k memory content. You can also set up files to act as simulated card input data for your program, others to act as simulated card output, or do the same for magnetic tape input and output. Finally, the printer output, which is usually your program's output (unless your program writes a tape file or punches a stack of virtual cards), appears on the green-bar paper in the bottom-left window.
IBM 1401 Programming References
Here are some essential references to help in programming the IBM 1401:
http://bitsavers.org/pdf/ibm/1401/Programming_the_1401_1962.pdf: This is a tutorial on 1401 programming. I recommend starting with this if you want to write code.
http://bitsavers.org/pdf/ibm/1401/A24-1403-1401_Reference_Apr62.pdf: This describes all of the instructions of the 1401. It includes some of the optional instructions not covered in the previous book.
http://bitsavers.org/pdf/ibm/1401/C24-3319-0_Autocoder_on_Tape_Laguage_Specifications_and_Operating_Procedures_Nov64.pdf: This is the reference for Autocoder, the assembly compiler you'll use in ROPE to write and compile your program.
IBM 1401 Coding Techniques TIE-0064.pdf: The juicy stuff is in neither of the two official references above. Programming the 1401 efficiently requires some dirty tricks, like self-modifying code and using the side effects of the SBR instruction to increment addresses. Even simple loops are non-obvious. All these dirty but essential tricks of practical programming are hidden away in application notes. My favorite one is this TIE-0064.
For even more info, dive into http://ibm-1401.info/1401SoftwDevel.html for an exhaustive list of software development documents.
IBM 1401 Architecture Crash Course
In order to program in machine code or assembly, you need to understand the CPU architecture. However, the IBM 1401 architecture will feel completely alien to a modern-day assembly programmer. First of all, the 1401 is not even a binary machine! Instead, it operates on variable-length fields of 6-bit BCD-coded characters, going through each character in turn. Although there is no concept of a (yet to be invented) byte, the memory width is 8 bits, but mostly by accident. There are 6 bits for the BCD character (called 1, 2, 4, 8, A and B, described further below), one special bit called a Word Mark (WM) that delimits the end of a character field and also marks instruction characters, and one parity bit called C, for error checking. There are no accumulator registers. The memory addressing scheme is so contrived that you can't use normal address arithmetic. And some of the instructions that would seem essential are actually optional, and therefore hidden in the depths of the documentation.
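To make the word-mark idea concrete, here is a minimal Python sketch of how a variable-length field is read. It assumes a simplified model (the `Memory` type and the `read_field` name are hypothetical, for illustration only) in which each location holds one character plus a word-mark flag, ignoring BCD and parity bits. As on the 1401, the field is addressed by its rightmost character, and the scan moves toward lower addresses until it reaches the character carrying the word mark.

```python
# Simplified, hypothetical model of 1401 core memory: each location holds
# one character and a word-mark flag. BCD and parity bits are ignored here.
Memory = list[tuple[str, bool]]


def read_field(memory: Memory, address: int) -> str:
    """Read the field whose rightmost (low-order) character is at `address`.

    The scan moves left, one character at a time, and stops after the
    character that carries a word mark.
    """
    chars = []
    while True:
        ch, word_mark = memory[address]
        chars.append(ch)
        if word_mark:
            break
        address -= 1
        if address < 0:
            raise ValueError("ran past address 0 without finding a word mark")
    # Characters were collected right to left, so reverse them.
    return "".join(reversed(chars))


# Location 0 holds "X", locations 1..3 hold the field "ABC" (word mark on "A"),
# and location 4 holds "Y".
mem: Memory = [("X", True), ("A", True), ("B", False), ("C", False), ("Y", True)]
print(read_field(mem, 3))  # ABC
```

Note that there is no length anywhere: the word mark alone decides where the field stops, which is why the 1401 needs special instructions to set and clear it.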
BCD and the punch card encoding mess
The IBM 1401 does not use our familiar binary encoding scheme. Instead, it uses a BCD character coding derived from the character representation on a 12-row punch card, which it manages to collapse into just 6 bits. So our first stop is to look at an IBM-coded punched card. Or more precisely, at the several different versions thereof; herein lies the mess. Let's start with the card generated by the then-popular IBM 026 punch, equipped with the FORTRAN option.
You can tell, just by looking at the card, how the punch coding evolved. The first cards, invented by Herman Hollerith, had only 10 rows, which could easily be used for numbers. The rows were numbered 0 to 9, easy enough. You just punched the appropriate number in the column, as shown in the 10 leftmost columns of the punch card above. Later, alphabetical characters were added. To accommodate them, two new rows of punch positions, called "zone punches", were added at the top of the card. They appear as the first two punched rows at the top of the card, and they are called zone 12 (topmost) and zone 11 (second row). On top of that, the 0 row became dual-use: punched by itself, it still means the number 0, as in its original use. But when combined with a second punch in the same column, it is used as a third zone punch, called zone 0.
Also, notice that the naming order of the punch rows has now become illogical. From top to bottom, it is: 12-11-0-1-2-3-4-5-6-7-8-9. Letters A through I use two punches: one punch in zone 12 and a 1 to 9 number punch. Similarly, J through R use a zone 11 punch plus a 1 to 9 punch. S through Z use a zone 0 punch plus a 2 to 9 punch (the 0-1 combination is the / character). If you don't punch anything, the column represents a blank (a space, as we'd call it now).
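The letter rules above are regular enough to capture in a few lines. This is a small Python sketch (the function name `letter_punches` is made up for illustration) that returns the zone row and digit row punched for an uppercase letter on a 026 card.

```python
def letter_punches(letter: str) -> tuple[int, int]:
    """Return (zone row, digit row) for an uppercase letter A-Z."""
    if len(letter) != 1 or not ("A" <= letter <= "Z"):
        raise ValueError(f"not an uppercase letter: {letter!r}")
    if letter <= "I":
        return 12, ord(letter) - ord("A") + 1  # A = 12-1 ... I = 12-9
    if letter <= "R":
        return 11, ord(letter) - ord("J") + 1  # J = 11-1 ... R = 11-9
    return 0, ord(letter) - ord("S") + 2       # S = 0-2 ... Z = 0-9 (0-1 is "/")


for word in ("HELLO", "WORLD"):
    print(word, [letter_punches(c) for c in word])
```

Only the last group is offset by one, which is exactly the irregularity that trips people up when reading a card by eye.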
But that's not all. They soon needed to add special characters, but they had already run out of dual-punch codes. So, of course, triple-punch codes were added. You can see them at the right of the card. They are the ones that cause trouble, because there are at least three different interpretations of the special punch codes. This particular IBM 026 punch can punch 10 triple-punch special characters. They include ( ) + and ', which are necessary to program in FORTRAN and make a program easier to read in Autocoder. The result is known as the IBM BCD FORTRAN character set, and it is the preferred character set to use when programming in Autocoder. But it's not the only one that was used.
There was another, more popular option of the IBM 026 used with the IBM 1401, called the Commercial character set. It punches the same codes, but the characters printed at the top of the card differ, like this:
It is mostly the same as the FORTRAN set, except for the special characters: it uses the same punch combinations, but some are assigned to different printed characters. Many are weird, IBM-only characters that don't even have a modern ASCII counterpart. This set has neither parentheses nor a + (it puts the & character in place of the +). It is formally known as the IBM BCD Commercial character set.
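To see how small the practical difference is, here is a Python sketch that reprints text punched on a Commercial 026 as it would appear with FORTRAN characters, and vice versa. The table lists the five punch codes that, as far as I know, print differently in the two sets; note that one of them (12) is a single punch and two (3-8 and 4-8) are double punches. The Commercial lozenge character has no exact modern equivalent, so it is approximated here by ¤ (U+00A4). Treat the table and the `reinterpret` helper as illustrative assumptions to check against the 026 documentation.

```python
# Punch codes whose printed character differs between the two IBM 026 sets.
# Keys: Commercial character; values: FORTRAN character for the same punches.
COMMERCIAL_TO_FORTRAN = {
    "&": "+",       # 12
    "#": "=",       # 3-8
    "@": "'",       # 4-8
    "%": "(",       # 0-4-8
    "\u00a4": ")",  # 12-4-8 (Commercial lozenge, approximated by U+00A4)
}
FORTRAN_TO_COMMERCIAL = {f: c for c, f in COMMERCIAL_TO_FORTRAN.items()}


def reinterpret(text: str, table: dict[str, str]) -> str:
    """Reprint `text` in the other character set; other characters are unchanged."""
    return "".join(table.get(ch, ch) for ch in text)


print(reinterpret("A = (B + C)", FORTRAN_TO_COMMERCIAL))  # A # %B & C¤
```

The holes in the card never change; only the printed interpretation does, which is why a FORTRAN statement looks like gibberish on a Commercial printout.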
This dual character-set mess was sort of cleaned up with the next keypunch generation, the IBM 029. It extended the number of 3-punch combinations, which made it possible to include all of the FORTRAN and Commercial characters and to add some much-needed new ones. This coding, which was contemporary with the release of the IBM 360, became known as EBCD (Extended BCD). The big downside is that it is incompatible with the earlier BCD coding, because some of the 3-punch characters had to be shuffled around:
Eventually, more and more multi-punch codes were added, until all 256 combinations were assigned. That led to the infamously illogical EBCDIC (eb-see-dic) encoding, which stands for Extended Binary Coded Decimal Interchange Code. IBM stuck with EBCDIC far too long, well after the communication-derived ASCII code had been standardized. But you can tell why: IBM wanted to maintain compatibility with its punched-card machines, and the telco-derived ASCII scheme made little sense for them.
A and H character chains
The IBM 1401 coped with the dual punch standards (FORTRAN and Commercial) by providing two character sets for the 1403 printer. The printer character sets come in the form of a chain, so the printer could be equipped with either the so-called A-Chain (Commercial set) or the H-Chain (FORTRAN set). These follow the IBM 026 Commercial or FORTRAN representation, respectively. But sure enough, the chains add some more characters of their own! The 1401 also defines some special characters that we'll later find in the 029, but the 1403 can't print them, nor are they punched quite the same way as on the 029. Just to annoy a little.
IBM 1401 internal BCD character coding
The IBM 1401 represents the 12-row punch encoding described above using 6 BCD bits, plus 1 parity bit and 1 Word Mark bit. Thus the IBM 1401 memory has 8-bit words, although they don't represent a binary byte as we are used to. The bits are called CBA8421M. You can see all 8 bits of register B proudly lit in this picture of the 1401 front panel:
C - parity bit
B - zone B bit
A - zone A bit
8 - weight 8 BCD bit
4 - weight 4 BCD bit
2 - weight 2 BCD bit
1 - weight 1 BCD bit
M - Word Mark bit
The conversion from punch card code to the six data bits internal to the 1401 (the B, A, 8, 4, 2 and 1 bits) goes as follows:
Punches 1 to 9 are represented in usual BCD binary by bits 8421. For example, Punch 9 is bit 8 and bit 1 (1001). Punch 6 is bit 4 and bit 2 (0110).
Punch 0 appearing all on its own, representing number 0, is treated weirdly. It is represented internally by bit 8 and bit 2 (1010), and NOT by (0000) as you'd expect.
Blank (no punches at all) is represented by (0000). This is NOT zero! It's the space character.
Zone punches 0 (i.e. punch 0 in conjunction with any other punch), 11 and 12: these zone punches are coded in zone bits A and B. Zone 0 is represented by bit A. Zone 11 is represented by bit B. Zone 12 is represented by bits A and B together. If there are no zone punches, both the A and B bits are off. Note that there is no way to represent two zone punches appearing simultaneously. Fortunately, such a case never happens in the IBM 026 punching scheme.
For triple-punch characters, which include an added 8 as the third punch, bit 8 is simply added to the normal code to represent the extra punch.
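Here is a minimal Python sketch of these conversion rules, assuming a card column is given as a set of punched row numbers (12, 11, and 0 to 9). The function name `punches_to_bcd` is hypothetical; it returns the data bits that are set, written in BA8421 order, and leaves out the C and M bits, which are not derived from the punches.

```python
ZONE_BITS = {12: "BA", 11: "B", 0: "A"}  # zone punch -> zone bits


def punches_to_bcd(punches: set[int]) -> str:
    """Return the set data bits (BA8421 order) for one card column."""
    if not punches:
        return ""  # blank column: no bits at all, the space character
    if punches == {0}:
        return "82"  # a lone 0 punch is the digit zero, coded as 8 + 2
    zones = [row for row in (12, 11, 0) if row in punches]
    if len(zones) > 1:
        raise ValueError("two zone punches cannot be represented")
    digits = sorted(punches - set(zones))
    valid = (
        (not digits and zones)                                           # zone alone
        or (len(digits) == 1 and 1 <= digits[0] <= 9)                    # plain digit
        or (len(digits) == 2 and digits[1] == 8 and 2 <= digits[0] <= 7)  # digit + 8
    )
    if not valid:
        raise ValueError(f"unexpected punches: {sorted(punches)}")
    value = sum(digits)  # 1-9, or digit + 8 for the extra 8 punch
    bits = ZONE_BITS[zones[0]] if zones else ""
    for weight in (8, 4, 2, 1):
        if value & weight:
            bits += str(weight)
    return bits


for name, column in [("A", {12, 1}), ("S", {0, 2}), ("9", {9}), (".", {12, 3, 8})]:
    print(name, punches_to_bcd(column))
```

The two special cases come first because they are the exceptions the rules call out: a blank has no bits at all, and a lone 0 punch becomes 8-2 rather than an all-zero digit.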
That's it for representing the punched characters in 6 bits of memory. But there are two additional bits:
The all-important bit M (Word Mark) will be explained in more detail later. It is a flag bit that indicates that the character in memory is either the first character of an instruction or the end of a data field. There are special instructions used to manage this bit.
Bit C (parity) is calculated by the machine itself. It is set or cleared so that the parity of the whole 8-bit memory word (including the Word Mark and C bits) is odd. A valid 8-bit character in memory should always have an odd number of bits set. The machine continuously checks the parity of the characters it reads from its memory and from any of its registers. If it detects even parity, something went wrong due to a hardware fault, and the machine stops immediately. In fact, in the picture above, taken during a hardware debug session, register B has all 8 bits set, which is even parity. We have a big bad parity error! The C bit should be off. You can't see it in the picture, but the machine has already detected the problem: there is a bright red error light on top of the panel, and the machine has stopped.
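The parity rule is easy to apply by hand, but here is a short Python sketch anyway. It assumes a character is described by the names of its set bits (for example "BA821" for the data bits, or "CBA8421M" for a full 8-bit word); both function names are hypothetical.

```python
DATA_BITS = set("BA8421")


def needs_c_bit(data_bits: str, word_mark: bool) -> bool:
    """Return True if bit C must be set so the 8-bit character has odd parity.

    `data_bits` names the data bits that are set, e.g. "BA821".
    """
    if not set(data_bits) <= DATA_BITS:
        raise ValueError(f"unknown data bits in {data_bits!r}")
    ones = len(set(data_bits)) + int(word_mark)
    return ones % 2 == 0  # an even count so far needs C to make it odd


def parity_ok(all_bits: str) -> bool:
    """Check a full 8-bit character given by its set bits, e.g. "CBA8421M"."""
    return len(set(all_bits)) % 2 == 1


print(needs_c_bit("", False))  # True: a blank is stored as the C bit alone
print(parity_ok("CBA8421M"))   # False: all 8 bits lit, as in the register B picture
```

A consequence worth noticing: setting or clearing a word mark on a character also flips its C bit.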
Now you should be able to understand the "BCD code" column in the printer chain table shown previously. It is simply the memory bit combination that represents each character, using the CBA8421 bit naming and the punch-card-to-memory conversion outlined above.
Memory Addressing
Memory addressing is very odd and deserves special mention. Fortunately, you rarely have to deal with it directly when writing in Autocoder, which takes memory addresses in decimal form (from 0 to 15,999) and translates them into the 3-character form used internally by the machine instructions.
The complication arises because the 16,000 memory addresses are constrained to a field of only 3 BCD characters. For the first 1,000 locations, the addresses are straightforward: the 3-character string 000 is the first location and 999 is the last one. The original designers had just 1,000 locations of memory in mind, to keep costs low; core memory was one of the primary cost drivers.
However, by the time the 1401 shipped, its memory had increased to a more practical 4,000 locations. By then, the address field in the instructions had already been fixed at 3 characters by the hardware. The designers dealt with it by using the extra A and B zone bits, which correspond to zone punches on the cards. The downside is that the resulting characters no longer bear a direct relationship to the memory address they represent.
This is how it works: memory addresses 1000 to 1999 are encoded starting from the 000 to 999 code, as if you were addressing the first 1,000 locations, but with zone bit A added to the hundreds digit (equivalent to an added zone punch 0 on the card). For 2000 to 2999, you add zone bit B instead (zone punch 11 on the card), and for 3000 to 3999 you add bits A and B (zone punch 12 on the card). You end up with the following address coding, with one alphanumeric character in the hundreds position and two decimal characters for the tens and units.
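Autocoder does this translation for you, but a sketch makes the scheme easier to see. This Python version (function names hypothetical) returns each of the three address characters as a pair of (zone bits, decimal digit) rather than as a printed character, which sidesteps the character-set question. It covers the hundreds-digit zones described above, and also assumes a rule not described in this article: for addresses of 4,000 and up, zone bits over the units digit add 4,000 (A), 8,000 (B) or 12,000 (A and B), which is how all 16,000 locations fit in 3 characters. Check that part against the reference manual before relying on it.

```python
ZONES = ("", "A", "B", "BA")  # zone bits for the multipliers 0, 1, 2, 3


def encode_address(addr: int) -> list[tuple[str, int]]:
    """Encode 0..15999 as three (zone bits, digit) characters: hundreds, tens, units."""
    if not 0 <= addr <= 15999:
        raise ValueError("1401 addresses run from 0 to 15,999")
    block, low = divmod(addr, 4000)      # 4,000s: carried by the units zone bits
    thousands, rest = divmod(low, 1000)  # 1,000s: carried by the hundreds zone bits
    hundreds, tens, units = rest // 100, rest // 10 % 10, rest % 10
    return [(ZONES[thousands], hundreds), ("", tens), (ZONES[block], units)]


def decode_address(chars: list[tuple[str, int]]) -> int:
    """Inverse of encode_address."""
    (hundreds_zone, hundreds), (_, tens), (units_zone, units) = chars
    return (ZONES.index(units_zone) * 4000 + ZONES.index(hundreds_zone) * 1000
            + hundreds * 100 + tens * 10 + units)


print(encode_address(1234))  # [('A', 2), ('', 3), ('', 4)]
print(encode_address(3999))  # [('BA', 9), ('', 9), ('', 9)]
```

This is also why ordinary address arithmetic doesn't work: adding 1 to address 999 has to change the zone bits of the hundreds character, not carry into a fourth digit.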
(Oh my, we haven't even written a single line of "Hello World". The rest of the tutorial is coming soon.)
And I can probably stop here, because Raul Rolfssen has made an incredible simulation environment and videos that explain how to program the 1401:
Start here for your first lesson: