Status of the 45GS02 instruction set

fredrikr

Fabulous project! I hope to own a Mega 65 soon enough. I would also like to get started developing programs for it. I've reached out to Marco Baye who develops the excellent ACME cross-assembler, and he'd like to add support for the 45GS02 instruction set, when there is final documentation on it. So...

What is the status of the instruction set for the CPU - do you expect it to change further?

Is there a (somewhat) complete specification of it somewhere?

What are the mnemonics for the new instructions?



#1. LGB

Basically (but see later) it's the same as the C65 had, which is in turn basically the 65CE02 instruction set, with two notable differences. The first is the "EOM" opcode, which is in fact just "NOP": the very same opcode, it just has an additional meaning as well. The other is the 65CE02's reserved opcode "AUG", which was intended for future opcode extensions. On the C65 it is redefined as the single-byte opcode "MAP".

So, if you search the net for information on the 65CE02 opcodes (you can find plenty; in fact, I wrote my own Commodore 65 and later Mega 65 emulators (Xemu) based on sources like those, at least initially), that's it, apart from the differences mentioned above.

HOWEVER: as you can see, all the opcodes are already "used", so there are hardly any free opcode bytes left to implement new things in the GS02. So in addition to the above (that is, in addition to the Commodore 65 set), new opcodes are defined with a trick: using an "unusual sequence" of otherwise well-defined opcodes, which hopefully does not occur by "accident", only when the programmer really wants to use the new things. One of them is ZP-based 32-bit linear addressing. As far as I remember, every opcode with the "(ZP),Z" addressing mode can be "prefixed" with a NOP. Together they then mean something completely different: 4 bytes at the ZP (zero page) location form a 32-bit address pointer, register Z is added to it, and the operation is done on the byte at that address. Besides that, there is another trick, combining all of the A, X, Y and Z registers into one 32-bit register for an operation. Sorry, I can't remember now how it's done in Mega65; Xemu does not even emulate that yet, and I am not sure if it's even used anywhere [yet ...].
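To make the NOP-prefixed addressing mode concrete, here is a minimal Python sketch of the address calculation described above. The function name is hypothetical, and two details are assumptions not stated in the thread: that the four pointer bytes are stored least-significant byte first (as with other 65xx pointers), and that the sum wraps at 32 bits.

```python
def far_effective_address(memory, zp_addr, z):
    """Address touched by a NOP-prefixed XXX ($nn),Z access (sketch)."""
    # Assumption: the 4-byte pointer is little-endian, like 16-bit 65xx pointers.
    pointer = int.from_bytes(bytes(memory[zp_addr:zp_addr + 4]), "little")
    # Assumption: adding Z wraps around at 32 bits.
    return (pointer + z) & 0xFFFFFFFF


# Example: a pointer stored at $10-$13, indexed with Z = 5.
memory = bytearray(256)
memory[0x10:0x14] = bytes([0x00, 0x00, 0xF8, 0x0F])
print(f"${far_effective_address(memory, 0x10, 0x05):08X}")
```

The plain ($nn),Z mode would read only the first two of those bytes, which is the whole difference between the two modes at this level.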

For assemblers: the Mega65 project currently uses (afaik) Ophis, which by now does what it should, as Mega65's kickstart ROM is written in assembly. The CC65 package's CA65 also supports the "4510". Also, tass64 supports 65ce02 as far as I remember; its author asked me about this stuff, or something like that, I can't remember too well now.

So there are already assemblers that know "something", but maybe not (?) the full 'GS02 instruction set. I mean, I am not sure about the "prefixed" and special ones, and I am not even sure there is a proposed syntax for e.g. 32-bit addressing ... It's an interesting question, since the syntax should be clear and not be confused with 65816 syntaxes like [....] for "banked addresses" (i.e. 24-bit addressing, whereas this is 32-bit and also works quite differently at the opcode encoding level, etc.).

#2. gardners


Perhaps poke Marco to contact me directly.

As mentioned, apart from the C65 additional instructions, all we have added are:

NOP + XXX ($nn),Z -> 32 bit flat address space access
NEG + NEG + XXX -> Use A,X,Y,Z as virtual 32-bit register

Both can be used together.
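A minimal Python sketch of how an assembler back end might apply these two prefixes to an already-encoded instruction. The helper name is hypothetical. The opcode values ($EA for NOP, $42 for NEG, $B2 for LDA ($nn),Z) come from the 65CE02 opcode table rather than from this thread, and the order of the prefixes when both are combined (NEG NEG first, then NOP) is an assumption.

```python
NOP = 0xEA  # NOP (EOM on the C65)
NEG = 0x42  # NEG A on the 65CE02


def add_prefixes(instruction, far_pointer=False, quad=False):
    """Prepend the 45GS02 prefix bytes to an encoded base instruction."""
    prefix = bytearray()
    if quad:
        prefix += bytes([NEG, NEG])   # A,X,Y,Z act as one 32-bit register
    if far_pointer:
        prefix.append(NOP)            # ($nn),Z uses a 32-bit pointer
    # Assumption: when combined, NEG NEG comes before NOP.
    return bytes(prefix) + bytes(instruction)


LDA_IND_Z = 0xB2  # LDA ($nn),Z
print(add_prefixes([LDA_IND_Z, 0x10], far_pointer=True).hex(" "))
print(add_prefixes([LDA_IND_Z, 0x10], far_pointer=True, quad=True).hex(" "))
```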

I suggest coming up with an alternative syntax for the 32-bit pointer addressing mode. I'm open to ideas on that. Maybe (($nn)),Z or ($nn)32,Z or something like that.

Then the 32-bit virtual register opcodes should probably get different opcode names. To avoid confusion, this should not just be sticking 32 on the end, but maybe AXYZ where it makes sense, e.g:

LDAXYZ $1234 ; Load 32-bit value from $1234-$1237
ADCAXYZ #$93 ; Add constant $00000093 to 32-bit value in AXYZ
STAXYZ $1234 ; Put result back in $1234-$1237

Note there is no 32-bit constant loading option, since you can just do that with LDA, LDX, LDY, LDZ. But a convenience pseudo instruction that is a macro that expands to those could be created, e.g.:

LDAXYZ #$12345678
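As a rough illustration, the pseudo instruction could expand as in the Python sketch below. The function name is hypothetical, and the register order (A holds the least significant byte, Z the most significant) is an assumption that the thread does not spell out.

```python
def expand_load_immediate32(value):
    """Expand a 32-bit immediate load into four 8-bit immediate loads."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("value does not fit in 32 bits")
    lines = []
    # Assumption: A = bits 0-7, X = bits 8-15, Y = bits 16-23, Z = bits 24-31.
    for shift, mnemonic in zip((0, 8, 16, 24), ("LDA", "LDX", "LDY", "LDZ")):
        lines.append(f"{mnemonic} #${(value >> shift) & 0xFF:02X}")
    return lines


for line in expand_load_immediate32(0x12345678):
    print(line)
```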

Anyway, happy to discuss with anyone who wants to help standardise this, and add support to an assembler. We would then move to convert all the assembly code in the MEGA65 to using this new assembler.

Bonus points for adding support to CA65, so that we can use them in mixed C/assembler work where we use CC65 as the C compiler at the moment.


#3. rosettif

Why is there no 32-bit constant loading option? It would be better to implement that as well, for at least two reasons:

1.) Smaller and faster (7 vs 8 bytes, and also fewer steps to execute).

2.) The complete 32-bit value would sit contiguously in code memory as a whole (instead of as four separate 8-bit fragments), which is much better for self-modifying code or any other reference to that area from outside.
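Both points can be checked with a small Python sketch. The immediate-load opcodes ($A9, $A2, $A0, $A3) are taken from the 65CE02 opcode table. The single-instruction encoding (NEG NEG, then the LDA # opcode, then four operand bytes) is purely hypothetical: no such instruction is defined in this thread, it is only the shape that gives the 7 bytes mentioned above.

```python
LDA_IMM, LDX_IMM, LDY_IMM, LDZ_IMM = 0xA9, 0xA2, 0xA0, 0xA3
NEG = 0x42


def separate_loads(value):
    """Four 8-bit immediate loads: LDA, LDX, LDY, LDZ."""
    b = value.to_bytes(4, "little")
    return bytes([LDA_IMM, b[0], LDX_IMM, b[1], LDY_IMM, b[2], LDZ_IMM, b[3]])


def hypothetical_single_load(value):
    """Hypothetical dedicated form: prefix + opcode + 32-bit operand."""
    return bytes([NEG, NEG, LDA_IMM]) + value.to_bytes(4, "little")


value = 0x12345678
operand = value.to_bytes(4, "little")
print(len(separate_loads(value)), len(hypothetical_single_load(value)))
# Is the whole constant stored contiguously in the code bytes?
print(operand in separate_loads(value), operand in hypothetical_single_load(value))
```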

:: @rosettif added on 11 Mar ’19 · 09:08

The INC A and DEC A instructions might also be extended to 32-bit ones when using the NEG + NEG prefix to become INC AXYZ and DEC AXYZ as well.

#4. LGB

Honestly, I don't like this AXYZ stuff too much, I mean at the assembly syntax level. I was always a fan of the "three letters only" rule used by every 65xx CPU assembly syntax. Yes, I know, it's more a personal opinion and feeling than actual "science", that's true :)

I would choose a new name for the "32 bit register", let's say "Q" (just a proposal; Q would mean Quad-byte, basically the "concatenated" registers AXYZ ...), so LDAXYZ would become LDQ ... That looks much cleaner to me, but this is only my opinion. Also, just as the "CMP" for the X register is "CPX", I would name "ADCAXYZ" something like "ADQ" instead, so the third letter shows the difference. If there is an "INC AXYZ" at all, I would call it "INC Q" (like INC A) or "INQ" (like INX, where the register name is part of the asm token itself).

The ((ZP)),Z syntax seems OK, but ((ZP))32,Z is kinda strange to look at :) If a marker is needed, I think it's still better this way: 32:(ZP),Z, or some other syntax like @(ZP),Z or whatever, where the "@" or "32:" shows that the referenced memory (in ZP here) holds a 32-bit pointer, so it's special compared to the usual 16-bit address.

As for the 32-bit constant load ... I see one extra point here: without a dedicated "single" opcode, there is a chance that an IRQ routine runs in the middle of the sequence. With a "pseudo" op (at assembler level ...) you might expect an "atomic" operation, as it looks like a single "opcode", so there is some danger here, though it is minor and I am not sure it's even worth mentioning.

:: @LGB added on 11 Mar ’19 · 14:57

In fact, INQ sounds quite good, as the 65CE02 already knows INW, though in that case it's not a register but a read-modify-write style op on a 16-bit value. But that does not matter too much, as they can be thought of as different addressing modes, and after all, different addressing modes have different opcodes anyway :)

Other proposed syntaxes for the 32-bit ZP stuff: L(nn),Z or !(nn),Z, in case '@' is already used for something else.

In general, what I like about the asm syntax of any 65xx CPU is that it's quite simple, always three-letter names for ops, and easy to parse, even for a "hand written" (in 65xx asm!) monitor program or such (both for asm and disasm). Sure, the asm syntax is only a syntax and says little about the implementation details, so it's more of a cosmetic topic. But it can be important for future assemblers that want to support the M65 stuff, and especially when someone wants to write native tools running on the M65 itself, where the syntax should be simple and compact enough (it does not matter too much for a cross-assembler running on a PC or such ...).

#5. gardners


I am quite happy to have shorter names, as you have suggested, I just couldn't think of any at the time. Q as the Quad-byte register seems reasonable to me.

For the 32-bit pointer syntax, how about we just use {$nn},Z instead of ($nn),Z, or does it look too similar? It needs to be immediately obvious, without being harder to type. In that regard <$nn> would be more obvious, but < and > get used to pick halves of word values. Or maybe just (q$nn),Z? Or how about we just make it like C: LDA $nn[Z] or ($nn)[Z]? That way it at least has some logic. But again, I am just thinking out loud, and invite suggestions.


#6. LGB

Btw, I chose Q since it maps logically to quad-byte, and it is quite close to the letters X, Y, Z anyway. W would not make sense, as it means 'Word' and is already used by the 65CE02 for INW. 'D' for 'double' (word) or 'L' for 'long' would be logical, but somehow LDL and LDD look strange, while LDQ is kinda easier to recognize :)

< and > are problematic because they are used as lo/hi byte operators in many assemblers, I think. {...} may look nice on a PC screen, but it's dangerous at lower resolutions and easy to confuse with (...), as you also suggested. (q$nn),Z has another problem: the $nn can actually be a label/symbol, so with MYLOC = $AA, (qMYLOC),Z does not make much sense, as there is no way to separate the 'q' from the label. OK, with a space, maybe ... Btw, I would try to keep the similarity with ($nn),Z, since it's basically the same addressing mode, just with the pointer at the ZP location extended from 16 to 32 bits. So I like ideas such as:

LDA (($nn)),Z
LDA !($nn),Z
LDA [[$nn]],Z

or anything which is similar to the original addressing mode, just with a bit of "extra" added to signal the difference between a 16-bit and a 32-bit pointer value on the ZP. Btw, I would try to avoid putting anything between the (...), as that is basically the location, which is in ZP even with this addressing mode. Logically, the difference is how that ZP location is interpreted: as a 16-bit value stored from that ZP address on, or a 32-bit one, which is then referenced after Z is added to the value fetched from ZP. Well, or such :) So, technically, with assemblers that support it, this is valid even now with a plain 6502:

LDA (Z:LOC),Y

As in ca65, "Z:" just signals that LOC must be interpreted as a zero page address, i.e. just one byte. Of course it makes no sense to use it here, as this addressing mode only supports zero page anyway ... but you can use it even here if you want. I just want to show with this that what is inside the (...) is about the address of the pointer itself, not about the interpretation of that data (i.e. whether it is a 16-bit address or a 32-bit one for M65 ...). In this syntax the ( and ) signs mean that the ZP location (which is still 8 bits, even with the M65 32-bit stuff) must be read as a 16-bit (6502) or 32-bit address, to which Z is then added to form the address of the operand. So I think that, to stay logical, only what is outside the (...) should be modified, or the (...) itself should be replaced with other signs, or whatever. I would also keep the structure of (...),Y and (...),Z intact, since it's basically the same function, just with a 32-bit pointer instead of a 16-bit one. Etc, etc, sorry, I write too much again, I guess. As usual, I struggle to express myself well enough and in a compact form when it's English and not Hungarian :)

But that's only my logic, for sure ;)
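The label problem with (q$nn),Z raised in this post can be shown with a small Python sketch. The tokenizer below is hypothetical and much simpler than a real assembler's lexer; it only assumes the usual rule that an identifier is a letter followed by letters, digits or underscores.

```python
import re


def inner_tokens(operand):
    """Tokenize whatever sits between the innermost parentheses."""
    inner = re.search(r"\(([^()]*)\)", operand).group(1)
    # Hex literals such as $AA, or identifiers such as MYLOC.
    return re.findall(r"\$[0-9A-Fa-f]+|[A-Za-z_][A-Za-z0-9_]*", inner)


print(inner_tokens("(q$AA),Z"))     # the 'q' separates cleanly from '$AA'
print(inner_tokens("(qMYLOC),Z"))   # one identifier, 'q' is swallowed
print(inner_tokens("!(MYLOC),Z"))   # marker outside: the label survives intact
```

With a literal operand the 'q' marker can still be split off, but with a symbol it merges into the name, which is why the markers placed outside the parentheses are easier to parse.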

#7. gardners

Your input is always helpful :) Don't worry about how long it ends up being.

I am currently thinking ($nn)+Z might make the most sense. It is no longer, and no harder to type, but + and , are quite different to one another. Can you think of any problems with this?


#8. fredrikr

Marco: Poked.

I like the Q idea above.

For 32-bit indirect addressing: I think it would make sense to keep using a comma, since the new instructions offset the address with the Z register, which is exactly what ",Z" means. In the same way, the parentheses have the same meaning as usual. This is more of a modifier, but an important one. For this reason, I would prefer to just add a character that is easy to spot when reading through the source code or a disassembled program.

Three-letter mnemonics are indeed nice. But figuring out new three-letter mnemonics for all the instructions that can use this addressing mode would be messy and would make the assembler harder to learn and use. Adding an F at the end of the existing mnemonics (e.g. LDAF for Load Accumulator Far) would be preferable, even if it meant getting some four-letter mnemonics.

I would prefer:

LDA *($nn),Z

LDA ($nn),Z*

LDAF ($nn),Z

(in that order)

Of course, * is commonly used in assemblers for PC and for multiplication, but it can't be confused for any of those in this position.

#9. LGB

Actually, 65xx asm syntax has always written index registers with a comma, so Paul's idea of ($nn)+Z somehow does not feel right to me, sorry about that ... But I can live with it, I won't kill myself if that is all I see :)

I would go with something like what fredrikr suggested, though not "LDAF", that's not to my taste, given my always-three-letters fetish ;-P

Some assemblers have already introduced modifiers like Z: to denote zero page, as in LDA (Z:$nn),Y, which (I had this example before ...) helps to signal that zero page addressing is used. I would extend this idea. However, the opcode still holds only a 1-byte ZP address, so the modifier should go not inside the brackets, but outside! That is, for example:

LDA Q:($nn),Z

In this example "Q:" means that the "pointer reference" made through zero page addressing refers to a Q (again, Quad-byte) "entity", that is, a 32-bit one. This would even work:

LDQ Q:(Z:$nn),Z

OK, maybe it's just too cryptic :) The "Z:" part is only there for fun as an example, though it is at least understood by CA65; I don't know about the others. The clearer and less "artificial" example is:

LDQ Q:($nn),Z

This combines "LDQ" (load Quad-byte) with the 32-bit addressing mode, where a Quad-byte is fetched as a pointer from the zero page starting at $nn. Basically it's the same as, for example:

LDA *($nn),Z

just with the "*" turned into "Q:" to make the meaning clearer. That even harmonizes well with the other use of "Q": within the opcode (like LDQ) it says the data is 32 bits, and within the addressing part the "Q" in "Q:" says the address is 32 bits (32 bits, again = Quad-byte, thus Q).
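A minimal Python sketch of how an assembler might recognise the proposed "Q:" modifier. This only illustrates the syntax suggested in this post; it is not the behaviour of any existing assembler, and the function name and the returned dictionary are hypothetical.

```python
import re

# Optional "Q:" outside the parentheses, optional "Z:" inside them,
# then a hex byte or a label, followed by ",Z".
OPERAND = re.compile(
    r"^\s*(?P<quad>Q:)?\(\s*(?:Z:)?"
    r"(?P<zp>\$[0-9A-Fa-f]{1,2}|[A-Za-z_]\w*)\s*\)\s*,\s*Z\s*$"
)


def parse_indirect_z(operand):
    """Return the ZP operand and pointer width, or None if it does not match."""
    match = OPERAND.match(operand)
    if match is None:
        return None
    return {
        "zp": match.group("zp"),
        "pointer_bits": 32 if match.group("quad") else 16,
    }


print(parse_indirect_z("($12),Z"))
print(parse_indirect_z("Q:($12),Z"))
print(parse_indirect_z("Q:(Z:MYLOC),Z"))
```

Because the modifier sits outside the parentheses, the expression inside them is parsed exactly as before, which is the property argued for above.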

#10. gpz

One idea that came up in a chat with Marco... why not kill the useless decimal mode when the CPU is running in "native" mode, and use the two freed opcodes to indicate the extra instructions? Using NOPs certainly looks like asking for trouble, it's a trap built in by design =P (and you could even go one step further and eliminate the very little used indirect X indexed addressing mode as well, and free some more opcodes that way)

#11. gardners

Hi Groepaz,

Yes, I thought about doing exactly that. The reason I went for NOP + XXX ($nn),Z is that the ,Z modes aren't available on 6502s, so there is no software that could accidentally use it, and it doesn't stop the decimal mode from being used, if some deranged person decides that they want to use it.

Similarly, for the quad-byte operations the prefix is NEG + NEG, which also doesn't exist on the 6502 and has no combined effect, except for refreshing the N and Z flags based on the value of the accumulator.
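That NEG + NEG has no net effect on the accumulator is easy to verify with a small Python model. This is a sketch of the 8-bit two's complement negation only, assuming NEG updates just the N and Z flags.

```python
def neg(a):
    """Model of NEG A: two's complement negate an 8-bit value."""
    result = (-a) & 0xFF
    flags = {"N": bool(result & 0x80), "Z": result == 0}
    return result, flags


# Applying NEG twice restores every possible accumulator value,
# leaving N and Z reflecting that restored value.
for a in range(256):
    once, _ = neg(a)
    twice, flags = neg(once)
    assert twice == a
    assert flags == {"N": a >= 0x80, "Z": a == 0}
print("NEG + NEG leaves A unchanged for all 256 values")
```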


#12. fredrikr

But there could be old programs which accidentally use NOP + XXX ($nn),Z if they use illegal opcodes?

#13. rosettif

@fredrikr: There are no illegal opcodes in the 4510 instruction set. And there is no such old C65 software, either. If I understood correctly, for C64 mode the good old NMOS 6510 behaviour is planned as the default (including all the original illegal opcodes etc., while excluding these extensions), and the entire new 4510 (65CE02) set can only be reached from there if you manually switch over with a special hidden bit. (As opposed to native mode, where it is the other way round.) So the two generally never collide this way.

#14. LGB

Yes, indeed, as rosettif described ... The C65 didn't know about illegal opcodes, so in this sense programs trying to use the illegal opcodes of the NMOS 6502 and friends would crash anyway, or at least do very "interesting" things, so it makes little difference what causes the crash :). HOWEVER, as the Mega65 wants to offer more C64 compatibility in C64 mode, the new C65+M65 opcodes will be "hidden" in that mode, and you can enjoy the "NMOS instruction set" including illegal opcodes. So no accident can ever happen. In C65/M65 mode there are virtually no "old" programs at all, so it's not a problem there ... The only "accident" I can imagine is a newly written M65 program using NOP + XXX (nn),Z where the programmer did not intend to use the new instruction ... I don't think it will be a common problem, but anyway, future assemblers supporting the M65 could emit a warning when the generated code contains the sequence without it being the result of a new opcode's token, just as some existing 65xx assemblers warn the user when they emit a JMP indirect on a page boundary, which is known to be buggy on NMOS 65xx CPUs.
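A rough Python sketch of such an assembler warning. It assumes the assembler keeps its output as one byte sequence per source statement, so that a deliberate 32-bit access arrives as a single statement (NOP prefix included) while an accidental one shows up as a lone NOP followed by a separate ($nn),Z instruction. The function name is hypothetical, and the opcode set is taken from the 65CE02 opcode table, not from this thread.

```python
NOP = 0xEA
# 65CE02 opcodes using the ($nn),Z mode: ORA, AND, EOR, ADC, STA, LDA, CMP, SBC.
INDIRECT_Z_OPCODES = {0x12, 0x32, 0x52, 0x72, 0x92, 0xB2, 0xD2, 0xF2}


def find_accidental_far_accesses(statements):
    """Return indexes of lone NOPs that would act as a 32-bit prefix."""
    warnings = []
    for index in range(len(statements) - 1):
        current = bytes(statements[index])
        following = bytes(statements[index + 1])
        if current == bytes([NOP]) and following[:1] and following[0] in INDIRECT_Z_OPCODES:
            warnings.append(index)
    return warnings


program = [
    [0xEA, 0xB2, 0x10],   # deliberate far load, emitted as one statement
    [0xEA],               # plain NOP written by the programmer ...
    [0x92, 0x20],         # ... directly followed by STA ($20),Z
    [0xEA],               # NOP followed by RTS: harmless
    [0x60],
]
print(find_accidental_far_accesses(program))
```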

#15. gardners

@rosettif: More or less correct. There are indeed no known programs that do NOP + XXX ($nn),Z, since there are only a dozen or so C65 programs anyway. Also, the M65-specific extensions can be turned off, if you really want to be paranoid.

There will be a C64-only mode of operation, where the C65 side of things is as hidden as possible, and the CPU will indeed default to NMOS operation. It might even be possible to do this automatically in C64 mode with a C65 ROM -- the only catch is that the trap to the C65 DOS in the ROM requires the 4510 instructions. I have been thinking about having the CPU in NMOS mode in C64 mode only when running instructions from RAM, instead of from ROM, as this should solve that problem completely.

Note also that we plan to have an integrated 1541 that uses the SD card, so that true C64 mode without the C65 DOS can still do useful things. This will also make it much easier to get C64 track-loading games and demos to work. I'll hopefully be doing an initial blog post about the progress so far on that in the near future.



