rosettif

Could anyone try it on a C65 or MEGA65, please?

:: @rosettif added on 16 Aug ’17 · 17:14

And here is the direct download link:

:: @rosettif added on 10 Sep ’17 · 05:09



#51. LGB

Hmm, we should probably move this to the Xemu-related topic ... If your question is very specific (don't get me wrong, I am just not sure that everybody is interested in long posts between us about some very deep problem with a very specific issue), feel free to write to my mail address (it's in the file), even in Hungarian, for sure :)

Unhandled memory issue:

Xemu does not crash here; it exits deliberately in this situation. It was an important requirement, especially for M65 development (like the kickstart), that this should NEVER happen, and it is useless to even continue the emulation if it does. So that's why. Anyway, there has been a command line switch since then, -skipunhandledmem, which deactivates this behaviour. Or simply give the -h switch to the Xemu/M65 emulator to list the available options. Now, you are right about one thing here: maybe it's time to reverse the behaviour soon :) So by default the emulation would just continue, and a special switch would be needed to get the current default. I can understand your problem here, especially if your software scans the full address space on purpose, so it shouldn't be treated as a "fatal error" then.

Honestly, what you do here, i.e. scanning the full address space, is really interesting. As far as I can see, the full 32 bits are useless, since the address is truncated to its lower 28 bits. At least this is what I do in Xemu. But your attempt can reveal some problems in Xemu, which is another benefit :) Note that the exit you get when -skipunhandledmem is not used is not actually an Xemu bug, as it's intentional.

One thing here though: be careful with scanning the full address space, especially with write access. Several interesting things can happen; the ones I can think of now:

* you can trigger a hypervisor trap in the I/O area
* on M65, the I/O area is mapped into the high memory area for all VIC I/O modes, so you can write to I/O "by accident" even if the "classic" I/O area at $D000 ("C64-style") is not accessed at all
* on M65, the C65 ROM area is actually not ROM but RAM. I am not very sure now (ehhh ...) how I implemented this in Xemu; on a real M65 there "should be" a bit which sets whether you can write "the ROM" (as it is used as RAM) or not. I really can't recall offhand whether I implemented this in Xemu at all, btw. Also, you may not know the state of this feature unless you set it yourself; that's the second point.

Home vs clear screen:

Interesting; well, your idea about the DMA problem could be an explanation. Which ROM image do you use? I know of two different revisions of the DMA chip used in C65 prototypes, and they are incompatible (well, the C65 was never finished, so no wonder there are multiple, even incompatible, changes at the hardware level ... unfortunately, there is no single standard which can be named "the C65"). Newer ROM images need the newer DMA revision. You can also spot a mismatch between the DMA chip hardware revision and the ROM image if you have problems with scrolling the screen, as far as I remember, since that is also done with DMA. Just move the cursor downwards and see what happens, without writing any program :)

Btw, if you plan to test the DMA on your own too, you should be aware of these multiple, incompatible DMA revisions as well.

As far as I know (this is not done by Xemu yet!!!), the M65 already lets you set the needed DMA hardware revision at run time. The M65 also implements some other DMA features, like a "step" for its counters, intended for texture scaling for example. This is not emulated by Xemu yet either.

What Xemu does have, for both the C65 and M65 emulators, is the -dmarev command line switch. But keep in mind that the newer revision (1) is not emulated too well: there is only a "workaround" for the DMA list being one byte longer, but the actual difference is more than just this, especially if you use anything more complex than a simple memory copy/fill.

I can't comment on your point #3 too much now :-O :)

#52. rosettif

32-bit memory scan:

I tried to make it as careful and elegant as possible (I think...).

I overwrite only two bytes in each 256K "block" to mark it with a sequence number: that is only 32768 bytes in total (out of the whole 4 GB). I save the original values in main memory and restore them everywhere after the test (this is why I call it "non-destructive"). I take every step in decreasing order (starting from the highest address and going towards zero), so if there is any kind of "mirroring" present (i.e. the topmost bits are not in use, as you said the address is currently truncated to 28 bits), then the values written later simply overwrite the earlier ones, and when I read them back afterwards, I always get the real amount of RAM.

The lower word is fixed at $4000; only the higher word (aka the "bank number") counts down by 4 (and so always jumps a round 4 x 64 = 256K). I assume these addresses are harmless, and they are always restored anyway.

They should not conflict with anything you mentioned here (as they do not touch the ROM area at all... and so on).

I don't worry about it. However, I also wanted to make it future-proof: if at any time in the (near? far?) future a larger FPGA board (with more than 256 MB) and/or some kind of virtual memory is used, then it can be detected and displayed instantly, too. (The algorithm can easily be found at the very end of the source code, by the way.)
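The descending marker scan described above can be modelled in a few lines of Python. This is only a sketch under stated assumptions, not rosettif's actual routine: `ToyMemory`, `probe_ram_bytes`, the "$FF on unpopulated reads" behaviour and the ascending restore order are all hypothetical choices made for the illustration.

```python
BLOCK_SIZE = 256 * 1024                  # one marker pair per 256K block
BLOCK_COUNT = (1 << 32) // BLOCK_SIZE    # 16384 blocks in a 32-bit space
MARKER_OFFSET = 0x4000                   # fixed low word, as in the post


class ToyMemory:
    """Hypothetical model: only `decoded_bits` address bits are decoded."""

    def __init__(self, decoded_bits, ram_bytes):
        self.mask = (1 << decoded_bits) - 1
        self.ram_bytes = ram_bytes
        self.cells = {}

    def read(self, addr):
        phys = addr & self.mask
        if phys >= self.ram_bytes:
            return 0xFF                  # assumption: unpopulated space reads $FF
        return self.cells.get(phys, 0)

    def write(self, addr, value):
        phys = addr & self.mask
        if phys < self.ram_bytes:        # writes to unpopulated space are lost
            self.cells[phys] = value & 0xFF


def probe_ram_bytes(mem):
    saved = {}
    # Pass 1: walk downwards, save two bytes, then store the block number.
    for block in range(BLOCK_COUNT - 1, -1, -1):
        addr = block * BLOCK_SIZE + MARKER_OFFSET
        saved[block] = (mem.read(addr), mem.read(addr + 1))
        mem.write(addr, block & 0xFF)
        mem.write(addr + 1, block >> 8)
    # Pass 2: a block counts as real RAM only if it still holds its own number.
    found = 0
    for block in range(BLOCK_COUNT):
        addr = block * BLOCK_SIZE + MARKER_OFFSET
        if mem.read(addr) | (mem.read(addr + 1) << 8) == block:
            found += 1
    # Pass 3: restore upwards, so the highest mirror (which saved the true
    # original first) writes last.
    for block in range(BLOCK_COUNT):
        addr = block * BLOCK_SIZE + MARKER_OFFSET
        mem.write(addr, saved[block][0])
        mem.write(addr + 1, saved[block][1])
    return found * BLOCK_SIZE
```

With 28 decoded bits, a mirror at a higher address is written first and then overwritten by the lower alias, so only the lowest copy of each physical block still matches its own number when read back.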


I am aware of the two revisions of the DMAgic and I support both (by the way, the newer one only needs an extra zero byte). I tested it in MESS with all six available Kernal binaries (but normally I use the same one as the MEGA65). I describe that here (at page 44):

Still, all of the above has nothing to do with printing a "screen clear" character, which I only do via the standard JSR $FFD2 Kernal call, so if it doesn't clear the screen, that cannot be my fault, of course... (However, the Kernal uses the DMA for this job, so if it fails, something must be wrong with the DMA.)

#53. gardners

If the screen clear is not happening in C65 mode, it could be that you don't leave the IO mode in VIC-III, which might confuse the C65 ROM. But this is only speculation.

Regarding 28 versus 32 bit address space, the CPU's MAP instruction can only do 28 bits at the moment, but the logical address space is 32 bit. DMA can do 32-bit, and the 32-bit ZP-indirect instructions can also do 32-bit.

There is a register that lets you set write protection on the 128KB "ROM" area, but I don't remember immediately the register and bit, and whether it is a user-land thing or only a hypervisor register.

Basically all the memory space that can have bad side-effects is in $FFxxxxx, i.e., the 256th MB. Expansion memory will be limited to the first 255 MB because of this.

Colour RAM really lives at $FF80000-$FF8FFFF, i.e., allowing 64KB, but currently only 32KB populated. You can certainly probe the size of that space and report colour RAM size. It is made visible for C65 emulation at $1F800-$1FFFF and thus also at $D800-$DBFF/$DFFF for C64/C65 compatibility. This leads to one subtle incompatibility with a real C65, in that you can't use the 2KB at $1F800-$1FFFF as real chip RAM, e.g., to hold sprites, character set or screen/bitplane data, and it also has a 1 cycle delay when reading, because it isn't actually chip RAM on the M65. I might have worked around this by making writes to $1F800-$1FFFF write to both. If I haven't, it is possible to do.

"slow" memory will be from $8000000-$FEFFFFF, and should be some number of MB in size.

Chip and fast memory will be all below $7FFFFFF. Note that the address layout will probably change at some point, as I plan to keep the first MB as C65 emulation layout, but the chip RAM and fast RAM (currently 128KB of each) will be also available at some other address ranges in this space, e.g., $1000000-$3FFFFFF for slow RAM and $4000000-$7FFFFFF for chip RAM -- but don't go depending on those figures as they might change, and it is also possible that future versions may merge chip and fast RAM together to give more effective chip RAM. This would be one of the planned effects of eventually moving to the pipelined CPU and reworked VIC-IV, or similarly, of reducing the maximum physical video resolution further to reduce the pixel clock. However, all work on those fronts is currently stalled, so don't expect anything there too quickly.
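To make the layout easier to follow, here is a small Python sketch that classifies an address using only the ranges quoted in this thread. The `REGIONS` table and `classify` are hypothetical names, the 28-bit truncation is the behaviour described for Xemu, and the figures may well change, as noted above.

```python
# Ranges as quoted in this thread; more specific windows are listed first.
REGIONS = [
    (0x001F800, 0x001FFFF, "colour RAM window (C65 view)"),
    (0x0000000, 0x001FFFF, "chip RAM"),
    (0x0020000, 0x003FFFF, 'C65 "ROM" (RAM on the M65)'),
    (0x8000000, 0xFEFFFFF, "slow RAM"),
    (0xFF80000, 0xFF8FFFF, "colour RAM"),
    (0xFF00000, 0xFFFFFFF, "I/O and other side-effect space"),
]


def classify(addr):
    """Return the region name for an address, after 28-bit truncation."""
    addr &= 0x0FFFFFFF
    for low, high, name in REGIONS:
        if low <= addr <= high:
            return name
    return "unassigned in this sketch"
```

A scanner could use such a table to skip the "ROM" and the $FFxxxxx area before writing anything.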

DMA revision is set by modifying $D703 bit 0 in VIC-IV IO mode. 1 = Rev B (with extra byte in DMA list), 0 = Rev A (original shorter DMA list length).
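Since only bit 0 of that register selects the revision, a read-modify-write keeps whatever the other bits hold. A minimal Python sketch of the bit handling follows; `with_dma_revision` is a hypothetical helper, and actually reading and writing $D703 on the machine is left out.

```python
DMA_REV_REGISTER = 0xD703   # address as given in the post (VIC-IV IO mode)
DMA_REV_BIT = 0x01          # bit 0: 1 = Rev B, 0 = Rev A


def with_dma_revision(current_value, rev_b):
    """Return the register value with bit 0 changed and all other bits kept."""
    if rev_b:
        return (current_value | DMA_REV_BIT) & 0xFF
    return current_value & ~DMA_REV_BIT & 0xFF
```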

LGB: This reminds me as well, there are some other recent changes I need to email you about for including in Xemu when you have the chance, mostly around the reworked virtualised keyboard framework.

#54. LGB


OK, I'm afraid I somewhat missed the point of your work. So it is meant to be "future proof", not only for M65-specific tests of the current design. Nice enough then, indeed!

The DMA difference is indeed only one byte (the DMA list entry length), but there is more to it in the details. Honestly, as I've said already, Xemu emulates it only at this level, and it is not even configurable at run time currently (like on the M65) but needs a command line switch. I suspected a bad pairing of DMA revision and ROM, since that was a problem for me multiple times at the beginning, with similar symptoms. But OK then, it's not that in this case.


In my opinion, the colour RAM incompatibility can be cured (though I am not sure about the exact details of FPGA timing constraints, etc.) with one of the ideas proposed here:

Since writing colour RAM is a 1 wait-state operation anyway, maybe it's not a problem to always write the "covered" chip RAM part as well, which VIC-IV could then use. So the incompatibility can be solved if you can implement the policy of always writing both chunks of memory on any modification (by CPU or DMA, for example). The issue above has other entries as well, which are not so relevant here, but honestly, I am a fan of having chip+fast RAM also mapped somewhere as one big 256K block, without the "disturbing" factors of addresses 0/1 used as the "CPU port" (not really, but the name stuck) and the 2K C65-style colour RAM. That would suit anyone who needs a big chunk of "not slow" memory without the limitations those things cause.

As the physical memory layout of the M65 is getting complex, I found it easier to try to write some documentation on it, like this (but it may be outdated by now), btw:

This list is not intended to give details about the I/O areas, for example; it's more of an "overview" of the full (currently?) 28-bit address space.

It would be great to get some information on the recent changes indeed! I'm more or less following the commit log of the mega65-core m65pcb-ports branch, but since I am really not a VHDL wizard - as you know :) - it's not always the most efficient way for me to understand the details :) Especially the virtualised keyboard framework: it sounds exciting indeed. Currently I'm sorting out some of the stupidities of Xemu, starting with the file handling mess all around, etc ... But surely, improvements in M65 compatibility are much more interesting (for the users too, I assume).

#55. MIRKOSOFT

Was accessing $de00 and $deff fixed in the case of the CP/M Cartridge?


#56. rosettif

@MIRKOSOFT: I am working on the next release (probably I can upload it tomorrow for testing).

@LGB: It turned out that the screen clear problem was caused by my leaving it in VIC-II (OldVIC) mode before printing, as Paul suggested (switching to NewVIC mode again fixed it for sure). But the new question is then: why did the problem only occur in XEMU, and not on the MEGA65 (when tested by Miro)? So it seems to me now that your emulator is more precisely "MEGA65 compatible" at the moment than the FPGA bitstream itself.

Yesterday I played around a bit more with XEMU and noticed that it also seems much faster than the FPGA bitstream. At least, the CPU speed measurement of my program reported only about 36 MHz on the MEGA65 before, whereas the same measurement in XEMU gives about 56 MHz in fast mode (after typing POKE 0,65). Even in this fast mode, it causes only some 10-12 % CPU load on my PC.

#57. LGB

Oh yes. Please do not depend on Xemu's timing at all!! This is a major flaw in Xemu currently :( To be precise: in some cases there is no timing at all (like the +1 wait state that colour RAM really needs); for the CPU in general, there is only the total execution time of each opcode. That data (the number of cycles needed per opcode) comes from my previous C65 emulator (also in Xemu), which is based on a 65CE02 datasheet, the one I could find. The very same data is used for the M65 emulator in Xemu, though it is *NOT* correct there, because the M65 does not need the same number of cycles per opcode at all, at least not in "fast" mode ... In my emulator, however, there is no difference currently :( Actually the M65 is - if I am correct - somewhat (?) slower in terms of cycles than a real 65CE02 (or the 4502 "core"; you see why I say 65CE02, even though the C65 does not use exactly that "discrete" CPU but more of an MCU, the 4502, whose "core" is however a 65CE02-like entity with minor modifications, like the MAP opcode instead of the reserved AUG, etc). What I mean is that the average number of cycles needed to execute some opcodes is usually higher on the M65 than on the C65. That is not a problem for the M65, however, since it can be clocked at 48MHz (50MHz on newer bitstreams), so in C65-fast mode (~3.5MHz) it can of course deliver the performance needed to match the C65's original speed without any problem. Or something like that; sorry, but it seems to be quite hard for me to express this in English :-O Btw, it's another story that Paul plans (as far as I know, and as I see in commit logs, etc) to use a ZP/stack cache and also a much more efficient pipelined CPU design, maybe, in the future at least.

I'm sure Paul is much better placed to explain this than me ... Actually, I am also curious what the exact cycle counts are for opcodes on the M65 (in fast mode etc; I guess in C65-fast mode it tries to keep to the C65 values anyway), but I am not sure such an exact list exists for all opcodes :-O Some opcodes are especially problematic, like those using the 32-bit flat addressing mode of the M65: in Xemu, I don't emulate the time needed for fetching the 32-bit pointer from the ZP! This is a design problem currently: the 65CE02 emulation core I have written can emulate the 65C02 as well as the 65CE02, and special cases like M65-specific things, or even the C65 MAP/EOM stuff, are handled as "call-backs" defined by the emulator, so that the CPU emulation inside Xemu is multi-purpose. However, it also means that it is hard for such an extra feature to modify the timing of the CPU emulation. Anyway, 65XX emulation in Xemu is horrible currently. I plan to do a major reorganization soon. It will also make it faster, btw (less CPU needed on the PC which runs Xemu ...).

:: @LGB added on 04 Sep ’17 · 10:57

Some explanation: I started my C65 emulator project before there was any plan to write an M65 emulator ... And since I do not have a C65, it was much harder to figure out how it should work; e.g. for opcode timings, I found some random PDF specification of the 65CE02. It's not even certain that the C65's 4502 uses the same timing, in fast mode at least, and I still don't know what it does exactly in C64 mode (i.e., whether only the clock is ~1MHz, or the opcode timing is modified too; e.g. the NOP opcode takes 1 cycle on the 65CE02 but 2 on the 6502/6510/etc). And these are just the emulation problems of my pre-M65 era, btw ...

:: @LGB added on 04 Sep ’17 · 14:29

Btw, another difference between Xemu and the 'real' M65: as far as I remember, the M65 always uses the fast clock for DMA regardless of the CPU clock in use. This is not the case in Xemu currently :-/ But this is also something one can consider for detecting the clock frequency of an M65 (maybe!): as it seems (?) the M65 does not have a stable standard for its fast mode yet (exactly how many cycles a given opcode needs, etc), one can measure the speed of the DMA instead, as it seems to always run at the fast clock. Unfortunately, current Xemu cannot be used to test this behaviour, as I've mentioned ... Surely, I plan, as always, to cure problems like this in Xemu in the future!

#58. rosettif

Maybe the memory access cycles make it slower, too? That must be something like on the SuperCPU, which "nominally" runs at 20 MHz, but when I measure it with my method, it gives only an effective 8.37 MHz. Compared to this, that 36 MHz is not so bad at all, as it means some 4.3x faster than the SuperCPU.

My measuring method is based on 6502 cycles, so it gives a real, precise MHz value only on the 65xx processors, but not on the C/CE/etc. variants. I therefore call it "effective 6502 MHz". E.g. when the 4510 runs on a ~1 MHz clock in C64 mode, it is reported as something like 1.18 or 1.13 MHz just because of this (the faster CPU timing included, plus the absence of the VIC badlines etc.). However, I definitely think it works well as a relative comparison between two or more systems.
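The idea of an "effective 6502 MHz" can be written down as plain arithmetic. The Python sketch below only illustrates the definition as described in this post; the function names are hypothetical, and how the benchmark loop and the timer are actually implemented is not shown here.

```python
def effective_6502_mhz(reference_cycles, elapsed_seconds):
    """Cycles a stock 6502 would need for the benchmark, per measured microsecond."""
    return reference_cycles / elapsed_seconds / 1_000_000


def speed_ratio(mhz_a, mhz_b):
    """Relative speed of system A compared with system B."""
    return mhz_a / mhz_b


# A benchmark worth 1,000,000 cycles on a 6502 that finishes in half a second
# is reported as 2 effective MHz, whatever the real clock of the machine is.
two_mhz = effective_6502_mhz(1_000_000, 0.5)

# With the figures reported earlier in this thread (56 MHz in Xemu, 36 MHz on
# the MEGA65), Xemu comes out roughly 1.56 times faster.
xemu_vs_m65 = speed_ratio(56, 36)
```

Because the reference cycle count assumes 6502 timings, a CPU that needs fewer cycles per opcode reads higher than its real clock, which matches the 1.13 to 1.18 MHz readings mentioned above.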

:: @rosettif added on 04 Sep ’17 · 16:56

I also measure the speed of the DMA (see above in the same topic!), but for that purpose I use the CIA timer instead. If the CIA is already part of the CPU, and its timer might therefore be halted during the DMA, then the results may also show it somewhat faster than it really is (as the time in between cannot be measured in that case, of course).

#59. rosettif

I have just noticed that in XEMU's C64 mode, the $d030/$01 bit for 2 MHz (for the C128 fast mode emulation) is also present, but it works the other way around! It should turn 2 MHz on when set (and off when cleared).

#60. LGB

OK then, with your "effective 6502 MHz" you found "36MHz" for the M65 and "56MHz" for Xemu, both - in theory - using a 48MHz clock (just the timing is not identical). It makes sense; the ratio is about 1.55 then. I've written a software emulation of the 8080 CPU for the M65 to be able to use CP/M on it (in the future, I would go for the Z80, btw). I've run Microsoft's CP/M BASIC under it without problems. There I got a factor of about 1.4 between the (8080) emulation speed on Xemu and on the 'real' M65, which is not so far from 1.55. As this measure really depends on which opcodes are used (and how frequently), it can even be treated as "about the same" (though my work uses a large number of 65CE02/4510-specific opcodes, but this is not the point now).

I imagine this difference will get smaller later, once better timing is applied in Xemu when emulating the M65. Also, your result of 36MHz 'effective clock' will - in my opinion - increase in the future with some of the ongoing work on the M65 anyway, but that's another point.

About the 65816: I know the 65816 somewhat, but I don't know much about its application in the C64/C128 as the SuperCPU, or how exactly it is implemented. I guess it's really the memory access which is the limiting factor there (AFAIK, the SuperCPU has DRAM, but it gets into even more trouble when it needs to access the C64's memory, which can be accessed at 1MHz only, though - again, as far as I remember - there is some cache or similar which tries to improve the situation somewhat). Anyway, I guess I am going in a very off-topic direction right now and right here :)

About the "C128-fast" mode bit (or whatever it's called): you're probably right. Honestly, I've never checked it; some features are just implemented in Xemu from looking at the I/O layout of the M65, puzzling over some VHDL code, etc, and I can't say that I then test everything to see if it's really right (most things, of course, yes, but not all). I'll check that, thanks for noticing. And anyway, you can always file an issue ticket if you like:

#61. rosettif


- MEGA65 bugfix (total loop is reorganized again)
- C65/MEGA65 bugfixes (some RAM/ROM memory paging conflicts fixed)
- IDE64 bugfix (avoiding $de00 I/O space conflict)
- CP/M bugfix (avoiding $de00 I/O space conflict)
- several other minor bugfixes

The last known bug still left is the break into the monitor on the MEGA65 after exiting to Basic. I still have no idea about that, but at least I have got some help from these emulators now. It also happens in the M65 emulation in XEMU (xmega65), but not in the C65 one (xc65). So it must have something to do with the differences between the two. (I will keep working on it over the next few days... If I don't succeed in finding it, then I would rather leave it as it is.)

#62. LGB


Just a quick look (I haven't had time for more): it seems the ROM content was modified on the M65 after running your software. As I've said, on the M65 the ROM is not really ROM; it can be used as RAM, and thus can be overwritten. And by default this is allowed (at least currently, at least in Xemu). So this can be a reason. On the C65 it does not happen, since there the ROM is ROM and you can't overwrite it anyway. A temporary one-line Xemu patch (allowing "ROM" writes in hypervisor mode only; it's probably not the right workaround, but a great test to make sure nothing can overwrite the ROM) fixed the problem: no more monitor after exit. So it seems this can be the reason.

:: @LGB added on 05 Sep ’17 · 10:18

Maybe you can try to enable ROM protection on M65 before running your tests:

GS $D67D. bit2 - Hypervisor write protect C65 ROM $20000-$3FFFF

I'm not sure if this is the default (write enabled) in the current KS/bitstream too, but as far as I remember it was at the time I coded that part of Xemu/M65, so I don't feel it is much of a bug in Xemu then. But it's possible that newer M65 "firmware" disables ROM writes after booting. What I did with my one-line hack was just a quick check that this is the reason; it's not a "fix" which can go into Xemu, since that is not how the M65 should work anyway.

Again: this is after just two minutes of testing/patching, so I can be wrong ...

:: @LGB added on 05 Sep ’17 · 11:24

Just to be more precise: the lower 128K of the physical address space (with the 2K part at its end being the "C65 colour RAM", a separate entity, so really only 126K) is the RAM, which is also RAM on a stock C65. It is also called "chip RAM" (in Amiga terms, first used here by Paul, if I am right), and it is what VIC-IV can use. The next 128K is the ROM on the C65. On the M65, however, it is RAM too, called "fast RAM" (VIC-IV can't use it, btw). It makes sense for it to be RAM, as on the M65 the "ROM content" (of the C65) must be loaded there from the SD card, so it must be writeable. In theory, the bit mentioned above controls whether the "ROM" is writeable or not. Otherwise, for M65-specific programs it's very handy to use it as RAM if you need more than 128K of "full speed" RAM (i.e., not the slow DDR RAM on the Nexys4 DDR, for example). That's why I mentioned in one of my first comments in this topic that when scanning / writing the full address space, you must be careful not to overwrite the ROM itself. Or you can try to switch write protection on (previous comment). Alternatively, just skip the second 128K of the linear address space in your test.

One open question for me (maybe Paul can answer) is the future of this feature. I.e., for C65 compatibility it would be best to enable write protection of the "ROM" by default once the KS is done booting. And maybe M65-specific software can re-enable writing if it needs it?

#63. gardners

Yes, it would be easy and a probably good idea to enable write-protecting of the "ROM" when KS exits on boot. This is actually really easy to implement in 2 lines of assembly language.

Regarding the slower speed of M65 versus Xemu: Some LDA/X/Y/Z operations take one cycle extra on the M65 compared with a real 4510/65CE02/6502. This will indeed be fixed in the pipe-lined CPU which will hopefully be 5x - 10x faster than the current one -- but probably won't be finished for a year or more.

On the other hand, some other instructions are faster: 8-bit branches are only 2 cycles, even if taken or crossing a page boundary. 16-bit branches take only 2 cycles if not taken and 3 if taken, even if crossing a page boundary. IIRC, RTS or JSR is also a bit faster.
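For comparison, the branch timings above can be put next to the classic 6502 rule (2 cycles, plus 1 if taken, plus 1 more if the taken branch crosses a page). The Python sketch below uses hypothetical function names; the M65 figures are just the ones quoted in this post and describe the current core only.

```python
def branch_cycles_6502(taken, crosses_page):
    """Classic 6502 relative branch: 2, +1 if taken, +1 more on a page cross."""
    if not taken:
        return 2
    return 4 if crosses_page else 3


def branch_cycles_m65(taken, crosses_page, sixteen_bit=False):
    """M65 branch timing as quoted above; a page cross costs nothing extra."""
    if sixteen_bit:
        return 3 if taken else 2
    return 2
```

A taken 8-bit branch across a page boundary is therefore 4 cycles on a 6502 but 2 on the M65, which is one reason a cycle-based benchmark cannot be converted directly into a clock frequency.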

As LGB mentioned, I am also planning a ZP cache to speed up ZP-indirect instructions, and also a stack cache to speed up RTS and PLA/X/Y/Z/P instructions. This will give only a little bit of speed improvement, probably <10%, and isn't a real high priority for me right now, in part because it is a bit complex on the M65, because you can move ZP and the stack around the place and the CPU state machine is frankly a bit of an over-complicated mess compared to what it should be.

Also LGB is correct that DMA on the M65 always runs at the native clock speed. I might also add some improvements there in the future when DMA copying between different memories, because it would then be possible to read one and write the other at the same time. But this is also rather low priority compared to getting the machine in a minimally finished state.

Getting a nice Hyper-Freeze menu and getting the cartridge and 1541 ports working are much more important for usability than further increasing the CPU speed.

#64. rosettif

I have also thought about a ROM overwrite as a possibility, but the main problem is that there should not be any ROM paged in when I actually run my tests. I try to page out everything, just to have all RAM. Not only do I MAP all registers to zero, but I also clear the paging bits at $d030, and even at $01. What am I still missing? Maybe something in the uppermost region (from $c000 onwards or so?) always remains there.

Probably the easiest way would be to enable the write protection first. I was not aware of that bit, thanks. By the way, it is not a good idea to leave it unprotected by default, as there must be plenty of software (especially native C64 programs, but maybe some native C65 applications too) which will definitely have problems with it.

#65. LGB

Well, one thing here: if the (physical, linear, call it what you like) address is between $20000 and $3FFFF, that's the ROM. You can see the ROM not only when it's "paged in" with $D030 or with the "CPU port" (what is address 0 and 1 on C64 and part of 6510, but on C64 it's actually part of VIC-III but it does not matter). The memory can also be accessed directly by its linear (physical, etc) address, i.e. if you MAP it with the MAP opcode, or if you DMA it, or if you use the M65-specific linear addressing mode with the ZP + Z register stuff (the NOP-prefixed opcode technique).

So we are talking about two different things here, more or less. You talk about $C000 and such, i.e. CPU addresses. But I mean linear addresses: you can access memory everywhere in the M65 address space even if no ROM is "paged in" with the corresponding $D030 bits, etc. I think you are confusing things a bit here: you mean the ROM area paged in from the viewpoint of the CPU (e.g. the $D030 bits), while I mean the _linear_ addresses, which are more than 16 bits wide.

If you like, put it this way: on the C64 there is no "physical address" for the ROM. The ROMs are either seen by the CPU or not, depending on the state of the CPU I/O port at address 0/1. This is NOT the case on the C65 (nor on the M65), where the ROM can actually be addressed directly by its "linear address". By linear address I mean that the first 128K of the linear address space is 128K of RAM (minus 2K for colour RAM at the end) and the next 128K is the ROM.

Ok, an example if it's more clear this way:

Let's assume we have a ZP location with label "zploc". Then this:

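LDA #2
STA zploc+2      ; pointer bits 16-23 = $02
LDA #0
STA zploc        ; pointer bits 0-7
STA zploc+1      ; pointer bits 8-15
STA zploc+3      ; pointer bits 24-31, so the pointer is $00020000
TAZ              ; Z = 0 as start offset, A = 0 stays the fill byte
@loop:
NOP ; note: NOP + STA (zp),Z together is an M65 specific 32 (28 ...) bit addressing
STA (zploc),Z
INZ              ; next byte, Z wraps back to 0 after 256 stores
BNE @loop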

This will overwrite the first 256 bytes of the ROM with zeroes, even if the ROM is not paged in (by $D030, the MAP opcode, etc), since the linear addressing mode bypasses everything the CPU does to decode addresses and translate them to linear addresses. This is also true for the DMA, btw: it also sees linear/physical addresses, not CPU addresses (so it is not affected by $D030 or by mappings done with the MAP opcode)!

Hopefully my example is correct, I've just typed here without too much thinking or checking it :-P
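As a cross-check of the pointer arithmetic in the example, here is a minimal Python sketch of how the four ZP bytes and the Z register combine into a linear address. `linear_address` is a hypothetical helper, and the 28-bit truncation is the Xemu behaviour mentioned earlier in the thread, not a statement about the hardware.

```python
def linear_address(zp_bytes, z):
    """Combine a 4-byte little-endian ZP pointer with the Z register."""
    pointer = int.from_bytes(bytes(zp_bytes), "little")
    return (pointer + z) & 0x0FFFFFFF    # assumption: truncated to 28 bits


# zploc .. zploc+3 as set up by the example: $00, $00, $02, $00
first = linear_address([0x00, 0x00, 0x02, 0x00], 0x00)
last = linear_address([0x00, 0x00, 0x02, 0x00], 0xFF)
```

Here `first` is $20000 and `last` is $200FF, i.e. the first 256 bytes of the "ROM" area, independent of $D030 or any MAP setting.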

:: @LGB added on 05 Sep ’17 · 17:37

"but on C64 it's actually part of VIC-III but it does not matter" ... typo, wanted to be C65. Of course. Annoying that you can't edit only append your previous post. Well, or I should be more careful, that's also a great point to note :D

#66. rosettif

@LGB: I have no problems with linear addresses, as I am always stepping through the bank 2/3 (and mostly the whole first 256K at all) when using them. All my problems seem to occur right within the 64K main memory: either when I run the first main memory test, or later when I create some arrays there as temporary storage. I need to access all of it, or at least most of it, but when I try to use the 64K RAM, it seems I do not manage to page some of the ROM areas out of the way.

#67. LGB

OK, what I mean here is that in one of your earlier posts you mentioned that you scan the whole 32-bit address space with writes (I am not sure how you do this; I can imagine three methods: using the MAP opcode, using DMA, or using the linear addressing mode opcodes as in my code example). Are you really sure you skipped the area between $20000 and $3FFFF (note: 5 hex digits, not 4 ...)? That is the "ROM". I am suspicious because if you don't have the ROM mapped in (not via $D030, not by mapping the ROM somewhere with the MAP opcode, and also not through the unmapped memory state with the ROM paged in by the C64-style CPU I/O port), and you're sure about it, then the CPU can't affect the ROM directly, unless you use the linear addressing mode opcodes as in my example, or use DMA to access the linear address space directly.

With a quick hack of the Xemu code, I can only tell that the PC of the CPU was $189A when this first happened. It can be one byte off from the real address, as Xemu may have incremented the PC before decoding the opcode or something like that. But if you can tell what code is at that location after loading your test program (I tried it in C65 mode, btw), then it may help ... It's $159F when I start from C64 mode.

Btw, with extra spammy debugging mode, I see this before the first event that something writes the ROM:

CPU: MAP opcode, input A=$00 X=$E3 Y=$00 Z=$B3
MEM: applying new memory configuration because of MAP CPU opcode
LOW -OFFSET = $30000, MB = $00
HIGH-OFFSET = $30000, MB = $00
MEM: memory_set_do_map() applied
CPU: EOM, interrupts were disabled because of MAP till the EOM
CPU: MAP opcode, input A=$00 X=$00 Y=$00 Z=$00
MEM: applying new memory configuration because of MAP CPU opcode
LOW -OFFSET = $000, MB = $00
HIGH-OFFSET = $000, MB = $00
MASK = $00
MEM: memory_set_do_map() applied
MEM: CPUIOPORT: port composite value (new one) is 0
MEM: CPUIOPORT: new config had been applied
VIC3: interrupt change inactive -> active
VIC3: interrupt change active -> inactive
VIC3: interrupt change inactive -> active
VIC3: interrupt change active -> inactive
MEM: CPUIOPORT: port composite value (new one) is 7
MEM: CPUIOPORT: new config had been applied
CPU: MAP opcode, input A=$00 X=$E3 Y=$00 Z=$B3
MEM: applying new memory configuration because of MAP CPU opcode
LOW -OFFSET = $30000, MB = $00
HIGH-OFFSET = $30000, MB = $00
MEM: memory_set_do_map() applied
CPU: EOM, interrupts were disabled because of MAP till the EOM
CPU: NOP not treated as EOM (no MAP before)
CPU: NOP not treated as EOM (no MAP before)
ERROR: overwriting ROM, at CPU address $189A

:: @LGB added on 05 Sep ’17 · 18:36

I hope you can figure out what code of yours is at these PC values (or nearby; as I said, there can be a few bytes of difference because of the internal workflow of the CPU emulation. This is not designed for reporting, I've just quickly patched Xemu with this check for this very purpose).

#68. LGB

OK, I *think* I found it, by reading your source ASM file:

byte $eb,$a9,$ea

; on 65CE02: ROW $EAA9 (a = $01)
; on 65C02: NOP / LDA #$EA (a = $ea)
; on 65816: XBA / LDA #$EA (a = $ea, b = $01)
; (on 65xx: SBC #$A9 / NOP)

The ROW opcode will rotate the word at $EAA9, which is (I guess) a ROM area mapped there at the given time. Surely it's just one instance; maybe there are more places where ROM is overwritten later, too ... In total I found these, it seems:

lgb@oxygene:~/prog_here/xemu/targets/mega65$ grep overwriti debug.log | sort | uniq -c | sort -n
1 overwriting ROM, at CPU address $1905
1 overwriting ROM, at CPU address $190D
1 overwriting ROM, at CPU address $1914
2 overwriting ROM, at CPU address $15A6
2 overwriting ROM, at CPU address $189A
4 overwriting ROM, at CPU address $159F
13 overwriting ROM, at CPU address $1447
16 overwriting ROM, at CPU address $F2B0
4096 overwriting ROM, at CPU address $1FC5
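
The guess that $EAA9 lands in ROM can be checked against the MAP values in the debug log above (A=$00 X=$E3 Y=$00 Z=$B3). The following is only a minimal Python sketch, not Xemu code: the function names are made up for illustration, and the register layout it assumes (A/Y hold offset bits 8-15, the low nibbles of X/Z hold offset bits 16-19, the high nibbles of X/Z select which 8K blocks are mapped) is the commonly documented 4510 MAP behaviour rather than something stated in this thread. It ignores the MB (megabyte) part, which is $00 in the log anyway.

```python
# Hypothetical helpers, for illustration only (not part of Xemu).

def decode_map(a, x, y, z):
    """Decode 4510 MAP register inputs into (low_offset, high_offset, mask).

    Assumed layout: A/Y = offset bits 8-15, low nibble of X/Z = offset
    bits 16-19, high nibble of X/Z = enable bits for the four 8K blocks
    of the lower/upper 32K half.
    """
    low_offset = ((x & 0x0F) << 16) | (a << 8)
    high_offset = ((z & 0x0F) << 16) | (y << 8)
    mask = (z & 0xF0) | (x >> 4)   # bit n set => 8K block n is mapped
    return low_offset, high_offset, mask


def translate(cpu_addr, low_offset, high_offset, mask):
    """Return the linear address for a MAP-ed CPU address, or None when the
    8K block is not mapped (other banking rules apply then, not modelled)."""
    block = cpu_addr >> 13
    if not (mask >> block) & 1:
        return None
    offset = low_offset if block < 4 else high_offset
    return (cpu_addr + offset) & 0xFFFFF


low, high, mask = decode_map(0x00, 0xE3, 0x00, 0xB3)
target = translate(0xEAA9, low, high, mask)
print(hex(low), hex(high), hex(mask), hex(target))
```

Under these assumptions both offsets come out as $30000, matching the log, and the $E000 block is mapped, so the word at CPU address $EAA9 sits at linear $3EAA9, inside the $20000 - $3FFFF ROM area.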

#69. rosettif

Well, it is indeed not so easy to maintain more than 5500 lines of monolithic Assembly source code written for a dozen or so different Commodore environments at the same time. :D It is also full of self-modifications and some other tricks. I need to manoeuvre with all of them in mind (and also test against all of them). The actual memory layouts of the C65 and MEGA65 are just one of them. Some others (like the VIC-20 and the PET) may have RAM, ROM and I/O blocks intermixed at various places (in various configurations). So it is sometimes rather complicated to autodetect everything, and in the right order too.

I will try to turn on the write protection as a next step, and see what happens. Is that $D67D register only accessible in VIC-IV mode? Since there is probably nothing dangerous at the same address on any of the other machines, maybe I can set it at the very start already, to be sure.

#70. rosettif


- MEGA65 bugfix (write ROM protect)

The write protection is now enabled, thus it must definitely be a bugfix... Still, the monitor problem is the very same. Actually, it has never changed a bit. Always exactly the same behaviour, all along the nine revisions, regardless of when or what I do. Now I think I will finally leave it at that (and goodbye). Unless someone reports something new.

@MIRKOSOFT: Is the CP/M fix working on your machine?

:: @rosettif added on 06 Sep ’17 · 17:45

#71. PGSmobile

$D67D can only be written from within hypervisor mode, which is why it hasn't fixed the bug for you. I need to add a hypervisor trap to toggle this flag.

#72. rosettif

@PGSmobile: Thank you! I have made it via this sequence:

LDA #$47
LDA #$53

LDA #$04

Is it correct, by the way? May I just leave it like this, and will it then work one day in the future (once the system makes it possible)? Or should I also make some changes here?

I am giving my whole source code one last once-over (again) at the moment, and I have already found some minor things (to be fixed or changed or optimized), so I will still compile just one more revision (maybe tomorrow, or the day after tomorrow), but I would like that to be the very final version indeed. If anything still needs to be changed, I can do it now.

#73. LGB

Btw, on 65CE02/4502 (I guess also on 65816 and 65C02 maybe), you can do the second part (setting the bit) with only two opcodes:

LDA #4

And interestingly, it's also good for "archiving" the old status of the bit, since the Z flag is set based on the old value, before the bit is set. If it's needed ...

:: @LGB added on 07 Sep ’17 · 08:30

But surely, if you do this anyway on every target your program can run on (regardless of it being an M65 or a C64 or whatever), it does not help to use an opcode which does not exist on the plain 6502. Maybe I've only just got your point ...
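
The flag behaviour described in this post can be sketched in a few lines of Python. This assumes the opcode meant here is TSB (test and set bits), the same one used in the kickstart snippet quoted later in the thread; the function name and the example values are only illustrative, and no real register is touched.

```python
def tsb(acc, mem):
    """Model TSB: Z reflects (A AND old memory), then memory becomes (memory OR A).

    Returns (new_memory_value, z_flag). The accumulator itself is unchanged.
    """
    z_flag = (acc & mem) == 0          # True => the tested bits were all clear before
    new_mem = (mem | acc) & 0xFF       # set the bits selected by the accumulator
    return new_mem, z_flag


# LDA #4 followed by TSB on a location that currently holds $6A (bit 2 clear):
value, z = tsb(0x04, 0x6A)
print(hex(value), z)
```

So after the TSB, a BEQ/BNE on the Z flag tells whether the bit was already set beforehand, without a separate read of the location.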

#74. gardners


That is what would work if it were possible to set this flag from outside of the hypervisor. However, at the moment, it can only be done by the hypervisor. Thus we need a patch in kickstart to either set it automatically, or to have a trap that allows you to ask for it to be enabled or disabled.

I have just patched the m65pcb-ports branch so that:
1. By default, ROM is write-protected.
2. You can ask the hypervisor to make it read-write with lda #$02 / STA $D642 from a program. Note that this is a hypervisor trap, not an ordinary register access.

These are untested, and I haven't built a bitstream using them yet.

LGB: Are you able to test this, or do you need a bitstream generated for you to test against? (I don't have an M65 here in Vanuatu, so can't test it myself).


#75. LGB

@gardners I can make the bitstream if you've committed the patch as you've described. Truly, it takes something like four hours on my not so fast machine to do, but anyway :D

I will test it

:: @LGB added on 07 Sep ’17 · 16:11

I am not at home yet; this will be in the evening (CEST). But there is something I don't quite understand: KS seems to do this now (according to your commit):

lda #$02
tsb hypervisor_hardware_virtualisation ; $d659

How does it work? You said it's a hypervisor trap, but now it's used from the KS, already in hypervisor mode (?). Have I missed something? Does a hypervisor trap work when you are already in hypervisor mode? Shouldn't $D67D be written too in KS in hypervisor mode? hypervisor_feature_enables, I mean. Like here:

lda #$6a ; 01101010
sta hypervisor_feature_enables

I am sure I am missing something, but I need to understand it to develop Xemu further.

:: @LGB added on 07 Sep ’17 · 20:36

Oh, and what is $D642? Afaik that should be the Y register storage in hypervisor mode. I am quite confused now, since at least three different addresses have been mentioned in relation to the ROM protection on/off stuff ... :(

