How to Create Chip Support Scripts


Summary

Every Cortex-M chip has its own way to write flash, usually documented in the chip's user guide. Generally these methods are some combination of loading data into latches, setting some bits, and waiting. In some cases, a ROM on the chip handles flash writing. To program a chip's flash, CortexProg first needs to correctly identify the chip. It then loads a piece of code into the chip's RAM and uses it to write to flash. In CortexProg's terms, this piece of code is called a device script or a chip support script.


Identifying the Chip

Chip identification is performed in a few steps. First, the ROMTABLE in the chip is queried to read the chip's JEDEC ID. Some chips do not have this set at all (it reads as all zeroes). Some manufacturers use just one ID for all their chips, and still others use a different ID for each chip in a family. To handle all of these cases, the matching is performed using masks and comparison values. A normal ROMTABLE ID has 8 32-bit words, so each device script has 8 32-bit masks and 8 32-bit compare values. If the values read, AND-ed with the masks, match the compare values, chip identification proceeds to the next step.
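As a sketch of this first step, the masked comparison over the eight ROMTABLE ID words can be written in C as follows. The helper and macro names are hypothetical, not taken from the CortexProg sources.

```c
#include <stdbool.h>
#include <stdint.h>

#define ROMTABLE_ID_WORDS 8

/* Hypothetical helper for the ROMTABLE ID step of chip identification.
 * Each ID word read from the chip is AND-ed with the script's mask and
 * compared with the script's match value; all 8 words must agree. */
bool romtableIdMatches(const uint32_t ids[ROMTABLE_ID_WORDS],
                       const uint32_t masks[ROMTABLE_ID_WORDS],
                       const uint32_t matches[ROMTABLE_ID_WORDS])
{
    for (int i = 0; i < ROMTABLE_ID_WORDS; i++) {
        if ((ids[i] & masks[i]) != matches[i])
            return false;
    }
    return true;
}
```

A mask bit of 0 means "ignore this bit", so a match value must be 0 wherever its mask is 0; otherwise that word can never match.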

As mentioned above, some chips have a useless set of ROMTABLE ID values. How does CortexProg disambiguate these? MatchVals are the answer (the script footer calls them "checkvals"). Every device script can have any number of MatchVals. A MatchVal is a set of three 32-bit words: the first is an address to read, the second is a mask to apply to the value read, and the third is the value to compare the masked result against. MatchVals can be used in a number of ways. For example, with mask and value both set to zero, a MatchVal simply verifies that a given address is readable. Why would this be useful? PSoC4 chips have some flash at 0x0FFF0000, which is a very uncommon address, so a MatchVal of {0x0FFF0000, 0, 0} can be used to check for a PSoC4 chip. The device script for the STM32F4xxx provides another example. The STM32F4xxx has an ASCII device "unique id" stored in the 8 bytes at 0x1FFF7A14. Two MatchVals can be used to eliminate, with high probability, devices that are not STM32F4xxx: {0x1FFF7A14, 0x80808080, 0} and {0x1FFF7A18, 0x80808080, 0}. If both hold, the top bit of each of those 8 bytes is clear, so the bytes are likely ASCII. The STM32L1xx series chips provide a third example. These chips all have a "device id" at 0xE0042000. The values vary, but they all agree in the bits 0xFF070FC0: AND-ed with that mask, every ID becomes 0x10000400. Thus, to eliminate many devices that are not STM32L1xx, the MatchVal is {0xE0042000, 0xFF070FC0, 0x10000400}. If all MatchVals hold, the chip support script may be the correct one for the chip. The next step determines this for sure.
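The MatchVal rule can be modeled the same way. In this hypothetical C sketch, the results of reading each address are passed in, with readOk set to false when the read faulted; how CortexProg itself performs the reads is not shown.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One MatchVal: read addr, AND with mask, compare with val. */
struct MatchVal {
    uint32_t addr;
    uint32_t mask;
    uint32_t val;
};

/* Checks one MatchVal against the result of reading its address.
 * If the read faulted (readOk == false), the MatchVal fails.
 * With mask == 0 and val == 0 this reduces to "addr is readable". */
bool matchValHolds(const struct MatchVal *mv, bool readOk, uint32_t word)
{
    return readOk && (word & mv->mask) == mv->val;
}

/* Every MatchVal of a script must hold for the script to stay a candidate.
 * readOk[i] and words[i] are the results of reading mvs[i].addr. */
bool allMatchValsHold(const struct MatchVal *mvs, const bool *readOk,
                      const uint32_t *words, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (!matchValHolds(&mvs[i], readOk[i], words[i]))
            return false;
    }
    return true;
}
```

For instance, a hypothetical STM32L1xx device ID of 0x10180416 AND-ed with 0xFF070FC0 gives 0x10000400, so the STM32L1xx MatchVal above holds for it.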

Some chip families are simply hard to identify. A great example of this is the LPC13xx family: some members have a device ID register at 0x400483F4 and some at 0x400483F8. Since the address is not fixed, a MatchVal will not help here. The last step in chip identification is the cpuid() entrypoint in the chip support script (listed as cpuid_verify() among the entrypoints described below). How entrypoints work in general is explained later; here is what cpuid() does. It is called after the script is loaded into RAM, and it may do whatever is necessary to positively determine whether this chip support script supports this chip. To confirm that it ran correctly, it returns the special value 0xACEFACE5 in the R1 register. In R0, it returns zero if the chip cannot be supported and any nonzero value otherwise. For the LPC13xx, the sample device script compares the device ID against a list of all known device IDs and returns whether a match was found.
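The real cpuid() is written in assembly and runs on the chip, but its logic and the R0/R1 contract can be sketched in C. Everything here is illustrative: the function names are hypothetical, and the list of known LPC13xx IDs is passed in rather than reproduced, since the actual values live in the sample script.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CPUID_MAGIC 0xACEFACE5u   /* returned in R1 to prove cpuid() ran */

/* Register values cpuid() hands back to the debugger. */
struct CpuidResult {
    uint32_t r0;   /* nonzero: this script supports the chip */
    uint32_t r1;   /* must be CPUID_MAGIC */
};

static bool idIsKnown(uint32_t id, const uint32_t *known, size_t nKnown)
{
    for (size_t i = 0; i < nKnown; i++)
        if (known[i] == id)
            return true;
    return false;
}

/* LPC13xx-style logic: the device ID register may be at 0x400483F4 or at
 * 0x400483F8, so check the words read from both against the known IDs. */
struct CpuidResult cpuidCheck(uint32_t wordAt3F4, uint32_t wordAt3F8,
                              const uint32_t *known, size_t nKnown)
{
    struct CpuidResult res = { 0, CPUID_MAGIC };
    if (idIsKnown(wordAt3F4, known, nKnown) || idIsKnown(wordAt3F8, known, nKnown))
        res.r0 = 1;
    return res;
}

/* Debugger side: R0 is only trusted if R1 carries the magic value. */
bool cpuidSaysSupported(struct CpuidResult res)
{
    return res.r1 == CPUID_MAGIC && res.r0 != 0;
}
```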


Script File Format

The device script is loaded into the chip's RAM when the chip is being identified, when the chip's flash layout is needed, and/or when flash needs to be written. The build system for device scripts takes care of making this happen. When you write a device script, you need to decide where in RAM to place it; generally the start of RAM is a good place. The source file name of a device script usually has the form FreescaleMKE04.1fffff00.m0.S, where "FreescaleMKE04" is a name you pick that has no special meaning, "1fffff00" is the 32-bit address (in hex) where you'd like your script to be loaded, and "m0" means the CPU core is a Cortex-M0 (for M3 or M4, use "m3"). The Makefile in the SCRIPTS folder builds all scripts in the folder. A device script is basically just an ARM assembly file with a particular structure. Each file begins with a constant header:

.syntax unified
.thumb
.section .text

.globl entrypts
entrypts:

syscall:
1:
  nop
  b 1b

The syscall() area is important and must consist of exactly these two instructions; its purpose is explained later. The next 7 words are either valid BL instructions that jump to the proper functions, or exactly .word 0 if a given entrypoint is not supported. The current version of the device script format defines 7 entrypoints, which in order are: init_stage_1(), init_stage_2(), init_stage_3(), mass_erase(), block_erase(), block_write(), cpuid_verify(). BL instructions are used because they are always 4 bytes long, which is the real requirement here; a B.W, or a B.N followed by a NOP, would also do. None of these entrypoints ever returns in a conventional way. Instead, each "returns" to the debugger by executing a BKPT instruction. The actual code for the entrypoints follows the jump table and is explained later. Let's look at the file footer first.

.align 2
.section .text.2
.globl info
info:

//checkvals
  //make sure the ROM is where it should be
  .word 0x1fff0000
  .word 0x00000000
  .word 0x00000000
  
  //make sure the flash is where it should be
  .word 0x08000000
  .word 0x00000000
  .word 0x00000000
  
  //make sure device ID is where it should be
  .word 0x1FFF7A10
  .word 0x00000000
  .word 0x00000000
  
  //device ID word 2 is ascii?
  .word 0x1FFF7A14
  .word 0x80808080
  .word 0x00000000
  
  //device ID word 3 is ascii?
  .word 0x1FFF7A18
  .word 0x80808080
  .word 0x00000000
  

//now the fixed-length footer:
  //load address of this code
  .word 0x20000000

  //flash staging area in ram
  .word 0x20000400

  //number of checkvals above
  .word 5
  
  //reserved for future use and should be zero
  .word 0
  
  //cpuid value masks from cpuid
  .word 0xffffffff
  .word 0xffffffff
  .word 0xffffffff
  .word 0xffffffff
  .word 0xffffffff
  .word 0xffffffff
  .word 0xffffffff
  .word 0xffffffff
  
  //cpuid match values from cpuid
  .word 0x00000000
  .word 0x00000000
  .word 0x00000000
  .word 0x00000000
  .word 0x00000011
  .word 0x00000004
  .word 0x0000000a
  .word 0x00000000
  
  //this word must be zero
  .word 0
  
  //human-friendly name
  .ascii "STM32F4-series"

This is the file footer for the STM32F4xxx device script. First comes the info label in section .text.2. It tells the build system that the data that follows does not need to be loaded into RAM when the script is loaded; it is information for CortexProg itself rather than code for the chip. Next come the MatchVals (labeled "checkvals" in the file). Each one is three words, as described earlier, and as many as you like may be present. The device is considered a match only if every listed MatchVal matches. Then comes the fixed-length footer. Its first word is the load address of this script, and the second is the flash data staging address, which is explained later. The third word is the number of MatchVals; in this example there are 5. The fourth word is reserved and should be zero. Next come the 8 32-bit masks for the ROMTABLE ID, followed by the 8 match values. For STM32F4xxx the ROMTABLE ID is matched exactly (all mask bits set); for other chips you may want to clear some mask bits so that those bits are ignored. The next word is reserved for future use and is currently required to be zero. The last line is the user-visible CPU or CPU family name. It is used only for user-facing functionality and has no other purpose.
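For reference, the fixed-length part of the footer can be mirrored as a C struct. This is a sketch rather than a CortexProg header: the struct and field names are made up, and overlaying it directly on raw script bytes would additionally assume a little-endian host, matching the target.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Fixed-length footer that follows the MatchVals ("checkvals") in the
 * info section. The human-friendly name string comes right after it. */
struct ScriptFooter {
    uint32_t loadAddr;        /* where the script is loaded in RAM */
    uint32_t stagingAddr;     /* flash data staging area in RAM */
    uint32_t numCheckvals;    /* number of 3-word MatchVals before this */
    uint32_t reserved0;       /* reserved, should be zero */
    uint32_t cpuidMask[8];    /* ROMTABLE ID masks */
    uint32_t cpuidMatch[8];   /* ROMTABLE ID match values */
    uint32_t reserved1;       /* must currently be zero */
};

_Static_assert(sizeof(struct ScriptFooter) == 21 * 4, "footer is 21 words");

/* Byte offset of the footer from the info label: 12 bytes per MatchVal. */
size_t footerOffset(uint32_t numCheckvals)
{
    return (size_t)numCheckvals * 12u;
}

/* Sanity check of the two words the format requires to be zero. */
bool footerReservedOk(const struct ScriptFooter *f)
{
    return f->reserved0 == 0 && f->reserved1 == 0;
}
```

The offsets follow directly from the word counts in the listing: 4 header words, 8 masks, 8 match values and one reserved word, 84 bytes in total, preceded by 12 bytes per MatchVal (60 bytes for the 5 in the STM32F4xxx example).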


Init Entrypoints

The script must have at least one of the init entrypoints. Up to three can be present: init_stage_1(), init_stage_2(), init_stage_3(). The ones present are called in order, and the last one defined must provide CortexProg with the flash layout of the chip. Why does the device script do this? Often the CPUID and MatchVals are not enough to uniquely identify a chip within its sub-family, yet different chips in the same family may have differing amounts of flash. Code running on the chip can usually determine the amount and layout of flash, which is why the script does this on the chip. On return, the last implemented init entrypoint must place in R0 the word-aligned address of a fully populated FlashInfoRaw structure. Optionally, the device script can also name the various flash regions: if it does, R1 must be set to the word-aligned address of a filled-in FlashNamesRaw structure; if no name info is provided, R1 must be zero. In R2 the device script returns a flags bitfield. Currently no bits of it are defined, so R2 must be returned as zero. What else does the init code do? It can prepare register values (R4 - R11 and SP are preserved across entrypoint calls), unlock flash, etc. The flash layout data and flash name data may overlap the usual flash staging area; CortexProg makes a copy, so this is OK. The flash layout and name info structs look like this:


Flash Info

struct FlashInfoRaw {
  uint32_t numPieces;
  struct FlashInfoPieceRaw {
    uint32_t base;
    uint8_t eraseSz;
    uint8_t writeSz;
    uint16_t numIdentical; //0 means 65536
  } pieces[];
};

struct FlashNamesRaw {
  uint32_t numPieces;
  struct FlashNamePieceRaw {
    uint32_t base;
    uint32_t nameStrAddr;
  } pieces[];
};

//ENCODING info for eraseSz & writeSz:
if (enc < 32)
  val = 1 << enc
else if ((enc & 0xE0) > 0x80 && (enc & 0x1F) < 30)
  val = ((enc & 0xE0) >> 5) << (enc & 0x1F)
else
  val = UNDEFINED, RESERVED FOR FUTURE USE

The flash info structure is designed for maximum flexibility and compactness. Some chips have unequal flash block sizes, and some have block sizes that are not powers of two; all of this is supported. For example, a layout where flash starts at 0x08000000 and has four 16K blocks followed by one 64K block can be encoded in just two FlashInfoPieceRaw entries. The first has a base of 0x08000000, both size values (eraseSz and writeSz) set to 14, which decodes to 16K, and numIdentical set to 4, since four such identical blocks exist. The second has a base of 0x08010000, both sizes set to 16 (which decodes to 64K), and numIdentical set to 1, since there is one such block. Chips with discontiguous flash blocks are also supported. For block sizes that are not a power of two, some common cases are supported: any size that can be represented by the encoding shown above can be used. For example, 48K is encoded as 0xCD (6 << 13). The numIdentical value can encode 1 - 65,536 identical blocks.
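The encoding rules can be turned into a small C decoder plus a brute-force encoder, which is handy for checking values such as 14 (16K) and 0xCD (48K). The function names are hypothetical, and the decoder returns 0 for reserved encodings purely as a convention of this sketch. FlashInfoPieceRaw is copied from the struct above; pieceEnd() reproduces the two-entry example.

```c
#include <stdint.h>

struct FlashInfoPieceRaw {
    uint32_t base;
    uint8_t eraseSz;
    uint8_t writeSz;
    uint16_t numIdentical; /* 0 means 65536 */
};

/* Decode an eraseSz/writeSz byte; 0 stands for "reserved" here. */
uint32_t decodeSz(uint8_t enc)
{
    if (enc < 32)                                         /* 1 << enc */
        return (uint32_t)1 << enc;
    if ((enc & 0xE0) > 0x80 && (enc & 0x1F) < 30)         /* 5, 6 or 7 << n */
        return (uint32_t)((enc & 0xE0) >> 5) << (enc & 0x1F);
    return 0;
}

/* Find an encoding for a byte size, or -1 if it cannot be represented. */
int encodeSz(uint32_t size)
{
    if (size == 0)
        return -1;
    for (int enc = 0; enc < 256; enc++)
        if (decodeSz((uint8_t)enc) == size)
            return enc;
    return -1;
}

/* First address past the end of a piece, using its erase block size. */
uint64_t pieceEnd(const struct FlashInfoPieceRaw *p)
{
    uint32_t count = p->numIdentical ? p->numIdentical : 65536u;
    return (uint64_t)p->base + (uint64_t)decodeSz(p->eraseSz) * count;
}
```

Because the search scans encodings in increasing order, a power of two always gets its plain encoding (below 32) rather than a multiplier form.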

Why are separate erase and write sizes given? Because of how flash is written (more about flash staging later), a chip may not have enough RAM to write a complete erase block at a time. CortexProg therefore allows the write size to be equal to or smaller than the erase size, as long as the erase size is an integer multiple of the write size. The write size is also the smallest write that CortexProg will perform: any smaller write is rounded up to this size and zero-padded. The tradeoff is that if the write size is too small, writing will be slow, and if it is too large, small writes are padded out and more data than necessary must be written. Setting the write size equal to the erase size is usually a good choice if there is enough RAM. If there is not, something like 8K - 16K works well, and if even that does not fit, decide how much RAM is available for staging and use that. Erase sizes are much less flexible. CortexProg expects to be able to erase any given erase block at any time without affecting any others around it. This means that the erase block layout must match what the chip's flash really looks like, or be a superset of it.
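The size rules can be captured in two small checks. This C sketch works on decoded byte counts; the names are hypothetical, and the padding function simply mirrors the rounding-up behavior described above.

```c
#include <stdbool.h>
#include <stdint.h>

/* The write size may not exceed the erase size, and the erase size must
 * be an integer multiple of the write size. */
bool writeSizeValid(uint32_t eraseSize, uint32_t writeSize)
{
    return writeSize != 0 && writeSize <= eraseSize && eraseSize % writeSize == 0;
}

/* Writes are rounded up to whole write blocks (the rest is zero-padded);
 * returns the padded length. writeSize must be nonzero. */
uint32_t paddedWriteLen(uint32_t len, uint32_t writeSize)
{
    uint64_t blocks = ((uint64_t)len + writeSize - 1) / writeSize;
    return (uint32_t)(blocks * writeSize);
}
```

A 48K erase block with a 16K write size passes the first check, for example, while a 12K write size in a 64K erase block does not.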

CortexProg reassembles all contiguous flash chunks into areas for UI purposes. A chip may have an arbitrary number of areas, but most have one to four. You may name areas; this is what the flash name info structure is for. Each entry consists of just two pieces of information: the base address of the area, and a pointer to a zero-terminated string that is the name of the area. This is purely a convenience. In an nRF52 chip, for example, the name info might be {2, {0x00000000, "FLASH"}, {0x10001000, "UICR"}}. CortexProg reads and caches all of this data, so it may overlap the flash staging area to save RAM.
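Reassembling pieces into named areas can be sketched as follows. This is an illustration, not CortexProg's code: pieces are assumed to be sorted by base with sizes already decoded, and the names are host-side strings, whereas FlashNamesRaw stores nameStrAddr, an address in the chip's RAM that would first have to be read back.

```c
#include <stddef.h>
#include <stdint.h>

/* A decoded layout piece: count identical blocks of blockSize bytes. */
struct Piece { uint32_t base; uint32_t blockSize; uint32_t count; };
/* A name entry: area base address plus a host-side copy of its name. */
struct AreaName { uint32_t base; const char *name; };
/* A contiguous flash area [base, end) and its optional name. */
struct Area { uint32_t base; uint64_t end; const char *name; };

/* Merge pieces (sorted by base) into contiguous areas, then name every
 * area whose base equals a name entry's base. Returns the area count;
 * pieces that would need more than maxAreas areas are dropped. */
size_t buildAreas(const struct Piece *pieces, size_t nPieces,
                  const struct AreaName *names, size_t nNames,
                  struct Area *areas, size_t maxAreas)
{
    size_t nAreas = 0;
    for (size_t i = 0; i < nPieces; i++) {
        uint64_t start = pieces[i].base;
        uint64_t end = start + (uint64_t)pieces[i].blockSize * pieces[i].count;
        if (nAreas > 0 && areas[nAreas - 1].end == start) {
            areas[nAreas - 1].end = end;          /* contiguous: extend */
        } else if (nAreas < maxAreas) {
            areas[nAreas].base = pieces[i].base;  /* gap: start a new area */
            areas[nAreas].end = end;
            areas[nAreas].name = NULL;
            nAreas++;
        }
    }
    for (size_t a = 0; a < nAreas; a++)
        for (size_t k = 0; k < nNames; k++)
            if (names[k].base == areas[a].base)
                areas[a].name = names[k].name;
    return nAreas;
}
```

With illustrative nRF52-like sizes (128 × 4K at 0x00000000 and 1 × 4K at 0x10001000), this yields two areas named FLASH and UICR, while the STM32F4xxx-style 16K/64K pieces from the earlier example merge into a single area.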


Syscalls

During CPU identification, it may be convenient to read memory without taking a fault if the memory is unreadable. It may also be convenient to set breakpoints and watchpoints. Why? In some chips, a ROM needs to run before the flash is ready. How can one detect that the ROM is done? Usually a watchpoint on address 0 works, since the ROM will read it to see if there is a valid application. And what does reading memory without faulting help with? CPU identification. CortexProg provides all of these features to the running device script for your convenience, and that is what the syscall() label is for. It may look like an infinite loop, but it is not actually one: you can call it via a BL instruction with the proper register values to get these services, and it returns like a normal function. What are the required register values? R0 holds the syscall number: zero is "set watchpoint", one is "clear watchpoint", and two is "safe read". For syscall zero, the parameters are R1 (address), R2 (size), and R3 (type). The known types are: PC aka breakpoint (4), read (5), write (6), and access (7). Syscall one takes no parameters. Syscall two takes a single parameter in R1, the address to read. It returns a boolean indicating success in R0 and, if the read succeeded, the value read in R1. Many device scripts need none of this; these services exist for the few cases that do.
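Which register carries what for each syscall can be summarized as C data. In a real device script these values are loaded into R0 - R3 in assembly right before the BL to syscall; the C types and helper names here are hypothetical, and unused registers are zeroed only for tidiness.

```c
#include <stdint.h>

enum SyscallNum {
    SYSCALL_SET_WATCHPOINT   = 0,   /* R1 = address, R2 = size, R3 = type */
    SYSCALL_CLEAR_WATCHPOINT = 1,   /* no parameters */
    SYSCALL_SAFE_READ        = 2    /* R1 = address; returns R0 = ok, R1 = value */
};

enum WatchType { WATCH_PC = 4, WATCH_READ = 5, WATCH_WRITE = 6, WATCH_ACCESS = 7 };

/* Values to place in R0 - R3 before the BL to syscall. */
struct SyscallRegs { uint32_t r0, r1, r2, r3; };

struct SyscallRegs setWatchpointArgs(uint32_t addr, uint32_t size, enum WatchType type)
{
    struct SyscallRegs r = { SYSCALL_SET_WATCHPOINT, addr, size, (uint32_t)type };
    return r;
}

struct SyscallRegs clearWatchpointArgs(void)
{
    struct SyscallRegs r = { SYSCALL_CLEAR_WATCHPOINT, 0, 0, 0 };
    return r;
}

struct SyscallRegs safeReadArgs(uint32_t addr)
{
    struct SyscallRegs r = { SYSCALL_SAFE_READ, addr, 0, 0 };
    return r;
}
```

For the ROM case described above, a script might request setWatchpointArgs(0, 4, WATCH_READ), a read watchpoint on address 0; the 4-byte size is an assumption of this example.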


Flash Entrypoints

The other entrypoints are simple. The mass_erase() entrypoint takes no parameters and performs whatever this chip considers to be a mass erase. If the chip has no such functionality, this entrypoint may be omitted, and flash will be erased block by block. Keep in mind that in some chips a mass erase is not exactly the same as erasing all the flash block by block, so do consider implementing it. This entrypoint returns a value in the R0 register to indicate success: zero indicates failure, and anything else is considered success. The block_erase() entrypoint receives a single parameter in R0 on entry: the start address of the flash block to erase. It should erase that block and return the result in R0, just like mass_erase(). This entrypoint is also optional; if it is omitted, users can mass-erase the chip before each write. Once again, although it is optional, it is a good idea to implement it.
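Since an unsupported entrypoint's slot holds exactly .word 0, a host can tell which entrypoints a script implements and pick an erase method accordingly. The sketch below assumes the host has the script image as 32-bit words starting at its load address, and that nop and b 1b assemble to 16-bit Thumb instructions, so the syscall stub occupies the first word. The two policies restate the fallbacks described above; they are illustrations, not CortexProg's actual code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Order of the 4-byte jump slots that follow the 4-byte syscall stub. */
enum Entrypoint {
    EP_INIT_STAGE_1, EP_INIT_STAGE_2, EP_INIT_STAGE_3,
    EP_MASS_ERASE, EP_BLOCK_ERASE, EP_BLOCK_WRITE, EP_CPUID_VERIFY
};

/* Address of an entrypoint's slot: skip the syscall stub (nop + b 1b). */
uint32_t entrypointAddr(uint32_t loadAddr, enum Entrypoint ep)
{
    return loadAddr + 4u + 4u * (uint32_t)ep;
}

/* A slot containing exactly .word 0 marks an unsupported entrypoint. */
bool entrypointPresent(const uint32_t *image, enum Entrypoint ep)
{
    return image[1 + (int)ep] != 0;
}

enum EraseMethod { ERASE_METHOD_MASS, ERASE_METHOD_BLOCKS, ERASE_METHOD_NONE };

/* Erasing the whole chip: prefer mass_erase(), else go block by block. */
enum EraseMethod chooseFullChipErase(const uint32_t *image)
{
    if (entrypointPresent(image, EP_MASS_ERASE)) return ERASE_METHOD_MASS;
    if (entrypointPresent(image, EP_BLOCK_ERASE)) return ERASE_METHOD_BLOCKS;
    return ERASE_METHOD_NONE;
}

/* Erasing before a partial write: block_erase() if present; otherwise the
 * only option is a mass erase, after which everything must be rewritten. */
enum EraseMethod choosePartialErase(const uint32_t *image)
{
    if (entrypointPresent(image, EP_BLOCK_ERASE)) return ERASE_METHOD_BLOCKS;
    if (entrypointPresent(image, EP_MASS_ERASE)) return ERASE_METHOD_MASS;
    return ERASE_METHOD_NONE;
}

/* mass_erase(), block_erase() and block_write() signal success with nonzero R0. */
bool entrypointSucceeded(uint32_t r0)
{
    return r0 != 0;
}
```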

The block_write() entrypoint is the most interesting of these. Its purpose is to write flash. Its only parameter, just like block_erase(), is the address of a write block, passed in R0. Where is the data then? It is staged at the address given in the device script footer. The amount of data will always match the write block size described in the flash layout for this exact block starting at this exact address. The return code is, just like above, stored in R0.
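The host side of a write can be sketched like this: fill a staging buffer for each write block, copy it to the staging address from the script footer, then run block_write() with the block address in R0. The two debugger hooks are hypothetical function pointers, and the sketch assumes a single nonzero write size and an image that starts on a write-block boundary.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Fill the staging buffer for the write block at blockAddr from an image
 * loaded at imageBase; bytes past the end of the image are zero-padded. */
void stageBlock(uint8_t *staging, uint32_t writeSize, uint32_t blockAddr,
                const uint8_t *image, uint32_t imageBase, uint32_t imageLen)
{
    memset(staging, 0, writeSize);
    uint64_t imageEnd = (uint64_t)imageBase + imageLen;
    if (blockAddr < imageBase || blockAddr >= imageEnd)
        return;
    uint64_t avail = imageEnd - blockAddr;
    uint32_t n = avail < writeSize ? (uint32_t)avail : writeSize;
    memcpy(staging, image + (blockAddr - imageBase), n);
}

/* Hypothetical debugger hooks: copy bytes into chip RAM, and run the
 * block_write() entrypoint with R0 = r0In, returning its R0 in *r0Out. */
typedef bool (*TargetWriteFn)(uint32_t addr, const uint8_t *data, uint32_t len);
typedef bool (*RunBlockWriteFn)(uint32_t r0In, uint32_t *r0Out);

bool writeImage(const uint8_t *image, uint32_t imageBase, uint32_t imageLen,
                uint32_t writeSize, uint32_t stagingAddr, uint8_t *scratch,
                TargetWriteFn targetWrite, RunBlockWriteFn runBlockWrite)
{
    uint64_t end = (uint64_t)imageBase + imageLen;
    for (uint64_t addr = imageBase; addr < end; addr += writeSize) {
        uint32_t r0 = 0;
        stageBlock(scratch, writeSize, (uint32_t)addr, image, imageBase, imageLen);
        if (!targetWrite(stagingAddr, scratch, writeSize))
            return false;                  /* could not fill the staging area */
        if (!runBlockWrite((uint32_t)addr, &r0) || r0 == 0)
            return false;                  /* zero in R0 means failure */
    }
    return true;
}
```

The scratch buffer must hold at least writeSize bytes. Blocks are assumed to be already erased, using one of the erase methods described above.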


The Process for Writing a Device Script

To write a device script, there are a few steps to follow. It is usually simplest to start from an existing script and modify it. Once you figure out whether the chip is a Cortex-M0/M0+ or M3/M4/M7, rename the script appropriately (the m0 or m3 part of the file name). Next, you'll need the ROMTABLE ID values. Few, if any, manufacturers document these, so the simplest way to find them is to connect the chip in question to a CortexProg and run the command CortexProg info. This produces a lot of output, including the array of ROMTABLE ID values. In almost all cases, the mask values you want are all 0xFFFFFFFF and the match values you want are the ID values CortexProg shows you. In rare cases, multiple ID values exist within a single chip family; try a few chips to get an idea of this and adjust the mask and match values appropriately. Next, it is a good idea to look for any sort of "device id" memory locations in the chip documentation. If any exist, create MatchVals for them. If you find none, look in the documentation for unusual memory locations (something that most chips do not have). For example, PSoC4's SFLASH at 0x0FFF0000 is a rather uncommon location. A MatchVal with mask and value set to zero can be used to verify that this address exists, reducing the chance that your script matches another device. It is very uncommon to need a cpuid() entrypoint. The next step is to figure out the layout of the chip's flash. In most cases this is just some number of identically sized flash blocks starting at 0x00000000; sometimes it is more complex. Most chip families have members with differing amounts of flash, and there is usually a register that describes the flash size. When there is not, one can run some flash operation on successive flash addresses and see where it starts returning errors to determine how much flash there is. Operations like "verify" are usually useful for this. See the PSoC4 script for an example.
If all else fails, try syscall() number 2 (safe read) to probe how far you can read, and gauge the flash size from that. Then write the code that lays out the flash info structures in RAM. If the chip has disjoint flash areas, names are nice to have, so lay out that info as well. In most cases a single init function is enough, and init_stage_2() and init_stage_3() are not implemented. The exception is when you need to let a ROM run: in init_stage_1() you set a watchpoint to catch it and let it run, and when the watchpoint hits, init_stage_2() is called and can finish the job. The last step is to write the code to erase and write flash. Here, just follow what the chip documentation says should be done. The examples provided can be used as a reference: download the Reference Manuals for the example chips and compare them to the code in the sample device scripts. At this point, after testing, your device script should be complete. If you want, send it to us and we'll post it on the site for others to use and enjoy.
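Two of these steps lend themselves to small C sketches. The first derives ROMTABLE masks and match values from the IDs of several chips, keeping only the bits that agree on all of them. The second gauges flash size with safe reads by binary search, assuming reads past the end of flash fail rather than returning mirrored data. In a real script the probe would be assembly calling syscall with R0 = 2; SafeReadFn and fakeSafeRead are hypothetical stand-ins for trying the logic on a host.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Step 1: ids holds nChips (>= 1) rows of the 8 ROMTABLE ID words, one row
 * per chip. A mask bit stays set only if that bit is identical on all chips. */
void deriveMaskAndMatch(const uint32_t *ids, size_t nChips,
                        uint32_t mask[8], uint32_t match[8])
{
    for (int w = 0; w < 8; w++) {
        uint32_t m = 0xFFFFFFFFu;
        for (size_t c = 1; c < nChips; c++)
            m &= ~(ids[c * 8 + w] ^ ids[w]);   /* clear bits that differ */
        mask[w] = m;
        match[w] = ids[w] & m;
    }
}

/* Step 2: stand-in for syscall 2 ("safe read"), which returns false
 * instead of faulting when addr cannot be read. */
typedef bool (*SafeReadFn)(void *ctx, uint32_t addr, uint32_t *value);

/* Binary search for how many blockSize blocks from base are readable,
 * probing the first word of a block. Assumes flash is readable up to some
 * block boundary and unreadable after it. */
uint32_t probeFlashBlocks(uint32_t base, uint32_t blockSize, uint32_t maxBlocks,
                          SafeReadFn safeRead, void *ctx)
{
    uint32_t lo = 0, hi = maxBlocks;             /* answer is in [lo, hi] */
    while (lo < hi) {
        uint32_t mid = lo + (hi - lo) / 2 + 1;   /* lo < mid <= hi */
        uint64_t addr = (uint64_t)base + (uint64_t)(mid - 1) * blockSize;
        uint32_t value;
        if (addr <= 0xFFFFFFFFu && safeRead(ctx, (uint32_t)addr, &value))
            lo = mid;                            /* block mid-1 is readable */
        else
            hi = mid - 1;                        /* block mid-1 is not */
    }
    return lo;
}

/* Host-side fake for trying the search: addresses below *(uint32_t *)ctx
 * read as erased flash, everything else "faults". */
bool fakeSafeRead(void *ctx, uint32_t addr, uint32_t *value)
{
    if (addr >= *(const uint32_t *)ctx)
        return false;
    *value = 0xFFFFFFFFu;
    return true;
}
```

Bits that happen to agree on the sampled chips may still differ on chips you have not tried, so the derived masks are a starting point rather than a final answer.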


Special Scripts

Some chips have special operations that need to be performed in complicated ways. Examples are writing the option bytes in STM32F1 and unlocking various chips from code readout protection. Such operations usually require much lower-level access than simple memory I/O, and some require parameters. For all of these cases, CortexProg provides a framework: it embeds a Lua interpreter. You can write relatively complex Lua scripts that have access to the debugger and the debugged chip. The exported functions range from operations as low-level as raw SWD transactions to ones as high-level as memory and register access and flash writing via a device script. The provided samples include unlocking chips from readout protection, PSoC4 ROM dumping, etc. Special scripts have two required functions: init() and main(step, haveDbg, haveCpu, haveScpt). init() returns an integer that is an OR-ed mask of what this special script needs. The defines for these bits all have names starting with TOOL_OP_WANTS_ or TOOL_OP_NEEDS_, and they are used to make sure the script gets what it needs. Through this mechanism a special script can request that the chip be identified, a device script be loaded, etc. The main() function is called multiple times, with step indicating the stage CortexProg is at; the stages are the defines whose names start with TOOL_OP_STEP_. Here is the information on the exported defines and functions.

TOOL_OP_WANTS_DEBUGGER        0x00000001  //wants debugger but can live without it
TOOL_OP_NEEDS_DEBUGGER        0x00000002  //needs debugger
TOOL_OP_WANTS_CPU             0x00000004  //wants cpu to be attached but can live without it
TOOL_OP_NEEDS_CPU             0x00000008  //needs cpu to be attached
TOOL_OP_WANTS_SCRIPT          0x00000010  //wants script but can live without it
TOOL_OP_NEEDS_SCRIPT          0x00000020  //needs script
TOOL_OP_NEEDS_SCPT_WRITE      0x00000040  //needs script to be able to write
TOOL_OP_NEEDS_SCPT_ERASEBLOCK 0x00000080  //needs script to be able to erase a block
TOOL_OP_NEEDS_SCPT_ERASEALL   0x00000100  //needs script to be able to erase whole chip

TOOL_OP_STEP_PRE_DEBUGGER     0 //before debugger is contacted
TOOL_OP_STEP_PRE_DEBUGGER_ID  1 //before debugger is identified
TOOL_OP_STEP_PRE_CPUID        2 //before CPUID is done
TOOL_OP_STEP_PRE_SCRIPT       3 //before script is loaded
TOOL_OP_STEP_POST_SCRIPT      4 //after script is loaded (but before init is called)
TOOL_OP_STEP_POST_SCRIPT_INIT 5 //after script is inited (do not do work here, just preflight it)
TOOL_OP_STEP_MAIN             6 //do actual work here

CPU_STAT_CODE_HALT_OR_STEP    0x01 //halt or step happened
CPU_STAT_CODE_BKPT            0x02 //BKPT executed
CPU_STAT_CODE_DWPT            0x04 //data watchpoint
CPU_STAT_CODE_VCATCH          0x08 //vector happened
CPU_STAT_CODE_EXTERNAL        0x10 //external debug request

SCRIPT_OP_FLAG_HAVE_ERASE_ALL    1
SCRIPT_OP_FLAG_HAVE_ERASE_BLOCK  2
SCRIPT_OP_FLAG_HAVE_WRITE_BLOCK  4

//funcs
dbgSwdRead(ap, a23) -> u32 OR nil
dbgSwdWrite(ap, a23, u32 val) -> true OR nil
cpuWordRead(addr) -> u32 or nil
cpuWordWrite(addr, u32 val) -> true or nil
cpuRegGet(u32 regNum) -> u32 or nil
cpuRegSet(u32 regNum, u32 val) -> true or nil
cpuStop() -> CPU_STAT_CODE_* or nil
cpuReset() -> true or nil
cpuGo() -> true or nil
cpuStep() -> CPU_STAT_CODE_* or nil
cpuIsStoppedAndWhy() -> CPU_STAT_CODE_* or nil
cpuHasFpu() -> bool
cpuIsV7() -> bool

scriptGetFlashWriteStageAreaAddr() -> u32
scriptGetSupportedOps() -> OR mask of SCRIPT_OP_FLAG_HAVE_*
scriptGetFlashBlockSize(u32 base, bool forWrite) -> u32 or nil // get flash block info

scriptEraseAll() -> bool
scriptEraseBlock(u32 addr) -> bool
scriptWriteBlock(u32 addr) -> bool
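Special scripts themselves are Lua, and their init() would simply OR the TOOL_OP_ constants together. To illustrate what the NEEDS_ bits mean, here is a hypothetical C model of the kind of check the host could make before running a special script. It is not CortexProg's implementation, and mapping the SCPT_ bits to the SCRIPT_OP_FLAG_HAVE_ flags is an assumption based on their names.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOOL_OP_WANTS_DEBUGGER        0x00000001
#define TOOL_OP_NEEDS_DEBUGGER        0x00000002
#define TOOL_OP_WANTS_CPU             0x00000004
#define TOOL_OP_NEEDS_CPU             0x00000008
#define TOOL_OP_WANTS_SCRIPT          0x00000010
#define TOOL_OP_NEEDS_SCRIPT          0x00000020
#define TOOL_OP_NEEDS_SCPT_WRITE      0x00000040
#define TOOL_OP_NEEDS_SCPT_ERASEBLOCK 0x00000080
#define TOOL_OP_NEEDS_SCPT_ERASEALL   0x00000100

#define SCRIPT_OP_FLAG_HAVE_ERASE_ALL    1
#define SCRIPT_OP_FLAG_HAVE_ERASE_BLOCK  2
#define SCRIPT_OP_FLAG_HAVE_WRITE_BLOCK  4

/* Hypothetical check: every NEEDS_ bit must be satisfied; WANTS_ bits
 * never prevent a special script from running. */
bool toolNeedsMet(uint32_t req, bool haveDbg, bool haveCpu, bool haveScpt,
                  uint32_t scptOps)
{
    if ((req & TOOL_OP_NEEDS_DEBUGGER) && !haveDbg) return false;
    if ((req & TOOL_OP_NEEDS_CPU) && !haveCpu) return false;
    if ((req & TOOL_OP_NEEDS_SCRIPT) && !haveScpt) return false;
    if ((req & TOOL_OP_NEEDS_SCPT_WRITE) &&
        !(haveScpt && (scptOps & SCRIPT_OP_FLAG_HAVE_WRITE_BLOCK))) return false;
    if ((req & TOOL_OP_NEEDS_SCPT_ERASEBLOCK) &&
        !(haveScpt && (scptOps & SCRIPT_OP_FLAG_HAVE_ERASE_BLOCK))) return false;
    if ((req & TOOL_OP_NEEDS_SCPT_ERASEALL) &&
        !(haveScpt && (scptOps & SCRIPT_OP_FLAG_HAVE_ERASE_ALL))) return false;
    return true;
}
```

For example, a script that needs a CPU and a device script able to write, and would like a debugger, would return 0x69 (WANTS_DEBUGGER | NEEDS_CPU | NEEDS_SCRIPT | NEEDS_SCPT_WRITE) from init().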


Samples

CortexProg sources ship with lots of examples of everything described in this document. Download the sources and experiment with them to learn more and to write your own chip support scripts or special scripts. If you write some that you want to share, please contact us, and we'll be happy to post them here on the CortexProg website.


Build one

[Figure: CortexProg-AVR schematic]
You can easily build a CortexProg-AVR device using an ATtiny85. It is fully supported by the PC-side CortexProg tool in every way. Simply put, it is just the standard V-USB device with the ModulaR bootloader and the CortexProg-AVR firmware on top. You can read more about it all on this site. The site's downloads page will also host free firmware updates that add future features and/or bug fixes to your homebuilt CortexProg-AVR.

© 2017-2018