Firmware
========

The project's firmware is split into three parts: the bootloader, the main
firmware and the test firmware.

PMBus command infrastructure
----------------------------

A common command handling infrastructure has been put in place, such that both
the main firmware and the bootloader can easily implement different subsets of
PMBus and extended commands. The basic construct of this implementation is the
:code:`cmd_t` structure:

.. doxygenstruct:: cmd_t
   :project: common
   :members:

An array of these structs makes up a command space:

.. doxygenstruct:: cmd_space_t
   :project: common
   :members:

From the user's point of view, these structures are defined and used just once,
in the function

.. doxygenfunction:: setup_I2C_slave
   :project: common
   :outline:

This function will configure the interrupt handlers, below, with the command
spaces defined in the specific user implementation (main firmware or
bootloader). From that point on, the only interaction will be through the
user-defined callbacks.

.. doxygengroup:: I2C_int_handlers
   :project: common
   :outline:
   :content-only:

Bootloader
----------

The bootloader, after bringing up the device, will check for the special word
:code:`0xBEC0ABCD` in the flash storage (see struct below) and, depending on
the value, will either hand control to the main FW or enter remote programming
mode.

.. doxygenstruct:: user_flash_t
   :project: common
   :members:

PMBus and extended commands overview
------------------------------------

The full list of PMBus and extended commands implemented by the MoniMod can be
found in :numref:`Tables %s ` and :numref:`%s `.

All physical quantities are expressed in the 16-bit PMBus Linear data format
(:numref:`linear_fmt`), instead of the somewhat more complex Direct format that
PMBus also supports. An 11-bit mantissa (Y) and a 5-bit exponent (N), both
expressed in 2's complement, form a floating-point number X according to
:math:`X = Y \cdot 2^N`.

.. _linear_fmt:

.. figure:: figures/latex/linear.png
   :align: center
   :scale: 50%

   The PMBus Linear data format

.. _pmbus_cmds:

.. table:: PMBus commands implemented by the MoniMod

   +--------------+--------------------+------------------------+-------------+----------------------+
   | Command code | Command name       | Transaction type       | Data length | Description          |
   +==============+====================+========================+=============+======================+
   | 00           | PAGE               | Byte write / read      | 1           | set / get page       |
   +--------------+--------------------+------------------------+-------------+----------------------+
   | 1A           | QUERY              | Block w / r proc. call | 1           | query cmd props      |
   +--------------+--------------------+------------------------+-------------+----------------------+
   | 3A           | FAN_CONFIG_1_2     | Byte write / read      | 1           | config fans 1&2      |
   +--------------+--------------------+------------------------+-------------+----------------------+
   | 3B           | FAN_COMMAND_1      | Word write / read      | 2           | set fan 1 speed      |
   +--------------+--------------------+------------------------+-------------+----------------------+
   | 3C           | FAN_COMMAND_2      | Word write / read      | 2           | set fan 2 speed      |
   +--------------+--------------------+------------------------+-------------+----------------------+
   | 3D           | FAN_CONFIG_3_4     | Byte write / read      | 1           | config fan 3         |
   +--------------+--------------------+------------------------+-------------+----------------------+
   | 3E           | FAN_COMMAND_3      | Word write / read      | 2           | set fan 3 speed      |
   +--------------+--------------------+------------------------+-------------+----------------------+
   | 8B           | READ_VOUT          | Word read              | 2           | read voltage         |
   +--------------+--------------------+------------------------+-------------+----------------------+
   | 8C           | READ_IOUT          | Word read              | 2           | read current         |
   +--------------+--------------------+------------------------+-------------+----------------------+
   | 8D           | READ_TEMPERATURE_1 | Word read              | 2           | read temp. sensor 1  |
   +--------------+--------------------+------------------------+-------------+----------------------+
   | 8E           | READ_TEMPERATURE_2 | Word read              | 2           | read temp. sensor 2  |
   +--------------+--------------------+------------------------+-------------+----------------------+
   | 8F           | READ_TEMPERATURE_3 | Word read              | 2           | read temp. sensor 3  |
   +--------------+--------------------+------------------------+-------------+----------------------+
   | 90           | READ_FAN_SPEED_1   | Word read              | 2           | read fan 1 speed     |
   +--------------+--------------------+------------------------+-------------+----------------------+
   | 91           | READ_FAN_SPEED_2   | Word read              | 2           | read fan 2 speed     |
   +--------------+--------------------+------------------------+-------------+----------------------+
   | 92           | READ_FAN_SPEED_3   | Word read              | 2           | read fan 3 speed     |
   +--------------+--------------------+------------------------+-------------+----------------------+
   | 96           | READ_POUT          | Word read              | 2           | read power           |
   +--------------+--------------------+------------------------+-------------+----------------------+
   | 99           | MFR_ID             | Block read             | var         | manufacturer ID      |
   +--------------+--------------------+------------------------+-------------+----------------------+
   | 9A           | MFR_MODEL          | Block read             | var         | model                |
   +--------------+--------------------+------------------------+-------------+----------------------+
   | 9B           | MFR_REVISION       | Block read             | var         | revision             |
   +--------------+--------------------+------------------------+-------------+----------------------+
   | 9C           | MFR_LOCATION       | Block read             | var         | location             |
   +--------------+--------------------+------------------------+-------------+----------------------+
   | 9D           | MFR_DATE           | Block read             | var         | date                 |
   +--------------+--------------------+------------------------+-------------+----------------------+
   | 9E           | MFR_SERIAL         | Block read             | var         | serial number        |
   +--------------+--------------------+------------------------+-------------+----------------------+
   | FF           | PMBUS_COMMAND_EXT  | Extended command       | var         | access extended cmds |
   +--------------+--------------------+------------------------+-------------+----------------------+

.. _ext_cmds:

.. table:: Extended commands implemented by the MoniMod

   +--------------+-------------------+------------------------+-------------+-------------------------------+
   | Command code | Command name      | Transaction type       | Data length | Description                   |
   +==============+===================+========================+=============+===============================+
   | 01           | WRITTEN_FW_SIZE   | Word write             | 2           | size of the FW to be written  |
   +--------------+-------------------+------------------------+-------------+-------------------------------+
   | 02           | WRITTEN_FW_BLOCK  | MultiByte write        | 8           | FW block to be written        |
   +--------------+-------------------+------------------------+-------------+-------------------------------+
   | 03           | WRITTEN_FW_CHKSUM | Word write             | 2           | checksum of the written FW    |
   +--------------+-------------------+------------------------+-------------+-------------------------------+
   | 05           | BOOT_NEW_FW       | Byte write             | 1           | turn on btldr pgm mode, reset |
   +--------------+-------------------+------------------------+-------------+-------------------------------+
   | 06           | UC_RESET          | Byte write             | 1           | reset the uC                  |
   +--------------+-------------------+------------------------+-------------+-------------------------------+
   | A0           | TMR_ERROR_CNT     | MultiByte read         | 4           | get TMR error count           |
   +--------------+-------------------+------------------------+-------------+-------------------------------+
   | B0           | USE_PEC           | Byte write / read      | 1           | turn PEC on / off             |
   +--------------+-------------------+------------------------+-------------+-------------------------------+
   | C0           | TEMP_CURVE_POINTS | MultiByte write / read | 13          | set / get temp. curve points  |
   +--------------+-------------------+------------------------+-------------+-------------------------------+
   | C1           | TEMP_MATRIX_ROW   | MultiByte write / read | 7           | set / get temp. matrix points |
   +--------------+-------------------+------------------------+-------------+-------------------------------+
   | C4           | TC_ONOFF          | Byte write / read      | 1           | turn temp. control on / off   |
   +--------------+-------------------+------------------------+-------------+-------------------------------+

Detailed list of PMBus and extended commands
--------------------------------------------

PAGE
~~~~

| Command code: **00**
| Transaction type: **Byte write / read**
| Data length: **1**

The PAGE command is used to select a power rail for the READ_VOUT, READ_IOUT
and READ_POUT commands. Allowed values for the page parameter are
:math:`0 \leq N \leq 2`.

QUERY
~~~~~

| Command code: **1A**
| Transaction type: **Block w / r proc. call**
| Data length: **1**

The QUERY command takes a command code as an argument and replies with
information on the command: whether it is supported, whether read or write is
supported, and what data format it works with.

FAN_CONFIG_n_m
~~~~~~~~~~~~~~

| Command codes: **3A, 3D**
| Transaction type: **Byte write / read**
| Data length: **1**

The FAN_CONFIG_1_2 and FAN_CONFIG_3_4 commands are used to configure the fans
at positions 1, 2, and 3. The format of the configuration byte can be seen in
:numref:`fan_config_cmd`. The two bits that set the tachometer pulses per
revolution take the values 0--3, which correspond to 1--4 pulses per
revolution.

.. _fan_config_cmd:

.. table:: FAN_CONFIG_1_2 and FAN_CONFIG_3_4 data byte format

   +--------+-----------+-----------------------------------+
   | Bit(s) | Value     | Meaning                           |
   +========+===========+===================================+
   | 7      | 1         | Fan 1 / 3 installed               |
   +        +-----------+-----------------------------------+
   |        | 0         | Fan 1 / 3 not installed           |
   +--------+-----------+-----------------------------------+
   | 6      | 1         | Fan 1 / 3 commanded in RPM        |
   +        +-----------+-----------------------------------+
   |        | 0         | Fan 1 / 3 commanded in duty cycle |
   +--------+-----------+-----------------------------------+
   | 5:4    | 0--3      | Fan 1 / 3 tachometer pulses / rev |
   +--------+-----------+-----------------------------------+
   | 3      | 1         | Fan 2 installed                   |
   +        +-----------+-----------------------------------+
   |        | 0         | Fan 2 not installed               |
   +--------+-----------+-----------------------------------+
   | 2      | 1         | Fan 2 commanded in RPM            |
   +        +-----------+-----------------------------------+
   |        | 0         | Fan 2 commanded in duty cycle     |
   +--------+-----------+-----------------------------------+
   | 1:0    | 0--3      | Fan 2 tachometer pulses / rev     |
   +--------+-----------+-----------------------------------+

FAN_COMMAND_n
~~~~~~~~~~~~~

| Command codes: **3B, 3C, 3E**
| Transaction type: **Word write / read**
| Data length: **2**

The FAN_COMMAND_n commands set the desired speed of the attached fans. The
value is interpreted either as RPM (when the fan is configured to be commanded
in RPM) or as a duty cycle in the range 0--1000.

READ_VOUT
~~~~~~~~~

| Command code: **8B**
| Transaction type: **Word read**
| Data length: **2**

The READ_VOUT command is used to get the measured voltage of the rail indicated
by the last PAGE command (by default, the first rail).

READ_IOUT
~~~~~~~~~

| Command code: **8C**
| Transaction type: **Word read**
| Data length: **2**

The READ_IOUT command is used to get the measured current of the rail indicated
by the last PAGE command (by default, the first rail).
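
The words returned by READ_VOUT and READ_IOUT, like every physical quantity
reported by the MoniMod, use the PMBus Linear format. As an illustration, a
host-side decoder could look like the following Python sketch. The function
name ``decode_linear`` is hypothetical and not part of the project, and the
field layout is assumed to be the standard PMBus one (exponent N in the upper
5 bits, mantissa Y in the lower 11 bits).

```python
def decode_linear(word: int) -> float:
    """Decode a 16-bit PMBus Linear word into X = Y * 2**N.

    Assumed layout (standard PMBus): bits 15..11 hold the exponent N,
    bits 10..0 hold the mantissa Y, both in two's complement.
    """
    if not 0 <= word <= 0xFFFF:
        raise ValueError("expected a 16-bit word")

    exponent = word >> 11      # upper 5 bits: N
    mantissa = word & 0x07FF   # lower 11 bits: Y

    # Sign-extend both two's complement fields.
    if exponent & 0x10:
        exponent -= 0x20
    if mantissa & 0x400:
        mantissa -= 0x800

    return mantissa * 2.0 ** exponent
```

For example, the word ``0xE860`` carries N = -3 and Y = 96, so it decodes to
96 / 8 = 12.0.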

READ_TEMPERATURE_n
~~~~~~~~~~~~~~~~~~

| Command codes: **8D, 8E, 8F**
| Transaction type: **Word read**
| Data length: **2**

The READ_TEMPERATURE_n commands return the measured temperature from the three
installed temperature sensors.

READ_FAN_SPEED_n
~~~~~~~~~~~~~~~~

| Command codes: **90, 91, 92**
| Transaction type: **Word read**
| Data length: **2**

The READ_FAN_SPEED_n commands return the speed of an installed fan, or 0 if no
fan is installed in the corresponding location.

READ_POUT
~~~~~~~~~

| Command code: **96**
| Transaction type: **Word read**
| Data length: **2**

The READ_POUT command is used to get the measured power of the rail indicated
by the last PAGE command (by default, the first rail).

MFR_ID
~~~~~~

| Command code: **99**
| Transaction type: **Block read**
| Data length: **var**

This returns the manufacturer ID string, "CERN (BE/CO)".

MFR_MODEL
~~~~~~~~~

| Command code: **9A**
| Transaction type: **Block read**
| Data length: **var**

This returns the manufacturer model string, "DI/OT MoniMod".

MFR_REVISION
~~~~~~~~~~~~

| Command code: **9B**
| Transaction type: **Block read**
| Data length: **var**

This returns the manufacturer revision string.

MFR_LOCATION
~~~~~~~~~~~~

| Command code: **9C**
| Transaction type: **Block read**
| Data length: **var**

This returns the manufacturer location string, "Geneva".

MFR_DATE
~~~~~~~~

| Command code: **9D**
| Transaction type: **Block read**
| Data length: **var**

This returns the manufacturer date string, which currently corresponds to the
date of the last release (and not, for example, to the date of the build in
use).

MFR_SERIAL
~~~~~~~~~~

| Command code: **9E**
| Transaction type: **Block read**
| Data length: **var**

This returns a manufacturer serial string (currently unused, returns
"123456789").

PMBUS_COMMAND_EXT
~~~~~~~~~~~~~~~~~

| Command code: **FF**
| Transaction type: **Extended command**
| Data length: **var**

To access the extended commands, described below, the PMBUS_COMMAND_EXT command
is used: the command code of the extended command is passed as the next data
byte and the rest of the transaction continues like a regular command.

(ext.) WRITTEN_FW_SIZE
~~~~~~~~~~~~~~~~~~~~~~

| Ext. Command code: **01**
| Transaction type: **Word write**
| Data length: **2**

Before writing a new FW binary through the bootloader, its size in bytes has to
be given using this command.

(ext.) WRITTEN_FW_BLOCK
~~~~~~~~~~~~~~~~~~~~~~~

| Ext. Command code: **02**
| Transaction type: **MultiByte write**
| Data length: **8**

A new binary is sent to the bootloader in consecutive chunks of 8 bytes, using
this command.

(ext.) WRITTEN_FW_CHKSUM
~~~~~~~~~~~~~~~~~~~~~~~~

| Ext. Command code: **03**
| Transaction type: **Word write**
| Data length: **2**

After setting the size of the FW binary with WRITTEN_FW_SIZE and writing it
with the WRITTEN_FW_BLOCK command, its SYS-V checksum should be verified using
this command. This command also resets the write pointers.

(ext.) BOOT_NEW_FW
~~~~~~~~~~~~~~~~~~

| Ext. Command code: **05**
| Transaction type: **Byte write**
| Data length: **1**

The BOOT_NEW_FW command passes execution to the bootloader. A special code is
written to the flash memory to direct the bootloader to switch to PMBus mode,
supporting extended commands. When issued while already in bootloader mode, the
command instead clears the special code and boots the main FW.

(ext.) UC_RESET
~~~~~~~~~~~~~~~

| Ext. Command code: **06**
| Transaction type: **Byte write**
| Data length: **1**

Writing any byte to this command triggers a uC reset.

(ext.) TMR_ERROR_CNT
~~~~~~~~~~~~~~~~~~~~

| Ext. Command code: **A0**
| Transaction type: **MultiByte read**
| Data length: **4**

When software mitigation through COAST is enabled (see :ref:`coast_sec`), one
can access the TMR_ERROR_CNT counter using this command.
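
Taken together, the PMBUS_COMMAND_EXT framing and the WRITTEN_FW_SIZE,
WRITTEN_FW_BLOCK and WRITTEN_FW_CHKSUM commands described above suggest an
upload sequence that a host tool might prepare roughly as in the Python sketch
below. Everything beyond the command codes and the 8-byte block size is an
assumption made for illustration: the helper names are hypothetical, words are
assumed to be sent low byte first, the last block is assumed to be padded with
``0xFF``, the checksum is assumed to cover the unpadded image and to follow the
definition used by the classic ``sum -s`` utility, and the frames leave out the
device address and any PEC byte.

```python
PMBUS_COMMAND_EXT = 0xFF
WRITTEN_FW_SIZE = 0x01
WRITTEN_FW_BLOCK = 0x02
WRITTEN_FW_CHKSUM = 0x03
BLOCK_SIZE = 8


def sysv_checksum(data: bytes) -> int:
    """16-bit SYS-V checksum: byte sum with the carries folded back in."""
    total = sum(data)
    folded = (total & 0xFFFF) + ((total & 0xFFFFFFFF) >> 16)
    return (folded & 0xFFFF) + (folded >> 16)


def ext_frame(ext_code: int, payload: bytes) -> bytes:
    """PMBUS_COMMAND_EXT, then the extended command code, then its data."""
    return bytes([PMBUS_COMMAND_EXT, ext_code]) + payload


def firmware_update_frames(image: bytes, pad: int = 0xFF) -> list:
    """Return the frames for one upload: size, data blocks, checksum."""
    if len(image) > 0xFFFF:
        raise ValueError("image size does not fit in a 2-byte word")

    # Assumption: 2-byte words are transmitted low byte first.
    frames = [ext_frame(WRITTEN_FW_SIZE, len(image).to_bytes(2, "little"))]

    for offset in range(0, len(image), BLOCK_SIZE):
        block = image[offset:offset + BLOCK_SIZE]
        # Assumption: a short final block is padded up to 8 bytes.
        block = block.ljust(BLOCK_SIZE, bytes([pad]))
        frames.append(ext_frame(WRITTEN_FW_BLOCK, block))

    checksum = sysv_checksum(image).to_bytes(2, "little")
    frames.append(ext_frame(WRITTEN_FW_CHKSUM, checksum))
    return frames
```

A 10-byte image, for instance, yields four frames: one size frame, two block
frames (the second one padded) and one checksum frame.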

(ext.) USE_PEC
~~~~~~~~~~~~~~

| Ext. Command code: **B0**
| Transaction type: **Byte write / read**
| Data length: **1**

The SMBus specification indicates that a device's PEC support could be enabled
or disabled at will. Using this command with a zero byte disables PEC; any
non-zero value enables it. The command itself is always sent without a PEC byte
appended, regardless of whether PEC is enabled.

(ext.) TEMP_CURVE_POINTS
~~~~~~~~~~~~~~~~~~~~~~~~

| Ext. Command code: **C0**
| Transaction type: **MultiByte write / read**
| Data length: **13**

As described in the :ref:`temp_control_sec` section, the temperature curve can
be set separately for each fan. To do this, the format in
:numref:`temp_curve_fmt` has to be used.

.. _temp_curve_fmt:

.. figure:: figures/latex/temp_curve_format.png
   :align: center
   :scale: 33%

   Temperature curve data frame

(ext.) TEMP_MATRIX_ROW
~~~~~~~~~~~~~~~~~~~~~~

| Ext. Command code: **C1**
| Transaction type: **MultiByte write / read**
| Data length: **7**

As described in the :ref:`temp_control_sec` section, the temperature matrix can
be set separately for each fan. The data format for the operation is
illustrated in :numref:`temp_matrix_fmt`.

.. _temp_matrix_fmt:

.. figure:: figures/latex/temp_matrix_format.png
   :align: center
   :scale: 33%

   Temperature matrix data frame

(ext.) TC_ONOFF
~~~~~~~~~~~~~~~

| Ext. Command code: **C4**
| Transaction type: **Byte write / read**
| Data length: **1**

Using the TC_ONOFF command with a zero argument disables Temperature Control,
while any non-zero value enables it.

Fan control PID
---------------

When the connected fans provide a tachometer output, fan speed control can be
enabled. This is implemented using PID controllers, with each fan having its
own instance. The main data structure of the PID implementation is

.. doxygenstruct:: pid_cntrl_t
   :members:

This is used by the main software to set the PID setpoint, and by the PID
controller to hold integration data.
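
To give an idea of what a per-fan controller state and its per-timestep update
involve, here is a generic textbook PID step written in Python. It is not the
project's implementation: the ``PidState`` fields, the gains, the output limits
of 0--1000 (borrowed from the duty cycle range of FAN_COMMAND_n) and the simple
anti-windup rule are all assumptions made for illustration, and the actual
fields are the ones documented for :code:`pid_cntrl_t`.

```python
from dataclasses import dataclass


@dataclass
class PidState:
    """Hypothetical per-fan controller state (one instance per fan)."""
    kp: float
    ki: float
    kd: float
    setpoint: float = 0.0    # desired fan speed, written by the main software
    integral: float = 0.0    # integration data held between timesteps
    prev_error: float = 0.0
    out_min: float = 0.0     # assumed output range: duty cycle 0..1000
    out_max: float = 1000.0


def pid_step(state: PidState, measured: float, dt: float) -> float:
    """Advance the controller by one timestep of length dt (seconds)."""
    error = state.setpoint - measured
    derivative = (error - state.prev_error) / dt
    candidate = state.integral + error * dt

    output = (state.kp * error
              + state.ki * candidate
              + state.kd * derivative)

    # Simple anti-windup: keep the new integral only if not saturated.
    if state.out_min <= output <= state.out_max:
        state.integral = candidate

    state.prev_error = error
    return min(max(output, state.out_min), state.out_max)
```

Each fan would own one such state object, and the step function would be called
once per timestep with the latest tachometer reading.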

The main function that has to be called every timestep is described below:

.. doxygenfunction:: pid_compute

.. _temp_control_sec:

Temperature control
-------------------

The MoniMod implements a very flexible temperature control scheme. Each fan can
be assigned its own 3-point temperature--speed curve, as in
:numref:`temp_curve_fig`. For temperatures outside the configured range, the
fan adopts the speed assigned to the minimum or maximum temperature point,
respectively.

.. _temp_curve_fig:

.. figure:: figures/latex/temp_curve.png
   :width: 50%
   :align: center

   Temperature curve

Moreover, the temperature each fan considers for its curve is a weighted sum of
all three monitored temperatures, as in :numref:`temp_matrix_fig`. This allows
one to easily configure the MoniMod to match a wide variety of fan / sensor
setups, e.g.:

* each fan is assigned its own temperature sensor
* all three temperatures are averaged to give a more precise system temperature
* one fan blows directly on a sensitive component which is monitored, while the
  other two fans handle the rest of the system

.. _temp_matrix_fig:

.. figure:: figures/latex/temp_matrix.png
   :width: 50%
   :align: center

   Temperature mixing matrix

Test firmware
-------------

To help with development, a test firmware has been written for a
`Feather M0 Basic `_ minimal development board.

Mitigation measures
-------------------

The MoniMod will be used in radiation environments. Although its function is
not critical and it can be remotely reset upon loss of communication, some
measures have been taken to minimize interruptions and data corruption, leading
to an improved QoS.

.. _coast_sec:

TMR using COAST
~~~~~~~~~~~~~~~

The `COAST `_ LLVM passes can be optionally used to automatically implement TMR
(Triple Modular Redundancy) in important and long-lived variables.
This can particularly benefit the integrity of dynamic configuration data that
gets set in the memory once and then gets read periodically, or state machines
such as the one in the I2C interrupt handlers, which is critical for stable
communications.

NOPs and trampolines
~~~~~~~~~~~~~~~~~~~~

The Program Counter is also sensitive to SEUs; in fact, execution can sometimes
jump to an invalid address. To help mitigate failures due to this mechanism,
any region of unused memory space has been filled with :code:`NOP`
instructions, followed by a small trampoline function that resets the stack
pointer and jumps to the device initialization code. Furthermore, the
instruction that comprises the main loop has been placed at a "strategic"
location, aligned to :code:`0x8000`: that way, a bit-flip in any of the lower
bits will send execution to the upper memory region, which is filled with the
NOPs and ends at the trampoline.

Watchdog
~~~~~~~~

The uC integrates a watchdog peripheral: this is fed every time the main timer
callback runs, i.e. every 10 ms. The watchdog is set to trigger if it is not
fed for 20 ms, that is, as soon as the main loop skips a beat. That ensures a
quick revival of the uC and should lead to minimal downtime.

Blind scrubbing
~~~~~~~~~~~~~~~

Blindly scrubbing the configuration of peripherals can be used to reduce
gradual corruption of their configuration during operation. The frequency has
to be carefully selected to minimize downtime.

.. note::
   This has not been implemented yet; this note is a reminder to do so.

Stack protection
~~~~~~~~~~~~~~~~

The compiler's stack protection feature is enabled to catch the corner case
where an SEU makes a loop go awry and corrupt the stack. If that happens, the
uC is quickly reset.

Toolchains
----------

The project can be built with the GCC and Clang / LLVM compilers; one can
switch between the two simply by setting a Makefile variable. Note, however,
that TMR only works with Clang.