Posted by virantha on Sat 28 December 2024

Final Version of the Enigma in Amaranth HDL

This post is part 3 of the "Enigma Cipher Chip" series:

  • Part 1 - An Enigma I Cipher Machine on Skywater 130nm Via TinyTapeout
  • Part 2 - Version 1 of the Enigma in Amaranth HDL
  • Part 3 - Final Version of the Enigma in Amaranth HDL

1   Optimized Enigma Implementation

The first part of this series described the background of the Enigma Cipher Machine with the following datapath:

plugboard

Complete encryption/decryption path

In this post, I'll wrap up this series with the final, optimized version, building on the work from Part 2.

You can find the complete source, including tests, at my TT 10 repo

2   Version 2 - Pipelined datapath

My initial design modeled the Enigma shown in the figure above as a combinational datapath. The 5-bit input character flows through each block (plugboard, three rotors, reflector, three rotors, and back through the plugboard) and comes out transformed into the final encrypted 5-bit value.

For the improved version, I made each stage into a single-cycle pipeline stage. Therefore, instead of getting a scrambled character at the next clock edge, it takes, at minimum, 9 clock cycles to get through the 9 blocks. In reality, some of the blocks can take multiple cycles to generate outputs (for example, a Rotor may need to increment before scrambling, which takes multiple cycles).

2.1   V2 Rotor

Although we could have three separate physical pipeline stages for each Rotor "slot", this would mean three copies of the same logic (since each slot could potentially implement any one of the 5 Rotor wiring patterns). To keep the area lower, I decided to instead implement the three Rotors with a single logic block that can select which slot and rotor type it's currently computing as. In other words, the 3 Rotors are collapsed into a single block as shown below:

rotor schematic

Rotor conceptual view. Note that the Slots are not exactly implemented as shown for efficiency reasons, as we'll see later.

Here's how this generic Rotor block functions:

  • en, a one-hot signal, is used to select which slot is active on any given cycle.
  • The input character (muxed_din) can come from three different places based on din_sel:
    1. The plugboard (din)
    2. The previous rotor (dout)
    3. The reflector (reflector_in)
  • The scrambled character output dout is registered.
  • There are three is_at_turnover signals to denote when each slot is at its turnover point
  • There are five control signals shared among the slots:
    1. load_start: Sets the enabled Rotor's counter value from din
    2. load_ring: Sets the enabled Rotor's ring setting value from din
    3. load_rotor_type: Sets the enabled Rotor's rotor type from din
    4. inc: Increment the enabled Rotor's counter
    5. ltor: If asserted, assume the data is flowing left-to-right (reverse path after reflector). Otherwise, assume data is flowing right-to-left

There is a state machine responsible for generating the control signals which we'll discuss later. Beyond that, the most involved piece of design is representing the wiring patterns of each Rotor, and allowing each slot to contain one of 5 different wiring patterns. Let's dive into the implementation to see how the expressiveness of design hardware in Python makes using Amaranth a much more tractable and less error-prone design exercise.

2.1.1   V2 Rotor initialization

First, let's look at the setup of the Rotor component (while omitting the ports for brevity, which are shown in the diagram above anyway).

class Rotor (wiring.Component):
    # All the ports
    # ....

    N = 3  # Number of rotor slots

    def __init__(self):
        N = self.N
        self.cnts = Array([Signal(5) for i in range(N)])   # Registers for each of the rotors (as well as start position)
        self.ring_settings = Array([Signal(5) for i in range(N)])
        self.generate_wirings()
        super().__init__()

In the constructor, I'm defining two sets of registers:

  1. cnts[2:0]: A 5-bit counter for each of the 3 rotors to keep track of the current rotational position.
  2. ring_settings[2:0]: A 5-bit register for each of the 3 rotors to save the ring setting.

We don't need to save the start setting of each Rotor because we can just load that value into cnt at the start of each encryption session.

Now, let's look at the first part of the method used to generate the wiring patterns that will swizzle each input character.

 1def generate_wirings(self):
 2    from test.enigma import Enigma
 3    _wirings = []
 4    _turnovers = []
 5
 6    for rotor_type in Rotors:
 7        # Gather up the wiring and turnover positions from the golden model
 8        rotor_class = Enigma.ROTORS[rotor_type]
 9        _wirings.append(rotor_class.wiring)
10        _turnovers.append(rotor_class.turnover)
11
12        wiring_list_right_to_left = []
13        wiring_list_left_to_right = []
14
15        for w in _wirings:
16            mapping = [ord(c)-65 for c in w]
17            mapping_ltor = [0]*26
18            for i, c in enumerate(w):
19                mapping_ltor[ord(c)-65] = i
20
21            wiring_list_right_to_left.append(Array(mapping)) # right_to_left mapping
22            wiring_list_left_to_right.append(Array(mapping_ltor))
23
24        self.Wiring_right_to_left = Array(wiring_list_right_to_left)
25        self.Wiring_left_to_right = Array(wiring_list_left_to_right)
26
27        self.turnovers = Array([Const(ord(t)-65) for t in _turnovers])

In lines 2-10 we are directly importing our golden model so there's no danger of the Rotor wiring patterns and turnover points getting out of sync. I've defined 5 different rotor types (from the WWII Enigma I Army kit), so _wirings and _turnovers will be Python lists of length 5, with each entry being a string.

Next, we convert the ASCII characters into 5-bit numbers (lines 12-18), and convert the Python lists into Amaranth Arrays that can be synthesized into table lookups in the hardware.

Now, self.Wiring_right_to_left is a two-dimensional Array. It's first-dimension can be indexed at runtime by a 3-bit number that selects the Rotor type. The result of that indexing will be the wiring pattern for that Rotor, which is an 1D Array. That wiring pattern can be indexed at runtime by a 5-bit number (input contact point) that yields the output contact point (swizzle). So, for example, self.Wiring_right_to_left[2][9] returns the output contact point if entering a Rotor of type Rotor III at the 10th input contact point.

self.Wiring_left_to_right is just the reverse mapping when going back from the reflector.

2.1.2   V2 Rotor Computation

Now let's look at the elaborate method that Amaranth uses to build the hardware. The first thing we'll do in lines 2-11 is create a Module for the Rotor, and then define some hardware signals/wires with associated bit-widths. Then, we'll define a register that keeps track of what Rotor type is in each slot: self.slot[0] will store the Rotor type in Slot 0 (right-most Rotor). We initialize this (unless the state machine loads this in explicitly) with Rotor I, Rotor II, and Rotor III in slots 0, 1, and 2 respectively.

Next, we instantiate the input mux (shown in rotor_schematic on the top-right) with a Switch statement.

 1def elaborate(self, platform):
 2    m = Module()
 3    N = self.N
 4
 5    ring_setting = Signal(5)       # The enabled Rotor's ring setting
 6    cnt = Signal(5)                # The enabled Rotor's position
 7    cnt_ring_combined = Signal(5)  # Computed position based on current cnt, and ring_setting
 8    muxed_din = Signal(5)          # Output of input data mux
 9    right_ptr = Signal(5)          # Index into wiring swizzle table
10    wiring_rtol = Signal(5)        # The output character after swizzling going rtol
11    wiring_ltor = Signal(5)        # The output character after swizzling if going ltor
12    cnt_eq_25 = Signal(1)    # Whether the cnt is going to overflow
13
14    # Picking the rotor for each slot
15    self.slot = Array([Signal(3, init=0), Signal(3,init=1), Signal(3, init=2)]) # Choose from one of 8 rotor types for each of the 3 slots
16
17    # Mux on the inputs
18    with m.Switch(self.din_sel):
19        with m.Case(1):
20            m.d.comb += muxed_din.eq(self.dout)
21        with m.Case(2):
22            m.d.comb += muxed_din.eq(self.reflector_in)
23        with m.Default():
24            m.d.comb += muxed_din.eq(self.din)

Now, we come to the meat of the swizzle computation with dynamically selected Rotors and slots. Here's an interesting design pattern from Amaranth for Switch/Case statements: We can generate case statements using Python loops to reduce hugely repetitive code. Below, we're switching on a one-hot encoded en signal that picks the Rotor slot. The for-loop iterates over all the one-hot codes (and also has the associated conversion to binary rotor number) to generate each Case statement. Within each Case, we can use the for-loop variables to index into the wiring tables generated above like so:

self.Wiring_right_to_left[self.slot[rotor]][right_ptr]

With the three loop-generated Case statements below, we've taken care of selecting the proper Rotor type in each slot, and getting the associated swizzled output. The synthesis tool (Yosys) should be able to optimize all the lookup tables for the rotor wiring for us.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
    with m.Switch(self.en):
         # Pick settings based on which Rotor Slot is selected by self.en (one-hot)
        for one_hot, rotor in [(0b001, 0), (0b010,1), (0b100, 2)]:
            with m.Case(one_hot):
                m.d.comb += [
                    ring_setting.eq(self.ring_settings[rotor]),
                    cnt.eq(self.cnts[rotor]),
                    cnt_eq_25.eq(cnt==25),
                    # Pick the wiring swizzles based on which Rotor type is loaded into each slot
                    wiring_rtol.eq(self.Wiring_right_to_left[self.slot[rotor]][right_ptr]),
                    wiring_ltor.eq(self.Wiring_left_to_right[self.slot[rotor]][right_ptr]),
                ]
                with m.If(self.load_start):
                    m.d.sync += self.cnts[rotor].eq(muxed_din)
                with m.Elif(self.load_ring):
                    m.d.sync += self.ring_settings[rotor].eq(muxed_din)
                with m.Elif(self.load_rotor_type):
                    m.d.sync += self.slot[rotor].eq(muxed_din[0:3])
                with m.Elif(self.inc):
                    with m.If(cnt_eq_25):
                        m.d.sync += self.cnts[rotor].eq(0)
                    with m.Else():
                        m.d.sync += self.cnts[rotor].eq(cnt+1)
        with m.Default():
            m.d.comb += [
                ring_setting.eq(0),
                cnt.eq(0)
            ]

Now, let's take a look at how the right_ptr or index into the wiring swizzles are calculated: Recall from the previous part that our calculations looked like the following:

cnt_ring_combined = (cnt - ring_setting) % 26
contact_point     = (input_char + cnt_ring_combined) % 26
exit_point = rotor_swizzle[contact_point]
result = (exit_point - cnt_ring_combined) % 26

We want to avoid the modulus operation, so we can define subtract and addition routines that check for overflow to implement the wrap-around:

# Return (a - b) % 26
def sub2_mod_26(out, a, b):
    with m.If(b > a):
        m.d.comb += out.eq( (26-(b-a))[0:5])
    with m.Else():
        m.d.comb += out.eq( (a-b)[0:5])

# Return (a + b) % 26
def add2_mod_26(sum_signal, a,b):
    s = Signal(6)
    m.d.comb += s.eq(a+b)
    with m.If(s > 25):
        m.d.comb += sum_signal.eq( (s-26)[0:5])
    with m.Else():
        m.d.comb += sum_signal.eq( s[0:5])

We make sure to slice the output signal to 5 bits, just to avoid any issues with Amaranth's implicit extension to 6-bits and subsequent comparison.

With these helper inner functions, we can then complete the elaborate method:

sub2_mod_26(cnt_ring_combined, cnt, ring_setting)
add2_mod_26(right_ptr, muxed_din, cnt_ring_combined)

# right to left traversal
swizz_minus_cnt_ring = Signal(5)
sub2_mod_26(swizz_minus_cnt_ring, wiring_rtol, cnt_ring_combined)

# left to right traversal
swizz_l_minus_cnt_ring = Signal(5)
sub2_mod_26(swizz_l_minus_cnt_ring, wiring_ltor, cnt_ring_combined)

m.d.sync += self.dout.eq(
    Mux(self.ltor,
        swizz_l_minus_cnt_ring,
        swizz_minus_cnt_ring,))

return m

Putting everything together yields the following block diagram, with the input contact point being calculated and fed to all three slots, and then the exit contact point muxed from the correct slot and adjusted for the cnt and ring_setting before being output:

enigma_detail

A detailed view of the optimized Rotor architecture

2.2   V2 Plugboard

I'm going to keep the V1 architecture, except for substituting latches for registers. In order to this, I'll be using inferred latches and defining the structure manually instead of relying on Amaranth's memory abstraction.

We'll be mapping to the sky130_fd_sc_hd_dlxtp standard-cell in the Skywater high-density library. This is a simple level sensitive latch with no clear/reset. Since Amaranth does not support latches directly, I used a verilog module to infer a latch and imported this into my Amaranth plugboad:

dlatch

A D-latch used to implement the Plugboard lookup table

// latch.v used to infer the SKY130 d-latch
module d_latch (
    input wire d,
    input wire clk,
    output reg q
);

    always_latch begin
        if (clk) begin
            q = d;
        end
    end
endmodule

Then, I include the verilog-based d_latch into Amaranth by using the Instance class.

class Latch(wiring.Component):

    d: In(1)
    q: Out(1)
    en: In(1)

    def elaborate(self, platform):
        return Instance("d_latch", i_d = self.d, i_clk = self.en, o_q = self.q,)
plugboard

Plugboard using 1W/1R Latches

The latches are instantiated as an array, and each bit is muxed out based on the address (either the left-to-right or right-to-left input) during the scrambling phase. For loading, we first load the write address register (using wr_data and wr_addr_en). This register value will be decoded to one of 26 wordlines that will be enabled if wr_data_en is asserted. Next, we can put the actual write data on wr_data and pulse wr_data_en, making sure the data stays stable for a margin after the enable deassertion. The next section describes the FSM control that ensures this margin.

There is also a bypass path for the input data, when we want to load the Rotor values and need to pass them through unchanged.

2.3   V2 Control Finite-State-Machine (FSM)

Our controller state machine diagram is shown below. This is mostly a Moore-type machine with some Mealy-type states to keep the number of states manageable.

The state machine is designed to take a 3-bit command and 5-bit data. Each command can take a variable number of cycles to execute, and it will assert the external ready signal when it's ready to process the next command.

Command summary
Encoding Command Data Description
000 NOP N/a Do nothing
001 LOAD_START Setting 0-25 (A-Z) Set the start position of a rotor. Do this three times in succession to set each of the three rotors (right to left)
010 LOAD_RING Setting 0-25 (A-Z) Set the ring setting of a rotor. Do this three times in succession to set each of the three rotors (right to left)
011 RESET N/a Go back to the initial state
100 SCRAMBLE Input char 0-24 (A-Z) Run a letter through the rotor. The Ready signal will be asserted when the scrambled character is output
101 LOAD_PLUG_ADDR Src 0-25 (A-Z) Set an internal register to where the start of the plug should go. This command should be followed by LOAD_PLUG_DATA to set the destination
110 LOAD_PLUG_DATA Dst 0-25 (A-Z) Set the other end of the plug. Note that this connection is unidirectional, so if you want A,B connected, then you need to do two sequences of these commands to first set A->B and then B->A
111 SET_ROTORS Rotor 0-4 Pick the Rotor type for each slot where Rotor I=0, Rotor II=1, ... Rotor V=4. Do this three times in succession to pick each of the rotors (right to left). Default is Rotor I, II, III from right to left, where Rotor I is closest to the plugboard

Please see the FSM command docs for the exact set of commands and syntax available.

fsm

Finite state machine controller

Most of the FSM is self-explanatory. The only tricky part is the way the double-step is implemented; we use a state variable double_step to note when the next character should double step Rotor 1 and step Rotor 2.

Amaranth makes it very easy to take such a diagram and generate FSMs. Each state has a string name, and it's very straightforward to write out the state transitions without having to deal with the encoding of states.

with m.FSM():
    with m.State("Initial"):
        m.d.comb += [
            inc.eq(0),
            din_sel.eq(Din.DIN),
            is_ltor.eq (0),
            self.load_start.eq(0),
            self.load_ring.eq(0),
        ]
        m.d.sync += [
            cnt.eq(0),
            double_step.eq(0),
        ]
        m.next = "Get command"

    with m.State("Get command"):
        m.d.comb += [
            self.ready.eq(1),
            self.plugboard_en.eq(0),
        ]

        with m.Switch(self.cmd):
            with m.Case(Cmd.RESET):
                # TODO: figure out how to reset
                m.next = "Initial"
            with m.Case(Cmd.LOAD_START):
                m.d.sync += cnt.eq(cnt+1)
                m.next = "Load start"
            with m.Case(Cmd.LOAD_RING):
                m.d.sync += cnt.eq(cnt+1)
                m.next = "Load ring"
            with m.Case(Cmd.SET_ROTORS):
                m.d.sync += cnt.eq(cnt+1)
                m.next = "Set rotors"
            with m.Case(Cmd.LOAD_PLUG_ADDR):
                m.next = "Load plug addr"
            with m.Case(Cmd.LOAD_PLUG_DATA):
                m.next = "Load plug data"
            with m.Case(Cmd.SCRAMBLE):
                m.next = "Scramble"
            with m.Default():
                m.next = "Get command"
    ...

3   Testing

I'll add more in a separate article here.

4   Synthesis Results

Once synthesized into SKY130 with Yosys and place and routed, we end up with the following metrics:

Statistics
Metric Value
Utilization 82.9%
Cell count 1565
Dffs 67 dfxtp
Latches 130 dlxtp
Frequency 35MHz (target)
PD execution 7m 54s

5   Summary

That wraps up this series on implementing an Enigma cipher machine on Skywater 130nm CMOS process. I wasn't sure I'd be able to squeeze everything into the ~160um x 100um area budget, but working with Amaranth HDL made it faster to converge on a working solution. Once the chips are back (late 2025, early 2026?), I'll post another article on how the testing goes. I hope everyone reading this thinks about doing a TinyTapeout project; the people there have done an amazing job of making chip design accessible for almost anyone!

© Virantha Ekanayake. Built using Pelican. Modified svbhack theme, based on theme by Carey Metcalfe