This post is part 3 of the "Enigma Cipher Chip" series:
- Part 1 - An Enigma I Cipher Machine on Skywater 130nm Via TinyTapeout
- Part 2 - Version 1 of the Enigma in Amaranth HDL
- Part 3 - Final Version of the Enigma in Amaranth HDL
1 Optimized Enigma Implementation
The first part of this series described the background of the Enigma Cipher Machine with the following datapath:
In this post, I'll wrap up this series with the final, optimized version, building on the work from Part 2.
You can find the complete source, including tests, at my TT 10 repo
2 Version 2 - Pipelined datapath
My initial design modeled the Enigma shown in the figure above as a combinational datapath. The 5-bit input character flows through each block (plugboard, three rotors, reflector, three rotors, and back through the plugboard) and comes out transformed into the final encrypted 5-bit value.
For the improved version, I made each stage into a single-cycle pipeline stage. Therefore, instead of getting a scrambled character at the next clock edge, it takes, at minimum, 9 clock cycles to get through the 9 blocks. In reality, some of the blocks can take multiple cycles to generate outputs (for example, a Rotor may need to increment before scrambling, which takes multiple cycles).
2.1 V2 Rotor
Although we could have three separate physical pipeline stages for each Rotor "slot", this would mean three copies of the same logic (since each slot could potentially implement any one of the 5 Rotor wiring patterns). To keep the area lower, I decided to instead implement the three Rotors with a single logic block that can select which slot and rotor type it's currently computing as. In other words, the 3 Rotors are collapsed into a single block as shown below:
Here's how this generic Rotor block functions:
- en, a one-hot signal, is used to select which slot is active on any given cycle.
- The input character (muxed_din) can come from three different places based on din_sel:
- The plugboard (din)
- The previous rotor (dout)
- The reflector (reflector_in)
- The scrambled character output dout is registered.
- There are three is_at_turnover signals to denote when each slot is at its turnover point
- There are five control signals shared among the slots:
- load_start: Sets the enabled Rotor's counter value from din
- load_ring: Sets the enabled Rotor's ring setting value from din
- load_rotor_type: Sets the enabled Rotor's rotor type from din
- inc: Increment the enabled Rotor's counter
- ltor: If asserted, assume the data is flowing left-to-right (reverse path after reflector). Otherwise, assume data is flowing right-to-left
There is a state machine responsible for generating the control signals which we'll discuss later. Beyond that, the most involved piece of design is representing the wiring patterns of each Rotor, and allowing each slot to contain one of 5 different wiring patterns. Let's dive into the implementation to see how the expressiveness of design hardware in Python makes using Amaranth a much more tractable and less error-prone design exercise.
2.1.1 V2 Rotor initialization
First, let's look at the setup of the Rotor component (while omitting the ports for brevity, which are shown in the diagram above anyway).
class Rotor (wiring.Component):
# All the ports
# ....
N = 3 # Number of rotor slots
def __init__(self):
N = self.N
self.cnts = Array([Signal(5) for i in range(N)]) # Registers for each of the rotors (as well as start position)
self.ring_settings = Array([Signal(5) for i in range(N)])
self.generate_wirings()
super().__init__()
In the constructor, I'm defining two sets of registers:
- cnts[2:0]: A 5-bit counter for each of the 3 rotors to keep track of the current rotational position.
- ring_settings[2:0]: A 5-bit register for each of the 3 rotors to save the ring setting.
We don't need to save the start setting of each Rotor because we can just load that value into cnt at the start of each encryption session.
Now, let's look at the first part of the method used to generate the wiring patterns that will swizzle each input character.
1def generate_wirings(self):
2 from test.enigma import Enigma
3 _wirings = []
4 _turnovers = []
5
6 for rotor_type in Rotors:
7 # Gather up the wiring and turnover positions from the golden model
8 rotor_class = Enigma.ROTORS[rotor_type]
9 _wirings.append(rotor_class.wiring)
10 _turnovers.append(rotor_class.turnover)
11
12 wiring_list_right_to_left = []
13 wiring_list_left_to_right = []
14
15 for w in _wirings:
16 mapping = [ord(c)-65 for c in w]
17 mapping_ltor = [0]*26
18 for i, c in enumerate(w):
19 mapping_ltor[ord(c)-65] = i
20
21 wiring_list_right_to_left.append(Array(mapping)) # right_to_left mapping
22 wiring_list_left_to_right.append(Array(mapping_ltor))
23
24 self.Wiring_right_to_left = Array(wiring_list_right_to_left)
25 self.Wiring_left_to_right = Array(wiring_list_left_to_right)
26
27 self.turnovers = Array([Const(ord(t)-65) for t in _turnovers])
In lines 2-10 we are directly importing our golden model so there's no danger of the Rotor wiring patterns and turnover points getting out of sync. I've defined 5 different rotor types (from the WWII Enigma I Army kit), so _wirings and _turnovers will be Python lists of length 5, with each entry being a string.
Next, we convert the ASCII characters into 5-bit numbers (lines 12-18), and convert the Python lists into Amaranth Arrays that can be synthesized into table lookups in the hardware.
Now, self.Wiring_right_to_left is a two-dimensional Array. It's first-dimension can be indexed at runtime by a 3-bit number that selects the Rotor type. The result of that indexing will be the wiring pattern for that Rotor, which is an 1D Array. That wiring pattern can be indexed at runtime by a 5-bit number (input contact point) that yields the output contact point (swizzle). So, for example, self.Wiring_right_to_left[2][9] returns the output contact point if entering a Rotor of type Rotor III at the 10th input contact point.
self.Wiring_left_to_right is just the reverse mapping when going back from the reflector.
2.1.2 V2 Rotor Computation
Now let's look at the elaborate method that Amaranth uses to build the hardware. The first thing we'll do in lines 2-11 is create a Module for the Rotor, and then define some hardware signals/wires with associated bit-widths. Then, we'll define a register that keeps track of what Rotor type is in each slot: self.slot[0] will store the Rotor type in Slot 0 (right-most Rotor). We initialize this (unless the state machine loads this in explicitly) with Rotor I, Rotor II, and Rotor III in slots 0, 1, and 2 respectively.
Next, we instantiate the input mux (shown in rotor_schematic on the top-right) with a Switch statement.
1def elaborate(self, platform):
2 m = Module()
3 N = self.N
4
5 ring_setting = Signal(5) # The enabled Rotor's ring setting
6 cnt = Signal(5) # The enabled Rotor's position
7 cnt_ring_combined = Signal(5) # Computed position based on current cnt, and ring_setting
8 muxed_din = Signal(5) # Output of input data mux
9 right_ptr = Signal(5) # Index into wiring swizzle table
10 wiring_rtol = Signal(5) # The output character after swizzling going rtol
11 wiring_ltor = Signal(5) # The output character after swizzling if going ltor
12 cnt_eq_25 = Signal(1) # Whether the cnt is going to overflow
13
14 # Picking the rotor for each slot
15 self.slot = Array([Signal(3, init=0), Signal(3,init=1), Signal(3, init=2)]) # Choose from one of 8 rotor types for each of the 3 slots
16
17 # Mux on the inputs
18 with m.Switch(self.din_sel):
19 with m.Case(1):
20 m.d.comb += muxed_din.eq(self.dout)
21 with m.Case(2):
22 m.d.comb += muxed_din.eq(self.reflector_in)
23 with m.Default():
24 m.d.comb += muxed_din.eq(self.din)
Now, we come to the meat of the swizzle computation with dynamically selected Rotors and slots. Here's an interesting design pattern from Amaranth for Switch/Case statements: We can generate case statements using Python loops to reduce hugely repetitive code. Below, we're switching on a one-hot encoded en signal that picks the Rotor slot. The for-loop iterates over all the one-hot codes (and also has the associated conversion to binary rotor number) to generate each Case statement. Within each Case, we can use the for-loop variables to index into the wiring tables generated above like so:
self.Wiring_right_to_left[self.slot[rotor]][right_ptr]
With the three loop-generated Case statements below, we've taken care of selecting the proper Rotor type in each slot, and getting the associated swizzled output. The synthesis tool (Yosys) should be able to optimize all the lookup tables for the rotor wiring for us.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 | with m.Switch(self.en): # Pick settings based on which Rotor Slot is selected by self.en (one-hot) for one_hot, rotor in [(0b001, 0), (0b010,1), (0b100, 2)]: with m.Case(one_hot): m.d.comb += [ ring_setting.eq(self.ring_settings[rotor]), cnt.eq(self.cnts[rotor]), cnt_eq_25.eq(cnt==25), # Pick the wiring swizzles based on which Rotor type is loaded into each slot wiring_rtol.eq(self.Wiring_right_to_left[self.slot[rotor]][right_ptr]), wiring_ltor.eq(self.Wiring_left_to_right[self.slot[rotor]][right_ptr]), ] with m.If(self.load_start): m.d.sync += self.cnts[rotor].eq(muxed_din) with m.Elif(self.load_ring): m.d.sync += self.ring_settings[rotor].eq(muxed_din) with m.Elif(self.load_rotor_type): m.d.sync += self.slot[rotor].eq(muxed_din[0:3]) with m.Elif(self.inc): with m.If(cnt_eq_25): m.d.sync += self.cnts[rotor].eq(0) with m.Else(): m.d.sync += self.cnts[rotor].eq(cnt+1) with m.Default(): m.d.comb += [ ring_setting.eq(0), cnt.eq(0) ] |
Now, let's take a look at how the right_ptr or index into the wiring swizzles are calculated: Recall from the previous part that our calculations looked like the following:
cnt_ring_combined = (cnt - ring_setting) % 26
contact_point = (input_char + cnt_ring_combined) % 26
exit_point = rotor_swizzle[contact_point]
result = (exit_point - cnt_ring_combined) % 26
We want to avoid the modulus operation, so we can define subtract and addition routines that check for overflow to implement the wrap-around:
# Return (a - b) % 26
def sub2_mod_26(out, a, b):
with m.If(b > a):
m.d.comb += out.eq( (26-(b-a))[0:5])
with m.Else():
m.d.comb += out.eq( (a-b)[0:5])
# Return (a + b) % 26
def add2_mod_26(sum_signal, a,b):
s = Signal(6)
m.d.comb += s.eq(a+b)
with m.If(s > 25):
m.d.comb += sum_signal.eq( (s-26)[0:5])
with m.Else():
m.d.comb += sum_signal.eq( s[0:5])
We make sure to slice the output signal to 5 bits, just to avoid any issues with Amaranth's implicit extension to 6-bits and subsequent comparison.
With these helper inner functions, we can then complete the elaborate method:
sub2_mod_26(cnt_ring_combined, cnt, ring_setting)
add2_mod_26(right_ptr, muxed_din, cnt_ring_combined)
# right to left traversal
swizz_minus_cnt_ring = Signal(5)
sub2_mod_26(swizz_minus_cnt_ring, wiring_rtol, cnt_ring_combined)
# left to right traversal
swizz_l_minus_cnt_ring = Signal(5)
sub2_mod_26(swizz_l_minus_cnt_ring, wiring_ltor, cnt_ring_combined)
m.d.sync += self.dout.eq(
Mux(self.ltor,
swizz_l_minus_cnt_ring,
swizz_minus_cnt_ring,))
return m
Putting everything together yields the following block diagram, with the input contact point being calculated and fed to all three slots, and then the exit contact point muxed from the correct slot and adjusted for the cnt and ring_setting before being output:
2.2 V2 Plugboard
I'm going to keep the V1 architecture, except for substituting latches for registers. In order to this, I'll be using inferred latches and defining the structure manually instead of relying on Amaranth's memory abstraction.
We'll be mapping to the sky130_fd_sc_hd_dlxtp standard-cell in the Skywater high-density library. This is a simple level sensitive latch with no clear/reset. Since Amaranth does not support latches directly, I used a verilog module to infer a latch and imported this into my Amaranth plugboad:
// latch.v used to infer the SKY130 d-latch
module d_latch (
input wire d,
input wire clk,
output reg q
);
always_latch begin
if (clk) begin
q = d;
end
end
endmodule
Then, I include the verilog-based d_latch into Amaranth by using the Instance class.
class Latch(wiring.Component):
d: In(1)
q: Out(1)
en: In(1)
def elaborate(self, platform):
return Instance("d_latch", i_d = self.d, i_clk = self.en, o_q = self.q,)
The latches are instantiated as an array, and each bit is muxed out based on the address (either the left-to-right or right-to-left input) during the scrambling phase. For loading, we first load the write address register (using wr_data and wr_addr_en). This register value will be decoded to one of 26 wordlines that will be enabled if wr_data_en is asserted. Next, we can put the actual write data on wr_data and pulse wr_data_en, making sure the data stays stable for a margin after the enable deassertion. The next section describes the FSM control that ensures this margin.
There is also a bypass path for the input data, when we want to load the Rotor values and need to pass them through unchanged.
2.3 V2 Control Finite-State-Machine (FSM)
Our controller state machine diagram is shown below. This is mostly a Moore-type machine with some Mealy-type states to keep the number of states manageable.
The state machine is designed to take a 3-bit command and 5-bit data. Each command can take a variable number of cycles to execute, and it will assert the external ready signal when it's ready to process the next command.
Encoding | Command | Data | Description |
---|---|---|---|
000 | NOP | N/a | Do nothing |
001 | LOAD_START | Setting 0-25 (A-Z) | Set the start position of a rotor. Do this three times in succession to set each of the three rotors (right to left) |
010 | LOAD_RING | Setting 0-25 (A-Z) | Set the ring setting of a rotor. Do this three times in succession to set each of the three rotors (right to left) |
011 | RESET | N/a | Go back to the initial state |
100 | SCRAMBLE | Input char 0-24 (A-Z) | Run a letter through the rotor. The Ready signal will be asserted when the scrambled character is output |
101 | LOAD_PLUG_ADDR | Src 0-25 (A-Z) | Set an internal register to where the start of the plug should go. This command should be followed by LOAD_PLUG_DATA to set the destination |
110 | LOAD_PLUG_DATA | Dst 0-25 (A-Z) | Set the other end of the plug. Note that this connection is unidirectional, so if you want A,B connected, then you need to do two sequences of these commands to first set A->B and then B->A |
111 | SET_ROTORS | Rotor 0-4 | Pick the Rotor type for each slot where Rotor I=0, Rotor II=1, ... Rotor V=4. Do this three times in succession to pick each of the rotors (right to left). Default is Rotor I, II, III from right to left, where Rotor I is closest to the plugboard |
Please see the FSM command docs for the exact set of commands and syntax available.
Most of the FSM is self-explanatory. The only tricky part is the way the double-step is implemented; we use a state variable double_step to note when the next character should double step Rotor 1 and step Rotor 2.
Amaranth makes it very easy to take such a diagram and generate FSMs. Each state has a string name, and it's very straightforward to write out the state transitions without having to deal with the encoding of states.
with m.FSM():
with m.State("Initial"):
m.d.comb += [
inc.eq(0),
din_sel.eq(Din.DIN),
is_ltor.eq (0),
self.load_start.eq(0),
self.load_ring.eq(0),
]
m.d.sync += [
cnt.eq(0),
double_step.eq(0),
]
m.next = "Get command"
with m.State("Get command"):
m.d.comb += [
self.ready.eq(1),
self.plugboard_en.eq(0),
]
with m.Switch(self.cmd):
with m.Case(Cmd.RESET):
# TODO: figure out how to reset
m.next = "Initial"
with m.Case(Cmd.LOAD_START):
m.d.sync += cnt.eq(cnt+1)
m.next = "Load start"
with m.Case(Cmd.LOAD_RING):
m.d.sync += cnt.eq(cnt+1)
m.next = "Load ring"
with m.Case(Cmd.SET_ROTORS):
m.d.sync += cnt.eq(cnt+1)
m.next = "Set rotors"
with m.Case(Cmd.LOAD_PLUG_ADDR):
m.next = "Load plug addr"
with m.Case(Cmd.LOAD_PLUG_DATA):
m.next = "Load plug data"
with m.Case(Cmd.SCRAMBLE):
m.next = "Scramble"
with m.Default():
m.next = "Get command"
...
3 Testing
I'll add more in a separate article here.
4 Synthesis Results
Once synthesized into SKY130 with Yosys and place and routed, we end up with the following metrics:
Metric | Value |
---|---|
Utilization | 82.9% |
Cell count | 1565 |
Dffs | 67 dfxtp |
Latches | 130 dlxtp |
Frequency | 35MHz (target) |
PD execution | 7m 54s |
5 Summary
That wraps up this series on implementing an Enigma cipher machine on Skywater 130nm CMOS process. I wasn't sure I'd be able to squeeze everything into the ~160um x 100um area budget, but working with Amaranth HDL made it faster to converge on a working solution. Once the chips are back (late 2025, early 2026?), I'll post another article on how the testing goes. I hope everyone reading this thinks about doing a TinyTapeout project; the people there have done an amazing job of making chip design accessible for almost anyone!