一人一党党

A political system of each individual, by each individual, for each individual!

Variable-length instruction ISAs offer almost no benefit

When you design a microprocessor for good code density, you tend to choose an ISA (instruction set architecture) with short instructions. Likewise, you want every instruction to have the same length, because a fixed-length decoder needs less circuitry, and a smaller circuit also allows a higher clock frequency. Hitachi's SuperH RISC engine chose a fixed 16-bit instruction length and showed high code density while keeping the small, fast circuit of a RISC.
J Cores | Open Processor Foundation
http://0pf.org/j-core.html
But such a short instruction set pays a price. To shrink the instruction length, each SuperH instruction holds only two operand fields. The register named in one of the two fields acts as an accumulator: it is used as both an input and the output, so its input value is overwritten. As a result, an extra instruction is required to do the same job that a single 3-operand instruction (as in the MIPS ISA) does.

For example, to calculate "$1 + $2 = $3" (that is, to store the sum of $1 and $2 in $3):

2-operand ISA like SuperH:
mov $1 $3 (16bit length)
add $2 $3 (16bit length)

3-operand ISA like MIPS:
add $1 $2 $3 (32bit length)

Thus, the 2-operand ISA needs one more instruction than the 3-operand ISA, which costs one more clock cycle. Since the higher instruction count offsets the gain from a higher clock frequency, the RISC-V ISA mixes 32-bit and 16-bit instructions.
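
The trade-off can be checked with a little arithmetic. The Python sketch below is only an illustration: `lower_add` and `cost` are hypothetical helper names, not part of any real toolchain, and it assumes one clock cycle per instruction, as the text does. It lowers a 3-operand add into the 2-operand form used in the example and reports instruction count and code size.

```python
from typing import List, Tuple

# 2-operand syntax follows the example above: (op, src, dst),
# where dst is both an input and the output ("add $2 $3" means $3 = $3 + $2).
Insn = Tuple[str, ...]


def lower_add(src1: str, src2: str, dst: str) -> List[Insn]:
    """Return a 2-operand sequence that computes dst = src1 + src2."""
    if dst == src1:
        # dst already holds the first input, so one instruction is enough.
        return [("add", src2, dst)]
    if dst == src2:
        # add is commutative, so the other source can be added in place.
        return [("add", src1, dst)]
    # General case: copy one input into dst first, then accumulate.
    return [("mov", src1, dst), ("add", src2, dst)]


def cost(seq: List[Insn], bits_per_insn: int) -> Tuple[int, int]:
    """Return (instruction count, total code size in bits)."""
    return len(seq), len(seq) * bits_per_insn


two_operand = lower_add("$1", "$2", "$3")      # mov $1 $3 ; add $2 $3
three_operand = [("add", "$1", "$2", "$3")]    # add $1 $2 $3

two_operand_cost = cost(two_operand, 16)       # (2, 32)
three_operand_cost = cost(three_operand, 32)   # (1, 32)
```

In the worst case the 2-operand form occupies the same 32 bits as the single 3-operand instruction but takes two instructions. When the destination is also one of the sources, a single 16-bit instruction is enough.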

But if you decide to implement an instruction decoder that handles multiple instruction lengths anyway, a 2-operand instruction sequence like the one in the example above can be treated as one 3-operand instruction. Where you want a 3-operand instruction, use the sequence "mov A C; calculate B C". The decoder then treats this pattern of two 16-bit instructions (32 bits in total) as "calculate A B C", and the 3-operand RISC engine executes the pair in one clock cycle. In this example, there is almost no benefit in adding new 32-bit 3-operand instructions to a 16-bit 2-operand ISA.
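
The decoder idea can be modelled in software. The following Python sketch is an assumption-laden toy, not a description of any real SuperH or RISC-V decoder: the names `decode_fused`, `run_two_operand`, and `run_micro_ops` are hypothetical, only `mov`, `add`, and `sub` are modelled, and branches into the middle of a pair are ignored. It fuses "mov A C; op B C" into one 3-operand micro-op and checks that the result matches plain 2-operand execution.

```python
import operator
from typing import Dict, List, Optional, Tuple

Insn2 = Tuple[str, str, str]                   # (op, src, dst): dst = dst op src
MicroOp = Tuple[str, str, Optional[str], str]  # (op, src1, src2, dst): dst = src1 op src2

ALU = {"add": operator.add, "sub": operator.sub}


def decode_fused(stream: List[Insn2]) -> List[MicroOp]:
    """Turn a 2-operand stream into 3-operand micro-ops, fusing 'mov A C; op B C'."""
    out: List[MicroOp] = []
    i = 0
    while i < len(stream):
        op, src, dst = stream[i]
        nxt = stream[i + 1] if i + 1 < len(stream) else None
        if op == "mov" and nxt is not None and nxt[0] in ALU and nxt[2] == dst:
            next_op, next_src, _ = nxt
            # After the mov, dst holds src's value, so an operand that
            # names dst in the second instruction really means src.
            src2 = src if next_src == dst else next_src
            out.append((next_op, src, src2, dst))
            i += 2                              # two 16-bit instructions consumed
        elif op == "mov":
            out.append(("mov", src, None, dst))
            i += 1
        else:
            out.append((op, dst, src, dst))     # unfused: dst = dst op src
            i += 1
    return out


def run_two_operand(stream: List[Insn2], regs: Dict[str, int]) -> Dict[str, int]:
    """Reference model: execute the 2-operand stream one instruction at a time."""
    regs = dict(regs)
    for op, src, dst in stream:
        regs[dst] = regs[src] if op == "mov" else ALU[op](regs[dst], regs[src])
    return regs


def run_micro_ops(ops: List[MicroOp], regs: Dict[str, int]) -> Dict[str, int]:
    """Execute 3-operand micro-ops; each one stands for one clock cycle."""
    regs = dict(regs)
    for op, s1, s2, dst in ops:
        regs[dst] = regs[s1] if op == "mov" else ALU[op](regs[s1], regs[s2])
    return regs


program = [("mov", "$1", "$3"), ("add", "$2", "$3"), ("sub", "$1", "$3")]
micro_ops = decode_fused(program)   # 3 instructions become 2 micro-ops
```

Under these assumptions the three 16-bit instructions in `program` decode to two micro-ops, so the fused pair costs one cycle instead of two while the register contents stay identical.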

Thus, I conclude that a fixed-length ISA with short instructions can be executed as fast as a variable-length ISA. I discussed the 2-operand case above, but if you are interested in even more code density and simplicity, there is still the "0-operand" stack machine ISA.
ZPU - the worlds smallest 32 bit CPU with GCC toolchain :: Overview :: OpenCores
http://opencores.org/project,zpu

-- postscript --
Even though there is research applying Tomasulo-style out-of-order execution to a stack machine:
BOOST: Berkeley's Out-of-Order Stack Thingy
http://www.researchgate.net/publication/228556746_BOOST_Berkeley's_Out-of-Order_Stack_Thingy
an in-order, single-issue microprocessor has an interesting niche: soft microprocessors on FPGAs. Both out-of-order and superscalar execution require multi-port RAM, but building a "fast" multi-port RAM on an FPGA is hard work.

-- 2017-02-07 more postscript --
According to the BOOST paper, applying this technique to a stack ISA is called "instruction folding", and it is already implemented in some processors such as picoJava-II.
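
For the stack-ISA case, folding can be sketched in the same spirit. This Python toy is an assumption, not the actual picoJava-II folding rules, which are not reproduced here: `fold` is a hypothetical name, and only the single pattern "push A; push B; op; pop C" with `add` and `sub` is recognised.

```python
from typing import List, Tuple

StackInsn = Tuple[str, ...]   # ("push", reg), ("pop", reg), ("add",), ("sub",)

FOLDABLE_OPS = ("add", "sub")


def fold(stream: List[StackInsn]) -> List[Tuple[str, ...]]:
    """Fold 'push A; push B; op; pop C' into one 3-operand (op, A, B, C)."""
    out: List[Tuple[str, ...]] = []
    i = 0
    while i < len(stream):
        window = stream[i:i + 4]
        if (len(window) == 4
                and window[0][0] == "push"
                and window[1][0] == "push"
                and window[2][0] in FOLDABLE_OPS
                and window[3][0] == "pop"):
            # The group leaves the stack unchanged, so it can be replaced
            # by a single register-to-register operation: C = A op B.
            out.append((window[2][0], window[0][1], window[1][1], window[3][1]))
            i += 4
        else:
            out.append(stream[i])   # anything else passes through unfolded
            i += 1
    return out


stack_code = [("push", "$1"), ("push", "$2"), ("add",), ("pop", "$3")]
folded = fold(stack_code)   # one 3-operand operation instead of four stack ops
```

The sketch relies on the folded group having no net effect on the stack, which is why the four stack operations can be issued as one 3-operand operation.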