o p t i m i z e

Question

in this thread we're making yaxpeax-x86 faster. i have a super bad microbenchmark that lives in the repo, it's my spot check for "did i make it obviously worse". it involves disassembling like 500 bytes of instructions or something, so the whole disassembler and dataset pretty quickly end up in cache. it gets really good ipc numbers and makes me feel good about acing synthetic workloads :)

perf looks like this, today:

    Samples: 862K of event 'cycles', Event count (approx.): 36077639848
    Overhead  Command          Shared Object           Symbol
      98.06%  bench-bda9664bd  bench-bda9664bdc9d6027  [.] bench::do_decode_swathe
       1.78%  bench-bda9664bd  bench-bda9664bdc9d6027  [.] yaxpeax_x86::long_mode::read_E

which is to say, the whole decoder gets inlined into the `do_decode_swathe` function, with `read_E` being left out. it's a small part of the overall time, but lets give it a look - it's much smaller than trying to eyeball the entire disassembler..

first off, the source:

    #[inline]
    fn width_to_gp_reg_bank(width: u8, rex: bool) -> RegisterBank {
        match width {
            1 => return if rex { RegisterBank::rB } else { RegisterBank::B },
            2 => return RegisterBank::W,
            4 => return RegisterBank::D,
            8 => return RegisterBank::Q,
            _ => unsafe { unreachable_unchecked(); }
        }
    }

now from perf this thing is almost 600 bytes of instructions. check out this annotated trace:

    Samples: 862K of event 'cycles', 100000 Hz, Event count (approx.): 36077639848
    Percent│
           │
           │     Disassembly of section .text:
           │
           │      0000000000020d90 :
           │      _ZN11yaxpeax_x869long_mode6read_E17h102120264f0d061fE():
      0.49 │       push   %rbp
      0.55 │       push   %r15
      1.28 │       push   %r14
      5.54 │       push   %rbx
      0.59 │       mov    $0x4,%al
      2.33 │       add    $0xff,%cl
      0.01 │       movzbl %cl,%ecx
      0.56 │       lea    some_misleading_symbol+0x3bf8,%rbp
           │       movslq 0x0(%rbp,%rcx,4),%rcx
     31.39 │       add    %rbp,%rcx
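For anyone who wants to compile the quoted `width_to_gp_reg_bank` on its own and look at the generated code, here is a minimal standalone sketch. The `RegisterBank` enum below is a hypothetical stand-in containing only the variants the snippet names; the real type in yaxpeax-x86 may have more variants and a different layout, so the machine code you get from this is not guaranteed to match the trace above.

```rust
use std::hint::unreachable_unchecked;

// Hypothetical stand-in for yaxpeax-x86's `RegisterBank`: only the variants
// that appear in the quoted snippet are included here.
#[allow(non_camel_case_types, dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RegisterBank {
    Q,
    D,
    W,
    B,
    rB,
}

// Same mapping as the quoted function, written with tail expressions
// instead of explicit `return`s.
#[inline]
pub fn width_to_gp_reg_bank(width: u8, rex: bool) -> RegisterBank {
    match width {
        // With a REX prefix, byte operands select the `rB` bank.
        1 => if rex { RegisterBank::rB } else { RegisterBank::B },
        2 => RegisterBank::W,
        4 => RegisterBank::D,
        8 => RegisterBank::Q,
        // The caller must only pass 1, 2, 4, or 8. Any other width is
        // undefined behavior, because this arm promises the compiler that
        // it can never be reached.
        _ => unsafe { unreachable_unchecked() },
    }
}
```

Calling it with a valid width, for example `width_to_gp_reg_bank(1, true)`, yields `RegisterBank::rB`. Widths outside 1, 2, 4, and 8 must never be passed in.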

Answer

You don't need that toxic energy, good on you for getting rid of that and standing up for yourself and your instruction decoder.



Posted by iximeow


Reply by yuu