Lawdy, even that subtitle was long. If you decided to skip straight to bug bounty and not bother with nerd things like computer science or networking like I did—well, it turns out you need to know these things.

This article will be a long one, but compared to getting a college degree, it’s nothing. Also, it’s free (you’re welcome). So, time to dust the cobwebs off the portions of your brain responsible for studying. First, we’ll cover the background knowledge you need to tackle the topics in this article.

Then, and only then—in the second part of this series—will we cover tricks to sneak your payloads past the cyberguards.


Binary

In 2024, Nvidia unveiled its Blackwell B200, a single microchip that has 208 billion transistors. Yes, billion. Still not in awe? What if I told you it is roughly the size of a Toaster Strudel®? How many years do you think are equivalent to 208 million seconds? About 6.6 years. Okay, now how many years is 208 billion seconds? Roughly 6,600 years. Neat. But why is this such a big deal?

Transistors in a computer’s microchip act as switches to create binary code. Electronic devices, such as computers, do not actually understand English, German, or Russian. What they do understand is binary, which consists of two numbers: 0 and 1. Just like a light switch, a transistor can be set to either on (1) or off (0) by controlling the flow of electricity.

So, the more light switches, the more processing power to do cool things.

The numbers you are most familiar with belong to the decimal base-10 system, which uses ten digits: 0, 1, 2, 3, 4, 5, 6, 7, 8, and 9. In base-10, the position of a digit in a multidigit number determines its value, and each position is worth ten times the one to its right. Although you already knew all this, to visualize it, let’s use the decimal number 101:

1 0 1
100s place 10s place 1s place

Binary code is a base-2 system (since, again, binary consists of just 0s and 1s). You may be wondering how then a computer can understand numbers besides 0, 1, and combinations of 0s and 1s.

A bit, short for binary digit, is the smallest unit of data in computing. A bit can store either (you guessed it) a 0 or a 1. Bits are grouped together in sets of eight, and this group of 8 bits is called a byte.

With 8 bits, each holding one of two values, the number of possible unique combinations is 2^8, which amounts to 256. Let’s take our example decimal number of 101 from earlier and show how it is represented as a byte:

 

0 1 1 0 0 1 0 1
128 64 32 16 8 4 2 1

You may have noticed that each bit position is worth twice the one to its right. If you add up all the position values that have their bit set to 1 (64 + 32 + 4 + 1), you get 101 again.
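If you’d rather let a computer do the adding, here’s a quick sketch in Python (the variable names are mine) that sums the place values and double-checks the answer with the built-in int and format functions:

```python
# Sum the place values of every bit set to 1 in the byte 01100101
# (128, 64, 32, 16, 8, 4, 2, 1 from left to right).
bits = "01100101"
place_values = [2 ** p for p in range(len(bits) - 1, -1, -1)]  # [128, 64, ..., 1]
decimal = sum(value for bit, value in zip(bits, place_values) if bit == "1")
print(decimal)  # 101

# Python's built-ins do the same conversion in both directions.
print(int("01100101", 2))  # 101
print(format(101, "08b"))  # 01100101
```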

You may notice that if you add up all the position values, you only get a maximum number of 255, even though 2^8 is 256. Don’t forget to include the value of 0.

How are numbers greater than 255 created then? By using multiple bytes of course. Using decimal number 256 as an example, here’s how it would be represented in bytes:

 

0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
32,768 16,384 8,192 4,096 2,048 1,024 512 256 128 64 32 16 8 4 2 1

If you add all the position values for 2 bytes, you get a maximum number of 65,535. Again, don’t forget to include the value of 0.

Be aware that, in binary, leading zeros can be excluded when you want to express a number in its simplest form. For example, the decimal value of 3 in binary is 00000011. If you chose to express it without the leading zeros, it would just be 11. If you wanted to indicate that a 3-bit or 4-bit representation is being used, it would be 011 or 0011, respectively. But I will just write out all 8 bits throughout this paper because I am not LAZY.

IPv4

You may have heard of the term “IP address.” If you are unsure about what this refers to, keep reading. If you are confident you understand the concept, feel free to skip ahead.

Think of the Internet Protocol (IP) as the postal system of the Internet. Each device that is connected to the internet receives a unique, public IP address that is used to identify it. These addresses achieve the same purpose as yours and your friend’s mailing addresses for sending each other letters or packages. The difference here is that between devices, data is sent instead, and this data is in packets.

There are multiple versions of the IP, including IPv4 and IPv6. IPv4 is the older version of the protocol, where addresses consist of 4 bytes. With 4 bytes, you get 2^32 = 4,294,967,296 possible addresses.

IPv4 addresses are represented as decimal numbers in four sections, each of which is known as an octet. For example, what is known as the localhost (the IP address that refers to the device you are currently using) has an IPv4 address of 127.0.0.1. This address in binary is:

01111111.00000000.00000000.00000001

 

Byte 1 Byte 2 Byte 3 Byte 4
01111111 00000000 00000000 00000001
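If you want to play with this yourself, here’s a small Python sketch (the helper name ipv4_to_binary is my own invention) that turns any dotted-decimal IPv4 address into its four binary octets:

```python
def ipv4_to_binary(address):
    """Turn a dotted-decimal IPv4 address into dotted 8-bit binary octets."""
    return ".".join(format(int(octet), "08b") for octet in address.split("."))

print(ipv4_to_binary("127.0.0.1"))
# 01111111.00000000.00000000.00000001
```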

A computer network is a group of interconnected computers and devices that communicate with each other to share resources, such as data and files. With an internet connection, all the networks across the world create a global network. 

But wait, there are over eight billion people on Earth, with many of them owning multiple devices that can access the internet. Are 4,294,967,296 IP addresses enough? Actually, no.

IPv4 was established decades ago, before anyone could predict that the internet would be as ubiquitous as it is today. In 2019, the last freely available block of addresses was handed out. This means that now, IPv4 addresses are essentially recycled: once released by their most recent holder, they are allocated to members on a waiting list.

So how then does your new smartphone or Internet of Things (IoT) device, like a home security camera that also dispenses dog treats, send and receive data packets online? Are you internet royalty who gets prioritized on the IP address waiting list? I’m sorry to break it to you, but no.

NAT

The introduction of the Network Address Translation (NAT) protocol saved the day. It had become widespread by 2004 because, as the number of internet users kept growing, so did the concern that we would eventually run out of IPv4 addresses. NAT allows multiple devices on the same network to share a single public IP address, specifically the address of the device connected to the internet.

For your home network, a public IP address is assigned to your router, and everything behind your router is considered to be your “local” network. A router is also referred to as a default gateway, as it is the gateway to the wonderful World Wide Web.

The devices behind your router receive private IP addresses. The device you are reading this article on can actually have the same private IP address as many other devices across the world. But this doesn’t matter, because those devices are behind their own routers as well.

The different local network private IP address ranges that are used include:

  • 10.0.0.0 to 10.255.255.255—This range is often used in large organizations that require a significant number of addresses due to the number of internal computers and devices they have.
  • 172.16.0.0 to 172.31.255.255—This range is also used in medium or large-sized networks but offers fewer addresses than the previously mentioned range.
  • 192.168.0.0 to 192.168.255.255—This range is commonly used in home networks.

Subnetting

Local networks can be considered a form of subnetting. Subnetting involves dividing a network into smaller subnetworks, referred to as subnets.

Let’s use a home network as an example since this is probably what you are most familiar with. While each octet holds the number range of 0 to 255, there are three private IP addresses a subnet reserves:

  1. 192.168.0.0—The very first address is known as the network address and identifies the subnet itself.
  2. 192.168.0.1—By default, this is the address of your router, though it can be configured to have a different last octet.
  3. 192.168.0.255—This is the broadcast address. When a device sends a data packet to this address, all devices on the same subnet receive the packet.

Besides these three addresses, all the others are available to be allocated to computers and devices on the same subnet. This process is automatically handled by the Dynamic Host Configuration Protocol (DHCP).

Again, the private IP address range of a home network is 192.168.0.0 to 192.168.255.255. This means you can have 256 subnets with 256 addresses each (minus the three previously mentioned). But what if you want more addresses on the same subnet? If you run the terminal command ifconfig in Linux/MacOS or ipconfig in Windows, you will see an address associated with your netmask or subnet mask depending on your operating system. I bet the mask is 255.255.255.0, which in binary is:

11111111.11111111.11111111.00000000

The octets of all 1s mark the network and subnet portion of the address; only the bit positions set to 0 are used to identify a device. So, if you need more device addresses on the same subnet for whatever reason, you can change the mask to essentially unlock more. For example:

11111111.11111111.11111110.00000000

In the above mask, another bit position is unlocked. So now, instead of 8 for a total of 256 addresses, it is 9 for a total of 512 addresses allocated to the subnet. You may have seen IP address ranges represented as 192.168.0.0/24. This is the Classless Inter-Domain Routing (CIDR) representation and indicates how many bits starting from the left-hand side of an address are part of a network/subnet address. So, with a mask of 255.255.255.0, the network/subnet address is 192.168.0, with the device portion of the address being .0 to .255 (again, minus the network, broadcast, and router addresses).
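Python’s standard ipaddress module will happily do this subnet math for you. A quick sketch showing the /24 versus /23 difference described above:

```python
import ipaddress

# A /24 mask (255.255.255.0) leaves 8 host bits: 2**8 = 256 addresses.
net24 = ipaddress.ip_network("192.168.0.0/24")
print(net24.netmask, net24.num_addresses)  # 255.255.255.0 256

# Unlocking one more bit (/23, mask 255.255.254.0) gives 2**9 = 512 addresses.
net23 = ipaddress.ip_network("192.168.0.0/23")
print(net23.netmask, net23.num_addresses)  # 255.255.254.0 512
```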

ASCII

Okay, now you understand how computers interpret decimal numbers using binary bits and bytes. But how do they interpret letters? Well, back in ye olden days, different computer manufacturers represented characters in their own way. This meant that different makes and models of computers were unable to communicate with each other.

This is where encoding comes in, as it is the process of converting one type of data into another, allowing for a standardized representation of characters.

Designed in the 1960s, the American Standard Code for Information Interchange (ASCII) is an encoding standard that assigns a unique number to 128 different characters. This set includes both printable and unprintable characters.

Printable characters are numbers, letters, and symbols.

The ones that cannot be printed are control characters, such as carriage return (CR) and line feed (LF), which are used to mark the end and beginning of a line of text. New page (aka form feed) was used for printing. Bell (BEL) made your computer beep. You get the idea.

Be aware that due to historical reasons, the order of character groupings skips ahead and is ugly.

ASCII Control Characters
Decimal Binary Character Name
0 00000000 NUL Null
1 00000001 SOH Start of heading
2 00000010 STX Start of text
3 00000011 ETX End of text
4 00000100 EOT End of transmission
5 00000101 ENQ Enquiry
6 00000110 ACK Acknowledge
7 00000111 BEL Bell
8 00001000 BS Backspace
9 00001001 HT Horizontal tab
10 00001010 LF Line feed
11 00001011 VT Vertical tab
12 00001100 FF New page
13 00001101 CR Carriage return
14 00001110 SO Shift out
15 00001111 SI Shift in
16 00010000 DLE Data link escape
17 00010001 DC1 Device control 1
18 00010010 DC2 Device control 2
19 00010011 DC3 Device control 3
20 00010100 DC4 Device control 4
21 00010101 NAK Negative acknowledgement
22 00010110 SYN Synchronous idle
23 00010111 ETB End of transmission block
24 00011000 CAN Cancel
25 00011001 EM End of medium
26 00011010 SUB Substitute
27 00011011 ESC Escape
28 00011100 FS File separator
29 00011101 GS Group separator
30 00011110 RS Record separator
31 00011111 US Unit separator
127 01111111 DEL Delete

The decimal and binary values of symbol characters are:

ASCII Symbol Characters
Decimal Binary Character
32 00100000 (Space)
33 00100001 !
34 00100010 "
35 00100011 #
36 00100100 $
37 00100101 %
38 00100110 &
39 00100111 '
40 00101000 (
41 00101001 )
42 00101010 *
43 00101011 +
44 00101100 ,
45 00101101 -
46 00101110 .
47 00101111 /
58 00111010 :
59 00111011 ;
60 00111100 <
61 00111101 =
62 00111110 >
63 00111111 ?
64 01000000 @
91 01011011 [
92 01011100 \
93 01011101 ]
94 01011110 ^
95 01011111 _
96 01100000 `
123 01111011 {
124 01111100 |
125 01111101 }
126 01111110 ~

To save myself from having to make another table for numbers and letters, note the following rules (you can convert them to binary if you want to practice—no I am not just being lazy…okay I am):

  • Digits 0 to 9 have a decimal range of 48 to 57.
  • Uppercase letters A to Z have a decimal range of 65 to 90.
  • Lowercase letters a to z have a decimal range of 97 to 122.
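If you want to verify these ranges without a table, Python’s built-in ord() and chr() functions map characters to their code numbers and back:

```python
# ord() gives a character's ASCII/Unicode number; chr() goes the other way.
print(ord("0"), ord("9"))  # 48 57
print(ord("A"), ord("Z"))  # 65 90
print(ord("a"), ord("z"))  # 97 122
print(chr(48), chr(65), chr(97))  # 0 A a
```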

You may have noticed that only 7 bits are used. This caused a wave of disagreement about what characters should be assigned to the numbers 128 to 255. All encoding ambassadors around the world eventually agreed to not touch the ASCII table. But this 8th bit was used for different purposes since different languages use different characters. All these different character assignments for the free 128 characters made available by that last bit are known as code pages. If your computer interprets data using one code page while the data was encoded using another—well, we now have the interoperability issue again, don’t we?

Unicode

Officially released in 1991, Unicode sought to solve this character mismatch problem once and for all. It dreamed of being a universal character encoding system that can accommodate all characters from different languages. And it can—with space left over for more.

Again, since 1 byte only allows for 256 different characters, Unicode uses multiple bytes to solve this issue. The most commonly used Unicode encoding formats are UTF-8, UTF-16, and UTF-32, where the number in each name tells you the size, in bits, of the units the encoding works with.

Unicode uses code points to identify characters. In formal or documentation contexts, these begin with U+ to let everyone know that Unicode is being used. For example, the code point for the letter A is:

U+0041

In programming, the \u escape prefix is used to represent Unicode characters by their code point. For example, in JavaScript, \u0041 is interpreted as the letter A. So, when JavaScript comes across \u0041 in a string, it knows to convert that escape sequence (which is the term used for these special codes that represent a character) into an A.
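Python strings support the same \u escape, so you can poke at it interactively:

```python
# "\u0041" is the escape sequence for code point U+0041.
print("\u0041")         # A
print("\u0041" == "A")  # True
print(hex(ord("A")))    # 0x41, the code point in hexadecimal
```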

But how are those last four characters determined?

Back to the bases: Hexadecimal

The prefix “hexa” means six. Combine that with “decimal,” and you get sixteen. The hexadecimal base-16 system uses sixteen characters: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, and F. Each character represents a decimal value, with 0 to 9 representing themselves and A to F representing 10 to 15, respectively. Each hexadecimal character can also be represented as just 4 bits, since the values 0 to 15 never need bit positions 5 through 8.

Hexadecimal Character Decimal Value Binary Byte 4-bit Binary Representation
0 0 00000000 0000
1 1 00000001 0001
2 2 00000010 0010
3 3 00000011 0011
4 4 00000100 0100
5 5 00000101 0101
6 6 00000110 0110
7 7 00000111 0111
8 8 00001000 1000
9 9 00001001 1001
A 10 00001010 1010
B 11 00001011 1011
C 12 00001100 1100
D 13 00001101 1101
E 14 00001110 1110
F 15 00001111 1111

Keep in mind that hexadecimal encoding is not case-sensitive, meaning the six letters used can be either lowercase or uppercase. However, it is common to see them capitalized in Unicode, to make it pretty or whatever.

In programming languages, the 0x prefix is used to let everyone know that the hexadecimal format for a number is being used. For example, 0x41 will be interpreted as its decimal equivalent of 65. The escape sequence prefix used for hexadecimal is \x. So, \x41 will be interpreted as the letter A.
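Both notations work the same way in Python, for example:

```python
print(0x41)    # 65: the 0x prefix marks a hexadecimal integer literal
print("\x41")  # A: the \x escape marks a hexadecimal character code
print(hex(65)) # 0x41: converting a decimal number back to hexadecimal
```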

Hexadecimal is commonly used to represent a byte’s binary value in a compact way. For example:

Decimal Hexadecimal Binary byte Character Name
0 00 00000000 Null
9 09 00001001 Tab
10 0A 00001010 Line feed
13 0D 00001101 Carriage return
32 20 00100000 Space
34 22 00100010 Double quote
38 26 00100110 & Ampersand
39 27 00100111 Single quote
60 3C 00111100 < Less than
61 3D 00111101 = Equals
62 3E 00111110 > Greater than
65 41 01000001 A

If you’ve been bug hunting, you may recognize a lil’ sumthin’ sumthin’ by now. But be patient.

To convert a decimal number into its hexadecimal equivalent, you have to see how many times you can fit the number 16 in it and use the remainder. We will use the letter A as an example:

  1. The decimal value tied to the letter A is 65. Divide this number by 16.
  2. 65 / 16 = 4 with a remainder of 1.
  3. Now take the quotient of 4 and divide by 16 again.
  4. 4 / 16 = 0 with a remainder of 4.
  5. In hexadecimal, you take the remainder values and write them from the last to the first, going left to right, which results in 41.
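Those steps translate directly into a short loop. Here’s a sketch in Python (the function name to_hex is mine, not a built-in):

```python
HEX_DIGITS = "0123456789ABCDEF"

def to_hex(n):
    """Convert a non-negative integer to hexadecimal by repeatedly
    dividing by 16 and reading the remainders from last to first."""
    if n == 0:
        return "0"
    digits = []
    while n > 0:
        n, remainder = divmod(n, 16)
        digits.append(HEX_DIGITS[remainder])
    return "".join(reversed(digits))

print(to_hex(65))   # 41
print(to_hex(255))  # FF
```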

For characters that only require a single byte, to determine the last four characters after the U+ in Unicode, you use this decimal-to-hexadecimal conversion and pad with leading zeros if necessary.

Back to the bases: Octal

The octal base-8 system uses the numerical digits 0 to 7.

The prefix of just 0 is used to let everyone know that a number is in octal format. For example, 075 represents the decimal number 61. The escape prefix is \0 or just \. For example, \061 would be interpreted as 1 and \101 would be interpreted as the letter A.

To convert a decimal number to its octal equivalent, you have to see how many times you can fit the number 8 in it and again use the remainder. Using the letter A as an example again:

  1. The decimal value tied to the letter A is 65. Divide this number by 8.
  2. 65 / 8 = 8 with a remainder of 1.
  3. Now take the quotient of 8 and divide by 8.
  4. 8 / 8 = 1 with a remainder of 0.
  5. Now take the quotient of 1 and divide by 8.
  6. 1 / 8 = 0 with a remainder of 1.
  7. As is the same with decimal to hexadecimal, you take the remainder values and write them from the last to the first, going left to right, which results in 101.
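The same divide-and-collect-the-remainders loop works for octal, just dividing by 8 instead of 16. A Python sketch (to_octal is my own helper name):

```python
def to_octal(n):
    """Convert a non-negative integer to octal by repeatedly
    dividing by 8 and reading the remainders from last to first."""
    if n == 0:
        return "0"
    digits = []
    while n > 0:
        n, remainder = divmod(n, 8)
        digits.append(str(remainder))
    return "".join(reversed(digits))

print(to_octal(65))  # 101
print(oct(65))       # 0o101 (Python spells the octal integer prefix as 0o)
print(chr(0o101))    # A, the same round trip the \101 escape performs
```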

UTF-8

With UTF-16, there was yet another argument over its two byte-order variants, but we won’t get into that. For now, just know that UTF-8 solved it, and that’s why it’s a hero and more popular.

Unicode Transformation Format 8-bit (UTF-8) is the most commonly used Unicode encoding standard. This format uses 1 byte for the first 128 characters of ASCII but can expand up to 4 bytes (32 bits) if necessary. This is known as variable-length encoding, which saves storage space, as most of the time you are going to use the ASCII characters, which only require 7 bits.

If the number of bytes required to represent a character in UTF-8 is greater than one, there are different bit rules for the bytes:

Unicode Code Point Range Number of bytes 1st byte Starts with Following bytes Start with
U+0000 to U+007F 1 0xxxxxxx (none)
U+0080 to U+07FF 2 110xxxxx 10xxxxxx
U+0800 to U+FFFF 3 1110xxxx 10xxxxxx
U+10000 to U+10FFFF 4 11110xxx 10xxxxxx

As an example, let’s encode the symbol € in UTF-8 with a Unicode code point of U+20AC:

Hexadecimal Character 4-bit Binary Representation
2 0010
0 0000
A 1010
C 1100

 

  1. The code point U+20AC in binary is: 0010 0000 1010 1100.
  2. The code point is within the 3 byte range of U+0800 to U+FFFF.
  3. The first byte starts with 1110, so with the first 4 bits of the code point, it becomes 1110 0010, creating a full byte.
  4. The second byte starts with 10, so using the next 6 bits of the code point, it becomes 1000 0010, creating another full byte.
  5. Finally, the third byte also starts with 10, so using the remaining 6 bits of the code point, it becomes 1010 1100, creating another full byte.
  6. Combining these 3 bytes, the binary representation in UTF-8 encoding is:

1110 0010 1000 0010 1010 1100

  7. The hex character with a binary value of 1110 is E. The hex character with a binary value of 0010 is 2. So the first byte in UTF-8 is E2.
  8. 1000 0010 = 82
  9. 1010 1100 = AC
  10. So, the UTF-8 encoded representation of the Euro symbol is: 0xE2 0x82 0xAC
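You can confirm the whole worked example in Python, since encoding a string to UTF-8 produces exactly these bytes:

```python
# Encode the Euro sign (code point U+20AC) to UTF-8 and inspect the bytes.
encoded = "\u20ac".encode("utf-8")
print([hex(byte) for byte in encoded])  # ['0xe2', '0x82', '0xac']

# Decoding the three bytes brings the character back.
print(bytes([0xE2, 0x82, 0xAC]).decode("utf-8"))
```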

Snack time and then attack time

Holy moly that was a lot. Take a break now, eat, and hydrate. Come back and read the rest once you are refreshed.

Welcome back!

With your foundational understanding of how IP addresses are made and how different encodings are interpreted, you are now ready to learn how they can be used to make your payloads mo’ spicy.

Continue on to learn about dotless, hexadecimal, octal, and combinations of them to create IP addresses that may bypass protection mechanisms. Additionally, you will learn how to use encoding to possibly smuggle your injection attack payloads past security defenses.

 

Go on, brave soldier, to Part II.