Numeric Systems, Units of Data, and Little-Endian Addressing

Lazy Bastard
Hello; this is Lazy Bastard, your friendly, neighborhood...um, bastard.

Having not hacked for some time, I recently delved into PS2 hacking (using Artemis), and to my surprise, I'd forgotten the details of little-endian byte ordering. Lucky for me, Parasyte and Viper187 happened to be in chat to jog my memory, so I didn't have to go Googling around through C and ASM guides to glean that specific info.

Anyway, this lapse reminded me that there seems to be no decent guide to such addressing that's geared toward the hacking scene, and also that there's no simple guide to data types and sizes for beginning hackers. Thus, I've thrown together an introductory text concerning these. This will be especially helpful to those looking through memory dumps, browsing a memory editor, hex editing something, or making codes that aren't just the default (maximum) addressed size of a particular system. Much of the information presented will also be quite useful in other endeavors, such as programming. If you're already well-versed in any particular subject, feel free to skip it.



--Numeric Systems

In order to properly understand data types and sizes (specifically in the context of video game hacking), you will need to understand the different numeric systems you will encounter. There is an infinite number of numeric systems (literally), but for the time being, we will concern ourselves with only three: Binary (base 2), Decimal (base 10), and Hexadecimal (base 16). Forgive me for going out of order (numerically and alphabetically, heh), and starting with Decimal, but it's important to establish a base (an accidental pun, mind you).


-Decimal

Our 'normal' numeric system is known as Decimal. It uses ten digits: 0, 1, 2, 3, 4, 5, 6, 7, 8, and 9. You use it every day, and you likely never think twice about it. But Decimal is merely one way to represent numbers. Not only is there an infinite number of numeric systems, but any number represented in one system can be represented in any other (though it will likely look quite different).

Before moving on to the other numeric systems, I will explain briefly the structure of Decimal, and how numbers are represented in it (and I will copy and paste liberally from my old Hexadecimal guide):

The decimal number 1234 represents not only a quantity (1,234 theoretical objects, say), but also an equation. It is as follows (^ represents exponentiation, so 10^2 is ten squared, 10^3 is ten to the third power, and so on; remember that any nonzero number to the 0th power is 1):

1234 =

(1 x 10^3) + (2 x 10^2) + (3 x 10^1) + (4 x 10^0) =

(1000) + (200) + (30) + (4) = 1234


Here's another example:

567 =

(5 x 10^2) + (6 x 10^1) + (7 x 10^0) =

(500) + (60) + (7) = 567

The reason each digit is multiplied by an increasing power of ten (increasing from right to left, that is) is that Decimal is base 10. If it were base 16, as Hexadecimal is, you would multiply by increasing powers of sixteen instead.
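The expansion above can be sketched in Python. This is only an illustration: positional_terms is a made-up helper name, and it assumes the number is given as a string of digits.

```python
def positional_terms(number: str, base: int) -> list[int]:
    """Expand a digit string into its (digit x base^power) terms, leftmost digit first."""
    terms = []
    power = len(number) - 1          # the leftmost digit gets the highest power
    for digit in number:
        terms.append(int(digit, base) * base ** power)
        power -= 1                   # each step to the right lowers the power by one
    return terms


print(positional_terms("1234", 10))       # [1000, 200, 30, 4]
print(sum(positional_terms("1234", 10)))  # 1234
print(positional_terms("567", 10))        # [500, 60, 7]
```

Passing a different base (2 or 16, for example) applies exactly the same rule to the other numeric systems.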


-Binary

Binary is a numeric system that uses only two digits: 1 and 0. These two digits can form arbitrarily large values, but a Binary number will never consist of anything but 1's and 0's. Below is a small list of Binary numbers and their Decimal equivalents (Binary on the left, Decimal on the right).

0 - 0
1 - 1
10 - 2
11 - 3
100 - 4
101 - 5
110 - 6
111 - 7
1000 - 8
1001 - 9
1010 - 10
1011 - 11
1100 - 12
1101 - 13
1110 - 14
1111 - 15
10000 - 16

...and so on. A simple way to determine a Binary number's Decimal equivalent is to use the same system you just read concerning Decimal, but substituting 2 for 10 (Binary is base 2, whereas Decimal is base 10). For example:

1111 =

(1 x 2^3) + (1 x 2^2) + (1 x 2^1) + (1 x 2^0) =

(8) + (4) + (2) + (1) = 15

You can do this with any numeric system. It's quite handy.
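As a sketch of doing this with any numeric system, here's a small Python converter (to_decimal is a hypothetical name, not a standard function). Rather than computing each power separately, it multiplies the running total by the base before adding each new digit, which works out to the same sum. Python's built-in int(text, base) does the same job and is used as a cross-check.

```python
DIGITS = "0123456789ABCDEF"


def to_decimal(number: str, base: int) -> int:
    """Convert a digit string in the given base (2 to 16) to an integer."""
    value = 0
    for ch in number.upper():
        digit = DIGITS.index(ch)      # raises ValueError for characters that aren't digits
        if digit >= base:
            raise ValueError(f"{ch!r} is not a valid base-{base} digit")
        value = value * base + digit  # shift everything one place left, then add the new digit
    return value


print(to_decimal("1111", 2))                    # 15
print(to_decimal("10000", 2))                   # 16
print(to_decimal("1111", 2) == int("1111", 2))  # True
```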


-Hexadecimal

Hexadecimal is a numeric system that uses sixteen digits: 0 through 9, plus A, B, C, D, E, and F (which stand for 10 through 15). Below is a small list of Hexadecimal numbers and their Decimal equivalents (Hexadecimal on the left, Decimal on the right).

0=0
1=1
2=2
3=3
4=4
5=5
6=6
7=7
8=8
9=9
A=10
B=11
C=12
D=13
E=14
F=15
10=16

...and so on. As noted earlier, the conversion method you've already learned works here too; just use powers of 16. For example, 1F = (1 x 16^1) + (15 x 16^0) = (16) + (15) = 31 (remember, F is 15).
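The conversion can also run in reverse (Decimal to Hexadecimal or Binary). Repeatedly dividing by the base peels the digits off from right to left, with each remainder being one digit. A minimal Python sketch, where from_decimal is a hypothetical helper name, cross-checked against Python's built-in format() and int():

```python
DIGITS = "0123456789ABCDEF"


def from_decimal(value: int, base: int) -> str:
    """Write a non-negative integer as a digit string in the given base (2 to 16)."""
    if value == 0:
        return "0"
    digits = []
    while value > 0:
        value, remainder = divmod(value, base)  # the remainder is the rightmost digit
        digits.append(DIGITS[remainder])
    return "".join(reversed(digits))            # digits were collected right to left


print(from_decimal(16, 16))               # 10
print(from_decimal(500, 16))              # 1F4
print(from_decimal(15, 2))                # 1111
print(format(500, "X"), int("1F4", 16))   # 1F4 500
```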



--Units of Data

Next, let's go through the basic data types and sizes. Those who are absolutely sure they're familiar with the material here can skim through and move on to the next section.


-Bit

A bit is the smallest unit of information on a computer (that video game console you're hacking is a computer, built with the specific purpose of running games). A bit is either a 1 or a 0: on or off, up or down, etc. (the word 'bit' comes from BInary digiT). Most of the time, you can't address a single bit directly, although you can sometimes use a bitwise (Boolean) operation to modify one bit without affecting the surrounding bits (though we'll save that for another guide).


-Nybble

Four bits make a nybble. Thus, there are sixteen possible numbers a nybble can represent: 0 through 1111, or 0 through 15, or 0 through F, depending on the numeric system you're working in.
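To see a nybble's sixteen values in all three notations side by side, this short Python sketch builds one row per value as 4-digit Binary, Decimal, and Hexadecimal using Python's standard format specifiers (nybble_table is just an illustrative name):

```python
def nybble_table() -> list[str]:
    """Return one 'binary decimal hex' row for each of the 16 values a nybble can hold."""
    # 2**4 = 16 values; :04b pads binary to 4 digits, :2d right-aligns decimal, :X is hex
    return [f"{value:04b}  {value:2d}  {value:X}" for value in range(2 ** 4)]


for row in nybble_table():
    print(row)
```

The first row is 0000, 0, 0 and the last is 1111, 15, F.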


-Byte

A byte consists of eight bits, or two nybbles. Thus, there are 256 possible values a byte can represent: 0 through 11111111, or 0 through 255, or 0 through FF, depending on the numeric system you're working in. Two bytes consist of sixteen bits, so there are 65,536 possible values two bytes can represent: 0 through 1111111111111111, or 0 through 65,535, or 0 through FFFF. Four bytes consist of thirty-two bits, so there are 4,294,967,296 possible values four bytes can represent: 0 through 11111111111111111111111111111111, or 0 through 4,294,967,295, or 0 through FFFFFFFF. The list goes on and on, of course, but that should be enough for now.
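The pattern here is that n bits can hold 2^n distinct values, and since counting starts at 0, the largest is 2^n - 1. A short Python sketch (value_range is a hypothetical helper) that reproduces the figures above:

```python
def value_range(num_bytes: int) -> tuple[int, int]:
    """Return (number of distinct values, largest unsigned value) for a size in bytes."""
    bits = num_bytes * 8
    count = 2 ** bits          # each extra bit doubles the number of combinations
    return count, count - 1    # counting starts at 0, so the maximum is count - 1


for num_bytes in (1, 2, 4):
    count, largest = value_range(num_bytes)
    print(f"{num_bytes} byte(s), {num_bytes * 8} bits: {count:,} values, "
          f"0 through {largest:,} (0 through {largest:X})")
```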


-Word

You may hear the term 'word' from time to time, in reference to data. This is where things become variable. A word is the natural unit of data on a particular system, so its size depends on the processor and memory architecture of the system. In this guide, we'll use 'word size' loosely to mean the size of a system's standard code value, since that's what matters when making codes. For example, a standard PSX (PlayStation) code's value is 16 bits, or two bytes, long, while a standard PS2 code's value is 32 bits, or four bytes, long. (Strictly speaking, both consoles use MIPS processors, which call 32 bits a 'word' and 16 bits a 'halfword'.) There are also dwords (double words; two words long), qwords (quad words; four words long), and so on. Don't worry; when hacking a system, the relevant size will usually be apparent. We'll get back to this later.



--Little-Endian Addressing

Many of the systems you hack (the PSX and PS2 among them) will store and reference data using something called 'little-endian addressing' (more precisely, little-endian byte ordering). From an architectural and developmental viewpoint, there are advantages to a system being little-endian, but as a hacker, it's just added complexity you're going to have to deal with.


-An Introduction to Endian-ness, Using Big-Endian Addressing 

The way you're used to writing numbers is what would be referred to as 'big-endian'; that is, the big end comes first. For example, consider the number 1,434: the thousands come first, then the hundreds, then the tens, then the ones (1,000, then 400, then 30, then 4). Some computers store numbers that way, too.

Let's say you were hacking a code for one of these big-endian computers (a video game console, as it were). I'm going to give this imaginary video game console (we'll call it the Toshiba AwesomeConsole) a word size of two bytes, or sixteen bits. This will make its codes look similar to those of the PSX, whose standard codes have sixteen-bit values. However, do not be fooled! The PSX is little-endian, and this example will not apply to it. (The N64, on the other hand, really is big-endian, though its code types differ from the ones used here.)

Moving on: in an attempt to max out your Magical Crystal Shards in Super Role-Playing Game 5 for the AwesomeConsole, you've just hacked a 16-bit code for this big-endian system (most codes you hack for a given system will be in its native word size): 80040006 0806. This is great, because now you have 8 Magical Crystal Shards, which will allow you to blast your pathetic enemies to dust, but also bad, because now you have 6 Dark Orbs, which flags the game to skip most of the storyline on the assumption that you've already taken care of most of the quests.

Obviously, you'd like to have these as two separate codes, and since this is a big-endian system, the solution is fairly simple. Any given cheat/hacking system will have code types for handling different data sizes, which will generally be apparent in the first digit or two of a code's address. In this case, we'll pretend (again, as on the PSX) that the '80' code type represents the native word size, or 16 bits, and that the '30' code type represents a byte, or 8 bits.

With this information, let's take a closer look at the code you've hacked: 80040006 0806. First, you'll probably notice the '80' code type, which means it's addressing one word, in this case 16 bits. Second, we've noticed two separate and obvious effects of this code: your Magical Crystal Shards are now at 8, and your Dark Orbs are now at 6. Since the numbers '08' and '06' just happen to be right there in the 16-bit value, we can assume that these two 8-bit values represent these two item quantities. We will need to address two 8-bit values, so we will use the '30' code type, and since this system is big-endian, our resulting codes will be:

30040006 0008 - 8 Magical Crystal Shards
30040007 0006 - 6 Dark Orbs
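Here's a minimal Python sketch of that split. It assumes the imaginary AwesomeConsole's code format described above (a two-digit code type in front of the address, '80' for 16 bits and '30' for 8 bits, values written as 4 hex digits); split_16bit_code is a hypothetical helper, not part of any real cheat device's software. Python's int.to_bytes() lays a value out in memory order for whichever byte order you name.

```python
def split_16bit_code(address: int, value: int, byteorder: str) -> list[str]:
    """Split one 16-bit code into two 8-bit codes (hypothetical AwesomeConsole format)."""
    location = address & 0x00FFFFFF           # strip the '80' code type, keep the address
    memory = value.to_bytes(2, byteorder)     # the two bytes as stored, lowest address first
    codes = []
    for offset, byte in enumerate(memory):
        code_address = 0x30000000 | (location + offset)  # '30' = 8-bit code type
        codes.append(f"{code_address:08X} {byte:04X}")   # value padded to 4 hex digits
    return codes


for code in split_16bit_code(0x80040006, 0x0806, "big"):
    print(code)
# 30040006 0008
# 30040007 0006
```

The output matches the two codes above. The byteorder argument is there so the same sketch also works for a little-endian system; only that one argument would change.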

You can now use either one without affecting the other. Note that I represented the 8-bit values as being 4 digits long. This is because most cheat/hacking systems use a standard value length that matches the native word size's maximum value (in other words, in a system with a 16-bit native word size, a value will usually be written as 4 digits, because the maximum 16-bit value is FFFF, even if the code's value is only 8 bits long). Rest assured, however, that in such an 8-bit code, the value cannot exceed 00FF (if it does, the system may ignore the first two digits of the value, ignore the code entirely, or crash).

Also note that you could not have referenced the address 80040007 directly with a 16-bit code; you could only have addressed all 16 bits starting at 80040006 and ending at 80040007. This is because of word alignment, which will not allow you to reference addresses that don't line up with the neat little block sizes of data you're working with. Essentially, for 16-bit codes, there is only one addressable location in memory for every 16 bits, starting at 00000000 (00000002, 00000004, 00000006, etc.), and the same applies to all data sizes, of course with different intervals (for example, 8-bit codes have one addressable location in memory for every 8 bits: 00000000, 00000001, 00000002, etc.).
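The alignment rule boils down to one check: an access of N bytes can only start at an address that's a multiple of N. A minimal Python sketch under that assumption (is_aligned and bytes_covered are hypothetical names; the addresses are written without the code type digits, which doesn't change the result here):

```python
def is_aligned(address: int, size_in_bytes: int) -> bool:
    """True if an access of this size may start here (the address is a multiple of the size)."""
    return address % size_in_bytes == 0


def bytes_covered(address: int, size_in_bytes: int) -> list[str]:
    """List the byte addresses that a single aligned access of this size touches."""
    if not is_aligned(address, size_in_bytes):
        raise ValueError(f"{address:08X} is not aligned for a {size_in_bytes}-byte access")
    return [f"{a:08X}" for a in range(address, address + size_in_bytes)]


print(is_aligned(0x00040006, 2))     # True: a 16-bit access may start here
print(is_aligned(0x00040007, 2))     # False: a 16-bit access may not
print(is_aligned(0x00040007, 1))     # True: any address works for 8 bits
print(bytes_covered(0x00040006, 2))  # ['00040006', '00040007']
```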


-Back to the Matter at Hand

Now that you understand the big-endian system, we can take a look at the little-endian system, which you will be dealing with a lot more. Worry not; all of the same principles still apply, except for one: the little end comes first. That is, the least significant byte of a value (the 'little end') is stored at the lowest address, and the most significant byte at the highest. I know this may sound strange, but bear with me.

Let's say you've just picked up the newest game console, the Intel AmazingThing. This system happens to be little-endian, but everything else is about the same as the AwesomeConsole (16-bit word size, same code types for the cheat/hacking system, etc). You start hacking your only game for the AmazingThing, Tennis VS. Bass Fishing 3. You get to the bonus round, in which your best tennis player faces off against the computer's best bass fisher in an all-out, armed brawl, and you manage to hack an Infinite Tennis Balls code for your guy. Unfortunately, by some stroke of luck or strange programming, the same 16-bit value that controls the number of tennis balls you have also controls the number of health bars the other guy has. Well, you having infinite weapons doesn't matter if the other guy never runs out of health, so you'll need to break this 16-bit code into two 8-bit codes.

The code you've hacked is 80108994 0F03. You've noticed that this gives you 15 tennis balls, and gives the bass fisher 3 health bars, so it's pretty easy to isolate what byte, or 8-bit value, controls what. In a big-endian system, the solution would be 30108994 000F for Have 15 Tennis Balls, and 30108995 0003 for Bass Fisher Has 3 Health Bars (maybe you could change the value to 0001, and make a 1-Hit Kills Bass Fisher code; who knows). However, this is a little-endian system, so the solution is actually:

30108994 0003 - Bass Fisher Has 3 Health Bars
30108995 000F - Have 15 Tennis Balls

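As you can see, the 8-bit (byte) values have been swapped: the low byte of the value (03) is stored at the lower address (30108994), and the high byte (0F) at the higher one (30108995).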
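A small Python sketch shows where each byte of 0F03 lands in memory on a little-endian system, using int.to_bytes() with byte order "little". The '30' code type and 4-digit value padding are the imaginary AmazingThing's scheme from the text, and little_endian_byte_codes is a hypothetical helper name.

```python
def little_endian_byte_codes(location: int, value: int, size: int) -> list[str]:
    """Show each byte of a little-endian value as an 8-bit '30' code (hypothetical format)."""
    codes = []
    for offset, byte in enumerate(value.to_bytes(size, "little")):  # low byte comes first
        codes.append(f"{0x30000000 | (location + offset):08X} {byte:04X}")
    return codes


for code in little_endian_byte_codes(0x108994, 0x0F03, 2):
    print(code)
# 30108994 0003
# 30108995 000F
```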

I will not bore you with further tales of imaginary game consoles, but I will leave you with one more important piece of information. The swap occurs at the byte level. It is not merely a "split the value in half, and swap the halves" solution. Let's say you're hacking a 32-bit little-endian system (such as the PS2) - meaning the word size is 32 bits - and come up with the following code:

20453620 09237642 (code type '2' will represent 32 bits, or 4 bytes)

...and determine that this code gives you 09 HP, 35 MP (23 hex is 35 decimal), 118 Gold (76 hex is 118 decimal), and 66 Experience (42 hex is 66 decimal). When you break it down into byte-size portions (each two-digit chunk of the 32-bit, or 4-byte value is 8 bits, or a byte, long), you will end up with the following (code type '0' will represent 8 bits, or one byte):

00453620 00000042 - 66 Experience
00453621 00000076 - 118 Gold
00453622 00000023 - 35 MP
00453623 00000009 - 09 HP

As you can see, the byte order was entirely reversed. The same would apply to a system with 64-bit words, and so on.
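To make the "not just swapping halves" point concrete, this Python sketch compares the real little-endian layout of 09237642 with the naive half-swap, then shows an arbitrary 64-bit value for good measure. It relies only on int.to_bytes() and bytes.hex() (the separator argument needs Python 3.8 or later); naive_half_swap is a hypothetical helper included purely for contrast.

```python
def memory_layout(value: int, size: int) -> str:
    """Bytes of a little-endian value, lowest address first, as uppercase hex pairs."""
    return value.to_bytes(size, "little").hex(" ").upper()


def naive_half_swap(value: int) -> str:
    """The WRONG idea: split a 32-bit value into 16-bit halves and swap them."""
    high, low = divmod(value, 0x10000)
    return (low * 0x10000 + high).to_bytes(4, "big").hex(" ").upper()


print(memory_layout(0x09237642, 4))          # 42 76 23 09 (matches the four byte codes)
print(naive_half_swap(0x09237642))           # 76 42 09 23 (does not)
print(memory_layout(0x0123456789ABCDEF, 8))  # EF CD AB 89 67 45 23 01
```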

Well, I think I've covered enough in this guide. Let me know if anything's unclear, and I'll make updates where they would benefit the reader. Have a good one.



This text was brought to you by GSHI.org, unless someone else gave it to you, in which case it was only written by someone at GSHI.org. Heheh.