Difference between revisions of "Number and Data Encoding Formats"
(New page: ==Modulus== The modulus instruction (represented by % in Cstyle languages and MOD in other places) is a fundamental operation that is not used very often outside of computer science and ...) 

Line 1:  Line 1:  
+  {{Template:EnHacklopedia Header}}  
+  
==Modulus==  ==Modulus==  
Line 218:  Line 220:  
{{Template: EnHacklopedia_Legal}}  {{Template: EnHacklopedia_Legal}}  
[[Category: EnHacklopedia]]  [[Category: EnHacklopedia]]  
+  '''Bold text''' 
Revision as of 21:37, 2 December 2008
EnHacklopedia >> {{ #ifeq: Number and Data Encoding Formats  EnHacklopedia  Index  Number and Data Encoding Formats }} 

Contents
Modulus
The modulus instruction (represented by % in Cstyle languages and MOD in other places) is a fundamental operation that is not used very often outside of computer science and advance math. All it does is returned the remainder of a division. For example, 3 % 2 = 1, while 3 % 3 = 0. Modulus is often used to determine divisibility, as if one number modulus a second is zero, then the first number is evenly divisible by the second. The modulus is also used in hashing algorithms.
Decimal
The decimal method of representing numbers is the most common method used in mathematics. 10 digits (0  9) exist each of which occupies one decimal place. Decimal is also known as base10, due to the fact that is both consists of 10 different digits and because each decimal place has its value based on a power of ten. In a multiple digit number, each successive digit to the left has an increased value by a factor of ten, while each successive digit to the right has a decreased value by a factor of ten. For instance, the number 873 is equivalent to 8 * 10^{2} + 7 * 10^{1} + 3 * 10. This is an important concept as all other basen representations of numbers work in a similar fashion. Decimal numbers sometimes have a trailing "d" on them to make the base being used explicit. In this document, if there are no other base markers (trailing "b" or preceding "0x"), then the number is in base10.
Binary
At the lowest level, a computer is just a series of switches with two settings  high voltage and low voltage. By convention, the number 1 represents the high (or on) setting and 0 represents the low (or off) setting. Just as decimal places are based on powers of ten, binary (or base2) places are based on powers of two. For instance, the number 110 is equal to 1 * 10^{2} + 1 * 10^{1} + 0 * 10 in decimal and 1 * 2^{2} + 1 * 2^{1} + 0 * 2 in binary. While the binary representation does allow one to easily perform bitwise operations, it has a drawback. The length of the representation becomes massive quite quickly. A three digit decimal number requires a minimum of seven binary digits and a four digit decimal number requires a minimum of ten binary digits. By the number one million, twenty binary digits are required. Binary numbers are represented by a trailing b: 110 is a decimal number; 110b is binary. 1 binary digit is referred to as a "bit".
Hexadecimal
Hexadecimal (also called base16 or hex) is used as a substitute for binary. Hexadecimal has many of the advantages that binary has, but removes binary's largest disadvantage: the massive size. Each hexadecimal digit takes the value of one of 16 different values, 015. Values 10 and up are represented as the letters A through F. Because each hexadecimal place is based on a power of 16 and because 16 is equal to 2^{4}, hexadecimal numbers are four times shorter than their binary equivalents. Hexadecimal numbers are represented by a preceding "0x" or "$" or a succeeding "h"; For instance, 10d is also 0xA, $A, or Ah. In this document, all hexadecimal numbers are represented with the preceding "0x". 1 hexadecimal digit is known as a nybble or halfbyte, while 2 digits is called a byte (8 bits). Larger units are left somewhat ambiguous. A "word" is used to referred to the fundamental data type of a microprocessor. However, this can be 16, 32, or 64 bits, as well as some rather odd sizes on archaic processors, such as 39 bits. For the purposes of this document, a word is 32 bits with a halfword (or short) being 16 and a doubleword (or dword) being 64. A quadword (or qword) will be 128 bits.
Conversions
Converting between bases is not that difficult a task, but can require a bit of practice. The easiest conversions to do are being base16 and base2. For a base16 number, all you have to do is convert each digit into its corresponding binary form and keep the order intact. The representations are as follows:
0x0  0000b
0x1  0001b
0x2  0010b
0x3  0011b
0x4  0100b
0x5  0101b
0x6  0110b
0x7  0111b
0x8  1000b
0x9  1001b
0xA  1010b
0xB  1011b
0xC  1100b
0xD  1101b
0xE  1110b
0xF  1111b
For instance, to convert 0xDEA2 to binary, you would expand "D" to "1101", "E" to "1110", "A" to "1010", and "2" to "0010", giving you 1101111010100010b.
Converting binary to hexadecimal is similar. Each four bit number is converted into its hexadecimal version. While one can start working from the right when converting hexadecimal to binary, it's important to start on the left when doing the opposite, as starting from the right will give a wrong result unless the binary number has a number of bits that is a multiple of four. For this example, the number 1111010010b will be used. The lowest four bits are 0010b which corresponds to 0x2. Then, 1101b which is 0xD. Finally 0011b, which is 0x3. The base16 representation is 0x3D2.
Converting binary and hexadecimal into decimal is easy as well, but a much different task. It is good practice to know the first few powers of 2 (1, 2, 4, 8, 16) and 16 (1, 16, 256, 4096, 65536). The socalled one's place in the number has a value of 1 (2 or 16) and as one moves to the left, the value of the place increases by a factor of 2 or 16 respectively. The convert these values to decimal, multiply the value of the digit in a specific binary or hexadecimal place by that place's value. Do the same for each digit, then add all the products up.
Sample binary to decimal conversion: 110101011101b = 1 * 2 + 0 * 2^{1} + 1 * 2^{2} + 1 * 2^{3} + 1 * 2^{4} + 0 * 2^{5} + 1 * 2^{6} + 0 * 2^{7} + 1 * 2^{8} + 0 * 2^{9} + 1 * 2^{11} + 1 * 2^{12} = 1 + 4 + 8 + 16 + 64 + 256 + 1024 + 2048 = 3,421.
Sample hexadecimal to decimal conversion: 0xBEEF = 0xF * 16 + 0xE * 16^{1} + 0xE * 16^{2} + 0xB * 16^{3} = 15 * 1 + 14 * 16 + 14 * 256 + 11 * 4,096 = 15 + 224 + 3,584 + 45,056 = 48,879.
Converting numbers from decimal to binary or hexadecimal is slightly more problematic. The best way to do this is to find the largest power of 2 (or 16) that is not greater than the decimal number. Divide the number by that power and the result is the highest digit in the converted number. Divide the power by 2 (or 16) and repeat the process with that power and the remainder of the division.
Sample decimal to binary conversion: 132. 128 is 2^{7}. 132 / 128 gives 1 with a remainder of 4. Dividing 128 by 2 gives 64. 4 / 64 is 0 with a remainder of 4. Dividing 64 by 2 gives 32. 4 / 32 is 0 with a remainder of 4. Dividing 32 by 2 gives 16. 4 / 16 is 0 with a remainder of 4. Dividing 16 by 2 gives 8. 4 / 8 is 0 with a remainder of 4. Dividing 8 by 2 gives 4. 4 / 4 is 1 with a remainder of 0. At this point we can stop the math. However, we still need to 0fill the remaining bits. The 2'splace and 1's place still haven't been taken care of, which means two more zeros need to be appended. Our final result is 10000100b.
Sample decimal to hexadecimal conversion: 327. 256 is 16^{2} 327 / 256 is 1 with a remainder of 71. 256 / 16 is 16. 71 / 16 is 4 with a remainder of 7. 16 / 16 is 1. 7 / 1 is 7. The converted number is 0x147.
AND
One of the fundamental logical and bitwise operations is known as the AND. AND takes in two inputs and produces one output. At its most fundamental level true is returned if both inputs are true and false otherwise. Bitwise AND (represented by &) ANDs each individual bit of a number together and the corresponding bit of the output is the result of ANDing the two input bits. For instance, take 11011111b & 10101101b. Bit n of the output is equal to Bit n of input 1 & Bit n of input 2. Bit 7 of the output will be 1 &s 1, or 1. Bit 6 will be 1 & 0 or 0. Bit 5 will be 0 & 1 or 0. Bit 4 will be 1 & 0 or 0. Bit 3 will be 1 & 1 or 1. Bit 2 will be 1 & 1 or 1. Bit 1 will be 1 & 0 or 0. And bit 0 will be 1 & 1 or 1. The final result is 10001101.
OR
OR (represented by ) is another fundamental bitwise operation. On the onebit level, OR returns 1 if either input is 1 and 0 if both inputs are 0. For multiple bit numbers, the same bitbybit process used to AND numbers and be used to OR them. Example: 10011001b  11001010b = 11011011b.
XOR
XOR (represented by ^) is a slightly more advanced bitwise operation. It is similar to OR and takes two inputs. However on the onebit level, the output is 1 when either input is 1 but not both. If both inputs are 0 or 1, the output is 0. Example: 10011001b ^ 11001010b = 01010011b.
NOT
NOT (represented by ~) is a fundamental bitwise operation. However, unlike AND, OR, and XOR, it only takes in one input. NOT inverts a bit. That is, if the input is 0, the output is 1, and if the input is 1, the output is 0. When performing a NOT, it is important to keep in mind implicit zeros. ~1101b will not be 10b. It will be 11110010b, 1111111111110010b, or 11111111111111111111111111110010b depending on whether the data size is 8, 16, or 32bits.
Shifts
A shift does what it implies: it shifts data to the left or to the right; that is, each bit is moved one place over per shift. There are left shifts and right shifts and two varieties of each.
The logical shift (represented by << for left and >> for right) shifts every bit to the left or the right a specified amount. Vacant bit positions are filled with zeros while the high bits (for left shift; low for right) are discarded. For example: 01001101b << 2 = 00110100b.
The arithmetic shift is similar to the logical shift. In fact, the left arithmetic shift is identical to the left logical shift. The difference is what occurs in the right shift. An arithmetic right shift performs the corresponding right logical shift first. Then, the <acronym title="Most Signifigant Bit">MSB</acronym> is replaced with what it originally was prior to the shift. Because there is no difference between left logical and arithmetic shifts, << is used to represent both. The right arithmetic shift is represented by >>>. Example: 10111000 >>> 3 = 10010111.
Rotate
A rotate is identical to a shift except that instead of the edge bit being thrown away, it is moved to the other end of the number. On a left rotate, the MSB becomes the LSB and on a right rotate the LSB becomes the MSB.
Two's Complement
A major problem in the early years of computer science was how to represent negative numbers. After a lot of other formats were tried, the method eventually settled upon was Two's Complement. To turn a negative number into its two's complement representation, you invert the binary representation of that number's positive representation and then add 1. For instance, (assuming an 8bit data size), 12 would be calculated as follows:
12 = 00001100
~12 = 11110011
12 = ~12 + 1 = 11110100
Determining what a particular binary representation means is not difficult. If the highest bit is 0, the number is positive and can be calculated as normal. If the highest is 1, then the number is negative. To determine the negative number's magnitude, subtract one and then invert the number. Then convert to decimal like normal. The only exception is when the sign bit is set and every other bit is not. To determine the value of this representation, set the <acronym title="Least Signifigant Bit">LSB</acronym> and determine what this number is. Then subtract one.
Example: 11111110b
The sign bit is set, so the number is negative. Subtracting gives 11111101b. Inverting gives 10b, which is 2. The number is 2.
Example: 00000110b
The sign bit is not set, so the number is positive. Conversion can performed normal. The number is 6.
Example: 10000000b
The sign bit is set and all other bits are not. Adding one gives 10000001b. This number represents 127. Subtracting one from that gives 128.
Zero and SignExtensions
Zero and SignExtensions are the two different methods used to extend a value to a larger size, for example an 8bit value to 16bits. Zeroextension simply involves adding the appropriate number of zeros to the beginning of the number. For instance, zeroextending 10011111b to 16bits would result in 0000000010011111b. Signextension is slightly different. If the MSB is zero, then the conversion is the same as in zeroextension. However, if the MSB is one, then the appropriate number of ones are added to the beginning of the number instead of zeros. Using the same example, signextending 10011111b to 16bits would result in 1111111110011111b. Signextension is used because it allows two's complement numbers to keep their value when extended, whereas zeroextending the number would not.
IEEE754
IEEE754 is the nearlyuniversal standard used for representing floating point values. Nearly any computer that contains an <acronym title="FloatingPoint Unit">FPU</acronym> will be using this format, and this includes consoles. The small floating point size is 32 bits; floating point values of this size are called singleprecision floating point numbers (keyword float in Clike languages). Of these 32 bits, 1 is set aside for the sign, 8 bits are for the exponent, and 23 bits are for the mantissa. When the sign bit is 1, the number is negative; when it is 0, it is positive. The exponent bits actually store the exponent with 127 added to it, so when the exponent bits are 00000000b, the exponent is 127. The mantissa stores the normalized version of the number. This means that the decimal value is stored without the decimal point; the exponent keeps track of where the decimal point belongs. Normalized numbers have a MSB of one in the mantissa; there are also denormalized numbers, which have a MSB of zero in the mantissa. When using denormalized numbers, the exponent is stored with 126 added to it instead of 127.
Example: 118.625
The number is negative so the sign bit is set.
118.625 in binary is 1110110.101b.
Normalized, the number is 1.110110101 * 2^{6}.
The exponent is 6. Adding 127, the exponent is 133 or 10000101b.
The floating point value is 1100001011110110101000000000000b.
Doubleprecision floating point values also exist (keyword double in Clike languages). The size is doubled to 64bits. The exponent is expanded to 11 bits with a bias of 1,023, and the mantissa size is more than double to 52 bits. Finally, the quadrupleprecision floating point values (keyword long double in Clike languages) expands the size to 128 bits, with a 112bit mantissa and a 15bit exponent (with a 16,383 bias).
IEEE754 also allows for signed infinities and <acronym title="Not a Number">NaN</acronym>s. An infinity is represented by the maximum possible exponent with a mantissa of zero, while an NaN is represented by the maximum possible exponent with a nonzero mantissa.
Float Convert is a tool that allows one to convert between decimal floating point numbers and IEEE754 formatted floating point numbers. However, it only works with singleprecision values.
Some basic C code can be used to convert between decimal and IEEE754 doubleprecision values. This code is portable and is guaranteed to work on any platform. To convert a double to hex:
#include <stdio.h>
int main()
{
double val = 1.0; //Substitute your value here.
printf("%016I64X\n", *(long long *)&val);
return 0;
}
Converting from hex to double is easy as well:
#include <stdio.h>
int main()
{
long long val = 0x3FF0000000000000LL; //Substitute your value here.
printf("%G", *(double *)&val);
return 0;
}
Binary Coded Decimal (BCD)
<acronym title="Binary Coded Decimal">BCD</acronym> is an archaic method of representing decimal values. It was used on system without an FPU, such as the SNES. Each nybble in a BCD represents a specific decimal place; because of this, only values 09 are legal. The size of a BCD is arbitrary, as is the location where the decimal point exists.
Example: 0x9837. Assuming that the decimal point exists inbetween the bytes, this halfword represents the BCD value of 98.37.
Byte Order
Big Endian
Big endian is a method of storing data. The "big end" of a word is stored first. For instance, the word 0x0A0B0C0D stored at 0x08000000 would have 0x0A stored at 0x08000000, 0x0B at 0x08000001, 0x0C at 0x08000002, and 0x0D at 0x08000003.
Little Endian
Little endian is the most common method of storing data. The "little end" of a word is stored first. For instance, the word 0x0A0B0C0D stored at 0x08000000 would have 0x0D stored at 0x08000000, 0x0C stored at 0x08000001, 0x0B stored at 0x08000002, and 0x0A stored at 0x08000003. Many systems, including Nintendo's handhelds, store data this way.
ASCII
ASCII is the most commonly used format for encoding characters. Originally a 7bit encoding, it later expanded to 8bits to double the number of characters. In ASCII, the letters 'A' through 'Z' are represented by 0x41 through 0x5A and 'a' through 'z' are 0x61 through 0x7A. '0' through '9' are 0x30 through 0x39, and a space is 0x20. While many games encode their text into ASCII, a lot of games use a modified ASCII, where the letters, while still together, start at a different point. For instance, the capital letters may begin a 0x00 instead of their standard ASCII location. A relative search can help search for text in games that use this. A relative searcher searches based on differences between bytes instead of looking for a specific series of bytes. For instance, if one were to search for "ABD", the relative searcher would search for three consecutive bytes where the difference between the first and second was one and the difference between the second and third was two. SearchR3 is a program that automates this process. Below is an abridged ASCII table, which contains some of the more common characters used. Full ASCII tables are available on other websites, such as asciitable.com.
Character  Dec  Hex 

NULL  0  0x00 
LF (Newline)  10  0x0A 
Space  32  0x20 
!  33  0x21 
"  34  0x22 
#  35  0x23 
$  36  0x24 
'('  ')'  40  41  0x28  0x29 
.  46  0x2B 
'0'  '9'  48  57  0x30  0x39 
'A'  'Z'  65  90  0x41  0x5A 
'a'  'z'  97  122  0x61  0x7A 
Legal 


Bold text