The purpose of encoding is to transform data so that it can be properly (and safely) consumed by a different type of system, e.g. binary data sent in an API response, or special characters viewed on a debug console or in a unit test. The goal is not to keep information secret, but rather to ensure that it can be properly consumed.
4
-
Encoding transforms data into another format using a scheme that is publicly available so that it can easily be reversed. It does not require a key as the only thing required to decode it is the algorithm that was used to encode it.
To understand the purpose of Encoding, please check [here](../../README.md#purpose)
5
20
6
21
## Hexadecimal (Base16)
7
22
8
-
Base16 can also refer to a binary to text encoding belonging to the same family as Base32, Base58, and Base64.
23
+
Hexadecimal is a numeral system made up of 16 symbols used to write and share numerical values. Base16 can also refer to
24
+
a binary to text encoding belonging to the same family as Base32, Base58, and Base64.
25
+
26
+
In this case, data is broken into 4-bit sequences, and each value (between 0 and 15 inclusively) is encoded using 16
27
+
symbols from the ASCII character set. Although any 16 symbols from the ASCII character set can be used, in practice the
28
+
ASCII digits '0'–'9' and the letters 'A'–'F' (or the lowercase 'a'–'f') are always chosen in order to align with
29
+
standard written notation for hexadecimal numbers.
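
The 4-bit splitting described above can be sketched in a few lines of Python. This is only an illustration, not part of the original README; the names `HEX_ALPHABET` and `base16_encode` are hypothetical, and the uppercase alphabet is an assumption (lowercase would work equally well).

```python
HEX_ALPHABET = "0123456789ABCDEF"  # assumed alphabet: the conventional 16 symbols


def base16_encode(data: bytes) -> str:
    """Encode each byte as two symbols, one per 4-bit group."""
    out = []
    for byte in data:
        high = byte >> 4    # upper 4 bits, a value in 0..15
        low = byte & 0x0F   # lower 4 bits, a value in 0..15
        out.append(HEX_ALPHABET[high])
        out.append(HEX_ALPHABET[low])
    return "".join(out)


print(base16_encode(bytes([0xFF, 0xE2])))  # FFE2
```

Every input byte yields exactly two output characters, which is where the 50% space efficiency of Base16 comes from.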
9
30
10
-
In this case, data is broken into 4-bit sequences, and each value (between 0 and 15 inclusively) is encoded using 16 symbols from the ASCII character set. Although any 16 symbols from the ASCII character set can be used, in practice the ASCII digits '0'–'9' and the letters 'A'–'F' (or the lowercase 'a'–'f') are always chosen in order to align with standard written notation for hexadecimal numbers.
31
+
### Advantages
11
32
12
33
There are several advantages of Base16 encoding:
13
34
14
-
- Most programming languages already have facilities to parse ASCII-encoded hexadecimal
15
-
- Being exactly half a byte, 4-bits is easier to process than the 5 or 6 bits of Base32 and Base64 respectively
16
-
The symbols 0-9 and A-F are universal in hexadecimal notation, so it is easily understood at a glance without needing to rely on a symbol lookup table
17
-
- Many CPU architectures have dedicated instructions that allow access to a half-byte (otherwise known as a "nibble"), making it more efficient in hardware than Base32 and Base64
35
+
- Most programming languages already have facilities to parse ASCII-encoded hexadecimal.
36
+
- Being exactly half a byte, a 4-bit group is easier to process than the 5 or 6 bits of Base32 and Base64 respectively. The
37
+
symbols 0-9 and A-F are universal in hexadecimal notation, so it would be easily understood at a glance without
38
+
needing to rely on a symbol lookup table.
39
+
- Many CPU architectures have dedicated instructions that allow access to a half-byte (otherwise known as a "nibble"),
40
+
making Base16 more efficient in hardware than Base32 and Base64.
41
+
42
+
### Disadvantages
18
43
19
44
The main disadvantages of Base16 encoding are:
20
45
21
-
- Space efficiency is only 50%, since each 4-bit value from the original data will be encoded as an 8-bit byte. In contrast, Base32 and Base64 encodings have a space efficiency of 63% and 75% respectively.
46
+
- Space efficiency is only 50%, since each 4-bit value from the original data will be encoded as an 8-bit byte. In
47
+
contrast, Base32 and Base64 encodings have a space efficiency of 63% and 75% respectively.
22
48
- Possible added complexity of having to accept both uppercase and lowercase letters.
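
The efficiency figures can be checked with Python's standard `base64` module. This is a small sketch rather than a benchmark; the 15-byte input is an assumption chosen because it divides evenly into both 5-byte (Base32) and 3-byte (Base64) groups, so padding does not distort the ratio.

```python
import base64

data = bytes(range(15))  # 15 bytes: divisible by 3 and by 5, so no padding is added

for name, encode in [("Base16", base64.b16encode),
                     ("Base32", base64.b32encode),
                     ("Base64", base64.b64encode)]:
    encoded = encode(data)
    ratio = len(data) / len(encoded)
    print(f"{name}: {len(encoded)} characters, efficiency {ratio:.1%}")

# Accepting both letter cases on decode: casefold=True allows lowercase input.
assert base64.b16decode("ffe2", casefold=True) == base64.b16decode("FFE2")
```

The loop reports 30, 24, and 20 characters, i.e. 50.0%, 62.5% (rounded to 63% in the text above), and 75.0%.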
23
49
24
50
## Base64
25
51
26
-
Here, we are talking about the `Base64` encoding from [RFC4648 - The Base16, Base32, and Base64 Data Encodings](https://tools.ietf.org/html/rfc4648).
52
+
Here, we are talking about the `Base64` encoding
53
+
from [RFC4648 - The Base16, Base32, and Base64 Data Encodings](https://tools.ietf.org/html/rfc4648).
27
54
28
55
There are two different versions defined in RFC 4648:
29
56
30
57
* Standard
31
58
* With URL and Filename Safe Alphabet
32
59
33
-
The encoding process represents 24-bit groups of input bits as output
34
-
strings of 4 encoded characters. Proceeding from left to right, a
35
-
24-bit input group is formed by concatenating 3 8-bit input groups.
36
-
These 24 bits are then treated as 4 concatenated 6-bit groups, each
37
-
of which is translated into a single character in the base 64
38
-
alphabet.
60
+
The encoding process takes 24-bit groups as input and produces strings of 4 encoded characters as output.
61
+
62
+
The encoding process represents 24-bit groups of input bits as output strings of 4 encoded characters. Proceeding from
63
+
left to right, a 24-bit input group is formed by concatenating 3 8-bit input groups. These 24 bits are then treated as 4
64
+
concatenated 6-bit groups, each of which is translated into a single character in the base 64 alphabet.
39
65
40
-
Each 6-bit group is used as an index into an array of 64 printable
41
-
characters. The character referenced by the index is placed in the
42
-
output string.
66
+
Each 6-bit group is used as an index into an array of 64 printable characters. The character referenced by the index is
67
+
placed in the output string.
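
As a rough sketch of the grouping and indexing just described, the following Python function handles only complete 24-bit groups (padding is left out on purpose). The names `ALPHABET` and `encode_full_groups` are hypothetical and not part of the original text.

```python
import string

# Standard Base64 alphabet from RFC 4648: A-Z, a-z, 0-9, '+', '/'
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"


def encode_full_groups(data: bytes) -> str:
    """Encode input whose length is a multiple of 3 bytes (no padding handled)."""
    if len(data) % 3 != 0:
        raise ValueError("this sketch only handles complete 24-bit groups")
    out = []
    for i in range(0, len(data), 3):
        # Concatenate three 8-bit groups into one 24-bit group.
        group = (data[i] << 16) | (data[i + 1] << 8) | data[i + 2]
        # Treat it as four 6-bit indices, from left to right.
        for shift in (18, 12, 6, 0):
            out.append(ALPHABET[(group >> shift) & 0x3F])
    return "".join(out)


print(encode_full_groups(b"Man"))  # TWFu
```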
43
68
44
69
The Base 64 Alphabet Table
45
70
@@ -62,23 +87,22 @@ The Base 64 Alphabet Table
62
87
15 P 32 g 49 x
63
88
16 Q 33 h 50 y
64
89
65
-
Special processing is performed if fewer than 24 bits are available
66
-
at the end of the data being encoded. A full encoding quantum is
67
-
always completed at the end of a quantity. When fewer than 24 input
68
-
bits are available in an input group, bits with value zero are added
69
-
(on the right) to form an integral number of 6-bit groups.
70
-
Since it encodes by group of 3 bytes, when last group of 3 bytes miss one byte then = is used, when it miss 2 bytes then == is used for padding.
90
+
Special processing is performed if fewer than 24 bits are available at the end of the data being encoded. A full
91
+
encoding quantum is always completed at the end of a quantity. When fewer than 24 input bits are available in an input
92
+
group, bits with value zero are added (on the right) to form an integral number of 6-bit groups. Since it encodes by
93
+
groups of 3 bytes, when the last group is missing one byte a single `=` is used; when it is missing 2 bytes, `==` is used for
94
+
padding.
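
The zero-filling and padding rule for a short final group can be sketched as follows. This is an illustrative Python fragment under the assumption that the final group holds 1 or 2 bytes; `ALPHABET` and `encode_tail` are hypothetical names.

```python
import string

# Standard Base64 alphabet from RFC 4648: A-Z, a-z, 0-9, '+', '/'
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"


def encode_tail(tail: bytes) -> str:
    """Encode a final group of 1 or 2 bytes: zero-fill on the right, pad with '='."""
    if len(tail) not in (1, 2):
        raise ValueError("the tail must be 1 or 2 bytes")
    bits = int.from_bytes(tail, "big")
    bit_count = 8 * len(tail)
    # Add zero bits on the right until the length is a multiple of 6.
    fill = (-bit_count) % 6
    bits <<= fill
    symbols = (bit_count + fill) // 6
    chars = [ALPHABET[(bits >> (6 * (symbols - 1 - k))) & 0x3F]
             for k in range(symbols)]
    # Complete the 4-character quantum with '='.
    return "".join(chars) + "=" * (4 - symbols)


print(encode_tail(bytes([0xFF, 0xE2])))  # /+I=
print(encode_tail(bytes([0xFF])))        # /w==
```

A 2-byte tail produces three symbols and one `=`, while a 1-byte tail produces two symbols and `==`.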
71
95
72
-
In `URL/Filename safe` version, the `-` is used for `62` instead of `+` ,
73
-
and the `_` is used for `63` instead of `/` . This encoding may be referred to as "base64url".
74
-
This encoding should not be regarded as the same as the "base64" encoding and
75
-
should not be referred to as only "base64".
96
+
In `URL/Filename safe` version, the `-` is used for `62` instead of `+` , and the `_` is used for `63` instead of `/`.
97
+
This encoding may be referred to as "base64url".
98
+
This encoding should not be regarded as the same as the "base64" encoding and should not be referred to as only "base64".
76
100
77
-
In `OpenSSL` , the `Standard` version has been implemented since OpenSSL 1.1.1j 16 Feb 2021.
101
+
In `OpenSSL` , the `Standard` version has been implemented since OpenSSL 1.1.1j 16 Feb 2021.
78
102
79
-
### Example
103
+
### Examples
80
104
81
-
#### manual encoding
105
+
#### Manual encoding
82
106
83
107
Suppose that the input byte array is [0xff, 0xe2].
84
108
@@ -100,50 +124,56 @@ The output length is not the multiplier of 4, so add `=` as the padding characte
100
124
101
125
`/``+``I``=`
102
126
103
-
If we try to do same one for `base64url`:
127
+
Doing the same for `base64url` gives:
104
128
105
129
`_``-``I``=`
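
Both manual results can be cross-checked with Python's standard `base64` module. This is only a quick verification sketch, not part of the original walkthrough.

```python
import base64

data = bytes([0xFF, 0xE2])

standard = base64.b64encode(data).decode("ascii")
url_safe = base64.urlsafe_b64encode(data).decode("ascii")

print(standard)  # /+I=
print(url_safe)  # _-I=

# Each alphabet decodes back to the same two bytes.
assert base64.b64decode(standard) == data
assert base64.urlsafe_b64decode(url_safe) == data
```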
106
130
107
-
#### create a binary file
131
+
#### Create a binary file
108
132
109
-
You can use `echo` in command line interface :
133
+
You can use `echo` on the command line:
110
134
111
135
```
112
136
$ echo -n -e \\xff\\xe2 > data_binary.bin
113
137
```
114
138
115
-
To check the content of the binary file:
139
+
To check the content of the binary file:
116
140
117
141
```
118
142
$ hexdump data_binary.bin
119
143
```
120
144
121
-
#### encode to standard Base64
145
+
#### Encode to standard Base64
122
146
123
147
```
124
148
$ openssl enc -base64 -e -in data_binary.bin
125
149
```
126
150
127
-
#### decode from standard Base64
151
+
#### Decode from standard Base64
128
152
129
153
```
130
154
$ openssl enc -base64 -d <<< /+I= | od -vt x1
131
155
```
132
156
133
157
# Text-to-binary decoding
134
158
135
-
In many situations, we have some text values which should be decoded to an equivalent byte arrays. Because we need to put them as the input of a cryptographic process. For example, assume that we have message for an authorized party in text and we need to encrypt it before transmission. The encryption process accepts a byte array as the input so we need to convert the message to a byte array :
159
+
In many situations, we have text values that must be decoded to equivalent byte arrays to use as the input of
160
+
a cryptographic process. For example, assume that we have a text message for an authorized party and we need to encrypt
161
+
it before transmission. The encryption process accepts a byte array as the input, so we need to convert the message to a
162
+
byte array:
136
163
137
164
```
138
165
$ echo -n 'Hello, World' | od -t x1
139
166
0000000 48 65 6c 6c 6f 2c 20 57 6f 72 6c 64
140
167
```
141
-
or in other representation way :
168
+
169
+
or, in another representation:
142
170
143
171
```
144
172
$ echo -n 'Hello, World' | xxd -ps
145
173
48656c6c6f2c20576f726c64
146
174
```
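
The same conversion can be reproduced in Python as a small sketch. Note that Python names the text-to-bytes method `encode`, whatever term is used for the direction of the conversion; the variable names here are illustrative only.

```python
message = "Hello, World"

# Map each character to its byte value from the ASCII table.
raw = message.encode("ascii")

print(list(raw)[:3])  # [72, 101, 108], i.e. 0x48, 0x65, 0x6c
print(raw.hex())      # 48656c6c6f2c20576f726c64

# The hexadecimal text converts back to the same byte array.
assert bytes.fromhex("48656c6c6f2c20576f726c64") == raw
```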
147
-
But what does it mean really? It's very important for you to understand what happens exactly in this conversion.
148
-
Take a look at the `ASCII Table` again. `0x48` refers to the hexadecimal representation of `H` character, `0x65` refers to `e` character and so on. So, every character in the `Hello, World` message is converted to a hexadecimal value from `ASCII Table`. It means that we have done the `ASCII` decoding process. Did we have any other option?
149
-
Yes,
175
+
176
+
But what does this really mean? It is important to understand exactly what happens in this conversion. Take a
177
+
look at the `ASCII Table` again. `0x48` is the hexadecimal representation of the `H` character, `0x65` refers to the `e`
178
+
character and so on. So, every character in the `Hello, World` message is converted to a hexadecimal value
179
+
from the `ASCII Table`. It means that we have done the `ASCII` decoding process. Did we have any other option? Yes,