Commit f65793a

author
Hgh
committed
Updated text of the encoding -> binary to text read me file regarding some text modification
1 parent b683ff5 commit f65793a

1 file changed

+73
-43
lines changed

encoding/binary-to-text/README.md

Lines changed: 73 additions & 43 deletions
@@ -1,45 +1,70 @@
11
# Binary-to-text Encoding
22

3-
The purpose of encoding is to transform data so that it can be properly (and safely) consumed by a different type of system, e.g. binary data being sent over the response of an API calling, or viewing special characters on a debug console or unit test function. The goal is not to keep information secret, but rather to ensure that it’s able to be properly consumed.
4-
Encoding transforms data into another format using a scheme that is publicly available so that it can easily be reversed. It does not require a key as the only thing required to decode it is the algorithm that was used to encode it.
3+
## Table of contents
4+
5+
- ### [Purpose](#purpose)
6+
- ### [Hexadecimal (Base16)](#hexadecimal-base16)
7+
- #### [Advantages](#advantages)
8+
- #### [Disadvantages](#disadvantages)
9+
- ### [Base64](#base64)
10+
- ### [Examples](#examples)
11+
- #### [Manual encoding](#manual-encoding)
12+
- #### [Create a binary file](#create-a-binary-file)
13+
- #### [Encode to standard Base64](#encode-to-standard-base64)
14+
- #### [Decode from standard Base64](#decode-from-standard-base64)
15+
- ### [Text-to-binary decoding](#text-to-binary-decoding)
16+
17+
## Purpose
18+
19+
To understand the purpose of encoding, see [here](../../README.md#purpose).
520

621
## Hexadecimal (Base16)
722

8-
Base16 can also refer to a binary to text encoding belonging to the same family as Base32, Base58, and Base64.
23+
Hexadecimal is a positional numeral system that uses 16 symbols to write numerical values. Base16 can also refer to
24+
a binary-to-text encoding belonging to the same family as Base32, Base58, and Base64.
25+
26+
In this case, data is broken into 4-bit sequences, and each value (between 0 and 15 inclusively) is encoded using 16
27+
symbols from the ASCII character set. Although any 16 symbols from the ASCII character set can be used, in practice the
28+
ASCII digits '0'–'9' and the letters 'A'–'F' (or the lowercase 'a'–'f') are always chosen in order to align with
29+
standard written notation for hexadecimal numbers.
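The 4-bit grouping described above can be sketched in Python. This is a minimal illustration, not a reference implementation; the names `HEX_ALPHABET` and `base16_encode` are hypothetical and chosen only for this example.

```python
# Uppercase hexadecimal symbols, indexed by the 4-bit value they represent.
HEX_ALPHABET = "0123456789ABCDEF"


def base16_encode(data: bytes) -> str:
    """Encode bytes as Base16 by mapping each 4-bit group to one symbol."""
    out = []
    for byte in data:
        out.append(HEX_ALPHABET[byte >> 4])    # high nibble (upper 4 bits)
        out.append(HEX_ALPHABET[byte & 0x0F])  # low nibble (lower 4 bits)
    return "".join(out)


print(base16_encode(b"\xff\xe2"))  # FFE2
```

Each input byte yields exactly two output characters, which is why the encoded text is twice the size of the original data.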
930

10-
In this case, data is broken into 4-bit sequences, and each value (between 0 and 15 inclusively) is encoded using 16 symbols from the ASCII character set. Although any 16 symbols from the ASCII character set can be used, in practice the ASCII digits '0'–'9' and the letters 'A'–'F' (or the lowercase 'a'–'f') are always chosen in order to align with standard written notation for hexadecimal numbers.
31+
### Advantages
1132

1233
There are several advantages of Base16 encoding:
1334

14-
- Most programming languages already have facilities to parse ASCII-encoded hexadecimal
15-
- Being exactly half a byte, 4-bits is easier to process than the 5 or 6 bits of Base32 and Base64 respectively
16-
The symbols 0-9 and A-F are universal in hexadecimal notation, so it is easily understood at a glance without needing to rely on a symbol lookup table
17-
- Many CPU architectures have dedicated instructions that allow access to a half-byte (otherwise known as a "nibble"), making it more efficient in hardware than Base32 and Base64
35+
- Most programming languages already have facilities to parse ASCII-encoded hexadecimal.
36+
- A 4-bit group is exactly half a byte, so it is easier to process than the 5-bit and 6-bit groups of Base32 and Base64 respectively.
36+
- The symbols 0-9 and A-F are universal in hexadecimal notation, so the output is easily understood at a glance without
37+
needing to rely on a symbol lookup table.
39+
- Many CPU architectures have dedicated instructions that allow access to a half-byte (otherwise known as a "nibble"),
40+
making Base16 more efficient in hardware than Base32 and Base64.
41+
42+
### Disadvantages
1843

1944
The main disadvantages of Base16 encoding are:
2045

21-
- Space efficiency is only 50%, since each 4-bit value from the original data will be encoded as an 8-bit byte. In contrast, Base32 and Base64 encodings have a space efficiency of 63% and 75% respectively.
46+
- Space efficiency is only 50%, since each 4-bit value from the original data will be encoded as an 8-bit byte. In
47+
contrast, Base32 and Base64 encodings have a space efficiency of 63% and 75% respectively.
2248
- Possible added complexity of having to accept both uppercase and lowercase letters.
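The efficiency figures above follow from how many data bits each 8-bit output character carries. The sketch below reproduces them; it assumes one byte per output character and ignores padding, and the function name `space_efficiency` is hypothetical.

```python
import math


def space_efficiency(alphabet_size: int) -> float:
    """Fraction of each 8-bit output character that carries input data."""
    bits_per_symbol = math.log2(alphabet_size)
    return bits_per_symbol / 8


for name, size in (("Base16", 16), ("Base32", 32), ("Base64", 64)):
    print(f"{name}: {space_efficiency(size):.1%}")
```

This prints 50.0%, 62.5%, and 75.0%, so the 63% quoted for Base32 is the rounded value of 62.5%.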
2349

2450
## Base64
2551

26-
Here, we are talking about the `Base64` encoding from [RFC4648 - The Base16, Base32, and Base64 Data Encodings](https://tools.ietf.org/html/rfc4648).
52+
Here, we are talking about the `Base64` encoding
53+
from [RFC4648 - The Base16, Base32, and Base64 Data Encodings](https://tools.ietf.org/html/rfc4648).
2754

2855
There are two different versions defined in RFC 4648:
2956

3057
* Standard
3158
* With URL and Filename Safe Alphabet
3259

33-
The encoding process represents 24-bit groups of input bits as output
34-
strings of 4 encoded characters. Proceeding from left to right, a
35-
24-bit input group is formed by concatenating 3 8-bit input groups.
36-
These 24 bits are then treated as 4 concatenated 6-bit groups, each
37-
of which is translated into a single character in the base 64
38-
alphabet.
60+
In short, the encoding process takes 24-bit groups as input and produces strings of 4 encoded characters as output.
61+
62+
The encoding process represents 24-bit groups of input bits as output strings of 4 encoded characters. Proceeding from
63+
left to right, a 24-bit input group is formed by concatenating 3 8-bit input groups. These 24 bits are then treated as 4
64+
concatenated 6-bit groups, each of which is translated into a single character in the base 64 alphabet.
3965

40-
Each 6-bit group is used as an index into an array of 64 printable
41-
characters. The character referenced by the index is placed in the
42-
output string.
66+
Each 6-bit group is used as an index into an array of 64 printable characters. The character referenced by the index is
67+
placed in the output string.
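The grouping and lookup steps can be sketched in Python as follows. This is an illustrative sketch that only handles input whose length is a multiple of 3 bytes, so no padding is involved; the names `ALPHABET` and `encode_full_groups` are hypothetical.

```python
import string

# The standard Base64 alphabet from RFC 4648: A-Z, a-z, 0-9, '+', '/'.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"


def encode_full_groups(data: bytes) -> str:
    """Encode complete 3-byte groups into 4 Base64 characters each."""
    if len(data) % 3 != 0:
        raise ValueError("this sketch only handles complete 3-byte groups")
    out = []
    for i in range(0, len(data), 3):
        # Concatenate three 8-bit inputs into one 24-bit integer.
        group = int.from_bytes(data[i:i + 3], "big")
        # Split the 24 bits into four 6-bit indexes, left to right.
        for shift in (18, 12, 6, 0):
            out.append(ALPHABET[(group >> shift) & 0x3F])
    return "".join(out)


print(encode_full_groups(b"Man"))  # TWFu
```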
4368

4469
The Base 64 Alphabet Table
4570

@@ -62,23 +87,22 @@ The Base 64 Alphabet Table
6287
15 P 32 g 49 x
6388
16 Q 33 h 50 y
6489

65-
Special processing is performed if fewer than 24 bits are available
66-
at the end of the data being encoded. A full encoding quantum is
67-
always completed at the end of a quantity. When fewer than 24 input
68-
bits are available in an input group, bits with value zero are added
69-
(on the right) to form an integral number of 6-bit groups.
70-
Since it encodes by group of 3 bytes, when last group of 3 bytes miss one byte then = is used, when it miss 2 bytes then == is used for padding.
90+
Special processing is performed if fewer than 24 bits are available at the end of the data being encoded. A full
91+
encoding quantum is always completed at the end of a quantity. When fewer than 24 input bits are available in an input
92+
group, bits with value zero are added (on the right) to form an integral number of 6-bit groups. Because the input is
93+
encoded in groups of 3 bytes, a final group that is missing one byte is padded with `=`, and a final group that is
94+
missing two bytes is padded with `==`.
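The padding rule can be observed with Python's standard `base64` module. The helper name `b64` is hypothetical and exists only to keep the example short.

```python
import base64


def b64(data: bytes) -> str:
    """Standard Base64 encoding, returned as text."""
    return base64.b64encode(data).decode("ascii")


# A final group of 3, 2, and 1 bytes respectively.
for data in (b"\xff\xe2\x00", b"\xff\xe2", b"\xff"):
    print(len(data), b64(data))
```

The three lines printed are `3 /+IA`, `2 /+I=`, and `1 /w==`: no padding for a full group, one `=` when one byte is missing, and `==` when two bytes are missing.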
7195

72-
In `URL/Filename safe` version, the `-` is used for `62` instead of `+` ,
73-
and the `_` is used for `63` instead of `/` . This encoding may be referred to as "base64url".
74-
This encoding should not be regarded as the same as the "base64" encoding and
75-
should not be referred to as only "base64".
96+
In the `URL/Filename safe` version, `-` is used for `62` instead of `+`, and `_` is used for `63` instead of `/`.
97+
This encoding may be referred to as "base64url".
98+
This encoding should not be regarded as the same as the "base64" encoding and should not be referred to as only "base64".
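As a quick check of the difference between the two alphabets, Python's standard `base64` module offers both variants. The two-byte input here is chosen because it produces the indexes `63` and `62`; the variable names are illustrative.

```python
import base64

data = bytes([0xFF, 0xE2])

standard = base64.standard_b64encode(data).decode("ascii")
url_safe = base64.urlsafe_b64encode(data).decode("ascii")

print(standard)  # /+I=
print(url_safe)  # _-I=
```

Only the characters for indexes `62` and `63` differ; the rest of the output, including the padding, is the same.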
76100

77-
In `OpenSSL` , the `Standard` version has been implemented since OpenSSL 1.1.1j 16 Feb 2021.
101+
In `OpenSSL`, the `Standard` version is implemented (as of OpenSSL 1.1.1j, 16 Feb 2021).
78102

79-
### Example
103+
### Examples
80104

81-
#### manual encoding
105+
#### Manual encoding
82106

83107
Suppose that the input byte array is [0xff, 0xe2].
84108

@@ -100,50 +124,56 @@ The output length is not the multiplier of 4, so add `=` as the padding characte
100124

101125
`/` `+` `I` `=`
102126

103-
If we try to do same one for `base64url` :
127+
If we do the same for `base64url`:
104128

105129
`_` `-` `I` `=`
106130

107-
#### create a binary file
131+
#### Create a binary file
108132

109-
You can use `echo` in command line interface :
133+
You can use `echo` on the command line:
110134

111135
```
112136
$ echo -n -e \\xff\\xe2 > data_binary.bin
113137
```
114138

115-
To check the content of the binary file :
139+
To check the content of the binary file:
116140

117141
```
118142
$ hexdump data_binary.bin
119143
```
120144

121-
#### encode to standard Base64
145+
#### Encode to standard Base64
122146

123147
```
124148
$ openssl enc -base64 -e -in data_binary.bin
125149
```
126150

127-
#### decode from standard Base64
151+
#### Decode from standard Base64
128152

129153
```
130154
$ openssl enc -base64 -d <<< /+I= | od -vt x1
131155
```
132156

133157
## Text-to-binary decoding
134158

135-
In many situations, we have some text values which should be decoded to an equivalent byte arrays. Because we need to put them as the input of a cryptographic process. For example, assume that we have message for an authorized party in text and we need to encrypt it before transmission. The encryption process accepts a byte array as the input so we need to convert the message to a byte array :
159+
In many situations, we have text values that must be decoded to equivalent byte arrays for use as the input of
160+
a cryptographic process. For example, assume that we have a text message for an authorized party and need to encrypt
161+
it before transmission. The encryption process accepts a byte array as input, so we need to convert the message to a
162+
byte array:
136163

137164
```
138165
$ echo -n 'Hello, World' | od -t x1
139166
0000000 48 65 6c 6c 6f 2c 20 57 6f 72 6c 64
140167
```
141-
or in other representation way :
168+
169+
or, in another representation:
142170

143171
```
144172
$ echo -n 'Hello, World' | xxd -ps
145173
48656c6c6f2c20576f726c64
146174
```
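The same conversion can be sketched in Python, assuming the message contains only ASCII characters. The variable names are illustrative.

```python
message = "Hello, World"

# Map each character to its ASCII byte value.
data = message.encode("ascii")

# Space-separated bytes, similar to the `od -t x1` output.
spaced = " ".join(f"{byte:02x}" for byte in data)
# Continuous hex string, similar to the `xxd -ps` output.
plain = data.hex()

print(spaced)  # 48 65 6c 6c 6f 2c 20 57 6f 72 6c 64
print(plain)   # 48656c6c6f2c20576f726c64
```

The `2c` byte is the comma and the `20` byte is the space that follows it.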
147-
But what does it mean really? It's very important for you to understand what happens exactly in this conversion.
148-
Take a look at the `ASCII Table` again. `0x48` refers to the hexadecimal representation of `H` character, `0x65` refers to `e` character and so on. So, every character in the `Hello, World` message is converted to a hexadecimal value from `ASCII Table`. It means that we have done the `ASCII` decoding process. Did we have any other option?
149-
Yes,
175+
176+
But what does this really mean? It is important to understand exactly what happens in this conversion. Take a
177+
look at the `ASCII Table` again. `0x48` is the hexadecimal representation of the `H` character, `0x65` is that of the `e`
178+
character, and so on. So, every character in the `Hello, World` message is converted to its hexadecimal value
179+
from the `ASCII Table`. This means that we have performed the `ASCII` decoding process. Did we have any other option? Yes,
