Commit 3872263

Merge pull request #127 from davidbrochart/notebook
Add file_array and chunked_file_array notebooks
2 parents 87d8087 + 397ee29 commit 3872263

2 files changed, +336 -0 lines changed


notebooks/chunked_file_array.ipynb

Lines changed: 162 additions & 0 deletions
@@ -0,0 +1,162 @@
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Chunked file arrays"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Chunked file arrays are chunked arrays whose chunks are stored in a file system. A store manager holds one or more chunks in memory and writes them back to the file system when necessary: when a chunk must be evicted to make room for another one, or when the array is flushed or destroyed."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#include \"xtensor-io/xchunk_store_manager.hpp\"\n",
    "#include \"xtensor-io/xio_binary.hpp\"\n",
    "#include \"xtensor-io/xio_disk_handler.hpp\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "namespace fs = ghc::filesystem;"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "std::vector<size_t> shape = {4, 4};\n",
    "std::vector<size_t> chunk_shape = {2, 2};\n",
    "std::string chunk_dir = \"chunks\";\n",
    "double init_value = 5.5;\n",
    "\n",
    "fs::create_directory(chunk_dir);"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "At most 2 chunks will be held in memory at any time:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "std::size_t pool_size = 2;"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "auto a = xt::chunked_file_array<double, xt::xio_disk_handler<xt::xio_binary_config>>( \\\n",
    " shape, \\\n",
    " chunk_shape, \\\n",
    " chunk_dir, \\\n",
    " init_value, \\\n",
    " pool_size);"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Element `(2, 1)` lies in chunk `(1, 0)`, since each index is divided (rounding down) by the chunk size of 2. This assignment brings that chunk into memory and modifies it:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "a(2, 1) = 1.2;"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Element `(1, 2)` lies in chunk `(0, 1)`, which is brought into memory as well:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "a(1, 2) = 3.4;"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The pool is now full, so this assignment first saves chunk `(1, 0)` to disk and evicts it from memory, then brings chunk `(0, 0)` into memory:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "a(0, 0) = 5.6;"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "When `a` is flushed or destroyed, every modified chunk still in memory is saved to disk. Here only chunks `(0, 1)` and `(0, 0)` are saved: chunk `(1, 0)` was already saved when it was evicted and has not changed since."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "a.chunks().flush();"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "C++14",
   "language": "C++14",
   "name": "xcpp14"
  },
  "language_info": {
   "codemirror_mode": "text/x-c++src",
   "file_extension": ".cpp",
   "mimetype": "text/x-c++src",
   "name": "c++",
   "version": "14"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}

notebooks/file_array.ipynb

Lines changed: 174 additions & 0 deletions
@@ -0,0 +1,174 @@
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# File arrays"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "File arrays are file-backed, cached arrays. They implement persistent arrays stored in the local file system or in the cloud, in a variety of file formats."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#include <xtensor-io/xfile_array.hpp>\n",
    "#include <xtensor-io/xio_binary.hpp>\n",
    "#include <xtensor-io/xio_disk_handler.hpp>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We define a file array type that is stored on disk in binary format:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "using file_array = xt::xfile_array<double, xt::xio_disk_handler<xt::xio_binary_config>>;"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Since the file doesn't exist yet, we use the `init` file mode:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "file_array a1(\"a1.bin\", xt::xfile_mode::init);"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "std::vector<size_t> shape = {2, 2};\n",
    "a1.resize(shape);"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's assign a value to an element of the file array:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "a1(0, 1) = 1.;"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The element has changed in memory, but the on-disk file has not been updated yet. The file is updated when the array is explicitly flushed, or when it is destroyed (e.g. when it goes out of scope)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "a1.flush();"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now the on-disk file is up to date."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's point `a2` to `a1`'s file. For that, we use the `load` file mode:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "file_array a2(\"a1.bin\", xt::xfile_mode::load);"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The binary format doesn't store the shape, so we have to set it explicitly with `resize`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "a2.resize(shape);"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's check that `a1` and `a2` are equal:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "assert(xt::all(xt::equal(a1, a2)));"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "C++14",
   "language": "C++14",
   "name": "xcpp14"
  },
  "language_info": {
   "codemirror_mode": "text/x-c++src",
   "file_extension": ".cpp",
   "mimetype": "text/x-c++src",
   "name": "c++",
   "version": "14"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
