
Commit 9fe914d

Re-run benchmark for pdfium
1 parent 2417507 commit 9fe914d

17 files changed: +2243 −1600 lines

README.md
+11 −11 (large diff not rendered)

benchmark.py
+2 −2

@@ -621,10 +621,10 @@ def get_text_extraction_score(doc: Document, library_name: str):
     "pdfium",
     "https://pypi.org/project/pypdfium2/",
     pdfium_get_text,
-    "3.0.0",
+    "3.3.0",
     None,
     license="Apache-2.0 or BSD-3-Clause",
-    last_release_date="2022-09-24",
+    last_release_date="2022-10-11",
     dependencies="PDFium (Foxit/Google)",
 ),
 }
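The hunk above only bumps the pinned pypdfium2 version and release date; pdfium_get_text itself is defined elsewhere in benchmark.py and is not part of this diff. Purely as a sketch (not the repository's actual implementation), extracting text with pypdfium2 and measuring the kind of per-document "read" time re-recorded in cache.json could look roughly like this; the calls follow the current pypdfium2 API and may differ in detail from the 3.3.0 release pinned here:

import time

import pypdfium2 as pdfium  # assumption: current pypdfium2 API, not necessarily 3.3.0


def pdfium_get_text(path: str) -> str:
    # Hypothetical stand-in for the benchmark's pdfium_get_text.
    pdf = pdfium.PdfDocument(path)
    parts = []
    for i in range(len(pdf)):
        textpage = pdf[i].get_textpage()
        parts.append(textpage.get_text_range())  # full text of page i
    return "\n".join(parts)


def timed_read(path: str) -> float:
    # Wall-clock extraction time, analogous to a "read" value in cache.json.
    start = time.perf_counter()
    pdfium_get_text(path)
    return time.perf_counter() - start

Wall-clock timings of this kind vary between runs and machines, which is consistent with the re-run "read" values changing in the cache.json hunks below.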

cache.json
+26 −26

@@ -46,46 +46,46 @@
 },
 "pdfium": {
     "1601.03642": {
-        "read": 0.03187894821166992
+        "read": 0.022846460342407227
     },
     "1602.06541": {
-        "read": 0.07932782173156738
+        "read": 0.06338787078857422
     },
     "1707.09725": {
-        "read": 0.4979126453399658
+        "read": 0.25937962532043457
     },
     "2201.00021": {
-        "read": 0.05899667739868164
+        "read": 0.04457664489746094
     },
     "2201.00022": {
-        "read": 0.05965733528137207
+        "read": 0.03960609436035156
     },
     "2201.00029": {
-        "read": 0.024973154067993164
+        "read": 0.0203707218170166
     },
     "2201.00037": {
-        "read": 0.1794278621673584
+        "read": 0.10732841491699219
     },
     "2201.00069": {
-        "read": 0.09558367729187012
+        "read": 0.05700826644897461
     },
     "2201.00151": {
-        "read": 0.1812427043914795
+        "read": 0.15162134170532227
     },
     "2201.00178": {
-        "read": 0.08454561233520508
+        "read": 0.05579185485839844
     },
     "2201.00200": {
-        "read": 0.03901362419128418
+        "read": 0.029985904693603516
     },
     "2201.00201": {
-        "read": 0.04749131202697754
+        "read": 0.034445762634277344
     },
     "2201.00214": {
-        "read": 0.5253455638885498
+        "read": 2.0789294242858887
     },
     "GeoTopo-book": {
-        "read": 0.5842208862304688
+        "read": 0.31180763244628906
     }
 },
 "pdfminer": {

@@ -428,19 +428,19 @@
 },
 "pdfium": {
     "1601.03642": 0.9933865842136906,
-    "1602.06541": 0.9935348174079982,
-    "1707.09725": 0.9660502640896164,
-    "2201.00021": 0.9835705768764047,
-    "2201.00022": 0.9808979519224355,
+    "1602.06541": 0.9934296421236865,
+    "1707.09725": 0.9660028182027109,
+    "2201.00021": 0.9827206308709265,
+    "2201.00022": 0.9808761396486546,
     "2201.00029": 0.988998899889989,
-    "2201.00037": 0.9633262470954664,
-    "2201.00069": 0.9899890590809628,
-    "2201.00151": 0.9463014140762108,
-    "2201.00178": 0.961165450928382,
-    "2201.00200": 0.9837577057089252,
-    "2201.00201": 0.9861388064782841,
-    "2201.00214": 0.992888914618732,
-    "GeoTopo-book": 0.9683748611164659
+    "2201.00037": 0.9630028664623901,
+    "2201.00069": 0.9895289389507067,
+    "2201.00151": 0.9462536938608789,
+    "2201.00178": 0.9609345871825676,
+    "2201.00200": 0.983652248485823,
+    "2201.00201": 0.9860908471938528,
+    "2201.00214": 0.9927643892179752,
+    "GeoTopo-book": 0.9674374638962312
 },
 "pdfminer": {
     "1601.03642": 0.8627611914777197,

read/results/pdfium/1601.03642.txt
+12 −12

@@ -42,7 +42,7 @@ a lot of data has become available. The idea of machine learning
 is to make use of this data.
 A formal definition of the field of Machine Learning is given
 by Tom Mitchel [Mit97]:
-A computer program is said to learn from experience E with respect to some class of tasks T and
+A computer program is said to learn from experience E with respect to some class of tasks T and
 performance measure P, if its performance at tasks
 in T, as measured by P, improves with experience E.
 Σ ϕ

@@ -64,8 +64,8 @@ xi are the input signals and wi are
 weights which have to get learned.
 Each input signal gets multiplied
 with its weight, everything gets
-summed up and the activation function ϕ is applied.
-(b) A visualization of a simple feedforward neural network. The 5 input nodes are red, the 2 bias nodes
+summed up and the activation function ϕ is applied.
+(b) A visualization of a simple feedforward neural network. The 5 input nodes are red, the 2 bias nodes
 are gray, the 3 hidden units are
 green and the single output node
 is blue.

@@ -79,7 +79,7 @@ adjust them without having to re-program everything. Machine
 learning programs should generally improve when they are fed
 with more data.
 The field of machine learning is related to statistics. Some
-algorithms directly try to find models which are based on wellknown distribution assumptions of the developer, others are
+algorithms directly try to find models which are based on wellknown distribution assumptions of the developer, others are
 more general.
 A common misunderstanding of people who are not related
 in this field is that the developers don’t understand what their

@@ -97,7 +97,7 @@ basic building blocks is a time-intensive and difficult task.
 An important group of machine learning algorithms was
 inspired by biological neurons and are thus called artificial
 neural networks. Those networks are based on mathematical
-functions called artificial neurons which take n ∈ N numbers x1, . . . , xn ∈ R as input, multiply them with weights
+functions called artificial neurons which take n ∈ N numbers x1, . . . , xn ∈ R as input, multiply them with weights
 w1, . . . , wn ∈ R, add them and apply a so called activation
 function ϕ as visualized in Figure 1(a). One example of such
 an activation function is the sigmoid function ϕ(x) = 1

@@ -235,7 +235,7 @@ a lower level. Using such a predictor, one can generate texts
 character by character. If the model is good, the text can have
 the correct punctuation. This would not be possible with a
 word predictor.
-Character predictors can be implemented with RNNs. In contrast to standard feed-forward neural networks like multilayer
+Character predictors can be implemented with RNNs. In contrast to standard feed-forward neural networks like multilayer
 Perceptrons (MLPs) which was shown in Figure 1(b), those
 networks are trained to take their output at some point as well as
 the normal input. This means they can keep some information

@@ -327,14 +327,14 @@ highly authentic replications and novel music compositions”.
 The reader might want to listen to [Cop12] to get an impression
 of the beauty of the created music.
 According to Cope, an essential part of music is “a set of
-instructions for creating different, but highly related selfreplications”. Emmy was programmed to find this set of
+instructions for creating different, but highly related selfreplications”. Emmy was programmed to find this set of
 instructions. It tries to find the “signature” of a composer,
 which Cope describes as “contiguous patterns that recur in two
 or more works of the composer”.
 The new feature of Emily Howell compared to Emmy is that
 Emily Howell does not necessarily remain in a single, already
 known style.
-Emily Howell makes use of association network. Cope emphasizes that this is not a form of a neural network. However, it
+Emily Howell makes use of association network. Cope emphasizes that this is not a form of a neural network. However, it
 is not clear from [Cop13] how exactly an association network
 is trained. Cope mentions that Emily Howell is explained in
 detail in [Cop05].

@@ -392,7 +392,7 @@ Available: https://www.youtube.com/watch?v=jLR- c uCwI
 composition,” XRDS: Crossroads, The ACM Magazine for
 Students, vol. 19, no. 4, pp. 16–20, 2013. [Online]. Available:
 http://dl.acm.org/citation.cfm?id=2460444
-[Cur14] A. Curtis, “Now then,” BBC, Jul. 2014. [Online]. Available: http://www.bbc.co.uk/blogs/adamcurtis/entries/
+[Cur14] A. Curtis, “Now then,” BBC, Jul. 2014. [Online]. Available: http://www.bbc.co.uk/blogs/adamcurtis/entries/
 78691781-c9b7-30a0-9a0a-3ff76e8bfe58
 [Gad06] A. Gadsby, Ed., Dictionary of Contemporary English. Pearson
 Education Limited, 2006.

@@ -413,7 +413,7 @@ for image recognition,” arXiv preprint arXiv:1512.03385, 2015.
 [Joh15a] D. Johnson, “Biaxial recurrent neural network for music
 composition,” GitHub, Aug. 2015. [Online]. Available: https:
 //github.com/hexahedria/biaxial-rnn-music-composition
-[Joh15b] ——, “Composing music with recurrent neural networks,” Personal Blog, Aug. 2015. [Online]. Available: http://www.hexahedria.com/2015/08/03/
+[Joh15b] ——, “Composing music with recurrent neural networks,” Personal Blog, Aug. 2015. [Online]. Available: http://www.hexahedria.com/2015/08/03/
 composing-music-with-recurrent-neural-networks/
 [Joh16] J. Johnson, “neural-style,” GitHub, Jan. 2016. [Online]. Available:
 https://github.com/jcjohnson/neural-style

@@ -432,7 +432,7 @@ computer science. McGraw-Hill, 1997.
 deeper into neural networks,” googleresearch.blogspot.co.uk,
 Jun. 2015. [Online]. Available: http://googleresearch.blogspot.de/
 2015/06/inceptionism-going-deeper-into-neural.html
-[Nie15] M. A. Nielsen, Neural Networks and Deep Learning. Determination Press, 2015. [Online]. Available: http://neuralnetworksanddeeplearning.com/chap6.html#
+[Nie15] M. A. Nielsen, Neural Networks and Deep Learning. Determination Press, 2015. [Online]. Available: http://neuralnetworksanddeeplearning.com/chap6.html#
 introducing convolutional networks
 [NV15] A. Nayebi and M. Vitelli, “GRUV: Algorithmic music generation
 using recurrent neural networks,” 2015. [Online]. Available:

@@ -463,7 +463,7 @@ http://arxiv.org/abs/1506.05869v2
 Available: https://github.com/MattVitelli/GRUV
 [Wei76] J. Weizenbaum, Computer Power and Human Reason: From
 Judgement to Calculation. W.H.Freeman & Co Ltd, 1976.
-[ZF14] M. D. Zeiler and R. Fergus, “Visualizing and understanding convolutional networks,” in Computer Vision–ECCV 2014. Springer,
+[ZF14] M. D. Zeiler and R. Fergus, “Visualizing and understanding convolutional networks,” in Computer Vision–ECCV 2014. Springer,
 2014, pp. 818–833.
 6
 APPENDIX A
