
Commit 1bd1332

Merge pull request #28 from neocl/dev
Release version 0.1a9
2 parents b55f035 + c6c5d73 commit 1bd1332

10 files changed

+245
-47
lines changed

CHANGES.md

-16
This file was deleted.

README.md

+28-27
Original file line numberDiff line numberDiff line change
@@ -18,24 +18,28 @@
1818

1919
Homepage: [https://github.com/neocl/jamdict](https://github.com/neocl/jamdict)
2020

21-
[Contributors](#contributors) are welcome! 🙇
21+
[Contributors](#contributors) are welcome! 🙇 If you want to help, please see the [Contributing](https://jamdict.readthedocs.io/en/latest/contributing.html) page.
22+
23+
# Try Jamdict out
24+
25+
A demo Jamdict virtual machine is available to try on the web at Repl.it: https://replit.com/@tuananhle/jamdict-demo
2226

2327
# Installation
2428

2529
Jamdict & Jamdict database are both available on [PyPI](https://pypi.org/project/jamdict/) and can be installed using pip
2630

2731
```bash
28-
pip install jamdict jamdict-data
32+
pip install --upgrade jamdict jamdict-data
2933
```
3034

3135
# Sample jamdict Python code
3236

3337
```python
3438
from jamdict import Jamdict
35-
jmd = Jamdict()
39+
jam = Jamdict()
3640

3741
# use wildcard matching to find anything that starts with 食べ and ends with る
38-
result = jmd.lookup('食べ%る')
42+
result = jam.lookup('食べ%る')
3943

4044
# print all word entries
4145
for entry in result.entries:
@@ -108,21 +112,21 @@ The terminology of radicals/components used by Jamdict can be different from els
108112

109113
By default jamdict provides two maps:
110114

111-
- jmd.krad is a Python dict that maps characters to list of components.
112-
- jmd.radk is a Python dict that maps each available components to a list of characters.
115+
- jam.krad is a Python dict that maps characters to lists of components.
116+
- jam.radk is a Python dict that maps each available component to a list of characters.
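The relationship between the two maps can be sketched with plain dictionaries. This is only an illustration of the data shapes described above, not Jamdict's own code: the entry for 雲 comes from this README, the entry for 雪 is made up for the example, and `invert` is a hypothetical helper.

```python
from collections import defaultdict

# Toy character -> components map (krad-style).
# The 雲 entry is taken from the README; the 雪 entry is an
# illustrative assumption, not real Jamdict output.
krad = {
    "雲": ["一", "雨", "二", "厶"],
    "雪": ["雨", "ヨ"],
}


def invert(char_to_components):
    """Build a component -> set of characters map (radk-style)."""
    component_to_chars = defaultdict(set)
    for char, components in char_to_components.items():
        for component in components:
            component_to_chars[component].add(char)
    return dict(component_to_chars)


radk = invert(krad)
# All toy characters that contain the component 雨
print(radk["雨"])
```

Using sets for the reverse map mirrors the set-like output shown for `radk` in the sample below.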
113117

114118
```python
115119
# Find all writing components (often called "radicals") of the character 雲
116-
print(jmd.krad['雲'])
120+
print(jam.krad['雲'])
117121
# ['一', '雨', '二', '厶']
118122

119123
# Find all characters with the component 鼎
120-
chars = jmd.radk['鼎']
124+
chars = jam.radk['鼎']
121125
print(chars)
122126
# {'鼏', '鼒', '鼐', '鼎', '鼑'}
123127

124128
# look up the characters info
125-
result = jmd.lookup(''.join(chars))
129+
result = jam.lookup(''.join(chars))
126130
for c in result.chars:
127131
    print(c, c.meanings())
128132
# 鼏 ['cover of tripod cauldron']
@@ -136,7 +140,7 @@ for c in result.chars:
136140

137141
```python
138142
# Find all names with 鈴木 inside
139-
result = jmd.lookup('%鈴木%')
143+
result = jam.lookup('%鈴木%')
140144
for name in result.names:
141145
    print(name)
142146

@@ -154,30 +158,27 @@ for name in result.names:
154158
155159
## Exact matching
156160
157-
Use exact matching for faster search
161+
Use exact matching for faster search.
158162
159-
```python
160-
# Find an entry (word, name entity) by idseq
161-
result = jmd.lookup('id#5711308')
162-
print(result.names[0])
163-
# [id#5711308] すすき (鈴木) : Susuki (family or surname)
164-
result = jmd.lookup('id#1467640')
165-
print(result.entries[0])
166-
# ねこ (猫) : 1. cat 2. shamisen 3. geisha 4. wheelbarrow 5. clay bed-warmer 6. bottom/submissive partner of a homosexual relationship
163+
Find the word 花火 by idseq (1194580)
167164
168-
# use exact matching to increase searching speed (thanks to @reem-codes)
169-
result = jmd.lookup('猫')
165+
```python
166+
>>> result = jam.lookup('id#1194580')
167+
>>> print(result.entries[0])
168+
[id#1194580] はなび (花火) : fireworks ((noun (common) (futsuumeishi)))
169+
```
170170
171-
for entry in result.entries:
172-
    print(entry)
171+
Find an exact name 花火 by idseq (5170462)
173172
174-
# [id#1467640] ねこ (猫) : 1. cat ((noun (common) (futsuumeishi))) 2. shamisen 3. geisha 4. wheelbarrow 5. clay bed-warmer 6. bottom/submissive partner of a homosexual relationship
175-
# [id#2698030] ねこま (猫) : cat ((noun (common) (futsuumeishi)))
173+
```python
174+
>>> result = jam.lookup('id#5170462')
175+
>>> print(result.names[0])
176+
[id#5170462] はなび (花火) : Hanabi (female given name or forename)
176177
```
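As a rough illustration of the difference between wildcard and exact lookups, the sketch below mimics both with a toy SQLite table. It assumes that `%` behaves like the SQL `LIKE` wildcard; the `words` table and the `find` helper are hypothetical and are not part of Jamdict's API.

```python
import sqlite3

# Hypothetical in-memory table standing in for a dictionary index.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE words (text TEXT)")
conn.executemany(
    "INSERT INTO words (text) VALUES (?)",
    [("食べる",), ("食べさせる",), ("食べ物",), ("花火",)],
)


def find(pattern):
    """Use LIKE when the pattern contains a wildcard, equality otherwise."""
    if "%" in pattern:
        sql = "SELECT text FROM words WHERE text LIKE ?"
    else:
        sql = "SELECT text FROM words WHERE text = ?"
    return [row[0] for row in conn.execute(sql, (pattern,))]


wildcard_hits = find("食べ%る")  # starts with 食べ and ends with る
exact_hits = find("花火")        # plain equality comparison
print(wildcard_hits, exact_hits)
```

An equality comparison tests one exact value, while a pattern has to be matched against every candidate string, which is one plausible reason an exact lookup can be faster.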
177178
178179
See `jamdict_demo.py` and `jamdict/tools.py` for more information.
179180
180-
# Official website
181+
# Useful links
181182
182183
* JMdict: [http://edrdg.org/jmdict/edict_doc.html](http://edrdg.org/jmdict/edict_doc.html)
183184
* kanjidic2: [https://www.edrdg.org/wiki/index.php/KANJIDIC_Project](https://www.edrdg.org/wiki/index.php/KANJIDIC_Project)
@@ -189,4 +190,4 @@ See `jamdict_demo.py` and `jamdict/tools.py` for more information.
189190
- [Le Tuan Anh](https://github.com/letuananh) (Maintainer)
190191
- [Matteo Fumagalli](https://github.com/matteofumagalli1275)
191192
- [Reem Alghamdi](https://github.com/reem-codes)
192-
- [alt-romes](https://github.com/alt-romes)
193+
- [alt-romes](https://github.com/alt-romes)

docs/_static/jamdict_db_schema.png

215 KB
Loading

docs/contributing.rst

+87
@@ -0,0 +1,87 @@
1+
.. _contributing:
2+
3+
Contributing
4+
============
5+
6+
There are many ways to contribute to the Jamdict project.
7+
The ones that the Jamdict development team is focusing on at the moment are:
8+
9+
- Fixing :ref:`existing bugs <contrib_bugfix>`
10+
- Improving query functions
11+
- Improving :ref:`documentation <contrib_docs>`
12+
- Keeping jamdict database up to date
13+
14+
If you have suggestions or bug reports, please share them on the `jamdict issue tracker <https://github.com/neocl/jamdict/issues>`_.
15+
16+
.. _contrib_bugfix:
17+
18+
Fixing bugs
19+
-----------
20+
21+
If you find a bug, please report it at https://github.com/neocl/jamdict/issues
22+
23+
When possible, please also share the steps to reproduce the bug and a snapshot of the jamdict info output to help track it down.
24+
25+
.. code:: bash
26+
27+
python3 -m jamdict info
28+
29+
Pull requests are welcome.
30+
31+
.. _contrib_docs:
32+
33+
Updating Documentation
34+
----------------------
35+
36+
1. Fork the `jamdict <https://github.com/neocl/jamdict>`_ repository to your own GitHub account.
37+
38+
#. Clone the `jamdict` repository to your local machine.
39+
40+
.. code:: bash
41+
42+
git clone https://github.com/<your-account-name>/jamdict
43+
44+
#. Create a virtual environment (optional, but highly recommended)
45+
46+
.. code:: bash
47+
48+
# if you use virtualenvwrapper
49+
mkvirtualenv jamdev
50+
workon jamdev
51+
52+
# if you use Python venv
53+
python3 -m venv .env
54+
. .env/bin/activate
55+
python3 -m pip install --upgrade pip wheel Sphinx
56+
57+
#. Build the docs
58+
59+
.. code:: bash
60+
61+
cd jamdict/docs
62+
# compile the docs
63+
make dirhtml
64+
# serve the docs using Python3 built-in development server
65+
# Note: this requires Python >= 3.7 to support --directory
66+
python3 -m http.server 7000 --directory _build/dirhtml
67+
# if you use earlier Python 3, you may use
68+
cd _build/dirhtml
69+
python3 -m http.server 7000
70+
71+
#. The docs should now be available at http://localhost:7000 . Open that URL in your browser to view them.
72+
73+
#. More information:
74+
75+
- Sphinx tutorial: https://sphinx-tutorial.readthedocs.io/start/
76+
- Using `virtualenvwrapper`: https://virtualenvwrapper.readthedocs.io/en/latest/install.html
77+
- Using `venv`: https://docs.python.org/3/library/venv.html
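The serving step can also be scripted from Python. This is a minimal sketch, assuming the docs were built into `_build/dirhtml` as in the steps above. `make_docs_server` is a hypothetical helper name, and the code relies on the `directory` argument of `SimpleHTTPRequestHandler`, which requires Python >= 3.7.

```python
import functools
import http.server


def make_docs_server(directory="_build/dirhtml", port=7000):
    """Create (but do not start) a local HTTP server for the built docs."""
    # Bind the handler to the build directory instead of the current one.
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    return http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)


# Usage (blocks until interrupted with Ctrl+C):
#     server = make_docs_server()
#     server.serve_forever()
```

Binding to `127.0.0.1` keeps the preview server reachable from the local machine only.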
78+
79+
.. _contrib_dev:
80+
81+
Development
82+
-----------
83+
84+
Development contributions are welcome.
85+
Setting up a development environment for Jamdict should be similar to :ref:`contrib_docs`.
86+
87+
Please contact the development team if you need more information: https://github.com/neocl/jamdict/issues

docs/index.rst

+9-2
@@ -21,7 +21,8 @@ Main features
2121
Hide this for now
2222
- jamdol (jamdol-flask) - a Python/Flask server that provides Jamdict lookup via REST API (experimental state)
2323
24-
:ref:`Contributors <contributors>` are welcome! 🙇
24+
:ref:`Contributors <contributors>` are welcome! 🙇
25+
If you want to help develop Jamdict, please visit the :ref:`contributing` page.
2526

2627
Installation
2728
------------
@@ -70,7 +71,7 @@ Looking up named entities
7071
[id#5053163] ディズニー : Disney (family or surname/company name)
7172
[id#5741091] ディズニーランド : Disneyland (place name)
7273

73-
See :ref:`recipes` for more sample code.
74+
See :ref:`recipes` for more code samples.
7475

7576
.. _commandline:
7677

@@ -123,10 +124,16 @@ Documentation
123124
tutorials
124125
recipes
125126
api
127+
contributing
126128

127129
Other info
128130
==========
129131

132+
Release Notes
133+
-------------
134+
135+
Release notes are available :ref:`here <updates>`.
136+
130137
.. _contributors:
131138

132139
Contributors

docs/recipes.rst

+69
@@ -99,3 +99,72 @@ Use exact matching for faster search
9999
# [id#1467640] ねこ (猫) : 1. cat ((noun (common) (futsuumeishi))) 2. shamisen 3. geisha 4. wheelbarrow 5. clay bed-warmer 6. bottom/submissive partner of a homosexual relationship
100100
# [id#2698030] ねこま (猫) : cat ((noun (common) (futsuumeishi)))
101101
102+
Low-level data queries
103+
----------------------
104+
105+
It is possible to access the dictionary data by querying the database directly using lower-level APIs.
106+
However, these APIs may change in future releases, so please keep that in mind.
107+
108+
When you create a Jamdict object, you have direct access to the
109+
underlying databases via these properties:
110+
111+
.. code:: python
112+
113+
>>> from jamdict import Jamdict
114+
>>> jam = Jamdict()
115+
>>> jam.jmdict # jamdict.JMDictSQLite object for accessing word dictionary
116+
>>> jam.kd2 # jamdict.KanjiDic2SQLite object, for accessing kanji dictionary
117+
>>> jam.jmnedict # jamdict.JMNEDictSQLite object, for accessing named-entities dictionary
118+
119+
You can perform database queries on each of these databases by obtaining
120+
a database cursor with the ``ctx()`` function (i.e. a database query context).
121+
122+
For example, the following code lists all existing parts of speech
123+
in the database.
124+
125+
.. code:: python
126+
127+
# returns a list of sqlite3.Row objects
128+
pos_rows = jam.jmdict.ctx().select("SELECT DISTINCT text FROM pos")
129+
130+
# access columns in each query row by name
131+
all_pos = [x['text'] for x in pos_rows]
132+
133+
# sort all POS
134+
all_pos.sort()
135+
for pos in all_pos:
136+
    print(pos)
137+
138+
For more information, please see `Jamdict database schema </_static/jamdict_db_schema.png>`_.
139+
140+
Say we want to get all irregular suru verbs. We can start by finding
141+
all Sense IDs with pos = ``suru verb - irregular``, and then find all the
142+
Entry idseqs connected to those Senses.
143+
144+
Words (and also named entities) can be retrieved directly using their ``idseq``.
145+
Each word may have many Senses (meanings), and each Sense may have several pos values.
146+
147+
::
148+
149+
# Entry (idseq) --(has many)--> Sense --(has many)--> pos
150+
151+
.. note::
152+
Tip: Since we hit the database many times (to find the IDs, to retrieve
153+
each word, etc.), we should also consider reusing the database
154+
connection through a database context for better performance
155+
(``with jam.jmdict.ctx() as ctx:`` and ``ctx=ctx`` in the code below).
156+
157+
Here is the sample code:
158+
159+
.. code:: python
160+
161+
# find all idseq of lexical entry (i.e. words) that have at least 1 sense with pos = suru verb - irregular
162+
with jam.jmdict.ctx() as ctx:
163+
    # query all word's idseqs
164+
    rows = ctx.select(
165+
        query="SELECT DISTINCT idseq FROM Sense WHERE ID IN (SELECT sid FROM pos WHERE text = ?) LIMIT 10000",
166+
        params=("suru verb - irregular",))
167+
    for row in rows:
168+
        # reuse database connection with ctx=ctx for better performance
169+
        word = jam.jmdict.get_entry(idseq=row['idseq'], ctx=ctx)
170+
        print(word)
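To see how the subquery walks the Entry, Sense and pos relationship without a Jamdict database at hand, here is a self-contained sketch using Python's built-in `sqlite3` module. The table and column names (`Sense.ID`, `Sense.idseq`, `pos.sid`, `pos.text`) are taken from the query above, but the schema is reduced to those columns and every row of data is made up for illustration.

```python
import sqlite3

# Toy in-memory database; only the columns used by the query are modelled.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # allow access to columns by name
conn.execute("CREATE TABLE Sense (ID INTEGER PRIMARY KEY, idseq INTEGER)")
conn.execute("CREATE TABLE pos (sid INTEGER, text TEXT)")

# Made-up data: entry 1001 has two senses, entries 1002 and 1003 have one each.
conn.executemany(
    "INSERT INTO Sense (ID, idseq) VALUES (?, ?)",
    [(1, 1001), (2, 1001), (3, 1002), (4, 1003)],
)
conn.executemany(
    "INSERT INTO pos (sid, text) VALUES (?, ?)",
    [
        (1, "suru verb - irregular"),
        (2, "noun (common) (futsuumeishi)"),
        (3, "noun (common) (futsuumeishi)"),
        (4, "suru verb - irregular"),
    ],
)

# Inner query: Sense IDs tagged with the wanted pos.
# Outer query: distinct entry idseqs that own at least one such Sense.
rows = conn.execute(
    "SELECT DISTINCT idseq FROM Sense "
    "WHERE ID IN (SELECT sid FROM pos WHERE text = ?)",
    ("suru verb - irregular",),
).fetchall()

idseqs = sorted(row["idseq"] for row in rows)
print(idseqs)
```

Entry 1001 is returned once even though only one of its two senses matches, which is the effect of `DISTINCT` combined with the "at least one sense" condition.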

docs/requirements.txt

+1
@@ -1 +1,2 @@
11
jamdict
2+
Sphinx

docs/updates.rst

+49
@@ -0,0 +1,49 @@
1+
.. _updates:
2+
3+
Updates
4+
=======
5+
6+
2021-04-19
7+
----------
8+
9+
- [Version 0.1a9]
10+
- Fix data audit query
11+
- Enhanced Jamdict() constructor. ``Jamdict('/path/to/jamdict.db')``
12+
works properly.
13+
- Code quality review
14+
- Automated documentation build via
15+
`readthedocs.org <https://jamdict.readthedocs.io/en/latest/>`__
16+
17+
.. _section-1:
18+
19+
2021-04-15
20+
----------
21+
22+
- Make ``lxml`` optional
23+
- Data package can be installed via PyPI with ``jamdict_data`` package
24+
- Make configuration file optional as data files can be installed via
25+
PyPI.
26+
27+
.. _section-2:
28+
29+
2020-05-31
30+
----------
31+
32+
- [Version 0.1a7]
33+
- Added Japanese Proper Names Dictionary (JMnedict) support
34+
- Included built-in KRADFILE/RADKFile support
35+
- Improved command line tools (json, compact mode, etc.)
36+
37+
.. _section-3:
38+
39+
2017-08-18
40+
----------
41+
42+
- Support KanjiDic2 (XML/SQLite formats)
43+
44+
.. _section-4:
45+
46+
2016-11-09
47+
----------
48+
49+
- Release first version to Github

jamdict/__version__.py

+1-1
@@ -10,6 +10,6 @@
1010
__url__ = "https://github.com/neocl/jamdict"
1111
__maintainer__ = "Le Tuan Anh"
1212
__version_major__ = "0.1"
13-
__version__ = "{}a8".format(__version_major__)
13+
__version__ = "{}a9".format(__version_major__)
1414
__version_long__ = "{} - Alpha".format(__version_major__)
1515
__status__ = "Prototype"
