Commit c68c6a0

Refactor
1 parent dec8ec8 commit c68c6a0

3 files changed: 8 additions and 4 deletions

db.py

Lines changed: 6 additions & 2 deletions
@@ -3,6 +3,10 @@
 Our use case is extremely simple.
 - We want an identifier for a file.
 - And need to associate the extracted content from this file and map it with the identifier.
+Thus SET operation suffices.
+
+Incrementally, we might want to persist multiple granular information associated with a file.
+HMSET will suffice in such cases.

 Hence, a Key-Value data store suffices.
 We do not want to get into the complexity of setting up a Relational Database with and manage the schema nor deal with a Document database which has more administrative overhead.
@@ -55,9 +59,9 @@ def set_value(key: str, value: str):
 def set_object(key: str, field: str, value: str):
     """
     `key` is the file identifer. In our case a hash created using the file path.
-    `field` is the field name. The possible values are `type`,`raw_image_content` and `processed_image_content`.
+    `field` is the field name. The possible values are `type` and `content`.
     `value` is the extracted text content after performing the Optical Character Recognition.
-    HMSET <file_hash> raw_image_content <raw_image_content> processed_image_content <processed_image_content>
+    HMSET <file_hash> type image content <processed_image_content>
     """
     connection = get_connection()
     value = value.encode('utf-8')
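
Reviewer note: the updated docstring describes one hash per file with two fields, `type` and `content`. A minimal sketch of that layout follows, assuming a client object that exposes `hset(name, mapping=...)` the way redis-py does. `store_extraction` and `InMemoryStore` are hypothetical names used only for illustration and are not part of this commit. Note that Redis has deprecated HMSET in favor of HSET with multiple field-value pairs since version 4.0.

```python
from typing import Dict


class InMemoryStore:
    """Hypothetical stand-in for a Redis client; only mimics hash writes."""

    def __init__(self) -> None:
        self.data: Dict[str, Dict[str, bytes]] = {}

    def hset(self, name: str, mapping: Dict[str, bytes]) -> int:
        bucket = self.data.setdefault(name, {})
        # Like Redis HSET, report how many fields were newly created.
        added = sum(1 for field in mapping if field not in bucket)
        bucket.update(mapping)
        return added


def store_extraction(client, key: str, file_type: str, content: str) -> int:
    """Write both fields of the per-file hash in a single call."""
    return client.hset(
        key,
        mapping={
            "type": file_type.encode("utf-8"),
            "content": content.encode("utf-8"),
        },
    )


store = InMemoryStore()
created = store_extraction(store, "abc123", "image", "text found by OCR")
```

With a real redis-py connection in place of `InMemoryStore`, the same `store_extraction` call should work unchanged, since it relies only on `hset` with a `mapping` argument.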

main.py

Lines changed: 1 addition & 1 deletion
@@ -181,7 +181,7 @@ def textract_ocr(attachment: UploadFile):
     path_hash = hashlib.sha256(output_filename.encode('utf-8')).hexdigest()
     set_object(key=path_hash, field="type", value="pdf")
     # Add it to a queue.
-    enqueue_extraction(extraction_function=detect_text_and_set_db, file_path=output_filename, key=path_hash, field="content")
+    enqueue_extraction(extraction_function=detect_text_and_set_db, file_path=output_filename, key=path_hash)
     BASE_URL = os.environ.get("BASE_URL", "http://localhost:8000")
     link = f"{BASE_URL}/ocr-result/{path_hash}"
     return {"link": link}
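
Reviewer note: the context lines in this hunk derive the storage key and the result link from the output file path. The sketch below isolates that derivation so it can be checked without the web framework or the queue. `result_link` is a hypothetical helper name, not a function in this repository.

```python
import hashlib
import os
from typing import Tuple


def result_link(output_filename: str) -> Tuple[str, str]:
    """Return (path_hash, link) for a stored file path."""
    # SHA-256 of the path gives a fixed-length, URL-safe key.
    path_hash = hashlib.sha256(output_filename.encode("utf-8")).hexdigest()
    # Same environment fallback as in the hunk above.
    base_url = os.environ.get("BASE_URL", "http://localhost:8000")
    return path_hash, f"{base_url}/ocr-result/{path_hash}"


path_hash, link = result_link("uploads/scan.pdf")
```

Because the key is a hash of the path and not of the file contents, uploading a different file to the same path would map to the same key.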

textract_wrapper.py

Lines changed: 1 addition & 1 deletion
@@ -3,7 +3,7 @@
 from db import set_object


-def detect_text_and_set_db(file_path: str, key: str, field: str, options=None):
+def detect_text_and_set_db(file_path: str, key: str, field: str = 'content'):
     is_success, content = detect_text(file_path)
     if is_success is True:
         set_object(key, field, content)
