Fix: Resolve Windows clone failure from invoice directory with trailing space #2071
+1,203
−4,608
Summary
This PR deletes the 'extracted_invoice_json' directory and removes the trailing space from the 'extracted_invoice_json ' directory. This fixes an invalid path error when cloning the repo with git on Windows machines.
Motivation
These changes are necessary to allow Windows users to successfully clone the repo on their machines. This fixes two open issues (#1934, #1837), improves the repository's file structure, and reduces confusion. Both directories contain the same filenames, but the files themselves differ slightly due to the non-deterministic nature of LLMs. Most likely the cookbook was first run with a typo in the directory name and then re-run without the space, which created the two directories. Because the files differ, we analyze which directory is the one the cookbook actually used.
Correctness
The choice of which directory to delete is based on its associated cookbook and notebook. Using the JSON shown at the end of part 1, a simple search identifies the referenced filename (premierinn_GABCI19014325_extracted.json). From there, we can determine which of the two directories the cookbook used: we check that file in both directories, and whichever copy matches is the correct one, so the non-matching directory is deleted.
We also see that the directory name is specified in the notebook with extracted_invoice_json_path = "./data/hotel_invoices/extracted_invoice_json", so we remove the trailing space from the remaining directory. This analysis can be found in this notebook.
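For reviewers who want to repeat the check, the comparison described above could be sketched roughly as follows. This is only an illustration, not the code from the linked notebook: the helper name find_matching_dirs and the directory and file names in the demo are hypothetical, and it assumes the reference JSON from the cookbook has already been copied into a Python dict.

```python
import json
import tempfile
from pathlib import Path


def find_matching_dirs(reference, filename, candidates):
    """Return the candidate directories whose copy of `filename` parses to `reference`.

    Hypothetical helper: `reference` is the JSON object shown in the cookbook,
    already loaded as a dict; `candidates` are the directories to compare.
    """
    matches = []
    for directory in candidates:
        path = Path(directory) / filename
        if not path.is_file():
            continue  # this directory has no copy of the file
        with path.open(encoding="utf-8") as fh:
            # Compare parsed JSON so key order and whitespace do not matter.
            if json.load(fh) == reference:
                matches.append(Path(directory))
    return matches


if __name__ == "__main__":
    # Demo with made-up data in a temporary folder (not the real invoices).
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        reference = {"invoice": "X1", "total": 10}
        for name, total in (("dir_a", 10), ("dir_b", 12)):
            (root / name).mkdir()
            (root / name / "sample.json").write_text(
                json.dumps({"invoice": "X1", "total": total}), encoding="utf-8"
            )
        kept = find_matching_dirs(
            reference, "sample.json", [root / "dir_a", root / "dir_b"]
        )
        print([d.name for d in kept])
```

A directory that is not returned does not contain the output shown in the cookbook, which is the criterion used here for deleting it.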