Fix: Resolve Windows clone failure from invoice directory with trailing space #2071

Open: wants to merge 1 commit into base: main

@AlbMej AlbMej commented Aug 18, 2025

Summary

This PR deletes the extracted_invoice_json directory and renames the 'extracted_invoice_json ' directory to remove its trailing space. This fixes an invalid path error when running git clone on Windows machines:
[Screenshot: invalid_path_git_clone]
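
As a rough illustration of the failure above: Windows does not accept file or directory names that end in a space or a period, so any tracked path with such a component cannot be checked out there. The sketch below flags such paths. The function names are hypothetical and not part of this PR; the input is assumed to be a list of POSIX-style repo paths, for example the output of `git ls-files`.

```python
from pathlib import PurePosixPath


def has_trailing_space_or_dot(path: str) -> bool:
    """Return True if any component of a repo path ends in a space or a period."""
    parts = PurePosixPath(path).parts
    # "." and ".." are navigation entries, not real names, so skip them.
    return any(
        part.endswith((" ", ".")) for part in parts if part not in (".", "..")
    )


def find_windows_unsafe_paths(paths: list[str]) -> list[str]:
    """Filter a list of repo paths down to those Windows would reject."""
    return [p for p in paths if has_trailing_space_or_dot(p)]
```

Running `find_windows_unsafe_paths` over the tracked files before merging would be one way to check that no other directory has the same problem.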

Motivation

These changes are necessary to allow Windows users to successfully clone the repo on their machine. This fixes 2 open issues (#1934, #1837), improves the repository's file structure, and reduces confusion. Both directories contain the same filenames, but the files themselves differ slightly due to the non-deterministic nature of LLMs. Most likely, a typo was made when the cookbook was created, and the cookbook was then re-run without the space, creating the two directories. Because the files differ, we do an analysis to find the correct directory.

Correctness

The choice of which directory to delete is based on its associated cookbook and notebook. Using the JSON shown at the end of part 1, we can determine the referenced filename with a simple search (leading us to premierinn_GABCI19014325_extracted.json). From there, we can determine which of the 2 directories is the one used for the cookbook: we check that file in both directories, and whichever one matches the notebook output is the correct one. The non-matching directory is deleted.
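
The comparison described above could be sketched roughly as follows. This is a minimal illustration, not the code from the analysis notebook: the function name `matching_directories` is hypothetical, and it assumes the JSON shown in the cookbook has already been parsed into a Python dict and that a match means the parsed contents are equal.

```python
import json
from pathlib import Path


def matching_directories(
    reference: dict, filename: str, candidates: list[Path]
) -> list[Path]:
    """Return the candidate directories whose copy of `filename` equals `reference`."""
    matches = []
    for directory in candidates:
        file_path = directory / filename
        if not file_path.is_file():
            continue  # this directory has no copy of the file
        with file_path.open(encoding="utf-8") as fh:
            # Compare parsed JSON so whitespace and key order do not matter.
            if json.load(fh) == reference:
                matches.append(directory)
    return matches
```

If exactly one directory is returned, it is the one the cookbook output came from; zero or two matches would mean the comparison is inconclusive and needs a closer look.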

The notebook also specifies the correct directory name with extracted_invoice_json_path = "./data/hotel_invoices/extracted_invoice_json", so we remove the trailing space from the remaining directory. This analysis can be found in this notebook.

@AlbMej AlbMej force-pushed the main branch 3 times, most recently from 3f59370 to 2a0b273 Compare August 20, 2025 00:55