1 | 1 | INSTRUCTIONS:
2 | 2 |
| 3 | +... |
| 4 | + |
| 5 | +NOTE: AYAH NUMBERS MUST BE CORRECT AND SEQUENTIAL (NOT REVERSED NUMBERS) IN ORDER FOR THIS SCRIPT TO WORK ON THE HTML |
| 6 | + |
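Before you run the HTML script, a quick check can show whether any ayah numbers are still out of order. Here is a minimal Python sketch. It assumes the ayah numbers from one surah have already been extracted as a list of integers. The function name `find_out_of_sequence` is hypothetical and is not part of the original script.

```python
def find_out_of_sequence(numbers: list[int]) -> list[tuple[int, int]]:
    """Return (position, value) pairs where a number breaks the 1-step sequence.

    Assumes the list covers a single surah: each entry should equal the
    first number plus its position in the list.
    """
    if not numbers:
        return []
    first = numbers[0]
    return [(i, n) for i, n in enumerate(numbers) if n != first + i]


# A reversed "12" shows up as 21 at position 3.
print(find_out_of_sequence([9, 10, 11, 21, 13]))  # [(3, 21)]
print(find_out_of_sequence([1, 2, 3]))  # []
```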
| 7 | +to correct reversed ayah numbers (if any): |
| 8 | + |
| 9 | +(TO SET IT CORRECTLY IN INDESIGN BEFORE FORMATTING THE DOCUMENT, AT THE START OF LAYOUT) |
| 10 | +character direction > default |
| 11 | + |
| 12 | +(IF THE DOC IS ALREADY FORMATTED AND LAID OUT, THEN FIND AND REPLACE) |
| 13 | +find > grep > |
| 14 | +(?<!\d)(\d)(\d)(\d)(\d)(?!\d) |
| 15 | +change to: |
| 16 | +$4$3$2$1 |
| 17 | +change format: |
| 18 | +(middle east character formats) character direction: default |
| 19 | +DESELECT FOOTNOTE TEXT (the small square icon, last of the 5 icons above Find Format) |
| 20 | + |
| 21 | +Repeat for 4 digits, then 3 digits, then 2 digits (the lookarounds stop a shorter pattern from matching part of a longer number that is already fixed): |
| 22 | +(?<!\d)(\d)(\d)(\d)(\d)(?!\d) |
| 23 | +$4$3$2$1 |
| 24 | + |
| 25 | +(?<!\d)(\d)(\d)(\d)(?!\d) |
| 26 | +$3$2$1 |
| 27 | + |
| 28 | +(?<!\d)(\d)(\d)(?!\d) |
| 29 | +$2$1 |
| 30 | + |
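The GREP passes above reverse digit runs one length at a time. If the exported text still contains reversed ayah numbers, Python can do the same reversal in a single pass by reversing every maximal run of digits. This sketch is an alternative and is not part of the original workflow. It assumes you apply it only to strings that hold ayah numbers. Running it over the whole document would also reverse numbers that are already correct, such as footnote markers.

```python
import re

# In a Python str pattern, \d matches any Unicode decimal digit,
# so Arabic-Indic digits (٠-٩) are handled as well as 0-9.
DIGIT_RUN = re.compile(r"\d+")


def reverse_digit_runs(text: str) -> str:
    """Reverse each maximal run of digits in `text`, leaving other characters alone."""
    return DIGIT_RUN.sub(lambda m: m.group(0)[::-1], text)


print(reverse_digit_runs("(42)"))  # (24)
print(reverse_digit_runs("١٢"))  # ٢١
```

Because `\d+` always captures the whole run, there is no need to order separate 4-, 3- and 2-digit passes.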
| 31 | +... |
| 32 | + |
| 33 | +One issue is that when we clean up the <span> tags, some white space is getting removed. |
| 34 | +[ayah 24 of book 5] |
| 35 | +...I think we can preprocess the document on the InDesign side, or use regex after the HTML output. |
| 36 | + |
| 37 | +SOLUTION: |
| 38 | +That happens because of forced line breaks in the text. |
| 39 | +To fix it, before exporting as HTML, |
| 40 | +grep replace all: |
| 41 | +\n |
| 42 | +with |
| 43 | +nothing |
| 44 | + |
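You can also apply the regex after the HTML output instead of in InDesign. Here is a minimal sketch of that approach. It assumes InDesign writes each forced line break as a `<br>` or `<br />` tag. The sketch substitutes a single space rather than nothing, so words on either side of a break stay separated. The name `replace_forced_breaks` is hypothetical, and the function would run before the span cleanup.

```python
import re

# Assumption: forced line breaks appear in the exported HTML as <br>, <br/> or <br />.
# Whitespace around the tag is consumed so the result has exactly one space.
BR_TAG = re.compile(r"\s*<br\s*/?>\s*", re.IGNORECASE)


def replace_forced_breaks(html: str) -> str:
    """Replace every <br> tag and the whitespace around it with a single space."""
    return BR_TAG.sub(" ", html)


print(replace_forced_breaks("first line<br />second line"))  # first line second line
```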
| 45 | +... |
| 46 | + |
3 | 47 | File > Export > HTML
4 | 48 |
5 | 49 | Advanced > Don't include classes in HTML
6 | 50 |
7 | 51 | ...
8 | 52 |
9 |
| -Remove all the <span> tags |
| 53 | +Remove all the <span> tags in a CODE EDITOR (run the replace again until there are no matches left, to catch nested spans) |
| 54 | + |
10 | 55 | <span\b[^>]*>(.*?)</span>
11 | 56 | Replace with
12 | 57 | $1
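You can also script the span cleanup in Python with the standard `re` module. Python writes the backreference as `\1`, whereas the code editor uses `$1`. The loop re-applies the pattern until nothing changes, which unwraps nested spans from the outside in. This sketch assumes the spans are well formed. For irregular markup, BeautifulSoup's `unwrap()` is a more robust choice.

```python
import re

# \b stops the pattern from matching tags such as <spanner>;
# (.*?) accepts empty spans; DOTALL lets span content cross newlines.
SPAN = re.compile(r"<span\b[^>]*>(.*?)</span>", re.DOTALL | re.IGNORECASE)


def strip_spans(html: str) -> str:
    """Remove <span ...> and </span> tags while keeping their content."""
    previous = None
    while previous != html:  # repeat so nested spans are fully unwrapped
        previous = html
        html = SPAN.sub(r"\1", html)
    return html


print(strip_spans('<p><span class="a">Hello</span> <span>world</span></p>'))
# <p>Hello world</p>
```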
@@ -226,21 +271,31 @@ this is done in order to simplify the markup of the html prior to doing the rest
226 | 271 |
227 | 272 | ...
228 | 273 |
229 |
| -I'll add a preprocessing step to remove all <span> tags while preserving their content. We can do this using BeautifulSoup's unwrap() method, which is more reliable than regex for HTML processing. Here's how we'll modify the script: |
| 274 | +I'd like to use this script without this part: |
| 275 | + |
| 276 | +current_surah_number = 4 # We know this is Surah An-Nisa |
| 277 | + current_surah_name = "النِّسَاء" |
| 278 | + |
| 279 | +I just don't want a surah number or surah name column at all, actually. |
| 280 | + |
| 281 | +... |
230 | 282 |
231 |
| -I've added a preprocessing step that: |
232 |
| -Uses BeautifulSoup's find_all('span') to find all <span> tags in the document |
233 |
| -Uses the unwrap() method to remove each span tag while preserving its content |
234 |
| -Does this before any other processing begins |
235 |
| -This approach is better than using regex because: |
236 |
| -It properly handles nested tags |
237 |
| -It correctly handles spans that contain other HTML elements |
238 |
| -It's more reliable for HTML processing than regex |
239 |
| -It won't break if there are special characters or formatting in the spans |
240 |
| -The script will now: |
241 |
| -Load the HTML file |
242 |
| -Remove all span tags while keeping their content |
243 |
| -Continue with the rest of the processing as before |
244 |
| -You can run the script again and it will process the HTML files with the spans removed. This should simplify the markup and make the rest of the processing more straightforward. |
| 283 | +Ah, I understand now. You want to completely remove the Surah information from the output. I'll modify the script to remove those fields from the data structure and output: |
| 284 | + |
| 285 | +I've made the following changes: |
| 286 | +Removed all surah-related variables and fields: |
| 287 | +Removed current_surah_number and current_surah_name |
| 288 | +Removed surah detection code |
| 289 | +Removed surah fields from the JSON output |
| 290 | +Removed surah headers from the Markdown output |
| 291 | +Simplified the data structure to only include: |
| 292 | +aayah_number |
| 293 | +tafseer |
| 294 | +footnotes |
| 295 | +Updated the output formats: |
| 296 | +JSON now only contains aayah number, tafseer text, and footnotes |
| 297 | +Markdown now starts directly with aayah sections, without surah headers |
| 298 | +Updated the processing messages to be more generic |
| 299 | +The script will now process the content without any surah information, focusing only on the aayah numbers, tafseer text, and footnotes. |
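Here is a minimal sketch of how the simplified record and the two outputs might look, using only the three fields listed above. The class name `Aayah`, the function names and the `## Aayah N` Markdown heading are illustrative assumptions and are not taken from the actual script.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Aayah:
    """One output record: no surah number or surah name fields."""
    aayah_number: int
    tafseer: str
    footnotes: list[str] = field(default_factory=list)


def to_json(aayaat: list[Aayah]) -> str:
    # ensure_ascii=False keeps Arabic text readable instead of \u escapes.
    return json.dumps([asdict(a) for a in aayaat], ensure_ascii=False, indent=2)


def to_markdown(aayaat: list[Aayah]) -> str:
    # Starts directly with the first aayah section; no surah header.
    sections = []
    for a in aayaat:
        lines = [f"## Aayah {a.aayah_number}", "", a.tafseer]
        if a.footnotes:
            lines.append("")
            lines.extend(f"{i}. {note}" for i, note in enumerate(a.footnotes, start=1))
        sections.append("\n".join(lines))
    return "\n\n".join(sections) + "\n"


records = [Aayah(1, "Tafseer text...", ["A footnote."])]
print(to_json(records))
print(to_markdown(records))
```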
245 | 300 |
246 | 301 | ...