Jihad Naaman Archive – Proposed Folder & Filename Standards
=========================================================

High-level goals
----------------
1. Keep publication source, authorship (By/About/Other), article title, and publication date visible in the path/filename.
2. Normalize names (lowercase slugs, consistent separators) so automation and search become easier.
3. Preserve original information (title spellings, partial dates, contextual notes) in a metadata table to avoid data loss.

Folder hierarchy
----------------
1. Introduce a single root (e.g., `Jihad_Naaman_Archive/`) with dedicated subtrees:
   • `sources/` → periodicals and news sites
   • `books/`, `university-docs/`, `personal/`, `events/`, etc., for non-periodical content (CVs, books, travel, ceremonies).
2. Inside `sources/`, standardize every publication folder to a canonical slug (lowercase, hyphenated). Maintain a mapping for variants (`ANNAHAR`, `Al Nahar`, `nahar al kitab` → `an-nahar`).
3. Within each publication folder, create uniform subfolders to organize authorship and special material:
   • `01_by/` – pieces written by Jihad
   • `02_about/` – coverage about him
   • `03_other/` – interviews, event flyers, uncategorized pieces
   (The numeric prefixes keep the subfolders in a fixed sort order.)
4. For heavy sources (hundreds of files), add a year layer (`al-hayat/01_by/2006/…`) so directories stay manageable and chronological browsing is easier.
5. Apply the same structure to currently unstructured folders (`Books`, `University Documents`, `Voyage`, `MADRASATI`, `Nadwat al jouda`, `Nashid`, `Revue Sarba`, `AARIFOU KHASSATI`). If they are not publications, relocate them into the new thematic trees (e.g., travel → `events/travel`).
6. Eliminate the catch-all `OTHER` folder by moving each item to its proper source. Reserve `sources/other/` only for material whose origin truly cannot be identified.
7. Store a `metadata/` directory at the root containing a CSV/JSON log of all files with columns like `source`, `type`, `title_original`, `title_slug`, `date_exact`, `date_quality`, `notes`, `current_path`, etc.

Filename convention
-------------------
1. Adopt a single template: `YYYY-MM-DD - SOURCE NAME - TYPE - TITLE[ (variant)].ext`
   • Example: `2006-02-18 - AL HAYAT - by - في ذكرى رحيل عاشق لبنان.jpg` (current file: `AL HAYAT/By Jihad/Fi zekra ra7il 3achek lubnan 18-02-2006.jpg`).
2. Always write dates in ISO order (`YYYY-MM-DD`) with zero padding. Convert existing `DD-MM-YYYY` tokens and standalone years; if only a year is known, use `YYYY-00-00` (a placeholder, not a valid calendar date) and record the precision in metadata (e.g., `date_quality`).
3. Encode authorship/role via the `type` segment (`by`, `about`, `interview`, `event`, etc.) rather than repeating “By Jihad” in the title.
4. Write each title segment in its original language and script (no transliteration), keeping spaces inside the segment and removing only characters the filesystem forbids.
5. Preserve transliterations or alternate spellings in the metadata table for searchability. When necessary, append a short language tag (` (ar)`, ` (fr)`) right before the extension.
6. For multi-file articles (scan + OCR + translation), append variants like ` (scan)`, ` (ocr)`, ` (translation)` to avoid duplicate titles.
7. Retain contextual info currently encoded in folder names (e.g., `mahrajan al che3r`) by adding descriptor suffixes such as ` (event - mahrajan al she3r)`.

Data hygiene & process
----------------------
1. During reorganization, build a script/log that records every rename/move (old path → new path) to maintain traceability.
2. Deduplicate files whose content is identical, including those with near-identical names (several `nida' ila al shaeer saiid aqel 2002-08-16` variants exist). Keep one canonical copy and record the duplicates as references in metadata.
3. Use the metadata table to note partial or uncertain dates, language, issue number, and page; once filenames are normalized, the metadata table is the authoritative record.
4. Create documentation describing:
   • the canonical source slug list
   • the date normalization policy (priority order when multiple dates are present)
   • the transliteration guide for Arabic titles
   • instructions for adding new content so future files stay consistent.
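
As an illustration of the slug, date, and filename rules above, here is a minimal Python sketch. It is not an existing script for this archive: the function names, the contents of `SOURCE_VARIANTS`, the `date_quality` values (`exact`, `year`), and the set of forbidden characters are all assumptions that would need to be checked against the final documentation.

```python
import re

# Hypothetical variant map; the real one would come from the canonical slug list.
SOURCE_VARIANTS = {
    "annahar": "an-nahar",
    "al nahar": "an-nahar",
    "nahar al kitab": "an-nahar",
}

# Assumption: strip the characters Windows rejects, plus control characters.
FORBIDDEN = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def source_slug(name: str) -> str:
    """Map a publication folder name to a lowercase, hyphenated slug."""
    key = " ".join(name.lower().split())
    if key in SOURCE_VARIANTS:
        return SOURCE_VARIANTS[key]
    return re.sub(r"[^a-z0-9]+", "-", key).strip("-")


def normalize_date(token: str) -> tuple[str, str]:
    """Return (date, date_quality) for a DD-MM-YYYY or standalone YYYY token."""
    match = re.fullmatch(r"(\d{1,2})-(\d{1,2})-(\d{4})", token)
    if match:
        day, month, year = match.groups()
        return f"{year}-{int(month):02d}-{int(day):02d}", "exact"
    if re.fullmatch(r"\d{4}", token):
        # Year-only placeholder; the precision is carried by date_quality.
        return f"{token}-00-00", "year"
    raise ValueError(f"unrecognized date token: {token!r}")


def build_filename(date, source, kind, title, ext, variant=None):
    """Assemble 'YYYY-MM-DD - SOURCE NAME - TYPE - TITLE[ (variant)].ext'."""
    # Replace forbidden characters with spaces, then collapse repeated spaces.
    clean_title = " ".join(FORBIDDEN.sub(" ", title).split())
    suffix = f" ({variant})" if variant else ""
    return f"{date} - {source.upper()} - {kind} - {clean_title}{suffix}.{ext}"


date, quality = normalize_date("18-02-2006")
print(source_slug("Al Nahar"))  # an-nahar
print(build_filename(date, "Al Hayat", "by", "في ذكرى رحيل عاشق لبنان", "jpg"))
# 2006-02-18 - AL HAYAT - by - في ذكرى رحيل عاشق لبنان.jpg
```

Keeping the variant map and the forbidden-character set as plain data at the top makes it easy to replace them with the documented lists once those exist. Each call to such a helper could also write one old path → new path row to the rename log.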