# ProfileMatching Quick Start Guide

## 🚀 Fastest Way to Get Started

### Option 1: Interactive Menu (RECOMMENDED)

```bash
cd /Users/andrewjiang/Bao/TimeToLockIn/Profile/UnifiedContacts/ProfileMatching
python3.10 main.py
```

This gives you an interactive menu with:

- Real-time statistics
- Guided workflow
- Easy access to all features

### Option 2: Launch from Main UnifiedContacts Menu

```bash
cd /Users/andrewjiang/Bao/TimeToLockIn/Profile/UnifiedContacts
python3.10 main.py
# Select option 15: "Open Profile Matching System"
```

## 📋 Typical Workflow

### Step 1: Find Candidates (First Time)

If you haven't found candidates yet, run:

```bash
# From ProfileMatching folder
python3.10 find_twitter_candidates_threaded.py --workers 8

# Or use interactive menu: Option 1
```

**Expected time**: ~16-18 hours for all 43K contacts

### Step 2: Verify with LLM

After candidates are found, verify them:

```bash
# From ProfileMatching folder
python3.10 verify_twitter_matches_v2.py --verbose --concurrent 100

# Or use interactive menu: Option 3
```

**Expected time**: ~23 hours for all users

**Cost**: ~$130 for all users (GPT-5-mini at $0.003/user)

### Step 3: Review Results

```bash
# From ProfileMatching folder
python3.10 review_match_quality.py

# Or use interactive menu: Option 5
```

## 🧪 Test Mode (Recommended Before Full Run)

Always test with a small batch first:

```bash
# Test with 50 users
python3.10 verify_twitter_matches_v2.py --test --limit 50 --verbose --concurrent 10

# Or use interactive menu: Option 4
```

This helps you:

- Verify the system is working correctly
- Check match quality before paying for a full run
- Estimate costs and timing

## 📊 Check Current Status

At any time, you can check your progress:

```bash
# Launch interactive menu and select Option 6: "Show statistics only"
python3.10 main.py
# Press 6, then 0 to exit
```

Or query the database directly:

```bash
psql -d telegram_contacts -U andrewjiang -c "
SELECT
    COUNT(DISTINCT telegram_user_id) as users_with_candidates,
    COUNT(*) as total_candidates,
    COUNT(*) FILTER (WHERE llm_processed = TRUE) as processed,
    COUNT(*) FILTER (WHERE llm_processed = FALSE) as pending
FROM twitter_match_candidates;
"
```

## 🔄 Re-Running After Updates

If you've updated the LLM prompt or matching logic:

### Re-find Candidates (if matching logic changed)

```bash
# Delete old candidates
psql -d telegram_contacts -U andrewjiang -c "TRUNCATE twitter_match_candidates CASCADE;"

# Re-run candidate finding
python3.10 find_twitter_candidates_threaded.py --workers 8
```

### Re-verify with New Prompt (if only prompt changed)

```bash
# Reset LLM processing flag
psql -d telegram_contacts -U andrewjiang -c "UPDATE twitter_match_candidates SET llm_processed = FALSE;"

# Delete old matches
psql -d telegram_contacts -U andrewjiang -c "TRUNCATE twitter_telegram_matches;"

# Re-run verification
python3.10 verify_twitter_matches_v2.py --verbose --concurrent 100
```

## 🎯 Most Common Commands

### Find candidates for first 1000 contacts (testing)

```bash
python3.10 find_twitter_candidates_threaded.py --limit 1000 --workers 8
```

### Verify matches for pending candidates

```bash
python3.10 verify_twitter_matches_v2.py --verbose --concurrent 100
```

### Check match quality distribution

```bash
python3.10 review_match_quality.py
```

### Export matches to CSV (coming soon)

```bash
# Will be added in future update
```

## 💡 Pro Tips

1. **Always use threaded candidate finding** - It's 10-20x faster
2. **Use high concurrency for verification** - 100-200 concurrent requests for optimal speed
3. **Test first** - Always run with `--test --limit 50` before full runs
4. **Monitor costs** - Check the OpenAI dashboard during verification
5. **Check the stats** - Use Option 6 in the interactive menu to monitor progress

## 🐛 Troubleshooting

### "No candidates found"

- Check whether the Twitter database has data: `psql -d twitter_data -c "SELECT COUNT(*) FROM users;"`
- Check Telegram contacts: `psql -d telegram_contacts -c "SELECT COUNT(*) FROM contacts;"`

### "LLM verification is slow"

- Increase the `--concurrent` parameter (try 150-200)
- Check OpenAI rate limits in the dashboard
- Verify your network connection

### "Too many low-quality matches"

- Review the V6 prompt in `verify_twitter_matches_v2.py`
- Run `review_match_quality.py` to analyze
- Consider adjusting confidence thresholds

### "Missing obvious matches"

- Check whether the candidate was found:
  ```sql
  SELECT * FROM twitter_match_candidates WHERE telegram_user_id = YOUR_USER_ID;
  ```
- If found but not verified, check the `llm_verdict` field for reasoning
- If not found at all, a new matching method may be needed

## 📚 More Information

- See `README.md` for complete documentation
- See `CHANGELOG.md` for recent updates
- See individual script files for command-line options

## 🆘 Need Help?

Common issues and solutions:

| Issue | Solution |
|-------|----------|
| Import errors | Make sure you're using python3.10 |
| Database connection errors | Check PostgreSQL is running: `pg_isready` |
| OpenAI API errors | Verify API key in `.env` file |
| Out of memory | Reduce concurrent requests or use batching |

## 🎓 Understanding the Output

### Candidate Finding Output

```
Processing contact 1000/43000 (2.3%)
Found 6 candidates for @username
  • exact_username: @username (0.90)
  • fuzzy_name: @similar_name (0.75)
```

### LLM Verification Output

```
[Progress] 500/1000 users (50.0%) | 125 matches | ~$1.50 | 25.0 users/min
```

### Match Quality Review

```
Total users with matches: 25,662
Total matches: 36,147
Average confidence: 0.74

Confidence Distribution:
  90%+:   12,031 matches (HIGH)
  80-89%: 1,452 matches (MEDIUM)
  70-79%: 7,505 matches (LOW)
```
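
### Projecting Remaining Time and Cost (sketch)

The `[Progress]` line from LLM verification carries enough information to estimate how much time and money a run still needs. The snippet below is a minimal sketch of that arithmetic, not part of the ProfileMatching scripts: `project_remaining` is a hypothetical helper, and the regular expression assumes the progress line looks exactly like the sample shown in this guide. If the real output differs, the pattern needs adjusting.

```python
import re

# Assumed format, copied from the sample progress line in this guide:
# [Progress] 500/1000 users (50.0%) | 125 matches | ~$1.50 | 25.0 users/min
PROGRESS_RE = re.compile(
    r"\[Progress\]\s+(?P<done>\d+)/(?P<total>\d+) users.*?"
    r"~\$(?P<cost>[\d.]+)\s+\|\s+(?P<rate>[\d.]+) users/min"
)


def project_remaining(line: str) -> dict:
    """Estimate remaining users, minutes, and cost from one progress line.

    Hypothetical helper for illustration; it only extrapolates linearly
    from the numbers printed so far.
    """
    match = PROGRESS_RE.search(line)
    if match is None:
        raise ValueError("line does not match the assumed progress format")

    done = int(match["done"])
    total = int(match["total"])
    cost_so_far = float(match["cost"])
    rate = float(match["rate"])  # users per minute

    remaining = total - done
    # Average cost per processed user so far; 0.0 if nothing is processed yet.
    cost_per_user = cost_so_far / done if done else 0.0
    # Infinite if the reported rate is zero, to avoid dividing by zero.
    minutes_left = remaining / rate if rate else float("inf")

    return {
        "remaining_users": remaining,
        "minutes_left": minutes_left,
        "remaining_cost_usd": round(remaining * cost_per_user, 2),
    }


if __name__ == "__main__":
    sample = "[Progress] 500/1000 users (50.0%) | 125 matches | ~$1.50 | 25.0 users/min"
    print(project_remaining(sample))
```

Because the projection is linear, it will drift if throughput changes during the run (for example after hitting a rate limit), so treat the result as a rough guide alongside the OpenAI dashboard.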