ProfileMatching/QUICKSTART.md

# ProfileMatching Quick Start Guide

## 🚀 Fastest Way to Get Started

### Option 1: Interactive Menu (RECOMMENDED)
```bash
cd /Users/andrewjiang/Bao/TimeToLockIn/Profile/UnifiedContacts/ProfileMatching
python3.10 main.py
```

This gives you an interactive menu with:
- Real-time statistics
- Guided workflow
- Easy access to all features

### Option 2: Launch from Main UnifiedContacts Menu
```bash
cd /Users/andrewjiang/Bao/TimeToLockIn/Profile/UnifiedContacts
python3.10 main.py
# Select option 15: "Open Profile Matching System"
```

## 📋 Typical Workflow

### Step 1: Find Candidates (First Time)
If you haven't found candidates yet, run:

```bash
# From ProfileMatching folder
python3.10 find_twitter_candidates_threaded.py --workers 8

# Or use interactive menu: Option 1
```

**Expected time**: ~16-18 hours for all 43K contacts

### Step 2: Verify with LLM
After candidates are found, verify them:

```bash
# From ProfileMatching folder
python3.10 verify_twitter_matches_v2.py --verbose --concurrent 100

# Or use interactive menu: Option 3
```

- **Expected time**: ~23 hours for all users
- **Cost**: ~$130 for all users (GPT-5-mini at $0.003/user)
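
The figures above can be sanity-checked with a few lines of arithmetic. The sketch below is not part of the ProfileMatching scripts. It assumes that "all users" means roughly the 43K contacts mentioned in Step 1 (the real number of users with candidates may be lower), and the function names are hypothetical.

```python
# Back-of-the-envelope check of the verification estimates quoted above.
TOTAL_USERS = 43_000        # assumption: "all users" ~ the 43K contacts from Step 1
COST_PER_USER_USD = 0.003   # per-user rate quoted above
EXPECTED_HOURS = 23         # expected verification time quoted above


def estimate_cost(users: int, cost_per_user: float = COST_PER_USER_USD) -> float:
    """Return the estimated LLM cost in USD for verifying `users` users."""
    return users * cost_per_user


def implied_users_per_minute(users: int, hours: float) -> float:
    """Return the average throughput needed to finish `users` in `hours`."""
    return users / (hours * 60)


if __name__ == "__main__":
    print(f"Full run: ~${estimate_cost(TOTAL_USERS):.0f}")       # ~$129
    print(f"50-user test: ~${estimate_cost(50):.2f}")            # ~$0.15
    rate = implied_users_per_minute(TOTAL_USERS, EXPECTED_HOURS)
    print(f"Implied throughput: ~{rate:.0f} users/min")          # ~31 users/min
```

If the throughput you observe is well below the implied rate, expect the full run to take proportionally longer than 23 hours.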

### Step 3: Review Results
```bash
# From ProfileMatching folder
python3.10 review_match_quality.py

# Or use interactive menu: Option 5
```

## 🧪 Test Mode (Recommended Before Full Run)

Always test with a small batch first:

```bash
# Test with 50 users
python3.10 verify_twitter_matches_v2.py --test --limit 50 --verbose --concurrent 10

# Or use interactive menu: Option 4
```

This helps you:
- Verify the system is working correctly
- Check match quality before paying for a full run
- Estimate costs and timing

## 📊 Check Current Status

You can check progress at any time:

```bash
# Launch interactive menu and select Option 6: "Show statistics only"
python3.10 main.py
# Press 6, then 0 to exit
```

Or query directly:

```bash
psql -d telegram_contacts -U andrewjiang -c "
SELECT
  COUNT(DISTINCT telegram_user_id) as users_with_candidates,
  COUNT(*) as total_candidates,
  COUNT(*) FILTER (WHERE llm_processed = TRUE) as processed,
  COUNT(*) FILTER (WHERE llm_processed = FALSE) as pending
FROM twitter_match_candidates;
"
```

## 🔄 Re-Running After Updates

If you've updated the LLM prompt or matching logic:

### Re-find Candidates (if matching logic changed)
```bash
# Delete old candidates (CASCADE also empties any tables with foreign keys to this one)
psql -d telegram_contacts -U andrewjiang -c "TRUNCATE twitter_match_candidates CASCADE;"

# Re-run candidate finding
python3.10 find_twitter_candidates_threaded.py --workers 8
```

### Re-verify with New Prompt (if only prompt changed)
```bash
# Reset LLM processing flag
psql -d telegram_contacts -U andrewjiang -c "UPDATE twitter_match_candidates SET llm_processed = FALSE;"

# Delete old matches
psql -d telegram_contacts -U andrewjiang -c "TRUNCATE twitter_telegram_matches;"

# Re-run verification
python3.10 verify_twitter_matches_v2.py --verbose --concurrent 100
```

## 🎯 Most Common Commands

### Find candidates for first 1000 contacts (testing)
```bash
python3.10 find_twitter_candidates_threaded.py --limit 1000 --workers 8
```

### Verify matches for pending candidates
```bash
python3.10 verify_twitter_matches_v2.py --verbose --concurrent 100
```

### Check match quality distribution
```bash
python3.10 review_match_quality.py
```

### Export matches to CSV (coming soon)
```bash
# Will be added in a future update
```

## 💡 Pro Tips

1. **Always use threaded candidate finding** - It's 10-20x faster than a single-threaded run
2. **Use high concurrency for verification** - 100-200 concurrent requests for optimal speed
3. **Test first** - Always run with `--test --limit 50` before full runs
4. **Monitor costs** - Check OpenAI dashboard during verification
5. **Check the stats** - Use Option 6 in interactive menu to monitor progress
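
To illustrate what a `--concurrent` limit typically means (tip 2), here is a minimal sketch of bounded concurrency with `asyncio.Semaphore`. It is not taken from `verify_twitter_matches_v2.py`, whose internals may differ. `verify_user` and `verify_all` are hypothetical names, and the LLM request is simulated with a short sleep.

```python
import asyncio


async def verify_user(user_id: int) -> dict:
    """Hypothetical stand-in for one LLM verification request."""
    await asyncio.sleep(0.01)  # simulates network latency
    return {"user_id": user_id, "verified": True}


async def verify_all(user_ids: list[int], concurrent: int = 100) -> list[dict]:
    """Verify every user while keeping at most `concurrent` requests in flight."""
    semaphore = asyncio.Semaphore(concurrent)

    async def bounded(user_id: int) -> dict:
        # Each task waits here until one of the `concurrent` slots is free.
        async with semaphore:
            return await verify_user(user_id)

    # gather() returns results in the same order as the input IDs.
    return await asyncio.gather(*(bounded(uid) for uid in user_ids))


if __name__ == "__main__":
    results = asyncio.run(verify_all(list(range(250)), concurrent=100))
    print(f"Verified {len(results)} users")
```

Raising the limit only helps until the API's rate limits or your network become the bottleneck, which is why the tips suggest watching the OpenAI dashboard while tuning it.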

## 🐛 Troubleshooting

### "No candidates found"
- Check if Twitter database has data: `psql -d twitter_data -c "SELECT COUNT(*) FROM users;"`
- Check Telegram contacts: `psql -d telegram_contacts -c "SELECT COUNT(*) FROM contacts;"`

### "LLM verification is slow"
- Increase `--concurrent` parameter (try 150-200)
- Check OpenAI rate limits in dashboard
- Verify network connection

### "Too many low-quality matches"
- Review the V6 prompt in `verify_twitter_matches_v2.py`
- Run `review_match_quality.py` to analyze
- Consider adjusting confidence thresholds

### "Missing obvious matches"
- Check whether the candidate was found:
  ```sql
  SELECT * FROM twitter_match_candidates WHERE telegram_user_id = YOUR_USER_ID;
  ```
- If it was found but not verified, check the `llm_verdict` field for the reasoning
- If it was not found at all, a new matching method may be needed

## 📚 More Information

- See `README.md` for complete documentation
- See `CHANGELOG.md` for recent updates
- See individual script files for command-line options

## 🆘 Need Help?

Common issues and solutions:

| Issue | Solution |
|-------|----------|
| Import errors | Make sure you're using python3.10 |
| Database connection errors | Check PostgreSQL is running: `pg_isready` |
| OpenAI API errors | Verify API key in `.env` file |
| Out of memory | Reduce concurrent requests or use batching |
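
For the out-of-memory case, "batching" means processing a bounded slice of users at a time instead of loading everything at once. The helper below is a generic sketch under that assumption. `batched` is a hypothetical name and is not a function provided by the ProfileMatching scripts.

```python
from itertools import islice
from typing import Iterable, Iterator


def batched(items: Iterable[int], batch_size: int) -> Iterator[list[int]]:
    """Yield successive lists of at most `batch_size` items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    # islice pulls the next `batch_size` items; an empty list ends the loop.
    while batch := list(islice(iterator, batch_size)):
        yield batch


if __name__ == "__main__":
    for batch in batched(range(10), 4):
        print(batch)  # [0, 1, 2, 3] then [4, 5, 6, 7] then [8, 9]
```

Only one batch is held in memory at a time, so peak usage scales with `batch_size` rather than with the total number of users.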

## 🎓 Understanding the Output

### Candidate Finding Output
```
Processing contact 1000/43000 (2.3%)
Found 6 candidates for @username
  • exact_username: @username (0.90)
  • fuzzy_name: @similar_name (0.75)
```

### LLM Verification Output
```
[Progress] 500/1000 users (50.0%) | 125 matches | ~$1.50 | 25.0 users/min
```
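
The progress line carries enough information to derive a match rate and a rough time remaining. The sketch below parses the sample line shown above. It assumes the format is exactly as in that sample (no thousands separators, same field order), and `parse_progress` is a hypothetical helper rather than part of the verification script.

```python
import re

# Pattern for the sample "[Progress] ..." line; the exact format is an assumption.
PROGRESS_RE = re.compile(
    r"\[Progress\] (?P<done>\d+)/(?P<total>\d+) users \((?P<pct>[\d.]+)%\)"
    r" \| (?P<matches>\d+) matches \| ~\$(?P<cost>[\d.]+)"
    r" \| (?P<rate>[\d.]+) users/min"
)


def parse_progress(line: str) -> dict:
    """Extract counters from a progress line and derive match rate and ETA."""
    m = PROGRESS_RE.search(line)
    if m is None:
        raise ValueError(f"not a progress line: {line!r}")
    done, total = int(m["done"]), int(m["total"])
    matches, rate = int(m["matches"]), float(m["rate"])
    return {
        "done": done,
        "total": total,
        "matches": matches,
        "cost_usd": float(m["cost"]),
        "match_rate": matches / done if done else 0.0,
        "eta_minutes": (total - done) / rate if rate > 0 else None,
    }


if __name__ == "__main__":
    sample = "[Progress] 500/1000 users (50.0%) | 125 matches | ~$1.50 | 25.0 users/min"
    info = parse_progress(sample)
    print(f"{info['match_rate']:.0%} of users matched")   # 25% of users matched
    print(f"{info['eta_minutes']:.0f} min remaining")     # 20 min remaining
```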

### Match Quality Review
```
Total users with matches: 25,662
Total matches: 36,147
Average confidence: 0.74

Confidence Distribution:
  90%+: 12,031 matches (HIGH)
  80-89%: 1,452 matches (MEDIUM)
  70-79%: 7,505 matches (LOW)
```
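
The labels in the distribution correspond to confidence ranges. A minimal sketch of that mapping follows, with thresholds read off the sample output above. `confidence_label` and `summarize` are hypothetical helpers, and how `review_match_quality.py` treats scores below 70% is not shown in the sample, so returning `None` for them is an assumption.

```python
from collections import Counter
from typing import Iterable, Optional


def confidence_label(confidence: float) -> Optional[str]:
    """Map a 0-1 confidence score to the label used in the review output."""
    if confidence >= 0.90:
        return "HIGH"
    if confidence >= 0.80:
        return "MEDIUM"
    if confidence >= 0.70:
        return "LOW"
    return None  # assumption: scores below 70% fall outside the listed buckets


def summarize(confidences: Iterable[float]) -> Counter:
    """Count how many scores fall into each labelled bucket."""
    labels = (confidence_label(c) for c in confidences)
    return Counter(label for label in labels if label is not None)


if __name__ == "__main__":
    print(summarize([0.95, 0.91, 0.85, 0.74, 0.40]))
```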