File Sync Tool
A Python-based file synchronization tool that monitors local directories and syncs changes to cloud storage (AWS S3 or local backup).
Features
- Real-time monitoring: Uses watchdog to detect file changes instantly
- Multiple backends: Supports S3 and local backup, and is extensible to other cloud providers
- Conflict resolution: Smart handling of sync conflicts
- Incremental sync: Only syncs changed files
- Ignore patterns: Configurable file/folder exclusions
- Compression: Optional compression for bandwidth savings
- Encryption: Optional AES encryption for sensitive files
- Resume support: Handles interrupted syncs gracefully
- Detailed logging: Tracks all sync operations
Project Structure
05_file_sync/
├── README.md
├── requirements.txt
├── .env.example
├── sync/
│   ├── __init__.py
│   ├── __main__.py
│   ├── config.py
│   ├── watcher.py
│   ├── sync_manager.py
│   ├── backends/
│   │   ├── __init__.py
│   │   ├── base.py
│   │   ├── s3.py
│   │   └── local.py
│   ├── utils/
│   │   ├── __init__.py
│   │   ├── hashing.py
│   │   ├── compression.py
│   │   └── encryption.py
│   └── state.py
└── tests/
    ├── __init__.py
    ├── conftest.py
    └── test_sync.py
Learning Concepts
Core Python Skills
- Async I/O: Using asyncio for non-blocking operations
- File system watching: watchdog library for monitoring changes
- Abstract base classes: Backend interface design
- Context managers: Resource management for files and connections
- Generators: Streaming large file operations
Advanced Topics
- AWS SDK (boto3): Cloud storage integration
- Threading: Background sync workers
- Hashing: MD5/SHA256 for change detection
- Compression: zlib/gzip for file compression
- Encryption: AES encryption with cryptography library
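As an illustration of the hashing and generator topics listed above, here is a minimal standard-library sketch of checksum-based change detection. It is not the project's hashing.py; the names iter_chunks and file_sha256 and the 64 KiB chunk size are assumptions chosen for the example.

```python
import hashlib
from pathlib import Path
from typing import Iterator


def iter_chunks(path: Path, chunk_size: int = 64 * 1024) -> Iterator[bytes]:
    """Yield the file's contents in fixed-size chunks (hypothetical helper)."""
    with path.open("rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            yield chunk


def file_sha256(path: Path) -> str:
    """Return the hex SHA-256 digest without loading the whole file into memory."""
    digest = hashlib.sha256()
    for chunk in iter_chunks(path):
        digest.update(chunk)
    return digest.hexdigest()
```

Comparing the digest against the one recorded at the last sync tells the tool whether a file needs to be uploaded again. SHA-256 is used here; MD5 would work the same way through hashlib.md5 if speed matters more than collision resistance.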
Design Patterns
- Strategy pattern: Swappable storage backends
- Observer pattern: File change notifications
- Singleton: Configuration management
- Factory pattern: Backend creation
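The strategy and factory patterns above could be combined roughly as follows. This is a sketch under assumptions, not the contents of backends/base.py: the class names StorageBackend and LocalDirBackend, the method names upload and exists, and the create_backend function are all hypothetical, and only the local backend is implemented.

```python
from abc import ABC, abstractmethod
from pathlib import Path
import shutil


class StorageBackend(ABC):
    """Hypothetical backend interface; the real base.py may differ."""

    @abstractmethod
    def upload(self, local_path: Path, remote_key: str) -> None:
        ...

    @abstractmethod
    def exists(self, remote_key: str) -> bool:
        ...


class LocalDirBackend(StorageBackend):
    """Copies files into a backup directory on the local file system."""

    def __init__(self, backup_dir: Path) -> None:
        self.backup_dir = Path(backup_dir)

    def upload(self, local_path: Path, remote_key: str) -> None:
        target = self.backup_dir / remote_key
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(local_path, target)

    def exists(self, remote_key: str) -> bool:
        return (self.backup_dir / remote_key).is_file()


def create_backend(backend_type: str, **options) -> StorageBackend:
    """Factory: map a BACKEND_TYPE string to a backend instance."""
    if backend_type == "local":
        return LocalDirBackend(options["backup_dir"])
    # An S3 strategy would be registered here in the same way.
    raise ValueError(f"Unsupported backend type: {backend_type!r}")
```

Because the sync logic would only depend on the StorageBackend interface, a new provider can be added by writing one more subclass and one more branch in the factory.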
Installation
cd 05_file_sync
pip install -r requirements.txt
Configuration
- Copy .env.example to .env:
cp .env.example .env
- Configure your settings:
# Watch directory
WATCH_DIR=/path/to/sync
# Backend type: s3, local
BACKEND_TYPE=local
# Local backup settings
BACKUP_DIR=/path/to/backup
# S3 settings (if using S3)
AWS_ACCESS_KEY_ID=your_key
AWS_SECRET_ACCESS_KEY=your_secret
AWS_REGION=us-east-1
S3_BUCKET=your-bucket
# Optional settings
COMPRESSION_ENABLED=true
ENCRYPTION_ENABLED=false
ENCRYPTION_KEY=your-32-byte-key-here
# Ignore patterns (comma-separated)
IGNORE_PATTERNS=.git,*.pyc,__pycache__,*.tmp
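To make the IGNORE_PATTERNS setting concrete, the sketch below shows one plausible way a comma-separated value could be parsed and applied. The function names parse_ignore_patterns and is_ignored are hypothetical, and matching each path component against shell-style patterns is an assumption about the semantics, not a statement of how config.py behaves.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def parse_ignore_patterns(raw: str) -> list[str]:
    """Split a comma-separated IGNORE_PATTERNS value, dropping blanks."""
    return [item.strip() for item in raw.split(",") if item.strip()]


def is_ignored(relative_path: str, patterns: list[str]) -> bool:
    """Return True if any path component matches any ignore pattern."""
    parts = PurePosixPath(relative_path).parts
    return any(fnmatchcase(part, pattern) for part in parts for pattern in patterns)


patterns = parse_ignore_patterns(".git,*.pyc,__pycache__,*.tmp")
print(is_ignored("src/app.py", patterns))    # False
print(is_ignored(".git/config", patterns))   # True
```

Matching per component means a pattern such as __pycache__ excludes everything beneath that folder, not just an entry with that exact name at the top level.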
Usage
Basic Usage
# Start file sync daemon
python -m sync
# Or using the entry point
file-sync start
# Initial full sync
file-sync sync --full
# Sync specific directory
file-sync sync --path /path/to/folder
Command Line Options
# Show sync status
file-sync status
# List pending changes
file-sync list
# Force re-sync all files
file-sync sync --force
# Dry run (show what would sync)
file-sync sync --dry-run
# Watch mode (continuous monitoring)
file-sync watch
# Restore files from backup
file-sync restore --date 2024-01-15
Programmatic Usage
import asyncio

from sync import SyncManager, S3Backend, LocalBackend


async def main():
    # Create sync manager with S3
    manager = SyncManager(
        source_dir="/path/to/watch",
        backend=S3Backend(bucket="my-bucket", region="us-east-1"),
    )
    # Start watching
    await manager.start()
    # Manual sync
    await manager.sync_all()
    # Stop watching
    await manager.stop()


# await is only valid inside a coroutine, so run everything via asyncio.run
asyncio.run(main())
Architecture
Sync Flow
┌─────────────────┐
│  File Watcher   │ ◄── Detects changes
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│   Event Queue   │ ◄── Debounces rapid changes
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  Sync Manager   │ ◄── Coordinates sync operations
└────────┬────────┘
         │
    ┌────┴────┐
    ▼         ▼
┌───────┐ ┌───────┐
│  S3   │ │ Local │ ◄── Storage backends
└───────┘ └───────┘
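The debouncing step in the flow above can be illustrated with a small, dependency-free sketch. The Debouncer class, its method names, and the 0.5 second quiet period are assumptions made for the example; the project's event queue may work differently. The optional now argument exists only so the behavior can be exercised without real waiting.

```python
import time
from typing import Optional


class Debouncer:
    """Coalesce repeated events for the same path until the path has been quiet."""

    def __init__(self, quiet_period: float = 0.5) -> None:
        self.quiet_period = quiet_period
        self._last_seen: dict[str, float] = {}

    def record(self, path: str, now: Optional[float] = None) -> None:
        """Note a change event; a later event for the same path resets its timer."""
        self._last_seen[path] = time.monotonic() if now is None else now

    def pop_ready(self, now: Optional[float] = None) -> list[str]:
        """Return and forget paths with no events for at least quiet_period seconds."""
        current = time.monotonic() if now is None else now
        ready = [
            path
            for path, seen in self._last_seen.items()
            if current - seen >= self.quiet_period
        ]
        for path in ready:
            del self._last_seen[path]
        return sorted(ready)
```

An editor that saves a file several times within a second then produces one sync operation instead of several, because each save resets the timer for that path.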
State Management
The sync tool maintains a state file (.sync_state.json) that tracks:
- File checksums for change detection
- Last sync timestamps
- Pending uploads/downloads
- Conflict information
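A minimal sketch of how such a state file might be read, written, and used for change detection is shown below. The JSON layout (a top-level "files" mapping with a "checksum" per path) and the function names are assumptions for illustration; the actual schema of .sync_state.json is not documented here.

```python
import json
from pathlib import Path


def load_state(state_path: Path) -> dict:
    """Load the state file, returning an empty state if it does not exist yet."""
    if not state_path.exists():
        return {"files": {}}
    return json.loads(state_path.read_text(encoding="utf-8"))


def save_state(state_path: Path, state: dict) -> None:
    """Write to a temporary file first, then rename it over the old state file."""
    tmp_path = state_path.with_suffix(".tmp")
    tmp_path.write_text(json.dumps(state, indent=2, sort_keys=True), encoding="utf-8")
    tmp_path.replace(state_path)


def changed_files(state: dict, current_checksums: dict[str, str]) -> list[str]:
    """Return paths whose checksum is new or differs from the recorded one."""
    recorded = state.get("files", {})
    return sorted(
        path
        for path, checksum in current_checksums.items()
        if recorded.get(path, {}).get("checksum") != checksum
    )
```

Writing through a temporary file and renaming it reduces the chance of a half-written state file if the process is interrupted, which fits the resume-support goal.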
Conflict Resolution
When conflicts are detected (file changed both locally and remotely):
- Keep Local: Local file overwrites remote
- Keep Remote: Remote file overwrites local
- Keep Both: Creates a .conflict backup
- Manual: Prompts the user for a decision
# Configure conflict resolution
manager = SyncManager(
    source_dir="/path/to/watch",
    conflict_resolution="keep_both",  # keep_local, keep_remote, keep_both, manual
)
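To show what a keep_both strategy might do on disk, here is a hedged sketch. It assumes the local version is preserved as the .conflict copy and the remote version takes the original name; the source does not say which side becomes the backup, and the function names conflict_copy_path and keep_both are hypothetical.

```python
from pathlib import Path
import shutil


def conflict_copy_path(path: Path) -> Path:
    """Return a free sibling path: 'name.conflict', then 'name.conflict1', and so on."""
    candidate = path.with_name(path.name + ".conflict")
    counter = 1
    while candidate.exists():
        candidate = path.with_name(f"{path.name}.conflict{counter}")
        counter += 1
    return candidate


def keep_both(local_path: Path, remote_bytes: bytes) -> Path:
    """Preserve the local version as a .conflict copy, then write the remote version."""
    backup = conflict_copy_path(local_path)
    shutil.copy2(local_path, backup)
    local_path.write_bytes(remote_bytes)
    return backup
```

Numbering later copies avoids overwriting an earlier conflict backup if the same file conflicts again before the user has cleaned up.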
Security
Encryption
Files can be encrypted before upload:
from sync.utils.encryption import FileEncryptor
encryptor = FileEncryptor(key="your-32-byte-key")
# Encrypt file
encrypted_path = encryptor.encrypt_file("secret.txt")
# Decrypt file
decrypted_path = encryptor.decrypt_file("secret.txt.enc")
Credential Management
- Never commit .env files
- Use AWS IAM roles in production
- Rotate encryption keys regularly
Testing
# Run all tests
pytest tests/ -v
# Run with coverage
pytest tests/ --cov=sync
# Run specific test file
pytest tests/test_sync.py -v
Exercises
- Add Google Drive backend: Implement a new backend for Google Drive
- Add file versioning: Keep multiple versions of synced files
- Add bandwidth limiting: Implement rate limiting for uploads
- Add sync scheduling: Add cron-like scheduling for syncs
- Add progress bars: Show real-time sync progress
Troubleshooting
Common Issues
"Permission denied" errors
# Check directory permissions
chmod 755 /path/to/watch
S3 connection issues
# Verify AWS credentials
aws sts get-caller-identity
High memory usage
- Enable streaming for large files
- Reduce concurrent sync limit
- Increase debounce timeout
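The effect of a concurrent sync limit on memory can be sketched with asyncio.Semaphore. This is an illustration only: sync_with_limit, its upload parameter, and the default of 4 are hypothetical and do not describe how SyncManager is configured.

```python
import asyncio


async def sync_with_limit(paths, upload, max_concurrent: int = 4):
    """Run upload(path) for every path with at most max_concurrent in flight."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def worker(path):
        # Each worker waits here until one of the max_concurrent slots is free.
        async with semaphore:
            return await upload(path)

    # gather returns results in the same order as the input paths.
    return await asyncio.gather(*(worker(path) for path in paths))
```

With a lower limit, fewer files are buffered at the same time, which trades throughput for a smaller memory footprint.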
License
MIT License - Educational use