File Sync Tool

A Python-based file synchronization tool that monitors local directories and syncs changes to a storage backend: AWS S3 or a local backup directory.

Features

  • Real-time monitoring: Uses watchdog to detect file changes as they happen
  • Multiple backends: S3 and local backup, with a backend interface that can be extended to other cloud providers
  • Conflict resolution: Configurable handling of files changed both locally and remotely
  • Incremental sync: Only syncs files whose contents have changed
  • Ignore patterns: Configurable file/folder exclusions
  • Compression: Optional compression for bandwidth savings
  • Encryption: Optional AES encryption for sensitive files
  • Resume support: Handles interrupted syncs gracefully
  • Detailed logging: Tracks all sync operations

Project Structure

05_file_sync/
├── README.md
├── requirements.txt
├── .env.example
├── sync/
│   ├── __init__.py
│   ├── __main__.py
│   ├── config.py
│   ├── watcher.py
│   ├── sync_manager.py
│   ├── backends/
│   │   ├── __init__.py
│   │   ├── base.py
│   │   ├── s3.py
│   │   └── local.py
│   ├── utils/
│   │   ├── __init__.py
│   │   ├── hashing.py
│   │   ├── compression.py
│   │   └── encryption.py
│   └── state.py
└── tests/
    ├── __init__.py
    ├── conftest.py
    └── test_sync.py

Learning Concepts

Core Python Skills

  • Async I/O: Using asyncio for non-blocking operations
  • File system watching: watchdog library for monitoring changes
  • Abstract base classes: Backend interface design
  • Context managers: Resource management for files and connections
  • Generators: Streaming large file operations
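The generator and context-manager skills above combine naturally when hashing large files for change detection. A minimal sketch that streams a file in fixed-size chunks instead of loading it into memory; the names `iter_chunks` and `file_sha256` and the 64 KiB chunk size are illustrative assumptions, not the contents of the project's `sync/utils/hashing.py`.

```python
import hashlib
from pathlib import Path
from typing import Iterator

CHUNK_SIZE = 64 * 1024  # assumed chunk size: 64 KiB


def iter_chunks(path: Path, chunk_size: int = CHUNK_SIZE) -> Iterator[bytes]:
    """Yield a file's contents lazily, one chunk at a time."""
    # The with-block closes the file once the generator finishes or is closed.
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            yield chunk


def file_sha256(path: Path) -> str:
    """Return the hex SHA-256 digest of a file without reading it all at once."""
    digest = hashlib.sha256()
    for chunk in iter_chunks(path):
        digest.update(chunk)
    return digest.hexdigest()
```

Memory use stays at roughly one chunk regardless of file size, which is what makes checksum-based incremental sync practical for large files.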

Advanced Topics

  • AWS SDK (boto3): Cloud storage integration
  • Threading: Background sync workers
  • Hashing: MD5/SHA256 for change detection
  • Compression: zlib/gzip for file compression
  • Encryption: AES encryption with cryptography library
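For the compression topic, a minimal sketch of streaming gzip compression with the standard library. `shutil.copyfileobj` copies in chunks, so memory use stays flat for large files. The `.gz` suffix convention and the function names are assumptions for illustration.

```python
import gzip
import shutil
from pathlib import Path


def compress_file(src: Path) -> Path:
    """Write a gzip-compressed copy of src next to it and return the new path."""
    dst = src.with_name(src.name + ".gz")
    # copyfileobj streams in chunks instead of reading the whole file.
    with open(src, "rb") as fin, gzip.open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dst


def decompress_file(src: Path, dst: Path) -> Path:
    """Restore a file previously written by compress_file."""
    with gzip.open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dst
```

Text and logs usually shrink a lot; already-compressed formats (JPEG, ZIP, video) gain little, so a real tool might skip them.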

Design Patterns

  • Strategy pattern: Swappable storage backends
  • Observer pattern: File change notifications
  • Singleton: Configuration management
  • Factory pattern: Backend creation
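The strategy and factory patterns can be sketched together: an abstract base class defines the backend interface, concrete classes implement it, and a factory maps the `BACKEND_TYPE` setting to a class. The names here (`StorageBackend`, `upload`, `exists`, `create_backend`) are hypothetical and may not match `sync/backends/base.py`; an S3 backend would implement the same methods with boto3 and is omitted.

```python
import shutil
from abc import ABC, abstractmethod
from pathlib import Path


class StorageBackend(ABC):
    """Interface every backend must implement (strategy pattern)."""

    @abstractmethod
    def upload(self, local_path: Path, remote_key: str) -> None: ...

    @abstractmethod
    def exists(self, remote_key: str) -> bool: ...


class LocalBackend(StorageBackend):
    """Stores files under a backup directory on the local disk."""

    def __init__(self, backup_dir: str) -> None:
        self.backup_dir = Path(backup_dir)

    def upload(self, local_path: Path, remote_key: str) -> None:
        target = self.backup_dir / remote_key
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(local_path, target)  # copy2 also preserves modification times

    def exists(self, remote_key: str) -> bool:
        return (self.backup_dir / remote_key).exists()


# Factory: map a BACKEND_TYPE value to a backend class (hypothetical registry).
_BACKENDS: dict[str, type[StorageBackend]] = {"local": LocalBackend}


def create_backend(backend_type: str, **options) -> StorageBackend:
    try:
        backend_cls = _BACKENDS[backend_type]
    except KeyError:
        raise ValueError(f"Unknown backend type: {backend_type!r}") from None
    return backend_cls(**options)
```

With this shape, adding a new provider (such as the Google Drive exercise) means writing one subclass and registering it in the factory; the sync manager only ever talks to `StorageBackend`.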

Installation

cd 05_file_sync
pip install -r requirements.txt

Configuration

  1. Copy .env.example to .env:
cp .env.example .env
  2. Configure your settings:
# Watch directory
WATCH_DIR=/path/to/sync

# Backend type: s3, local
BACKEND_TYPE=local

# Local backup settings
BACKUP_DIR=/path/to/backup

# S3 settings (if using S3)
AWS_ACCESS_KEY_ID=your_key
AWS_SECRET_ACCESS_KEY=your_secret
AWS_REGION=us-east-1
S3_BUCKET=your-bucket

# Optional settings
COMPRESSION_ENABLED=true
ENCRYPTION_ENABLED=false
ENCRYPTION_KEY=your-32-byte-key-here

# Ignore patterns (comma-separated)
IGNORE_PATTERNS=.git,*.pyc,__pycache__,*.tmp
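The comma-separated `IGNORE_PATTERNS` value can be applied with shell-style matching from the standard `fnmatch` module. A minimal sketch, assuming each pattern is checked against every path component (so `__pycache__` excludes everything beneath it); the helper names are hypothetical and the project may match paths differently.

```python
import os
from fnmatch import fnmatch
from pathlib import PurePath
from typing import List, Optional


def load_ignore_patterns(raw: Optional[str] = None) -> List[str]:
    """Split the comma-separated IGNORE_PATTERNS value into a clean list."""
    if raw is None:
        raw = os.environ.get("IGNORE_PATTERNS", "")
    return [p.strip() for p in raw.split(",") if p.strip()]


def is_ignored(path: str, patterns: List[str]) -> bool:
    """True if any component of the path matches any pattern (assumed rule)."""
    # fnmatch follows the OS case rules: case-insensitive on Windows.
    return any(
        fnmatch(part, pattern)
        for part in PurePath(path).parts
        for pattern in patterns
    )
```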

Usage

Basic Usage

# Start file sync daemon
python -m sync

# Or using the entry point
file-sync start

# Initial full sync
file-sync sync --full

# Sync a specific directory
file-sync sync --path /path/to/folder

Command Line Options

# Show sync status
file-sync status

# List pending changes
file-sync list

# Force re-sync all files
file-sync sync --force

# Dry run (show what would be synced)
file-sync sync --dry-run

# Watch mode (continuous monitoring)
file-sync watch

# Restore files from backup
file-sync restore --date 2024-01-15
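One way the subcommands above could be wired up is with `argparse` subparsers. This is a hypothetical sketch of argument parsing only: the real `sync/__main__.py` may be structured differently, the help strings are paraphrased from the comments above, and the handlers that actually perform each command are omitted.

```python
import argparse
from datetime import date


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="file-sync")
    sub = parser.add_subparsers(dest="command", required=True)

    sub.add_parser("start", help="start the sync daemon")
    sub.add_parser("watch", help="continuously monitor for changes")
    sub.add_parser("status", help="show sync status")
    sub.add_parser("list", help="list pending changes")

    sync = sub.add_parser("sync", help="run a sync pass")
    sync.add_argument("--full", action="store_true", help="initial full sync")
    sync.add_argument("--force", action="store_true", help="re-sync all files")
    sync.add_argument("--dry-run", action="store_true", help="only show what would be synced")
    sync.add_argument("--path", help="limit the sync to this directory")

    restore = sub.add_parser("restore", help="restore files from backup")
    # date.fromisoformat parses YYYY-MM-DD; argparse reports malformed dates as errors.
    restore.add_argument("--date", type=date.fromisoformat)
    return parser
```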

Programmatic Usage

import asyncio

from sync import SyncManager, S3Backend, LocalBackend


async def main():
    # Create sync manager with S3
    manager = SyncManager(
        source_dir="/path/to/watch",
        backend=S3Backend(bucket="my-bucket", region="us-east-1")
    )

    # Start watching
    await manager.start()

    # Manual sync
    await manager.sync_all()

    # Stop watching
    await manager.stop()


# The manager's methods are coroutines, so they must run inside an event loop
asyncio.run(main())

Architecture

Sync Flow

┌─────────────────┐
│  File Watcher   │ ─── Detects changes
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  Event Queue    │ ─── Debounces rapid changes
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  Sync Manager   │ ─── Coordinates sync operations
└────────┬────────┘
         │
    ┌────┴────┐
    ▼         ▼
┌───────┐ ┌───────┐
│  S3   │ │ Local │ ─── Storage backends
└───────┘ └───────┘
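The Event Queue stage debounces rapid changes: if an editor saves the same file several times in quick succession, only the last event should trigger an upload. A minimal single-threaded sketch, assuming a fixed quiet period per path; the `Debouncer` class is hypothetical, and timestamps can be passed in explicitly so the logic is easy to test (by default it uses `time.monotonic()`).

```python
import time
from typing import Optional


class Debouncer:
    """Collapse bursts of events per path until the path has been quiet long enough."""

    def __init__(self, delay: float = 1.0) -> None:
        self.delay = delay
        self._last_seen: dict[str, float] = {}

    def record(self, path: str, now: Optional[float] = None) -> None:
        """Note a change event; repeated events just refresh the timestamp."""
        self._last_seen[path] = time.monotonic() if now is None else now

    def ready(self, now: Optional[float] = None) -> list[str]:
        """Return (and forget) paths with no new events for `delay` seconds."""
        now = time.monotonic() if now is None else now
        due = [p for p, t in self._last_seen.items() if now - t >= self.delay]
        for p in due:
            del self._last_seen[p]
        return sorted(due)
```

If the watchdog callback runs on a different thread from the sync loop, `_last_seen` would need a `threading.Lock` around it. A longer delay means fewer uploads during bursts at the cost of later syncs, which is why increasing the debounce timeout appears under troubleshooting.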

State Management

The sync tool maintains a state file (.sync_state.json) that tracks:

  • File checksums for change detection
  • Last sync timestamps
  • Pending uploads/downloads
  • Conflict information
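A minimal sketch of loading and saving `.sync_state.json`, assuming a simple layout that maps each relative path to its checksum and last-sync time (the project's real schema may differ, and pending operations and conflicts are left out). Writing to a temporary file in the same directory and then calling `os.replace` means an interrupted save never leaves a half-written state file behind. The helper names are hypothetical.

```python
import json
import os
import tempfile
from pathlib import Path

STATE_FILE = ".sync_state.json"


def load_state(root: Path) -> dict:
    """Return the saved state, or an empty state on first run."""
    path = root / STATE_FILE
    if not path.exists():
        return {"files": {}}
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def save_state(root: Path, state: dict) -> None:
    """Replace the state file in one step so readers never see a partial write."""
    fd, tmp = tempfile.mkstemp(dir=root, prefix=".sync_state.", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(state, f, indent=2)
        # Same directory, same filesystem: the rename is atomic on POSIX.
        os.replace(tmp, root / STATE_FILE)
    except BaseException:
        os.unlink(tmp)
        raise


def record_sync(state: dict, rel_path: str, checksum: str, synced_at: float) -> None:
    state["files"][rel_path] = {"checksum": checksum, "last_synced": synced_at}
```

If the state file lives inside the watched directory, it should itself be excluded from syncing; the temporary name above ends in `.tmp`, which the example `IGNORE_PATTERNS` already covers.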

Conflict Resolution

When a conflict is detected (a file changed both locally and remotely since the last sync), the configured strategy decides the outcome:

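  1. Keep Local (keep_local): the local file overwrites the remote copy
  2. Keep Remote (keep_remote): the remote file overwrites the local copy
  3. Keep Both (keep_both): both versions are kept, with one saved as a .conflict backup
  4. Manual (manual): the user is prompted to decide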
from sync import SyncManager

# Configure conflict resolution
manager = SyncManager(
    source_dir="/path/to/watch",
    conflict_resolution="keep_both"  # keep_local, keep_remote, keep_both, manual
)
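Detecting a conflict takes three checksums: the local file, the remote copy, and the value recorded in the state file at the last sync. A file is in conflict only when both sides differ from that baseline and from each other. The sketch below, with hypothetical names, classifies a file; it also shows one way `keep_both` might name the extra copy, but the project's actual `.conflict` naming scheme is not specified, so the `name.conflict.ext` form is an assumption.

```python
from pathlib import PurePosixPath
from typing import Optional


def classify(local: Optional[str], remote: Optional[str], base: Optional[str]) -> str:
    """Decide what to do with one file, given its checksums (None means absent).

    base is the checksum recorded in the state file at the last successful sync.
    """
    if local == remote:
        return "in_sync"   # identical on both sides, or absent on both
    if remote == base:
        return "push"      # only the local side changed: created, edited or deleted
    if local == base:
        return "pull"      # only the remote side changed
    return "conflict"      # both sides changed since the last sync


def conflict_copy_name(rel_path: str) -> str:
    """Hypothetical name for the extra copy saved by keep_both: a.txt -> a.conflict.txt."""
    p = PurePosixPath(rel_path)
    return str(p.with_name(f"{p.stem}.conflict{p.suffix}"))
```

Only the `"conflict"` result would be handed to the configured strategy; the other outcomes sync without asking.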

Security

Encryption

Files can be encrypted before upload:

from sync.utils.encryption import FileEncryptor

encryptor = FileEncryptor(key="your-32-byte-key")

# Encrypt file
encrypted_path = encryptor.encrypt_file("secret.txt")

# Decrypt file
decrypted_path = encryptor.decrypt_file("secret.txt.enc")
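`FileEncryptor` is the project's own interface. As an illustration of what AES encryption with the `cryptography` library can look like, here is a minimal sketch using AES-256-GCM, which also authenticates the data so tampering or a wrong key is detected on decryption. It assumes the key is 32 random bytes (for example from `AESGCM.generate_key`), not a typed passphrase; a passphrase would first need a key-derivation function. The `.enc` layout (12-byte nonce followed by ciphertext) and the function names are assumptions.

```python
import os
from pathlib import Path

from cryptography.hazmat.primitives.ciphers.aead import AESGCM

NONCE_SIZE = 12  # 96-bit nonce, the standard size for GCM


def encrypt_file(path: Path, key: bytes) -> Path:
    """Encrypt path to path + '.enc', stored as nonce followed by ciphertext and tag."""
    nonce = os.urandom(NONCE_SIZE)  # must never repeat for the same key
    ciphertext = AESGCM(key).encrypt(nonce, path.read_bytes(), None)
    out = path.with_name(path.name + ".enc")
    out.write_bytes(nonce + ciphertext)
    return out


def decrypt_file(path: Path, key: bytes) -> bytes:
    """Return the plaintext; raises cryptography.exceptions.InvalidTag on a bad key or data."""
    blob = path.read_bytes()
    nonce, ciphertext = blob[:NONCE_SIZE], blob[NONCE_SIZE:]
    return AESGCM(key).decrypt(nonce, ciphertext, None)
```

This version reads the whole file into memory, which is fine for small files; very large files would need a chunked scheme instead.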

Credential Management

  • Never commit .env files
  • Use AWS IAM roles in production
  • Rotate encryption keys regularly

Testing

# Run all tests
pytest tests/ -v

# Run with coverage
pytest tests/ --cov=sync

# Run specific test file
pytest tests/test_sync.py -v

Exercises

  1. Add Google Drive backend: Implement a new backend for Google Drive
  2. Add file versioning: Keep multiple versions of synced files
  3. Add bandwidth limiting: Implement rate limiting for uploads
  4. Add sync scheduling: Add cron-like scheduling for syncs
  5. Add progress bars: Show real-time sync progress
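Exercise 3 (bandwidth limiting) is commonly approached with a token bucket: budget refills at a fixed rate in bytes per second up to a burst capacity, and each upload chunk waits until the budget covers it. A minimal synchronous sketch; the class and its parameters are hypothetical, and the clock and sleep functions are injectable so it can be tested without real waiting.

```python
import time
from typing import Callable


class TokenBucket:
    """Limit throughput to `rate` bytes/second, allowing bursts up to `capacity` bytes."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity          # start full so the first burst goes out at once
        self.clock = clock
        self.sleep = sleep
        self.updated = clock()

    def _refill(self) -> None:
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now

    def consume(self, nbytes: int) -> None:
        """Call before sending nbytes; sleeps until the budget covers them."""
        self._refill()
        self.tokens -= nbytes
        if self.tokens < 0:
            # Going into debt and sleeping it off avoids a busy retry loop.
            self.sleep(-self.tokens / self.rate)
```

An upload loop would call `bucket.consume(len(chunk))` before sending each chunk, so the average rate converges to `rate` regardless of chunk size.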

Troubleshooting

Common Issues

"Permission denied" errors

# Check directory permissions
chmod 755 /path/to/watch

S3 connection issues

# Verify AWS credentials
aws sts get-caller-identity

High memory usage

  • Enable streaming for large files
  • Reduce concurrent sync limit
  • Increase debounce timeout
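The first two suggestions can be combined: stream each file in chunks and cap how many files are in flight at once with an `asyncio.Semaphore`. A minimal sketch, assuming a hypothetical async `upload_chunk(path, offset, chunk)` coroutine supplied by the backend; the default limit of 4 and the 1 MiB chunk size are arbitrary examples, not project defaults.

```python
import asyncio
from pathlib import Path
from typing import Awaitable, Callable

# Hypothetical backend hook: upload_chunk(path, offset, chunk)
UploadChunk = Callable[[Path, int, bytes], Awaitable[None]]


async def upload_file(path: Path, upload_chunk: UploadChunk,
                      limit: asyncio.Semaphore, chunk_size: int = 1024 * 1024) -> None:
    """Send one file chunk by chunk while holding a concurrency slot."""
    async with limit:
        # Plain file reads block the event loop briefly; asyncio.to_thread could offload them.
        with open(path, "rb") as f:
            offset = 0
            while chunk := f.read(chunk_size):
                await upload_chunk(path, offset, chunk)
                offset += len(chunk)


async def upload_all(paths: list[Path], upload_chunk: UploadChunk,
                     max_concurrent: int = 4) -> None:
    """Upload all files, at most max_concurrent at a time."""
    limit = asyncio.Semaphore(max_concurrent)
    await asyncio.gather(*(upload_file(p, upload_chunk, limit) for p in paths))
```

Peak memory is then roughly `max_concurrent × chunk_size`, independent of how large the files are.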

License

MIT License - Educational use

File Sync - Python Tutorial | DeepML