ResNet Phenology Classifier - Development Guide
Development Setup
Prerequisites
- Python 3.11+
- CUDA-capable GPU (recommended)
- 8GB+ RAM
- Git
Environment Setup
- Clone the repository:
git clone <repository_url>
cd resnet
- Create virtual environment:
python -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
- Install dependencies:
pip install -r requirements.txt
- Install development dependencies:
pip install black flake8 mypy pylint pytest-cov
Code Quality Standards
PEP 8 Compliance
All code must follow PEP 8 standards:
flake8 src/ tests/
Type Hints
Use type hints for all functions:
def train_model(epochs: int, lr: float) -> dict:
    ...
Docstrings
All modules, classes, and functions must have docstrings:
def function(arg: str) -> int:
    """
    Brief description.

    Args:
        arg: Description

    Returns:
        Description
    """
    pass
Testing
Running Tests
# All tests
pytest tests/ -v
# With coverage
pytest tests/ --cov=src --cov-report=html
# Specific markers
pytest tests/ -m unit
pytest tests/ -m integration
pytest tests/ -m slow
Writing Tests
- Unit tests for all utility functions
- Integration tests for data pipelines
- Model validation tests
- Use fixtures for common setup
Test Coverage
- Minimum 80% code coverage
- 100% coverage for critical paths
Continuous Integration
Pre-commit Checks
Before committing:
- Run linter:
flake8 src/ tests/
- Run type checker:
mypy src/
- Run tests:
pytest tests/ -v
- Check formatting:
black --check src/ tests/
CI Pipeline
The CI/CD pipeline runs:
- Linting (flake8, pylint)
- Type checking (mypy)
- Unit tests
- Integration tests
- Coverage report
Model Development
Training Best Practices
- Always set random seed
- Use validation set for hyperparameter tuning
- Save checkpoints regularly
- Monitor training metrics
- Use early stopping
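As a rough illustration of the seeding and early-stopping items above, here is a minimal framework-free sketch. The names `set_seed` and `EarlyStopping`, and the `patience` and `min_delta` parameters, are assumptions for this example and are not taken from the project code. A real training script would also seed NumPy and PyTorch.

```python
import random


def set_seed(seed: int) -> None:
    """Seed Python's built-in RNG (hypothetical helper; extend for NumPy/PyTorch)."""
    random.seed(seed)


class EarlyStopping:
    """Signal a stop once validation loss fails to improve for `patience` epochs."""

    def __init__(self, patience: int = 3, min_delta: float = 0.0) -> None:
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss: float) -> bool:
        """Record one epoch's validation loss; return True if training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            # Improvement: remember the new best and reset the counter.
            self.best_loss = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience
```

Calling `stopper.step(val_loss)` once per epoch and breaking out of the loop when it returns `True` is one way to wire this in. Saving a checkpoint whenever `best_loss` improves pairs naturally with it.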
Evaluation
- Evaluate on independent test set
- Report multiple metrics (accuracy, precision, recall, F1)
- Analyze confusion matrix
- Check for bias
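To make the evaluation checklist concrete, the sketch below builds a confusion matrix and derives per-class precision, recall, and F1 in plain Python. The function names are assumptions for this example, and the project may well use a library such as scikit-learn instead.

```python
from collections import Counter


def confusion_matrix(
    y_true: list[str], y_pred: list[str], labels: list[str]
) -> list[list[int]]:
    """Return a matrix whose rows are true classes and columns are predictions."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


def per_class_metrics(
    matrix: list[list[int]], labels: list[str]
) -> dict[str, dict[str, float]]:
    """Compute precision, recall, and F1 for each class from a confusion matrix."""
    metrics = {}
    for i, label in enumerate(labels):
        tp = matrix[i][i]
        fn = sum(matrix[i]) - tp                 # true class i, predicted otherwise
        fp = sum(row[i] for row in matrix) - tp  # predicted i, true class otherwise
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics[label] = {"precision": precision, "recall": recall, "f1": f1}
    return metrics
```

Large gaps between classes in these per-class numbers are a simple first signal when checking for bias.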
Versioning
- Version models with timestamp
- Track hyperparameters
- Save class mappings
- Document training data
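One possible way to follow the versioning items above is to write a small JSON file next to each checkpoint. The sketch below is an assumption about how that could look. The function name `save_model_metadata`, the file naming scheme, and the metadata keys are all hypothetical.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def save_model_metadata(
    out_dir: str,
    hyperparameters: dict,
    class_to_idx: dict[str, int],
    dataset_note: str,
) -> Path:
    """Write a timestamped JSON record describing one trained model."""
    # UTC timestamp so versions sort consistently across machines.
    timestamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    directory = Path(out_dir)
    directory.mkdir(parents=True, exist_ok=True)
    metadata = {
        "version": timestamp,
        "hyperparameters": hyperparameters,
        "class_to_idx": class_to_idx,
        "training_data": dataset_note,
    }
    path = directory / f"model_{timestamp}.json"
    path.write_text(json.dumps(metadata, indent=2))
    return path
```

Keeping the class mapping in the same record means inference code can translate output indices back to labels without relying on directory order.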
Git Workflow
Branching Strategy
- master: Production-ready code
- 1-phenology-classifier: Feature branch
- Feature branches for new capabilities
Commit Messages
Follow conventional commits:
feat: add confusion matrix visualization
fix: correct data loader split logic
docs: update README with API examples
test: add unit tests for inference
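A commit message in this style can be checked mechanically. The sketch below is an assumption about how such a check might look. It accepts only the four types shown in the examples above plus an optional scope, and `is_conventional` is a hypothetical helper, not part of the project.

```python
import re

# Types taken from the examples above; extend the tuple if the project uses more.
COMMIT_TYPES = ("feat", "fix", "docs", "test")

# "<type>(<optional scope>): <description>"
_PATTERN = re.compile(rf"^({'|'.join(COMMIT_TYPES)})(\([\w-]+\))?: \S")


def is_conventional(message: str) -> bool:
    """Return True if the first line of a commit message matches the pattern."""
    lines = message.splitlines()
    first_line = lines[0] if lines else ""
    return bool(_PATTERN.match(first_line))
```

A function like this could be called from a Git `commit-msg` hook, though whether the project enforces the format automatically is not stated here.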
Performance Optimization
Training
- Use mixed precision training
- Optimize data loading (num_workers)
- Use GPU if available
- Tune batch size
Inference
- Model quantization
- Batch predictions
- Cache loaded models
- Optimize image preprocessing
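The "cache loaded models" item can be sketched with `functools.lru_cache`, so that repeated requests for the same checkpoint reuse one in-memory object. Both `_load_model` and `get_model` are hypothetical names, and the stub below stands in for a real, expensive load such as reading weights from disk.

```python
from functools import lru_cache


def _load_model(checkpoint_path: str) -> dict:
    """Hypothetical stand-in for an expensive model load."""
    return {"checkpoint": checkpoint_path}


@lru_cache(maxsize=2)
def get_model(checkpoint_path: str) -> dict:
    """Load a model once per path and reuse it on later calls."""
    return _load_model(checkpoint_path)
```

The small `maxsize` is a deliberate assumption here, since each cached model holds memory for as long as it stays in the cache.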
Troubleshooting
Common Issues
CUDA out of memory:
- Reduce batch size
- Use gradient accumulation
- Clear cache:
torch.cuda.empty_cache()
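Gradient accumulation trades memory for time: several small micro-batches are processed one after another and their gradients summed before a single optimizer step. The toy sketch below, in plain Python rather than PyTorch, illustrates why scaling each micro-batch gradient by the number of accumulation steps reproduces the full-batch gradient when micro-batches are equally sized. The one-parameter model `y = w * x` and both function names are assumptions made for this example.

```python
def mse_grad(w: float, xs: list[float], ys: list[float]) -> float:
    """Gradient of mean squared error with respect to w for the model y = w * x."""
    return sum(2 * x * (w * x - y) for x, y in zip(xs, ys)) / len(xs)


def accumulated_grad(
    w: float, xs: list[float], ys: list[float], micro_batch_size: int
) -> float:
    """Sum micro-batch gradients, each scaled by 1 / number of steps."""
    starts = range(0, len(xs), micro_batch_size)
    steps = len(starts)
    total = 0.0
    for s in starts:
        end = s + micro_batch_size
        # Only one micro-batch needs to be held in memory at a time.
        total += mse_grad(w, xs[s:end], ys[s:end]) / steps
    return total
```

In a PyTorch loop the same idea usually appears as dividing the loss by the number of accumulation steps before `backward()` and calling the optimizer step only every N micro-batches.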
Slow data loading:
- Increase num_workers
- Use SSD for dataset
- Preprocess images offline
Poor accuracy:
- Check data quality
- Increase training epochs
- Try different learning rates
- Use data augmentation
Documentation
Code Documentation
- Docstrings for all public APIs
- Inline comments for complex logic
- Type hints throughout
Project Documentation
- Update README for new features
- Document API changes
- Maintain changelog
Release Process
- Update version number
- Run full test suite
- Build documentation
- Create release notes
- Tag release in git
- Deploy to production
Contact
For questions or issues, refer to the project specifications in specs/1-phenology-classifier/.