Phenology/Code/Supervised_learning/resnet/CONTRIBUTING.md
2025-11-06 14:16:49 +01:00

# ResNet Phenology Classifier - Development Guide
## Development Setup
### Prerequisites
- Python 3.11+
- CUDA-capable GPU (recommended)
- 8GB+ RAM
- Git
### Environment Setup
1. Clone the repository:
```bash
git clone <repository_url>
cd resnet
```
2. Create a virtual environment:
```bash
python -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
```
3. Install dependencies:
```bash
pip install -r requirements.txt
```
4. Install development dependencies:
```bash
pip install black flake8 mypy pylint pytest-cov
```
## Code Quality Standards
### PEP 8 Compliance
All code must follow PEP 8. Check compliance with flake8:
```bash
flake8 src/ tests/
```
### Type Hints
Use type hints for all functions:
```python
def train_model(epochs: int, lr: float) -> dict:
    ...
```
### Docstrings
All modules, classes, and functions must have docstrings:
```python
def function(arg: str) -> int:
    """
    Brief description.

    Args:
        arg: Description

    Returns:
        Description
    """
    pass
```
## Testing
### Running Tests
```bash
# All tests
pytest tests/ -v
# With coverage
pytest tests/ --cov=src --cov-report=html
# Specific markers
pytest tests/ -m unit
pytest tests/ -m integration
pytest tests/ -m slow
```
### Writing Tests
- Unit tests for all utility functions
- Integration tests for data pipelines
- Model validation tests
- Use fixtures for common setup
### Test Coverage
- Minimum 80% code coverage
- 100% coverage for critical paths
## Continuous Integration
### Pre-commit Checks
Before committing:
1. Run linter: `flake8 src/ tests/`
2. Run type checker: `mypy src/`
3. Run tests: `pytest tests/ -v`
4. Check formatting: `black --check src/ tests/`
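
The four checks above can be chained in a small helper so they always run in the same order. This is a minimal sketch, not part of the project: the names `CHECKS`, `run_checks`, and `_default_runner` are hypothetical, and it assumes the tools are installed in the active virtual environment.

```python
import subprocess
from typing import Callable, Optional, Sequence

# The four pre-commit checks listed above, in the same order.
CHECKS: list[list[str]] = [
    ["flake8", "src/", "tests/"],
    ["mypy", "src/"],
    ["pytest", "tests/", "-v"],
    ["black", "--check", "src/", "tests/"],
]


def _default_runner(cmd: Sequence[str]) -> int:
    """Execute one command and return its exit code."""
    return subprocess.run(cmd).returncode


def run_checks(
    run: Optional[Callable[[Sequence[str]], int]] = None,
) -> list[str]:
    """Run every check and return the names of the tools that failed."""
    runner = run if run is not None else _default_runner
    failed = []
    for cmd in CHECKS:
        # A non-zero exit code means the tool reported a problem.
        if runner(cmd) != 0:
            failed.append(cmd[0])
    return failed
```

A wrapper script could call `run_checks()` and exit with a non-zero status when the returned list is not empty. Every check runs even if an earlier one fails, so a single pass reports all problems.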
### CI Pipeline
The CI/CD pipeline runs:
1. Linting (flake8, pylint)
2. Type checking (mypy)
3. Unit tests
4. Integration tests
5. Coverage report
## Model Development
### Training Best Practices
1. Always set a random seed so runs are reproducible
2. Use validation set for hyperparameter tuning
3. Save checkpoints regularly
4. Monitor training metrics
5. Use early stopping
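
Seeding and early stopping from the list above might look like the following. It is a framework-independent sketch: the `EarlyStopping` class, its `patience` and `min_delta` parameters, and the validation losses are all made up for illustration and are not taken from this project's training code.

```python
import random


class EarlyStopping:
    """Signal a stop when validation loss stops improving."""

    def __init__(self, patience: int = 3, min_delta: float = 0.0) -> None:
        self.patience = patience      # epochs to wait without improvement
        self.min_delta = min_delta    # smallest decrease that counts
        self.best_loss = float("inf")
        self.epochs_without_improvement = 0

    def step(self, val_loss: float) -> bool:
        """Record one epoch's validation loss; return True to stop."""
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.epochs_without_improvement = 0
        else:
            self.epochs_without_improvement += 1
        return self.epochs_without_improvement >= self.patience


# Seed Python's RNG; a real run would also call numpy.random.seed(seed)
# and torch.manual_seed(seed).
random.seed(42)

stopper = EarlyStopping(patience=2)
val_losses = [0.90, 0.70, 0.72, 0.71, 0.69]  # illustrative values only
stopped_at = None
for epoch, loss in enumerate(val_losses):
    if stopper.step(loss):
        stopped_at = epoch
        break
```

With `patience=2`, the loop stops at epoch index 3 because epochs 2 and 3 both fail to beat the best loss of 0.70. In practice, the checkpoint saved at the best epoch is the one to keep.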
### Evaluation
- Evaluate on independent test set
- Report multiple metrics (accuracy, precision, recall, F1)
- Analyze confusion matrix
- Check for bias
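
As a reference for how these metrics relate to the confusion matrix, here is a small dependency-free sketch. The function names are hypothetical, and it assumes `confusion[i][j]` counts samples of true class `i` predicted as class `j`. A library such as scikit-learn would normally be used instead.

```python
def accuracy(confusion: list[list[int]]) -> float:
    """Fraction of all samples that lie on the diagonal."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[k][k] for k in range(len(confusion)))
    return correct / total if total else 0.0


def per_class_metrics(confusion: list[list[int]]) -> list[dict[str, float]]:
    """Compute precision, recall and F1 for each class."""
    n = len(confusion)
    metrics = []
    for k in range(n):
        tp = confusion[k][k]
        fn = sum(confusion[k]) - tp                       # rest of row k
        fp = sum(confusion[i][k] for i in range(n)) - tp  # rest of column k
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics.append({"precision": precision, "recall": recall, "f1": f1})
    return metrics


# Illustrative two-class matrix, not real project results.
example = [[8, 2],
           [1, 9]]
example_accuracy = accuracy(example)
example_metrics = per_class_metrics(example)
```

Comparing per-class recall across classes is one simple way to spot bias: a class whose recall is far below the others is being systematically missed.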
### Versioning
- Version models with a timestamp
- Track hyperparameters
- Save class mappings
- Document training data
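
One way to cover all four points is to write a small JSON file next to each checkpoint. The sketch below is an assumption about how this could be done, not the project's actual format: `save_model_metadata`, the file naming scheme, and the field names are all hypothetical.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def save_model_metadata(
    out_dir: str,
    hyperparameters: dict,
    class_to_idx: dict[str, int],
    data_description: str,
    now: Optional[datetime] = None,
) -> Path:
    """Write a timestamped metadata file and return its path."""
    now = now or datetime.now(timezone.utc)
    version = now.strftime("%Y%m%dT%H%M%SZ")  # sortable UTC timestamp
    metadata = {
        "version": version,
        "hyperparameters": hyperparameters,
        "class_to_idx": class_to_idx,
        "training_data": data_description,
    }
    directory = Path(out_dir)
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / f"model_{version}.json"
    path.write_text(json.dumps(metadata, indent=2))
    return path
```

Storing the class mapping with the model matters at inference time: a predicted index is only meaningful if it can be mapped back to the label it stood for during training.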
## Git Workflow
### Branching Strategy
- `master`: Production-ready code
- `1-phenology-classifier`: Feature branch
- Feature branches for new capabilities
### Commit Messages
Follow the Conventional Commits format (`type: description`):
```
feat: add confusion matrix visualization
fix: correct data loader split logic
docs: update README with API examples
test: add unit tests for inference
```
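
A commit subject line can be checked against this format with a short regular expression. This is an illustrative sketch only: `ALLOWED_TYPES` contains just the four types shown in the examples above (Conventional Commits permits others), and `is_valid_commit_subject` is a hypothetical helper, not an existing hook in this repository.

```python
import re

# Only the types that appear in the examples above; extend as needed.
ALLOWED_TYPES = ("feat", "fix", "docs", "test")

# type, optional "(scope)", optional "!", then ": " and a non-empty subject.
_SUBJECT_PATTERN = re.compile(
    r"^(?P<type>[a-z]+)(\((?P<scope>[^)]+)\))?!?: (?P<subject>\S.*)$"
)


def is_valid_commit_subject(subject: str) -> bool:
    """Return True if the first line of a commit message is well formed."""
    match = _SUBJECT_PATTERN.match(subject)
    return match is not None and match.group("type") in ALLOWED_TYPES
```

For example, `is_valid_commit_subject("fix: correct data loader split logic")` is accepted, while a subject without a type prefix is rejected.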
## Performance Optimization
### Training
- Use mixed precision training
- Optimize data loading (`num_workers`)
- Use GPU if available
- Batch size tuning
### Inference
- Model quantization
- Batch predictions
- Cache loaded models
- Optimize image preprocessing
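
Two of these points, caching loaded models and batching predictions, can be sketched with the standard library alone. Everything here is hypothetical: `_load_from_disk` stands in for an expensive call such as `torch.load`, and `get_model`, `batches`, and `load_calls` are illustrative names.

```python
from functools import lru_cache
from typing import Iterator, Sequence

load_calls: list[str] = []  # records every real load, for illustration only


def _load_from_disk(path: str) -> dict:
    """Hypothetical stand-in for an expensive model load."""
    load_calls.append(path)
    return {"path": path}


@lru_cache(maxsize=2)
def get_model(path: str) -> dict:
    """Return the model for `path`, loading it only on the first request."""
    return _load_from_disk(path)


def batches(items: Sequence, batch_size: int) -> Iterator[Sequence]:
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]
```

Repeated calls to `get_model` with the same path return the same object without touching the disk again, and `batches` lets an inference loop process several images per forward pass instead of one at a time. The small `maxsize` keeps only a couple of models in memory at once.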
## Troubleshooting
### Common Issues
**CUDA out of memory:**
- Reduce batch size
- Use gradient accumulation
- Release cached GPU memory with `torch.cuda.empty_cache()` (this does not free tensors that are still referenced)
**Slow data loading:**
- Increase `num_workers`
- Use SSD for dataset
- Preprocess images offline
**Poor accuracy:**
- Check data quality
- Increase training epochs
- Try different learning rates
- Use data augmentation
## Documentation
### Code Documentation
- Docstrings for all public APIs
- Inline comments for complex logic
- Type hints throughout
### Project Documentation
- Update README for new features
- Document API changes
- Maintain changelog
## Release Process
1. Update version number
2. Run full test suite
3. Build documentation
4. Create release notes
5. Tag release in git
6. Deploy to production
## Contact
For questions or issues, refer to the project specifications in `specs/1-phenology-classifier/`.