# ResNet Phenology Classifier - Development Guide

## Development Setup

### Prerequisites
- Python 3.11+
- CUDA-capable GPU (recommended)
- 8GB+ RAM
- Git

### Environment Setup

1. Clone the repository:
```bash
git clone <repository_url>
cd resnet
```

2. Create and activate a virtual environment:
```bash
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate
```

3. Install dependencies:
```bash
pip install -r requirements.txt
```

4. Install development dependencies:
```bash
pip install black flake8 mypy pylint pytest-cov
```

## Code Quality Standards

### PEP 8 Compliance
All code must follow PEP 8. Check compliance with flake8:
```bash
flake8 src/ tests/
```

### Type Hints
Use type hints for all functions:
```python
def train_model(epochs: int, lr: float) -> dict:
    ...
```

### Docstrings
All modules, classes, and functions must have docstrings:
```python
def function(arg: str) -> int:
    """
    Brief description.

    Args:
        arg: Description

    Returns:
        Description
    """
    pass
```

## Testing

### Running Tests
```bash
# All tests
pytest tests/ -v

# With coverage
pytest tests/ --cov=src --cov-report=html

# Specific markers
pytest tests/ -m unit
pytest tests/ -m integration
pytest tests/ -m slow
```

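The `-m unit`, `-m integration`, and `-m slow` selections rely on custom markers. Pytest warns about markers it does not know (and rejects them under `--strict-markers`), so they are usually registered once. The following is a minimal sketch of a `conftest.py` hook that does this; the marker descriptions are assumptions written for illustration and are not taken from the project.

```python
# conftest.py (sketch): register the custom markers used with `pytest -m`.
# The descriptions below are illustrative assumptions.
MARKERS = {
    "unit": "fast, isolated tests of a single function or class",
    "integration": "tests that exercise several components together",
    "slow": "long-running tests, for example those that train a model",
}


def pytest_configure(config):
    """Pytest hook: declare each marker so `-m <name>` selects it without warnings."""
    for name, description in MARKERS.items():
        config.addinivalue_line("markers", f"{name}: {description}")
```

A test is then tagged with a decorator such as `@pytest.mark.slow`, and `pytest tests/ -m "not slow"` skips it during quick local runs.
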
### Writing Tests
- Unit tests for all utility functions
- Integration tests for data pipelines
- Model validation tests
- Use fixtures for common setup

### Test Coverage
- Minimum 80% code coverage
- 100% coverage for critical paths

## Continuous Integration

### Pre-commit Checks
Before committing:
1. Run linter: `flake8 src/ tests/`
2. Run type checker: `mypy src/`
3. Run tests: `pytest tests/ -v`
4. Check formatting: `black --check src/ tests/`

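The four checks can be chained so that none is forgotten. The sketch below runs the same commands in order and stops at the first failure. The names `CHECKS`, `run_checks`, and the injectable `runner` parameter are hypothetical and exist only to keep the example testable; the project may already use a different mechanism.

```python
"""Run the pre-commit checks in order and stop at the first failure (sketch)."""
import subprocess
from typing import Callable, Sequence

# The same commands as in the list above.
CHECKS: list[list[str]] = [
    ["flake8", "src/", "tests/"],
    ["mypy", "src/"],
    ["pytest", "tests/", "-v"],
    ["black", "--check", "src/", "tests/"],
]


def _run(command: Sequence[str]) -> int:
    """Run one command and return its exit code."""
    return subprocess.run(command, check=False).returncode


def run_checks(
    checks: Sequence[Sequence[str]] = CHECKS,
    runner: Callable[[Sequence[str]], int] = _run,
) -> int:
    """Return 0 if every check passes, otherwise the first non-zero exit code."""
    for command in checks:
        print("$ " + " ".join(command))
        code = runner(command)
        if code != 0:
            return code
    return 0
```

Calling `sys.exit(run_checks())` from a small script makes the result usable as a Git hook, since a non-zero exit code aborts the commit.
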
### CI Pipeline
The CI/CD pipeline runs:
1. Linting (flake8, pylint)
2. Type checking (mypy)
3. Unit tests
4. Integration tests
5. Coverage report

## Model Development

### Training Best Practices
1. Always set a random seed
2. Use a validation set for hyperparameter tuning
3. Save checkpoints regularly
4. Monitor training metrics
5. Use early stopping

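Two of these practices, seeding and early stopping, can be sketched in a few lines. The helper names `set_seed` and `EarlyStopping`, and the parameters `patience` and `min_delta`, are illustrative assumptions rather than the project's actual API. NumPy and PyTorch are seeded only if they are installed.

```python
import random


def set_seed(seed: int = 42) -> None:
    """Seed every random number generator that is available."""
    random.seed(seed)
    try:
        import numpy as np

        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch

        torch.manual_seed(seed)  # seeds the CPU and CUDA generators
    except ImportError:
        pass


class EarlyStopping:
    """Signal a stop when validation loss has not improved for `patience` epochs."""

    def __init__(self, patience: int = 5, min_delta: float = 0.0) -> None:
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss: float) -> bool:
        """Record one epoch's validation loss; return True if training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            # Improvement: a natural moment to save a checkpoint as well.
            self.best_loss = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience
```

In a training loop, `set_seed` would be called once before any data or model is created, and `stopper.step(val_loss)` once per epoch after validation.
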
### Evaluation
- Evaluate on an independent test set
- Report multiple metrics (accuracy, recall, F1)
- Analyze the confusion matrix
- Check for bias

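Accuracy alone can hide a class that the model rarely predicts correctly, which is why per-class metrics are worth reporting. The sketch below derives accuracy, recall, and F1 directly from a confusion matrix. Precision is computed too because F1 is defined from it. The function names and the row/column convention (rows are true classes, columns are predictions) are assumptions of this example.

```python
def accuracy(confusion: list[list[int]]) -> float:
    """Fraction of all samples that lie on the diagonal."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[k][k] for k in range(len(confusion)))
    return correct / total if total else 0.0


def per_class_metrics(confusion: list[list[int]]) -> dict[int, dict[str, float]]:
    """Compute precision, recall and F1 for each class index.

    confusion[i][j] counts samples whose true class is i and predicted class is j.
    """
    n = len(confusion)
    metrics: dict[int, dict[str, float]] = {}
    for k in range(n):
        tp = confusion[k][k]
        fn = sum(confusion[k]) - tp  # true class k, predicted as something else
        fp = sum(confusion[i][k] for i in range(n)) - tp  # predicted k, but wrong
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics[k] = {"precision": precision, "recall": recall, "f1": f1}
    return metrics
```

For the two-class matrix `[[8, 2], [1, 9]]`, accuracy is 17/20, while recall is 0.8 for class 0 and 0.9 for class 1, a difference the single accuracy figure does not show.
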
### Versioning
- Version models with a timestamp
- Track hyperparameters
- Save class mappings
- Document training data

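One way to cover all four points is to write a small metadata file alongside each checkpoint. This is a sketch under stated assumptions: the function name `save_model_metadata`, the JSON layout, and the `model_<timestamp>.json` naming scheme are hypothetical and not the project's established format.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def save_model_metadata(
    output_dir: Path,
    hyperparameters: dict,
    class_to_idx: dict[str, int],
    training_data: str,
) -> Path:
    """Write a timestamped JSON record for a trained model and return its path."""
    # A UTC timestamp serves as the version identifier.
    version = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    metadata = {
        "version": version,
        "hyperparameters": hyperparameters,
        "class_to_idx": class_to_idx,
        "training_data": training_data,  # free-text description of the dataset
    }
    output_dir.mkdir(parents=True, exist_ok=True)
    path = output_dir / f"model_{version}.json"
    path.write_text(json.dumps(metadata, indent=2, sort_keys=True), encoding="utf-8")
    return path
```

Storing the class mapping with the model matters at inference time, because the index-to-label order must match the one used during training.
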
## Git Workflow

### Branching Strategy
- `master`: Production-ready code
- `1-phenology-classifier`: Feature branch
- Feature branches for new capabilities

### Commit Messages
Follow conventional commits:
```
feat: add confusion matrix visualization
fix: correct data loader split logic
docs: update README with API examples
test: add unit tests for inference
```

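The format can be checked mechanically, for example in a commit hook. The sketch below validates only the subject line. The list of allowed types extends the four shown above with `refactor` and `chore`, which is an assumption, and the names `COMMIT_TYPES`, `COMMIT_PATTERN`, and `is_conventional` are hypothetical.

```python
import re

# Types from the examples above, plus two common ones (assumption).
COMMIT_TYPES = ("feat", "fix", "docs", "test", "refactor", "chore")

# Shape: <type>[(scope)][!]: <description>
COMMIT_PATTERN = re.compile(
    r"^(?P<type>" + "|".join(COMMIT_TYPES) + r")"
    r"(\((?P<scope>[\w\-]+)\))?"
    r"!?: "
    r"(?P<description>\S.*)$"
)


def is_conventional(subject: str) -> bool:
    """Return True if a commit subject line matches the pattern above."""
    return COMMIT_PATTERN.match(subject) is not None
```

With this pattern, `fix(data): correct data loader split logic` is accepted, while `feat:missing space` is rejected because the colon is not followed by a space.
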
## Performance Optimization

### Training
- Use mixed precision training
- Optimize data loading (`num_workers`)
- Use a GPU if available
- Tune the batch size

### Inference
- Quantize the model
- Batch predictions
- Cache loaded models
- Optimize image preprocessing

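Caching and batching are both simple to sketch with the standard library. In the example below, `_load_from_disk` is a hypothetical stand-in for the real, expensive model loading code, and `get_model` and `batched` are illustrative names rather than existing project functions.

```python
from collections.abc import Iterator, Sequence
from functools import lru_cache


def _load_from_disk(path: str) -> dict:
    """Hypothetical stand-in for the real, expensive model loading code."""
    return {"path": path}


@lru_cache(maxsize=2)
def get_model(path: str) -> dict:
    """Load a model once per path; repeated calls return the cached object."""
    return _load_from_disk(path)


def batched(items: Sequence, batch_size: int) -> Iterator[Sequence]:
    """Yield consecutive slices of at most `batch_size` items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]
```

The `maxsize` bound keeps only the most recently used models in memory. Each slice from `batched` would be passed to the model in a single forward call instead of predicting image by image.
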
## Troubleshooting

### Common Issues

**CUDA out of memory:**
- Reduce batch size
- Use gradient accumulation
- Clear cache: `torch.cuda.empty_cache()`

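Gradient accumulation helps because only one micro-batch is held in memory at a time, while the resulting update still matches a larger batch. The pure-Python sketch below checks that arithmetic for a one-parameter model with a mean squared error loss. The functions `mse_grad` and `accumulated_grad` are illustrative and assume equally sized micro-batches. In PyTorch the same idea usually means dividing each micro-batch loss by the number of accumulation steps, calling `backward()` every time, and calling `optimizer.step()` once per group.

```python
def mse_grad(w: float, xs: list[float], ys: list[float]) -> float:
    """Gradient of mean((w * x - y) ** 2) with respect to w."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def accumulated_grad(
    w: float, xs: list[float], ys: list[float], micro_batch_size: int
) -> float:
    """Accumulate gradients over equally sized micro-batches."""
    if len(xs) % micro_batch_size != 0:
        raise ValueError("data must split into equally sized micro-batches")
    steps = len(xs) // micro_batch_size
    total = 0.0
    for i in range(steps):
        lo = i * micro_batch_size
        hi = lo + micro_batch_size
        # Dividing by the number of steps keeps the sum equal to the full-batch mean.
        total += mse_grad(w, xs[lo:hi], ys[lo:hi]) / steps
    return total
```

With `xs = [1, 2, 3, 4]`, `ys = [2, 4, 6, 8]`, and `w = 1.5`, both the full-batch gradient and the gradient accumulated over two micro-batches of size 2 equal -7.5.
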
**Slow data loading:**
- Increase `num_workers`
- Use an SSD for the dataset
- Preprocess images offline

**Poor accuracy:**
- Check data quality
- Increase training epochs
- Try different learning rates
- Use data augmentation

## Documentation

### Code Documentation
- Docstrings for all public APIs
- Inline comments for complex logic
- Type hints throughout

### Project Documentation
- Update the README for new features
- Document API changes
- Maintain a changelog

## Release Process

1. Update the version number
2. Run the full test suite
3. Build the documentation
4. Create release notes
5. Tag the release in Git
6. Deploy to production

## Contact

For questions or issues, refer to the project specifications in `specs/1-phenology-classifier/`.