# ResNet Phenology Classifier - Development Guide

## Development Setup

### Prerequisites

- Python 3.11+
- CUDA-capable GPU (recommended)
- 8GB+ RAM
- Git

### Environment Setup

1. Clone the repository:

   ```bash
   git clone <repository-url>
   cd resnet
   ```

2. Create a virtual environment:

   ```bash
   python -m venv venv
   source venv/bin/activate  # Windows: venv\Scripts\activate
   ```

3. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

4. Install development dependencies:

   ```bash
   pip install black flake8 mypy pylint pytest-cov
   ```

## Code Quality Standards

### PEP 8 Compliance

All code must follow PEP 8:

```bash
flake8 src/ tests/
```

### Type Hints

Use type hints for all functions:

```python
def train_model(epochs: int, lr: float) -> dict:
    ...
```

### Docstrings

All modules, classes, and functions must have docstrings:

```python
def function(arg: str) -> int:
    """
    Brief description.

    Args:
        arg: Description

    Returns:
        Description
    """
    pass
```

## Testing

### Running Tests

```bash
# All tests
pytest tests/ -v

# With coverage
pytest tests/ --cov=src --cov-report=html

# Specific markers
pytest tests/ -m unit
pytest tests/ -m integration
pytest tests/ -m slow
```

### Writing Tests

- Unit tests for all utility functions
- Integration tests for data pipelines
- Model validation tests
- Use fixtures for common setup

### Test Coverage

- Minimum 80% code coverage
- 100% coverage for critical paths

## Continuous Integration

### Pre-commit Checks

Before committing:

1. Run the linter: `flake8 src/ tests/`
2. Run the type checker: `mypy src/`
3. Run the tests: `pytest tests/ -v`
4. Check formatting: `black --check src/ tests/`

### CI Pipeline

The CI/CD pipeline runs:

1. Linting (flake8, pylint)
2. Type checking (mypy)
3. Unit tests
4. Integration tests
5. Coverage report

## Model Development

### Training Best Practices

1. Always set a random seed
2. Use the validation set for hyperparameter tuning
3. Save checkpoints regularly
4. Monitor training metrics
5. Use early stopping

### Evaluation

- Evaluate on an independent test set
- Report multiple metrics (accuracy, recall, F1)
- Analyze the confusion matrix
- Check for bias

### Versioning

- Version models with a timestamp
- Track hyperparameters
- Save class mappings
- Document training data

## Git Workflow

### Branching Strategy

- `master`: Production-ready code
- `1-phenology-classifier`: Feature branch
- Feature branches for new capabilities

### Commit Messages

Follow Conventional Commits:

```
feat: add confusion matrix visualization
fix: correct data loader split logic
docs: update README with API examples
test: add unit tests for inference
```

## Performance Optimization

### Training

- Use mixed precision training
- Optimize data loading (`num_workers`)
- Use a GPU if available
- Tune batch size

### Inference

- Model quantization
- Batch predictions
- Cache loaded models
- Optimize image preprocessing

## Troubleshooting

### Common Issues

**CUDA out of memory:**

- Reduce batch size
- Use gradient accumulation
- Release cached GPU memory: `torch.cuda.empty_cache()`

**Slow data loading:**

- Increase `num_workers`
- Store the dataset on an SSD
- Preprocess images offline

**Poor accuracy:**

- Check data quality
- Increase training epochs
- Try different learning rates
- Use data augmentation

## Documentation

### Code Documentation

- Docstrings for all public APIs
- Inline comments for complex logic
- Type hints throughout

### Project Documentation

- Update the README for new features
- Document API changes
- Maintain a changelog

## Release Process

1. Update the version number
2. Run the full test suite
3. Build documentation
4. Create release notes
5. Tag the release in git
6. Deploy to production

## Contact

For questions or issues, refer to the project specifications in `specs/1-phenology-classifier/`.
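
## Appendix: Early Stopping Sketch

The "Use early stopping" and "Save checkpoints regularly" items under Training Best Practices can be illustrated with a small, framework-agnostic tracker. This is a minimal sketch under stated assumptions, not the project's actual implementation: the `EarlyStopping` class, its `patience` and `min_delta` parameters, and its `step` method are hypothetical names, and the sketch assumes that a lower validation loss is better.

```python
import math
from dataclasses import dataclass


@dataclass
class EarlyStopping:
    """Hypothetical helper: stop when validation loss stops improving.

    Assumes lower validation loss is better; the project may use a
    different mechanism (e.g. a callback from a training framework).
    """

    patience: int = 5          # epochs to wait after the last improvement
    min_delta: float = 0.0     # minimum decrease in loss that counts as improvement
    best_loss: float = math.inf
    best_epoch: int = -1
    epochs_without_improvement: int = 0

    def step(self, epoch: int, val_loss: float) -> bool:
        """Record one epoch's validation loss; return True if training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            # Improvement: remember it (a natural point to save a checkpoint).
            self.best_loss = val_loss
            self.best_epoch = epoch
            self.epochs_without_improvement = 0
        else:
            # No meaningful improvement this epoch.
            self.epochs_without_improvement += 1
        return self.epochs_without_improvement >= self.patience


if __name__ == "__main__":
    stopper = EarlyStopping(patience=2)
    losses = [0.90, 0.70, 0.65, 0.66, 0.67, 0.60]  # illustrative values only
    for epoch, loss in enumerate(losses):
        if stopper.step(epoch, loss):
            print(f"Stopping at epoch {epoch}; best epoch was {stopper.best_epoch}")
            break
```

In a real training loop, `step` would be called once per epoch after validation. With `patience=2`, the demo stops at epoch 4, because epochs 3 and 4 do not improve on the best loss recorded at epoch 2. Keeping the tracker independent of PyTorch makes it easy to unit-test in isolation.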