
# ResNet Phenology Classifier - Development Guide

## Development Setup

### Prerequisites
- Python 3.11+
- CUDA-capable GPU (recommended)
- 8 GB+ RAM
- Git

### Environment Setup

1. Clone the repository:
```bash
git clone <repository_url>
cd resnet
```

2. Create a virtual environment:
```bash
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate
```

3. Install dependencies:
```bash
pip install -r requirements.txt
```

4. Install development dependencies:
```bash
pip install black flake8 mypy pylint pytest-cov
```

## Code Quality Standards

### PEP 8 Compliance
All code must follow PEP 8 standards:
```bash
flake8 src/ tests/
```

### Type Hints
Use type hints for all functions:
```python
def train_model(epochs: int, lr: float) -> dict:
    ...
```

### Docstrings
All modules, classes, and functions must have docstrings:
```python
def count_words(text: str) -> int:
    """
    Count the whitespace-separated words in a string.

    Args:
        text: The string to analyze.

    Returns:
        The number of words in `text`.
    """
    return len(text.split())
```

## Testing

### Running Tests
```bash
# All tests
pytest tests/ -v

# With coverage
pytest tests/ --cov=src --cov-report=html

# Specific markers
pytest tests/ -m unit
pytest tests/ -m integration
pytest tests/ -m slow
```

### Writing Tests
- Unit tests for all utility functions
- Integration tests for data pipelines
- Model validation tests
- Use fixtures for common setup
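
As a minimal sketch of these points, the test module below uses a fixture for shared setup and the `unit` marker from the Running Tests section. The helper `split_indices` and the fixture `dataset_size` are hypothetical names chosen for illustration, not functions from this repository. Custom markers such as `unit` typically also need to be registered in the pytest configuration to avoid warnings.

```python
import random

import pytest


def split_indices(n: int, val_fraction: float, seed: int = 0) -> tuple[list[int], list[int]]:
    """Hypothetical utility: shuffle indices 0..n-1 and split them into train/val."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # local RNG keeps the split reproducible
    n_val = int(n * val_fraction)
    return indices[n_val:], indices[:n_val]


@pytest.fixture
def dataset_size() -> int:
    """Shared setup: size of a small synthetic dataset."""
    return 100


@pytest.mark.unit
def test_split_is_disjoint_and_complete(dataset_size: int) -> None:
    train, val = split_indices(dataset_size, val_fraction=0.2)
    assert len(val) == 20
    assert set(train).isdisjoint(val)
    assert sorted(train + val) == list(range(dataset_size))
```

Running `pytest tests/ -m unit` would then select this test, assuming the file lives under `tests/`.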

### Test Coverage
- Minimum 80% code coverage
- 100% coverage for critical paths

## Continuous Integration

### Pre-commit Checks
Before committing:
1. Run linter: `flake8 src/ tests/`
2. Run type checker: `mypy src/`
3. Run tests: `pytest tests/ -v`
4. Check formatting: `black --check src/ tests/`
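
The four checks can also be chained in a small helper script so that they run in order and stop at the first failure. The following is a sketch only: the function name `run_checks` and the injectable `runner` parameter are assumptions for illustration, and the script is not part of this repository.

```python
import subprocess
from typing import Callable, Sequence

# The same commands as the checklist above, in the same order.
CHECKS: list[list[str]] = [
    ["flake8", "src/", "tests/"],
    ["mypy", "src/"],
    ["pytest", "tests/", "-v"],
    ["black", "--check", "src/", "tests/"],
]


def run_checks(
    checks: Sequence[Sequence[str]] = CHECKS,
    runner: Callable[[Sequence[str]], int] = lambda cmd: subprocess.run(cmd).returncode,
) -> int:
    """Run each check in order; return the first non-zero exit code, else 0."""
    for cmd in checks:
        print("Running:", " ".join(cmd))
        code = runner(cmd)
        if code != 0:
            return code  # stop early so the failing tool's output is the last thing shown
    return 0
```

Calling `sys.exit(run_checks())` from a script entry point would propagate the failing tool's exit code to the shell.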

### CI Pipeline
The CI/CD pipeline runs:
1. Linting (flake8, pylint)
2. Type checking (mypy)
3. Unit tests
4. Integration tests
5. Coverage report

## Model Development

### Training Best Practices
1. Always set random seed
2. Use validation set for hyperparameter tuning
3. Save checkpoints regularly
4. Monitor training metrics
5. Use early stopping
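
Two of these practices, seeding and early stopping, can be sketched without committing to a particular training loop. The names `set_seed` and `EarlyStopping` below are illustrative assumptions, not existing project code; NumPy and PyTorch are seeded only if they happen to be installed.

```python
import random


def set_seed(seed: int) -> None:
    """Seed Python's RNG, plus NumPy and PyTorch when they are installed."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch
        torch.manual_seed(seed)
    except ImportError:
        pass


class EarlyStopping:
    """Signal a stop when validation loss has not improved for `patience` epochs."""

    def __init__(self, patience: int = 5, min_delta: float = 0.0) -> None:
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss: float) -> bool:
        """Record one epoch's validation loss; return True if training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience
```

In a training loop, `stopper.step(val_loss)` would be called once per epoch after validation, and a checkpoint saved whenever `best_loss` improves.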

### Evaluation
- Evaluate on an independent test set
- Report multiple metrics (accuracy, precision, recall, F1)
- Analyze the confusion matrix
- Check for bias, such as consistently weaker results on particular classes
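
To make the relationship between the confusion matrix and the reported metrics concrete, here is a dependency-free sketch. The function names are assumptions for illustration; a real project might use an existing metrics library instead. Precision is included because F1 is defined from precision and recall.

```python
def confusion_matrix(y_true: list[int], y_pred: list[int], num_classes: int) -> list[list[int]]:
    """Build a matrix whose rows are true classes and columns are predicted classes."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def accuracy(matrix: list[list[int]]) -> float:
    """Fraction of samples on the diagonal (correct predictions)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[k][k] for k in range(len(matrix)))
    return correct / total if total else 0.0


def per_class_metrics(matrix: list[list[int]]) -> list[dict[str, float]]:
    """Compute precision, recall, and F1 for each class."""
    metrics = []
    for k in range(len(matrix)):
        tp = matrix[k][k]
        predicted_k = sum(row[k] for row in matrix)  # column sum: TP + FP
        actual_k = sum(matrix[k])                    # row sum: TP + FN
        precision = tp / predicted_k if predicted_k else 0.0
        recall = tp / actual_k if actual_k else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics.append({"precision": precision, "recall": recall, "f1": f1})
    return metrics
```

Comparing per-class recall across classes is one simple way to spot a model that performs well overall but poorly on a specific class.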

### Versioning
- Version models with a timestamp
- Track hyperparameters
- Save class mappings
- Document training data
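
One possible way to capture all four items is a small metadata file written next to each saved model. This is a sketch under assumptions: the function name `save_model_metadata`, the `model_<timestamp>` directory layout, and the `metadata.json` filename are hypothetical and not an existing project convention.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def save_model_metadata(
    output_dir: str,
    hyperparameters: dict,
    class_to_idx: dict[str, int],
    training_data: str,
    timestamp: Optional[str] = None,
) -> Path:
    """Write a timestamped metadata.json describing one trained model."""
    if timestamp is None:
        # UTC timestamp that sorts lexicographically, e.g. 20240101T120000Z
        timestamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    run_dir = Path(output_dir) / f"model_{timestamp}"
    run_dir.mkdir(parents=True, exist_ok=True)
    metadata = {
        "timestamp": timestamp,
        "hyperparameters": hyperparameters,
        "class_to_idx": class_to_idx,
        "training_data": training_data,
    }
    metadata_path = run_dir / "metadata.json"
    metadata_path.write_text(json.dumps(metadata, indent=2), encoding="utf-8")
    return metadata_path
```

Storing the class mapping alongside the weights means inference code can translate predicted indices back to labels without relying on directory ordering.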

## Git Workflow

### Branching Strategy
- `master`: Production-ready code
- `1-phenology-classifier`: Feature branch for the phenology classifier
- Additional feature branches for new capabilities

### Commit Messages
Follow conventional commits:
```
feat: add confusion matrix visualization
fix: correct data loader split logic
docs: update README with API examples
test: add unit tests for inference
```
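
A subject line in this style can be checked mechanically. The sketch below is an assumption-laden illustration: `ALLOWED_TYPES` contains only the four types shown in the examples above, and `is_conventional` is a hypothetical helper, not a tool this project ships.

```python
import re

# Types taken from the examples above; extend this tuple to match the project.
ALLOWED_TYPES = ("feat", "fix", "docs", "test")

COMMIT_PATTERN = re.compile(
    r"^(?P<type>" + "|".join(ALLOWED_TYPES) + r")"
    r"(\((?P<scope>[\w-]+)\))?"  # optional scope, e.g. fix(loader)
    r"!?"                         # optional breaking-change marker
    r": (?P<description>\S.*)$"   # colon, one space, non-empty description
)


def is_conventional(subject: str) -> bool:
    """Return True if the commit subject line matches the pattern above."""
    return COMMIT_PATTERN.match(subject) is not None
```

For example, `is_conventional("fix: correct data loader split logic")` is accepted, while a subject without a type prefix is rejected.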

## Performance Optimization

### Training
- Use mixed-precision training
- Optimize data loading (tune `num_workers`)
- Use a GPU if available
- Tune the batch size

### Inference
- Quantize the model
- Batch predictions
- Cache loaded models
- Optimize image preprocessing
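
Two of these ideas, caching loaded models and batching predictions, can be sketched with the standard library alone. The names `load_model_from_disk`, `get_model`, and `batched` are hypothetical, and the loader is a stand-in that returns a plain dictionary instead of real weights.

```python
from functools import lru_cache
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")


def load_model_from_disk(checkpoint_path: str) -> dict:
    """Stand-in for an expensive load; a real project would deserialize weights here."""
    return {"checkpoint": checkpoint_path}


@lru_cache(maxsize=4)
def get_model(checkpoint_path: str) -> dict:
    """Return a cached model so repeated requests do not reload it from disk."""
    return load_model_from_disk(checkpoint_path)


def batched(items: Sequence[T], batch_size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most `batch_size` items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]
```

With this pattern, an inference service would call `get_model(path)` on every request and run the model once per batch from `batched(images, batch_size)` instead of once per image.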

## Troubleshooting

### Common Issues

**CUDA out of memory:**
- Reduce batch size
- Use gradient accumulation
- Release unused cached memory: `torch.cuda.empty_cache()` (this does not free memory held by live tensors)
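
Gradient accumulation works because averaging the gradients of several small micro-batches reproduces the gradient of one large batch, while only one micro-batch needs to be in memory at a time. The framework-free sketch below checks this for a one-parameter linear model with mean squared error; the function names are illustrative assumptions. In PyTorch the same idea is usually expressed by dividing each micro-batch loss by the number of accumulation steps before calling `backward()` and stepping the optimizer once per group.

```python
def mse_gradient(w: float, xs: list[float], ys: list[float]) -> float:
    """Gradient of mean((w * x - y) ** 2) with respect to w."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def accumulated_gradient(
    w: float, xs: list[float], ys: list[float], micro_batch_size: int
) -> float:
    """Accumulate micro-batch gradients, scaling each by 1 / num_micro_batches."""
    if len(xs) % micro_batch_size != 0:
        raise ValueError("batch size must be a multiple of micro_batch_size")
    num_micro_batches = len(xs) // micro_batch_size
    grad = 0.0
    for i in range(num_micro_batches):
        lo, hi = i * micro_batch_size, (i + 1) * micro_batch_size
        # Only this slice would need to be resident on the GPU at once.
        grad += mse_gradient(w, xs[lo:hi], ys[lo:hi]) / num_micro_batches
    return grad
```

The equality holds exactly only when all micro-batches have the same size, which is why the sketch rejects uneven splits.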

**Slow data loading:**
- Increase `num_workers`
- Store the dataset on an SSD
- Preprocess images offline

**Poor accuracy:**
- Check data quality
- Increase the number of training epochs
- Try different learning rates
- Use data augmentation

## Documentation

### Code Documentation
- Docstrings for all public APIs
- Inline comments for complex logic
- Type hints throughout

### Project Documentation
- Update README for new features
- Document API changes
- Maintain changelog

## Release Process

1. Update version number
2. Run full test suite
3. Build documentation
4. Create release notes
5. Tag release in git
6. Deploy to production

## Contact

For questions or issues, refer to the project specifications in `specs/1-phenology-classifier/`.