Phenology/Code/Supervised_learning/resnet/CONTRIBUTING.md
2025-11-06 14:16:49 +01:00

ResNet Phenology Classifier - Development Guide

Development Setup

Prerequisites

  • Python 3.11+
  • CUDA-capable GPU (recommended)
  • 8 GB+ RAM
  • Git

Environment Setup

  1. Clone the repository:
git clone <repository_url>
cd resnet
  2. Create a virtual environment:
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate
  3. Install dependencies:
pip install -r requirements.txt
  4. Install development dependencies:
pip install black flake8 mypy pylint pytest-cov

Code Quality Standards

PEP 8 Compliance

All code must follow PEP 8 standards:

flake8 src/ tests/

Type Hints

Use type hints for all functions:

def train_model(epochs: int, lr: float) -> dict[str, float]:
    ...

Docstrings

All modules, classes, and functions must have docstrings:

def function(arg: str) -> int:
    """
    Brief description.
    
    Args:
        arg: Description
        
    Returns:
        Description
    """
    pass

Testing

Running Tests

# All tests
pytest tests/ -v

# With coverage
pytest tests/ --cov=src --cov-report=html

# Specific markers
pytest tests/ -m unit
pytest tests/ -m integration
pytest tests/ -m slow
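
The `-m` selections above rely on the `unit`, `integration`, and `slow` markers being registered; unregistered markers trigger pytest's unknown-marker warning, or an error under `--strict-markers`. A minimal sketch of a `conftest.py` hook that registers them is shown below. The marker descriptions are assumptions for illustration, not taken from the project.

```python
# conftest.py (sketch): register the custom markers used with `pytest -m`.
# The descriptions below are illustrative assumptions.
MARKERS = {
    "unit": "fast, isolated tests of single functions",
    "integration": "tests that exercise the data pipeline end to end",
    "slow": "long-running tests, e.g. a short training run",
}


def pytest_configure(config):
    """Hook called by pytest at startup; registers each marker."""
    for name, description in MARKERS.items():
        config.addinivalue_line("markers", f"{name}: {description}")
```

A test then opts in with a decorator such as `@pytest.mark.unit`. The same registration could equally live in a `markers` entry of the pytest configuration file.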

Writing Tests

  • Unit tests for all utility functions
  • Integration tests for data pipelines
  • Model validation tests
  • Use fixtures for common setup

Test Coverage

  • Minimum 80% code coverage
  • 100% coverage for critical paths

Continuous Integration

Pre-commit Checks

Before committing:

  1. Run linter: flake8 src/ tests/
  2. Run type checker: mypy src/
  3. Run tests: pytest tests/ -v
  4. Check formatting: black --check src/ tests/
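
The four checks above can be chained in one helper so none is skipped. The following is a sketch only: the function name `run_checks` and the injectable `runner` parameter are hypothetical and not part of the project.

```python
import subprocess

# The four checks listed above, in the same order.
CHECKS = [
    ["flake8", "src/", "tests/"],
    ["mypy", "src/"],
    ["pytest", "tests/", "-v"],
    ["black", "--check", "src/", "tests/"],
]


def run_checks(checks=CHECKS, runner=subprocess.run):
    """Run every check in order and return the commands that failed.

    All checks run even if an earlier one fails, so a single pass
    reports every problem. `runner` is injectable for testing.
    """
    failed = []
    for cmd in checks:
        print("$", " ".join(cmd))
        result = runner(cmd)
        if result.returncode != 0:
            failed.append(cmd)
    return failed
```

A script could call `run_checks()` and exit non-zero when the returned list is non-empty. With the default runner, a tool that is not installed raises `FileNotFoundError` rather than being reported as a failed check.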

CI Pipeline

The CI/CD pipeline runs:

  1. Linting (flake8, pylint)
  2. Type checking (mypy)
  3. Unit tests
  4. Integration tests
  5. Coverage report

Model Development

Training Best Practices

  1. Always set random seed
  2. Use validation set for hyperparameter tuning
  3. Save checkpoints regularly
  4. Monitor training metrics
  5. Use early stopping
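
Two of the practices above, seeding and early stopping, can be sketched in a few lines. The names `set_seed` and `EarlyStopping` and the default `patience` value are assumptions for illustration; the project's own training code may differ.

```python
import random


def set_seed(seed: int) -> None:
    """Seed Python's RNG, plus NumPy and PyTorch when they are installed."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch
        torch.manual_seed(seed)
    except ImportError:
        pass


class EarlyStopping:
    """Signal a stop once validation loss fails to improve for `patience` epochs."""

    def __init__(self, patience: int = 5, min_delta: float = 0.0) -> None:
        self.patience = patience
        self.min_delta = min_delta  # smallest decrease that counts as improvement
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss: float) -> bool:
        """Record one epoch's validation loss; return True if training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience
```

In a training loop, `set_seed` would be called once before any data loading or model construction, and `stopper.step(val_loss)` once per epoch after validation. Seeding alone does not guarantee bit-identical GPU runs, since some CUDA operations are nondeterministic.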

Evaluation

  • Evaluate on independent test set
  • Report multiple metrics (accuracy, precision, recall, F1)
  • Analyze confusion matrix
  • Check for bias, such as large per-class performance gaps caused by class imbalance
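
To make the metrics concrete, the sketch below derives accuracy and per-class precision, recall, and F1 directly from a confusion matrix. It is dependency-free for illustration. The function names are hypothetical, and a library such as scikit-learn would normally be used instead.

```python
def accuracy(confusion: list[list[int]]) -> float:
    """Fraction of samples on the diagonal (correct predictions)."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[k][k] for k in range(len(confusion)))
    return correct / total if total else 0.0


def per_class_metrics(confusion: list[list[int]]) -> dict[int, dict[str, float]]:
    """Compute precision, recall and F1 for each class.

    confusion[i][j] counts samples with true class i predicted as class j.
    """
    metrics = {}
    for k in range(len(confusion)):
        tp = confusion[k][k]
        fn = sum(confusion[k]) - tp                 # true k, predicted otherwise
        fp = sum(row[k] for row in confusion) - tp  # predicted k, true otherwise
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        metrics[k] = {"precision": precision, "recall": recall, "f1": f1}
    return metrics
```

Comparing the per-class values side by side is one simple way to spot a class the model systematically under-predicts, which overall accuracy can hide.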

Versioning

  • Version models with timestamp
  • Track hyperparameters
  • Save class mappings
  • Document training data
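
One way to cover all four points is a JSON sidecar file written next to each checkpoint. The sketch below assumes a hypothetical `save_model_metadata` helper and file naming scheme; neither comes from the project.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def save_model_metadata(out_dir, hyperparameters, class_to_idx, data_note):
    """Write a timestamped JSON sidecar describing one trained model.

    Returns the path of the file that was written.
    """
    # A UTC timestamp doubles as the model version.
    timestamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    metadata = {
        "version": timestamp,
        "hyperparameters": hyperparameters,
        "class_to_idx": class_to_idx,
        "training_data": data_note,
    }
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"model_{timestamp}.json"
    path.write_text(json.dumps(metadata, indent=2, sort_keys=True))
    return path
```

Saving the class mapping alongside the weights matters at inference time, because a predicted index can only be turned back into a label with the exact mapping used during training.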

Git Workflow

Branching Strategy

  • master: Production-ready code
  • 1-phenology-classifier: Feature branch for the phenology classifier
  • Create a separate feature branch for each new capability

Commit Messages

Follow conventional commits:

feat: add confusion matrix visualization
fix: correct data loader split logic
docs: update README with API examples
test: add unit tests for inference
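
The subject-line format above can be checked mechanically, for example from a commit-msg hook. The sketch below is an assumption-laden illustration: the allowed type list extends the four types shown above with `refactor` and `chore`, and the helper name `is_conventional` is hypothetical.

```python
import re

# Types from the examples above, plus two common ones (an assumption).
ALLOWED_TYPES = ("feat", "fix", "docs", "test", "refactor", "chore")

# Shape: <type>[(scope)][!]: <description>
PATTERN = re.compile(
    r"^(?P<type>%s)(\((?P<scope>[^()\s]+)\))?!?: (?P<description>\S.*)$"
    % "|".join(ALLOWED_TYPES)
)


def is_conventional(subject: str) -> bool:
    """Return True if a commit subject line matches the convention."""
    return PATTERN.match(subject) is not None
```

For instance, `is_conventional("fix: correct data loader split logic")` passes, while a subject without a type prefix or without the space after the colon does not.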

Performance Optimization

Training

  • Use mixed precision training
  • Optimize data loading (num_workers)
  • Use GPU if available
  • Tune the batch size

Inference

  • Apply model quantization
  • Run predictions in batches
  • Cache loaded models
  • Optimize image preprocessing
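
Caching and batching can be sketched with the standard library alone. In the example below, `load_model_from_disk` is a hypothetical stand-in for the project's real loading code, and the cache size of 2 is an arbitrary assumption.

```python
from functools import lru_cache


def load_model_from_disk(checkpoint_path: str):
    """Hypothetical loader; stands in for the project's real loading code."""
    print(f"loading {checkpoint_path}")
    return {"checkpoint": checkpoint_path}  # placeholder for a model object


@lru_cache(maxsize=2)
def get_model(checkpoint_path: str):
    """Load a checkpoint once and reuse it while it stays in the cache."""
    return load_model_from_disk(checkpoint_path)


def batched(items, batch_size):
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]
```

Repeated calls to `get_model` with the same path return the same object without touching the disk, and `batched(paths, 32)` groups inputs so the model runs once per group instead of once per image.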

Troubleshooting

Common Issues

CUDA out of memory:

  • Reduce batch size
  • Use gradient accumulation
  • Release cached memory: torch.cuda.empty_cache() (frees unused cached blocks only; memory held by live tensors is not released)
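
Gradient accumulation keeps the effective batch size (batch size × accumulation steps) while lowering peak memory. The loop below is a sketch that assumes PyTorch-style objects (`loss.backward()`, `optimizer.step()`, `optimizer.zero_grad()`); the function name and signature are hypothetical.

```python
def train_with_accumulation(model, batches, loss_fn, optimizer, accum_steps):
    """Run one pass over `batches`, stepping the optimizer every `accum_steps` batches.

    Assumes PyTorch-style semantics: `loss.backward()` adds to the stored
    gradients, `optimizer.step()` applies them, and `optimizer.zero_grad()`
    resets them. Returns the number of optimizer steps taken.
    """
    steps = 0
    n_seen = 0
    optimizer.zero_grad()
    for n_seen, (inputs, targets) in enumerate(batches, start=1):
        # Scale the loss so the summed gradients match one large-batch average.
        loss = loss_fn(model(inputs), targets) / accum_steps
        loss.backward()
        if n_seen % accum_steps == 0:
            optimizer.step()
            optimizer.zero_grad()
            steps += 1
    if n_seen % accum_steps != 0:
        # Apply gradients left over from a final, incomplete group.
        optimizer.step()
        optimizer.zero_grad()
        steps += 1
    return steps
```

For example, a batch size of 8 with `accum_steps=4` behaves like a batch of 32 for the optimizer. The final incomplete group is still divided by `accum_steps`, so its update is slightly smaller than a true average; dropping that group is an equally reasonable choice.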

Slow data loading:

  • Increase num_workers
  • Use SSD for dataset
  • Preprocess images offline

Poor accuracy:

  • Check data quality
  • Increase training epochs
  • Try different learning rates
  • Use data augmentation

Documentation

Code Documentation

  • Docstrings for all public APIs
  • Inline comments for complex logic
  • Type hints throughout

Project Documentation

  • Update README for new features
  • Document API changes
  • Maintain changelog

Release Process

  1. Update version number
  2. Run full test suite
  3. Build documentation
  4. Create release notes
  5. Tag release in git
  6. Deploy to production

Contact

For questions or issues, refer to the project specifications in specs/1-phenology-classifier/.