Phenology/Code/Supervised_learning/resnet/README.md

# ResNet Phenology Classifier

A deep learning model for classifying plant images by phenological phase using ResNet50 architecture.

## Features

- **Training**: Train ResNet50 model on labeled plant images
- **Evaluation**: Comprehensive metrics including accuracy, recall, macro-F1, and confusion matrix
- **Inference**: Classify new images with visual output
- **API**: REST API for batch classification
- **Reproducibility**: Random seed management and versioning

## Dataset

The model uses the dataset located at:
```
C:\Users\sof12\Desktop\ML\Datasets\Nocciola\GBIF
```

Dataset split: 70% training, 15% validation, 15% testing

## Installation

1. Create a virtual environment:
```bash
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
```

2. Install dependencies:
```bash
pip install -r requirements.txt
```

## Usage

### Training

Train the model on your dataset:

```bash
python src/train.py \
    --data_dir C:\Users\sof12\Desktop\ML\Datasets\Nocciola\GBIF \
    --csv_file C:\Users\sof12\Desktop\ML\Datasets\Nocciola\GBIF\labels.csv \
    --output_dir models \
    --epochs 50 \
    --batch_size 32 \
    --lr 0.001
```

**Arguments:**
- `--data_dir`: Directory containing images
- `--csv_file`: Path to CSV file with columns: `image_path`, `phase`
- `--output_dir`: Directory to save trained models (default: `models`)
- `--epochs`: Number of training epochs (default: 50)
- `--batch_size`: Batch size (default: 32)
- `--lr`: Learning rate (default: 0.001)
- `--num_workers`: Data loader workers (default: 4)
- `--seed`: Random seed for reproducibility (default: 42)
- `--patience`: Early stopping patience (default: 10)

### Evaluation

Evaluate a trained model:

```bash
python src/evaluate.py \
    --model_path models/best_model.pth \
    --data_dir C:\Users\sof12\Desktop\ML\Datasets\Nocciola\GBIF \
    --csv_file C:\Users\sof12\Desktop\ML\Datasets\Nocciola\GBIF\labels.csv \
    --class_mapping models/class_mapping.json \
    --output_dir evaluation \
    --split test
```

**Arguments:**
- `--model_path`: Path to trained model checkpoint
- `--data_dir`: Directory containing images
- `--csv_file`: Path to CSV file with labels
- `--class_mapping`: Path to class mapping JSON file
- `--output_dir`: Directory to save evaluation results (default: `evaluation`)
- `--split`: Dataset split to evaluate (`val` or `test`, default: `test`)

**Output:**
- Accuracy, Recall (macro), F1 (macro)
- Per-class metrics
- Confusion matrix (normalized and raw)
- Classification report

### Inference

Classify a single image:

```bash
python src/inference.py \
    --image_path path/to/image.jpg \
    --model_path models/best_model.pth \
    --class_mapping models/class_mapping.json \
    --output_dir results
```

**Arguments:**
- `--image_path`: Path to image file
- `--model_path`: Path to trained model checkpoint
- `--class_mapping`: Path to class mapping JSON file
- `--output_dir`: Directory to save results (optional)
- `--no_visualize`: Skip creating visualization

**Output:**
- Predicted phenological phase
- Confidence score
- Probabilities for all classes
- Visual output showing prediction and probabilities

### API Server

Start the FastAPI server:

```bash
# Set environment variables
export MODEL_PATH=models/best_model.pth
export CLASS_MAPPING_PATH=models/class_mapping.json

# Start server
python -m uvicorn src.api:app --host 0.0.0.0 --port 8000 --reload
```

**Endpoints:**
- `GET /`: API information
- `GET /health`: Health check
- `GET /classes`: List available classes
- `POST /classify`: Classify single image
- `POST /classify/batch`: Classify multiple images (max 10)

**Example request:**
```bash
curl -X POST "http://localhost:8000/classify" \
    -H "Content-Type: multipart/form-data" \
    -F "file=@path/to/image.jpg"
```

## Testing

Run unit tests:

```bash
pytest tests/ -v
```

Run specific test categories:

```bash
pytest tests/ -v -m unit           # Unit tests only
pytest tests/ -v -m integration    # Integration tests only
```

## Project Structure

```
.
├── src/
│   ├── __init__.py
│   ├── data_loader.py      # Dataset and data loading
│   ├── model.py            # ResNet model definition
│   ├── train.py            # Training script
│   ├── evaluate.py         # Evaluation script
│   ├── inference.py        # Inference script
│   ├── api.py              # FastAPI application
│   └── utils.py            # Utility functions
├── tests/
│   ├── test_data_loader.py
│   ├── test_train.py
│   ├── test_evaluate.py
│   └── test_inference.py
├── data/                   # Dataset directory
├── models/                 # Saved models
├── specs/                  # Feature specifications
├── requirements.txt        # Python dependencies
├── pytest.ini             # Pytest configuration
└── README.md              # This file
```

## Model Architecture

- **Base**: ResNet50 pretrained on ImageNet
- **Classification Head**:
  - Dropout (0.5)
  - Linear (2048 → 512)
  - ReLU
  - Dropout (0.3)
  - Linear (512 → num_classes)

## Performance Requirements

- **Accuracy**: >90% on test set
- **Training Time**: <1 hour for standard dataset
- **Inference Time**: <1 second per image

## Data Format

The CSV file should have the following format:

```csv
image_path,phase
images/img001.jpg,vegetative
images/img002.jpg,flowering
images/img003.jpg,fruiting
```

## Reproducibility

The project ensures reproducibility through:
- Random seed management (default: 42)
- Deterministic training
- Model checkpointing
- Class mapping versioning

## License

This project is part of a supervised learning implementation for phenology classification.

## Citation

If you use this code, please cite the project specifications in `specs/1-phenology-classifier/`.