# TensorFlow Data Pipeline Hangs When Used with Ray or OpenDP

## Problem

When using TensorFlow's `tf.data.Dataset` (via HuggingFace's `to_tf_dataset()`) with p2pfl, the program hangs at:

```python
sample = next(iter(tf_dataset))
```

## Root Cause

**Import order conflict**: Ray or OpenDP is initialized before TensorFlow is imported, causing a threading deadlock.

1. Importing `p2pfl.management.logger` triggers `ray.init()` at module level
2. TensorFlow is imported afterwards
3. TensorFlow's data pipeline threads deadlock due to Ray's modified threading environment

```
p2pfl/management/logger/__init__.py:30 -> ray_installed() -> ray.init()
```

**This fails (hangs):**

```python
from p2pfl.management.logger import logger  # Ray initialized here
import tensorflow as tf  # Too late
from datasets import Dataset

dataset = Dataset.from_dict({"x": [[1]*784], "y": [0]})
tf_dataset = dataset.to_tf_dataset(batch_size=1, columns=["x"], label_cols=["y"])
next(iter(tf_dataset))  # Hangs forever
```

**This works:**

```python
import tensorflow as tf  # TensorFlow first
from p2pfl.management.logger import logger  # Ray after
from datasets import Dataset

dataset = Dataset.from_dict({"x": [[1]*784], "y": [0]})
tf_dataset = dataset.to_tf_dataset(batch_size=1, columns=["x"], label_cols=["y"])
next(iter(tf_dataset))  # Works
```

## Solutions

**Option 1: Import TensorFlow first (Quick Fix)**

Import TensorFlow before Ray or OpenDP:

```python
import tensorflow as tf  # FIRST
from p2pfl.management.logger import logger  # After TensorFlow
```

This is how p2pfl's test suite handles it in `test/conftest.py`:

```python
with contextlib.suppress(ImportError):
    import tensorflow
```

**Option 2: Don't install Ray**

Ray is an optional dependency. If you don't need distributed computing features:

```bash
pip install "p2pfl[tensorflow]"  # Without Ray
```

**Option 3: Disable Ray at runtime**

```python
from p2pfl.settings import Settings
Settings.general.DISABLE_RAY = True
```

## Environment

- TensorFlow 2.20.0
- Ray 2.53.0
- Python 3.12
- macOS (Darwin)

## Status

**Fixed on macOS** - p2pfl now uses a Ray worker setup hook to import TensorFlow before Ray workers start.

The fix is in `p2pfl/utils/check_ray.py`:

```python
def _worker_setup() -> None:
    """Import ML frameworks first in Ray workers to avoid deadlocks on macOS."""
    if sys.platform != "darwin":
        return
    import contextlib
    with contextlib.suppress(ImportError):
        import tensorflow
    with contextlib.suppress(ImportError):
        import torch

# In ray.init():
if sys.platform == "darwin":
    init_kwargs["runtime_env"] = {"worker_process_setup_hook": _worker_setup}
```

Related: [ray-project/ray#59661](https://github.com/ray-project/ray/issues/59661)