TensorFlow Data Pipeline Hangs When Used with Ray or OpenDP
Problem
When using TensorFlow’s tf.data.Dataset (via HuggingFace’s to_tf_dataset()) with p2pfl, the program hangs at:
sample = next(iter(tf_dataset))
Root Cause
Import order conflict: Ray or OpenDP is initialized before TensorFlow is imported, causing a threading deadlock.
1. Importing p2pfl.management.logger triggers ray.init() at module level
2. TensorFlow is imported afterwards
3. TensorFlow’s data pipeline threads deadlock due to Ray’s modified threading environment
Call chain: p2pfl/management/logger/__init__.py:30 -> ray_installed() -> ray.init()
This fails (hangs):
from p2pfl.management.logger import logger # Ray initialized here
import tensorflow as tf # Too late
from datasets import Dataset
dataset = Dataset.from_dict({"x": [[1]*784], "y": [0]})
tf_dataset = dataset.to_tf_dataset(batch_size=1, columns=["x"], label_cols=["y"])
next(iter(tf_dataset)) # Hangs forever
This works:
import tensorflow as tf # TensorFlow first
from p2pfl.management.logger import logger # Ray after
from datasets import Dataset
dataset = Dataset.from_dict({"x": [[1]*784], "y": [0]})
tf_dataset = dataset.to_tf_dataset(batch_size=1, columns=["x"], label_cols=["y"])
next(iter(tf_dataset)) # Works
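To check whether a given import order hangs without blocking your terminal, one option is to run each variant in a child process with a timeout. The following is a minimal sketch, not part of p2pfl: the helper name `run_with_timeout`, the constant `BAD_ORDER`, and the 60-second default are assumptions chosen for illustration. It uses only the standard library; the script string repeats the failing example above.

```python
import subprocess
import sys

# The failing import order from above, as a script string (illustrative constant).
BAD_ORDER = """
from p2pfl.management.logger import logger  # Ray initialized here
import tensorflow as tf  # Too late
from datasets import Dataset
dataset = Dataset.from_dict({"x": [[1]*784], "y": [0]})
tf_dataset = dataset.to_tf_dataset(batch_size=1, columns=["x"], label_cols=["y"])
next(iter(tf_dataset))
"""


def run_with_timeout(script: str, timeout_s: float = 60.0) -> str:
    """Hypothetical helper: run `script` in a fresh interpreter.

    Returns "ok" if it exits with code 0, "error" if it exits with a
    non-zero code, and "hang" if it is still running after `timeout_s`.
    """
    try:
        completed = subprocess.run(
            [sys.executable, "-c", script],
            timeout=timeout_s,
            check=False,
            # Discard output so no pipe is kept open by helper processes.
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the child before re-raising the timeout.
        return "hang"
    return "ok" if completed.returncode == 0 else "error"
```

Calling `run_with_timeout(BAD_ORDER)` would be expected to return "hang" in an affected environment. A result of "error" usually means a missing package rather than a deadlock, so it is worth re-running the script directly to see the traceback.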
Solutions
Option 1: Import TensorFlow first (Quick Fix)
Import TensorFlow before Ray or OpenDP:
import tensorflow as tf # FIRST
from p2pfl.management.logger import logger # After TensorFlow
This is how p2pfl’s test suite handles it in test/conftest.py:
import contextlib

with contextlib.suppress(ImportError):
    import tensorflow
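As a safety net for Option 1, an entry script can check at runtime whether Ray is already initialized while TensorFlow has not yet been imported. The sketch below is an illustration under assumptions: the helper names `ray_initialized_before_tensorflow` and `warn_on_bad_import_order` are hypothetical and not part of p2pfl. It relies only on `sys.modules` and Ray's `ray.is_initialized()`, and it never imports Ray or TensorFlow itself.

```python
import sys
import warnings
from collections.abc import Mapping
from typing import Any, Optional


def ray_initialized_before_tensorflow(
    modules: Optional[Mapping[str, Any]] = None,
) -> bool:
    """Hypothetical helper: True if Ray is initialized but TensorFlow is not yet imported."""
    modules = sys.modules if modules is None else modules
    ray = modules.get("ray")
    if ray is None:
        return False  # Ray was never imported in this process.
    is_initialized = getattr(ray, "is_initialized", None)
    ray_running = bool(is_initialized()) if callable(is_initialized) else False
    return ray_running and "tensorflow" not in modules


def warn_on_bad_import_order() -> None:
    """Emit a warning instead of letting the data pipeline hang silently later."""
    if ray_initialized_before_tensorflow():
        warnings.warn(
            "Ray is initialized but TensorFlow is not imported yet; "
            "import tensorflow before p2pfl.management.logger.",
            RuntimeWarning,
            stacklevel=2,
        )
```

Calling `warn_on_bad_import_order()` right before the first `import tensorflow` in your own code gives an early, visible signal. The `modules` parameter exists only so the check can be exercised with stand-in objects.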
Option 2: Don’t install Ray
Ray is an optional dependency. If you don’t need distributed computing features:
pip install "p2pfl[tensorflow]" # Without Ray
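To confirm that Ray is really absent from the active environment after reinstalling, you can ask the import system without importing anything. This is a small sketch using the standard library's `importlib.util.find_spec`; the helper name `is_installed` is an assumption for illustration and is not p2pfl's own check.

```python
import importlib.util


def is_installed(top_level_name: str) -> bool:
    """Hypothetical helper: True if a top-level module can be found, without importing it."""
    return importlib.util.find_spec(top_level_name) is not None


if __name__ == "__main__":
    # With Option 2 applied, Ray should be reported as not installed.
    print("ray installed:", is_installed("ray"))
```

Because `find_spec` only locates the module, this check does not trigger `ray.init()` or any other import side effect.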
Option 3: Disable Ray at runtime
from p2pfl.settings import Settings
Settings.general.DISABLE_RAY = True
Environment
TensorFlow 2.20.0
Ray 2.53.0
Python 3.12
macOS (Darwin)
Status
Fixed on macOS: p2pfl now registers a Ray worker setup hook that imports TensorFlow (and PyTorch, if installed) first inside each Ray worker.
The fix is in p2pfl/utils/check_ray.py:
import sys

def _worker_setup() -> None:
    """Import ML frameworks first in Ray workers to avoid deadlocks on macOS."""
    if sys.platform != "darwin":
        return
    import contextlib
    with contextlib.suppress(ImportError):
        import tensorflow
    with contextlib.suppress(ImportError):
        import torch

# In ray.init():
if sys.platform == "darwin":
    init_kwargs["runtime_env"] = {"worker_process_setup_hook": _worker_setup}
Related: ray-project/ray#59661