Airflow Xcom Exclusive |link| (2024)

XCom allows tasks to exchange small amounts of data by storing them in the Airflow metadata database. An XCom is essentially a key-value pair associated with a specific task instance, DAG, and execution date. The identifier for the data (e.g., filename ).

Downstream tasks retrieve this data using xcom_pull . You can filter the pull by specifying the upstream task_ids and the specific key . If no key is specified, Airflow defaults to searching for return_value . 2. The Danger Zone: Anti-Patterns and Database Bloat

Mastering Airflow XComs: Advanced Patterns for Exclusive Data Sharing

This public link is valid for 7 days and shares a thread, including any personal information you added. This link or copies made by others cannot be deleted. If you share with third parties, their policies apply. Can’t copy the link right now. Try again later. airflow xcom exclusive

def push_metadata(**kwargs): kwargs['ti'].xcom_push(key='user_status', value='active') def pull_metadata(**kwargs): status = kwargs['ti'].xcom_pull(key='user_status', task_ids='push_task') print(f"User is status") Use code with caution. The Modern Way (TaskFlow API)

The removal of enable_xcom_pickling in recent Airflow versions underscores a move towards more secure and standardized serialization (like JSON). Ensure any custom serialization method is secure and does not create vulnerabilities.

: Tasks "push" data by returning a value from an operator's execute() method or by explicitly calling task_instance.xcom_push() . XCom allows tasks to exchange small amounts of

To mitigate these risks, workflows require an : restricting data access only to downstream tasks that explicitly need it, and moving data payloads out of the transactional database. 2. Implementing Explicit and Exclusive Pulls

Automated cloud databases can hit storage limits quickly, locking up the entire cluster.

@task def consume_id(ref: str) -> None: # ref is automatically pulled as an exclusive XCom spark.read.parquet(ref).show() Downstream tasks retrieve this data using xcom_pull

Use ShortCircuitOperator with exclusive mode to stop downstream tasks if a certain key’s value doesn’t meet a threshold:

When a task returns a dict, Airflow pushes each key independently. This can cause fragmentation. Use single return values or multiple_outputs=True carefully.

The you pass between tasks (JSON, DataFrames, File Paths)

Airflow allows you to dynamically scale tasks at runtime based on upstream data. XComs serve as the engine for this capability via the .expand() method.