Theia

Uber Develops HiveSync for Cross-Region Data Synchronization and Disaster Recovery

DATA AND AI INFRASTRUCTURE

Uber developed HiveSync, a sharded batch replication system that synchronizes Hive and HDFS data across regions and processes millions of events daily. It improves data consistency, supports disaster recovery, and reduces the cost of idle regional hardware.

HiveSync was initially based on Airbnb's ReAir project. It features sharding, DAG-based orchestration, and separation of the control and data planes, which allows ETL jobs to run in the primary data center while maintaining near real-time replication. The system comprises the HiveSync Replication Service and the Data Reparo Service; it uses a Hive Metastore Event Listener to capture changes in real time and runs replication jobs asynchronously. Uber plans to extend HiveSync to cloud replication as its analytics and ML workloads migrate to Google Cloud.
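
To make the sharding and DAG-based orchestration ideas concrete, here is a minimal Python sketch. It is not Uber's implementation: the shard count, the event fields (`table`, `type`), the step names, and the functions `shard_for`, `assign_to_shards`, and `plan_job` are all hypothetical, chosen only to illustrate how captured metastore events might be routed to shards and how one replication job might be ordered as a small dependency graph.

```python
import hashlib
from collections import defaultdict
from graphlib import TopologicalSorter  # standard library, Python 3.9+

NUM_SHARDS = 4  # hypothetical shard count, not taken from the article


def shard_for(table: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a table name to a shard with a stable hash.

    A stable hash keeps every event for one table on the same shard,
    so per-table ordering can be preserved inside that shard.
    """
    digest = hashlib.sha256(table.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


def assign_to_shards(events, num_shards: int = NUM_SHARDS):
    """Group captured change events by shard, keeping arrival order."""
    shards = defaultdict(list)
    for event in events:
        shards[shard_for(event["table"], num_shards)].append(event)
    return shards


# Hypothetical replication job expressed as a DAG:
# each step maps to the set of steps it depends on.
REPLICATION_DAG = {
    "copy_files": set(),
    "update_metastore": {"copy_files"},
    "verify": {"update_metastore"},
}


def plan_job(dag=REPLICATION_DAG):
    """Return the job's steps in an order that respects dependencies."""
    return list(TopologicalSorter(dag).static_order())


if __name__ == "__main__":
    # Illustrative events, as an event listener might hand them off.
    events = [
        {"table": "db.trips", "type": "ADD_PARTITION"},
        {"table": "db.riders", "type": "CREATE_TABLE"},
        {"table": "db.trips", "type": "ALTER_PARTITION"},
    ]
    for shard, batch in sorted(assign_to_shards(events).items()):
        print(shard, [e["type"] for e in batch], plan_job())
```

In this sketch the listener only records events and the shard workers process them later, which mirrors the asynchronous split between change capture and replication jobs described above.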

Jan 20, 2026, 6:09 AM
