Uber Develops HiveSync for Cross-Region Data Synchronization and Disaster Recovery

Uber has introduced HiveSync, a sharded batch replication system that synchronizes Hive and HDFS data across regions and processes millions of events daily. The system improves data consistency and supports disaster recovery while reducing idle hardware costs. Its components include the HiveSync Replication Service and the Data Reparo Service, and it uses a Hive Metastore Event Listener for real-time change capture. Future work aims to extend HiveSync to cloud replication as analytics and machine learning workloads move to Google Cloud.

Theia Market Signal Identification - AI Assisted

Published Jan 20, 2026

DATA AND AI INFRASTRUCTURE

Uber developed HiveSync, a sharded batch replication system for synchronizing Hive and HDFS data across regions, processing millions of events daily. It enhances data consistency, supports disaster recovery, and reduces idle regional hardware costs.

HiveSync was initially based on Airbnb's ReAir project. It features sharding, DAG-based orchestration, and separation of the control and data planes, which lets ETL jobs run in the primary data center while data is replicated in near real time. The system comprises the HiveSync Replication Service and the Data Reparo Service; a Hive Metastore Event Listener captures changes in real time, and replication jobs run asynchronously. Uber plans to extend HiveSync to cloud replication as analytics and ML workloads migrate to Google Cloud.
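To make the sharding and DAG-based orchestration ideas more concrete, here is a minimal Python sketch. It is not Uber's implementation: the `MetastoreEvent` type, the shard count, the hash-by-table routing rule, and the three job step names are all assumptions chosen for illustration. It shows one plausible way to route captured change events to shards so that events for the same table stay together, and to order the steps of a replication job from a dependency graph.

```python
import hashlib
from collections import defaultdict
from dataclasses import dataclass
from graphlib import TopologicalSorter  # standard library, Python 3.9+

NUM_SHARDS = 4  # hypothetical shard count, not taken from the article


@dataclass(frozen=True)
class MetastoreEvent:
    """Hypothetical change event, as a metastore listener might emit it."""
    database: str
    table: str
    kind: str  # e.g. "ADD_PARTITION" or "ALTER_TABLE" (illustrative values)


def shard_for(event: MetastoreEvent, num_shards: int = NUM_SHARDS) -> int:
    """Map an event to a shard using a stable hash of its table name.

    Assumption: routing by "database.table" keeps all events for one
    table on the same shard, so they can be applied in order.
    """
    key = f"{event.database}.{event.table}".encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def route(events, num_shards: int = NUM_SHARDS) -> dict:
    """Group events into per-shard queues, preserving arrival order."""
    queues = defaultdict(list)
    for event in events:
        queues[shard_for(event, num_shards)].append(event)
    return dict(queues)


# Hypothetical steps of one replication job. Each key maps to the set of
# steps that must finish before it can start.
REPLICATION_DAG = {
    "copy_hdfs_files": set(),
    "verify_checksums": {"copy_hdfs_files"},
    "update_target_metastore": {"verify_checksums"},
}


def plan_job(dag=REPLICATION_DAG) -> list:
    """Return the job steps in an order that respects their dependencies."""
    return list(TopologicalSorter(dag).static_order())


if __name__ == "__main__":
    events = [
        MetastoreEvent("rides", "trips", "ADD_PARTITION"),
        MetastoreEvent("rides", "trips", "ALTER_TABLE"),
        MetastoreEvent("eats", "orders", "ADD_PARTITION"),
    ]
    for shard, queue in sorted(route(events).items()):
        print(shard, [(e.table, e.kind) for e in queue])
    print(plan_job())
```

A stable hash such as SHA-256 is used instead of Python's built-in `hash()` because the latter is randomized per process for strings, which would send the same table to different shards after a restart.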
