focus-andy/Binlog-ETL


Binlog-ETL

Binlog-ETL is a system for synchronising data from MySQL to a Hive data warehouse on Hadoop. Its main feature is that it keeps an up-to-date snapshot of the MySQL tables in the data warehouse while the MySQL databases continue to be updated.

Binlog

Binlog is short for binary log. The MySQL binary log contains “events” that describe database changes, such as table creation operations or changes to table data. It is also used for MySQL master-slave replication. In this repository, it is used to synchronise MySQL tables to a data warehouse built with Hive and Hadoop.

Features

  • Create Hive tables and partitions
  • Parse MySQL binary logs and extract updates
  • Upload data to HDFS
  • Support multiple processes
  • Support sharded MySQL databases and tables
  • Keep the latest snapshot and remove duplicates
  • All of the above features run automatically
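
Two of the features above, sharded tables and automatic Hive table creation, can be illustrated with a minimal Python sketch. It is not taken from the Binlog-ETL code base: the shard naming scheme (`<logical>_<digits>`), the function names, the `dt` partition column, and the tab-delimited text storage are all assumptions made for illustration.

```python
import re

# Assumption: shards are named "<logical>_<digits>", e.g. "orders_0003".
_SHARD_SUFFIX = re.compile(r"_\d+$")


def logical_table(physical_name: str) -> str:
    """Map a sharded table name to the logical table it belongs to."""
    return _SHARD_SUFFIX.sub("", physical_name)


def hive_create_table(table: str, columns: dict[str, str]) -> str:
    """Build a HiveQL statement for a table partitioned by a date string.

    `columns` maps column names to Hive types. The `dt` partition column
    and the tab-delimited text format are illustrative choices.
    """
    cols = ",\n  ".join(f"`{name}` {hive_type}" for name, hive_type in columns.items())
    return (
        f"CREATE TABLE IF NOT EXISTS `{table}` (\n  {cols}\n)\n"
        "PARTITIONED BY (`dt` STRING)\n"
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\t'\n"
        "STORED AS TEXTFILE"
    )


def hive_add_partition(table: str, dt: str) -> str:
    """Build a HiveQL statement that registers one date partition."""
    return f"ALTER TABLE `{table}` ADD IF NOT EXISTS PARTITION (dt='{dt}')"
```

With this scheme, rows from `orders_0001` and `orders_0002` would both land in a single Hive table named `orders`. A table whose real name happens to end in digits would need a different rule.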

Requirements

  • MySQL version 5.6 or newer
  • Binary logging enabled in my.cnf
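
As a rough way to check the second requirement, the sketch below looks for the `log_bin` option (also accepted by MySQL as `log-bin`) in the `[mysqld]` group of a my.cnf file. The function name and the sample values are placeholders, and the parsing is an approximation: Python's `configparser` does not understand MySQL-specific directives such as `!includedir` placed before the first group, so a real file may need pre-processing.

```python
import configparser


def binlog_enabled(my_cnf_text: str) -> bool:
    """Return True if the [mysqld] group sets log_bin (or log-bin)."""
    parser = configparser.ConfigParser(
        allow_no_value=True,   # my.cnf allows bare options such as "log_bin"
        strict=False,          # tolerate repeated groups or options
        interpolation=None,    # treat "%" characters literally
    )
    parser.read_string(my_cnf_text)
    if not parser.has_section("mysqld"):
        return False
    # MySQL accepts both dashes and underscores in option names.
    options = {name.replace("-", "_") for name in parser.options("mysqld")}
    return "log_bin" in options


# Placeholder configuration for illustration only.
SAMPLE = """
[mysqld]
server-id = 1
log-bin = mysql-bin
"""

print(binlog_enabled(SAMPLE))
```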

Data Flow

The Binlog-ETL system sits between the MySQL cluster and the Hive data warehouse. It actively pulls binary logs from the MySQL cluster and creates snapshots automatically.

  • Step 1, request the latest binary logs and download them to local disk.
  • Step 2, parse the binary logs and transform the data into the target format.
  • Step 3, upload the new data to HDFS. This data contains the latest updates to the MySQL tables.
  • Step 4, merge the new data with the old snapshot, keeping the latest updates and removing duplicates.
  • Step 5, create a new snapshot and finish.
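
The merge in steps 4 and 5 can be sketched in a few lines of Python. This is an in-memory illustration of the logic only, not the project's implementation, which operates on data in HDFS and Hive. The `Change` record, its fields, and `merge_snapshot` are hypothetical names, and the sketch assumes each change carries the full row as it looks after the change.

```python
from typing import Iterable, NamedTuple, Optional


class Change(NamedTuple):
    """One row-level change extracted from the binary log (hypothetical shape)."""
    seq: int               # position in the log; larger means later
    op: str                # "insert", "update" or "delete"
    key: int               # primary key of the affected row
    row: Optional[dict]    # full row after the change; None for deletes


def merge_snapshot(snapshot: dict[int, dict], changes: Iterable[Change]) -> dict[int, dict]:
    """Return a new snapshot with the latest change per primary key applied."""
    latest: dict[int, Change] = {}
    for change in changes:
        current = latest.get(change.key)
        # Keep only the most recent change for each key (removes duplicates).
        if current is None or change.seq > current.seq:
            latest[change.key] = change

    merged = dict(snapshot)  # copy, so the old snapshot stays untouched
    for key, change in latest.items():
        if change.op == "delete":
            merged.pop(key, None)
        else:
            merged[key] = change.row
    return merged


old = {1: {"id": 1, "name": "ann"}, 2: {"id": 2, "name": "bob"}}
updates = [
    Change(10, "update", 1, {"id": 1, "name": "anna"}),
    Change(11, "insert", 3, {"id": 3, "name": "cy"}),
    Change(12, "update", 1, {"id": 1, "name": "anne"}),
    Change(13, "delete", 2, None),
]
new = merge_snapshot(old, updates)
print(new)
```

Here row 1 is updated twice and only the later value survives, row 3 is added, and row 2 is dropped, so the new snapshot holds rows 1 and 3 while `old` is left unchanged.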

About

An ETL framework for synchronising data from MySQL
