LakeSoul Cloud-Native Lakehouse
The only open-source lakehouse from China, with a modern data-intelligence architecture that unifies stream and batch processing, lake and warehouse storage, and analytics and AI
Leading technical concept and architecture design
Traditional data architectures suffer from slow response, high cost, the inability to unify real-time and batch data, and difficulty scaling. LakeSoul provides unified lakehouse storage to solve these problems. It offers high-concurrency, high-throughput reads and writes and complete warehouse management capabilities on the cloud, and exposes them to various computing engines in a general way.
Efficient and extensible Catalog metadata service
LakeSoul uses a PostgreSQL database to store Catalog information, improving metadata scalability and transaction concurrency.
Concurrent writes and ACID transactions
Concurrency control supports a high degree of write concurrency, automatically detecting and resolving conflicts to ensure data consistency.
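One common way to get this behavior is optimistic concurrency: each writer records the table version it read, and a commit succeeds only if that version is still current. The sketch below is a toy illustration of that idea, not LakeSoul's implementation; `ToyCatalog`, `CommitConflict`, and `write_with_retry` are hypothetical names, and an in-memory lock stands in for a database transaction.

```python
import threading
from dataclasses import dataclass, field


class CommitConflict(Exception):
    """Raised when another writer committed first."""


@dataclass
class ToyCatalog:
    """Hypothetical in-memory stand-in for a metadata catalog."""
    version: int = 0
    files: list = field(default_factory=list)
    _lock: threading.Lock = field(default_factory=threading.Lock, repr=False)

    def commit(self, expected_version, new_files):
        # Atomic compare-and-set: succeed only if nobody committed
        # since this writer read the version.
        with self._lock:
            if self.version != expected_version:
                raise CommitConflict(
                    f"expected v{expected_version}, found v{self.version}"
                )
            self.files.extend(new_files)
            self.version += 1
            return self.version


def write_with_retry(catalog, new_files, max_retries=5):
    """Append files, re-reading the version and retrying on conflict."""
    for _ in range(max_retries):
        seen = catalog.version
        try:
            return catalog.commit(seen, new_files)
        except CommitConflict:
            continue  # another writer won; retry against the new version
    raise CommitConflict("gave up after retries")
```

A writer that commits against a stale version gets `CommitConflict` instead of silently overwriting the other writer's data; for pure appends, retrying against the new version is enough to resolve the conflict.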
Incremental writes and Upsert updates
LakeSoul provides efficient Merge on Read and Upsert functions to improve the flexibility and performance of data ingestion.
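The general idea behind Merge on Read can be sketched in a few lines: an upsert only appends a small delta batch, and rows are merged by primary key when the table is read, with later batches taking precedence. This is a conceptual sketch under that assumption, not LakeSoul's API; `merge_on_read` and the `id` key column are hypothetical.

```python
def merge_on_read(base_rows, delta_batches, key="id"):
    """Merge a base file with upsert deltas at read time.

    Rows are dicts. Later batches win, and a delta may carry only
    the columns that changed (a partial, row-level update).
    """
    merged = {row[key]: dict(row) for row in base_rows}
    for batch in delta_batches:
        for row in batch:
            # New keys are inserted; existing keys are updated in place.
            merged.setdefault(row[key], {}).update(row)
    return [merged[k] for k in sorted(merged)]


base = [{"id": 1, "name": "a", "qty": 1}, {"id": 2, "name": "b", "qty": 2}]
deltas = [
    [{"id": 2, "qty": 20}],               # partial update of row 2
    [{"id": 3, "name": "c", "qty": 3}],   # insert of a new row
]
rows = merge_on_read(base, deltas)
```

Because writes never rewrite the base data, ingestion stays cheap; the cost of merging is paid by readers and can be reduced by compacting deltas into a new base.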
It supports streaming and batch writes, row-level updates, and SQL operations. MVCC multi-version control, snapshot reads, and version rollback are available. Flink CDC is provided for efficient real-time ingestion into the lake.
It integrates with various computing engines such as Spark, Flink, and Presto, and fully supports data-intelligence workloads such as ETL, OLAP, and AI model training.
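Snapshot reads and rollback both follow from keeping every committed version immutable. The toy class below illustrates that principle only; `VersionedTable` is a hypothetical name and does not reflect how LakeSoul stores versions.

```python
class VersionedTable:
    """Toy append-only table that keeps one immutable snapshot per commit."""

    def __init__(self):
        self._snapshots = [()]  # version 0 is the empty table

    @property
    def latest(self):
        return len(self._snapshots) - 1

    def commit(self, rows):
        # A commit never mutates an old snapshot; it adds a new one.
        self._snapshots.append(self._snapshots[-1] + tuple(rows))
        return self.latest

    def read(self, version=None):
        # Readers pin a version, so concurrent commits do not affect them.
        v = self.latest if version is None else version
        return list(self._snapshots[v])

    def rollback(self, version):
        # Rollback is itself a new commit, so history is preserved.
        self._snapshots.append(self._snapshots[version])
        return self.latest
```

A reader that pinned version 1 keeps seeing the same rows after later commits, and rolling back to version 1 creates a new latest version with the old contents while leaving every intermediate version readable.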
Unified stream-batch table storage
Rich application scenarios that meet diverse business requirements and help unlock business value
Real-time data ingestion into the lake
Flink CDC is provided for real-time ingestion directly from the data source, without T+1 imports or a Kafka deployment
Example: real-time ingestion of an online database for report analysis
With only the relevant configuration, such as the online data source, a task that synchronizes the whole database and ingests it in real time can be started. It automatically detects new tables and synchronizes table schema changes, with no manual operations or maintenance. Online data is updated in the lakehouse in real time. BI reports and large-screen dashboards connect seamlessly and refresh in real time, so key business metrics are available at any time to support business decisions.
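Conceptually, whole-database synchronization means applying a stream of change events while creating unseen tables and widening schemas on the fly. The sketch below illustrates that logic on an in-memory structure; the event shape (`table`, `op`, `key`, `row`) is a hypothetical simplification and not the format used by Flink CDC or LakeSoul.

```python
def apply_cdc_events(events):
    """Apply change events, creating tables and columns as they appear."""
    tables = {}  # table name -> {"columns": [...], "rows": {key: row}}
    for ev in events:
        # A table seen for the first time is created automatically.
        table = tables.setdefault(ev["table"], {"columns": [], "rows": {}})
        if ev["op"] == "delete":
            table["rows"].pop(ev["key"], None)
            continue
        # Insert or update: widen the schema when a new column shows up.
        for col in ev["row"]:
            if col not in table["columns"]:
                table["columns"].append(col)
        table["rows"].setdefault(ev["key"], {}).update(ev["row"])
    return tables


events = [
    {"table": "orders", "op": "insert", "key": 1, "row": {"id": 1, "amount": 10}},
    {"table": "orders", "op": "update", "key": 1,
     "row": {"id": 1, "amount": 12, "status": "paid"}},  # new column "status"
    {"table": "users", "op": "insert", "key": 7, "row": {"id": 7}},  # new table
    {"table": "users", "op": "delete", "key": 7},
]
state = apply_cdc_events(events)
```

In this sketch neither the new `users` table nor the new `status` column needs to be declared in advance, which is the property the paragraph above describes.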
Real-time Report Analysis
Based on the stream-batch update capability, data extraction, transformation, and development are completed in SQL, simplifying ETL and data analysis.
AI Application Deployment
Large-scale DMPs, machine learning sample stores, and feature stores are built to connect AI models and online inference seamlessly, enabling intelligent data applications.
Join the community and share data intelligence