Automatic Schema Inference: XPipes Makes Universal Database Integration Possible

Discover how XPipes automatically maps and creates target table structures across any database systems without requiring users to write a single line of SQL code.

In the rapidly evolving landscape of multi-database architectures and data integration services, building a truly universal real-time data integration platform is a challenge faced by many organizations. Our comprehensive data integration platform, XPipes, supports seamless integration between any database systems including PostgreSQL, MySQL, MongoDB, ClickHouse, Oracle, and many more, focusing on solving complex cross-database integration challenges for users.

When performing real-time data synchronization between different database systems, automatically and accurately creating target tables from source table structures across diverse database technologies is a crucial part of the integration process. This article will introduce XPipes' solution to this universal schema mapping problem, allowing users to achieve accurate real-time data integration across any database combination without writing any database-specific code.

Universal Intermediate Data Type Design

To accomplish automatic mapping between any database systems, we abstract the data types of all supported databases into a set of independent, universal basic types, including:

  • Integers with ranges
  • Decimals with precision
  • Strings with length and fixed width
  • Long text
  • Binary with length
  • Boolean
  • Date
  • Time
  • Enumeration types

During the development of each data source, its own data types are mapped to intermediate types through configuration. After the data source is declared, the engine understands the conversion from their own data types (such as int) to standard types (such as Number). This provides the prerequisite for the engine to perform automatic inference later. The declaration format is as follows:

{
  "int[($zerofill)]": {
    "to": "Number",
    "bit": 32,
    "precision": 10,
    "value": [
      -2147483648,
      2147483647
    ]
  },
  "float": {
    "to": "Number",
    "precision": [
      1,
      6
    ],
    "scale": [
      0,
      6
    ],
    "fixed": false
  }
}

Universal Table Structure Generation

Before synchronizing data between any database systems, the integration engine automatically reads the data structure of the source table, converts it into a virtual table composed of standard intermediate types, then parses the type mapping declaration of the target database, and deduces the optimal physical table structure that the target table should have from the virtual table, and then performs automatic table creation.

For example, for the scenario of MySQL synchronizing to PostgreSQL, the integer type is declared in MySQL as: int -> Number, and in PostgreSQL as: integer -> Number. If there is a MySQL table containing an int type, when creating a table in PostgreSQL, it will be automatically inferred as an integer type. Similarly, for MongoDB to ClickHouse integration, a MongoDB NumberInt would map to ClickHouse Int32 through the intermediate Number type.

In the actual process, the engine will automatically infer the type that can accurately store data in the target database while using the most efficient storage format through analysis of data type characteristics, constraints, data range, and other attributes. This optimization ensures data accuracy while avoiding the use of types that occupy more storage or have performance limitations in the target database system.

Through this design, when creating synchronization tasks in XPipes between any database combination, target tables can be automatically created without users having to understand the specific data types and constraints of different database systems.

XPipes will also pre-check whether the target table exists. If users have already created the table themselves, the synchronization process will use it instead of recreating it. If the automatic mapping of the table structure does not meet the user's expectations, users can also create tables according to their expected table structure through manual table creation or customize the mapping rules.

Schema Modification in Data Processors

Before data is synchronized to the target database, XPipes provides comprehensive data processing capabilities that work across all database systems. Among them, some processors will affect the final target table structure, such as: table renaming, field renaming, field deletion, field addition, adding timestamp fields, data type conversions, and other transformation processors.

In addition, XPipes supports user-defined processors based on Python and SQL. Users can write custom code to perform complex transformations and processing of data that work regardless of the source and target database types.

When performing cross-database synchronization, the integration engine will calculate and process the schema model according to different situations:

  1. For table and field processors, it will automatically process the target table structure according to the configuration, ensuring compatibility with the target database's constraints and capabilities
  2. For custom processors, the engine will analyze sample data from the source, process them through the custom logic, and then use the results to infer the optimal table structure for the target database system
  3. For cross-database type conversions, the engine automatically handles complex mappings like MongoDB documents to PostgreSQL JSONB, or Oracle CLOB to ClickHouse String

Through this design, no matter how users configure and use data synchronization tasks between any database systems, they can automatically complete accurate table structure creation in the target database, avoiding manual schema management operations.

Conclusion

XPipes, through abstracting universal intermediate data types, can quickly and accurately infer the optimal data types in any target database system, and automatically calculate and infer table structure changes that occur at processing nodes, completing the automatic creation capability of target tables across any database combination. This greatly reduces the operational complexity for users, allowing them to achieve accurate real-time data integration between any databases without writing database-specific code.