sqlcompyre.analysis package

Subpackages

Submodules

Module contents

class sqlcompyre.analysis.QueryInspection(engine: Engine, query: FromClause)[source]

Bases: object

Inspect the results of a SQL query.

Note:

This class should never be initialized directly. Use the inspect() or inspect_table() functions instead.

Attributes:
row_count

Get the number of rows returned by the query.

Methods

column_stats(column)

Obtain statistics about a single column.

distinct_row_count(*columns)

Get the number of rows with distinct values wrt.

column_stats(column: str) ColumnStats[source]

Obtain statistics about a single column.

Args:

column: The name of the column to obtain information about.

Returns:

An object providing access to column statistics.

distinct_row_count(*columns: str) int[source]

Get the number of rows with distinct values wrt. to the provided column(s).

Args:
columns: The set of columns to compute the number of distinct values for. If no

columns are provided, the number of distinct rows (across all columns) is computed.

Returns:

The number of distinct rows.

property row_count: int

Get the number of rows returned by the query.

class sqlcompyre.analysis.SchemaComparison(engine: Engine, left_schema: str, right_schema: str, left_tables: dict[str, Table], right_tables: dict[str, Table], float_precision: float, collation: str | None, ignore_casing: bool)[source]

Bases: object

Compare the table metadata from two different SQL database schemas.

Note:

This class should never be initialized directly. Use the compare_schemas() function instead.

Attributes:
table_counts

A comparison between the number of tables in each schema.

table_names

A comparison between the table names of the two schemas.

Methods

compare_matched_table(name[, join_columns, ...])

A full comparison between the tables in "left" and "right" schema with the specified name.

summary_report()

Generate a report that summarizes the schema comparison.

table_reports([ignore_tables, ...])

Generate reports for all tables matched between the schemas.

compare_matched_table(name: str, join_columns: list[str] | None = None, ignore_columns: list[str] | None = None, column_name_mapping: dict[str, str] | None = None, infer_primary_keys: bool = False) TableComparison[source]

A full comparison between the tables in “left” and “right” schema with the specified name. The name must be available in table_names.in_common.

Args:
name: The name of the table (possibly including a schema name if this schema comparison

compares multi-part schemas).

join_columns: The columns to join the tables on in order to compare column values. If

not provided, defaults to the union of primary keys. The corresponding primary key(s) of the right table are determined via column_name_mapping.

ignore_columns: Columns to ignore to evaluate equality. These column names should

reference the left table: corresponding columns in the right table are determined via column_name_mapping. Primary key columns passed here are ignored.

column_name_mapping: A mapping from column names in the left table to column names in

the right table. If not provided, defaults to mapping columns with the same names.

infer_primary_keys: Whether primary keys should be inferred if the tables do not have

matching primary key(s).

Returns:

The full comparison between the tables.

summary_report() Report[source]

Generate a report that summarizes the schema comparison.

property table_counts: Counts

A comparison between the number of tables in each schema.

property table_names: Names

A comparison between the table names of the two schemas.

table_reports(ignore_tables: list[str] | None = None, ignore_table_columns: dict[str, list[str]] | None = None, skip_equal: bool = False, infer_primary_keys: bool = False, sort_by: Literal['name', 'creation_timestamp'] = 'name', verbose: bool = True) dict[str, Report][source]

Generate reports for all tables matched between the schemas.

Args:
ignore_tables: A list of regexes specifying tables to ignore completely. Consider

using ^ and $ to match full table names.

ignore_table_columns: A mapping from table names to column names. The provided columns

are ignored to evaluate equality and dropped from all reports. Different to the behavior for ignore_tables, table names are not considered to be regular expressions.

ignore: A mapping from table names to column names. Each table which provides no

column names is ignored completely. If at least one column name is provided, the provided columns are ignored to evaluate equality.

skip_equal: Whether to skip reports about tables which are equal in the two schemas.

Tables that cannot be compared for equality (e.g. because they have no primary key) are still included.

infer_primary_keys: Whether primary keys should be inferred if compared tables do not

have matching primary key(s).

sort_by: The strategy for sorting the table reports. Either alphabetically by the name

of the tables (name) or chronologically by the creation timestamp of the tables in the “left” schema (creation_timestamp). Note that the latter option is not supported for all database systems.

verbose: Whether to show a progress bar for the report generation.

Returns:

A mapping from matched table names to the reports of their comparisons, ordered by the sort strategy.

class sqlcompyre.analysis.TableComparison(engine: Engine, left_table: FromClause, right_table: FromClause, join_columns: list[str] | None, column_name_mapping: dict[str, str] | None, ignore_columns: list[str] | None, float_precision: float, collation: str | None, ignore_casing: bool, infer_primary_keys: bool)[source]

Bases: object

Compare the content of two SQL database tables.

Note:

This class should never be initialized directly. Use the compare_tables() function instead.

Attributes:
column_matches

A comparison between the column values of the two tables.

column_names

A comparison between the column names of the two tables.

equal

Whether the compared tables are equal.

join_columns

The columns used for joining the two tables.

row_counts

A comparison between the number of rows in each table.

row_matches

A comparison between the contents of the individual rows in the two tables.

Methods

get_top_changes(column_name[, n])

Gets the most common changes in a single column.

summary_report()

Generate a report that summarizes the table comparison.

property column_matches: ColumnMatches

A comparison between the column values of the two tables.

property column_names: Names

A comparison between the column names of the two tables.

property equal: bool

Whether the compared tables are equal.

get_top_changes(column_name: str, n: int = 5) dict[str, int][source]

Gets the most common changes in a single column.

Args:

column_name: The name of the column in the “left” table. n: The number of changes to get.

Returns:

A dictionary where the key is a change in string form (maybe “true -> false”) and the value is the number of times the change occurs.

property join_columns: list[str]

The columns used for joining the two tables.

property row_counts: Counts

A comparison between the number of rows in each table.

property row_matches: RowMatches

A comparison between the contents of the individual rows in the two tables.

summary_report() Report[source]

Generate a report that summarizes the table comparison.

Returns:

A report summarizing the comparison of the two tables.