Getting Started

Prerequisites

This documentation assumes some programming experience with Python and a beginner’s understanding of SQLAlchemy Engines and Tables. If you haven’t used SQLAlchemy to connect to databases in the past, try looking at the SQLAlchemy Documentation first:

Creating a TableComparison Object

To compare two tables, you simply need to call the top-level compare_tables() function:

import sqlalchemy as sa
import sqlcompyre as sc

engine = sa.create_engine("<your_connection_string>")
comparison = sc.compare_tables(engine, "<left table name>", "<right table name>")

Importantly, the tables passed to this method may belong to different schemas. In the case of MSSQL, tables may even belong to different databases. The user simply has to take care to pass the fully-qualified name of the table.

There are a few optional arguments when calling compare_tables() that often need to be set when comparing tables.

Generating a Report

Perhaps the first thing you would do with a TableComparison is to print a report summarizing the changes. This is done through the summary_report() method:

report = comparison.summary_report()
print(report)

Accessing Comparison Metrics

After viewing a report, you may wish to explore some sections in more detail. Most comparison information is contained within the attributes of a TableComparison, split into four categories:

  • row_counts: A RowCounts object

  • column_names: A ColumnNames object

  • row_matches: A RowMatches object

  • column_matches: A ColumnMatches object

You can access attributes using dot notation as in the below:

# Find number of rows in left table
num_rows_left = comparison.row_counts.left

# Find number of rows in the inner join
num_joined_columns = comparison.row_matches.n_joined_total

Additionally, you can find the top changes in any column using the get_top_changes() function.

Using Query Results

Some attributes contained within a TableComparison object are SQLAlchemy Select objects that you can use to further explore your data.

# Querying for all rows in the right table that could not be joined with rows in the left table
unjoined_right = comparison.row_matches.unjoined_right

# editing the query to only give 10 rows
first_10 = unjoined_right.limit(10)

# executing the query and printing results
with left_engine.connect() as conn:
    res = conn.execute(first_10)
for row in res:
    print(row)

# Alternatively, storing query results as a dataframe
df = pandas.read_sql(first_10, left_engine)