Comparing two Excel spreadsheets and writing the differences to a new Excel file has always been a tedious task. Long ago I had to do exactly that: compare the values in every row and column of two Excel files and write the comparison to a new Excel file. In those days I built the solution around the xlrd module, and I can still recall that it took many long lines of code.

Recently at work I ran into the same problem, and retrieving my old xlrd script was not an option. So I decided to give pandas a try and, to my surprise, I compared the two Excel files and wrote the results to a new Excel file in no more than 10 lines of code. I’m fairly sure the code could be optimized further with more time, but this was a quick script, written in almost no time, that compared over 100K records in each Excel file.

To merge multiple text files into a new file, you can simply read each file and write its contents to the output file in a loop. For example: filenames = 'file1.txt', 'file2.txt', 'file3.txt', then with open('outputfile', 'w') as outfile: for fname in filenames: with open(fname) as infile: outfile.write(infile.read()). Note that read has to be called, as infile.read(); passing the method itself to write raises a TypeError.
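The loop described above can be written out as a small runnable sketch. The function name merge_text_files and the file names are illustrative only; the demo writes throwaway files to a temporary directory so it does not touch real data.

```python
from pathlib import Path
import tempfile


def merge_text_files(filenames, output_path):
    """Concatenate the given text files, in order, into output_path."""
    with open(output_path, "w") as outfile:
        for fname in filenames:
            with open(fname) as infile:
                # read() must be called; it returns the file's full text
                outfile.write(infile.read())


# Demo with three small throwaway files
with tempfile.TemporaryDirectory() as tmp:
    parts = []
    for i, text in enumerate(["alpha\n", "beta\n", "gamma\n"], start=1):
        path = Path(tmp) / f"file{i}.txt"
        path.write_text(text)
        parts.append(path)

    merged_path = Path(tmp) / "outputfile.txt"
    merge_text_files(parts, merged_path)
    merged_text = merged_path.read_text()

print(merged_text)
```

Reading each file fully into memory is fine for small files; for very large ones, copying in chunks would be the safer choice.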

Let’s Start

  • pd.read_excel reads Excel data into Python and stores it as a pandas DataFrame object. Be aware that by default it reads only the first sheet of the Excel file; if your file contains more than one sheet, pass the sheet_name argument to select another. DataFrame.append used to append the rows of one DataFrame to another, but it was deprecated in pandas 1.4 and removed in pandas 2.0, so use pd.concat in current versions.
  • Each append or concat call copies the data into a new DataFrame. If you plan to combine many DataFrames, it is therefore generally better to build a list of them and pass the whole list to pd.concat once. A more powerful approach to combining data from multiple sources is the database-style merge/join implemented in pd.merge.
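As a minimal sketch of the "build a list, concatenate once" advice, the following uses small in-memory DataFrames as stand-ins for data that would normally come from pd.read_excel. The column names and values are made up for illustration.

```python
import pandas as pd

# Stand-ins for DataFrames that would normally be read from separate files
jan = pd.DataFrame({"Asset": ["A1", "A2"], "Units": [10, 20]})
feb = pd.DataFrame({"Asset": ["A3"], "Units": [30]})
mar = pd.DataFrame({"Asset": ["A4", "A5"], "Units": [40, 50]})

# Collect the pieces first, then concatenate once instead of appending repeatedly
frames = [jan, feb, mar]
combined = pd.concat(frames, ignore_index=True)

print(combined)
```

Passing ignore_index=True gives the result a fresh 0..n-1 index instead of repeating each piece's own index.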

I was comparing two Excel files that contain the sales records of all the assets the company sells to its customers in the EU, EMEA, NA and APAC regions. The two files I’m using are sample records from two months, January and February 2019, and contain the same number of rows and columns.

Import

First we need to import the two Excel files into two separate DataFrames.
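The original code for this step is not shown, so here is a minimal sketch of what the import might look like. The helper name load_two_files and the file names in the usage comment are hypothetical; pd.read_excel needs an Excel engine such as openpyxl installed to read .xlsx files.

```python
import pandas as pd


def load_two_files(old_path, new_path):
    """Read the first sheet of each workbook into its own DataFrame."""
    df1 = pd.read_excel(old_path)  # e.g. the January records
    df2 = pd.read_excel(new_path)  # e.g. the February records
    return df1, df2


# Usage (hypothetical file names):
# df1, df2 = load_two_files("sales_jan_2019.xlsx", "sales_feb_2019.xlsx")
```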

Next Step

Compare the number of columns and their types between the two Excel files, and check whether the number of rows is equal.
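One way to run these structural checks, sketched on two small stand-in DataFrames (the data is made up; in practice df1 and df2 would be the two imported files):

```python
import pandas as pd

# Small stand-ins for the two imported files
df1 = pd.DataFrame({"Region": ["EU", "EMEA"], "Units": [10, 20]})
df2 = pd.DataFrame({"Region": ["EU", "APAC"], "Units": [10, 25]})

# shape is a (rows, columns) tuple, so one comparison covers both counts
same_shape = df1.shape == df2.shape

# Column labels and per-column dtypes, compared position by position
same_columns = list(df1.columns) == list(df2.columns)
same_dtypes = list(df1.dtypes) == list(df2.dtypes)

print(same_shape, same_columns, same_dtypes)
```

If any of the three flags is False, an element-by-element comparison of the values is not meaningful until the files are aligned.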

First, we will check whether the two DataFrames are equal using pandas.DataFrame.equals. This method compares two Series or DataFrames to see whether they have the same shape and elements. NaNs in the same location are considered equal. The column headers do not need to have the same type, but the elements within each column must have the same dtype as the corresponding elements in the other Series or DataFrame.

Basically, it distinguishes the following three cases between two DataFrames:

a) They have the same types and values for their elements and column labels: equals returns True.
b) They have the same element types and values, but different types for the column labels: equals still returns True.
c) They have different types for the same element values: equals returns False.
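The three cases can be reproduced with tiny DataFrames, following the pattern used in the pandas documentation for DataFrame.equals. The variable names are chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({1: [10], 2: [20]})

# a) same element types/values and same column labels
exactly_equal = pd.DataFrame({1: [10], 2: [20]})

# b) same elements, but column labels are floats instead of ints
different_column_type = pd.DataFrame({1.0: [10], 2.0: [20]})

# c) same values, but stored as floats instead of ints
different_data_type = pd.DataFrame({1: [10.0], 2: [20.0]})

result_a = df.equals(exactly_equal)
result_b = df.equals(different_column_type)
result_c = df.equals(different_data_type)

print(result_a, result_b, result_c)  # True True False
```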

Compare Two Dataframe Values

In the step above we made sure that the shape and types of both DataFrames are equal; now we will compare their values.

In just one line we have compared the values of the two DataFrames; the result for each row and column is shown as True or False.
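The one-line comparison itself is not shown in the text. A plausible sketch, assuming it compares the underlying NumPy arrays, is below; the name comparison_values and the sample data are assumptions for illustration.

```python
import pandas as pd

# Stand-ins for the January and February files (same shape and dtypes)
df1 = pd.DataFrame({"Region": ["EU", "EMEA", "APAC"], "Units": [10, 20, 30]})
df2 = pd.DataFrame({"Region": ["EU", "APAC", "APAC"], "Units": [10, 20, 35]})

# Element-wise comparison of the underlying arrays: True where cells match
comparison_values = df1.values == df2.values

print(comparison_values)
```

One caveat with this approach: unlike DataFrame.equals, a plain == comparison treats NaN as different from NaN, so cells that are empty in both files show up as False.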

Index of the Cell with False value

Get the index of all the cells where the value is False, which means the value of the cell differs between the two DataFrames.
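A sketch of this lookup using numpy.where, which returns the row and column positions of the selected cells. The sample data and variable names are illustrative, not taken from the original script.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"Region": ["EU", "EMEA", "APAC"], "Units": [10, 20, 30]})
df2 = pd.DataFrame({"Region": ["EU", "APAC", "APAC"], "Units": [10, 20, 35]})

comparison_values = df1.values == df2.values

# Invert the boolean mask so np.where picks out the cells that differ
rows, cols = np.where(~comparison_values)

# Each (row, col) pair is the position of one changed cell
changed_cells = [(int(r), int(c)) for r, c in zip(rows, cols)]
print(changed_cells)
```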

Next we will iterate over these cells and update the value in the first DataFrame (df1) to show the changed value from the second DataFrame (df2).
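A minimal sketch of that loop follows. It departs from the description in one respect, which is an assumption made here: instead of overwriting df1 directly, it writes into an object-typed copy called diff, because recent pandas versions complain when a string is assigned into a numeric column.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"Region": ["EU", "EMEA", "APAC"], "Units": [10, 20, 30]})
df2 = pd.DataFrame({"Region": ["EU", "APAC", "APAC"], "Units": [10, 20, 35]})

rows, cols = np.where(~(df1.values == df2.values))

# Object-typed copy so that "old --> new" strings fit into numeric columns
diff = df1.astype(object)

for r, c in zip(rows, cols):
    old_value = df1.iloc[r, c]
    new_value = df2.iloc[r, c]
    diff.iloc[r, c] = f"{old_value} --> {new_value}"

print(diff)
```

Looping cell by cell is easy to follow; for very large files with many differences, a vectorized approach would likely be faster.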

Export to Excel

Finally, we have replaced the old value in the DataFrame (df1) with an entry in the following format:

df1 (Old Value) --> df2 (New Value)

Here is what the updated DataFrame (df1) looks like:

So wherever there was a False value in the comparison ndarray from the step above, the cell now shows both the old and the new value. You can now export this DataFrame to an Excel or CSV file and name it Excel_diff.

I have set the index parameter to False; otherwise the index would also be exported to the xlsx file as the first column. I have set header to True so that the DataFrame column headers become the header row in the Excel file as well.
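The export call described above might look like the following sketch. The wrapper function export_diff is hypothetical; the output name Excel_diff comes from the text, and writing .xlsx requires an Excel writer engine such as openpyxl to be installed.

```python
import pandas as pd


def export_diff(diff, path="Excel_diff.xlsx"):
    """Write the diff DataFrame to an Excel file without the index column."""
    # index=False drops the row index; header=True keeps the column names
    diff.to_excel(path, index=False, header=True)


# Usage, with diff being the updated DataFrame from the previous step:
# export_diff(diff)
# For CSV output instead: diff.to_csv("Excel_diff.csv", index=False)
```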

Conclusion

Now when I compare my yesteryear code with the new, fast pandas code, it really amuses me how far we have come: with the advent of modules like pandas, things have become much simpler. You can even read records directly from SQL tables and write back to the tables after processing. This new world is moving fast, and we can all be optimistic that with every day that goes by we get closer to more intelligent tools and breakthroughs in the Python world.
