Introduction

  1. Add Multiple Excel Sheets Together
  2. Combine Multiple Excel Files Into One
  3. How To Combine Two Excel Spreadsheets Into One
  4. Connect Two Excel Spreadsheets

On the Excel ribbon, go to the Ablebits tab, Merge group, click Copy Sheets, and choose one of the following options:

  • Copy sheets in each workbook to one sheet and put the resulting sheets in one workbook.
  • Merge the identically named sheets into one.
  • Copy the selected sheets to one workbook.

If you have two or more spreadsheets containing related data, you can merge them into a single Excel worksheet using Excel's Consolidate option. Before you consolidate spreadsheets, they must use the same format.

Join multiple tables into one with Excel Power Query. When you need to combine two or more tables with different numbers of rows and columns, Excel Power Query may come in handy. However, be aware that joining tables with Power Query cannot be done with a mere couple of clicks, and explaining all the nuances would take far more space.

Merge two tables using the VLOOKUP function. In this example there are two tables that have been renamed 'Blue' and 'Orange'. In the Blue table, each row is a line item for an order. So Order ID 20050 has two items, Order ID 20051 has one item, Order ID 20052 has three items, and so on.

Merge and combine rows without losing data in Excel. Excel keeps only the data in the upper-left cell if you apply the 'Merge & Center' command (Home tab > Merge & Center in the Alignment group) to merge rows of data. You have to use another method to merge multiple rows of data into one row without deleting data.

A common task for python and pandas is to automate the process of aggregating data from multiple files and spreadsheets.

This article will walk through the basic flow required to parse multiple Excel files, combine the data, clean it up and analyze it. The combination of python + pandas can be extremely powerful for these activities and can be a very useful alternative to the manual processes or painful VBA scripts frequently used in business settings today.

The Problem

Before I get into the examples, here is a simple diagram showing the challenges with the common process used in businesses all over the world to consolidate data from multiple Excel files, clean it up and perform some analysis.

If you’re reading this article, I suspect you have experienced some of the problems shown above. Cutting and pasting data or writing painful VBA code will quickly get old. There has to be a better way!

Python + pandas can be a great alternative that is much more scalable and powerful.

By using a python script, you can develop a more streamlined and repeatable solution to your data processing needs. The rest of this article will show a simple example of how this process works. I hope it will give you ideas of how to apply these tools to your unique situation.

Collecting the Data

If you are interested in following along, here are the excel files and a link to the notebook:

The first step in the process is collecting all the data into one place.

First, import pandas and numpy

Let’s take a look at the files in our input directory, using the convenient shell commands in ipython.

There are a lot of files, but we only want to look at the sales .xlsx files.

Use the python glob module to easily list out the files we need.

This gives us what we need. Let’s import each of our files and combine them into one file. Pandas’ concat and append can do this for us. I’m going to use append in this example. (DataFrame.append was removed in pandas 2.0, so on current versions use concat instead.)

The code snippet below will initialize a blank DataFrame then append all of the individual files into the all_data DataFrame.
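Here is a minimal sketch of this step, not a definitive implementation. It swaps the blank-DataFrame-plus-append loop for a single pd.concat call, since DataFrame.append is no longer available in pandas 2.0 and later. The helper names combine and load_sales and the ../in/sales*.xlsx pattern are assumptions for illustration, and reading .xlsx files assumes an Excel engine such as openpyxl is installed.

```python
import glob

import pandas as pd


def combine(frames):
    """Stack DataFrames row-wise and renumber the index from 0."""
    frames = list(frames)
    if not frames:
        # pd.concat raises ValueError when given nothing to concatenate
        return pd.DataFrame()
    return pd.concat(frames, ignore_index=True)


def load_sales(pattern):
    """Read every workbook matching `pattern` and combine the rows."""
    # sorted() keeps the file order stable from run to run
    paths = sorted(glob.glob(pattern))
    return combine(pd.read_excel(path) for path in paths)


# Hypothetical location of the sales workbooks; adjust to your layout.
all_data = load_sales("../in/sales*.xlsx")
```

Collecting the frames in a list and concatenating once also avoids copying the growing DataFrame on every loop iteration.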

Now we have all the data in our all_data DataFrame. You can use describe to look at it and make sure your data looks good.

       account number     quantity   unit price    ext price
count     1742.000000  1742.000000  1742.000000  1742.000000
mean    485766.487945    24.319173    54.985454  1349.229392
std     223750.660792    14.502759    26.108490  1094.639319
min     141962.000000    -1.000000    10.030000   -97.160000
25%     257198.000000    12.000000    32.132500   468.592500
50%     527099.000000    25.000000    55.465000  1049.700000
75%     714466.000000    37.000000    77.607500  2074.972500
max     786968.000000    49.000000    99.850000  4824.540000

A lot of this data may not make much sense for this data set but I’m most interested in the count row to make sure the number of data elements makes sense. In this case, I see all the data rows I expect.

      account number  name                             sku       quantity  unit price  ext price  date
   0          740150  Barton LLC                       B1-20000        39       86.69    3380.91  2014-01-01 07:21:51
   1          714466  Trantow-Barrows                  S2-77896        -1       63.16     -63.16  2014-01-01 10:00:47
   2          218895  Kulas Inc                        B1-69924        23       90.70    2086.10  2014-01-01 13:24:58
   3          307599  Kassulke, Ondricka and Metz      S1-65481        41       21.05     863.05  2014-01-01 15:05:22
   4          412290  Jerde-Hilpert                    S2-34077         6       83.21     499.26  2014-01-01 23:26:55

It is not critical in this example but the best practice is to convert the date column to a datetime object.
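A minimal sketch of that conversion, using pd.to_datetime. The two-row frame is a made-up stand-in shaped like the sales data, so the snippet runs on its own.

```python
import pandas as pd

# Stand-in rows shaped like the sales data; the values are illustrative.
all_data = pd.DataFrame({
    "account number": [740150, 714466],
    "date": ["2014-01-01 07:21:51", "2014-01-01 10:00:47"],
})

# Parse the text timestamps into pandas datetime values
all_data["date"] = pd.to_datetime(all_data["date"])
```

Once the column holds real datetimes, date-based filtering and resampling work as expected.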

Combining Data

Now that we have all of the data in one DataFrame, we can do any manipulations the DataFrame supports. In this case, the next thing we want to do is read in another file that contains the customer status by account. You can think of this as a company’s customer segmentation strategy or some other mechanism for identifying their customers.

First, we read in the data.

    account number  name                             status
 0          740150  Barton LLC                       gold
 1          714466  Trantow-Barrows                  silver
 2          218895  Kulas Inc                        bronze
 3          307599  Kassulke, Ondricka and Metz      bronze
 4          412290  Jerde-Hilpert                    bronze
 5          729833  Koepp Ltd                        silver
 6          146832  Kiehn-Spinka                     silver
 7          688981  Keeling LLC                      silver
 8          786968  Frami, Hills and Schmidt         silver
 9          239344  Stokes LLC                       gold
10          672390  Kuhn-Gusikowski                  silver
11          141962  Herman LLC                       gold
12          424914  White-Trantow                    silver
13          527099  Sanford and Sons                 bronze
14          642753  Pollich LLC                      bronze
15          257198  Cronin, Oberbrunner and Spencer  gold

We want to merge this data with our concatenated data set of sales. Use pandas’ merge function and tell it to do a left join, which is similar to Excel’s VLOOKUP function.
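A minimal sketch of the left join, assuming the two frames share the account number and name columns as the listings in this article suggest. The small frames here are stand-ins built in memory so the snippet runs on its own; they are not the full data set.

```python
import pandas as pd

# Stand-in sales rows; the third account has no entry in the status file.
all_data = pd.DataFrame({
    "account number": [740150, 714466, 737550],
    "name": ["Barton LLC", "Trantow-Barrows", "Fritsch, Russel and Anderson"],
    "ext price": [3380.91, -63.16, 1146.88],
})
status = pd.DataFrame({
    "account number": [740150, 714466],
    "name": ["Barton LLC", "Trantow-Barrows"],
    "status": ["gold", "silver"],
})

# how="left" keeps every sales row; unmatched accounts get NaN for status
all_data_st = pd.merge(
    all_data, status, how="left", on=["account number", "name"]
)
```

Naming the join columns with on= is optional here, since pandas defaults to the columns both frames share, but it makes the intent explicit.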

      account number  name                             sku       quantity  unit price  ext price  date                 status
   0          740150  Barton LLC                       B1-20000        39       86.69    3380.91  2014-01-01 07:21:51  gold
   1          714466  Trantow-Barrows                  S2-77896        -1       63.16     -63.16  2014-01-01 10:00:47  silver
   2          218895  Kulas Inc                        B1-69924        23       90.70    2086.10  2014-01-01 13:24:58  bronze
   3          307599  Kassulke, Ondricka and Metz      S1-65481        41       21.05     863.05  2014-01-01 15:05:22  bronze
   4          412290  Jerde-Hilpert                    S2-34077         6       83.21     499.26  2014-01-01 23:26:55  bronze

This looks pretty good but let’s look at a specific account.

      account number  name                             sku       quantity  unit price  ext price  date                 status
   9          737550  Fritsch, Russel and Anderson     S2-82423        14       81.92    1146.88  2014-01-03 19:07:37  NaN
  14          737550  Fritsch, Russel and Anderson     B1-53102        23       71.56    1645.88  2014-01-04 08:57:48  NaN
  26          737550  Fritsch, Russel and Anderson     B1-53636        42       42.06    1766.52  2014-01-08 00:02:11  NaN
  32          737550  Fritsch, Russel and Anderson     S1-27722        20       29.54     590.80  2014-01-09 13:20:40  NaN
  42          737550  Fritsch, Russel and Anderson     S1-93683        22       71.68    1576.96  2014-01-11 23:47:36  NaN

This account number was not in our status file, so we have a bunch of NaNs. We can decide how we want to handle this situation. For this specific case, let’s label all missing accounts as bronze. Use the fillna function to easily accomplish this on the status column.
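A minimal sketch of the fill, using a two-row stand-in for the merged frame. It assigns the result back to the column rather than passing inplace=True, which is the more dependable form on current pandas.

```python
import numpy as np
import pandas as pd

# Stand-in for the merged frame; account 737550 has no status yet.
all_data_st = pd.DataFrame({
    "account number": [740150, 737550],
    "status": ["gold", np.nan],
})

# Rows still missing a status, captured before the fill
missing = all_data_st[all_data_st["status"].isna()]

# Label every missing status as bronze
all_data_st["status"] = all_data_st["status"].fillna("bronze")

# Look at one account to confirm the fill
check = all_data_st[all_data_st["account number"] == 737550]
```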

      account number  name                             sku       quantity  unit price  ext price  date                 status
   0          740150  Barton LLC                       B1-20000        39       86.69    3380.91  2014-01-01 07:21:51  gold
   1          714466  Trantow-Barrows                  S2-77896        -1       63.16     -63.16  2014-01-01 10:00:47  silver
   2          218895  Kulas Inc                        B1-69924        23       90.70    2086.10  2014-01-01 13:24:58  bronze
   3          307599  Kassulke, Ondricka and Metz      S1-65481        41       21.05     863.05  2014-01-01 15:05:22  bronze
   4          412290  Jerde-Hilpert                    S2-34077         6       83.21     499.26  2014-01-01 23:26:55  bronze

Check the data just to make sure we’re all good.

      account number  name                             sku       quantity  unit price  ext price  date                 status
   9          737550  Fritsch, Russel and Anderson     S2-82423        14       81.92    1146.88  2014-01-03 19:07:37  bronze
  14          737550  Fritsch, Russel and Anderson     B1-53102        23       71.56    1645.88  2014-01-04 08:57:48  bronze
  26          737550  Fritsch, Russel and Anderson     B1-53636        42       42.06    1766.52  2014-01-08 00:02:11  bronze
  32          737550  Fritsch, Russel and Anderson     S1-27722        20       29.54     590.80  2014-01-09 13:20:40  bronze
  42          737550  Fritsch, Russel and Anderson     S1-93683        22       71.68    1576.96  2014-01-11 23:47:36  bronze

Now we have all of the data along with the status column filled in. We can do our normal data manipulations using the full suite of pandas capability.

Using Categories

One of the relatively new functions in pandas is support for categorical data. From the pandas documentation:

Categoricals are a pandas data type, which correspond to categorical variables in statistics: a variable, which can take on only a limited, and usually fixed, number of possible values (categories; levels in R). Examples are gender, social class, blood types, country affiliations, observation time or ratings via Likert scales.

For our purposes, the status field is a good candidate for a category type.

You must make sure you have a recent version of pandas (0.15 or later) installed for this example to work.

First, we typecast the column to a category using astype.
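A minimal sketch of the cast on a small stand-in frame. Printing dtypes is one way to confirm that the column type changed even though the values look the same.

```python
import pandas as pd

# Stand-in for the merged frame; only the status column matters here.
all_data_st = pd.DataFrame({
    "account number": [740150, 714466, 218895, 307599],
    "status": ["gold", "silver", "bronze", "bronze"],
})

# Convert the plain text column to the pandas category type
all_data_st["status"] = all_data_st["status"].astype("category")

# The status column now reports its dtype as category
print(all_data_st.dtypes)
```

By default the categories are stored in sorted order (bronze, gold, silver), which is why a plain sort comes out alphabetical.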

This doesn’t immediately appear to change anything yet.

      account number  name                             sku       quantity  unit price  ext price  date                 status
   0          740150  Barton LLC                       B1-20000        39       86.69    3380.91  2014-01-01 07:21:51  gold
   1          714466  Trantow-Barrows                  S2-77896        -1       63.16     -63.16  2014-01-01 10:00:47  silver
   2          218895  Kulas Inc                        B1-69924        23       90.70    2086.10  2014-01-01 13:24:58  bronze
   3          307599  Kassulke, Ondricka and Metz      S1-65481        41       21.05     863.05  2014-01-01 15:05:22  bronze
   4          412290  Jerde-Hilpert                    S2-34077         6       83.21     499.26  2014-01-01 23:26:55  bronze

But you can see that it is a new data type.

Categories get more interesting when you assign order to the categories. Right now, if we call sort on the column, it will sort alphabetically.

      account number  name                             sku       quantity  unit price  ext price  date                 status
1741          642753  Pollich LLC                      B1-04202         8       95.86     766.88  2014-02-28 23:47:32  bronze
1232          218895  Kulas Inc                        S1-06532        29       42.75    1239.75  2014-09-21 11:27:55  bronze
 579          527099  Sanford and Sons                 S1-27722        41       87.86    3602.26  2014-04-14 18:36:11  bronze
 580          383080  Will LLC                         B1-20000        40       51.73    2069.20  2014-04-14 22:44:58  bronze
 581          383080  Will LLC                         S2-10342        15       76.75    1151.25  2014-04-15 02:57:43  bronze

We use set_categories to tell it the order we want to use for this category object. In this case, we use the Olympic medal ordering.

Now, we can sort it so that gold shows on top.
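A minimal sketch of both steps on a stand-in frame. It uses sort_values, the current name for DataFrame sorting, and assigns the result of set_categories back to the column because recent pandas versions no longer accept inplace=True there. Passing ordered=True is my own addition to make the ranking explicit.

```python
import pandas as pd

# Stand-in rows; the names come from the status listing in this article.
all_data_st = pd.DataFrame({
    "name": ["Kulas Inc", "Barton LLC", "Trantow-Barrows", "Jerde-Hilpert"],
    "status": ["bronze", "gold", "silver", "bronze"],
})
all_data_st["status"] = all_data_st["status"].astype("category")

# Olympic medal ordering: gold first, bronze last
all_data_st["status"] = all_data_st["status"].cat.set_categories(
    ["gold", "silver", "bronze"], ordered=True
)

# Sorting now follows the category order instead of the alphabet
by_status = all_data_st.sort_values("status")
```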

      account number  name                             sku       quantity  unit price  ext price  date                 status
   0          740150  Barton LLC                       B1-20000        39       86.69    3380.91  2014-01-01 07:21:51  gold
1193          257198  Cronin, Oberbrunner and Spencer  S2-82423        23       52.90    1216.70  2014-09-09 03:06:30  gold
1194          141962  Herman LLC                       B1-86481        45       52.78    2375.10  2014-09-09 11:49:45  gold
1195          257198  Cronin, Oberbrunner and Spencer  B1-50809        30       51.96    1558.80  2014-09-09 21:14:31  gold
1197          239344  Stokes LLC                       B1-65551        43       15.24     655.32  2014-09-10 11:10:02  gold

Analyze Data

The final step in the process is to analyze the data. Now that it is consolidated and cleaned, we can see if there are any insights to be learned.

For instance, you might want to take a quick look at how your top-tier customers are performing compared to the bottom. Use groupby to get the average of the values.
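A minimal sketch of the grouped average. The five rows are made-up numbers chosen so the means are easy to check by hand; they are not the article's data.

```python
import pandas as pd

# Made-up rows; ext price is quantity times unit price.
df = pd.DataFrame({
    "status": pd.Categorical(
        ["gold", "gold", "silver", "bronze", "bronze"],
        categories=["gold", "silver", "bronze"],
        ordered=True,
    ),
    "quantity": [10, 20, 5, 30, 50],
    "unit price": [1.0, 3.0, 2.0, 2.0, 4.0],
})
df["ext price"] = df["quantity"] * df["unit price"]

# observed=True limits the result to categories that actually occur
means = df.groupby("status", observed=True)[
    ["quantity", "unit price", "ext price"]
].mean()
```

Because status is an ordered category, the result rows come back in gold, silver, bronze order rather than alphabetically.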

         quantity  unit price    ext price
status
gold    24.680723   52.431205  1325.566867
silver  23.814241   55.724241  1339.477539
bronze  24.589005   55.470733  1367.757736

Of course, you can run multiple aggregation functions on the data to get really useful information.
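A minimal sketch of several aggregations at once, again on made-up rows. It passes the function names as strings, which current pandas prefers over passing NumPy functions such as np.sum.

```python
import pandas as pd

# Made-up rows; ext price is quantity times unit price.
df = pd.DataFrame({
    "status": ["gold", "gold", "bronze", "bronze"],
    "quantity": [10, 20, 30, 50],
    "ext price": [10.0, 60.0, 60.0, 200.0],
})

# One sum/mean/std triple per value column, per status
stats = df.groupby("status")[["quantity", "ext price"]].agg(
    ["sum", "mean", "std"]
)

# Columns are (value column, function) pairs, e.g. ("quantity", "sum")
gold_units = stats.loc["gold", ("quantity", "sum")]
```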

        quantity                     unit price                      ext price
          sum       mean        std       sum       mean        std         sum         mean          std
status
gold     8194  24.680723  14.478670  17407.16  52.431205  26.244516   440088.20  1325.566867  1074.564373
silver  15384  23.814241  14.519044  35997.86  55.724241  26.053569   865302.49  1339.477539  1094.908529
bronze  18786  24.589005  14.506515  42379.64  55.470733  26.062149  1044966.91  1367.757736  1104.129089

So, what does this tell you? Well, the data is completely random but my first observation is that we sell more units to our bronze customers than gold. Even when you look at the total dollar value associated with bronze vs. gold, it looks odd that we sell more to bronze customers than gold.

Maybe we should look at how many bronze customers we have and see what is going on?

What I plan to do is filter out the unique accounts and see how many gold, silver and bronze customers there are.

I’m purposely stringing a lot of commands together which is not necessarily best practice but does show how powerful pandas can be. Feel free to review my previous article here and here to understand it better. Play with this command yourself to understand how the commands interact.
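A minimal sketch of one such chain, assuming an account is identified by its account number and name. The rows are made up, and the account numbers 1 to 3 and names A to C are placeholders.

```python
import pandas as pd

# Made-up sales rows: three accounts, two of them with repeat orders.
all_data_st = pd.DataFrame({
    "account number": [1, 1, 2, 3, 3],
    "name": ["A", "A", "B", "C", "C"],
    "status": ["gold", "gold", "bronze", "bronze", "bronze"],
})

counts = (
    all_data_st
    # keep one row per account
    .drop_duplicates(subset=["account number", "name"])
    # then count the accounts in each status
    .groupby("status")["name"]
    .count()
)
```

Wrapping the chain in parentheses lets each step sit on its own line, which makes it easier to comment out one step while experimenting.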

Ok. This makes a little more sense. We see that we have 9 bronze customers and only 4 gold customers. That is probably why the volumes are so skewed towards our bronze customers. This result makes sense given the fact that we defaulted to bronze for many of our customers. Maybe we should reclassify some of them? Obviously this data is fake but hopefully this shows how you can use these tools to quickly analyze your own data.

Conclusion

This example only covered the aggregation of 4 simple Excel files containing random data. However, the principles can be applied to much larger data sets while keeping the code base very manageable. Additionally, you have the full power of python at your fingertips so you can do much more than simply manipulate the data.

I encourage you to try some of these concepts out on your scenarios and see if you can find a way to automate that painful Excel task that hangs over your head every day, week or month.

Good luck!

Combine Two Excel Spreadsheets

June 14, 2018 - by Bill Jelen

David from Florida asks today's question:

I have two workbooks. Both have the same data in column A, but the remaining columns are different. How can I merge those two workbooks?

I asked David if it is possible that one workbook has more records than the other. And the answer is Yes. I asked David if the key field only appears once in each file. The answer is also yes. Today, I will solve this with Power Query. The Power Query tools are found in Windows versions of Excel 2016+ in the Get & Transform section of the Data tab. If you have Windows versions of Excel 2010 or Excel 2013, you can download the Power Query add-in for those versions.

Here is David's workbook 1. It has Product and then three columns of data.

Here is David's workbook 2. It has Product Code and then other columns. In this example, there are extra products in workbook 2, but the solution will work if either workbook has extra rows.

Here are the steps:

  1. Select Data, Get Data, From File, From Workbook:

  2. Browse to the first workbook and click OK
  3. In the Navigator dialog, choose the worksheet on the left. (Even if there is only one worksheet, you have to select it.) You will see the data on the right.
  4. In the Navigator dialog, open the Load dropdown and choose Load To...
  5. Choose Only Create a Connection and press OK.
  6. Repeat steps 1-5 for the second workbook.

    If you've done both workbooks, you should see two connections on the Queries & Connections Panel on the right of your Excel screen.

    Continue with the steps to merge the workbooks:

  7. Data, Get Data, Combine Queries, Merge.

  8. From the top drop down in the Merge dialog, choose the first query.
  9. From the second drop down in the Merge dialog, choose the second query.
  10. Click on the Product heading in the top preview (this is the key field. Note you can multi-select two or more key fields by Ctrl + Clicking)
  11. Click on the Product Code heading in the second preview.
  12. Open the Join Type and choose Full Outer (All Rows From Both)

  13. Click OK. The data preview does not show the extra rows and only shows 'Table' repeatedly in the last column.

  14. Notice there is an 'Expand' icon in the heading for DavidTwo. Click that icon.
  15. Optional, but I always unselect 'Use Original Column Name As Prefix'. Click OK.

    The results are shown in this preview:

  16. In Power Query, use Home, Close & Load.

Here is the beautiful feature: if the underlying data in either workbook changes, you can click the Refresh icon to pull new data in to the results workbook.

Note

The icon for Refresh is usually hidden. Drag the left edge of the Queries & Connections pane to the left to reveal the icon.
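For readers who would rather script this than click through the dialogs, the same full outer join can be sketched in pandas. The two frames below are made-up stand-ins for David's workbooks: only the key column names Product and Product Code come from the steps above, while the Qty and Class columns and every value are hypothetical.

```python
import pandas as pd

# Stand-in for workbook 1, keyed on Product
one = pd.DataFrame({
    "Product": ["Apple", "Banana"],
    "Qty": [1, 2],
})
# Stand-in for workbook 2, keyed on Product Code, with one extra product
two = pd.DataFrame({
    "Product Code": ["Apple", "Banana", "Cherry"],
    "Class": [10, 20, 30],
})

# how="outer" keeps all rows from both sides, like the Full Outer join type
merged = pd.merge(
    one, two, how="outer", left_on="Product", right_on="Product Code"
)
```

As in the Power Query result, both key columns are kept, so a product that exists in only one workbook shows a missing value in the other workbook's key column.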

Video Transcript

Learn Excel from MrExcel Podcast, Episode 2216: Combine Two Workbooks Based on a Common Column.

Hey, welcome back to MrExcel netcast, I'm Bill Jelen. Today's question's from David, who was in my seminar in Melbourne, Florida, for the Space Coast Chapter of the IIA.

David has two different workbooks where Column A is in common between both of them. So, here's Workbook 1, here's Workbook 2-- both have product code. This one has items that the first one doesn't have, or vice versa, and David wants to combine all the columns. So, we have three columns here and four columns here. I put both of these in the same workbook, in case you're downloading the workbook to work along. Take each one of these, move it out to its own workbook and save it.

Alright, to combine these files, we're going to use Power Query. Power Query's built into Excel 2016. If you're in the Windows version of 10 or 13, you can go out to Microsoft and download Power Query. You can start from a new blank workbook with a blank worksheet. You're going to save this file-- Save as, you know, maybe Workbook, to show the results of combined files .xlsx. Alright? And what we're going to do is, we're going to do two queries. We're going to go to Data, Get Data, From File, From Workbook, and then we'll choose the first file. In a preview, select the sheet that has your data, and we don't have to do anything to this data. So just open the load box and choose Load To, Only Create Connection, click OK. Perfect. Now, we're going to repeat that for the second item-- Data, From File, From a Workbook, choose DavidTwo, choose the sheet name, and then open the load, Load To, Only Create a Connection. You'll see over here in this panel, we have both connections present. Alright.

Now the actual work-- Data, Get Data, Combine Queries, Merge, and then in the Merge dialog, choose DavidOne, DavidTwo, and this next step is completely unintuitive. You have to do this. Choose the column or columns in common-- so Product and Product. Alright. And then, be very careful here with the join type. I want all rows from both because one might have an extra row and I need to see that, and then we click OK. Alright. And here's the initial result. It doesn't look like it worked; it doesn't look like it added the extra items that were in file 2. And we have this column 5-- it's null now. I'm going to right click column 5 and say, Remove that column. So open this expand icon and uncheck this box for Use original column name as prefix, and BAM! it works. So the extra items that were in File 2, that aren't in File 1, do appear.

Alright. Now in today's file, it looks like this Product Code column is better than this Product column, because it has extra rows. But there might be a day in the future where Workbook 1 has things that Workbook 2 doesn't have. So I'm going to leave both of them there, and I'm not going to get rid of any nulls because, like, even though this row at the bottom appears to be completely null, there might be in the future a situation where we have a few nulls in here because something's missing. Alright? So, finally, Close & Load, and we have our sixteen rows.

Now, in the future, let's say that something changes. Alright, so we'll go back to one of those two files and I'll change the class for Apple to 99, and let's even insert something new and save this workbook. Alright. And then, if we want our merge file to update, come over here-- now, watch out, when you do this the first time, you can't see the Refresh icon-- you have to grab this bar and drag it over. And we will do Refresh, and 17 rows loaded, the watermelon appears, the Apple changes to 99-- it's a beautiful thing. Now, hey, do you wanna learn about Power Query? Buy this book by Ken Puls and Miguel Escobar, M is for (DATA) MONKEY. I'll get you up to speed.

Wrap-up today: David from Florida has two workbooks that he wants to combine; they both have the same fields in Column A, but the other columns are all different; one workbook might have extra items that are not in the other and David wants those; there's no duplicates in either file; we're going to use Power Query to solve this, so start in a new blank workbook on a blank worksheet; you're going to do three queries, first one-- Data, From File, Workbook, and then Load To, Only Create Connection; the same thing for the second workbook, and then Data, Get Data, Merge, select the two connections, select the column that's common in both--in my case, Product-- and then from the Join Type, you want a full join, all from File 1, all from File 2. And then the beautiful thing is if the underlying data changes, you can just refresh the query.

To download the workbook from today's video, visit the URL in the YouTube description.

Well, hey, I want to thank David for showing up for my seminar, I want to thank you for stopping by. I'll see you next time for another netcast from MrExcel.

Download Excel File

To download the excel file: combine-based-on-common-column.xlsx

Power Query is an amazing tool in Excel.

Excel Thought Of the Day

I've asked my Excel Master friends for their advice about Excel. Today's thought to ponder:

'Always press F4 when you read range or matrix in a function'
