On the Data tab, in the Data Tools group, click Consolidate. In the Function box, click the function that you want Excel to use to consolidate the data. In each source sheet, select your data. The file path is entered in All references.

To combine multiple sheets or workbooks into one workbook with Kutools for Excel, install Kutools, open Excel and click Kutools Plus > Combine. A dialog pops up to remind you that the workbooks you want to combine need to be closed. Click OK to merge the Excel files into one sheet. After the merge, close the workbook pane on the right and delete the extra column (column A).


Introduction

A common task for python and pandas is to automate the process of aggregating data from multiple files and spreadsheets.

This article will walk through the basic flow required to parse multiple Excel files, combine the data, clean it up and analyze it. The combination of python + pandas can be extremely powerful for these activities and can be a very useful alternative to the manual processes or painful VBA scripts frequently used in business settings today.

The Problem

Before I get into the examples, here is a simple diagram showing the challenges with the common process used in businesses all over the world to consolidate data from multiple Excel files, clean it up and perform some analysis.

If you’re reading this article, I suspect you have experienced some of the problems shown above. Cutting and pasting data or writing painful VBA code will quickly get old. There has to be a better way!

Python + pandas can be a great alternative that is much more scalable and powerful.

By using a python script, you can develop a more streamlined and repeatable solution to your data processing needs. The rest of this article will show a simple example of how this process works. I hope it will give you ideas of how to apply these tools to your unique situation.

Collecting the Data

If you are interested in following along, here are the excel files and a link to the notebook:

The first step in the process is collecting all the data into one place.

First, import pandas and numpy

Let’s take a look at the files in our input directory, using the convenient shell commands in ipython.

There are a lot of files, but we only want to look at the sales .xlsx files.

Use the python glob module to easily list out the files we need.
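Here is a minimal sketch of that glob step. The directory and the file names are hypothetical stand-ins (the demo creates empty placeholder files in a temporary folder), so point input_dir at your own folder when you try it.

```python
import glob
import os
import tempfile


def list_sales_files(input_dir):
    """Return the sorted paths of the sales*.xlsx files in input_dir."""
    pattern = os.path.join(input_dir, "sales*.xlsx")
    return sorted(glob.glob(pattern))


# Demo: a throwaway directory with empty placeholder files (hypothetical names).
demo_dir = tempfile.mkdtemp()
for name in ["sales-jan-2014.xlsx", "sales-feb-2014.xlsx",
             "customer-status.xlsx", "notes.txt"]:
    open(os.path.join(demo_dir, name), "w").close()

sales_files = list_sales_files(demo_dir)
print([os.path.basename(path) for path in sales_files])
```

Only the two files matching the sales*.xlsx pattern are returned; the status file and the text file are skipped.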

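This gives us what we need. Let’s import each of our files and combine them into one file. pandas’ concat and append can both do this for us. I’m going to use append in this example; note that DataFrame.append was removed in pandas 2.0, so on a current version use concat instead.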

The code snippet below will initialize a blank DataFrame then append all of the individual files into the all_data DataFrame.
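A sketch of the combining step, written with pd.concat rather than append so that it also runs on pandas 2.0 and later. The combine_files helper, the fake_reader stand-in and the file names are my own assumptions for illustration; in real use you would leave reader at its default, pd.read_excel, and pass the paths returned by glob.

```python
import pandas as pd


def combine_files(paths, reader=pd.read_excel):
    """Read every path with `reader` and stack the rows into one DataFrame."""
    frames = [reader(path) for path in paths]
    if not frames:
        return pd.DataFrame()
    # ignore_index=True renumbers the rows 0..n-1 across all files.
    return pd.concat(frames, ignore_index=True)


def fake_reader(path):
    """Hypothetical stand-in for pd.read_excel so the demo needs no files."""
    return pd.DataFrame({
        "account number": [740150, 714466],
        "quantity": [39, -1],
        "source": [path, path],
    })


all_data = combine_files(
    ["sales-jan-2014.xlsx", "sales-feb-2014.xlsx"], reader=fake_reader
)
print(all_data)
```

Collecting the frames in a list and concatenating once is usually cheaper than growing a DataFrame inside the loop.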

Now we have all the data in our all_data DataFrame. You can use describe to look at it and make sure your data looks good.

       account number     quantity   unit price    ext price
count     1742.000000  1742.000000  1742.000000  1742.000000
mean    485766.487945    24.319173    54.985454  1349.229392
std     223750.660792    14.502759    26.108490  1094.639319
min     141962.000000    -1.000000    10.030000   -97.160000
25%     257198.000000    12.000000    32.132500   468.592500
50%     527099.000000    25.000000    55.465000  1049.700000
75%     714466.000000    37.000000    77.607500  2074.972500
max     786968.000000    49.000000    99.850000  4824.540000

A lot of this data may not make much sense for this data set but I’m most interested in the count row to make sure the number of data elements makes sense. In this case, I see all the data rows I expect.
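The check described above can be sketched as follows. The frame here is a three-row toy built from values shown in this article, not the full 1742-row data set, so only the mechanics carry over.

```python
import pandas as pd

# Toy stand-in for all_data (three rows taken from the article's sample data).
all_data = pd.DataFrame({
    "account number": [740150, 714466, 218895],
    "quantity": [39, -1, 23],
    "unit price": [86.69, 63.16, 90.70],
    "ext price": [3380.91, -63.16, 2086.10],
})

summary = all_data.describe()

# The count row should match the number of rows you expect to have loaded.
row_counts = summary.loc["count"]
print(row_counts)

# head() shows the first rows so you can eyeball the columns.
print(all_data.head())
```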

   account number  name                         sku       quantity  unit price  ext price  date
0  740150          Barton LLC                   B1-20000  39        86.69       3380.91    2014-01-01 07:21:51
1  714466          Trantow-Barrows              S2-77896  -1        63.16       -63.16     2014-01-01 10:00:47
2  218895          Kulas Inc                    B1-69924  23        90.70       2086.10    2014-01-01 13:24:58
3  307599          Kassulke, Ondricka and Metz  S1-65481  41        21.05       863.05     2014-01-01 15:05:22
4  412290          Jerde-Hilpert                S2-34077  6         83.21       499.26     2014-01-01 23:26:55

It is not critical in this example but the best practice is to convert the date column to a date time object.
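A short sketch of that conversion with pd.to_datetime, assuming the date column arrives as text. The two sample timestamps are taken from the rows above.

```python
import pandas as pd

# Toy stand-in for all_data with the date column stored as strings.
all_data = pd.DataFrame({
    "date": ["2014-01-01 07:21:51", "2014-01-01 10:00:47"],
})

# Convert the strings to real datetime values.
all_data["date"] = pd.to_datetime(all_data["date"])

# Datetime columns expose the .dt accessor for year, month, hour and so on.
print(all_data["date"].dt.hour.tolist())
```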

Combining Data

Now that we have all of the data in one DataFrame, we can do any manipulations the DataFrame supports. In this case, the next thing we want to do is read in another file that contains the customer status by account. You can think of this as a company’s customer segmentation strategy or some other mechanism for identifying their customers.

First, we read in the data.

    account number  name                             status
0   740150          Barton LLC                       gold
1   714466          Trantow-Barrows                  silver
2   218895          Kulas Inc                        bronze
3   307599          Kassulke, Ondricka and Metz      bronze
4   412290          Jerde-Hilpert                    bronze
5   729833          Koepp Ltd                        silver
6   146832          Kiehn-Spinka                     silver
7   688981          Keeling LLC                      silver
8   786968          Frami, Hills and Schmidt         silver
9   239344          Stokes LLC                       gold
10  672390          Kuhn-Gusikowski                  silver
11  141962          Herman LLC                       gold
12  424914          White-Trantow                    silver
13  527099          Sanford and Sons                 bronze
14  642753          Pollich LLC                      bronze
15  257198          Cronin, Oberbrunner and Spencer  gold

We want to merge this data with our concatenated data set of sales. Use pandas’ merge function and tell it to do a left join, which is similar to Excel’s vlookup function.
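A minimal sketch of the left join on toy frames. Joining on "account number" alone and dropping the duplicate name column from the status table are my assumptions for the sketch; the values come from the tables in this article.

```python
import pandas as pd

# Toy sales rows; the last account is deliberately absent from the status table.
sales = pd.DataFrame({
    "account number": [740150, 714466, 737550],
    "name": ["Barton LLC", "Trantow-Barrows", "Fritsch, Russel and Anderson"],
    "ext price": [3380.91, -63.16, 1146.88],
})
status = pd.DataFrame({
    "account number": [740150, 714466],
    "name": ["Barton LLC", "Trantow-Barrows"],
    "status": ["gold", "silver"],
})

# how="left" keeps every sales row, matched or not, like a VLOOKUP per row.
all_data_st = pd.merge(
    sales,
    status[["account number", "status"]],
    on="account number",
    how="left",
)
print(all_data_st)
```

Rows without a match in the status table keep their sales data and get NaN in the status column.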

   account number  name                         sku       quantity  unit price  ext price  date                 status
0  740150          Barton LLC                   B1-20000  39        86.69       3380.91    2014-01-01 07:21:51  gold
1  714466          Trantow-Barrows              S2-77896  -1        63.16       -63.16     2014-01-01 10:00:47  silver
2  218895          Kulas Inc                    B1-69924  23        90.70       2086.10    2014-01-01 13:24:58  bronze
3  307599          Kassulke, Ondricka and Metz  S1-65481  41        21.05       863.05     2014-01-01 15:05:22  bronze
4  412290          Jerde-Hilpert                S2-34077  6         83.21       499.26     2014-01-01 23:26:55  bronze

This looks pretty good but let’s look at a specific account.

    account number  name                          sku       quantity  unit price  ext price  date                 status
9   737550          Fritsch, Russel and Anderson  S2-82423  14        81.92       1146.88    2014-01-03 19:07:37  NaN
14  737550          Fritsch, Russel and Anderson  B1-53102  23        71.56       1645.88    2014-01-04 08:57:48  NaN
26  737550          Fritsch, Russel and Anderson  B1-53636  42        42.06       1766.52    2014-01-08 00:02:11  NaN
32  737550          Fritsch, Russel and Anderson  S1-27722  20        29.54       590.80     2014-01-09 13:20:40  NaN
42  737550          Fritsch, Russel and Anderson  S1-93683  22        71.68       1576.96    2014-01-11 23:47:36  NaN

This account number was not in our status file, so we have a bunch of NaN’s. We can decide how we want to handle this situation. For this specific case, let’s label all missing accounts as bronze. Use the fillna function to easily accomplish this on the status column.
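A sketch of the fillna step on a toy frame, followed by the same single-account filter used to spot the problem. Assigning the result back to the column (rather than filling in place) is my choice here.

```python
import pandas as pd

# Toy merged data: account 737550 had no match in the status file.
all_data_st = pd.DataFrame({
    "account number": [740150, 737550, 737550],
    "status": ["gold", None, None],
})

# Label every account with a missing status as bronze.
all_data_st["status"] = all_data_st["status"].fillna("bronze")

# Re-check the account that was missing before.
check = all_data_st[all_data_st["account number"] == 737550]
print(check)
```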

   account number  name                         sku       quantity  unit price  ext price  date                 status
0  740150          Barton LLC                   B1-20000  39        86.69       3380.91    2014-01-01 07:21:51  gold
1  714466          Trantow-Barrows              S2-77896  -1        63.16       -63.16     2014-01-01 10:00:47  silver
2  218895          Kulas Inc                    B1-69924  23        90.70       2086.10    2014-01-01 13:24:58  bronze
3  307599          Kassulke, Ondricka and Metz  S1-65481  41        21.05       863.05     2014-01-01 15:05:22  bronze
4  412290          Jerde-Hilpert                S2-34077  6         83.21       499.26     2014-01-01 23:26:55  bronze

Check the data just to make sure we’re all good.

    account number  name                          sku       quantity  unit price  ext price  date                 status
9   737550          Fritsch, Russel and Anderson  S2-82423  14        81.92       1146.88    2014-01-03 19:07:37  bronze
14  737550          Fritsch, Russel and Anderson  B1-53102  23        71.56       1645.88    2014-01-04 08:57:48  bronze
26  737550          Fritsch, Russel and Anderson  B1-53636  42        42.06       1766.52    2014-01-08 00:02:11  bronze
32  737550          Fritsch, Russel and Anderson  S1-27722  20        29.54       590.80     2014-01-09 13:20:40  bronze
42  737550          Fritsch, Russel and Anderson  S1-93683  22        71.68       1576.96    2014-01-11 23:47:36  bronze

Now we have all of the data along with the status column filled in. We can do our normal data manipulations using the full suite of pandas capability.

Using Categories


One of the relatively new features in pandas is support for categorical data. From the pandas documentation:

Categoricals are a pandas data type, which correspond to categorical variables in statistics: a variable, which can take on only a limited, and usually fixed, number of possible values (categories; levels in R). Examples are gender, social class, blood types, country affiliations, observation time or ratings via Likert scales.

For our purposes, the status field is a good candidate for a category type.

You must make sure you have a recent version of pandas (0.15 or later) installed for this example to work.

First, we typecast the column to a category using astype.
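A small sketch of the typecast on a toy status column. The data is a stand-in; the astype("category") call is the part that matters.

```python
import pandas as pd

# Toy stand-in for the merged data.
all_data_st = pd.DataFrame({
    "status": ["gold", "silver", "bronze", "bronze"],
})

# Convert the plain text column to the categorical dtype.
all_data_st["status"] = all_data_st["status"].astype("category")

# The values look the same, but the dtype and the category list are new.
print(all_data_st["status"].dtype)
print(all_data_st["status"].cat.categories.tolist())
```

With no explicit order given, pandas infers the categories in sorted (alphabetical) order.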

This doesn’t immediately appear to change anything yet.

   account number  name                         sku       quantity  unit price  ext price  date                 status
0  740150          Barton LLC                   B1-20000  39        86.69       3380.91    2014-01-01 07:21:51  gold
1  714466          Trantow-Barrows              S2-77896  -1        63.16       -63.16     2014-01-01 10:00:47  silver
2  218895          Kulas Inc                    B1-69924  23        90.70       2086.10    2014-01-01 13:24:58  bronze
3  307599          Kassulke, Ondricka and Metz  S1-65481  41        21.05       863.05     2014-01-01 15:05:22  bronze
4  412290          Jerde-Hilpert                S2-34077  6         83.21       499.26     2014-01-01 23:26:55  bronze

But you can see that it is a new data type.

Categories get more interesting when you assign order to the categories. Right now, if we call sort on the column, it will sort alphabetically.

      account number  name              sku       quantity  unit price  ext price  date                 status
1741  642753          Pollich LLC       B1-04202  8         95.86       766.88     2014-02-28 23:47:32  bronze
1232  218895          Kulas Inc         S1-06532  29        42.75       1239.75    2014-09-21 11:27:55  bronze
579   527099          Sanford and Sons  S1-27722  41        87.86       3602.26    2014-04-14 18:36:11  bronze
580   383080          Will LLC          B1-20000  40        51.73       2069.20    2014-04-14 22:44:58  bronze
581   383080          Will LLC          S2-10342  15        76.75       1151.25    2014-04-15 02:57:43  bronze

We use set_categories to tell it the order we want to use for this category object. In this case, we use the Olympic medal ordering.

Now, we can sort it so that gold shows on top.
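The ordering and the sort can be sketched like this on a toy frame. I am assuming a current pandas, where set_categories returns a new Series that is assigned back, and I pass ordered=True and a stable sort so that the result is deterministic.

```python
import pandas as pd

# Toy stand-in for the merged data.
all_data_st = pd.DataFrame({
    "name": ["Pollich LLC", "Barton LLC", "Koepp Ltd", "Herman LLC"],
    "status": ["bronze", "gold", "silver", "gold"],
})
all_data_st["status"] = all_data_st["status"].astype("category")

# Replace the inferred alphabetical order with the medal order.
all_data_st["status"] = all_data_st["status"].cat.set_categories(
    ["gold", "silver", "bronze"], ordered=True
)

# Sorting now follows the category order, so gold comes first.
sorted_data = all_data_st.sort_values(by="status", kind="stable")
print(sorted_data)
```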

      account number  name                             sku       quantity  unit price  ext price  date                 status
0     740150          Barton LLC                       B1-20000  39        86.69       3380.91    2014-01-01 07:21:51  gold
1193  257198          Cronin, Oberbrunner and Spencer  S2-82423  23        52.90       1216.70    2014-09-09 03:06:30  gold
1194  141962          Herman LLC                       B1-86481  45        52.78       2375.10    2014-09-09 11:49:45  gold
1195  257198          Cronin, Oberbrunner and Spencer  B1-50809  30        51.96       1558.80    2014-09-09 21:14:31  gold
1197  239344          Stokes LLC                       B1-65551  43        15.24       655.32     2014-09-10 11:10:02  gold

Analyze Data

The final step in the process is to analyze the data. Now that it is consolidated and cleaned, we can see if there are any insights to be learned.

For instance, if you want to take a quick look at how your top tier customers are performing compared to the bottom, use groupby to get the average of the values.
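A sketch of the groupby average on a four-row toy frame (values borrowed from the sample rows in this article). Selecting the numeric columns with a list is the form current pandas expects.

```python
import pandas as pd

# Toy stand-in for the merged data.
all_data_st = pd.DataFrame({
    "status": ["gold", "gold", "silver", "bronze"],
    "quantity": [39, 45, -1, 23],
    "unit price": [86.69, 52.78, 63.16, 90.70],
    "ext price": [3380.91, 2375.10, -63.16, 2086.10],
})

# Average of each numeric column per status tier.
means = all_data_st.groupby("status")[
    ["quantity", "unit price", "ext price"]
].mean()
print(means)
```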

         quantity  unit price    ext price
status
gold    24.680723   52.431205  1325.566867
silver  23.814241   55.724241  1339.477539
bronze  24.589005   55.470733  1367.757736

Of course, you can run multiple aggregation functions on the data to get really useful information.
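Running several aggregations at once can be sketched with agg and a list of function names. The toy data and the choice of sum, mean and std (matching the output in this article) are assumptions of the sketch.

```python
import pandas as pd

# Toy stand-in for the merged data.
all_data_st = pd.DataFrame({
    "status": ["gold", "gold", "silver", "bronze"],
    "quantity": [39, 45, -1, 23],
    "ext price": [3380.91, 2375.10, -63.16, 2086.10],
})

# One column of results per (original column, aggregation) pair.
summary = all_data_st.groupby("status")[["quantity", "ext price"]].agg(
    ["sum", "mean", "std"]
)
print(summary)
```

The result has two header levels, so a single value is addressed with a tuple such as ("quantity", "sum"). Groups with a single row get NaN for std.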

        quantity                     unit price                      ext price
          sum       mean        std       sum       mean        std         sum         mean          std
status
gold     8194  24.680723  14.478670  17407.16  52.431205  26.244516   440088.20  1325.566867  1074.564373
silver  15384  23.814241  14.519044  35997.86  55.724241  26.053569   865302.49  1339.477539  1094.908529
bronze  18786  24.589005  14.506515  42379.64  55.470733  26.062149  1044966.91  1367.757736  1104.129089

So, what does this tell you? Well, the data is completely random but my first observation is that we sell more units to our bronze customers than gold. Even when you look at the total dollar value associated with bronze vs. gold, it looks odd that we sell more to bronze customers than gold.

Maybe we should look at how many bronze customers we have and see what is going on?

What I plan to do is filter out the unique accounts and see how many gold, silver and bronze customers there are.

I’m purposely stringing a lot of commands together, which is not necessarily best practice but does show how powerful pandas can be. Feel free to review my previous articles to understand it better. Play with this command yourself to understand how the commands interact.
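A sketch of that chained command on toy data: drop the repeated account rows, group by status and count the names. The exact chain is my reconstruction for a current pandas, not necessarily the original command.

```python
import pandas as pd

# Toy stand-in: several sales rows per customer.
all_data_st = pd.DataFrame({
    "account number": [740150, 740150, 714466, 737550, 737550, 383080],
    "name": ["Barton LLC", "Barton LLC", "Trantow-Barrows",
             "Fritsch, Russel and Anderson", "Fritsch, Russel and Anderson",
             "Will LLC"],
    "status": ["gold", "gold", "silver", "bronze", "bronze", "bronze"],
})

# Keep one row per customer, then count customers in each status tier.
customers_by_status = (
    all_data_st.drop_duplicates(subset=["account number", "name"])
    .groupby("status")["name"]
    .count()
)
print(customers_by_status)
```

Breaking the chain into intermediate variables is a good way to see what each step contributes.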

Ok. This makes a little more sense. We see that we have 9 bronze customers and only 4 gold customers. That is probably why the volumes are so skewed towards our bronze customers. This result makes sense given the fact that we defaulted to bronze for many of our customers. Maybe we should reclassify some of them? Obviously this data is fake but hopefully this shows how you can use these tools to quickly analyze your own data.


Conclusion

This example only covered the aggregation of 4 simple Excel files containing random data. However, the principles can be applied to much larger data sets while keeping the code base very manageable. Additionally, you have the full power of python at your fingertips so you can do much more than just simply manipulate the data.

I encourage you to try some of these concepts out on your scenarios and see if you can find a way to automate that painful Excel task that hangs over your head every day, week or month.

Good luck!


Users of UW-Madison's institutional Tableau workbooks may need to pull data from one Microsoft Excel spreadsheet into another spreadsheet. This KB article explains how, by using an Excel formula called vLookup.

How does the vLookup formula work?
Excel's vLookup formula pulls data from one spreadsheet into another by matching on a unique identifier located in both spreadsheets. For example, we want to add a column for email address but that data exists on a separate spreadsheet. vLookup can pull email addresses from Spreadsheet 2 into Spreadsheet 1 by matching CampusID 555123123 in both spreadsheets.

  1. Locate where you want the data to go. Click that cell only once.

  2. At the top, go to the Formulas tab and click Lookup & Reference.

  3. Select vLookup

  4. Excel’s vLookup wizard will pop up. We’ll walk through each part of the formula.


  5. Lookup_value
    Find the Unique Identifier (lookup value). It is usually in the same row as the empty cell you selected.
    Click once on the Unique Identifier so that the cell position will automatically fill in. In this example it is cell B2.


  6. Go to the next field, Table_array (click in it once). In Spreadsheet 2 highlight the table containing the info you want, starting with the Unique ID.


    In this example, Excel looks up Campus ID 555123123 in the first highlighted column of Spreadsheet 2.
    Note: Make sure each Unique ID is listed only once in the table_array (on the second spreadsheet) so that vLookup retrieves the correct value. For example, if 555123123 is duplicated in the table_array, where Student [email protected] is the email in one row and Student [email protected] in the other, Excel will choose one of the emails for you.
  7. Go to Col_index_num (click in it once). This identifies which column contains the information you want from Spreadsheet 2.
    Type the number of columns your field is from the Unique ID, where the Unique ID is 1. Here, the Email field is the third column.
  8. Go to Range_lookup (click in it once). Type FALSE to search for exact matches. The result will look something like this:


  9. Finally, copy and paste the formula to pull emails for the rest of the column.
    (Note: if your table array is in the same Excel workbook, put $ signs around the cell values, similar to the example below. This ensures that you reference the correct cells in the table array, meaning that the table array does not shift down when you paste the formula down. See Advanced Tip below for more details.)
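For readers who would rather script this lookup than click through the wizard, here is a rough pandas equivalent of the steps above. It is only a sketch: the sheet contents, the second Campus ID and the example.edu addresses are made up for illustration. The validate argument makes pandas raise an error when a Campus ID appears more than once in the lookup table, instead of silently picking a row.

```python
import pandas as pd
from pandas.errors import MergeError

# Spreadsheet 1: the table that needs an Email column (hypothetical data).
sheet1 = pd.DataFrame({
    "CampusID": [555123123, 555123124],
    "Name": ["Student A", "Student B"],
})
# Spreadsheet 2: the lookup table keyed by the same unique ID.
sheet2 = pd.DataFrame({
    "CampusID": [555123123, 555123124],
    "Email": ["student.a@example.edu", "student.b@example.edu"],
})

# Exact-match lookup, like VLOOKUP with Range_lookup set to FALSE.
result = sheet1.merge(
    sheet2[["CampusID", "Email"]],
    on="CampusID",
    how="left",
    validate="many_to_one",  # each CampusID may appear only once in sheet2
)
print(result)

# A duplicated ID in the lookup table is reported instead of guessed at.
extra_row = pd.DataFrame({
    "CampusID": [555123123],
    "Email": ["student.a2@example.edu"],
})
dupes = pd.concat([sheet2, extra_row], ignore_index=True)
try:
    sheet1.merge(dupes, on="CampusID", how="left", validate="many_to_one")
    duplicate_detected = False
except MergeError:
    duplicate_detected = True
print(duplicate_detected)
```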
vLookup Shortcut
If you feel comfortable with the vLookup tool instructions above, you can type the formula directly in the cell instead of using the wizard.
  1. Type the beginning of the formula: =VLOOKUP(
    The formula guide will appear below.
    (Note: You may notice Excel displays the formula in 2 places: the formula bar above and directly in the cell. You can edit the formula in either place.)


  2. Follow the guide and enter each value. Remember to insert a comma between each value.
  3. Insert a closing parenthesis ) and hit Enter. The end result will look something like this:
    =VLOOKUP(B2,'[Spreadsheet Name.xlsx]SheetName'!$B$1:$E$11,3,FALSE)
  4. Finally, copy and paste the formula to pull emails for the rest of the column. Keep relative references in mind and use $ signs where necessary. (See Advanced Tip below for more details.)
Advanced Tip on Relative References
The position of the lookup value (Unique ID) in relation to the vLookup formula is maintained when you copy and paste. If you paste the formula one cell down (to E3), it looks up the Unique ID that is also one cell down (B3). The same is true when copying right, left or up.
In other words, the formula will stay x number of columns and y number of rows away from the lookup value – no matter where you paste the formula. In our example, the formula is the fourth column from the CampusID and in the same row. No matter where you paste the formula (in this example), it will always look up the cell that is the fourth cell to the left in the same row.
However, it is possible to lock cells in place by inserting one or more $ signs. This means that no matter where you paste the formula, it will always reference the same cell. When copying and pasting the formula, use the $ sign to lock in cells.
  • To lock in the lookup value in cell B1, insert $ signs before the column and the row:
    =VLOOKUP($B$1,'[Spreadsheet2.xlsx]SheetName'!$B$1:$E$11,3,FALSE)
  • To lock in the column only, insert a $ before B only.
  • To lock in the row only, insert a $ before 1 only.


Need More Information or Help?

If you have questions about this Tableau document, please contact Melissa Chan, Office of Data Management and Analytics Services (ODMAS) at [email protected].
Keywords: Tableau Workbook Dashboard Excel 2 Two Combine Pull Data IDE
Doc ID: 90851
Owner: Steven T.
Group: Office of Data Management & Analytics Services KB
Created: 2019-04-04 11:15 CDT
Updated: 2020-06-20 04:08 CDT
Sites: Office of Data Management & Analytics Services KB