MATCH FILES is an SPSS command mostly used for merging data holding similar cases but different variables. For different cases but similar variables, use ADD FILES. MATCH FILES is also the way to go for a table lookup similar to VLOOKUP in Excel.

To merge two datasets by id, a unique case identifier, through the menus: open the first file that you wish to merge. Next, under the 'Merge Files' item in the 'Data' menu, select 'Add Variables'. If both keys have the same name, one will show up under 'Excluded Variables'.

SPSS Match Files - Basic Use

  • The most common scenario for MATCH FILES is two data files or datasets holding different variables on similar cases.
  • Each case has a unique id (identifier) in each data source. This id tells SPSS which case from one data source corresponds to which case from the other. Corresponding cases become a single case in the merged data.
  • The syntax below demonstrates a very basic MATCH FILES command. If you're not comfortable working with multiple datasets, have a look at SPSS Datasets Tutorial 1 - Basics.
  • Merge the active dataset with another open dataset or IBM SPSS Statistics data file containing the same cases but different variables. From the menus choose: Data > Merge Files. Select Add Cases or Add Variables. For more information on merging files by adding cases (rows), see Add Cases. For more information on merging files by adding variables (columns), see Add Variables.
  • To add cases through the menus, go to the menu bar and select “Data” → “Merge Files” → “Add Cases”. A new dialog box named “Add cases to data set1” opens. Select “An external SPSS data file”, click “Browse” and select the second dataset that you want to combine with “data set1”.

SPSS Match Files Syntax Example 1

*1. Create test data 1.

data list free/id test_1.
begin data
3 8 4 5 6 6
end data.
dataset name test_1.
*2. Create test data 2.

data list free/id test_2.
begin data
1 4 3 9 4 8
end data.
dataset name test_2.
*3. Match test_1 and test_2. EXECUTE runs the pending merge before the sources are closed.

match files file = test_1 / file = test_2
/by id.
execute.
*4. Close all but merged dataset.

dataset close test_1.
dataset close test_2.

SPSS Match Files - Table

  • A second common scenario is having a file with respondents and their zip codes. Note that there are probably duplicate zip codes in the respondents file.
  • If we also have a table with the city (or region) indicated by each zip code, we can merge these into the respondent data. In this case we can use MATCH FILES with one FILE (with duplicates) and one TABLE (without duplicates).
  • The syntax below demonstrates how to do this. Note that * refers to the active dataset.

SPSS Match Files Syntax Example 2

*1. Table holding zip codes and cities.

data list free/zip_code (f3.0) city(a20).
begin data
123 'Amsterdam' 456 'Haarlem' 789 's Hertogenbosch'
end data.
dataset name cities.
*2. Mini data holding respondents and their zip codes.

data list free /id zip_code.
begin data
1 123 2 123 3 123 4 456 5 456 6 456 7 789 8 789 9 789
end data.
*3. Add cities to active dataset using zip_code. EXECUTE runs the pending merge before the table is closed.

match files file = * / table = cities
/by zip_code.
execute.
*4. Close all but merged data.

dataset close cities.
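
The lookup above works because both sources happen to be sorted by zip_code and every zip code appears in the table. As a minimal sketch of what may happen otherwise, the syntax below uses hypothetical data (the dataset name cities_demo and the zip code 999 are made up for illustration): the respondents are unsorted and one zip code has no entry in the table.

```spss
* Hypothetical lookup table with the same layout as the cities table above.
data list free / zip_code (f3.0) city (a20).
begin data
123 'Amsterdam' 456 'Haarlem'
end data.
dataset name cities_demo.

* Hypothetical respondents, unsorted, with one zip code (999) absent from the table.
data list free / id zip_code.
begin data
1 456 2 999 3 123 4 456
end data.

* Sort the active dataset on the BY variable before matching.
sort cases by zip_code.

match files file = * / table = cities_demo
/by zip_code.
execute.

dataset close cities_demo.
```

The respondent with zip code 999 should stay in the merged data with an empty city, since a TABLE only contributes variables and never adds or removes cases.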

SPSS Match Files - One Data Source

  • MATCH FILES can also be used with a single data source. This is often used for reordering and/or dropping variables.
  • One option here is using the KEEP subcommand. It basically means “drop all variables except ...”.
  • Alternatively, the DROP subcommand means “keep all variables except ...”. Note that these subcommands can be used in a similar way in GET FILE, SAVE and ADD FILES commands.
  • The TO and ALL keywords are convenient here. However, in this case ALL means “all variables that haven't been addressed yet” rather than simply all variables.

SPSS Match Files Syntax Example 3

*1. Single case test data with wrong variable order.

data list free / v1 to v3 v5 v6 v7 v8 v4.
begin data
0 0 0 0 0 0 0 0
end data.
* 2. Reorder variables. Note the TO and ALL keywords here.

match files file = * / keep v1 to v3 v4 all.
execute.
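
The example above only covers KEEP. A comparable sketch for DROP, reusing the same single-case test data layout, could look like the syntax below; which variables are dropped is just an assumption for illustration.

```spss
* Single case test data, same layout as in the example above.
data list free / v1 to v3 v5 v6 v7 v8 v4.
begin data
0 0 0 0 0 0 0 0
end data.

* Drop v5 through v8. The remaining variables keep their current order.
match files file = * / drop v5 to v8.
execute.
```

Here TO follows the order of the variables in the file, so v5 to v8 covers the four adjacent variables v5, v6, v7 and v8. Unlike KEEP, DROP does not reorder anything.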

SPSS Match Files - Rules

  • Instead of merging two data sources, you may specify up to 50 data sources in one MATCH FILES command.
  • More than one variable may be used to uniquely identify cases. We'll hereafter refer to these as the BY variables since they're used on the BY subcommand. A common example is respondents having a household_id and a member_id indicating the nth member of each household. Both variables will probably have many duplicates but their combination should uniquely identify each respondent.
  • All data must be sorted on the BY variable(s) ascendingly. In case of doubt, run SORT CASES before proceeding.
  • The order of the merged variables is the order in which they're encountered. This implies that the order in which data sources are specified matters for the end result. For a demo, run the first syntax example once with file = test_1 / file = test_2 and then again with file = test_2 / file = test_1.
  • Make sure there are no duplicate variable names across data sources. If there are, values on duplicate variables that are first encountered overwrite those that are encountered later. Annoyingly, SPSS does not throw a warning if this happens.
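
Several of these rules can be combined in one short sketch. The syntax below is illustrative only: the datasets persons and incomes, their values, and the new name age_2 are all hypothetical. It sorts both sources on two BY variables and uses the RENAME subcommand so that a variable name occurring in both sources does not cause one set of values to be silently discarded.

```spss
* Hypothetical source 1: respondents identified by household_id plus member_id.
data list free / household_id member_id age.
begin data
2 1 40 1 2 33 1 1 35
end data.
sort cases by household_id member_id.
dataset name persons.

* Hypothetical source 2: same keys, plus a clashing variable name (age).
data list free / household_id member_id age income.
begin data
1 1 36 2500 2 1 41 3100 1 2 34 1800
end data.
sort cases by household_id member_id.
dataset name incomes.

* RENAME applies to the data source specified just before it.
match files file = persons
/file = incomes /rename = (age = age_2)
/by household_id member_id.
execute.
```

After this runs, the merged data should hold both age (from persons) and age_2 (from incomes), which makes any disagreement between the two sources visible instead of hidden.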

On Jan 13, 12:10 am, RichUlrich wrote:
> Or, it is *possible* to match to files without any ID variable
> in the syntax. (At least, it used to be.) (Jon - Perhaps this
> behavior ought to be available *only* when invoked by a
> special keyword?)
> If you give it no ID variable, SPSS will assume that the lines
> belong together, one by one, in the order that they exist
> respectively in the two files.
> - This should never be used if you can avoid it. Unsafe.
> - You should be *aware* of it, to avoid omitting your ID variables.
> You certainly don't want this outcome by accident.

Yes, one can still match with no use of BY. But as Rich notes, it can
be dangerous. Here's an example.

* --- Start of example --- .

data list list / id x1 (2f5.0).
begin data
1 5
2 4
3 4
4 2
5 4
end data.
dataset name dataset1 .

data list list / id x2 (2f5.0).
begin data
1 3
3 5
4 3
5 5
end data.
dataset name dataset2 .

* MATCH FILES without BY .

match files
file = 'dataset1' /
file = 'dataset2' .
list .

* MATCH FILES with BY .

match files
file = 'dataset1' /
file = 'dataset2' /
by id .
list .

* --- End of example --- .


Notice that ID number 2 is missing from the second data set. Here is
the output.

From the first MATCH FILES (without BY):

id x1 x2
1 5 3
2 4 5 <-- Incorrect matching from here down
3 4 3
4 2 5
5 4 .

From the second MATCH FILES (with BY):

id x1 x2
1 5 3
2 4 . <-- Correct!
3 4 5
4 2 3
5 4 5

Bruce Weaver