In R you use the `merge` function to combine data frames. It identifies the columns or rows that the two data frames have in common. In its simplest form, `merge` finds the intersection of the data.

> I think the easiest way is probably
>
> data$z <- rowMeans(data[, c('x', 'y')], na.rm=TRUE)
>
> Best,
> Ista
>
> On Fri, Feb 25, 2011 at 12:12 AM, Andrew Anglemyer
> <[hidden email]> wrote:
> > I am trying to combine two columns in a data frame into one column. Some
> > values in either column are missing, but not in the same row for the two
> > different columns. Additionally, when both columns in a row contain data,
> > the data are identical. I want a new column with the identical data or the
> > data from the column with observed data. For example:
> >
> > I have
> >>data
> >   id  x  y
> > 1  a  1 NA
> > 2  b  2  2
> > 3  c  3  3
> > 4  d NA  4
> >
> > And I want
> >>new.data
> >   id  x  y z
> > 1  a  1 NA 1
> > 2  b  2  2 2
> > 3  c  3  3 3
> > 4  d NA  4 4
> >
> > I've looked through the help and there are column combining solutions, but
> > they don't seem to work well for this solution.
> > Thanks for any help!
> > Andy


> --
> Ista Zahn
> Graduate student
> University of Rochester
> Department of Clinical and Social Psychology
> http://yourpsyche.org
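The suggestion in the reply can be tried on the data from the question. The sketch below rebuilds that data frame by hand (the `data.frame()` call is an assumption about how the data were stored) and also shows an `ifelse()` variant, which is not part of the original thread. Note that `rowMeans()` averages the two columns, so it only reproduces the observed value because the question states that `x` and `y` are identical whenever both are present; a row with both values missing would yield `NaN`.

```r
# Data from the question, rebuilt by hand (storage types are assumed).
data <- data.frame(id = c("a", "b", "c", "d"),
                   x  = c(1, 2, 3, NA),
                   y  = c(NA, 2, 3, 4))

# Approach from the reply: mean of the non-missing values in each row.
data$z <- rowMeans(data[, c("x", "y")], na.rm = TRUE)

# Alternative sketch (not from the thread): take x, fall back to y when x is NA.
z_alt <- ifelse(is.na(data$x), data$y, data$x)

print(data)
```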

`merge` is a generic function whose principal method is for data frames: the default method coerces its arguments to data frames and calls the `'data.frame'` method.

By default the data frames are merged on the columns with names they both have, but separate specifications of the columns can be given by `by.x` and `by.y`. The rows in the two data frames that match on the specified columns are extracted and joined together. If there is more than one match, all possible matches contribute one row each. For the precise meaning of ‘match’, see `match`.
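As a small illustration of `by.x` and `by.y`, the sketch below merges two hypothetical data frames whose key columns have different names. The data frames `customers` and `orders` and all their column names are made up for this example.

```r
# Hypothetical toy data; names are illustrative only.
customers <- data.frame(cust_id = c(1, 2, 3),
                        name    = c("Ann", "Bob", "Cy"))
orders <- data.frame(customer = c(1, 1, 3, 4),
                     amount   = c(10, 20, 30, 40))

# The key is called cust_id in x and customer in y.
m <- merge(customers, orders, by.x = "cust_id", by.y = "customer")

# Customer 1 matches two orders, so it contributes two rows;
# customer 2 and order 4 have no match and are dropped.
print(m)
```

The key column in the result takes its name from `by.x`.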

Columns to merge on can be specified by name, number or by a logical vector: the name `'row.names'` or the number `0` specifies the row names. If specified by name it must correspond uniquely to a named column in the input.
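A minimal sketch of merging on row names, using two made-up data frames `a` and `b`. Both spellings described above (`by = "row.names"` and `by = 0`) are shown; the row names end up in a column of the result.

```r
# Hypothetical data frames keyed by their row names.
a <- data.frame(x = c(1, 2, 3), row.names = c("r1", "r2", "r3"))
b <- data.frame(y = c(10, 30),  row.names = c("r1", "r3"))

m_name <- merge(a, b, by = "row.names")  # by name
m_zero <- merge(a, b, by = 0)            # by number

# Only r1 and r3 occur in both inputs.
print(m_name)
```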

If `by` or both `by.x` and `by.y` are of length 0 (a length zero vector or `NULL`), the result, `r`, is the *Cartesian product* of `x` and `y`, i.e., `dim(r) = c(nrow(x)*nrow(y), ncol(x) + ncol(y))`.
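The dimension formula can be checked on a small case. The inputs `x` and `y` below are arbitrary toy data with no column names in common.

```r
# Toy inputs with 3 and 2 rows and one column each.
x <- data.frame(a = 1:3)
y <- data.frame(b = c("u", "v"))

# by = NULL requests the Cartesian product.
r <- merge(x, y, by = NULL)

# Every row of x is paired with every row of y.
expected_dim <- c(nrow(x) * nrow(y), ncol(x) + ncol(y))
print(dim(r))
print(expected_dim)
```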

If `all.x` is true, all the non-matching cases of `x` are appended to the result as well, with `NA` filled in the corresponding columns of `y`; analogously for `all.y`.
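The effect of `all.x` and `all.y` on unmatched rows can be seen in a short sketch. The data frames and the column names `k`, `vx` and `vy` are invented for the example.

```r
# Toy data: key 1 exists only in x, key 4 only in y.
x <- data.frame(k = c(1, 2, 3), vx = c("a", "b", "c"))
y <- data.frame(k = c(2, 3, 4), vy = c("B", "C", "D"))

left  <- merge(x, y, all.x = TRUE)  # keeps k = 1, with NA in vy
right <- merge(x, y, all.y = TRUE)  # keeps k = 4, with NA in vx

print(left)
print(right)
```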

If the columns in the data frames not used in merging have any common names, these have `suffixes` (`'.x'` and `'.y'` by default) appended to try to make the names of the result unique. If this is not possible, an error is thrown.
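A brief sketch of the suffix behaviour, using two hypothetical data frames that share a non-key column called `value`. The custom suffixes `_left` and `_right` are arbitrary choices for illustration.

```r
# Both inputs have a non-key column named "value".
x <- data.frame(id = 1:2, value = c(10, 20))
y <- data.frame(id = 1:2, value = c(0.1, 0.2))

m_default <- merge(x, y, by = "id")
m_custom  <- merge(x, y, by = "id", suffixes = c("_left", "_right"))

print(names(m_default))
print(names(m_custom))
```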

If a `by.x` column name matches one of `y`, and if `no.dups` is true (as by default), the `y` version gets suffixed as well, avoiding duplicate column names in the result.
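The situation described here arises when the key column of `x` shares its name with a non-key column of `y`. The sketch below sets that up with made-up data; it assumes a version of R recent enough to have the `no.dups` argument.

```r
# x is keyed on "key"; y is keyed on "id" but also has a data column "key".
x <- data.frame(key = c(1, 2), a = c("p", "q"))
y <- data.frame(id  = c(1, 2), key = c("K1", "K2"))

# Default no.dups = TRUE: y's "key" column should be renamed with the y suffix
# so that it does not clash with the merge key.
m <- merge(x, y, by.x = "key", by.y = "id")

print(names(m))
```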

The complexity of the algorithm used is proportional to the length of the answer.

In SQL database terminology, the default value of `all = FALSE` gives a *natural join*, a special case of an *inner join*. Specifying `all.x = TRUE` gives a *left (outer) join*, `all.y = TRUE` a *right (outer) join*, and both (`all = TRUE`) a *(full) outer join*. DBMSes do not match `NULL` records, equivalent to `incomparables = NA` in R.
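The four join types and the `incomparables` remark can be compared side by side. The sketch below uses invented data; the second pair of data frames contains an `NA` key in each input to show how `incomparables = NA` changes the match.

```r
# Toy data: keys 2 and 3 are shared, 1 is only in x, 4 only in y.
x <- data.frame(k = c(1, 2, 3), vx = c("a", "b", "c"))
y <- data.frame(k = c(2, 3, 4), vy = c("B", "C", "D"))

inner <- merge(x, y)                 # natural / inner join
left  <- merge(x, y, all.x = TRUE)   # left outer join
right <- merge(x, y, all.y = TRUE)   # right outer join
full  <- merge(x, y, all = TRUE)     # full outer join

# Row counts per join type.
print(c(inner = nrow(inner), left = nrow(left),
        right = nrow(right), full = nrow(full)))

# Keys containing NA: by default NA matches NA, unlike NULL in SQL.
xn <- data.frame(k = c(2, NA), vx = c("b", "c"))
yn <- data.frame(k = c(2, NA), vy = c("B", "D"))

na_matched <- merge(xn, yn)                      # NA keys are paired
na_ignored <- merge(xn, yn, incomparables = NA)  # NA keys are not matched

print(nrow(na_matched))
print(nrow(na_ignored))
```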
