This function identifies potential join pairs between two data frames based on the overlap between the distinct values in their columns. It returns a data frame showing the possible join pairs.

identify_join_pairs(..., similarity_cutoff = 0.2)

Arguments

...

The two data frames to compare, passed as two separate arguments.

similarity_cutoff

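The minimum overlap between the distinct values of two columns for the pair to be reported, expressed as a proportion between 0 and 1 rather than a percentage. The default is 0.2.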
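
The page does not spell out how the overlap is computed. As a minimal sketch, assuming the score is the share of the distinct values in the first column that also occur in the second (the scores in the example output appear consistent with this, but it is an inference, not the documented formula), the calculation could look like this. The helper overlap_score and the sample vectors are hypothetical and not part of the package.

```r
# Hypothetical helper, not part of the package: the share of the
# distinct values in `x` that also occur in `y`.
# Assumption: the denominator is the distinct count of the first column.
overlap_score <- function(x, y) {
  ux <- unique(x)
  length(intersect(ux, unique(y))) / length(ux)
}

# Made-up sample data: 5 distinct ids, of which 2 and 3 also appear in orders.
customer_id <- c(1, 2, 3, 4, 5)
order_customer <- c(2, 3, 3, 9)

score <- overlap_score(customer_id, order_customer)
score          # 2 of 5 distinct values are shared, so 0.4
score >= 0.2   # TRUE: the pair would pass a cutoff of 0.2
```

Under this assumption the score is asymmetric: swapping the two columns changes the denominator and can change the result.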

Value

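A data frame of candidate join pairs with one row per pair and the columns data_frame1_column, data_frame2_column, and score. In the example output, rows are sorted by decreasing score.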
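
To illustrate the shape of the return value, here is a rough base R sketch that scores every column pair of two small data frames, drops pairs below the cutoff, and sorts the rest. It is not the package's implementation: sketch_join_pairs, the sample data frames, and the scoring rule (shared distinct values divided by the distinct count of the first column) are all assumptions made for illustration.

```r
# Hypothetical re-implementation for illustration only.
sketch_join_pairs <- function(df1, df2, similarity_cutoff = 0.2) {
  # Every combination of a column from df1 with a column from df2.
  pairs <- expand.grid(
    data_frame1_column = names(df1),
    data_frame2_column = names(df2),
    stringsAsFactors = FALSE
  )

  # Assumed scoring rule: shared distinct values / distinct values in df1's column.
  pairs$score <- mapply(
    function(a, b) {
      ua <- unique(df1[[a]])
      length(intersect(ua, unique(df2[[b]]))) / length(ua)
    },
    pairs$data_frame1_column,
    pairs$data_frame2_column,
    USE.NAMES = FALSE
  )

  # Keep pairs at or above the cutoff, highest score first.
  pairs <- pairs[pairs$score >= similarity_cutoff, , drop = FALSE]
  pairs <- pairs[order(pairs$score, decreasing = TRUE), , drop = FALSE]
  rownames(pairs) <- NULL
  pairs
}

# Made-up sample data.
customers <- data.frame(id = 1:5, region = c("N", "S", "E", "W", "N"))
orders <- data.frame(customer = c(2, 3, 3, 9), qty = c(10, 20, 30, 40))

result <- sketch_join_pairs(customers, orders)
result
```

With this sample data only the id and customer pair shares any values, so it is the only row that survives the default cutoff.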

Examples

identify_join_pairs(iris, iris3)
#>    data_frame1_column  data_frame2_column     score
#> 1         Sepal.Width     Sepal W..Setosa 0.6956522
#> 2         Sepal.Width Sepal W..Versicolor 0.6086957
#> 3        Sepal.Length Sepal L..Versicolor 0.6000000
#> 4        Sepal.Length  Sepal L..Virginica 0.6000000
#> 5        Sepal.Length  Petal L..Virginica 0.5714286
#> 6         Sepal.Width  Sepal W..Virginica 0.5652174
#> 7         Petal.Width  Petal W..Virginica 0.5454545
#> 8         Sepal.Width Petal L..Versicolor 0.4782609
#> 9        Petal.Length  Petal L..Virginica 0.4651163
#> 10       Petal.Length Petal L..Versicolor 0.4418605
#> 11       Sepal.Length     Sepal L..Setosa 0.4285714
#> 12        Petal.Width     Petal L..Setosa 0.4090909
#> 13        Petal.Width Petal W..Versicolor 0.4090909
#> 14       Petal.Length Sepal L..Versicolor 0.3953488
#> 15       Petal.Length     Sepal L..Setosa 0.3488372
#> 16        Petal.Width     Petal W..Setosa 0.2727273
#> 17       Sepal.Length Petal L..Versicolor 0.2571429
#> 18       Petal.Length     Sepal W..Setosa 0.2558140
#> 19       Petal.Length  Sepal L..Virginica 0.2558140
#> 20        Petal.Width Sepal W..Versicolor 0.2272727
#> 21        Sepal.Width  Petal W..Virginica 0.2173913
#> 22       Petal.Length     Petal L..Setosa 0.2093023