identify_join_pairs.Rd
This function identifies potential join pairs between two data frames based on the overlap between the distinct values in their columns. It returns a data frame showing the possible join pairs.
identify_join_pairs(..., similarity_cutoff = 0.2)
...: Two data frames to compare, passed as separate arguments.
similarity_cutoff: The minimum proportion of overlap (between 0 and 1) between the distinct values of two columns for the pair to be reported. Defaults to 0.2.
A data frame of candidate join pairs with the columns data_frame1_column, data_frame2_column, and score, sorted by descending score.
identify_join_pairs(iris, iris3)
#>    data_frame1_column  data_frame2_column     score
#> 1         Sepal.Width     Sepal W..Setosa 0.6956522
#> 2         Sepal.Width Sepal W..Versicolor 0.6086957
#> 3        Sepal.Length Sepal L..Versicolor 0.6000000
#> 4        Sepal.Length  Sepal L..Virginica 0.6000000
#> 5        Sepal.Length  Petal L..Virginica 0.5714286
#> 6         Sepal.Width  Sepal W..Virginica 0.5652174
#> 7         Petal.Width  Petal W..Virginica 0.5454545
#> 8         Sepal.Width Petal L..Versicolor 0.4782609
#> 9        Petal.Length  Petal L..Virginica 0.4651163
#> 10       Petal.Length Petal L..Versicolor 0.4418605
#> 11       Sepal.Length     Sepal L..Setosa 0.4285714
#> 12        Petal.Width     Petal L..Setosa 0.4090909
#> 13        Petal.Width Petal W..Versicolor 0.4090909
#> 14       Petal.Length Sepal L..Versicolor 0.3953488
#> 15       Petal.Length     Sepal L..Setosa 0.3488372
#> 16        Petal.Width     Petal W..Setosa 0.2727273
#> 17       Sepal.Length Petal L..Versicolor 0.2571429
#> 18       Petal.Length     Sepal W..Setosa 0.2558140
#> 19       Petal.Length  Sepal L..Virginica 0.2558140
#> 20        Petal.Width Sepal W..Versicolor 0.2272727
#> 21        Sepal.Width  Petal W..Virginica 0.2173913
#> 22       Petal.Length     Petal L..Setosa 0.2093023
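
The scoring idea can be illustrated with a minimal base R sketch. This is not the package's implementation: `overlap_score()` and `sketch_join_pairs()` are hypothetical names, and the sketch assumes the score is the number of shared distinct values divided by the number of distinct values in the first data frame's column, with pairs kept when the score is at least `similarity_cutoff`. The actual function may normalise or filter differently.

```r
# Hypothetical helper (not part of the package): share of the distinct,
# non-missing values in x that also occur in y.
overlap_score <- function(x, y) {
  ux <- unique(x[!is.na(x)])
  uy <- unique(y[!is.na(y)])
  if (length(ux) == 0) return(0)
  length(intersect(ux, uy)) / length(ux)
}

# Hypothetical sketch: score every column pair, keep pairs at or above
# the cutoff (an assumption), and sort by descending score.
sketch_join_pairs <- function(df1, df2, similarity_cutoff = 0.2) {
  pairs <- expand.grid(
    data_frame1_column = names(df1),
    data_frame2_column = names(df2),
    stringsAsFactors = FALSE
  )
  pairs$score <- mapply(
    function(a, b) overlap_score(df1[[a]], df2[[b]]),
    pairs$data_frame1_column,
    pairs$data_frame2_column,
    USE.NAMES = FALSE
  )
  pairs <- pairs[pairs$score >= similarity_cutoff, , drop = FALSE]
  pairs <- pairs[order(-pairs$score), , drop = FALSE]
  rownames(pairs) <- NULL
  pairs
}

# Toy data: customer_id and id share all of customer_id's distinct values.
orders <- data.frame(customer_id = c(1, 2, 2, 3), amount = c(10, 20, 30, 40))
customers <- data.frame(id = c(1, 2, 3, 4), age = c(30, 40, 50, 60))
sketch_join_pairs(orders, customers)
```

On this toy data the sketch ranks `customer_id`/`id` first with a score of 1. It also reports `amount`/`age` with a score of 0.5, because the values 30 and 40 happen to occur in both columns. A high score therefore marks a candidate join key, not a confirmed one.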