tidygoogleway

The googleway package provides some excellent and highly versatile methods for querying and analyzing data from the Google Maps APIs.

tidygoogleway builds on the functionality in googleway with a single purpose - to provide a tidy interface to the Google Places API. The methods in this package assume that you are starting with a dataframe/tibble of location data that you wish to enrich with data from Google Places.

Installation

You can install tidygoogleway from GitHub using the following command:

# You must have devtools installed first
devtools::install_github("joshmuncke/tidygoogleway")

Setup

To use this package you’ll need a Google Places API key. You can register this key for your R session using googleway::set_key and it will be picked up automatically by tidygoogleway.

googleway::set_key("<YOUR API KEY>")

Usage

The add_google_places function expects a dataframe with, at minimum, fields containing the name and address of the locations you wish to add Google Places data to. It returns the dataframe with the relevant Places data appended (i.e. it’s pipe-able).

A Google Places search will often return multiple results. In that case, add_google_places compares the location name and address you provide against the values returned from Google using a string similarity measure, and uses this to pick the best match. If you also supply latitude and longitude fields, add_google_places factors geographic distance into the calculation too.
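As a rough illustration of the matching idea described above, the sketch below scores a few candidate results against a supplied location and picks the closest one. It is not tidygoogleway's actual implementation: the helper names (string_distance, haversine_km), the candidate rows and the coordinates are all made up for the example, and the way the distances are combined (a geometric mean of string and geographic distance) is an assumption based on how this README describes the mean_distance column.

```r
# Hypothetical helpers for illustration only; not part of tidygoogleway.

# Normalised Levenshtein distance in [0, 1], using base R's adist()
string_distance <- function(a, b) {
  a <- tolower(a)
  b <- tolower(b)
  as.numeric(adist(a, b)) / pmax(nchar(a), nchar(b), 1)
}

# Great-circle (haversine) distance in kilometres
haversine_km <- function(lat1, lon1, lat2, lon2) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  h <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * 6371 * asin(pmin(1, sqrt(h)))
}

# The location we supplied (coordinates are made-up illustrative values)
supplied <- list(
  name = "McDonalds",
  address = "2457 Lincoln Blvd, Venice, CA 90291",
  lat = 34.00, lon = -118.46
)

# Made-up candidate results standing in for a Places search response
candidates <- data.frame(
  name = c("McDonald's", "Burger King", "McDonald's"),
  address = c("2457 Lincoln Blvd, Venice, CA 90291",
              "2400 Lincoln Blvd, Venice, CA 90291",
              "4680 Lincoln Blvd, Los Angeles, CA 90292"),
  lat = c(34.00, 34.001, 33.98),
  lon = c(-118.46, -118.461, -118.44),
  stringsAsFactors = FALSE
)

# Average the name and address distances, then combine with geo distance
string_dist <- (string_distance(supplied$name, candidates$name) +
  string_distance(supplied$address, candidates$address)) / 2
geo_dist <- haversine_km(supplied$lat, supplied$lon,
                         candidates$lat, candidates$lon)
combined <- sqrt(string_dist * geo_dist)  # geometric mean (assumed)

# Index of the best-matching candidate (smallest combined distance)
best <- which.min(combined)
candidates[best, ]
```

One consequence of a geometric mean is that a candidate at exactly the supplied coordinates scores zero regardless of its name, so a real implementation may need to handle that case separately.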

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(magrittr)
library(furrr)
#> Loading required package: future
library(purrr)
#> 
#> Attaching package: 'purrr'
#> The following object is masked from 'package:magrittr':
#> 
#>     set_names
library(tidygoogleway)

# The mcdonalds dataframe contains the name and address of 11 McDonalds locations in Los Angeles
mcdonalds %>% head(5)
#> # A tibble: 5 x 2
#>   name      address                                    
#>   <chr>     <chr>                                      
#> 1 McDonalds 2809 N Lincoln Blvd Santa Monica, CA 90405 
#> 2 McDonalds 4680 Lincoln Blvd, Los Angeles, CA 90292   
#> 3 McDonalds 2457 Lincoln Blvd, Venice, CA 90291        
#> 4 McDonalds 1540 2nd Ave, Santa Monica, CA 90405       
#> 5 McDonalds 2902 West Pico Blvd, Santa Monica, CA 90405

# Now add Google Places data to our dataframe
enriched <- mcdonalds %>% add_google_places(name, address, radar = FALSE)
#> The radar argument is now deprecated
#> The radar argument is now deprecated
#> The radar argument is now deprecated
#> The radar argument is now deprecated
#> The radar argument is now deprecated
#> The radar argument is now deprecated
#> The radar argument is now deprecated
#> The radar argument is now deprecated
#> The radar argument is now deprecated
#> The radar argument is now deprecated
#> The radar argument is now deprecated
enriched %>% select(name, address, google_place_id, google_rating)
#> # A tibble: 11 x 4
#>    name     address                     google_place_id       google_rating
#>    <chr>    <chr>                       <chr>                         <dbl>
#>  1 McDonal… 2809 N Lincoln Blvd Santa … ChIJG1-i-tm6woARShGW…           3.6
#>  2 McDonal… 4680 Lincoln Blvd, Los Ang… ChIJsWoNelHBwoARe6bL…           3.6
#>  3 McDonal… 2457 Lincoln Blvd, Venice,… ChIJo_SkgY26woARR0FJ…           3.5
#>  4 McDonal… 1540 2nd Ave, Santa Monica… ChIJIzmJms-kwoARsrO3…           3.6
#>  5 McDonal… 2902 West Pico Blvd, Santa… ChIJlaqRZhe7woARwM2i…           3.5
#>  6 McDonal… 2712 Santa Monica Blvd, Sa… ChIJEc3LTka7woARUaiX…           3.5
#>  7 McDonal… 11300 National Blvd, Los A… ChIJM7ZVgq67woARCeiE…           3.7
#>  8 McDonal… 10623 Venice Blvd, Los Ang… ChIJZ295STC6woARhE1O…           3.5
#>  9 McDonal… 3571 Rosecrans Ave, Hawtho… ChIJL_usBci1woARxjKJ…           3.8
#> 10 McDonal… 15810 Crenshaw Blvd, Garde… ChIJgzc3DaG1woARezwC…           3.9
#> 11 McDonal… 101 W Manchester Ave, Los … ChIJMb-tCr_JwoARwjL-…           3.6

By default, only the best matching location is returned for each input row, so the number of rows out is the same as the number of rows in. If you wish to override this behaviour and return multiple results, use .keep_all = TRUE.

Note that if you set .keep_all = TRUE you may end up with more rows than you started with. These can be filtered using the mean_distance column (geometric mean of geo-distance and string distance) or google_result_number (ordering of results from the Google Places API).
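A minimal sketch of that filtering step is shown below, using a made-up tibble in place of real add_google_places output. Only the mean_distance and google_result_number column names come from this README; the example values, the use of address as the grouping key, and the assumption that a smaller mean_distance means a better match are illustrative.

```r
library(dplyr)

# Made-up stand-in for the output of add_google_places(..., .keep_all = TRUE)
multi <- tibble(
  address = c("A St", "A St", "B Ave", "B Ave", "B Ave"),
  google_result_number = c(1, 2, 1, 2, 3),
  mean_distance = c(0.40, 0.05, 0.10, 0.30, 0.20)
)

# Option 1: keep the closest match per input location
best_by_distance <- multi %>%
  group_by(address) %>%
  slice_min(mean_distance, n = 1, with_ties = FALSE) %>%
  ungroup()

# Option 2: keep whatever Google ranked first for each location
first_from_google <- multi %>%
  filter(google_result_number == 1)
```

The two options can disagree: for "A St" in this toy data, Google's first result is not the one with the smallest mean_distance.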

Parallel processing

These use cases often involve iterating over a large number of locations. To speed this up (and provide progress visibility) add_google_places uses the furrr package.

N.B. To make use of the parallel processing capabilities you must set plan(multiprocess) before running add_google_places. This syntax should work on Windows and Mac. Note that recent releases of the future package deprecate multiprocess in favour of multisession.
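The snippet below sketches how a parallel plan might be set up before calling add_google_places. It assumes a recent version of the future package, where plan(multisession) is the usual replacement for the deprecated multiprocess strategy; the worker count and the toy future_map_dbl call are illustrative, and the real add_google_places call is left commented out because it needs an API key.

```r
library(future)
library(furrr)

# Start background R sessions; multisession works on Windows, macOS and Linux
plan(multisession, workers = 2)

# Toy stand-in for per-location work, just to confirm the plan is active
squares <- future_map_dbl(1:4, function(x) x^2)

# With a plan set, the enrichment call is written as usual (requires an API key):
# enriched <- mcdonalds %>% add_google_places(name, address, radar = FALSE)

# Return to sequential processing when finished
plan(sequential)
```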
