installer proximity indicator#8

Open
india-kerle wants to merge 11 commits into dev from epc_enrich

Conversation

@india-kerle
Contributor

Hey @lizgzil,

Thanks for reviewing my indicator flow - really appreciate it.

The main two scripts to review are:

  1. installer_proximity_indicator_flow.py
  2. installer_proximity_indicator_utils.py

To run the flow in the activated conda environment, run `python installer_proximity_indicator_flow.py run`. This should generate an indicator score for a sample of 10k EPC records.

Let me know if any of my calculations in the utils file seem a bit off, or if the functions `calculate_buffer_distribution` and `calculate_weighted_installer_proximity` could be optimised for speed, since the step in the flow that uses these functions takes a while to run.

Cheers!

@india-kerle india-kerle requested a review from lizgzil May 5, 2022 21:17

@lizgzil lizgzil left a comment


This looks really good! I ran it and it worked (I commented out the save bit though). Just some suggestions for style, but I doubt they will save much computing time. I may have spotted one bug on line 71 of utils, hence the requested changes (a comment below gives a fix).


3. **Generate an indicator score by weighting company buffer counts per EPC home,** where the minimum buffer count weighs more than the maximum buffer count. If a given EPC home falls within many minimum company buffers, it will have a high proximity indicator score, while the opposite is true if it falls within many maximum company buffers.
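The weighting in step 3 can be sketched roughly like this. The buffer names match the ones used elsewhere in the PR, but the weight values here are purely illustrative assumptions, not the ones in the utils file:

```python
# Illustrative sketch of step 3: score an EPC home by the buffers it falls in.
# Smaller buffers weigh more than larger ones; these weights are hypothetical.
BUFFER_WEIGHTS = {
    "min_buffer": 1.0,
    "25_buffer": 0.75,
    "50_buffer": 0.5,
    "75_buffer": 0.25,
    "max_buffer": 0.1,
}

def proximity_score(buffer_counts: dict) -> float:
    """Weighted sum of how many company buffers of each size contain the home."""
    return sum(BUFFER_WEIGHTS[name] * count for name, count in buffer_counts.items())

# A home inside many small (min) buffers scores higher than one inside
# the same number of large (max) buffers.
close_home = proximity_score({"min_buffer": 5})  # 5 * 1.0 = 5.0
far_home = proximity_score({"max_buffer": 5})    # 5 * 0.1 = 0.5
```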

<img width="814" alt="indicator_visual" src="https://user-images.githubusercontent.com/46863334/166961845-b5faebef-6e8e-4338-a2fc-817f01669d91.png">

love this image!

Comment on lines +69 to +81
```python
latlongs = installation_info[["longitude", "latitude"]].values
if len(latlongs) > 1:
    central_point = calculate_point_midpoint(latlongs[0], latlongs[1])
elif company_name in list(set(cleaned_installer_companies.installer_name)):
    central_point = tuple(
        cleaned_installer_companies[
            cleaned_installer_companies["installer_name"] == company_name
        ][["longitude", "latitude"]].values[0]
    )

dists = [
    distance(central_point, install_latlong) for install_latlong in latlongs
]
```

I think you can utilise a bit more of pandas here. Also I think you need a final else, even if it's just an error message.

```python
if len(installation_info) > 1:
    central_point = installation_info[["longitude", "latitude"]].mean()
elif company_name in set(cleaned_installer_companies.installer_name):
    central_point = cleaned_installer_companies[
        cleaned_installer_companies["installer_name"] == company_name
    ][["longitude", "latitude"]].values[0]
else:
    print("some sort of message")

def func_name(long, lat, central_point):
    return distance(tuple(central_point), (long, lat))

dists = installation_info.apply(
    lambda x: func_name(x["longitude"], x["latitude"], central_point), axis=1
).tolist()
```
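As a self-contained illustration of that row-wise `apply` pattern, here is a toy version with made-up coordinates and a simple Euclidean function standing in for the geodesic `distance` used in the utils file:

```python
import pandas as pd

# Toy stand-in for the geodesic distance function used in the utils file.
def euclidean(p, q):
    return ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) ** 0.5

# Hypothetical installation records for one company.
installation_info = pd.DataFrame({"longitude": [0.0, 2.0], "latitude": [0.0, 0.0]})

# Column-wise mean gives a central point as a Series (longitude=1.0, latitude=0.0).
central_point = installation_info[["longitude", "latitude"]].mean()

# Row-wise distance from the central point to each installation.
dists = installation_info.apply(
    lambda x: euclidean(tuple(central_point), (x["longitude"], x["latitude"])),
    axis=1,
).tolist()
```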

```python
for company_name, installation_info in clean_installations.groupby(
    "installer_name"
):
    latlongs = installation_info[["longitude", "latitude"]].values
```

warning! you use "latlongs" as the variable name but the order is long - lat. Just thought I'd flag in case this means something needs to be switched around
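To illustrate why the ordering matters: many geodesic helpers (geopy's `distance`, for example) expect `(latitude, longitude)` tuples, so selecting columns in `["longitude", "latitude"]` order silently swaps the axes. A toy check with made-up points and a hypothetical distance function:

```python
# Two points as (lat, lon): London-ish and a point one degree further north.
point_a = (51.5, -0.1)
point_b = (52.5, -0.1)

# Toy "distance" that, like geopy's, treats the first element as latitude.
def toy_distance(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

correct = toy_distance(point_a, point_b)        # one degree apart
swapped = toy_distance(point_a[::-1], point_b)  # axes mixed up: huge value
```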

```python
central_point = tuple(
    cleaned_installer_companies[
        cleaned_installer_companies["installer_name"] == company_name
    ][["longitude", "latitude"]].values[0]
)
```

You are taking the first one with this name, but are there any cases where there might be multiple with the same name?

```python
):
    latlongs = installation_info[["longitude", "latitude"]].values
    if len(latlongs) > 1:
        central_point = calculate_point_midpoint(latlongs[0], latlongs[1])
```

I don't think this is the centroid of all the lat longs but rather the midpoint of just the first and second?

Comment on lines +85 to +97
```python
comp_buffers = [
    generate_geodesic_point_buffer(
        central_point[1], central_point[0], dist.km
    )
    for dist in (min(dists), dist_25, dist_50, dist_75, max(dists))
]
company_install_shapes[company_name] = {
    "min_buffer": comp_buffers[0],
    "25_buffer": comp_buffers[1],
    "50_buffer": comp_buffers[2],
    "75_buffer": comp_buffers[3],
    "max_buffer": comp_buffers[4],
}
```

perhaps slightly nicer:

```python
company_install_shapes[company_name] = {
    buffer_name: generate_geodesic_point_buffer(central_point[1], central_point[0], dist.km)
    for buffer_name, dist in [("min_buffer", min(dists)), ("25_buffer", dist_25), ....]
}
```

Comment on lines +126 to +128
```python
"lost ",
len(self.epc_data) - len(self.epc_geo_data),
" records when geocoding.",
```

I noticed len(self.epc_geo_data) was 9999. Made me wonder if records are lost because of some sort of upper bound to the length? (no idea about this really, but 9999 just felt like a curious number)

Comment on lines +120 to +123
```python
in_buffer = [shape.contains(epc_geo) for shape in company_shapes.values()]
if any(in_buffer) is not False:
    smallest_buffer = list(company_shapes.keys())[in_buffer.index(True)]
    buffer_distribution.append(smallest_buffer)
```

I think you can do

```python
for buffer_name, shape in company_shapes.items():
    if shape.contains(epc_geo):
        buffer_distribution.append(buffer_name)
        break
```
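A self-contained sketch of that early-exit loop, using a hypothetical stand-in for the geodesic buffer shapes (just radii with a `contains` method), so the behaviour can be seen without the real geometry:

```python
# Hypothetical stand-in for a buffer polygon: a radius around the centre.
class FakeBuffer:
    def __init__(self, radius):
        self.radius = radius

    def contains(self, point):
        return abs(point) <= self.radius

# Ordered smallest to largest, as company_install_shapes is built.
company_shapes = {
    "min_buffer": FakeBuffer(1),
    "50_buffer": FakeBuffer(5),
    "max_buffer": FakeBuffer(10),
}

epc_geo = 3  # toy "point" 3 units from the centre
buffer_distribution = []

# Record only the smallest buffer containing the point, then stop early.
for buffer_name, shape in company_shapes.items():
    if shape.contains(epc_geo):
        buffer_distribution.append(buffer_name)
        break
```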

```python
"""
weighted_installer_closeness_score = 0
for buffer_dist, buffer_count in buffer_distb.items():
    if "min_" in buffer_dist:
```

to cut down on ink, before the `if` you could assign `buffer_count / buffer_distb["max_buffer"]` to a variable and then use it in the rest of the calculations
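For instance, a sketch with made-up buffer counts and purely illustrative weights (the full weighting code isn't shown in this hunk, so the branch bodies here are assumptions):

```python
# Hypothetical buffer counts for one EPC home.
buffer_distb = {"min_buffer": 4, "50_buffer": 2, "max_buffer": 8}

weighted_installer_closeness_score = 0.0
for buffer_dist, buffer_count in buffer_distb.items():
    # Compute the shared ratio once, instead of repeating it in each branch.
    ratio = buffer_count / buffer_distb["max_buffer"]
    if "min_" in buffer_dist:
        weighted_installer_closeness_score += 1.0 * ratio  # illustrative weight
    else:
        weighted_installer_closeness_score += 0.5 * ratio  # illustrative weight
```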


@lizgzil lizgzil left a comment


sorry for such a long delay in looking at this PR again 😳
I couldn't get it to run since I don't have asf-core-data/outputs/MCS/cleaned_geocoded_mcs_installations.csv. I've made a suggested change since I think this is correct, but see what you think.

```python
):
    if len(installation_info) > 1:
        central_point = installation_info[["longitude", "latitude"]].mean()
    central_point = cleaned_installer_companies[["longitude", "latitude"]].median()
```

Suggested change

```diff
-central_point = cleaned_installer_companies[["longitude", "latitude"]].median()
+central_point = cleaned_installer_companies[cleaned_installer_companies["installer_name"] == company_name][["longitude", "latitude"]].median()
```
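To see why the filter matters, a toy example with made-up installer data: the unfiltered median mixes every company's coordinates, while the filtered one uses only the company in question.

```python
import pandas as pd

cleaned_installer_companies = pd.DataFrame(
    {
        "installer_name": ["a", "a", "b"],
        "longitude": [0.0, 2.0, 100.0],
        "latitude": [0.0, 0.0, 0.0],
    }
)
company_name = "a"

# Median over ALL companies: skewed by company "b" far away.
unfiltered = cleaned_installer_companies[["longitude", "latitude"]].median()

# Median over just this company's rows (the suggested change).
filtered = cleaned_installer_companies[
    cleaned_installer_companies["installer_name"] == company_name
][["longitude", "latitude"]].median()
```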

@india-kerle (Contributor, Author)


thanks for this!!! I've got to re-jog my memory with this one but I'll wrap it up soon :)
