Fixes/render results #37

Merged May 14, 2016 (12 commits)
5 changes: 3 additions & 2 deletions Dockerfile
@@ -24,8 +24,9 @@ RUN apt-get -q update && \
     apt-get clean

 # copy requirements.txt and run pip to install all dependencies into the virtualenv.
-ADD requirements.txt /DeepOSM/requirements.txt
-RUN pip install -r /DeepOSM/requirements.txt
+ADD requirements_base.txt /DeepOSM/requirements_base.txt
+ADD requirements_cpu.txt /DeepOSM/requirements_cpu.txt
+RUN pip install -r /DeepOSM/requirements_cpu.txt
Contributor: ahh yes... I knew I was forgetting something... thanks

 RUN ln -s /home/vmagent/src /DeepOSM

 # install libosmium and pyosmium bindings
5 changes: 3 additions & 2 deletions Dockerfile.devel-gpu
@@ -24,8 +24,9 @@ RUN apt-get -q update && \
     apt-get clean

 # copy requirements.txt and run pip to install all dependencies into the virtualenv.
-ADD requirements.txt /DeepOSM/requirements.txt
-RUN pip install -r /DeepOSM/requirements.txt
+ADD requirements_base.txt /DeepOSM/requirements_base.txt
+ADD requirements_gpu.txt /DeepOSM/requirements_gpu.txt
+RUN pip install -r /DeepOSM/requirements_gpu.txt
 RUN ln -s /home/vmagent/src /DeepOSM

 # install libosmium and pyosmium bindings
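Note on the requirements split: both Dockerfiles now layer a shared requirements_base.txt under a per-target file, presumably via pip's `-r` include syntax inside a requirements file. The new files' contents are not part of this diff, so the layout below is only a guess, with the TensorFlow pin as a placeholder version:

    # requirements_cpu.txt (hypothetical layout, not from this diff)
    -r requirements_base.txt
    tensorflow==0.8.0  # the GPU variant would pin a GPU-enabled build instead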
2 changes: 1 addition & 1 deletion README.md
@@ -19,7 +19,7 @@ By default, DeepOSM will download the minimum necessary training data, and use t
 * It will be about 65% accurate, based on how the training/test data is constructed.
 * It will use a single fully connected relu layer in [TensorFlow](https://www.tensorflow.org/).

-![NAIP with Ways and Predictions](https://pbs.twimg.com/media/Cg2F_tBUcAA-wHs.png)
+![NAIP with Ways and Predictions](https://pbs.twimg.com/media/CiZVcu8UgAIYA-c.jpg)

 ## Background on Data - NAIPs and OSM PBF
39 changes: 29 additions & 10 deletions bin/create_training_data.py
@@ -54,7 +54,7 @@ def create_parser():
     )
     parser.add_argument("--extract-type",
                         default='highway',
-                        choices=['highway', 'tennis'],
+                        choices=['highway', 'tennis', 'footway', 'cycleway'],
                         help="the type of feature to identify")
     parser.add_argument("--save-clippings",
                         action='store_true',
@@ -75,16 +75,35 @@ def main():
         NAIP_SPECTRUM,
         NAIP_GRID,
     ).download_naips()
-    road_labels, naip_tiles, waymap, way_bitmap_npy = random_training_data(
-        raster_data_paths, args.extract_type, args.band_list, args.tile_size, args.pixels_to_fatten_roads, args.label_data_files, args.tile_overlap)
-    equal_count_way_list, equal_count_tile_list = equalize_data(road_labels, naip_tiles,
+
+    road_labels, naip_tiles, waymap = random_training_data(raster_data_paths,
+                                                           args.extract_type,
+                                                           args.band_list,
+                                                           args.tile_size,
+                                                           args.pixels_to_fatten_roads,
+                                                           args.label_data_files,
+                                                           args.tile_overlap)
+
+    equal_count_way_list, equal_count_tile_list = equalize_data(road_labels,
+                                                                naip_tiles,
                                                                 args.save_clippings)
-    test_labels, training_labels, test_images, training_images = split_train_test(
-        equal_count_tile_list, equal_count_way_list, args.percent_for_training_data)
-    label_types = waymap.extracter.types
-    onehot_training_labels, onehot_test_labels = format_as_onehot_arrays(label_types, training_labels, test_labels)
-    dump_data_to_disk(raster_data_paths, training_images, training_labels, test_images, test_labels,
-        label_types, onehot_training_labels, onehot_test_labels)
+
+    test_labels, training_labels, test_images, training_images = split_train_test(equal_count_tile_list,
+                                                                                  equal_count_way_list,
+                                                                                  args.percent_for_training_data)
+
+    onehot_training_labels, onehot_test_labels = format_as_onehot_arrays(waymap.extracter.types,
+                                                                         training_labels,
+                                                                         test_labels)
+
+    dump_data_to_disk(raster_data_paths,
+                      training_images,
+                      training_labels,
+                      test_images,
+                      test_labels,
+                      waymap.extracter.types,
+                      onehot_training_labels,
+                      onehot_test_labels)


 if __name__ == "__main__":
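main() now hands waymap.extracter.types directly to format_as_onehot_arrays, which turns each label into a one-hot vector over the discovered way types. That function's body is outside this diff, so the following is only a minimal sketch of the idea, with illustrative names rather than the project's:

    import numpy

    def as_onehot(labels, label_types):
        # one row per label, with a single 1.0 in the column of that label's type
        onehot = numpy.zeros((len(labels), len(label_types)))
        for index, label in enumerate(labels):
            onehot[index][label_types.index(label)] = 1.0
        return onehot

    # as_onehot(['footway', 'highway'], ['highway', 'footway', 'cycleway'])
    # -> [[0., 1., 0.], [1., 0., 0.]]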
1 change: 0 additions & 1 deletion requirements_base.txt
@@ -4,7 +4,6 @@ h5py==2.6.0
 jupyter==1.0.0
 numpy==1.11.0
 Pillow==3.2.0
-pyosmium==2.6.0
 pyproj==1.9.5.1
Contributor: out of curiosity, do we not need this?

Contributor (author): It gets installed from source in the Dockerfile; I think pyosmium might not be pip-installable at all?

Contributor (@zain, May 14, 2016): It's cool if we don't include pyosmium in the file, but we should probably add a comment explaining the dependencies that aren't included in requirements, so that it's documented at least (since people will expect to look in requirements for the Python dependencies).

Btw, I managed to install pyosmium with pip. See the last line of the first code block in my comment on issue 8 here for the magic incantation.

Contributor (author): Oh, I'll see if I can try and get that to pip install like you have then, via reqs.

Contributor: You won't be able to add the includes flag via requirements.txt (and even if you did, it probably wouldn't be good, since it takes a hardcoded path), which is why I'm fine if we don't include pyosmium.

Contributor (author): OK, I thought I could add those flags in the Dockerfile, but I see why that won't really work or help. I'll add a note to reqs about pyosmium.

 requests==2.10.0
 s3cmd==1.6.1
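The note promised in the thread above might look something like this at the top of requirements_base.txt (wording hypothetical, not part of this diff):

    # Not pip-installed here: pyosmium needs libosmium headers at build time,
    # so it is built from source in the Dockerfiles instead.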
16 changes: 6 additions & 10 deletions src/create_training_data.py
@@ -13,8 +13,8 @@
 from download_labels import download_and_extract
 from geo_util import latLonToPixel, pixelToLatLng

-# there is a 300 pixel buffer around NAIPs that should be trimmed off where NAIPs overlap...
-# using overlapping images makes wonky train/test splits
+# there is a 300 pixel buffer around NAIPs that should be trimmed off,
+# where NAIPs overlap... using overlapping images makes wonky train/test splits
 NAIP_PIXEL_BUFFER = 300

 def read_naip(file_path, bands_to_use):
@@ -60,7 +60,7 @@ def tile_naip(raster_data_path, raster_dataset, bands_data, bands_to_use, tile_s

     return all_tiled_data

-def way_bitmap_for_naip(ways, raster_data_path, raster_dataset, rows, cols, pixels_to_fatten_roads):
+def way_bitmap_for_naip(ways, raster_data_path, raster_dataset, rows, cols, pixels_to_fatten_roads=None):
     '''
     generate a matrix of size rows x cols, initialized to all zeroes,
     but set to 1 for any pixel where an OSM way runs over
@@ -79,14 +79,11 @@ def way_bitmap_for_naip(ways, raster_data_path, raster_dataset, rows, cols, pixels_to_fatten_roads=None):
     bounds = bounds_for_naip(raster_dataset, rows, cols)
     ways_on_naip = []

-    t0 = time.time()
-    print("FINDING WAYS on NAIP..."),
     for way in ways:
         for point_tuple in way['linestring']:
             if bounds_contains_point(bounds, point_tuple):
                 ways_on_naip.append(way)
                 break
-    print(" {0:.1f}s".format(time.time()-t0))
     print("EXTRACTED {} highways in NAIP bounds, of {} ways".format(len(ways_on_naip), len(ways)))

     print("MAKING BITMAP for way presence...", end="")
@@ -191,13 +188,13 @@ def random_training_data(raster_data_paths,
         rows = bands_data.shape[0]
         cols = bands_data.shape[1]

-        way_bitmap_npy[raster_data_path] = numpy.asarray(way_bitmap_for_naip(waymap.extracter.ways, raster_data_path, raster_dataset, rows, cols, pixels_to_fatten_roads))
+        way_bitmap_npy = numpy.asarray(way_bitmap_for_naip(waymap.extracter.ways, raster_data_path, raster_dataset, rows, cols, pixels_to_fatten_roads))

         left_x, right_x, top_y, bottom_y = NAIP_PIXEL_BUFFER, cols-NAIP_PIXEL_BUFFER, NAIP_PIXEL_BUFFER, rows-NAIP_PIXEL_BUFFER
         for col in range(left_x, right_x, tile_size/tile_overlap):
             for row in range(top_y, bottom_y, tile_size/tile_overlap):
                 if row+tile_size < bottom_y and col+tile_size < right_x:
-                    new_tile = way_bitmap_npy[raster_data_path][row:row+tile_size, col:col+tile_size]
+                    new_tile = way_bitmap_npy[row:row+tile_size, col:col+tile_size]
                     road_labels.append((new_tile,(col, row),raster_data_path))

         for tile in tile_naip(raster_data_path, raster_dataset, bands_data, band_list, tile_size, tile_overlap):
@@ -206,7 +203,7 @@ def random_training_data(raster_data_paths,
     assert len(road_labels) == len(naip_tiles)

     road_labels, naip_tiles = shuffle_in_unison(road_labels, naip_tiles)
-    return road_labels, naip_tiles, waymap, way_bitmap_npy
+    return road_labels, naip_tiles, waymap

 def shuffle_in_unison(a, b):
     '''
@@ -413,7 +410,6 @@ def load_data_from_disk():
         onehot_training_labels = pickle.load(infile)
     with open(CACHE_PATH + 'onehot_test_labels.pickle', 'r') as infile:
         onehot_test_labels = pickle.load(infile)
-
     print("DATA LOADED: time to unpickle/json test data {0:.1f}s".format(time.time() - t0))
     return raster_data_paths, training_images, training_labels, test_images, test_labels, label_types, \
         onehot_training_labels, onehot_test_labels
37 changes: 12 additions & 25 deletions src/download_labels.py
@@ -42,10 +42,8 @@ def __init__(self, extract_type='highway'):
     def way(self, w):
         if self.extract_type == 'tennis':
             self.extract_if_tennis_court(w)
-        elif self.extract_type == 'highway':
-            self.extract_if_highway(w)
         else:
-            print "ERROR unknown type to extract from PBF file"
+            self.extract_way_type(w)

     def extract_if_tennis_court(self, w):
         name = ''
@@ -69,43 +67,32 @@ def extract_if_tennis_court(self, w):

         self.add_linestring(w, way_dict)

-    def extract_if_highway(self, w):
-        is_highway = False
-        is_big = False
+    def extract_way_type(self, w):
+        should_extract = False
         name = ''
-        highway_type = None
+        way_type = None
         for tag in w.tags:
             if tag.k == 'name':
                 name = tag.v
-            # and tag.v in ['primary', 'secondary', 'tertiary', 'trunk']
-            if tag.k == 'highway':
-                highway_type = tag.v
-                is_highway = True
-            #try:
-            #    if tag.k == 'lanes' and int(tag.v[len(tag.v)-1]) >= 2:
-            #        is_big = True
-            #        # #for t in w.tags:
-            #        # #    print "tag {} {}".format(t.k, t.v)
-            #except:
-            #    print("exception, weird lanes designation {}".format(tag.v))
-
-        # or not is_big
-        if not is_highway:
+            if tag.k == self.extract_type:
+                way_type = tag.v
+                should_extract = True
+
+        if not should_extract:
             return

-        if not highway_type in self.types:
-            self.types.append(highway_type)
+        if not way_type in self.types:
+            self.types.append(way_type)

         way_dict = {'visible': w.visible,
                     'deleted': w.deleted,
                     'uid': w.uid,
-                    'highway_type': highway_type,
+                    'way_type': way_type,
                     'ends_have_same_id': w.ends_have_same_id(),
                     'id': w.id,
                     'tags':[]}
         for tag in w.tags:
             way_dict['tags'].append((tag.k, tag.v))

         self.add_linestring(w, way_dict)
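The net effect of this refactor: instead of a highway-only code path, extract_way_type matches whatever OSM tag key was passed as --extract-type, which is what makes the new 'footway' and 'cycleway' choices work. The core matching idea, as a standalone sketch using plain tuples in place of pyosmium tag objects (names here are illustrative, not the project's):

    def match_way_type(tags, extract_type):
        # return the value of the requested tag key, or None if the way lacks it
        for key, value in tags:
            if key == extract_type:
                return value
        return None

    # match_way_type([('name', 'Main St'), ('highway', 'primary')], 'highway')
    # -> 'primary'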
69 changes: 31 additions & 38 deletions src/render_results.py
@@ -9,54 +9,47 @@ def render_results_for_analysis(raster_data_paths,
                                 predictions,
                                 band_list,
                                 tile_size):
-    with open(CACHE_PATH + 'raster_data_paths.json', 'r') as infile:
-        raster_data_paths = json.load(infile)
-    way_bitmap_npy = {}
     for raster_data_path in raster_data_paths:
-        way_bitmap_npy[raster_data_path] = numpy.asarray(way_bitmap_for_naip(None, raster_data_path, None, None, None))
-
-    render_results_as_images(raster_data_paths,
-                             training_labels,
-                             test_labels,
-                             predictions,
-                             way_bitmap_npy,
-                             band_list,
-                             tile_size)
-
-def render_results_as_images(raster_data_paths,
-                             training_labels,
-                             test_labels,
-                             predictions,
-                             way_bitmap_npy,
-                             band_list,
-                             tile_size):
-    training_labels_by_naip = {}
-    test_labels_by_naip = {}
-    predictions_by_naip = {}
-    for raster_data_path in raster_data_paths:
-        predictions_by_naip[raster_data_path] = []
-        test_labels_by_naip[raster_data_path] = []
-        training_labels_by_naip[raster_data_path] = []
+        way_bitmap_npy = numpy.asarray(way_bitmap_for_naip(None, raster_data_path, None, None, None))
+        render_predictions(raster_data_path,
+                           training_labels,
+                           test_labels,
+                           predictions,
+                           way_bitmap_npy,
+                           band_list,
+                           tile_size)
+
+def render_predictions(raster_data_path,
+                       training_labels,
+                       test_labels,
+                       predictions,
+                       way_bitmap_npy,
+                       band_list,
+                       tile_size):
+    training_labels_by_naip = []
+    test_labels_by_naip = []
+    predictions_by_naip = []

     index = 0
     for label in test_labels:
-        predictions_by_naip[label[2]].append(predictions[index])
-        test_labels_by_naip[label[2]].append(test_labels[index])
+        if label[2] == raster_data_path:
+            predictions_by_naip.append(predictions[index])
+            test_labels_by_naip.append(test_labels[index])
         index += 1

     index = 0
     for label in training_labels:
-        training_labels_by_naip[label[2]].append(training_labels[index])
+        training_labels_by_naip.append(training_labels[index])
         index += 1

-    for raster_data_path in raster_data_paths:
-        render_results_as_image(raster_data_path,
-                                way_bitmap_npy[raster_data_path],
-                                training_labels_by_naip[raster_data_path],
-                                test_labels_by_naip[raster_data_path],
-                                band_list,
-                                tile_size,
-                                predictions=predictions_by_naip[raster_data_path])
+    render_results_as_image(raster_data_path,
+                            way_bitmap_npy,
+                            training_labels_by_naip,
+                            test_labels_by_naip,
+                            band_list,
+                            tile_size,
+                            predictions=predictions_by_naip)

 def render_results_as_image(raster_data_path, way_bitmap, training_labels, test_labels, band_list, tile_size, predictions=None):
     '''
@@ -120,7 +113,7 @@ def shade_labels(image, labels, predictions, tile_size):
     for x in range(start_x, start_x+tile_size):
         for y in range(start_y, start_y+tile_size):
             r, g, b = image.getpixel((x, y))
-            if predictions[label_index] == 1:
+            if predictions[label_index][0] < predictions[label_index][1]:
                 # shade ON predictions blue
                 image.putpixel((x, y), (r, g, 255))
             else:
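The shade_labels change tracks the new prediction format: train() now returns a two-element score pair per tile rather than a 0/1 label, so predictions[label_index][0] < predictions[label_index][1] asks whether the "on" class outscores the "off" class, i.e. an argmax over two classes. Assuming that (off, on) ordering, the check is equivalent to:

    import numpy

    prediction = numpy.array([0.2, 0.8])   # hypothetical scores for one tile
    is_on = prediction[0] < prediction[1]  # same as numpy.argmax(prediction) == 1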
13 changes: 11 additions & 2 deletions src/single_layer_network.py
@@ -57,5 +57,14 @@ def train(bands_to_use,
     model.fit(train_images, train_labels, n_epoch=number_of_epochs, shuffle=False, validation_set=(test_images, test_labels),
               show_metric=True, run_id='mlp')

-    # \TODO predict on batches of test_images, to avoid memory spike
-    return model.predict(test_images)
+    # batch predictions on the test image set, to avoid a memory spike
+    all_predictions = []
+    for x in range(0, len(test_images)-100, 100):
+        for p in model.predict(test_images[x:x+100]):
+            all_predictions.append(p)
+    remainder = len(test_images)-len(all_predictions)
+    for p in model.predict(test_images[len(all_predictions):]):
+        all_predictions.append(p)
+    assert len(all_predictions) == len(test_images)
+
+    return all_predictions
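The replacement loop predicts the test set in fixed chunks of 100 images, then one final predict covers whatever the strided range skipped (note that the remainder variable is computed but unused in this diff). The same pattern as a reusable helper, sketched under the assumption that model.predict accepts an arbitrary-length slice:

    def predict_in_batches(model, images, batch_size=100):
        # bound peak memory by predicting fixed-size chunks instead of the whole set
        predictions = []
        for start in range(0, len(images), batch_size):
            predictions.extend(model.predict(images[start:start + batch_size]))
        assert len(predictions) == len(images)
        return predictions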