Google Street View Autopilot

THE PROJECT:

Having worked at Tesla, I became fascinated by the potential of autopilot technology. Recently, I thought it would be fun to write a script for a Google Street View Autopilot.

I wrote the script in Python, using the Google Maps developer APIs. The script requests static images from the Street View Static API by providing a series of parameters, including location (as coordinates) and heading (in degrees from North). Further research turned up easier solutions using JavaScript; however, I decided to continue with Python, as I wanted practice working with the coordinate APIs and was already using JavaScript for another project.
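For reference, a minimal sketch of such a request looks like the following (the full version appears in get_streetview() in the script below; the coordinates here are illustrative):

import requests

params = {
    'size': '640x640',               # image dimensions in pixels
    'location': '27.9506,-82.4572',  # latitude,longitude (illustrative: Tampa)
    'heading': '90',                 # degrees clockwise from North
    'key': 'YOUR_API_KEY',           # your Google Maps developer key
}
response = requests.get("https://maps.googleapis.com/maps/api/streetview", params=params)
with open("image.jpg", "wb") as file:
    file.write(response.content)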

HOW IT WORKS:

The script begins with some setup: cosmetic definitions, the API key, and the configuration of a machine learning computer vision model. It then takes a start and an end address as inputs from the console. These inputs are formatted to be consistent with a URL, and the get_directions() function sends them in a request to the Google Maps Directions API. This returns a JSON response with all the information Google Maps needs to direct you to your destination, including verbal directions, distances, and coordinates. The same function parses the JSON to get the initial coordinate location, as well as the destination coordinates for each direction step. It then returns these as a latitude array and a longitude array.
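The relevant portion of the Directions API response has roughly this shape (abbreviated, with illustrative values):

{
  "routes": [{
    "legs": [{
      "steps": [{
        "start_location": {"lat": 27.9506, "lng": -82.4572},
        "end_location": {"lat": 27.9538, "lng": -82.4587}
      }]
    }]
  }]
}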

These are provided to a second function, get_coordinates(). The purpose of this function is to take the coordinates for each direction step and calculate the intermediate coordinates in between, from which we want to pull a Street View static image. It looks at the change in latitude and longitude between each pair of direction points and assigns a step size of 0.0001 coordinate units to whichever axis had the greater change. The ratio between the two changes then determines the step size along the other axis. From this, the number of intervals needed between each pair of direction points is calculated, and NumPy's linspace is used to create an array of all the coordinates we will request images from for each step. These are combined into arrays of all the coordinates we will make Street View requests for, as in the sketch below.
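As a worked example (with made-up endpoints), a step where latitude changes more than longitude would be sampled like this:

import numpy as np

lat0, lat1 = 27.9500, 27.9520    # illustrative step endpoints
lng0, lng1 = -82.4600, -82.4595  # latitude change (0.0020) dominates
interval_num = round(abs(lat1 - lat0) / 0.0001)  # 20 intervals, ~0.0001 apart
lats = np.linspace(lat0, lat1, interval_num)     # 20 evenly spaced latitudes
lngs = np.linspace(lng0, lng1, interval_num)     # longitudes scale by the 0.25 ratio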

These arrays are passed to another function, create_slideshow(). This function iterates through the coordinates and passes them to get_streetview(), which returns an image for our end product. Before doing this, it calculates the heading so that it can be passed along as well. The heading is the direction, in degrees relative to North, that the image faces. For the autopilot to keep looking in the direction it is "driving", the heading is calculated as the arctangent of the change in longitude over the change in latitude, converted to degrees. Since arctan has a range of only -90 to 90 degrees, an "if" clause adds 180 degrees when heading south, so that all 360 degrees of direction are uniquely accounted for.
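A quadrant-aware way to express the same calculation, as a minimal sketch, is atan2, which folds the "if" clause into the function itself and also handles a zero change in latitude:

import math

def compute_heading(delta_lat, delta_lng):
    # 0 = North, 90 = East; atan2 resolves the quadrant automatically
    return math.degrees(math.atan2(delta_lng, delta_lat)) % 360

Note this is an alternative formulation, not the exact code used in the script below.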

Once the latitude, longitude, and heading are passed to get_streetview(), a request is made, along with other basic parameters, to the Street View Static API to get each image. Each image is saved in this function and then modified with some additional features. At the beginning of the program I configured a machine learning computer vision model (ssd_mobilenet_v3_large_coco_2020_01_14), trained in TensorFlow, that identifies cars in each still image it is given. This is naturally a feature that you would want in a true autopilot. OpenCV is used to apply this model, place boxes around any detected cars or trucks, and overlay an updating location and heading at the top of the image. Finally, the images are displayed in sequence with OpenCV to create the end product.

Video demonstration of the Google Street View Autopilot driving around Tampa Bay, Florida.

ROOM FOR IMPROVEMENT:

There are some limitations with the Street View Static API. Because it is purely coordinate based, the product sometimes appears choppy and can jump to another road that crosses or runs alongside the current one. As mentioned above, I may remake this project in JavaScript to produce a smoother, simpler product. Additionally, more machine learning models could be applied (or even newly trained) to detect other road features, such as stop signs.

MAIN SCRIPT:

import requests
import numpy as np
import math
import cv2

# Store your Google API dev key here
dev_key = ""

# Configure the OpenCV object detection
config_file = "Google_StreetView_Residential_Driver/ssd_mobilenet_v3_large_coco_2020_01_14.pbtxt"
frozen_model = "Google_StreetView_Residential_Driver/frozen_inference_graph.pb"
model = cv2.dnn_DetectionModel(frozen_model, config_file)
model.setInputSize(320,320)
model.setInputScale(1.0/127.5)
model.setInputMean((127.5,127.5,127.5))
model.setInputSwapRB(True)

# Declare some font formatting (used by the putText calls in get_streetview below)
location_text_point = (20, 20)
heading_text_point = (20, 60)
fontScale = 1.5
fontColor = (0, 0, 255)
text_thickness = 2
fontFace = cv2.FONT_HERSHEY_PLAIN

# Function to get directions between a start and end location. Directions are returned as the coordinates of each turn
def get_directions(start, end):
    """
    This function will take in two strings, one as a start address, and the other as an end address
    It will make a request to the Google Maps Developer API and return a JSON of the directions
    It will then parse through the JSON to create a set of instructions, in the form of coordinates, that our script can use. 
    """
    directions_url = f"https://maps.googleapis.com/maps/api/directions/json?origin={start}&destination={end}&key={dev_key}"
    response = requests.get(directions_url).json()
    # The response is JSON; we will parse it to get a starting location and an end location for each step of the way
    start = response['routes'][0]['legs'][0]['steps'][0]['start_location']
    # Now create an array of all the latitudes and longitudes at each step of the way 
    lat = [start['lat']]
    lng = [start['lng']]
    # Parse through the JSON to add the rest of the lat and lng values for each step
    for i in response['routes'][0]['legs'][0]['steps']:
        lat.append(i['end_location']['lat'])
        lng.append(i['end_location']['lng'])
    return lat, lng

# This function intakes parameter information and creates a street view image
# Object detection and information filters are applied
def get_streetview(latitude, longitude, heading, i):
    # Create parameters for the request
    picture_url = "https://maps.googleapis.com/maps/api/streetview?"
    picture_params = {
        'size': '640x640',
        'location': f"{latitude},{longitude}",
        'heading': f'{heading}',
        'pitch': '0',
        'key': dev_key
    }
    # Get and save street view image. 
    # We will modify it with our machine learning and computer vision models in order to detect cars and trucks
    picture_response = requests.get(picture_url, params=picture_params)
    with open(f'images/image{i}.jpg','wb') as file:
        file.write(picture_response.content)
    # Apply the Machine Learning Object detection model for cars and trucks
    img = cv2.imread(f'images/image{i}.jpg')
    ClassIndex, confidence, bbox = model.detect(img,confThreshold=0.55)
    # Get all the indices where class ID 3 or 8 was detected.
    # In the COCO label map, 3 is "car" and 8 is "truck"; detect() only returns
    # findings above the confidence threshold set above.
    vehicle_index_array = np.append(np.where(ClassIndex == 3)[0], np.where(ClassIndex == 8)[0])
    # Draw boxes around vehicles and display location and heading information
    for index in vehicle_index_array:
        cv2.rectangle(img, bbox[index], (255, 0, 0), 2)
    cv2.putText(img, f"Location: {round(latitude,5)},{round(longitude,5)}", location_text_point, fontFace=fontFace, fontScale=fontScale, color=fontColor, thickness=text_thickness)
    cv2.putText(img, f"Heading: {heading}", heading_text_point, fontFace=fontFace, fontScale=fontScale, color=fontColor, thickness=text_thickness)
    # Display the modified image as part of the slideshow
    cv2.imshow("Slideshow", img)
    if cv2.waitKey(50) == ord('q'):
        return

def get_coordinates(lat, lng):
    """
    This function will take in the latitude and longitude arrays for the direction steps
    It will return arrays of all the coordinates we will query Google Street View images from
    """
    coordinate_lat_array = []
    coordinate_lng_array = []
    for i in range(0, len(lat) - 1):
        # step_array records the delta lat and delta lng for each direction step
        step_array = [(lat[i+1] - lat[i]), (lng[i+1] - lng[i])]
        # Whichever axis changed more gets a step size of 0.0001 (coordinate units);
        # linspace then scales the other axis by the ratio between the two deltas
        if abs(step_array[0]) > abs(step_array[1]):
            # Change in lat is greater
            interval_num = round(abs(step_array[0]) / 0.0001)
        else:
            # Change in lng is greater
            interval_num = round(abs(step_array[1]) / 0.0001)
        lat_add = np.linspace(lat[i], lat[i+1], interval_num)
        lng_add = np.linspace(lng[i], lng[i+1], interval_num)
        coordinate_lat_array.extend(lat_add)
        coordinate_lng_array.extend(lng_add)
    return coordinate_lat_array, coordinate_lng_array


def create_slideshow(coordinate_lat_array, coordinate_lng_array):
    """
    This function takes in our arrays of coordinates and displays a slideshow of
    Google Street View images sampled at each one
    """
    for i in range(0, len(coordinate_lat_array) - 1):
        # Calculate the heading. The tangent of the heading angle equals delta lng / delta lat
        # We always want to be looking in the direction we are going
        delta_lng = coordinate_lng_array[i+1] - coordinate_lng_array[i]
        delta_lat = coordinate_lat_array[i+1] - coordinate_lat_array[i]
        # Skip duplicate points at step boundaries to avoid a meaningless heading
        if delta_lat == 0 and delta_lng == 0:
            continue
        # arctan only spans -90 to 90 degrees, so add 180 degrees when heading south
        if delta_lat > 0:
            heading = np.degrees(np.arctan(delta_lng / delta_lat))
        else:
            heading = np.degrees(np.arctan(delta_lng / delta_lat)) + 180
        get_streetview(coordinate_lat_array[i], coordinate_lng_array[i], heading, i)



start_location = input("Address of start location: ")
end_location = input("Address of destination: ")
# str.replace returns a new string, so the results must be reassigned
start_location = start_location.replace(" ", "+")
end_location = end_location.replace(" ", "+")

# Get the direction coordinates
lat, lng = get_directions(start_location, end_location)

# Get the coordinates for each step size
lat_array, lng_array = get_coordinates(lat, lng)

# Create the slideshow
create_slideshow(lat_array, lng_array)