Machine Problem: More on searching

Overview

The goal for this assignment is to get more comfortable with the search functions explored in class, and to adapt them to the problem of searching a graph.

You will be writing a program that enables the user to search for a route between any two airports, given a maximum "hop distance" --- i.e., the maximum distance between any two airports on the route. A route should not contain a given airport more than once, and in addition may not always exist for a some hop distance. For airport information, we will be using the open source database maintained at OpenFlights.org.

On Graphs

Unlike trees, in which every two nodes are connected by precisely one path (formed by parent-child connections), graphs are data structures where nodes can be connected in entirely arbitrary ways. In a graph, nodes are called vertices, and connections between nodes are called edges.

This is often undesirable, however, as re-visiting a vertex during graph traversal might make for a sub-optimal route.

Consider the following graph, in which the vertices represent cities in the United States. The edges between the cities represent bi-directional connections, and the labels on the edges correspond to inter-city distances. Only cities with edges between them are considered neighbors, for the purposes of traversal.

Sample graph 1

It should be clear that there are potentially multiple routes between any two vertices in the graph. From Miami to Seattle, for instance, we may choose the [Miami, Chicago, Los Angeles, Seattle] route or the [Miami, New York, Chicago, Seattle] route, among others. Routes containing cycles are also possible; e.g., [Miami, Chicago, St. Louis, New York, Chicago, Seattle]. Your solution will need to find a route without a cycle.

Each route has a corresponding distance, which is simply the sum of the edges traversed. The [Miami, Chicago, Los Angeles, Seattle] route has a total distance of 1378 + 2015 + 1137 = 4530 miles, for instance.

Building a graph from airport data

Our airport dataset is a file that contains information for over 7100 airports located across the globe, where each row contains comma-separated fields describing a distinct airport. The most important fields to us are the airport name and the latitude and longitude, the latter of which we can use to compute distances between different airports.

Recall that our search function will be driven by the "maximum hop distance" between any two airports on the route --- the intuition being perhaps that we are seeking to travel between two cities using a plane with limited range. The cities reachable from any given city will therefore be obtained by filtering out all cities from the dataset which are more than the maximum hop distance away. This would form the basis of a neighbors function for our graph.

Data types and functions to compute the distance between two latitude/longitude pairs and represent routes are provided; here are some of them:

data Location = Location {
  locName :: String,
  latitude :: Double,
  longitude :: Double
}

data Path = Path {
  totalDist :: Double,
  route :: [Location]
} deriving (Eq)


geoDistance :: (Double,Double) -> (Double,Double) -> Double
geoDistance (lat1,lon1) (lat2,lon2)
  = 3959 * (acos (sindeg(lat1) * sindeg(lat2)
                  + cosdeg(lat1) * cosdeg(lat2) * cosdeg(lon2-lon1)))
    where sindeg = sin . toRadian
          cosdeg = cos . toRadian
          toRadian deg = pi/180.0 * deg

distanceBetween :: Location -> Location -> Double
distanceBetween (Location _ lat1 lon1) 
                (Location _ lat2 lon2) = geoDistance (lat1, lon1) 
                                                     (lat2, lon2)

Implementation details

You are to implement the function findPath, with the following type:

findPath :: [Location] -> String -> String -> Double -> Maybe Path

The first argument is the list of locations (airports), the second and third are the names of the origin and destination aiports, and the fourth is the maximum hop distance. The function should return a Just Path if a route is found and Nothing otherwise.

We recommend, in addition to findPath, that you implement a more general purpose graphSearch function that follows the same general design as that of the treeSearch function presented in class. Your graph search function will, however, need to ensure that new nodes discovered have not already been visited, which will require a list of previously visited nodes to be passed along to each recursive call.

The locations list will be loaded using a comma-separated-value (CSV) library and passed to the findPath function. A function that does this, and prompts the user interactively for the three remaining arguments to be passed along is provided:

airportCSVFile = "data/airports.csv"

searchAirports :: IO ()
searchAirports = do
  csv <- parseCSVFromFile airportCSVFile
  case csv of
    Left _ -> putStrLn "error loading airport data"
    Right airports -> do
      let cities = map (\rec -> Location (rec !! 1) 
                                         (read $ rec !! 6) 
                                         (read $ rec !! 7))
                   $ filter ((==14) . length) airports
      putStrLn "Enter origin airport:"
      origin <- getLine
      putStrLn "Enter destination airport:"
      dest <- getLine
      putStrLn "Enter hop distance:"
      hop <- readLn
      putStrLn $ show $ findPath cities origin dest hop

See the provided source file to help you get started (using it is optional, though recommended). You should add an "airports" executable to your cabal file, as shown here, and you will also need to add the data file to your repository --- ideally in the "data" subfolder (which you'll need to create) --- so that the CSV library can locate it.

Testing

Included in the provided source file is a short list of locations --- available via the cities variable --- suitable for testing:

lat_long_data = [ ("Bangkok", 13.76, 100.5),
                  ("Beijing", 39.90, 116.41),
                  ("Buenos Aires", -34.6, -58.38),
                  ("Chicago", 41.87, -87.63),
                  ("Dallas", 32.77, -96.79),
                  ("Denver", 39.73, -104.99),
                  ("Dubai", 25.20, 55.27),
                  ("Honolulu", 21.30, -157.85),
                  ("London", 51.50, -0.12),
                  ("Los-Angeles", 34.05, -118.24),
                  ("New-York", 40.71, -74.00),
                  ("Paris", 48.85, 2.35) ]

testAirports :: [Location]
testAirports = map (\(n,lat,lon)->Location n lat lon) lat_long_data

Here are some sample routes (others are possible) discovered by a reference implementation of findPath:

> findPath testAirports "Los-Angeles" "New-York" 1000
Just <2459.3319233721513: Los-Angeles - Denver - Chicago - New-York >

> findPath testAirports "Los-Angeles" "New-York" 500
Nothing

findPath testAirports "Chicago" "Bangkok" 4000
Just <10385.014182737235: Chicago - London - Dubai - Bangkok >

findPath testAirports "Chicago" "Bangkok" 8000
Just <8637.711070622021: Chicago - Beijing - Bangkok >

findPath testAirports "Chicago" "Bangkok" 10000
Just <8558.90431700429: Chicago - Bangkok >

findPath testAirports "Chicago" "Bangkok" 1000
Nothing

A validatePath function is also provided to help you ensure that all locations on your path are at most some distance apart, and that the total distance is accurate. Note that the function does not check that all locations in the path are unique.

> validatePath 1000 $ fromJust $ findPath cities "Los-Angeles" "New-York" 1000
True

When you're ready to test your implementation with the full dataset, simply use the provided searchAirports function (you can also do stack exec airports if you correctly set up the cabal file). Here are some results your function should be able to come up with -- we'll allow for up to a 15-second search time, which should be plenty, even for a simple search algorithm:

If your program can find the routes above (or correct alternatives), you'll receive full credit.


For more of a challenge, you can attempt to add heuristics to your search implementation so that it can tackle cities that are farther apart, or shorter hop distances (which makes the number of potential routes explode in number). Here are some more challenging searches (again, with a 15-second limit):

Submission

To submit your work, be sure your "MP4.hs" file is added to your repository, then commit all your changes and push your work to our shared BitBucket repository.