|
1 | 1 | [](https://circleci.com/gh/celebi-pkg/flight-analysis) |
2 | 2 | [](https://opensource.org/licenses/MIT) |
3 | | -[](https://pypi.org/project/google-flight-analysis/) |
| 3 | +[](https://pypi.org/project/google-flight-analysis/) |
| 4 | +[](https://test.pypi.org/project/google-flight-analysis/1.1.1a11/) |
4 | 5 |
|
5 | 6 | # Flight Analysis |
6 | 7 |
|
7 | | -This project provides tools and models for users to analyze, forecast, and collect data regarding flights and prices. There are currently many features in initial stages and in development. The current features (as of 4/5/2023) are: |
| 8 | +This project provides tools and models for users to analyze, forecast, and collect data regarding flights and prices. There are currently many features in initial stages and in development. The current features (as of 5/25/2023) are: |
8 | 9 |
|
9 | | -- Scraping tools for Google Flights |
| 10 | +- Detailed scraping and querying tools for Google Flights |
| 11 | +- Ability to store data locally or to SQL tables |
10 | 12 | - Base analytical tools/methods for price forecasting/summary |
11 | 13 |
|
12 | 14 | The features in development are: |
13 | 15 |
|
14 | 16 | - Models to demonstrate ML techniques on forecasting |
| 17 | +- Querying of advanced features |
15 | 18 | - API for access to previously collected data |
16 | 19 |
|
17 | 20 | ## Table of Contents |
@@ -59,19 +62,46 @@ For GitHub repository cloners, import as follows from the root of the repository |
59 | 62 |
|
60 | 63 | Here is some quick starter code to accomplish the basic tasks. Find more in the [documentation](https://kcelebi.github.io/flight-analysis/). |
61 | 64 |
|
62 | | - # Try to keep the dates in format YYYY-mm-dd |
63 | | - result = Scrape('JFK', 'IST', '2023-07-20', '2023-08-10') # obtain our scrape object |
64 | | - dataframe = result.data # outputs a Pandas DF with flight prices/info |
65 | | - origin = result.origin # 'JFK' |
66 | | - dest = result.dest # 'IST' |
67 | | - date_leave = result.date_leave # '2023-07-20' |
68 | | - date_return = result.date_return # '2023-08-10' |
| 65 | + # Keep the dates in format YYYY-mm-dd |
| 66 | + result = Scrape('JFK', 'IST', '2023-07-20', '2023-08-20') # obtain our scrape object, represents our query |
| 67 | + result.type # This is in a round-trip format |
| 68 | + result.origin # ['JFK', 'IST'] |
| 69 | + result.dest # ['IST', 'JFK'] |
| 70 | + result.dates # ['2023-07-20', '2023-08-20'] |
| 71 | + print(result) # get unqueried str representation |
69 | 72 |
|
70 | | -You can also scrape for one-way trips now: |
| 73 | +A `Scrape` object represents a Google Flights query to be run. It maintains flights as a sequence of one or more one-way flights, each of which has an origin, a destination, and a flight date. The above object for a round-trip flight from JFK to IST is a sequence of JFK --> IST, then IST --> JFK. We can obtain the data as follows: |
| 74 | + |
| 75 | + ScrapeObjects(result) # runs selenium through ChromeDriver, modifies results in-place |
| 76 | + result.data # returns pandas DF |
| 77 | + print(result) # get queried representation of result |
| 78 | + |
| 79 | +You can also scrape for one-way trips: |
71 | 80 |
|
72 | 81 | result = Scrape('JFK', 'IST', '2023-08-20') |
73 | | - result.data.head() #see data |
| 82 | + ScrapeObjects(result) |
| 83 | + result.data #see data |
| 84 | + |
| 85 | +You can also scrape chain-trips, which are defined as a sequence of one-way flights that have no direct relation to each other, other than being in chronological order. |
| 86 | + |
| 87 | + # chain-trip format: origin, dest, date, origin, dest, date, ... |
| 88 | + result = Scrape('JFK', 'IST', '2023-08-20', 'RDU', 'LGA', '2023-12-25', 'EWR', 'SFO', '2024-01-20') |
| 89 | + result.type # chain-trip |
| 90 | + ScrapeObjects(result) |
| 91 | + result.data # see data |
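To make the chain-trip argument layout concrete, here is a minimal sketch of how a flat argument list could be grouped into one-way legs. The helper `parse_chain_trip` and the `Leg` alias are hypothetical names used for illustration only; they are not part of the package, and the chronological-order check is an assumption based on the definition above.

```python
from typing import List, Tuple

# Hypothetical alias: one leg is (origin, dest, date)
Leg = Tuple[str, str, str]


def parse_chain_trip(*args: str) -> List[Leg]:
    """Group flat origin, dest, date, ... arguments into one-way legs.

    Illustrative only; this is not the library's own parser.
    """
    if len(args) == 0 or len(args) % 3 != 0:
        raise ValueError("expected one or more (origin, dest, date) triples")
    legs = [(args[i], args[i + 1], args[i + 2]) for i in range(0, len(args), 3)]
    dates = [date for _, _, date in legs]
    # YYYY-mm-dd strings sort chronologically when compared as plain strings
    if dates != sorted(dates):
        raise ValueError("legs must be in chronological order")
    return legs


legs = parse_chain_trip(
    'JFK', 'IST', '2023-08-20',
    'RDU', 'LGA', '2023-12-25',
    'EWR', 'SFO', '2024-01-20',
)
```

Each leg is independent of the others here, which matches the "no direct relation" wording: only the date ordering ties them together.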
| 92 | + |
| 93 | +You can also scrape perfect-chains, which are defined as a sequence of one-way flights such that the destination of the previous flight is the origin of the next and the origin of the chain is the final destination of the chain (a cycle). |
| 94 | + |
| 95 | + # perfect-chain format: origin, date, origin, date, ..., first_origin |
| 96 | + result = Scrape("JFK", "2023-09-20", "IST", "2023-09-25", "CDG", "2023-10-10", "LHR", "2023-11-01", "JFK") |
| 97 | + result.type # perfect-chain |
| 98 | + ScrapeObjects(result) |
| 99 | + result.data # see data |
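The perfect-chain format is more compact because each destination is implied by the next origin. The sketch below shows one way the arguments could be expanded into explicit legs; `parse_perfect_chain` is a hypothetical helper written for this illustration, not a function exposed by the package.

```python
from typing import List, Tuple

# Hypothetical alias: one leg is (origin, dest, date)
Leg = Tuple[str, str, str]


def parse_perfect_chain(*args: str) -> List[Leg]:
    """Expand origin, date, origin, date, ..., first_origin into legs.

    Illustrative only; this is not the library's own parser.
    """
    # A valid chain alternates airport/date and ends on an airport,
    # so the argument count is odd and at least 3.
    if len(args) < 3 or len(args) % 2 == 0:
        raise ValueError("expected origin, date, ..., first_origin")
    if args[-1] != args[0]:
        raise ValueError("a perfect-chain must end at its first origin")
    airports = args[0::2]  # includes the closing first_origin
    dates = args[1::2]
    # Leg i departs airports[i] on dates[i] and lands at airports[i + 1]
    return [(airports[i], airports[i + 1], dates[i]) for i in range(len(dates))]


legs = parse_perfect_chain(
    "JFK", "2023-09-20", "IST", "2023-09-25",
    "CDG", "2023-10-10", "LHR", "2023-11-01", "JFK",
)
```

For the example above this yields four legs, JFK to IST, IST to CDG, CDG to LHR, and LHR back to JFK, which is the cycle the definition describes.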
| 100 | + |
| 101 | +You can read more about the different types of trips in the documentation. Scrape objects can be added to one another to create larger queries, under the following conditions: |
74 | 102 |
|
| 103 | +1. The objects being added are the same type of trip (one-way, round-trip, etc.) |
| 104 | +2. The objects being added are either both unqueried or both queried |
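The two conditions above can be sketched as a guard inside `__add__`. The `Query` class below is a hypothetical stand-in for a `Scrape` object, with assumed field names (`trip_type`, `legs`, `queried`); how the real package combines objects is not shown in this README, so concatenating the legs is an assumption made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Query:
    """Hypothetical stand-in for a Scrape object (not the library's class)."""
    trip_type: str                     # e.g. 'one-way', 'round-trip'
    legs: List[Tuple[str, str, str]] = field(default_factory=list)
    queried: bool = False              # True once data has been scraped

    def __add__(self, other: "Query") -> "Query":
        if not isinstance(other, Query):
            return NotImplemented
        # Condition 1: both objects are the same type of trip
        if self.trip_type != other.trip_type:
            raise TypeError("cannot add queries of different trip types")
        # Condition 2: both unqueried or both queried
        if self.queried != other.queried:
            raise ValueError("cannot mix queried and unqueried objects")
        # Assumption: the larger query simply holds the legs of both
        return Query(self.trip_type, self.legs + other.legs, self.queried)


a = Query('one-way', [('JFK', 'IST', '2023-08-20')])
b = Query('one-way', [('IST', 'JFK', '2023-08-25')])
combined = a + b
```

Checking both conditions before building the result means a failed addition leaves the two operands untouched.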
75 | 105 |
|
76 | 106 | ## Updates & New Features |
77 | 107 |
|
|