Time

Interactive bokeh plots would be preferable, but the Jupyter Book format does not currently support bokeh, so we use matplotlib and seaborn to create static plots instead.

Note that we have created a helper function that places a 'title-box' at a relative position inside a plot. This was inspired by the matplotlib documentation.

import matplotlib
import matplotlib.pyplot as plt
import seaborn

%matplotlib inline

# create helper function that places text box inside a plot
## so acts like an 'in-plot title'
def add_titlebox(ax, text):
    # build a rectangle in axes coordinates - should parameterise these as arguments
    left, width = .25, .5
    bottom, height = .25, .5
    right = left + width
    top = bottom + height
    
    ax.text(x = right, y = top, s = text,
            horizontalalignment = 'right', verticalalignment = 'top',
            transform = ax.transAxes,
            bbox = dict(facecolor = 'white', alpha = 0.6),
            fontsize = 12.5)
    return ax
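
The comment in the helper above notes that the rectangle should be parameterised. A minimal sketch of how that might look is given here. The name `add_titlebox_at` and its keyword arguments are hypothetical and not part of the original notebook; the defaults reproduce the hard-coded rectangle, and this variant returns the text artist rather than the axes.

```python
import matplotlib.pyplot as plt


def add_titlebox_at(ax, text, left=0.25, bottom=0.25, width=0.5, height=0.5,
                    fontsize=12.5):
    """Place `text` at the top-right corner of a rectangle in axes coordinates.

    Hypothetical variant of `add_titlebox`; the defaults reproduce its
    hard-coded rectangle.
    """
    right = left + width
    top = bottom + height
    # Return the Text artist (not the axes) so callers can restyle it later.
    return ax.text(right, top, text,
                   horizontalalignment='right', verticalalignment='top',
                   transform=ax.transAxes,
                   bbox=dict(facecolor='white', alpha=0.6),
                   fontsize=fontsize)


# Usage: anchor the box near the top-right corner of the axes.
fig, ax = plt.subplots()
label = add_titlebox_at(ax, 'Europe', left=0.5, bottom=0.5, width=0.45, height=0.45)
```

Because the call is assigned to `label`, the cell does not echo an object representation when it is the last statement in a notebook cell.
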
# pass in variable from other notebook
%store -r data_join

# wrangle
## set `region` to be categorical
data_join['region'] = data_join['region'].astype('category')

# check column data types
data_join.info()
# check all unique values in `region`
data_join['region'].unique()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 2715 entries, 0 to 2714
Data columns (total 4 columns):
 #   Column             Non-Null Count  Dtype         
---  ------             --------------  -----         
 0   arrival_date       2715 non-null   datetime64[ns]
 1   region             2715 non-null   category      
 2   total_guests       2715 non-null   float64       
 3   proportion_guests  2715 non-null   float64       
dtypes: category(1), datetime64[ns](1), float64(2)
memory usage: 87.7 KB
[Americas, Europe, Asia, Oceania, Africa]
Categories (5, object): [Americas, Europe, Asia, Oceania, Africa]

Below, we plot the data using matplotlib's object-oriented interface, following the layout instructions here.

Note, we have split our data out into three overarching groups by region:

  1. Europe - this has the highest proportion of guests booking hotels
  2. Asia and Americas - these regions have similar levels of guests booking hotels
  3. Africa and Oceania - these regions have similar levels of guests booking hotels

From the time-plot below, we see that:

  • Europe: The trend in hotel bookings falls slightly over time, with deep troughs in October 2015 and December 2016. There is no clear evidence of seasonality, as no pattern repeats year on year.

  • Asia and Americas: There does not seem to be a clear trend. However, spikes in hotel bookings to these regions occur in November 2015 for Asia and December 2016 for the Americas. Interestingly, these spikes seem to roughly coincide with the deep troughs seen in hotel bookings in Europe. For seasonality, there is a peak each December in hotel bookings to the Americas, which is probably explained by the Christmas period. As for Asia, there appears to be no clear seasonal pattern, even during Chinese New Year!

  • Africa and Oceania: Again, no clear trend though there are spikes in March 2016 and February 2017 for Africa. For Oceania, there is a small spike in May 2016 and a larger one in June 2017. There appears to be no seasonality noticeable from the plot, though it may hide weekly seasonality.
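
The remark above that the plot may hide weekly seasonality can be checked by averaging guests by day of the week. The sketch below assumes one observation per region per day and uses a small synthetic frame (`demo`, with made-up values) in place of `data_join`, so the numbers are illustrative only.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for `data_join` (same column names, made-up values).
dates = pd.date_range('2016-01-04', periods=28, freq='D')  # starts on a Monday
demo = pd.DataFrame({
    'arrival_date': dates,
    'region': 'Europe',
    # Weekend values are deliberately inflated to create a weekly pattern.
    'total_guests': np.where(dates.dayofweek >= 5, 150.0, 100.0),
})

# Average guests per region for each day of the week (0 = Monday, 6 = Sunday).
weekday_mean = (
    demo.assign(weekday=demo['arrival_date'].dt.dayofweek)
        .groupby(['region', 'weekday'])['total_guests']
        .mean()
        .unstack('weekday')
)
print(weekday_mean)
```

A roughly flat row would suggest no weekly pattern for that region. With the real `data_join`, where `region` is categorical, passing `observed=True` to `groupby` avoids rows for unused categories.
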

# take object-orientated approach to plotting

## set plotting grid
gridsize = (3,2)
fig = plt.figure(figsize = (12, 8))
ax1 = plt.subplot2grid(shape = gridsize, loc = (0, 0), colspan = 2, rowspan = 2)
ax2 = plt.subplot2grid(shape = gridsize, loc = (2, 0))
ax3 = plt.subplot2grid(shape = gridsize, loc = (2, 1))

## create plots and populate grid
ax1.set_title(label = 'Time-plot: \nTotal guests staying in hotels by region',
              fontdict = {'fontsize': 14})
ax1.plot('arrival_date', 'total_guests', data = data_join.query('region == "Europe"'))
ax1.set_ylabel(ylabel = 'Total guests')
add_titlebox(ax = ax1, text = 'Europe')
ax2.plot('arrival_date', 'total_guests', data = data_join.query('region == "Asia"'), label = 'Asia')
ax2.plot('arrival_date', 'total_guests', data = data_join.query('region == "Americas"'), label = 'Americas')
ax2.set_ylabel(ylabel = 'Total guests')
ax2.legend(loc = 'upper left')  # tell the two lines apart
add_titlebox(ax = ax2, text = 'Asia and the Americas')
ax3.plot('arrival_date', 'total_guests', data = data_join.query('region == "Africa"'), label = 'Africa')
ax3.plot('arrival_date', 'total_guests', data = data_join.query('region == "Oceania"'), label = 'Oceania')
ax3.set_ylabel(ylabel = 'Total guests')
ax3.legend(loc = 'upper left')  # tell the two lines apart
add_titlebox(ax = ax3, text = 'Africa and Oceania')
[Figure: time-plot of total guests staying in hotels by region. Panels: Europe; Asia and the Americas; Africa and Oceania.]

The Asia, Americas, Africa and Oceania plots above are quite small compared with the Europe one. To explore them in more detail, click the '+' icon below to reveal the plots and the code that generates them.

# focus on Asia, Americas, Africa and Oceania more
data_asiaamericas = data_join.query('region in ("Asia", "Americas")')
data_asiaamericas.pivot(index = 'arrival_date', columns = 'region', values = 'total_guests').plot()

data_africaoceania = data_join.query('region in ("Africa", "Oceania")')
data_africaoceania.pivot(index = 'arrival_date', columns = 'region', values = 'total_guests').plot()
[Figures: total guests by arrival date for Asia and the Americas, and for Africa and Oceania.]

Summary: Break-points

Therefore, from visually inspecting the time-plots, our candidate break-points to test for a structural break are:

  • Europe: October 2015.
  • Asia: October 2015 and to a lesser extent, May 2017.
  • Americas: December 2016 - possibly a bumper Christmas period where people booked hotels during this time.
  • Africa: October 2015, March 2016, February 2017 and July 2017.
  • Oceania: April 2016 if concerned but June 2017 more likely.
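
The text does not say which structural-break test is applied to these candidates. One common choice is the Chow test, sketched below under the assumption of a simple linear-trend model. The function `chow_f_stat` is a hypothetical helper, and the series are synthetic rather than taken from `data_join`.

```python
import numpy as np


def chow_f_stat(y, break_index):
    """Chow F-statistic for a break in a linear time trend at `break_index`."""
    y = np.asarray(y, dtype=float)
    t = np.arange(len(y), dtype=float)
    X = np.column_stack([np.ones_like(t), t])  # intercept + linear trend
    k = X.shape[1]                             # parameters per regime

    def rss(X_part, y_part):
        # Residual sum of squares from an ordinary least squares fit.
        beta, *_ = np.linalg.lstsq(X_part, y_part, rcond=None)
        resid = y_part - X_part @ beta
        return float(resid @ resid)

    rss_pooled = rss(X, y)
    rss_split = (rss(X[:break_index], y[:break_index])
                 + rss(X[break_index:], y[break_index:]))
    return ((rss_pooled - rss_split) / k) / (rss_split / (len(y) - 2 * k))


# Synthetic example: the same noisy series with and without a level shift.
rng = np.random.default_rng(0)
no_break = 50.0 + rng.normal(scale=1.0, size=100)
with_break = no_break.copy()
with_break[60:] -= 10.0  # level drops by 10 from observation 60 onwards

f_no = chow_f_stat(no_break, 60)
f_yes = chow_f_stat(with_break, 60)
```

Under the null hypothesis of no break, the statistic follows an F distribution with `k` and `n - 2k` degrees of freedom, so a large value at a candidate date is evidence of a break there. The Chow test assumes the break date is chosen in advance, which sits uneasily with picking dates by eye from the plot, so the result should be read with that caveat.
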

Since the number of guests staying in hotels within Africa and Oceania in this dataset is so low, we will ignore these regions in the analysis.
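
The notebook does not show how these regions are dropped. A minimal sketch of one way to do it is shown below, using a synthetic frame (`demo`, made-up values) with a categorical `region` column like the one in `data_join`; the names `excluded` and `data_main` are hypothetical.

```python
import pandas as pd

# Synthetic stand-in for `data_join` with a categorical `region` column.
demo = pd.DataFrame({
    'region': pd.Categorical(['Europe', 'Asia', 'Americas', 'Africa', 'Oceania']),
    'total_guests': [900.0, 60.0, 55.0, 4.0, 3.0],
})

excluded = ['Africa', 'Oceania']
data_main = demo[~demo['region'].isin(excluded)].copy()
# Drop the now-empty categories so they do not reappear in group-bys or legends.
data_main['region'] = data_main['region'].cat.remove_unused_categories()
```

Removing the unused categories matters because a categorical column otherwise keeps `Africa` and `Oceania` as valid levels even after their rows are gone.
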

%store data_join
Stored 'data_join' (DataFrame)