{ "cells": [ { "cell_type": "markdown", "id": "59d19f73", "metadata": {}, "source": [ "# Analytics Example - Spark Routes - Contract Month Seasonality\n", "\n", "This script allows you to plot the price evolution & seasonality for a given route and month. In this case the route chosen is Sabine Pass to Futtsu via COGH.\n", "\n", "This script uses elements from our API code samples. If you'd like a more basic and informative example of how to pull data via the Spark API, please visit our Github or API website:\n", "\n", "- Github: https://github.com/spark-commodities/api-code-samples/blob/master/jupyter_notebooks/\n", "- API Website: https://www.sparkcommodities.com/api/code-examples/jupyter.html\n", "\n", "\n", "### Have any questions?\n", "\n", "If you have any questions regarding our API, or need help accessing specific datasets, please contact us at:\n", "\n", "__data@sparkcommodities.com__\n", "\n", "or refer to our API website for more information about this endpoint:\n", "https://www.sparkcommodities.com/api/lng-freight/routes.html\n", "\n", "__N.B. This script requires a Freight subscription__" ] }, { "cell_type": "markdown", "id": "9e00ae34", "metadata": {}, "source": [ "## 1. Importing Data\n", "\n", "Here we define the functions that allow us to retrieve the valid credentials to access the Spark API.\n", "\n", "This section can remain unchanged for most Spark API users." 
] }, { "cell_type": "code", "execution_count": null, "id": "d9ea2c58", "metadata": {}, "outputs": [], "source": [ "import json\n", "import os\n", "import sys\n", "import pandas as pd\n", "import numpy as np\n", "from base64 import b64encode\n", "from pprint import pprint\n", "from urllib.parse import urljoin\n", "import datetime\n", "from io import StringIO\n", "\n", "\n", "try:\n", " from urllib import request, parse\n", " from urllib.error import HTTPError\n", "except ImportError:\n", " raise RuntimeError(\"Python 3 required\")\n", "\n", "\n", "API_BASE_URL = \"https://api.sparkcommodities.com\"\n", "\n", "\n", "def retrieve_credentials(file_path=None):\n", " \"\"\"\n", " Find credentials either by reading the client_credentials file or reading\n", " environment variables\n", " \"\"\"\n", " if file_path is None:\n", " client_id = os.getenv(\"SPARK_CLIENT_ID\")\n", " client_secret = os.getenv(\"SPARK_CLIENT_SECRET\")\n", " if not client_id or not client_secret:\n", " raise RuntimeError(\n", " \"SPARK_CLIENT_ID and SPARK_CLIENT_SECRET environment vars required\"\n", " )\n", " else:\n", " # Parse the file\n", " if not os.path.isfile(file_path):\n", " raise RuntimeError(\"The file {} doesn't exist\".format(file_path))\n", "\n", " with open(file_path) as fp:\n", " lines = [l.replace(\"\\n\", \"\") for l in fp.readlines()]\n", "\n", " if lines[0] in (\"clientId,clientSecret\", \"client_id,client_secret\"):\n", " client_id, client_secret = lines[1].split(\",\")\n", " else:\n", " print(\"First line read: '{}'\".format(lines[0]))\n", " raise RuntimeError(\n", " \"The specified file {} doesn't look like a Spark API client \"\n", " \"credentials file\".format(file_path)\n", " )\n", "\n", " print(\">>>> Found credentials!\")\n", " print(\n", " \">>>> Client_id={}, client_secret={}****\".format(client_id, client_secret[:5])\n", " )\n", "\n", " return client_id, client_secret\n", "\n", "\n", "def do_api_post_query(uri, body, headers):\n", " url = urljoin(API_BASE_URL, 
uri)\n", "\n", " data = json.dumps(body).encode(\"utf-8\")\n", "\n", " # HTTP POST request\n", " req = request.Request(url, data=data, headers=headers)\n", " try:\n", " response = request.urlopen(req)\n", " except HTTPError as e:\n", " print(\"HTTP Error: \", e.code)\n", " print(e.read())\n", " sys.exit(1)\n", "\n", " resp_content = response.read()\n", "\n", " # The server must return HTTP 201. Raise an error if this is not the case\n", " assert response.status == 201, resp_content\n", "\n", " # The server returned a JSON response\n", " content = json.loads(resp_content)\n", "\n", " return content\n", "\n", "\n", "def do_api_get_query(uri, access_token, format='json'):\n", " \"\"\"\n", " After receiving an Access Token, we can request information from the API.\n", " \"\"\"\n", " url = urljoin(API_BASE_URL, uri)\n", "\n", " if format == 'json':\n", " headers = {\n", " \"Authorization\": \"Bearer {}\".format(access_token),\n", " \"Accept\": \"application/json\",\n", " }\n", " elif format == 'csv':\n", " headers = {\n", " \"Authorization\": \"Bearer {}\".format(access_token),\n", " \"Accept\": \"text/csv\"\n", " }\n", " else:\n", " raise ValueError('The format parameter only takes `csv` or `json` as inputs')\n", "\n", " # HTTP GET request\n", " req = request.Request(url, headers=headers)\n", " try:\n", " response = request.urlopen(req)\n", " except HTTPError as e:\n", " print(\"HTTP Error: \", e.code)\n", " print(e.read())\n", " sys.exit(1)\n", "\n", " resp_content = response.read()\n", " #status = response.status\n", "\n", " # The server must return HTTP 200. Raise an error if this is not the case\n", " assert response.status == 200, resp_content\n", "\n", " # Storing response based on requested format\n", " if format == 'json':\n", " content = json.loads(resp_content)\n", " elif format == 'csv':\n", " content = resp_content\n", "\n", " return content\n", "\n", "\n", "def get_access_token(client_id, client_secret):\n", " \"\"\"\n", " Get a new access_token. 
Access tokens are what applications use to make\n", " API requests. Access tokens must be kept confidential in storage.\n", "\n", " # Procedure:\n", "\n", " Do a POST query with `grantType` in the body. A basic authorization\n", " HTTP header is required. The \"Basic\" HTTP authentication scheme is defined in\n", " RFC 7617, which transmits credentials as `clientId:clientSecret` pairs, encoded\n", " using base64.\n", " \"\"\"\n", "\n", " # Note: for the sake of this example, we choose to use the Python urllib from the\n", " # standard lib. One should consider using https://requests.readthedocs.io/\n", "\n", " payload = \"{}:{}\".format(client_id, client_secret).encode()\n", " headers = {\n", " \"Authorization\": \"Basic {}\".format(b64encode(payload).decode()),\n", " \"Accept\": \"application/json\",\n", " \"Content-Type\": \"application/json\",\n", " }\n", " body = {\n", " \"grantType\": \"clientCredentials\",\n", " }\n", "\n", " content = do_api_post_query(uri=\"/oauth/token/\", body=body, headers=headers)\n", "\n", " print(\n", " \">>>> Successfully fetched an access token {}****, valid {} seconds.\".format(\n", " content[\"accessToken\"][:5], content[\"expiresIn\"]\n", " )\n", " )\n", "\n", " return content[\"accessToken\"]\n" ] }, { "cell_type": "code", "execution_count": null, "id": "fb27e003", "metadata": {}, "outputs": [], "source": [ "# Define function for listing routes from API\n", "def list_routes(access_token):\n", " \"\"\"\n", " Fetch available routes. Return route ids and Spark price release dates.\n", "\n", " The URI used returns a list of all available Spark routes. 
With these routes, we can find the price breakdown of a specific route.\n", "\n", " # Procedure:\n", "\n", " Do a GET query to /v1.0/routes/ with a Bearer token authorization HTTP header.\n", " \"\"\"\n", " content = do_api_get_query(uri=\"/v1.0/routes/\", access_token=access_token)\n", "\n", " print(\">>>> All the routes you can fetch\")\n", " tickers = []\n", " for contract in content[\"data\"][\"routes\"]:\n", " tickers.append(contract[\"uuid\"])\n", "\n", " reldates = content[\"data\"][\"sparkReleaseDates\"]\n", "\n", " dicto1 = content[\"data\"]\n", "\n", " return tickers, reldates, dicto1" ] }, { "cell_type": "markdown", "id": "1e890e9e", "metadata": {}, "source": [ "### N.B. Credentials\n", "\n", "You must have downloaded your client credentials CSV file before proceeding. Please refer to the API documentation if you have not downloaded it already. Instructions for downloading your credentials can be found here:\n", "\n", "https://www.sparkcommodities.com/api/request/authentication.html\n" ] }, { "cell_type": "code", "execution_count": null, "id": "51b8a89c", "metadata": {}, "outputs": [], "source": [ "## Input your file location here\n", "client_id, client_secret = retrieve_credentials(file_path=\"/tmp/client_credentials.csv\")\n", "\n", "# Authenticate:\n", "access_token = get_access_token(client_id, client_secret)\n", "print(access_token)\n", "\n", "# Fetch all contracts:\n", "routes, reldates, dicto1 = list_routes(access_token)\n", "\n" ] }, { "cell_type": "markdown", "id": "2345f33a", "metadata": {}, "source": [ "## 2. Describing available routes" ] }, { "cell_type": "code", "execution_count": null, "id": "e75c2b64", "metadata": {}, "outputs": [], "source": [ "# Store route characteristics as a DataFrame\n", "\n", "import pandas as pd\n", "\n", "def check_and_store_characteristics(dict1):\n", " \"\"\"\n", " # Store some of the route characteristics in lists and assemble them into a DataFrame\n", " # N.B. 
these are not all the characteristics available!\n", " # Check the output of the raw dictionary (above) to see all available characteristics.\n", " \"\"\"\n", "\n", " routes_info = {\n", " \"UUID\": [],\n", " \"Load Location\": [],\n", " \"Discharge Location\": [],\n", " \"Via\": [],\n", " \"Load Region\": [],\n", " \"Discharge Region\": [],\n", " \"Load UUID\": [],\n", " \"Discharge UUID\": []\n", " }\n", " for route in dict1[\"routes\"]:\n", " \n", " routes_info['UUID'].append(route[\"uuid\"])\n", " routes_info['Via'].append(route[\"via\"])\n", "\n", " routes_info['Load UUID'].append(route[\"loadPort\"][\"uuid\"])\n", " routes_info['Load Location'].append(route[\"loadPort\"][\"name\"])\n", " routes_info['Load Region'].append(route[\"loadPort\"][\"region\"])\n", "\n", " routes_info['Discharge UUID'].append(route[\"dischargePort\"][\"uuid\"])\n", " routes_info['Discharge Location'].append(route[\"dischargePort\"][\"name\"])\n", " routes_info['Discharge Region'].append(route[\"dischargePort\"][\"region\"])\n", " \n", " \n", " route_df = pd.DataFrame(routes_info)\n", "\n", " return route_df\n" ] }, { "cell_type": "markdown", "id": "b53ee7af", "metadata": {}, "source": [ "### Exploring the data\n" ] }, { "cell_type": "code", "execution_count": null, "id": "6db5808e", "metadata": {}, "outputs": [], "source": [ "# We use the stored route characteristics to create the dataframe\n", "route_df = check_and_store_characteristics(dicto1)\n", "\n", "# View the output\n", "route_df.head()" ] }, { "cell_type": "markdown", "id": "2854904d", "metadata": {}, "source": [ "## 3. Analysing a Specific Route\n", "\n", "\n", "Here we define the function that allows us to pull data for a specific route and release date.\n", "\n", "We then select a specific route ID ('my_route') below the function by filtering the route DataFrame on load port, discharge port and voyage option."
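, "\n", "\nAs a rough sketch, the URI that this function assembles for a single release date looks like the following (the route UUID and date here are placeholders, not real Spark identifiers):\n", "\n", "```python\n", "route_uuid = 'my-route-uuid'  # placeholder, not a real route UUID\n", "query = '/v1.0/routes/{}/?release-date={}'.format(route_uuid, '2024-01-05')\n", "print(query)  # /v1.0/routes/my-route-uuid/?release-date=2024-01-05\n", "```"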
] }, { "cell_type": "code", "execution_count": null, "id": "c7502772", "metadata": {}, "outputs": [], "source": [ "## Defining the function\n", "\n", "def fetch_route_data(access_token, ticker, release, congestion_laden=None, congestion_ballast=None):\n", " \"\"\"\n", " Fetch the route details for a given route and release date\n", "\n", " # Procedure:\n", "\n", " Do GET queries to https://api.sparkcommodities.com/v1.0/routes/{route_uuid}/\n", " with a Bearer token authorization HTTP header.\n", " \"\"\"\n", "\n", " query_params = \"?release-date={}\".format(release)\n", " if congestion_laden is not None:\n", " query_params += \"&congestion-laden-days={}\".format(congestion_laden)\n", " if congestion_ballast is not None:\n", " query_params += \"&congestion-ballast-days={}\".format(congestion_ballast)\n", "\n", " uri = \"/v1.0/routes/{}/{}\".format(ticker, query_params)\n", "\n", " content = do_api_get_query(\n", " uri=uri,\n", " access_token=access_token,\n", " )\n", "\n", " my_dict = content[\"data\"]\n", "\n", " return my_dict\n", "\n" ] }, { "cell_type": "markdown", "id": "5240c5ea", "metadata": {}, "source": [ "### Route Parameters" ] }, { "cell_type": "code", "execution_count": null, "id": "8d57eaef", "metadata": {}, "outputs": [], "source": [ "## Defining the route parameters\n", "\n", "# Filter the route DataFrame for the UUID of our chosen route\n", "\n", "load = 'Sabine Pass'\n", "discharge = 'Futtsu'\n", "via = 'cogh'\n", "\n", "my_route = route_df[(route_df[\"Load Location\"] == load) & \\\n", " (route_df[\"Discharge Location\"] == discharge) & \\\n", " (route_df['Via'] == via)]['UUID'].values[0]\n" ] }, { "cell_type": "markdown", "id": "178b1572", "metadata": {}, "source": [ "### Historical Routes Data Import, Storage in DataFrame and Grouping by Contract Month" ] }, { "cell_type": "code", "execution_count": null, "id": "4dcf31db", "metadata": {}, "outputs": [], "source": [ "import 
time\n", "import datetime as dt\n", "\n", "def routes_history(tick, reldates, laden=None, ballast=None):\n", " \n", " my_route = {\n", " \"Period\": [],\n", " \"Start Date\": [],\n", " \"End Date\": [],\n", " \"Total Cost USD\": [],\n", " \"Cost USDperMMBtu\": [],\n", " \"Hire USDperMMBtu\": [],\n", " \"Fuel USDperMMBtu\": [],\n", " \"Port USDperMMBtu\": [],\n", " \"Canal USDperMMBtu\": [],\n", " \"Congestion USDperMMBtu\": [],\n", " \"Release Date\": [],\n", " \"Cal Month\":[],\n", " }\n", "\n", " for r in reldates:\n", " try:\n", " my_dict = fetch_route_data(access_token, tick, release=r, congestion_laden=laden, congestion_ballast=ballast)\n", " except Exception:\n", " # Skip release dates that fail, rather than reusing the previous iteration's data\n", " print('Bad Date: ' + r)\n", " continue\n", " \n", " for data in my_dict[\"dataPoints\"]:\n", " my_route['Start Date'].append(data[\"deliveryPeriod\"][\"startAt\"])\n", " my_route['End Date'].append(data[\"deliveryPeriod\"][\"endAt\"])\n", " my_route['Period'].append(data[\"deliveryPeriod\"][\"name\"])\n", "\n", " my_route['Total Cost USD'].append(data[\"costsInUsd\"][\"total\"])\n", " my_route['Cost USDperMMBtu'].append(data[\"costsInUsdPerMmbtu\"][\"total\"])\n", "\n", " my_route['Hire USDperMMBtu'].append(data[\"costsInUsdPerMmbtu\"][\"hire\"])\n", " my_route['Fuel USDperMMBtu'].append(data[\"costsInUsdPerMmbtu\"][\"fuel\"])\n", " my_route['Port USDperMMBtu'].append(data[\"costsInUsdPerMmbtu\"][\"port\"])\n", " my_route['Canal USDperMMBtu'].append(data[\"costsInUsdPerMmbtu\"][\"canal\"])\n", " \n", " try:\n", " my_route['Congestion USDperMMBtu'].append(data[\"costsInUsdPerMmbtu\"][\"congestion\"])\n", " except KeyError:\n", " # Congestion costs may be absent for some releases; treat them as zero\n", " my_route['Congestion USDperMMBtu'].append(0)\n", "\n", " my_route['Release Date'].append(r)\n", " my_route['Cal Month'].append(dt.datetime.strptime(data[\"deliveryPeriod\"][\"startAt\"], '%Y-%m-%d').strftime('%b%y'))\n", "\n", " \n", " # Sleep function is used to avoid API rate limiting.\n", " time.sleep(0.2)\n", " \n", " my_route_df = pd.DataFrame(my_route)\n", "\n", "\n", " ## Changing the data type of 
these columns from 'string' to numbers.\n", " ## This allows us to easily plot a forward curve, as well as perform statistical analysis on the prices.\n", " for col in [\"Total Cost USD\",\"Cost USDperMMBtu\",\"Hire USDperMMBtu\",\"Fuel USDperMMBtu\",\"Port USDperMMBtu\",\"Canal USDperMMBtu\",\"Congestion USDperMMBtu\"]:\n", " my_route_df[col] = pd.to_numeric(my_route_df[col])\n", " \n", " ## Changing the data types to pandas datetime\n", " for col in ['Release Date', 'Start Date', 'End Date']:\n", " my_route_df[col] = pd.to_datetime(my_route_df[col])\n", "\n", " my_route_df['Time Diff'] = (my_route_df['Release Date'] - my_route_df['Start Date']).dt.days\n", " \n", " return my_route_df\n", "\n", "\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "id": "64e930b9", "metadata": {}, "outputs": [], "source": [ "# Calling the function to pull all historical data\n", "histdf = routes_history(my_route, reldates[:], laden=None, ballast=None)" ] }, { "cell_type": "code", "execution_count": null, "id": "6bcccb90", "metadata": { "scrolled": true }, "outputs": [], "source": [ "# Preview of the data\n", "histdf.head(5)" ] }, { "cell_type": "code", "execution_count": null, "id": "2bc3b91c", "metadata": {}, "outputs": [], "source": [ "# Define function to calculate the \"Day of Year\" number for each datapoint. 
\n", "# Y-1 datapoints are offset by 365 days, and Y-2 by 730 days, so that timeseries can be plotted in a single, trackable graph.\n", "def sort_years(df):\n", " \n", " reldates = df['Release Date'].to_list()\n", " startdates = df['Start Date'].to_list()\n", "\n", " dayofyear = []\n", " \n", " # enumerate pairs each release date with its own start date, even when release dates repeat\n", " for ir, r in enumerate(reldates):\n", " if r.year - startdates[ir].year == -1:\n", " dayofyear.append(r.timetuple().tm_yday - 365)\n", " elif r.year - startdates[ir].year == -2:\n", " dayofyear.append(r.timetuple().tm_yday - 730)\n", " else:\n", " dayofyear.append(r.timetuple().tm_yday)\n", " \n", " df['Day of Year'] = dayofyear\n", "\n", " return df" ] }, { "cell_type": "code", "execution_count": null, "id": "93fff09c", "metadata": {}, "outputs": [], "source": [ "# Group DataFrame by Contract Month\n", "groups = histdf.groupby('Cal Month')\n", "\n", "# Calling the sorting function, and defining which contract months we want to compare\n", "# .copy() avoids a pandas SettingWithCopyWarning when sort_years adds a column\n", "m25 = groups.get_group('Oct25').copy()\n", "m24 = groups.get_group('Oct24').copy()\n", "m23 = groups.get_group('Oct23').copy()\n", "\n", "m25 = sort_years(m25)\n", "m24 = sort_years(m24)\n", "m23 = sort_years(m23)\n", "\n", "# Viewing one example of the output\n", "m25.head()" ] }, { "cell_type": "markdown", "id": "eb1d3734", "metadata": {}, "source": [ "## 4. 
Plotting" ] }, { "cell_type": "code", "execution_count": null, "id": "6e96c572", "metadata": {}, "outputs": [], "source": [ "## Plotting\n", "import matplotlib.pyplot as plt\n", "import matplotlib.ticker as mtick\n", "import seaborn as sns\n", "\n", "sns.set_theme(style=\"whitegrid\")\n", "\n", "fig, ax = plt.subplots(figsize=(15, 7))\n", "\n", "ax.scatter(m25[\"Day of Year\"], m25[\"Cost USDperMMBtu\"])\n", "ax.scatter(m24[\"Day of Year\"], m24[\"Cost USDperMMBtu\"])\n", "ax.scatter(m23[\"Day of Year\"], m23[\"Cost USDperMMBtu\"])\n", "\n", "ax.plot(m25[\"Day of Year\"], m25[\"Cost USDperMMBtu\"], label='2025')\n", "ax.plot(m24[\"Day of Year\"], m24[\"Cost USDperMMBtu\"], label='2024')\n", "ax.plot(m23[\"Day of Year\"], m23[\"Cost USDperMMBtu\"], label='2023')\n", "\n", "plt.xlabel(\"Release Date\")\n", "plt.ylabel(\"Cost in USD/MMBtu\")\n", "\n", "# Setting custom x-axis tick locations and labels.\n", "#xlabels = ['January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December', 'Year End']\n", "xlabels = ['Y-1', 'March', 'May', 'July', 'September', 'November', 'Y+0', 'March', 'May', 'July', 'September', 'November', 'Year End']\n", "\n", "# xpos gives the first day of every other month in terms of 'day of year'\n", "#xpos = [0,32,60,91,121,152,182,213,244,274,305,335, 365]\n", "xpos = [-365,-305,-244,-183,-121,-60,0,60,121,182,244,305,365]\n", "\n", "# Format the y-axis ticks as dollar values\n", "ax.yaxis.set_major_formatter(mtick.StrMethodFormatter('$ {x:,.0f}'))\n", "\n", "plt.xticks(xpos, xlabels)\n", "\n", "plt.xlim(-365,max([m25[\"Day of Year\"].max(), m24[\"Day of Year\"].max(), m23[\"Day of Year\"].max()])+10)\n", "\n", "# Setting the graph legend\n", "plt.legend()\n" ] } ], "metadata": { "kernelspec": { "display_name": "base", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", 
"nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.5" } }, "nbformat": 4, "nbformat_minor": 5 }