Code Guide

From Derek

Basic Description

The data captured from AEMO is stored in a database built on the HDF5 format. A more detailed description of the structure can be found in section 2.4. The basic premise of our solution was to use a web scraper to download the data files from AEMO, then extract the pertinent data from these files and rearrange it into a more logical, easier-to-use format. The AEMO data comes in spreadsheet form, with either one five-minute sample or one day's data per spreadsheet, depending on the dataset.

Tracking Time

Keeping track of the sample times was very important for this project. As the samples are recorded by AEMO in discrete five-minute (or thirty-minute) intervals, we chose to track the passage of time using a reference point and data indices. This saved significant memory, as storing time data separately would have meant two values per data point rather than just one. We chose a reference time of 1st January 2015 at 00:00:00 (the start of the new year). For five-minute data, this means that midday on 1st January 2015 has an index of 144 (12 hours × 12 samples per hour). The functions SampleToDatetime and DatetimeToSample perform the conversion between sample index and time strings.
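The index arithmetic described above can be sketched as follows. This is only an illustration, not the project's SampleToDatetime and DatetimeToSample code: the lower-case function names, the sample_minutes parameter and the time string format (taken from the extraction examples later in this guide) are assumptions.

```python
from datetime import datetime, timedelta

# Reference point stated in the text: 1st January 2015 at 00:00:00
REFERENCE_TIME = datetime(2015, 1, 1, 0, 0, 0)
TIME_FORMAT = "%Y/%m/%d %H:%M:%S"  # assumed string format
SAMPLE_MINUTES = 5                 # use 30 for the thirty-minute datasets


def datetime_to_sample(time_string, sample_minutes=SAMPLE_MINUTES):
    """Convert a time string to the number of samples since the reference time."""
    elapsed = datetime.strptime(time_string, TIME_FORMAT) - REFERENCE_TIME
    return int(elapsed.total_seconds() // (sample_minutes * 60))


def sample_to_datetime(sample, sample_minutes=SAMPLE_MINUTES):
    """Convert a sample index back to a time string."""
    moment = REFERENCE_TIME + timedelta(minutes=sample * sample_minutes)
    return moment.strftime(TIME_FORMAT)


print(datetime_to_sample("2015/01/01 12:00:00"))  # 144, as in the text
```

Because only the index is stored, the timestamp of any data point can be recovered from its position in the dataset and the reference time.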

Database Modules

ausp_archivedownloader: Downloads files from the AEMO archive that have not been previously downloaded.

ausp_configuration: This file is where all user defined settings are stored. This includes websites, spreadsheet indices, database structure and attributes.

ausp_databasesetup: Creates a database structure based on the user defined lists of structure data in the configuration.

ausp_datetimetosample: Converts date-time strings to sample numbers.

ausp_dbupdater: Searches the AEMO archive for files that have not yet been downloaded, then downloads them and processes them into the database.

ausp_filelogger: Logs downloaded file names into the file log dataset.

ausp_firstrun: This is the first module to run if there is no database in place. It will set up a database, fill it with data and set attributes.

ausp_generaldownloader: Downloads files from a list of filenames.

ausp_importdispatchisdata: Performs the processing to transfer data from the AEMO Dispatch spreadsheet into the database.

ausp_importpvactualdata: Performs the processing to transfer data from the AEMO PV spreadsheet into the database.

ausp_importscadadata: Performs the processing to transfer data from the AEMO SCADA spreadsheet into the database.

ausp_parse: Parses the AEMO website for the spreadsheet or zip file names.

ausp_rawdatafileprocessor: Processes spreadsheets downloaded from AEMO, extracting and placing the data into the database.

ausp_sampletodatetime: Converts a sample index to a datetime string based on the reference time listed in the configuration file.

ausp_scrapelistoffilesarchive: Creates lists of files that are on AEMO data dashboard (Archive) for each of the three datasets used: SCADA, Reports and rooftop PV.

ausp_scrapelistoffilescurrent: Creates lists of files that are on AEMO data dashboard (Current) for each of the three datasets used: SCADA, Reports and rooftop PV.

ausp_setgeneratorattributes: Sets generator attributes from the NEM Registration list spreadsheet provided by AEMO.

ausp_setinterconnectorattributes: Sets interconnector attributes based on the user defined attributes in the configuration module.

ausp_setregionattribute: Sets region attributes based on the user defined attributes in the configuration module.
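Several of the modules above rely on the same idea: compare the file names listed on the AEMO site with the names already recorded in the file log, and fetch only the difference. A minimal sketch of that comparison is given here. The function name and the file names are hypothetical and are not taken from the project code.

```python
def files_to_download(available_files, logged_files):
    """Return the listed files that are not yet recorded in the file log."""
    already_downloaded = set(logged_files)  # set lookup keeps the check fast
    return [name for name in available_files if name not in already_downloaded]


# Hypothetical file names, for illustration only
available = ["scada_0001.zip", "scada_0002.zip", "scada_0003.zip"]
logged = ["scada_0001.zip"]
print(files_to_download(available, logged))
```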

GA Module

The GA module contains all the functions required to run the genetic algorithm (GA) as used in this project. The functions are:

generate_parent: Generates a list of specified length containing randomly generated values between 0 and 1. This is a real valued parent that can be used in a fitness function to determine a fitness as required. It is a simple matter to instead make discrete parents if that is what is required.

generate_population: Generates a population of parents using the generate_parent function. The population size can be any positive integer, although crossover and tournament_selection require it to be divisible by two.

mutate: This function traverses a parent and mutates genes based on a mutation probability. A mutation involves replacing the gene with a randomly generated new gene. In this case, the genes are real valued in the range [0, 1], but could be modified to discrete as required.

crossover: The crossover function takes a population as its main input and outputs a new population of the same size that has undergone the crossover procedure. Crossover works on pairs of parents, and each parent is considered only once per generation, so the population size must be divisible by two to ensure all parents are considered. An error will be raised if the population size is not divisible by two.

tournament_selection: This function takes a population then pairs the parents up at random and compares the fitness of each pair. The parent with the better fitness is kept while the other parent is discarded. This function outputs a population that is half the size of the input population and so should generally be run twice in each generation to ensure population size is consistent throughout the simulation. As the function pairs up parents, the population size must again be divisible by two.

fitness: This is the key user determined function for the GA. This function will change depending on the aims, constraints and requirements of each simulation. The function must utilise a parent to determine its output or else the GA will essentially be a fancy random number generator.

main: This function puts all the other functions together to allow simple operation of the GA. The steps of the main function are described generally in figure 4.
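The functions described above can be sketched roughly as follows. This is a minimal illustration and not the project's GA module: the single-point crossover, the placeholder fitness function (sum of genes, higher is better), the default parameter values and the order of steps in main are all assumptions, since figure 4 is not reproduced here. Parents are assumed to have at least two genes.

```python
import random


def generate_parent(length):
    """Real-valued parent: a list of genes drawn uniformly from [0, 1)."""
    return [random.random() for _ in range(length)]


def generate_population(size, length):
    return [generate_parent(length) for _ in range(size)]


def mutate(parent, mutation_probability):
    """Replace each gene with a new random gene with the given probability."""
    return [random.random() if random.random() < mutation_probability else gene
            for gene in parent]


def _random_pairs(population):
    """Pair parents up at random; each parent is used exactly once."""
    if len(population) % 2 != 0:
        raise ValueError("population size must be divisible by two")
    shuffled = random.sample(population, len(population))
    return list(zip(shuffled[0::2], shuffled[1::2]))


def crossover(population):
    """Single-point crossover (assumed); output is the same size as the input."""
    children = []
    for first, second in _random_pairs(population):
        point = random.randint(1, len(first) - 1)
        children.append(first[:point] + second[point:])
        children.append(second[:point] + first[point:])
    return children


def tournament_selection(population, fitness):
    """Keep the fitter parent of each random pair; halves the population."""
    return [max(pair, key=fitness) for pair in _random_pairs(population)]


def fitness(parent):
    """Placeholder fitness: replace with the problem-specific function."""
    return sum(parent)


def main(population_size=20, parent_length=5, generations=50,
         mutation_probability=0.05):
    population = generate_population(population_size, parent_length)
    for _ in range(generations):
        # Two tournaments, each halving the population, keep the size constant
        selected = (tournament_selection(population, fitness)
                    + tournament_selection(population, fitness))
        population = [mutate(child, mutation_probability)
                      for child in crossover(selected)]
    return max(population, key=fitness)


if __name__ == "__main__":
    best = main()
    print(best, fitness(best))
```

Running the tournament twice per generation is what restores the population to its original size before crossover, as noted in the description of tournament_selection.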

Useful Code Fragments

Using matplotlib on a Mac

I ran into an issue plotting with matplotlib in Python on macOS. Importing matplotlib in the following way, so that the backend is selected before pyplot is imported, solved the problem.


    import matplotlib as mpl
    mpl.use('TkAgg')  # select the backend before importing pyplot
    import matplotlib.pyplot as plt
    # import any other packages as required


Retrieving data from a .csv

Without using any other packages, the data in a column of a .csv file can be extracted to a Python list using the following. Rows that cannot be converted to a number, such as header rows, are skipped.


    data = []
    column_index = 0
    with open('file.csv', 'r') as file:
        for line in file:
            try:
                fields = line.strip().split(',')
                data.append(float(fields[column_index]))
            except (ValueError, IndexError):
                # Skip header rows, blank lines and non-numeric entries
                pass
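As an alternative sketch, the same extraction can be written with the csv module from the standard library, which also copes with quoted fields that contain commas. The function name read_column is hypothetical.

```python
import csv


def read_column(path, column_index=0):
    """Return the numeric values found in one column of a .csv file."""
    values = []
    with open(path, newline="") as handle:
        for row in csv.reader(handle):
            try:
                values.append(float(row[column_index]))
            except (ValueError, IndexError):
                # Skip header rows, short rows and non-numeric entries
                continue
    return values
```

It would be called as read_column('file.csv', 0).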


Create a .csv file from a list of lists

It was often useful to export data from the database to a .csv file for later retrieval. There are a few ways to do this. The first method uses the pandas package:


    import pandas as pd

    column_data = [colA, colB, colC]
    labels = ['A', 'B', 'C']

    df = pd.DataFrame(column_data)
    df = df.transpose()  # Remove this line if you want rows of data
    df.columns = labels
    df.to_csv('filename.csv', index=False)


To achieve the same result using only the standard library:


    labels = ['A', 'B']
    colAdata = [some_data]
    colBdata = [some_other_data]
    rows_as_strings = []

    # Convert data to strings (assumes both columns have the same length)
    for i in range(len(colAdata)):
        rows_as_strings.append(str(colAdata[i]) + ',' + str(colBdata[i]))

    # Write the header row followed by the data rows
    with open('file.csv', 'w') as file:
        file.write(','.join(labels) + '\n')
        for row in rows_as_strings:
            file.write(row + '\n')
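When the columns differ in length, one possible approach is to pad the shorter ones using the csv module and itertools.zip_longest, both from the standard library. This is a sketch only. The function name write_columns and the choice of an empty string as the padding value are assumptions.

```python
import csv
from itertools import zip_longest


def write_columns(path, labels, columns):
    """Write a list of columns to a .csv file, one header label per column."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(labels)
        # zip_longest turns columns into rows and pads short columns with ""
        writer.writerows(zip_longest(*columns, fillvalue=""))
```

It would be called as write_columns('file.csv', ['A', 'B'], [colAdata, colBdata]).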


Extract data from database

Extracting data from the database is simple. For example, this code will retrieve all the demand data for the SA1 region:


    import h5py

    with h5py.File('STORPROJ_Database.h5', 'r') as f:
        demand_data = f['Regions']['SA1']['Demand'][..., 0]


Extracting data over a specified time frame:


    import h5py
    from AUSP_DatetimeToSample import DatetimeToSample as d2s

    # Convert the start and end times to sample indices
    start_date = d2s('2016/09/01 00:00:00')
    end_date = d2s('2017/09/01 00:00:00')

    with h5py.File('STORPROJ_Database.h5', 'r') as f:
        demand_data = f['Regions']['SA1']['Demand'][start_date:end_date][..., 0]


Extracting and summing several known datasets: the following code will extract all the data from the generators HDWF1 and HDWF2 and sum across each sample, resulting in an array of summed generator outputs. Missing (NaN) values are treated as zero.


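    import h5py
    import numpy as np

    with h5py.File('STORPROJ_Database.h5', 'r') as f:
        hdwf1 = f['DUIDs']['HDWF1'][..., 0]
        hdwf2 = f['DUIDs']['HDWF2'][..., 0]

    # Sum sample by sample, ignoring NaNs (assumes both datasets have the same length)
    data = np.nansum([hdwf1, hdwf2], axis=0)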


Finally, extracting all data from a range of datasets that share an attribute value. This example retrieves all generators that are in the SA1 region with a primary fuel source of "Wind" and sums their outputs. It also skips any generators that have no attributes.


    import h5py
    import numpy as np

    wind_outputs = []
    with h5py.File('STORPROJ_Database.h5', 'r') as f:
        for generator in f['DUIDs']:
            attrs = f['DUIDs'][generator].attrs
            if len(attrs) == 0:
                continue  # skip generators that have no attributes
            if attrs['Fuel Source - Primary'] == 'Wind' and attrs['Region'] == 'SA1':
                wind_outputs.append(f['DUIDs'][generator][..., 0])

    # Sum sample by sample, ignoring NaNs (assumes all datasets have the same length)
    sa_wind = np.nansum(wind_outputs, axis=0)


Plotting and Animating

Plotting data is simple using matplotlib, and the documentation is full of information and examples on how to fully customise your plots. The following example shows how simple it is to get a quick plot:


    import matplotlib as mpl
    mpl.use('TkAgg')  # macOS only!
    import matplotlib.pyplot as plt

    data = [some_data]

    plt.plot(data)
    plt.show()


Animating is a bit trickier and I found the examples confusing at first so here is an example of how to animate some data:


    import matplotlib as mpl
    mpl.use('TkAgg')
    import matplotlib.pyplot as plt
    import matplotlib.animation as animation

    data = [some_data]  # one sequence of values per frame
    fig = plt.figure()
    axis = fig.add_axes([0.1, 0.1, 0.8, 0.7])

    def animate(frame):
        # Redraw the axes for this frame (remove clear() to accumulate lines)
        axis.clear()
        axis.plot(data[frame])

    ani = animation.FuncAnimation(fig, animate, interval=1000, frames=len(data))
    ani.save('filename.mp4')  # saving to .mp4 requires ffmpeg to be installed
    plt.show()


Basically, FuncAnimation creates the animation by calling the user-written function "animate" once for each frame. The frames parameter can be an iterable, a generator or an integer; passing an integer such as frames = 10 is equivalent to passing frames = range(10). The interval between frames in this example is 1000 ms, or 1 s. The animate function can be as complex as necessary, and as long as the frames input changes the plot, there will be an animation video at the end. I found that the video file that is created works better if the frames are short in duration (20 ms or so). This means that an animation will usually need a large number of frames.
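The relationship between frame interval, frame count and video length is simple arithmetic, sketched here with two hypothetical helper functions that are not part of matplotlib:

```python
def frames_needed(duration_seconds, interval_ms=20):
    """Number of frames required to fill a video of the given length."""
    return round(duration_seconds * 1000 / interval_ms)


def video_duration_seconds(frame_count, interval_ms):
    """Length of the video produced by a given number of frames."""
    return frame_count * interval_ms / 1000


print(frames_needed(10))                  # 500 frames for 10 s at 20 ms
print(frames_needed(10, 1000))            # 10 frames for 10 s at 1000 ms
print(video_duration_seconds(500, 20))    # 10.0 seconds
```

A ten second animation therefore needs fifty times as many frames at a 20 ms interval as it does at the 1000 ms interval used in the example.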