eidaqc.eida_availability module

Test the availability of data at a randomly selected station in the European Integrated Data Archive (EIDA).

A test can be executed by calling run() in a script or from the command line using:

eida avail [--ignore_missing] <configfile>

The optional --ignore_missing flag can be used to force the creation of a station inventory despite missing reference networks. This may be useful when no cached inventory is available, e.g. when running for the first time.

Alternatively, you can try to create a more complete inventory using the script scripts/create_inventory.py on GitHub. It tries to add missing networks by executing direct FDSN requests to the respective servers.

The test consists of the following steps:

1. Randomly select a station from the inventory.
2. Request the inventory at response level for this station.
3. Randomly select a channel from the station inventory.
4. Request data for a randomly selected interval within a given time range.
5. Remove the instrument response from the retrieved data.
6. Write a status code representing the outcome of steps 4 and 5 to a file.

The intended use case is to run the request regularly, e.g. via a cron job, to build up a database. The results can be evaluated using the module eida_report and should help to assess the reliability of access to the data.

Meta (“inventory”) and waveform data are requested via obspy.clients.fdsn.RoutingClient using get_stations() and get_waveforms(), respectively.

Notes

The code does not use the waveform catalog; therefore, empty waveform returns are due either to data gaps or to problems in data access and delivery.

An inventory of available EIDA stations is created regularly.

class eidaqc.eida_availability.DoubleProcessCheck(maxage=300)

Bases: object

Check if a process is already running.

The process ID is stored in a temporary file while the program is running.

Notes

Make sure that maxage is sufficiently long, e.g. maxage >= eia_timeout.

create_pidfile()

process_active()

release()

should_exit(): True, if instance is up and running. If not, a process file is created, returns False. Subsequent calls within maxage should return True.

class eidaqc.eida_availability.EidaAvailability(eia_datapath=None, wanted_channels=('HHZ', 'BHZ', 'EHZ', 'SHZ'), eia_global_timespan_days=365, maxcacheage=432000, minreqlen=60, maxreqlen=600, eia_timeout=60, eia_min_num_networks=80, reference_networks=[], exclude_networks=[], large_networks={}, inv_update_waittime=3600, ignore_missing=False)

Bases: object

Manage and execute random data requests.

The main methods to use are:

random_request()
process_request()

Parameters

eia_datapath (str, None) – path to the output directory. Results are placed in the sub-directory log of this directory.
wanted_channels (tuple of str) – channels used to create the inventory. For the actual request, however, any of the available channels at a station may be selected. Default: (‘HHZ’, ‘BHZ’, ‘EHZ’, ‘SHZ’)
eia_global_timespan_days (int [365]) – number of days into the past for which data requests are created
maxcacheage (int, [5*86400]) – maximum age of the cached inventory, in seconds. If the inventory file is older, a new one is requested from the service.
minreqlen (int, [60]) – minimum length of waveform to request, in seconds
maxreqlen (int, [600]) – maximum length of waveform to request, in seconds
eia_timeout (int, [60]) – timeout in seconds for server requests, passed to RoutingClient( “eida-routing” ).get_stations()
eia_min_num_networks (int [80]) – minimum number of networks a new inventory must contain to be accepted
reference_networks (list of str []) – list of reference networks that must be present to accept the automatic inventory from the service
exclude_networks (list of str []) – list of networks to exclude from selection for data requests, e.g. non-European networks that are available through the EIDA routing client, or very small or temporary networks
large_networks (dict) – selection weights for specific networks. E.g. set large_networks = {‘NL’:0.5} to reduce the probability of selecting a station from network ‘NL’
inv_update_waittime (int [3600]) – seconds to wait before an update of the inventory from the service is tried again after a failure
ignore_missing (bool [False]) – ignore missing reference networks when the inventory is updated from the service. Useful to create an initial inventory.

Notes

Channels:
wanted_channels is only used when the inventory of meta data is created by the ObsPy routing client. A station is selected randomly from this inventory. Subsequently, meta data at response level is requested again specifically for the targeted station, and a channel is selected randomly from this new meta data for the data request. In other words, even if wanted_channels contains only Z components, data will be requested for other components as well.
reference networks:
These should be large networks that are representative of the different servers. If one of these networks is missing from the inventory after an update from the service, the cached inventory is used, unless ignore_missing=True. Most likely, the server that provides this network was not available to the routing client at the time of the request. Assuming that the old inventory is more complete, it is used until a successful update yields all reference networks. This may, however, be problematic at the beginning, when no cached inventory is available. In that case, use ignore_missing=True.
large networks:
By default, all networks have an equal selection weight of 1 for a data request. However, this may lead to overrepresentation of very large networks in the statistics. A weight below 1 reduces the chance of selecting a station from that network; a common setting might be {'NL': 0.5}.
Meta data (inventory) update:
Meta data (names of networks, available stations and channels) is obtained using routing_client.RoutingClient(“eida-routing”).get_stations() from obspy.clients.fdsn. Every maxcacheage seconds, all meta data at channel level (i.e. network, station and channel names) is requested. The inventory is stored as chanlist_cache.pickle in the output directory and is used until it is older than maxcacheage seconds; then a new inventory is requested from the service. Ideally, all servers contributing to EIDA respond and a full inventory of all networks in EIDA is obtained. This is approximately tested by checking whether all reference networks are present. If this is not the case, we try to add the missing networks from the old inventory, provided there is one. Moreover, at least eia_min_num_networks networks should be present. If these checks fail, we reuse the old inventory for now, but try to update the inventory from the service every inv_update_waittime seconds. Please choose inv_update_waittime and maxcacheage carefully, since these routing requests place a significant load on the servers and should not be issued more often than necessary.
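The acceptance logic for a freshly requested inventory can be sketched with sets of network codes standing in for full ObsPy inventories. This is a hypothetical simplification: decide_inventory is not part of eidaqc, and it assumes that an inventory with exactly eia_min_num_networks networks is accepted, whereas _get_inventory_from_service() describes the check as strictly greater.

```python
from typing import Optional, Set


def decide_inventory(new_nets: Optional[Set[str]],
                     old_nets: Optional[Set[str]],
                     reference: Set[str],
                     min_num: int,
                     ignore_missing: bool = False) -> str:
    """Return 'new', 'merged' or 'old' (hypothetical helper).

    Network codes stand in for full inventories. 'old' means: keep the
    cached inventory and retry the service after inv_update_waittime.
    """
    if new_nets is None:
        # The request to the routing service failed entirely
        return "old"
    candidate, label = set(new_nets), "new"
    missing = set() if ignore_missing else reference - candidate
    if missing and old_nets:
        # Add networks from the cached inventory that the new one lacks
        candidate |= old_nets
        label = "merged"
    still_missing = set() if ignore_missing else reference - candidate
    if still_missing or len(candidate) < min_num:
        return "old"
    return label
```

For example, decide_inventory({"GE"}, {"GE", "NL"}, {"NL"}, 2) fills the gap from the cached inventory and returns "merged".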

_check_path(pname, varname='')

Manage path to results.

Parameters

pname (str) – If pname is None, results are written to the current working directory. If pname is a string, it should be a valid path to a directory.
varname (str) – explanatory text, passed to error and log messages.

The directory is checked for existence and created if absent. User (~) and environment variables in the path are expanded.

_get_inventory_from_cache(overrideage=False)

Read station inventory from cached pickle self.slist_cache.

Returns None if

no cached pickle file is found, or
the file is too old and overrideage=False (default).

Otherwise, the inventory is read from the file.
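A minimal sketch of this cache lookup, assuming the age of the cache is taken from the file's modification time; the function name and signature are hypothetical and the real method reads the path from self.slist_cache instead of taking it as an argument.

```python
import os
import pickle
import time


def get_inventory_from_cache(cachefile, maxcacheage=432000, overrideage=False):
    """Hypothetical stand-in for _get_inventory_from_cache()."""
    if not os.path.isfile(cachefile):
        # No cached pickle file found
        return None
    age = time.time() - os.path.getmtime(cachefile)
    if age > maxcacheage and not overrideage:
        # Cache too old; caller should request a new inventory
        return None
    with open(cachefile, "rb") as fh:
        return pickle.load(fh)
```

Passing overrideage=True allows falling back on a stale cache, e.g. when an update from the service has failed.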

_get_inventory_from_service()

Retrieve station inventory from EIDA routing client.

Requests a full inventory of all available networks, stations, channels in EIDA.

Calls:

slist = RoutingClient("eida-routing").get_stations(
    level='channel',
    channel=','.join(self.wanted_channels),
    starttime=UTCDateTime() - 86400 * self.eia_global_timespan_days,
    endtime=UTCDateTime(),
    timeout=self.eia_timeout, includerestricted=False)

Returns
Return type: obspy inventory or None

If the total number of networks in slist is > eia_min_num_networks and no networks from reference_networks are missing, the inventory is stored as the pickle 'chanlist_cache.pickle' for use by _get_inventory_from_cache(). Returns None if any of these checks fails.

_get_random_request_interval()

_get_random_request_length()
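How a random request window could be drawn from minreqlen, maxreqlen and eia_global_timespan_days is sketched below. This is an assumption-laden illustration: it uses the standard-library datetime instead of ObsPy's UTCDateTime to stay self-contained, draws lengths and start times uniformly, and keeps the whole window inside the time span; the actual distributions used by _get_random_request_length() and _get_random_request_interval() are not documented here.

```python
import random
from datetime import datetime, timedelta, timezone


def random_request_length(minreqlen=60, maxreqlen=600, rng=random):
    """Request length in seconds, drawn uniformly (assumption)."""
    return rng.uniform(minreqlen, maxreqlen)


def random_request_interval(timespan_days=365, minreqlen=60, maxreqlen=600,
                            now=None, rng=random):
    """Return (start, end) lying within the last timespan_days days."""
    if now is None:
        now = datetime.now(timezone.utc)
    length = timedelta(seconds=random_request_length(minreqlen, maxreqlen, rng))
    earliest = now - timedelta(days=timespan_days)
    # Latest possible start so that the window still ends at or before now
    span = (now - earliest - length).total_seconds()
    start = earliest + timedelta(seconds=rng.uniform(0, span))
    return start, start + length
```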

_servers_missing(inv)

Check inventory inv for reference networks.

Reference networks are the main networks of the individual servers.

get_inventory(force_cache=False): Read inventory from cache or from routing client.

get_station_meta(netsta, reqspan)

Retrieve the full inventory, including the instrument response, of the selected station.

Parameters

netsta (str) – network and station as net.sta
reqspan (list-like, len=2) – start and end time for request interval

is_operating(fullinv, network, station)

Check in inventory if station is currently operating.

Parameters

fullinv (obspy inventory) – inventory (usually the full EIDA inventory obtained from cache or routing client)
network (str) – network code
station (str) – station code

logresult(exc=None, sta=None, reqspan=None)

Write result of request into a file database.

Called by process_request() (to store the request result) and by get_station_meta().

number_of_networks(inv): Get number of networks in inventory inv.

process_request(channel, stainv, reqspan)

Retrieve and evaluate waveform data, result is a status code.

Second main worker. Takes output of random_request() and executes the request.

Status codes are stored in self.status. At the end, this info is written to a log file.

Parameters

station – not used
channel (str) – string giving ‘network.station.location.channel’
stainv (inventory) – corresponding inventory
reqspan (list-like, len 2) – start and end time

Returns
Return type: status, meta_time, wave_time

random_request()

Create random request parameters and return them.

This is one of the main workers. It selects a random station, chooses an interval for the request, collects the station meta data, randomly selects a channel from the meta data and returns all of these values. They are also available as self.requestpar.

Returns None as soon as any of these steps yields no information.

Returns
Return type: selchan, stainv, reqspan or None

Calls, in this order:

self.select_random_station()
self._get_random_request_interval()
self.get_station_meta(sta, reqspan)
self.select_random_station_channel(stainv, infotext)

select_random_station()

Select random station from inventory.

Notes

Calls get_inventory()
Uses self.large_networks
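Weighted selection by network can be sketched as below. The representation of the inventory as a list of 'NET.STA' strings and the per-station weighting by the network's factor are assumptions for illustration; select_random_station here is a standalone function, not the eidaqc method.

```python
import random


def select_random_station(stations, large_networks=None, rng=random):
    """Pick one 'NET.STA' string, weighting each station by its network.

    large_networks: {network: weight}; networks that are not listed get
    the default weight 1 (hypothetical standalone version).
    """
    large_networks = large_networks or {}
    weights = [large_networks.get(s.split(".")[0], 1.0) for s in stations]
    # random.choices draws proportionally to the given weights
    return rng.choices(stations, weights=weights, k=1)[0]
```

With {'NL': 0.5}, each NL station is half as likely to be drawn as a station of a network without an entry.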

select_random_station_channel(stainv, infotext='')

Randomly select channel from station inventory.

Parameters

stainv (obspy inventory) – inventory of the selected station
infotext (str) – passed to logger message

class eidaqc.eida_availability.RetryManager(name, waittime=3600)

Bases: object

Manage retries after failed attempts (e.g. inventory updates from the service) based on the age of a flag file.

new_retry()

Return True if the flag file is older than waittime seconds, i.e. a new attempt should be made, and False otherwise.

Also returns True if no flag file exists.

try_failed(): Mark a failed try by touching the flag file.

eidaqc.eida_availability.merge_missing_inventory_entries(oldinv, newinv): Merges networks missing in newinv from oldinv.

eidaqc.eida_availability.run(configfile, maxage=300, ignore_missing=False)

Execute the EIDA data availability test using parameters from configfile.

reads the configuration file
configures the logging handler (error messages and runtime info)
initializes EidaAvailability
selects a random station from the available meta data via random_request()
requests test data and tries to apply restitution via process_request()

Parameters

configfile (str, path-like) – path and name of configuration file. Passed to eida_config
maxage (int, None) – does not run if another process is found which started less than maxage seconds ago. Passed to DoubleProcessCheck(). If None, we use eia_timeout from configs.
ignore_missing (bool [False]) – Whether missing reference networks in inventory should be ignored when updating from service. Helpful to force creation of an initial inventory cache.

Notes

Only runs if no other instance is found (DoubleProcessCheck()). Make sure, though, that maxage is sufficiently long, e.g. maxage >= eia_timeout.