eidaqc.eida_availability module
Test the availability of data at a random station in the European Integrated Data Archive (EIDA).
A test can be executed by calling run()
in a script or
from command line using:
eida avail [--ignore_missing] <configfile>
The optional --ignore_missing
flag can be used to force the creation
of a station inventory despite missing reference networks. This may be
useful when no cached inventory is available, e.g. when running for the
first time.
Alternatively, you can try to create a more complete inventory
using the script scripts/create_inventory.py
on GitHub.
It tries to add missing networks by executing direct FDSN requests
to the respective servers.
The test consists of:
1. Randomly select a station from the inventory.
2. Request the inventory at response level for this station.
3. Randomly select a channel from the station inventory.
4. Request data for a randomly selected interval from a given time range.
5. Remove the instrument response from the retrieved data.
6. Write a status code representing the outcome of steps 4 and 5 to file.
The intended use case is to run the request regularly, e.g. via a cron job,
to build up a database. The results can be evaluated using the
module eida_report
and should help to assess the reliability of access
to the data.
Meta (“inventory”) and waveform data are requested via
obspy.clients.fdsn.RoutingClient
using get_stations()
and get_waveforms(), respectively.
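The two requests and the subsequent response removal can be sketched as follows. This is a minimal illustration, not the module's own code: the function name `request_and_restitute` and its arguments are hypothetical, and ObsPy is imported inside the function so that the sketch can be loaded without ObsPy or network access.

```python
def request_and_restitute(net, sta, loc, cha, start, end):
    """Hypothetical helper: fetch meta data and waveforms, then remove the response.

    start and end are expected to be obspy.UTCDateTime objects.
    """
    # Lazy import: defining this function needs neither ObsPy nor a network.
    from obspy.clients.fdsn import RoutingClient

    client = RoutingClient("eida-routing")

    # Meta data at response level for the selected channel.
    inv = client.get_stations(network=net, station=sta, location=loc,
                              channel=cha, starttime=start, endtime=end,
                              level="response")

    # Waveform data for the same channel and interval.
    st = client.get_waveforms(network=net, station=sta, location=loc,
                              channel=cha, starttime=start, endtime=end)

    # Restitution: remove the instrument response using the meta data.
    st.remove_response(inventory=inv)
    return st
```

An empty stream returned by `get_waveforms()` would here correspond to the "no data" outcome described in the notes.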
Notes
The code does not use the waveform catalog. Empty waveform returns can therefore be caused either by data gaps or by problems in data access and delivery.
An inventory of available EIDA stations is created regularly.
- class eidaqc.eida_availability.DoubleProcessCheck(maxage=300)
Bases:
object
Check if a process is already running.
Process id is stored in a temporary file while program is running.
Notes
Make sure that
maxage
is sufficiently long, e.g. maxage >= eia_timeout.
- create_pidfile()
- process_active()
- release()
- should_exit()
Returns True if an instance is already up and running. If not, a process file is created and False is returned. Subsequent calls within
maxage
should return True.
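The behaviour described above can be illustrated with a small pid-file guard. This is a sketch under assumptions, not the actual `DoubleProcessCheck` implementation: the class name `PidFileGuard` is hypothetical, and it is assumed that a pid file older than `maxage` seconds is treated as stale.

```python
import os
import tempfile
import time


class PidFileGuard:
    """Hypothetical stand-in illustrating a pid-file check with a maximum age."""

    def __init__(self, pidfile, maxage=300):
        self.pidfile = pidfile
        self.maxage = maxage

    def process_active(self):
        # Active only if the pid file exists and is younger than maxage seconds.
        try:
            age = time.time() - os.path.getmtime(self.pidfile)
        except OSError:
            return False
        return age < self.maxage

    def create_pidfile(self):
        with open(self.pidfile, "w") as f:
            f.write(str(os.getpid()))

    def release(self):
        if os.path.exists(self.pidfile):
            os.remove(self.pidfile)

    def should_exit(self):
        if self.process_active():
            return True
        self.create_pidfile()
        return False


pidfile = os.path.join(tempfile.mkdtemp(), "example.pid")
guard = PidFileGuard(pidfile, maxage=300)
first = guard.should_exit()    # no pid file yet: one is created
second = guard.should_exit()   # pid file is fresh: a second run should exit
guard.release()
third = guard.should_exit()    # after release a new run may start
guard.release()
```

With such a scheme, a `maxage` shorter than the longest request would let a second run start while the first is still waiting for a server, which is why `maxage >= eia_timeout` is recommended.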
- class eidaqc.eida_availability.EidaAvailability(eia_datapath=None, wanted_channels=('HHZ', 'BHZ', 'EHZ', 'SHZ'), eia_global_timespan_days=365, maxcacheage=432000, minreqlen=60, maxreqlen=600, eia_timeout=60, eia_min_num_networks=80, reference_networks=[], exclude_networks=[], large_networks={}, inv_update_waittime=3600, ignore_missing=False)
Bases:
object
Manage and execute random data requests.
The main methods to use are:
random_request()
process_request()
- Parameters
eia_datapath (str, None) – path to output directory. In this directory results are placed in a sub-directory log
wanted_channels (tuple of str) – channels used to create the inventory. For the actual request, however, any of the channels available at a station may be selected. Default: (‘HHZ’, ‘BHZ’, ‘EHZ’, ‘SHZ’)
eia_global_timespan_days (int [365]) – days into the past for which data requests will be created
maxcacheage (int, [5*86400]) – maximum age of the cached inventory in seconds. If the inventory file is older, a new one is created from service.
minreqlen (int, [60]) – minimum length of waveform to request, in seconds
maxreqlen (int, [600]) – maximum length of waveform, in seconds
eia_timeout (int, [60]) – timeout in seconds for server requests, passed to RoutingClient( “eida-routing” ).get_stations()
eia_min_num_networks (int [80]) – minimum number of networks in new inventory to accept it
reference_networks (list of str []) – list of reference networks, that must be present to accept the automatic inventory from service
exclude_networks (list of str []) – list of networks to exclude from selection for data request. These can be e.g. non-European networks that are available through the EIDA routing client, or very small or temporary networks
large_networks (dict) – selection probability for specific networks. E.g. set large_networks = {‘NL’:0.5} to reduce the probability of selecting a station from network ‘NL’
inv_update_waittime (int [3600]) – seconds to wait until update of inventory from service is tried again after failure.
ignore_missing (bool [False]) – ignore missing reference networks when inventory is updated from service. Useful to create an initial inventory.
Notes
- Channels:
wanted_channels
is only used when an inventory of meta data is created by the obspy routing client. From this inventory a station is selected randomly. Subsequently, meta data at response level is requested again specifically for the targeted station. From this new meta data, a channel is selected randomly and data is requested for it. In other words, even if wanted_channels
contains only z-components, data will be requested for other components as well.
- reference networks:
These should be large networks that are representative of different servers. If one of these networks is missing in the inventory after an update from service, the cached inventory is used, unless ignore_missing=True is set. Most likely the server which provides this network was not available to the routing client at the time of the request. Assuming that the old inventory is more complete, it is used until a successful update yields all reference networks. This may however be problematic at the beginning when no cached inventory is available. For this purpose, use ignore_missing=True.
- large networks:
By default, all networks have an equal chance of 1 to be selected for a data request. However, this may lead to overrepresentation of very large networks in the statistics. Their chance of selection can be reduced, for example with {'NL': 0.5}.
- Meta data (inventory) update:
Meta data (names of networks, available stations and channels) is obtained using routing_client.RoutingClient(“eida-routing”).get_stations() from obspy.clients.fdsn. We regularly (every maxcacheage seconds) ask for all meta data at channel level (i.e. network, station and channel names). The inventory is stored as chanlist_cache.pickle in the output directory. This cached inventory is used until it is older than maxcacheage seconds. Then a new inventory is requested from service. Ideally, all servers contributing to EIDA respond and a full inventory of all networks in EIDA is obtained. This is approximately tested by checking if all reference networks are present. If this is not the case, we try to add the missing networks from the old inventory, provided there is one. Moreover, at least eia_min_num_networks networks should be present. If the previous checks fail, we reuse the old inventory for now, but try to update the inventory from service every inv_update_waittime seconds. Please choose inv_update_waittime and maxcacheage carefully, since these routing requests place a significant load on the servers and should not be issued more often than necessary.
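The acceptance checks for a freshly requested inventory can be sketched as below. This is an illustration under assumptions, not the module's code: inventories are modelled as plain dicts mapping a network code to a list of station names, the function name `choose_inventory` is hypothetical, and "minimum number of networks" is read as "at least `min_num_networks`".

```python
def choose_inventory(new, cached, reference_networks, min_num_networks,
                     ignore_missing=False):
    """Hypothetical decision logic; returns (inventory, accepted).

    Inventories are dicts: network code -> list of station names.
    """
    if new is None:
        # Service request failed: fall back to the cache.
        return cached, False

    missing = [net for net in reference_networks if net not in new]
    if missing and not ignore_missing:
        if cached is None:
            return None, False
        # Try to add the missing networks from the old inventory.
        new = dict(new)
        for net in missing:
            if net in cached:
                new[net] = cached[net]
        if any(net not in new for net in reference_networks):
            return cached, False

    # Assumption: the new inventory needs at least min_num_networks networks.
    if len(new) < min_num_networks:
        return cached, False
    return new, True


cached = {"GE": ["STA1"], "NL": ["STA1"], "CH": ["STA1"]}
new = {"GE": ["STA1", "STA2"], "CH": ["STA1"]}   # reference network NL is missing

merged, ok_merged = choose_inventory(new, cached, ["GE", "NL"], 3)
nothing, ok_nothing = choose_inventory(new, None, ["GE", "NL"], 1)
forced, ok_forced = choose_inventory(new, None, ["GE", "NL"], 1,
                                     ignore_missing=True)
```

The third call shows the role of `ignore_missing`: without a cached inventory, the incomplete inventory is only accepted when the flag is set.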
- _check_path(pname, varname='')
Manage path to results.
- Parameters
pname (str) – If pname is None, results are written to the current working directory. If pname is a string, it should be a valid path to a directory.
varname (str) – Explanatory text, passed to error and log messages.
We check for existence of this directory and create a new one if absent. Expands users and variables in path.
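A path check of this kind might look like the following sketch. The function name `check_path` and the environment variable `EIA_EXAMPLE_BASE` are hypothetical and only serve the demonstration; the actual method may handle errors and logging differently.

```python
import os
import tempfile


def check_path(pname):
    """Hypothetical helper: return a usable output directory."""
    if pname is None:
        # No path given: use the current working directory.
        return os.getcwd()
    # Expand "~" and environment variables, then create the directory if absent.
    path = os.path.expandvars(os.path.expanduser(pname))
    os.makedirs(path, exist_ok=True)
    return path


base = tempfile.mkdtemp()
os.environ["EIA_EXAMPLE_BASE"] = base   # hypothetical variable, demo only
outdir = check_path(os.path.join("$EIA_EXAMPLE_BASE", "results", "log"))
```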
- _get_inventory_from_cache(overrideage=False)
Read station inventory from cached pickle self.slist_cache.
Returns None if no cached pickle file is found, or if the file is too old and overrideage=False (default).
Otherwise the inventory is read from file.
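The age check on the cached pickle can be illustrated as follows. This is a minimal sketch: the function name `read_cached_inventory` is hypothetical, the file's modification time is assumed to define its age, and a plain dict stands in for the obspy inventory.

```python
import os
import pickle
import tempfile
import time


def read_cached_inventory(cachefile, maxcacheage, overrideage=False):
    """Hypothetical helper: return the cached object or None."""
    if not os.path.isfile(cachefile):
        return None
    age = time.time() - os.path.getmtime(cachefile)
    if age > maxcacheage and not overrideage:
        return None
    with open(cachefile, "rb") as f:
        return pickle.load(f)


cachefile = os.path.join(tempfile.mkdtemp(), "chanlist_cache.pickle")
missing = read_cached_inventory(cachefile, maxcacheage=432000)

with open(cachefile, "wb") as f:
    pickle.dump({"GE": ["STA1"]}, f)
fresh = read_cached_inventory(cachefile, maxcacheage=432000)

# Pretend the file is six days old by moving its modification time back.
six_days_ago = time.time() - 6 * 86400
os.utime(cachefile, (six_days_ago, six_days_ago))
stale = read_cached_inventory(cachefile, maxcacheage=432000)
forced = read_cached_inventory(cachefile, maxcacheage=432000, overrideage=True)
```

The `overrideage=True` case corresponds to reusing an old inventory when an update from service has failed.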
- _get_inventory_from_service()
Retrieve station inventory from EIDA routing client.
Requests a full inventory of all available networks, stations, channels in EIDA.
Calls:
    slist = RoutingClient( "eida-routing" ).get_stations(
        level='channel',
        channel=','.join(self.wanted_channels),
        starttime=UTCDateTime()-86400*eia_global_timespan_days,
        endtime=UTCDateTime(),
        timeout=self.eia_timeout,
        includerestricted=False )
- Returns
- Return type
obspy inventory or None
If the total number of networks in slist is > eia_min_num_networks and no networks from reference_networks are missing, the inventory is stored as pickle 'chanlist_cache.pickle' to be used by _get_inventory_from_cache(). Returns None if any of the above fails.
- _get_random_request_interval()
- _get_random_request_length()
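How a random request interval can be drawn from `minreqlen`, `maxreqlen` and `eia_global_timespan_days` is sketched below. The function name `random_request_interval` and the use of uniform distributions are assumptions; the two methods above may sample differently.

```python
import random
from datetime import datetime, timedelta, timezone


def random_request_interval(timespan_days=365, minreqlen=60, maxreqlen=600,
                            rng=random):
    """Hypothetical helper: return (start, end) of a random request interval."""
    now = datetime.now(timezone.utc)
    # Request length in seconds, between minreqlen and maxreqlen.
    length = rng.uniform(minreqlen, maxreqlen)
    # Shift the interval back by a random offset so that it lies completely
    # within the last timespan_days days and ends no later than now.
    window = timespan_days * 86400 - length
    start = now - timedelta(seconds=length + rng.uniform(0, window))
    return start, start + timedelta(seconds=length)


start, end = random_request_interval(rng=random.Random(0))
```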
- _servers_missing(inv)
Check inventory inv for reference networks.
Reference networks = main network of each server.
- get_inventory(force_cache=False)
Read inventory from cache or from routing client.
- get_station_meta(netsta, reqspan)
Retrieve full inventory including response of selected station.
- Parameters
netsta (str) – network and station as net.sta
reqspan (list-like, len=2) – start and end time for request interval
- is_operating(fullinv, network, station)
Check in inventory if station is currently operating.
- Parameters
fullinv (obspy inventory) – inventory (usually the full Eida inventory obtained from cache or routing client)
network (str) –
station (str) –
- logresult(exc=None, sta=None, reqspan=None)
Write result of request into a file database.
Called by process_request() to store the request result, and by get_station_meta().
- number_of_networks(inv)
Get number of networks in inventory inv.
- process_request(channel, stainv, reqspan)
Retrieve and evaluate waveform data, result is a status code.
Second main worker. Takes output of random_request() and executes the request.
Status codes are stored in self.status. At the end, this info is written to a log file.
- Parameters
Takes the output of random_request():
station – not used
channel (str) – string giving ‘network.station.location.channel’
stainv (inventory) – corresponding inventory
reqspan (list-like, len 2) – start and end time
- Returns
- Return type
status, meta_time, wave_time
- random_request()
Create random request parameters and return them.
This is one of the main workers. It selects a random station, chooses an interval for the request, collects station meta data, randomly selects a channel from the meta data and returns all as variables. These are also available as self.requestpar. Returns None at any point where no info is found.
- Returns
- Return type
selchan, stainv, reqspan or None
Collective call of
self.select_random_station()
self._get_random_request_interval()
self.get_station_meta( sta, reqspan )
select_random_station_channel( stainv, infotext )
- select_random_station()
Select random station from inventory.
Notes
Calls
get_inventory()
Uses
self.large_networks()
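One way to combine `exclude_networks` and `large_networks` in a random station selection is weighted sampling, sketched below. This is an assumption about the weighting scheme, not the method's actual code: stations are given as hypothetical 'NET.STA' strings, and each station's weight is taken from `large_networks` (default 1).

```python
import random


def select_random_station(stations, large_networks=None, exclude_networks=(),
                          rng=random):
    """Hypothetical helper: pick one 'NET.STA' string, or None if none is left."""
    large_networks = large_networks or {}
    # Drop stations of excluded networks.
    candidates = [s for s in stations
                  if s.split(".")[0] not in exclude_networks]
    if not candidates:
        return None
    # Stations of down-weighted networks get a smaller selection weight.
    weights = [large_networks.get(s.split(".")[0], 1.0) for s in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]


stations = ["NL.STA1", "NL.STA2", "GE.STA1", "XX.STA1"]
rng = random.Random(1)
picks = [select_random_station(stations, large_networks={"NL": 0.5},
                               exclude_networks=["XX"], rng=rng)
         for _ in range(1000)]
```

In this example each NL station is half as likely to be drawn as the GE station, and the excluded network never appears.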
- select_random_station_channel(stainv, infotext='')
Randomly select channel from station inventory.
- Parameters
stainv (obspy inventory) –
infotext (str) – passed to logger message
- class eidaqc.eida_availability.RetryManager(name, waittime=3600)
Bases:
object
Manages age of inventory
- new_retry()
Return True/False depending on age of flag file.
Also try again if no flagfile exists.
- try_failed()
Mark a failed try by touching the flag file.
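The flag-file mechanism can be illustrated with the following sketch. The class name `RetryFlag` is hypothetical, and it is assumed that the age of the flag file is measured by its modification time.

```python
import os
import tempfile
import time


class RetryFlag:
    """Hypothetical stand-in for a flag-file based retry timer."""

    def __init__(self, flagfile, waittime=3600):
        self.flagfile = flagfile
        self.waittime = waittime

    def new_retry(self):
        # Retry if there is no flag file or if it is older than waittime.
        if not os.path.exists(self.flagfile):
            return True
        age = time.time() - os.path.getmtime(self.flagfile)
        return age > self.waittime

    def try_failed(self):
        # "Touch" the flag file: create it if needed and update its time stamp.
        with open(self.flagfile, "a"):
            pass
        os.utime(self.flagfile, None)


flag = RetryFlag(os.path.join(tempfile.mkdtemp(), "retry.flag"), waittime=3600)
before = flag.new_retry()        # no flag file: try now
flag.try_failed()
after_failure = flag.new_retry() # failed just now: wait

# Pretend the failure happened two hours ago.
two_hours_ago = time.time() - 7200
os.utime(flag.flagfile, (two_hours_ago, two_hours_ago))
after_wait = flag.new_retry()    # waittime has passed: try again
```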
- eidaqc.eida_availability.merge_missing_inventory_entries(oldinv, newinv)
Merge networks from oldinv that are missing in newinv.
- eidaqc.eida_availability.run(configfile, maxage=300, ignore_missing=False)
Execute EIDA data availability test using parameters from configfile.
1. Reads the configuration file.
2. Configures the logging handler (error messages and runtime info).
3. Initializes EidaAvailability.
4. Selects a random station from available meta data via random_request().
5. Requests test data and tries to apply restitution via process_request().
- Parameters
configfile (str, path-like) – path and name of configuration file. Passed to
eida_config
maxage (int, None) – does not run if another process is found which started less than maxage seconds ago. Passed to DoubleProcessCheck(). If None, we use eia_timeout from configs.
ignore_missing (bool [False]) – Whether missing reference networks in inventory should be ignored when updating from service. Helpful to force creation of an initial inventory cache.
Notes
Only runs if no other instance is found (DoubleProcessCheck()). Make sure, though, that maxage is sufficiently long, e.g. maxage >= eia_timeout.