Help: dataselect v.1

Description

The ph5ws-dataselect service provides access to PH5 active and passive source seismic data. The service supports requests by FDSN standard parameters as well as shot and receiver gathers. Data may be selected using the FDSN standard network, station, location, channel identifiers and a data format specification (mseed, sac, segy, or geocsv) or by using additional options only applicable to shot and receiver gather requests.

To support the required fields of the FDSN web service specification version 1, additional parameters have been added to this service. The Standard Options are valid for all request types.

Restrictions:

  • The starttime and endtime parameters will be ignored for Shot and Receiver gather requests. Instead use the length parameter.

This service is an implementation of the FDSN web service specification version 1.

To retrieve raw waveform data in miniSEED, SAC (ZIP), SEGY (ZIP), or GeoCSV format, submit a request by either of two methods:

  • via HTTP GET: Provide a series of parameter-value pairs in the URL that specify the start-time and end-time, along with the desired networks, stations, locations and channels. Wildcards are supported. Please visit the ph5ws-dataselect service interface for parameter usage details.
  • via HTTP POST: Submit a pre-formatted request (e.g. a text file) to the service containing a list of the desired networks, stations, locations, channels, start-times and end-times. The POST method is described in more detail at the bottom of this page. HTTP POST requests are currently supported for the FDSN request type.

This service is designed to handle very large data requests (see note 1 under Considerations) and can easily be used with command line programs such as wget, curl or similar utilities.

Data selection

A data selection is composed of a list of network, station, location, channel, start-time and end-time entries. Channel codes follow the conventions documented in Appendix A of the SEED Manual. The appendix has been reproduced here to be more easily searched.

An example selection, submitted using HTTP POST, might look like:

reqtype=FDSN
format=mseed
ZI 1002 -- DPZ 2015-06-29T04:45:00 2015-06-29T05:00:00
ZI 1003 -- DPZ 2015-06-29T04:45:00 2015-06-29T05:00:00
ZI 1004 -- DPZ 2015-06-29T04:45:00 2015-06-29T05:00:00
  • Glob expressions (wildcards) are allowed in all fields except date fields.
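
When many stations share the same network, channel and time window, the selection body can be generated rather than typed by hand. The following is a minimal sketch, not part of the service: make_selection is a hypothetical helper name, and it assumes the header lines and space-separated field order shown in the example above.

```shell
# make_selection is a hypothetical helper, not part of the service.
# It prints a POST selection body for one network/channel/time window.
# Usage: make_selection NET LOC CHA START END STATION...
make_selection() {
    net=$1; loc=$2; cha=$3; start=$4; end=$5
    shift 5
    # Header lines: request type and output format, as in the example above.
    printf 'reqtype=FDSN\n'
    printf 'format=mseed\n'
    # One selection line per station, fields separated by single spaces.
    for sta in "$@"; do
        printf '%s %s %s %s %s %s\n' "$net" "$sta" "$loc" "$cha" "$start" "$end"
    done
}

# Reproduces the three-station example above on standard output.
make_selection ZI -- DPZ 2015-06-29T04:45:00 2015-06-29T05:00:00 1002 1003 1004
```

Redirecting the output to a file (for example waveform.request) gives a selection file that can be submitted with wget or curl.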

wget example

Requests can be made with a selection file and the wget Unix command line utility.

$ cat waveform.request
reqtype=FDSN
format=mseed
ZI 1002 -- DPZ 2015-06-29T04:45:00 2015-06-29T05:00:00
ZI 1003 -- DPZ 2015-06-29T04:45:00 2015-06-29T05:00:00
ZI 1004 -- DPZ 2015-06-29T04:45:00 2015-06-29T05:00:00
4C DAN -- DPZ 2015-08-04T16:30:00 2015-08-04T20:30:00
$ wget --post-file=waveform.request -O data.miniseed http://service.iris.edu/ph5ws/dataselect/1/query

This will save the results to a file named data.miniseed.

cURL example

Requests can also be made with a selection file and the curl Unix command line utility.

$ cat waveform.request
reqtype=FDSN
ZI 1002 -- DPZ 2015-06-29T04:45:00 2015-06-29T05:00:00
ZI 1003 -- DPZ 2015-06-29T04:45:00 2015-06-29T05:00:00
$ curl -L --data-binary @waveform.request -o ZI.miniseed http://service.iris.edu/ph5ws/dataselect/1/query

Here is the equivalent request using query parameters instead of a selection file…

$ curl -L -o ZI.miniseed "http://service.iris.edu/ph5ws/dataselect/1/query?net=ZI&sta=1002,1003&cha=DPZ&loc=--&starttime=2015-06-29T04:45:00&endtime=2015-06-29T05:00:00"

This will save the results to a file named ZI.miniseed.
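
In scripts it can be convenient to assemble the query URL from separate parameter-value pairs. The sketch below is one possible way to do that; build_query_url is a hypothetical helper, and it does no percent-encoding, so it only suits simple values like the ones used on this page.

```shell
# Base URL taken from the examples on this page.
BASE_URL="http://service.iris.edu/ph5ws/dataselect/1/query"

# build_query_url is a hypothetical helper: it joins key=value pairs
# with '&' and appends them to the base URL after a '?'.
# Values are NOT percent-encoded (a simplification of this sketch).
build_query_url() {
    _u="$BASE_URL"
    _sep='?'
    for _pair in "$@"; do
        _u="${_u}${_sep}${_pair}"
        _sep='&'
    done
    printf '%s\n' "$_u"
}

url=$(build_query_url net=ZI sta=1002,1003 cha=DPZ loc=-- \
    starttime=2015-06-29T04:45:00 endtime=2015-06-29T05:00:00)
printf '%s\n' "$url"
# To download (not run here): curl -L -o ZI.miniseed "$url"
```

Keeping the URL in a quoted variable also prevents the shell from interpreting the ? and & characters.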

We recommend always using the -L option to allow curl to follow HTTP redirections specified by our systems. The DMC uses HTTP redirection to continue servicing requests during maintenance periods.

You may wish to use the -f option. This will cause curl to return an exit code of 22 if data is not found or the request is improperly formatted.

See http://curl.haxx.se/docs/manpage.html for more information.
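
When curl is run with -f inside a script, its exit status can be checked after each request. The following sketch shows one way to report it; describe_curl_exit is a hypothetical helper, and only the exit codes discussed above are given a specific message.

```shell
# describe_curl_exit is a hypothetical helper that turns curl's exit
# status into a short message. Only the codes discussed on this page
# are named; anything else is reported generically.
describe_curl_exit() {
    case "$1" in
        0)  echo "ok" ;;
        22) echo "no data found or improperly formatted request" ;;
        *)  echo "curl failed with exit code $1" ;;
    esac
}

# Typical use (not run here, since it needs network access):
#   curl -f -L -o ZI.miniseed "$url"
#   describe_curl_exit $?
describe_curl_exit 22
```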

Shot and Receiver Gathers

The PH5 Dataselect Web Service supports seamless extraction of time-series as Shot and Receiver gathers. Custom gathers can be made in a single web service request.

Shot Gather

The following is a shot gather web service request for 60 seconds of SEG-Y data from array 001 immediately after shot 5013 in shotline 001.

http://service.iris.edu/ph5ws/dataselect/1/query?reqtype=shot&format=segy1&shotline=001&shotid=5013&array=001&length=60&reportnum=15-016

Shot Gather Example
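
Because shot gather requests use the length parameter rather than starttime and endtime, a small wrapper can make the required fields explicit. This is only a sketch: shot_gather_url is a hypothetical helper, and the whole-number check on length is an assumption of the sketch, not a documented rule of the service.

```shell
# shot_gather_url is a hypothetical helper. Arguments:
#   SHOTLINE SHOTID ARRAY LENGTH_SECONDS REPORTNUM
shot_gather_url() {
    # Require length to be a whole number of seconds.
    # (Assumption of this sketch; the service may accept other forms.)
    case "$4" in
        ''|*[!0-9]*)
            echo "length must be a whole number of seconds" >&2
            return 1 ;;
    esac
    printf '%s?reqtype=shot&format=segy1&shotline=%s&shotid=%s&array=%s&length=%s&reportnum=%s\n' \
        "http://service.iris.edu/ph5ws/dataselect/1/query" "$1" "$2" "$3" "$4" "$5"
}

# Prints the shot gather URL shown above.
shot_gather_url 001 5013 001 60 15-016
```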

Receiver Gather

The following is a receiver gather web service request for 35 seconds of SEG-Y data from station G09 offset 5 seconds before shots matching the wildcard pattern 208?,209?,21?? in shotline 001.

http://service.iris.edu/ph5ws/dataselect/1/query?reqtype=receiver&format=segy1&station=G09&shotline=001&shotid=208?,209?,21??&length=35&offset=-5&reportnum=10-012

Receiver Gather Example
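
To preview which shot IDs a wildcard pattern such as 208?,209?,21?? would select, the same pattern can be tried locally with the shell's own glob matching. The sketch below assumes the service treats ? as "exactly one character", as the shell does; matches_shot_pattern is a hypothetical helper and the shot IDs in the loop are made-up test values.

```shell
# matches_shot_pattern is a hypothetical helper that tests a shot ID
# against the comma-separated pattern from the request above, using the
# shell's glob matching ('?' matches exactly one character).
matches_shot_pattern() {
    case "$1" in
        208?|209?|21??) return 0 ;;
        *)              return 1 ;;
    esac
}

# Made-up shot IDs, chosen to sit on either side of the pattern's range.
for id in 2079 2080 2095 2100 2199 2200; do
    if matches_shot_pattern "$id"; then
        echo "$id matches"
    else
        echo "$id does not match"
    fi
done
```

When a URL containing ? or & is passed to curl or wget, it should be quoted so that the shell does not expand or split it.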

ObsPy Shot Gather Example

Users requesting active source data can create a common shot gather using ObsPy.

See https://github.com/PIC-IRIS/PH5/wiki/PH5-Web-Services-Shot-Gather for more information.

Working with miniSEED

A variety of software tools are available from the DMC to assist with organizing and viewing miniSEED data or converting it to another format. Detailed descriptions and usage examples for each piece of software can be found by clicking the links below.

mseed2sac – for converting miniSEED to SAC format
mseed2ascii – for converting miniSEED to ASCII formats
dataselect – for selecting and sorting miniSEED
miniSEED Inspector – for quickly parsing and summarizing miniSEED data
rdseed – for reading and extracting data in SEED volumes. NOTE: A dataless SEED volume must be used in combination with miniSEED for most conversions. A request must be submitted prior to downloading the rdseed software. http://ds.iris.edu/ds/nodes/dmc/forms/rdseed

Accessing restricted data

Requesting restricted data via this web service requires authentication. The authentication is done using a standard HTTP mechanism called digest access authentication, a sort of 3-way handshake. To submit a request with authentication credentials, use the queryauth method of the service in place of the query method. All of the common IRIS DMC clients support accessing restricted data through digest authentication.

For example, submitting a request and subsequently initiating the authentication handshake would be done by requesting this URL:

http://service.iris.edu/ph5ws/dataselect/1/queryauth?net=8A&sta=1002,1003,1004&loc=--&cha=EPZ&start=2014-11-24T04:00:00.0&end=2014-11-24T05:00:00.0

This request could be submitted, along with authentication credentials, using a command line tool like curl:

$ curl -L --digest --user EMAIL:PASSWORD -o data.mseed 'http://service.iris.edu/ph5ws/dataselect/1/queryauth?net=8A&sta=1002,1003,1004&loc=--&cha=EPZ&start=2014-11-24T04:00:00.0&end=2014-11-24T05:00:00.0'

where you replace EMAIL and PASSWORD with your own credentials. If you are submitting this request from the command line, then for security purposes, you may consider not including PASSWORD in your request, as it is an optional parameter. If only EMAIL is specified, then curl will prompt you for your password when the request is submitted.

You may try out authentication using your software with the following test credentials: [email protected] and password=anonymous. A working version of the curl example above using the test credentials would be:

$ curl -L --digest --user [email protected]:anonymous -o data.mseed 'http://service.iris.edu/ph5ws/dataselect/1/queryauth?net=8A&sta=1002,1003,1004&loc=--&cha=EPZ&start=2014-11-24T04:00:00.0&end=2014-11-24T05:00:00.0'

Note: A known problem can occur when repeatedly submitting queryauth requests for longer than a minute or so. The symptom is an authentication failure occurring, despite using proper credentials, after one or more successful requests. The work-around is for the client to re-submit the queryauth request. Only a single re-submission should be needed. If authentication repeatedly fails for queryauth requests, it indicates a different problem. The DMC will continue to look for a long-term solution to this issue, but for now, the recommendation of a single retry should work robustly.
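
The single-retry work-around can be sketched as a small shell wrapper. retry_once is a hypothetical helper, and it assumes the wrapped command signals failure through a non-zero exit status (for curl this generally means using the -f option, which is not guaranteed to catch every authentication failure).

```shell
# retry_once is a hypothetical helper: run a command and, if it fails,
# run it exactly one more time. The exit status is that of the last
# attempt, so a second failure is still reported to the caller.
retry_once() {
    "$@" && return 0
    echo "first attempt failed, retrying once" >&2
    "$@"
}

# Typical use (not run here; EMAIL, PASSWORD and URL are placeholders):
#   retry_once curl -f -L --digest --user EMAIL:PASSWORD -o data.mseed "$URL"
retry_once true && echo "succeeded"
```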

Considerations

1 In general, it is preferable not to ask for too much data in a single request. Large requests take longer to complete, and if one fails because of a networking issue, the entire request must be resubmitted, reprocessed and re-transmitted. When a large request is broken into smaller requests, only the pieces that failed need to be resubmitted and re-transmitted.

Web service network connections will break after 5 to 10 minutes if no data is transmitted. For large requests, the ph5ws-dataselect web service can take several minutes before it starts returning data. When this happens, the web service may “flush” the HTTP headers with an “optimistic” success (200) code to the client in order to keep the network connection alive. This gives the underlying data retrieval mechanism about 10 minutes to start pulling data out of the IRIS archive. Thus, for larger requests, the HTTP return code can be unreliable. As data is streamed back to the client, the ph5ws-dataselect service partially buffers the returned data. During periods when the underlying retrieval mechanism stalls, the web service will dribble the partial buffer to the client in an effort to keep the network connection alive.

It is less efficient to ask for too little data in each request. Each time a request is made, a network connection must be established and a request processing unit started. For performance reasons, it is better to group together selections from the same stations and place them in the same request. This is especially true of selections that cover the same time periods.
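
One way to balance these two considerations is to split a long selection file into moderately sized chunks and submit each chunk separately. The sketch below is an illustration only: split_selection is a hypothetical helper, the chunk size is left to the caller, and it assumes the key=value header lines come before the selection lines, as in the examples on this page.

```shell
# split_selection is a hypothetical helper. It reads a selection file,
# repeats its key=value header lines (such as reqtype= and format=) at
# the top of every chunk, and writes at most N selection lines per chunk
# to PREFIX.1, PREFIX.2, ...
# Usage: split_selection FILE N PREFIX
split_selection() {
    awk -v n="$2" -v prefix="$3" '
        /^[[:space:]]*$/ { next }                 # skip blank lines
        /=/ { header = header $0 "\n"; next }     # collect header lines
        {
            if (count % n == 0) {                 # start a new chunk
                if (out != "") close(out)
                out = prefix "." (++chunk)
                printf "%s", header > out
            }
            print > out
            count++
        }
    ' "$1"
}

# Typical use (not run here): at most 100 selection lines per chunk,
# then one POST per chunk file.
#   split_selection waveform.request 100 waveform.request.part
```

Sorting the selection lines by station before splitting keeps selections for the same station in the same chunk where possible.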

2 Requests for data in SAC format are returned in the compressed ZIP64 file format defined in the PKZIP Application Note. Please make sure your ZIP file extraction client supports this format.


Problems with this service?

Please send an email report of which service you were using, your URL query, and any error feedback to:
[email protected]
We will address your issue as soon as possible.