Tutorial to use our exoplanet.eu API

All of our database is accessible in python via our API. Here, we show how to use the API to query our database and collect/handle the data you want in an easy way.

Our API is very general as it has been developed by the Virtual Observatory (VO) for all of the databases they give access to (exoplanet.eu being one among many). As such, the syntax presented in this tutorial can be used for other databases (that can be found here: https://vespa.obspm.fr/) very easily.

QUICK START

1) The first thing to do to use our API in python is to install one package called pyvo to your python libraries (just type in: pip install pyvo, or use your favorite way of installing new packages).

2) When this is done, open a new python file (or do it interactively with ipython) and then import this new pyvo module:

import pyvo

3) Select the service you want to use (here the exoplanet database is part of the more general planeto service, which includes many databases, see here, e.g. you have access to asteroid and cometary databases, the minor planet center, ... but some other services that are hosting non-planet-related databases can also be accessed):

service = pyvo.dal.TAPService("http://voparis-tap-planeto.obspm.fr/tap")

4) Now you have to query this selected service and select the right exoplanet.eu table, which is exoplanet.epn_core as can be seen here. The language to query is based on SQL and is called ADQL (Astronomical Data Query Language), which is a language to retrieve information from tables stored in a database. We will detail more about the query possibilities later in this tutorial (point 16). For now, let's query all the fields (*) in the main database exoplanet.epn_core for planets that have semi-major axes lower than 5 au:

query = "SELECT * FROM exoplanet.epn_core WHERE semi_major_axis < 5"

5) To get the results of your query in a table called results, you should type in:

results = service.search(query)

6) When we have the results of our query stocked in the results array. We can first check the number of rows (i.e. planets) corresponding to our query by typing:

len(results)

7) And the number of columns (fields):

len(results[0])

8) In order to get a specific value for a given field (here the target_name field for instance), one can use:

results[0].get('target_name') or results['target_name',0]

9) You can now save your data permanently as a votable or in a numpy table (after importing numpy by adding: import numpy as np):

To store it as a VO table, first access to the VO table in your results array:

vot=results.votable

Then save your VO table to myresults.xml:

vot.to_xml("myresults.xml")

If you wanted to save it as a numpy array, you could do instead:

np.save('myresults',results.table)

10) And to access your data again from the saved file

If you want to access to the stored VO table again later, use parse (first import the right package: from astropy.io.votable import parse)

vot = parse("myresults.xml")

and then access to the data by selecting the first unique table

results = votable.get_first_table()

and if you saved it as a numpy array, then access it via

results=np.load('myresults.npy')

More advanced features after you have reached a plateau using the QUICK START section:

11) In order to know the syntax of all the fields you can query, you can have a look on the exoplanet.epn_core info page, or on vespa which also shows the values of the table or you can also do it directly in python by querying as such:

query="SELECT column_name FROM TAP_SCHEMA.columns where table_name = 'exoplanet.epn_core'"

and then returning the results as follows:

resultsfields = service.search(query)

which will indeed return an array called resultsfields with all the accessible fields of the exoplanet.eudatabase.

12) To know the maximum number of results that can be fetched in one go:

print(service.maxrec)

which is 20000 for now (i.e. greater than the number of known exoplanets).

13) This maximum number of results that can be fetched can, however, be changed by changing service.maxrec directly but it will never go beyond service.hardlimit, which is:

print(service.hardlimit)

20000000

i.e., it will be fine for a few decades!

14) You can also access to the field names that your recorded in a results array (from your query) into the results array by doing:

print(results.fieldnames)

15) Asynchronous queries: It is also possible to start asynchronous queries. This can be useful for long queries or if you need to run thousands of queries as you can start your runs and then come back when they are finished.

Say you want to execute this query asynchronously:

query = "SELECT * FROM exoplanet.epn_core WHERE semi_major_axis < 5"

Then the command is:

service.submit_job(query)

But to really start running it you need to type:

job.run()

To know whether it's still running or completed you can use

print(job.phase)

which will return: RUN or COMPLETED

It can be useful to remember the job's url to access it when you come back to check the results:

job.url

It returns something like that: u'http://voparis-tap-planeto.obspm.fr:80/__system__/tap/run/tap/async/jHFATK'

Then, to be able to retrieve your results later, you can use

job = pyvo.dal.tap.AsyncTAPJob(job.url)

and then the results of the job can be fetched via:

results=job.fetch_result()

16) More about queries and the ADQL language:

A query is usually composed of three parts: the table columns you want to retrieve from the database (the SELECT part), the table or tables that store the data (the FROM part) and the conditions to restrict the data you obtain (the WHERE part). It can be summarised as follow:

SELECT <columns> FROM <tables> WHERE <conditions>

For instance, if you want to select the mass and radius from the exoplanet.eu database you may type:

query="SELECT mass,radius FROM exoplanet.epn_core"

You could also pass in some more complex queries such as only selecting a certain region of the sky, e.g. if you want to only select same part of the KEPLER field (note that there is an easier way of doing that):

query = "SELECT mass,radius FROM exoplanet.epn_core WHERE CONTAINS(POINT('ICRS',ra,dec),CIRCLE('ICRS',290.,41., 5))=1"

We let the reader refer to the end of point 16 to find the most useful parameters that can be used to query the database.

We now show a few more tricks to query the database.

If you are only interested in the first 10 results then your query should be:

query="SELECT TOP 10 mass,radius FROM exoplanet.epn_core"

You can count the number of systems that have a disc by doing so:

query="SELECT COUNT(*) FROM exoplanet.epn_core WHERE detected_disc IS NOT NULL"

or the number of systems that have a measured planet radius:

query="SELECT COUNT(*) FROM exoplanet.epn_core WHERE radius IS NOT NULL"

Similar functions that return only one element are MIN, MAX, SUM or AVG (arithmetic mean), which you can also use.

You can sort your results directly by using ORDER BY (ascending order ASC is the default and descending order DESC should be specified when wanted) as such:

query="SELECT TOP 10 mass,radius FROM exoplanet.epn_core ORDER BY mass DESC"

You can also use AS to add a new column which will be in your output results and than you can use to sort your table with, e.g.:

query="SELECT TOP 10 LOG10(period) AS logperiod FROM exoplanet.epn_core ORDER BY logperiod"

In order to GROUP some elements together, you can use GROUP BY. For instance, let's say you want to sum the masses of all planets in a given system and list the results for each planetary system, you could GROUP all systems by their star name and sum the planet masses for these different GROUPS:

query="SELECT star_name, SUM(mass) AS smass FROM exoplanet.epn_core GROUP BY star_name ORDER BY smass DESC"

you can also use AND (or OR):

query="SELECT COUNT(*) FROM exoplanet.epn_core WHERE (radius IS NOT NULL AND mass IS NOT NULL)"

and mathematical functions such as:

SIN(x), COS(x), TAN(x),ASIN(x), ACOS(x), ATAN(x), ATAN2(x,y), DEGREES(x),RADIANS(x)

EXP(x), LOG(x), LOG10(x), POWER(x, y), SQRT(x)

ROUND(x,n), FLOOR(x), CEILING(x),TRUNCATE(x,n)

ABS(), RAND(n), MOD(x,y)

PI()

You can also define some regions of the sky as a BOX or a POLYGON similar to CIRCLE that we have already evoked above.

The most useful fields from the exoplanet.eu database that could be used for your queries are:

target_name

obs_id

detection_type (transit, RV, ...)

publication_status (refereed or not, published or not)

ra (deg)

dec (deg)

mass ('jupiterMass') mass_error_min mass_error_max

radius ('jupiterRad') radius_error_min radius_error_max

mass_sin_i ('jupiterMass') mass_sin_i_error_min mass_sin_i_error_max

semi_major_axis (AU) semi_major_axis_error_min semi_major_axis_error_max

period (d) period_error_min period_error_max

eccentricity eccentricity_error_min eccentricity_error_max

discovered (the year of discovery)

angular_distance (arcsec)

mass_detection_type

radius_detection_type

alt_target_name (alternate names of planets)

star_name

star_distance (pc) star_distance_error_min star_distance_error_max

star_spec_type (Spectral types of the host stars)

mag_v mag_i mag_j mag_h mag_k (magnitudes of the host stars in band V, I, J, H and K)

star_metallicity

star_mass (solar Mass)

star_radius (solar Radius)

star_age (Gyr)

star_teff (K)

detected_disc (when a disc is detected, this field is filled in with the disc detection method)

And the less often used (but still useful sometimes):

creation_date

modification_date

species (the species detected in the planet's atmosphere)

temp_calculated (K)

temp_measured (K)

log_g (of the planet)

albedo albedo_error_min albedo_error_max

updated (last update of the parameters)

remarks (some comments about a specific system)

other_web (some links to data when available)

periastron (deg) periastron_error_min periastron_error_max (argument of periastron)

t_peri (d) t_peri_error_min t_peri_error_max (epoch of periastron)

tzero_tr (d) tzero_tr_error_min tzero_tr_error_max (epoch of primary transit)

tzero_vr (d) tzero_vr_error_min tzero_vr_error_max (epoch of zero radial velocity)

t_conj (d) t_conj_error_min t_conj_error_max (epoch of conjunction)

inclination (deg) inclination_error_min inclination_error_max (orbital inclination)

tzero_tr_sec (d) tzero_tr_sec_error_min tzero_tr_sec_error_max (epoch of secondary transit)

lambda_angle (deg) lambda_angle_error_min lambda_angle_error_max (Sky-projected angle between the planetary orbital planet and the stellar rotational spin)

k (m/s) k_error_min k_error_max (Velocity semi-amplitude)

impact_parameter impact_parameter_error_min impact_parameter_error_max

magnetic_field (say Yes if a magnetic field is detected)

Some other parameters that we haven't described are specific to every single database accessible via the Virtual Observatory and these parameters are described here if needed.