Tutorial to use our exoplanet.eu API

Our whole database is accessible in python via our API. Here, we show how to use the API to query the database and to collect and handle the data you want in an easy way.

Our API is very general as it has been developed by the Virtual Observatory (VO) for all of the databases they give access to (exoplanet.eu being one among many). As such, the syntax presented in this tutorial can be used for other databases (that can be found here: http://vespa.obspm.fr/) very easily.

QUICK START 

1) To use our API in python, first install the pyvo package (just type in: pip install pyvo, or use your favorite way of installing new packages).

2) When this is done, open a new python file (or do it interactively with ipython) and then import this new pyvo module: 

import pyvo

3) Select the service you want to use. Here the exoplanet database is part of the more general planeto service, which includes many databases (see here), e.g. asteroid and cometary databases, the minor planet center, ... Other services that host non-planet-related databases can be accessed in the same way:

service = pyvo.dal.TAPService("http://voparis-tap-planeto.obspm.fr/tap")

4) Now you have to query this selected service and select the right exoplanet.eu table, which is exoplanet.epn_core as can be seen here. The language to query is based on SQL and is called ADQL (Astronomical Data Query Language), which is a language to retrieve information from tables stored in a database. We will detail more about the query possibilities later in this tutorial (point 16). For now, let's query all the fields (*) in the main database exoplanet.epn_core for planets that have semi-major axes lower than 5 au:

query = "SELECT * FROM exoplanet.epn_core WHERE semi_major_axis < 5"

5) To get the results of your query in a table called results, you should type in:

results = service.search(query) 

6) The results of our query are now stored in the results array. We can first check the number of rows (i.e. planets) matching our query by typing:

len(results)

7) And the number of columns (fields):

len(results[0])

8) In order to get a specific value for a given field (here the target_name field for instance), one can use:

results[0].get('target_name')

or, equivalently:

results['target_name', 0]
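Steps 6 to 8 can be gathered into one small helper. The sketch below is only an illustration: summarize_results is a hypothetical name (it is not part of pyvo), and it assumes that the object returned by service.search behaves as described above, i.e. len() gives the number of rows and the first row supports len() and .get().

```python
def summarize_results(results, field="target_name"):
    """Return the row count, column count and first value of `field`.

    `results` is assumed to behave like the object returned by
    service.search(): len() gives the number of rows, and each row
    supports len() and .get().
    """
    n_rows = len(results)
    if n_rows == 0:
        # Nothing matched the query, so there is no first row to inspect.
        return {"rows": 0, "columns": 0, "first_value": None}
    first_row = results[0]
    return {
        "rows": n_rows,
        "columns": len(first_row),
        "first_value": first_row.get(field),
    }
```

With a live service, this would be called as summarize_results(service.search(query)).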

9) You can now save your data permanently as a votable or in a numpy table (after importing numpy by adding: import numpy as np):

To store it as a VO table, first access the VO table in your results array:

vot=results.votable

Then save your VO table to myresults.xml:

vot.to_xml("myresults.xml")

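If you wanted to save it as a numpy array instead, you could do the following (recent pyvo versions favour results.to_table() over the older results.table spelling used here):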

np.save('myresults',results.table)


10) To access your data again from the saved file:

If you want to access the stored VO table again later, use parse (first import it from the right package):

from astropy.io.votable import parse

vot = parse("myresults.xml")

and then access the data by selecting the first (and only) table:

results = vot.get_first_table()

and if you saved it as a numpy array, then access it via

results=np.load('myresults.npy')

More advanced features after you have reached a plateau using the QUICK START section:

11) To find the names of all the fields you can query, you can have a look at the exoplanet.epn_core info page, or at vespa, which also shows the values of the table. You can also do it directly in python with this query:

query="SELECT column_name FROM TAP_SCHEMA.columns where table_name = 'exoplanet.epn_core'" 

and then returning the results as follows:

resultsfields = service.search(query) 

which will return an array called resultsfields with all the accessible fields of the exoplanet.eu database.
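The field lookup above can be wrapped in a helper that returns a plain python list. This is a minimal sketch: list_columns is a hypothetical name, and it assumes that iterating over the results yields rows that can be indexed by field name.

```python
def list_columns(service, table="exoplanet.epn_core"):
    """Return the column names of `table` as a plain python list.

    `service` is assumed to offer a search(query) method, like the
    TAPService object created in step 3.
    """
    query = (
        "SELECT column_name FROM TAP_SCHEMA.columns "
        f"WHERE table_name = '{table}'"
    )
    results = service.search(query)
    # Each row of the answer describes one column of the requested table.
    return [str(row["column_name"]) for row in results]
```

Calling list_columns(service) with the service from step 3 should then give the same names as the resultsfields array.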

12) To know the maximum number of results that can be fetched in one go:

print(service.maxrec)

which is 20000 for now (i.e. greater than the number of known exoplanets).

13) This maximum number of results can, however, be raised for a given query, for instance by passing a maxrec argument to the query call (e.g. service.search(query, maxrec=50000)), but it will never go beyond service.hardlimit, which is:

print(service.hardlimit)
20000000

i.e., it will be fine for a few decades!
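As an illustration of how the two limits interact, the sketch below asks for a given number of rows but never more than the hard limit. The name search_capped is hypothetical, and the code assumes that service.search accepts a maxrec keyword (as we believe it does in current pyvo releases).

```python
def search_capped(service, query, requested_rows):
    """Run `query`, asking for at most `requested_rows` rows.

    The request is capped at service.hardlimit. The `maxrec` keyword is
    an assumption about the search() method of the service object.
    """
    hard_limit = service.hardlimit
    if hard_limit is not None:
        # Never ask for more rows than the server is willing to return.
        requested_rows = min(requested_rows, hard_limit)
    return service.search(query, maxrec=requested_rows)
```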

14) You can also list the names of the fields stored in a results array (from your query) by doing:

print(results.fieldnames)

15) Asynchronous queries: It is also possible to start asynchronous queries. This can be useful for long queries or if you need to run thousands of queries as you can start your runs and then come back when they are finished.

Say you want to execute this query asynchronously: 

query = "SELECT * FROM exoplanet.epn_core WHERE semi_major_axis < 5"

Then the command is: 

job = service.submit_job(query)

But to really start running it you need to type:

job.run()

To know whether it's still running or completed you can use

print(job.phase)
which will return, for instance, EXECUTING or COMPLETED

It can be useful to remember the job's url to access it when you come back to check the results:

job.url

It returns something like this: u'http://voparis-tap-planeto.obspm.fr:80/__system__/tap/run/tap/async/jHFATK'

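Then, to retrieve your job later (for instance from a new python session), pass the URL you saved to AsyncTAPJob:

job = pyvo.dal.tap.AsyncTAPJob('http://voparis-tap-planeto.obspm.fr:80/__system__/tap/run/tap/async/jHFATK')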

and then the results of the job can be fetched via:

results=job.fetch_result()
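The whole asynchronous sequence (submit, run, wait, fetch) can be sketched as one function. This is not the only way to do it: run_async is a hypothetical name, the polling interval and timeout are arbitrary choices, and the phase names are the ones defined by the UWS standard that TAP services follow (PENDING, QUEUED, EXECUTING, COMPLETED, ERROR, ABORTED).

```python
import time

# Phases after which a job will not change state any more.
FINAL_PHASES = {"COMPLETED", "ERROR", "ABORTED"}


def run_async(service, query, poll_seconds=5.0, timeout=3600.0):
    """Submit `query` asynchronously, wait for it and return the results."""
    job = service.submit_job(query)
    job.run()  # the job only starts executing after this call
    deadline = time.monotonic() + timeout
    phase = job.phase
    while phase not in FINAL_PHASES:
        if time.monotonic() > deadline:
            raise TimeoutError(f"job at {job.url} is still in phase {phase}")
        time.sleep(poll_seconds)
        phase = job.phase  # ask the server again
    if phase != "COMPLETED":
        raise RuntimeError(f"job at {job.url} ended in phase {phase}")
    return job.fetch_result()
```

For thousands of queries you would rather submit all the jobs first, keep their URLs, and only check the phases later, as described above.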

16) More about queries and the ADQL language:

A query is usually composed of three parts: the table columns you want to retrieve from the database (the SELECT part), the table or tables that store the data (the FROM part) and the conditions to restrict the data you obtain (the WHERE part). It can be summarised as follows:

SELECT <columns> FROM <tables> WHERE <conditions>

For instance, if you want to select the mass and radius from the exoplanet.eu database you may type:

query="SELECT mass,radius FROM exoplanet.epn_core"

You could also pass in more complex queries, such as selecting only a certain region of the sky, e.g. if you want to select only some part of the Kepler field (note that there is an easier way of doing that):

query = "SELECT mass,radius FROM exoplanet.epn_core WHERE CONTAINS(POINT('ICRS',ra,dec),CIRCLE('ICRS',290.,41., 5))=1"
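If you need this kind of sky selection for several positions, the query string can be built by a small function. The sketch below only assembles text and sends nothing to the server; cone_query and its parameter names are hypothetical.

```python
def cone_query(ra_deg, dec_deg, radius_deg, columns=("mass", "radius"),
               table="exoplanet.epn_core"):
    """Build an ADQL query restricted to a circle on the sky.

    The centre (ra_deg, dec_deg) and the radius are all in degrees.
    """
    if radius_deg <= 0:
        raise ValueError("radius_deg must be positive")
    column_list = ",".join(columns)
    return (
        f"SELECT {column_list} FROM {table} "
        "WHERE CONTAINS(POINT('ICRS',ra,dec),"
        f"CIRCLE('ICRS',{ra_deg},{dec_deg},{radius_deg}))=1"
    )
```

For example, cone_query(290.0, 41.0, 5) rebuilds a query equivalent to the one above, which can then be passed to service.search.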

See the end of point 16 for the most useful parameters that can be used to query the database.

We now show a few more tricks to query the database.

If you are only interested in the first 10 results then your query should be:

query="SELECT TOP 10 mass,radius FROM exoplanet.epn_core"

You can count the number of planets in systems with a detected disc as follows:

query="SELECT COUNT(*) FROM exoplanet.epn_core WHERE detected_disc IS NOT NULL"

or the number of planets that have a measured radius:

query="SELECT COUNT(*) FROM exoplanet.epn_core WHERE radius IS NOT NULL"

Similar functions that return only one element are MIN, MAX, SUM or AVG (arithmetic mean), which you can also use.

You can sort your results directly by using ORDER BY (ascending order ASC is the default and descending order DESC should be specified when wanted) as such:

query="SELECT TOP 10 mass,radius FROM exoplanet.epn_core ORDER BY mass DESC"
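Since all these queries share the same SELECT ... FROM ... WHERE ... ORDER BY skeleton, they can be assembled from their parts. The helper below is a rough sketch under that assumption: build_query is a hypothetical name, it only handles the clauses shown so far, and it does no validation of the column names or conditions it is given.

```python
def build_query(columns, table="exoplanet.epn_core", where=None,
                top=None, order_by=None, descending=False):
    """Assemble a simple ADQL query string from its parts."""
    parts = ["SELECT"]
    if top is not None:
        parts.append(f"TOP {int(top)}")  # keep only the first rows
    parts.append(",".join(columns))
    parts.append(f"FROM {table}")
    if where:
        parts.append(f"WHERE {where}")
    if order_by:
        parts.append(f"ORDER BY {order_by}")
        if descending:
            parts.append("DESC")  # ascending order is the default
    return " ".join(parts)
```

For instance, build_query(["mass", "radius"], top=10, order_by="mass", descending=True) gives back the query shown just above.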

You can also use AS to add a new column to your output results, which you can then use to sort your table, e.g.:

query="SELECT TOP 10 LOG10(period) AS logperiod FROM exoplanet.epn_core ORDER BY logperiod"

In order to GROUP some elements together, you can use GROUP BY. For instance, let's say you want to sum the masses of all planets in a given system and list the results for each planetary system, you could GROUP all systems by their star name and sum the planet masses for these different GROUPS:

query="SELECT star_name, SUM(mass) AS smass FROM exoplanet.epn_core GROUP BY star_name ORDER BY smass DESC"
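To make clear what this GROUP BY query computes, here is the same operation written in plain python on a handful of made-up rows. The function name and the star names in the usage note are purely illustrative and do not come from the database.

```python
from collections import defaultdict


def sum_mass_by_star(rows):
    """Mimic SELECT star_name, SUM(mass) ... GROUP BY star_name ORDER BY smass DESC.

    `rows` is an iterable of (star_name, mass) pairs. A mass of None
    stands for a NULL value and is skipped, as SUM does in SQL. Unlike
    SQL, stars with no measured mass at all are left out of the output.
    """
    totals = defaultdict(float)
    for star_name, mass in rows:
        if mass is not None:
            totals[star_name] += mass
    # Largest total mass first, like ORDER BY smass DESC.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)
```

For example, the rows ("Star A", 1.0), ("Star B", 0.5), ("Star A", 2.0) are grouped into two systems, with Star A (total 3.0) listed before Star B (total 0.5).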

You can also combine conditions with AND (or OR):

query="SELECT COUNT(*) FROM exoplanet.epn_core WHERE (radius IS NOT NULL AND mass IS NOT NULL)"

You can also use mathematical functions such as:

SIN(x), COS(x), TAN(x), ASIN(x), ACOS(x), ATAN(x), ATAN2(x,y), DEGREES(x), RADIANS(x)
EXP(x), LOG(x), LOG10(x), POWER(x,y), SQRT(x)
ROUND(x,n), FLOOR(x), CEILING(x), TRUNCATE(x,n)
ABS(x), RAND(n), MOD(x,y)
PI()

You can also define regions of the sky as a BOX or a POLYGON, in the same way as the CIRCLE that we have already mentioned above.
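As a sketch of what such conditions look like, the two helpers below build the WHERE part for a BOX and for a POLYGON, following the ADQL syntax BOX(coordsys, ra, dec, width, height) and POLYGON(coordsys, ra1, dec1, ra2, dec2, ...). The function names are hypothetical, and we have not checked here which of these shapes a given server accepts.

```python
def box_condition(ra_deg, dec_deg, width_deg, height_deg):
    """WHERE condition selecting points inside a box centred on (ra, dec)."""
    return (
        "CONTAINS(POINT('ICRS',ra,dec),"
        f"BOX('ICRS',{ra_deg},{dec_deg},{width_deg},{height_deg}))=1"
    )


def polygon_condition(vertices):
    """WHERE condition selecting points inside a polygon.

    `vertices` is a sequence of (ra_deg, dec_deg) pairs; a polygon
    needs at least three of them.
    """
    if len(vertices) < 3:
        raise ValueError("a polygon needs at least three vertices")
    coords = ",".join(f"{ra},{dec}" for ra, dec in vertices)
    return f"CONTAINS(POINT('ICRS',ra,dec),POLYGON('ICRS',{coords}))=1"
```

A full query is then obtained with, e.g., "SELECT mass,radius FROM exoplanet.epn_core WHERE " + box_condition(290, 41, 10, 10).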

The most useful fields from the exoplanet.eu database that could be used for your queries are:

target_name 
obs_id 
detection_type  (transit, RV, ...)
publication_status (refereed or not, published or not)
ra (deg) 
dec (deg) 
mass ('jupiterMass') mass_error_min mass_error_max 
radius ('jupiterRad')  radius_error_min radius_error_max 
mass_sin_i ('jupiterMass') mass_sin_i_error_min  mass_sin_i_error_max 
semi_major_axis (AU)  semi_major_axis_error_min  semi_major_axis_error_max
period (d)  period_error_min  period_error_max 
eccentricity  eccentricity_error_min  eccentricity_error_max 
discovered (the year of discovery)
angular_distance (arcsec) 
mass_detection_type 
radius_detection_type 
alt_target_name (alternate names of planets)
star_name 
star_distance (pc)  star_distance_error_min  star_distance_error_max 
star_spec_type  (Spectral types of the host stars)
mag_v  mag_i  mag_j  mag_h  mag_k  (magnitudes of the host stars in band V, I, J, H and K)
star_metallicity 
star_mass (solar Mass) 
star_radius (solar Radius) 
star_age (Gyr) 
star_teff (K) 
detected_disc (when a disc is detected, this field is filled in with the disc detection method)
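Many of the fields above come with two companion columns, <field>_error_min and <field>_error_max. A small sketch of how to request a measurement together with its error bars is given below; with_errors and measurement_query are hypothetical helper names, and the naming pattern is the one visible in the list above.

```python
def with_errors(field):
    """Return a field name together with its two error-bar columns."""
    return [field, f"{field}_error_min", f"{field}_error_max"]


def measurement_query(fields, table="exoplanet.epn_core"):
    """Build a query returning target_name plus each field with its errors."""
    columns = ["target_name"]
    for field in fields:
        columns.extend(with_errors(field))
    column_list = ",".join(columns)
    return f"SELECT {column_list} FROM {table}"
```

Note that this only makes sense for fields that do have error columns (mass, radius, period, ...), not for fields such as star_name.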


And the less often used (but still useful sometimes):

creation_date 
modification_date 
species (the species detected in the planet's atmosphere)
temp_calculated (K)
temp_measured (K) 
log_g (of the planet) 
albedo  albedo_error_min  albedo_error_max 
updated (last update of the parameters)
remarks (some comments about a specific system)
other_web (some links to data when available)
periastron (deg)  periastron_error_min periastron_error_max (argument of periastron)
t_peri (d)  t_peri_error_min  t_peri_error_max  (epoch of periastron)
tzero_tr (d)  tzero_tr_error_min  tzero_tr_error_max  (epoch of primary transit)
tzero_vr (d)  tzero_vr_error_min  tzero_vr_error_max  (epoch of zero radial velocity)
t_conj (d)  t_conj_error_min  t_conj_error_max  (epoch of conjunction)
inclination (deg)  inclination_error_min  inclination_error_max (orbital inclination)
tzero_tr_sec (d)  tzero_tr_sec_error_min tzero_tr_sec_error_max (epoch of secondary transit)
lambda_angle (deg)  lambda_angle_error_min  lambda_angle_error_max (Sky-projected angle between the planetary orbital spin and the stellar rotational spin)
k (m/s)  k_error_min  k_error_max (Velocity semi-amplitude)
impact_parameter  impact_parameter_error_min  impact_parameter_error_max 
magnetic_field (set to Yes if a magnetic field is detected)


Some other parameters that we haven't described are specific to every single database accessible via the Virtual Observatory and these parameters are described here if needed.