sherlock
A python package with command-line tools for contextually classifying variable/transient astronomical sources. Sherlock mines a library of historical and on-going astronomical survey data in an attempt to identify the sources of transient/variable events, and predicts their classifications based on the associated crossmatched data.
Here’s a summary of what’s included in the python package:
Classes
| Class | Description |
|---|---|
| sherlock.catalogue_conesearch | The worker class for the conesearch module |
| sherlock.commonutils.update_wiki_pages | Update sherlock’s github wiki pages with some useful info regarding the crossmatch database catalogue tables |
| sherlock.database | The database object for sherlock, setting up ssh tunnels and various database connections |
| sherlock.database_cleaner | Clean and maintain the database helper tables used by sherlock |
| sherlock.imports._base_importer | The base importer object used to import new catalogues into the sherlock-catalogues database |
| sherlock.imports.ifs | Importer for the Multi Unit Spectroscopic Explorer (MUSE) IFS galaxy catalogue stream |
| sherlock.imports.marshall | Import the ePESSTO Marshall transient streams (includes multiple on-going transient survey streams) into the sherlock-catalogues database |
| sherlock.imports.ned | Using a list of coordinates, query the online NED database and import sources found within a given search radius of each of the locations into the sherlock-catalogues database |
| sherlock.imports.ned_d | Import the NED-D galaxy catalogue (https://ned.ipac.caltech.edu/Library/Distances/) into the sherlock-catalogues database |
| sherlock.imports.veron | Importer object for the VERON AGN catalogue |
| sherlock.transient_catalogue_crossmatch | Crossmatch a list of transients against a suite of catalogues according to a given search algorithm |
| sherlock.transient_classifier | The Sherlock Transient Classifier |
Functions
| Function | Description |
|---|---|
| sherlock.commonutils.get_crossmatch_catalogues_column_map | Query the sherlock-catalogues helper tables to generate a map of the important columns of each catalogue |
Installation
Although you can get Sherlock from a simple pip install, it’s best to install it within a Conda environment under Anaconda. If you’re not familiar with Anaconda, you’ll find a good tutorial here to get you up and running.
Once you have Anaconda installed, go ahead and create a new Conda environment to host Sherlock:
conda create -n sherlock python=2.7 pip numpy
Now activate the environment and install sherlock:
source activate sherlock
pip install qub-sherlock
At any point in the future you can upgrade to the latest version of sherlock with the command:
pip install qub-sherlock --upgrade
If instead you want to clone the github repo and install from a local version of the code:
git clone git@github.com:thespacedoctor/sherlock.git
cd sherlock
source activate sherlock
python setup.py install
Development
If you want to tinker with the code, install it in development mode. This means you can modify the installed code directly from wherever you cloned the repo:
git clone git@github.com:thespacedoctor/sherlock.git
cd sherlock
source activate sherlock
python setup.py develop
Pull requests are welcome!
Sublime Snippets
If you use Sublime Text as your code editor, and you’re planning to develop your own python code with sherlock, you might find my Sublime Snippets useful.
Command-Line Usage
# SHERLOCK #
: INFERRING TRANSIENT-SOURCE CLASSIFICATIONS FROM SPATIALLY CROSS-MATCHED CATALOGUED SOURCES :
=============================================================================================
Documentation for sherlock can be found here: http://qub-sherlock.readthedocs.org/en/stable
.. todo ::
- document cl_utils module
- tidy usage text
Usage:
sherlock init
sherlock info [-s <pathToSettingsFile>]
sherlock [-NA] dbmatch [--update] [-s <pathToSettingsFile>]
sherlock [-bN] match -- <ra> <dec> [<pathToSettingsFile>]
sherlock clean [-s <pathToSettingsFile>]
sherlock wiki [-s <pathToSettingsFile>]
sherlock import ned <ra> <dec> <radiusArcsec> [-s <pathToSettingsFile>]
sherlock import cat <cat_name> <pathToDataFile> <cat_version> [-s <pathToSettingsFile>]
sherlock import stream <stream_name> [-s <pathToSettingsFile>]
Options:
init setup the sherlock settings file for the first time
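match classify a single transient at the given ra and dec
dbmatch database match: classify the transients returned by the transient query of the transient database
clean clean and maintain the database helper tables used by sherlock
wiki update sherlock’s github wiki pages with info on the crossmatch catalogue tables
import import a catalogue or data-stream into the sherlock-catalogues database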
ned use the online NED database as the source catalogue
cat import a static catalogue into the sherlock-catalogues database
stream download/stream new data from a given source catalogue into the sherlock-catalogues database
info print an overview of the current catalogues, views and streams in the sherlock database ready for crossmatching
ra the right-ascension coordinate with which to perform a conesearch (sexagesimal or decimal degrees)
dec the declination coordinate with which to perform a conesearch (sexagesimal or decimal degrees)
radiusArcsec radius in arcsec of the footprint to download from the online NED database
cat_name name of the catalogue being imported (veron|ned_d)
stream_name name of the stream to import into the sherlock-catalogues database (ifs)
-N, --skipNedUpdate do not update the NED database before classification
-A, --skipMagUpdate do not update the peak magnitudes and human readable text annotations of objects (can eat up some time)
-h, --help show this help message
-s, --settings the settings file
-b, --verbose print more details to stdout
-l, --transientlistId the id of the transient list to classify
-u, --update update the transient database with new classifications and crossmatches
-v, --version print the version of sherlock
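The ra and dec arguments to sherlock match can be given in sexagesimal or decimal degrees. As a rough illustration of that conversion, here is a minimal Python sketch. sexagesimal_to_degrees is a hypothetical helper, not part of sherlock; it assumes sexagesimal RA is given in hours and Dec in degrees, with colon or space separators.

```python
import re


def sexagesimal_to_degrees(value, is_ra):
    """Convert an RA or Dec string to decimal degrees (hypothetical helper).

    Plain numbers are taken to be decimal degrees already. Sexagesimal
    values may use ':' or whitespace separators, e.g. "10:00:24.0" (RA, in
    hours) or "-02 30 00" (Dec, in degrees).
    """
    value = value.strip()
    parts = re.split(r"[:\s]+", value)
    if len(parts) == 1:
        return float(value)  # already decimal degrees
    # Take the sign from the string itself so "-00:30:00" stays negative
    sign = -1.0 if value.startswith("-") else 1.0
    whole = abs(float(parts[0]))
    minutes = float(parts[1])
    seconds = float(parts[2]) if len(parts) > 2 else 0.0
    magnitude = whole + minutes / 60.0 + seconds / 3600.0
    if is_ra:
        magnitude *= 15.0  # RA hours to degrees: 24 h = 360 deg
    return sign * magnitude


print(sexagesimal_to_degrees("-02:30:00", is_ra=False))  # -2.5
```

Taking the sign from the string rather than from the first field matters for declinations between 0 and -1 degrees, where the degrees field is "-00".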
Documentation
Documentation for sherlock is hosted by Read the Docs (last stable version and latest version).
Sherlock Tutorial
Before you begin using sherlock you will need to populate some custom parameters within the sherlock settings file.
To set up the default settings file at ~/.config/sherlock/sherlock.yaml, run the command:
sherlock init
This should create and open a new config file; follow the instructions in the file to populate the missing parameter values (usually marked with an XXX placeholder).
Todo
- add tutorial
Initialisation and Setup
Populating Sherlock’s Settings File
The settings file now contains every option required to change the way the code runs, including database settings and the actual search algorithm.
Database Settings
database settings:
    static catalogues:
        db: crossmatch_catalogues
        host: 127.0.0.1
        user: pessto
        password: p355t0
    transients:
        user: pessto
        password: p355t0
        db: ps13pipublic
        host: 127.0.0.1
        transient table: tcs_transient_objects
        transient query: "select id as 'id', followup_id as 'alt_id', ra_psf 'ra', dec_psf 'dec', local_designation 'name', object_classification as 'object_classification'
            from tcs_transient_objects
            where detection_list_id = 2
            and object_classification is null
            order by followup_id"
        transient id column: id
        transient classification column: object_classification
    pessto marshall:
        user: pessto
        password: p355t0
        db: pessto_marshall
        host: 127.0.0.1
The static catalogues settings are for connecting to the static catalogues database. Do not edit these settings unless you know what you’re doing. If you have your RSA key on starbase, the code will set up an ssh tunnel for you so that you can connect to this database remotely.
The transients settings are for the database your transients are stored in. transient table is the name of the table containing your transients, and transient query is the SQL query that needs to be executed to retrieve the following information for the transients needing classification:
- id: the primary ID for the transient in the database
- alt_id: human readable name (optional)
- ra: the ra of the object
- dec: the dec of the object
- name: a further alt id (optional)
The transient id column is the primary ID column in the transient database, and transient classification column is the column you wish to add the classification to.
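To make the expected shape of the query result concrete, the sketch below runs a simplified version of the example transient query against an in-memory SQLite table using Python’s standard sqlite3 module. The table contents are invented, SQLite stands in for the real transient database, and the column aliases are written without quotes; treat it as an illustration of the required id, alt_id, ra, dec and name columns, not of sherlock’s own database code.

```python
import sqlite3

# Toy stand-in for the transients database, using the column names from the
# example "transient query" above; the rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.execute(
    "create table tcs_transient_objects (id integer primary key, "
    "followup_id integer, ra_psf real, dec_psf real, local_designation text, "
    "object_classification text, detection_list_id integer)")
conn.executemany(
    "insert into tcs_transient_objects values (?, ?, ?, ?, ?, ?, ?)",
    [(1, 101, 150.1, -2.5, "TRANSIENT_A", None, 2),
     (2, 102, 10.0, 20.0, "TRANSIENT_B", "SN", 2),   # already classified
     (3, 103, 200.0, 5.0, "TRANSIENT_C", None, 1)])  # a different list

# The query must return columns aliased to id, alt_id, ra, dec and name
transient_query = """
    select id as id, followup_id as alt_id, ra_psf as ra, dec_psf as dec,
           local_designation as name,
           object_classification as object_classification
    from tcs_transient_objects
    where detection_list_id = 2
      and object_classification is null
    order by followup_id"""

transients = [dict(row) for row in conn.execute(transient_query)]

# id, ra and dec are required; alt_id and name are optional
required = {"id", "ra", "dec"}
assert all(required <= set(t) for t in transients)
print(transients)
```

Only TRANSIENT_A comes back: TRANSIENT_B already has a classification and TRANSIENT_C belongs to a different detection list.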
The Search Algorithm
Searches are run in the order they appear in the search algorithm section:
search algorithm:
    sdss qso:
        angular radius arcsec: 2.0
        transient classification: AGN
        database table: tcs_cat_v_sdss_dr9_spect_qsos
    milliquas:
        angular radius arcsec: 3.0
        transient classification: AGN
        database table: tcs_cat_milliquas
    veron:
        angular radius arcsec: 2.0
        transient classification: AGN
        database table: tcs_veron_cat
    ned qso:
        angular radius arcsec: 2.0
        transient classification: AGN
        database table: tcs_cat_v_ned_qsos
    ned nt:
        angular radius arcsec: 3.0
        physical radius kpc: 0.5
        transient classification: NT
        database table: tcs_cat_v_ned_galaxies
    sdss spec nt:
        angular radius arcsec: 3.0
        physical radius kpc: 0.5
        transient classification: NT
        database table: tcs_cat_v_sdss_dr9_spect_galaxies
    sdss phot nt:
        angular radius arcsec: 0.5
        transient classification: NT
        database table: tcs_cat_v_sdss_dr9_galaxies_notspec
    ...
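A minimal sketch of how the module order carries through: the search algorithm is held as an ordered list of (module name, settings) pairs and walked top to bottom. It assumes the YAML loader preserves mapping order, and borrows the rule from the Physical Separation Search section that modules with a physical radius kpc key get a physical rather than an angular search. search_type and run_plan are hypothetical names.

```python
# Hypothetical in-memory copy of part of the "search algorithm" section,
# as an ordered list of (module name, settings) pairs. Assumption: the YAML
# loader preserves mapping order, so this list mirrors the file top to bottom.
search_algorithm = [
    ("sdss qso", {"angular radius arcsec": 2.0,
                  "transient classification": "AGN",
                  "database table": "tcs_cat_v_sdss_dr9_spect_qsos"}),
    ("milliquas", {"angular radius arcsec": 3.0,
                   "transient classification": "AGN",
                   "database table": "tcs_cat_milliquas"}),
    ("ned nt", {"angular radius arcsec": 3.0,
                "physical radius kpc": 0.5,
                "transient classification": "NT",
                "database table": "tcs_cat_v_ned_galaxies"}),
]


def search_type(module):
    """Modules with a 'physical radius kpc' key get a physical separation
    search; all others get a plain angular separation search."""
    return "physical" if "physical radius kpc" in module else "angular"


# Modules are visited strictly in the order they are listed
run_plan = [(name, search_type(module), module["database table"])
            for name, module in search_algorithm]
for step, (name, kind, table) in enumerate(run_plan, start=1):
    print("{0}. {1}: {2} search on {3}".format(step, name, kind, table))
```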
The first time you run sherlock you will be told to add your settings to the empty yaml settings file that has been created.
For details about all of the catalogues in the catalogues database, run:
sherlock info
Classifying Transients
A Single Transient Classification
Classifying Transients in a Transient Database
The Classification Workflow
Synonyms vs Associations
Sherlock distinguishes between transients it views as synonymous with a catalogued source (the same as, or very closely linked to, that source), called synonyms, and those it deems merely associated with the catalogued source, called associations.
Examples of transient-synonym matches are CVs, AGN and variable stars (VS) that match within 1-2 arcsec of their catalogue counterpart. Stretching the definition of synonym a little, Sherlock will also match transients close to the centre of galaxies as synonyms[1]. Transient-associations include transients that are located near, but not on top of, a catalogued source. Examples of these associations are ‘transients’ matching close to bright stars, which are classified as bright-star artefacts (BS) resulting from poor image subtractions near bright stars (brighter than roughly 14th-16th magnitude), or transients matched near to a galaxy, which may be classified as supernovae (SN). By definition, synonyms are a more secure match than associations.
Each search algorithm module should contain synonym and association key-value pairs. For example, here is a Guide-Star Catalogue search module:
gsc bright stars:
    angular radius arcsec: 100.0
    synonym: VS
    association: BS
    database table: tcs_view_star_guide_star_catalogue_v2_3
    bright mag column: B
    bright limit: 16.
If a transient is matched on top of a source in the GSC, it is identified as a synonym and classified as a variable star; if it is matched near to, but not co-located with, the source, it may be identified as an association and classified as a potential bright-star artefact (BS).
There’s also a top-level synonym radius arcsec parameter in the Sherlock settings file that defines the maximum transient-catalogue source separation that secures a synonym identification.
synonym radius arcsec: 0.5
Sherlock performs a two-stage catalogue match, first looking for synonym matches and then for associations. For an individual transient, if a synonym match is found in the first search stage, the second (association) stage is skipped as it becomes irrelevant. For example, consider the image below (transients marked in red):
The first-stage search should match transients A, C and E as synonyms (NT, VS, VS); these transients are then removed from the association search. The second-stage search then flags B as associated with the large galaxy at the centre of the image, and transient D as associated with either the bright star in the bottom right corner of the image or the galaxy in the centre.
[1] The transient could be classified as a nuclear transient or supernova depending on the search algorithm parameters.
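The two-stage logic can be sketched in a few lines of Python. This is a simplified illustration, not sherlock’s code: it assumes each module’s cone-search has already produced the candidate matches (the hypothetical Match records below), and uses only the synonym radius arcsec value to split synonyms from associations. The example data loosely mirrors some of the transients in the image described above.

```python
from collections import namedtuple

# Hypothetical record for one catalogue source found in a module's cone-search
Match = namedtuple("Match", "transient separation_arcsec synonym association")


def two_stage_classify(matches, synonym_radius_arcsec=0.5):
    """Return {transient: [classifications]} using a synonym-first pass."""
    classifications = {}
    # Stage 1: any match inside the synonym radius secures a synonym
    for m in matches:
        if m.separation_arcsec <= synonym_radius_arcsec:
            classifications.setdefault(m.transient, []).append(m.synonym)
    # Stage 2: transients with a synonym are skipped; every remaining match
    # lies outside the synonym radius, so it can only be an association
    has_synonym = set(classifications)
    for m in matches:
        if m.transient not in has_synonym:
            classifications.setdefault(m.transient, []).append(m.association)
    return classifications


matches = [
    Match("A", 0.2, "NT", "SN"),   # on the galaxy nucleus
    Match("B", 20.0, "NT", "SN"),  # in the galaxy outskirts
    Match("C", 0.1, "VS", "BS"),   # on top of a star
    Match("D", 30.0, "VS", "BS"),  # near a bright star ...
    Match("D", 40.0, "NT", "SN"),  # ... and near the galaxy
]
print(two_stage_classify(matches))
```

Transient D keeps both associations, reflecting the ambiguity described above; choosing between them is left to the classification rankings.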
NED Stream Updater
The settings in the settings file relating to the NED stream are:
ned stream search radius arcec: 300
first pass ned search radius arcec: 240
ned stream refresh rate in days: 90
To update the NED stream, for each set of transient coordinates the code performs a conesearch on the tcs_helper_ned_query_history table to see whether a search has already been performed within the designated ned stream refresh rate in days. If no match is found, NED is queried and tcs_helper_ned_query_history is updated for the transient coordinates.
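A minimal sketch of the refresh check, assuming the query history is available as a list of (ra, dec, query date) tuples standing in for tcs_helper_ned_query_history. The text does not say which radius the history conesearch uses, so match_radius_arcsec is left as an explicit, hypothetical parameter; refresh_rate_days defaults to the 90 days shown in the settings above.

```python
import math
from datetime import datetime, timedelta


def angular_separation_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle separation in arcsec between two positions in decimal degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    # Haversine formula: numerically stable for small separations
    hav = (math.sin((dec2 - dec1) / 2.0) ** 2
           + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2.0) ** 2)
    return math.degrees(2.0 * math.asin(min(1.0, math.sqrt(hav)))) * 3600.0


def needs_ned_query(ra, dec, query_history, now, match_radius_arcsec,
                    refresh_rate_days=90):
    """Return True if NED should be queried for (ra, dec).

    query_history is a hypothetical stand-in for tcs_helper_ned_query_history:
    a list of (ra, dec, queried_at) tuples. NED is skipped only if a previous
    query lies within match_radius_arcsec AND is younger than the refresh rate.
    """
    cutoff = now - timedelta(days=refresh_rate_days)
    for past_ra, past_dec, queried_at in query_history:
        if queried_at < cutoff:
            continue  # too old: this area is due a refresh
        if angular_separation_arcsec(ra, dec, past_ra, past_dec) <= match_radius_arcsec:
            return False  # a recent query already covers this position
    return True


now = datetime(2024, 1, 1)
history = [(150.0, 2.0, datetime(2023, 12, 1))]
print(needs_ned_query(150.0, 2.01, history, now, match_radius_arcsec=240))  # False
```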
Search Algorithm
The algorithm is written and modified within the sherlock.yaml settings file. This means you can modify the algorithm without affecting anyone else’s search (as long as you are working off different transient databases).
search algorithm:
    sdss qso:
        angular radius arcsec: 2.0
        transient classification: AGN
        database table: tcs_view_qso_sdss_spect_galaxies_qsos_dr12
        stop algorithm on match: False
        match nearest source only: False
    milliquas:
        angular radius arcsec: 3.0
        transient classification: AGN
        database table: tcs_view_agn_milliquas_v4_5
        stop algorithm on match: False
        match nearest source only: False
    veron:
        angular radius arcsec: 2.0
        transient classification: AGN
        database table: tcs_view_agn_veron_v13
        stop algorithm on match: False
        match nearest source only: False
    ned qso:
        angular radius arcsec: 2.0
        transient classification: AGN
        ...
Note: to remove a module temporarily, simply comment it out in the settings file (yaml treats lines beginning with # as comments).
Behind the scenes there are 2 types of searches performed on the catalogues.
- Angular Separation Search
- Physical Separation Search
Angular Separation Search
An example of an angular separation search looks like this in the settings file:
milliquas:
    angular radius arcsec: 2.0
    transient classification: AGN
    database table: tcs_view_agn_milliquas_v4_5
    stop algorithm on match: False
    match nearest source only: False
The code performs a cone-search on the database table using the angular radius arcsec. If matches are found, the associated transient is given the transient classification and the results are added to the tcs_cross_matches table of the transients database. If stop algorithm on match is true, the code breaks out of the search algorithm and starts afresh with the next transient to be classified; otherwise the algorithm continues and all matches are recorded in the tcs_cross_matches table. If match nearest source only is true, only the closest match from each catalogue query is recorded in the tcs_cross_matches table.
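The sketch below mimics the angular separation search for a single transient, including the stop algorithm on match and match nearest source only flags. It is an illustration under stated assumptions: catalogues are plain Python lists of source dicts keyed by database table name, separations use the haversine formula, and the returned list stands in for the rows written to tcs_cross_matches. None of the function names come from sherlock.

```python
import math


def angular_separation_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle separation in arcsec between two positions in decimal degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    hav = (math.sin((dec2 - dec1) / 2.0) ** 2
           + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2.0) ** 2)
    return math.degrees(2.0 * math.asin(min(1.0, math.sqrt(hav)))) * 3600.0


def angular_search(transient, catalogue, module):
    """Cone-search one catalogue (a list of source dicts) around one transient."""
    matches = []
    for source in catalogue:
        sep = angular_separation_arcsec(transient["ra"], transient["dec"],
                                        source["ra"], source["dec"])
        if sep <= module["angular radius arcsec"]:
            matches.append({"transient_id": transient["id"],
                            "catalogue_source": source["name"],
                            "separation_arcsec": sep,
                            "classification": module["transient classification"]})
    matches.sort(key=lambda m: m["separation_arcsec"])  # nearest first
    if module.get("match nearest source only"):
        matches = matches[:1]
    return matches


def run_search_algorithm(transient, catalogues, search_algorithm):
    """Run the modules in order; return the rows bound for tcs_cross_matches."""
    cross_matches = []
    for name, module in search_algorithm:
        matches = angular_search(transient, catalogues[module["database table"]], module)
        cross_matches.extend(matches)
        if matches and module.get("stop algorithm on match"):
            break  # skip the remaining modules for this transient
    return cross_matches
```

Sorting by separation before truncating is what makes match nearest source only keep the closest source rather than an arbitrary one.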
Physical Separation Search
If the physical radius kpc key is found in the conesearch module, a physical separation search is performed. First, an angular cone-search is performed at the coordinates using a suitably large search radius. A further search is then done on the physical distance parameters returned for each match (distance, physical separation distance, semi-major axis length …).
A physical match is found if:
- the transient falls within 1.5 x the semi-major axis of a galaxy
- the transient is within the physical radius kpc of a galaxy
As before, all matches are recorded in the tcs_cross_matches table.
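A rough sketch of the physical test for a single candidate galaxy. It assumes the catalogue provides a distance in Mpc and a semi-major axis in arcsec, converts the angular separation to a projected separation with the small-angle approximation, and treats either condition as sufficient for a match. projected_separation_kpc and is_physical_match are hypothetical names.

```python
import math


def projected_separation_kpc(separation_arcsec, distance_mpc):
    """Projected separation via the small-angle approximation (d * theta)."""
    separation_rad = math.radians(separation_arcsec / 3600.0)
    return distance_mpc * 1000.0 * separation_rad  # Mpc -> kpc


def is_physical_match(separation_arcsec, distance_mpc, physical_radius_kpc,
                      semi_major_axis_arcsec=None):
    """True if within 'physical radius kpc' OR within 1.5 x the semi-major axis."""
    within_radius = (projected_separation_kpc(separation_arcsec, distance_mpc)
                     <= physical_radius_kpc)
    within_galaxy = (semi_major_axis_arcsec is not None
                     and separation_arcsec <= 1.5 * semi_major_axis_arcsec)
    return within_radius or within_galaxy


# 1 arcsec at 100 Mpc is roughly 0.48 kpc, inside a 0.5 kpc physical radius
print(is_physical_match(1.0, 100.0, 0.5))  # True
```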
Classification Rankings
If transients are found:
- within 2.0 arcsec of the source, OR
- within 20 kpc of the host galaxy AND assigned a SN classification, OR
- within 1.2 times the semi-major axis of the host AND assigned a SN classification
they are all given the same top-level ranking for classification. After this, catalogue weights come into effect to determine the order of classifications. The catalogue weights are found in the tcs_helper_catalogue_tables_info table of the catalogues database (see Crossmatch Catalogue Tables) and give an indication of the accuracy of the classifications of sources in each catalogue. For example, tcs_cat_sdss_spect_galaxies_qsos_dr12 is given a greater weight than tcs_cat_sdss_photo_stars_galaxies_dr12, as classifications based on spectroscopic observations are more accurate than those based on photometry alone.
Once the classifications for each individual transient are ranked, a final, ordered classification listing is given to the transient within its original database table. For example, SN/VARIABLE STAR means the transient is most likely a SN but may also be a variable star.
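The ranking rules can be sketched as a sort with two keys: top-level-ranked matches first, then higher catalogue weight. The weights below are made up (the real ones live in tcs_helper_catalogue_tables_info), the match dict keys are hypothetical, and matches with equal keys keep their input order because Python’s sorted is stable. Duplicate classifications are collapsed to produce a listing such as SN/VARIABLE STAR.

```python
def is_top_ranked(match):
    """Apply the three top-level ranking rules listed above."""
    is_sn = match["classification"] == "SN"
    sep = match["separation_arcsec"]
    kpc = match.get("physical_separation_kpc")
    sma = match.get("semi_major_axis_arcsec")
    return (sep <= 2.0
            or (is_sn and kpc is not None and kpc <= 20.0)
            or (is_sn and sma is not None and sep <= 1.2 * sma))


def ranked_classification(matches, catalogue_weights):
    """Order classifications: top-ranked first, then by catalogue weight."""
    ranked = sorted(matches, key=lambda m: (
        not is_top_ranked(m),                               # False sorts first
        -catalogue_weights.get(m["catalogue_table"], 0)))   # heavier first
    ordered = []
    for m in ranked:
        if m["classification"] not in ordered:
            ordered.append(m["classification"])  # keep the best occurrence only
    return "/".join(ordered)


weights = {"tcs_cat_sdss_spect_galaxies_qsos_dr12": 10,   # made-up weights
           "tcs_cat_sdss_photo_stars_galaxies_dr12": 5}
matches = [
    {"classification": "VARIABLE STAR", "separation_arcsec": 5.0,
     "catalogue_table": "tcs_cat_sdss_photo_stars_galaxies_dr12"},
    {"classification": "SN", "separation_arcsec": 8.0,
     "physical_separation_kpc": 5.0,
     "catalogue_table": "tcs_cat_sdss_spect_galaxies_qsos_dr12"},
]
print(ranked_classification(matches, weights))  # SN/VARIABLE STAR
```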
A transient is matched against a source in the sherlock-catalogues because it is either synonymous with a point-like catalogue source (e.g. a variable star or an AGN) or it is hosted by the catalogue source (e.g. supernova, nuclear transient).
A synonymous crossmatch is always a simple angular crossmatch with a search radius that reflects the RMS-combined astrometric errors of the transient source location and of the catalogue being matched against.
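One common reading of "RMS-combined" is adding the two independent positional errors in quadrature; the sketch below assumes that interpretation. How the combined error is then scaled into a search radius is not specified here, so the hypothetical function only returns the combined error.

```python
import math


def combined_astrometric_error_arcsec(transient_error_arcsec, catalogue_error_arcsec):
    """Combine two independent positional errors in quadrature (root-sum-square)."""
    return math.hypot(transient_error_arcsec, catalogue_error_arcsec)


# A 0.3 arcsec transient error and a 0.4 arcsec catalogue error combine to ~0.5 arcsec
print(combined_astrometric_error_arcsec(0.3, 0.4))
```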
Sherlock’s Catalogue Database
Database Table Naming Scheme
There’s a strict table naming syntax for the crossmatch-catalogues database (see Crossmatch-Catalogues Database Scheme) to help deal with catalogue versioning (as updated versions of our sherlock-catalogues are released) and to help ease the burden of modifying the crossmatch algorithms employed.
See Crossmatch Catalogue Tables for an up-to-date list of the crossmatch-catalogues, and Crossmatch Catalogue Views for the views found on those tables.
Table Classes
There are 4 classes of tables in the crossmatch_catalogues database:
| Table class | Naming scheme / purpose | Example |
|---|---|---|
| tcs_cat | The table is named with the scheme tcs_cat_<catalogue name>_<version> | tcs_cat_ned_d_v10_2_0 |
| tcs_view | The view is named with the scheme tcs_view_<object type contained>_<source table name> | tcs_view_galaxies_ned_d |
| tcs_helper | Mostly used to store relational information, notes on database tables and book-keeping info | tcs_helper_catalogue_tables_info |
| legacy_tcs_ | Legacy tables used in previous incarnations of the transient classifier | legacy_tcs_cat_md01_chiappetti2005 |
Versioning
Each catalogue is versioned by appending a version indicator to the end of the table name. There are 3 indicator types:
- _final to show that the catalogue is at its final version and shall remain unchanged
- _stream to show that the catalogue is constantly being updated
- _vX_X to show a version number for the catalogue, e.g. for v10.2 this would be _v10_2. We can also have data-release versions (e.g. _dr12).
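Because the naming scheme is strict, a table’s class and version can be recovered from its name alone. The sketch below does that with a regular expression; describe_table is a hypothetical helper, and the pattern accepts version numbers with more than two parts (as in tcs_cat_ned_d_v10_2_0) as well as the _final, _stream and _drN forms.

```python
import re

# Prefixes of the four table classes described above
TABLE_CLASSES = ("tcs_cat_", "tcs_view_", "tcs_helper_", "legacy_tcs_")

# Version suffixes: _final, _stream, _vX_X (any number of parts) or _drN
VERSION_SUFFIX = re.compile(r"_(final|stream|v\d+(?:_\d+)*|dr\d+)$")


def describe_table(table_name):
    """Return (table class, version suffix) for a crossmatch-catalogues table."""
    table_class = next((prefix.rstrip("_") for prefix in TABLE_CLASSES
                        if table_name.startswith(prefix)), None)
    match = VERSION_SUFFIX.search(table_name)
    return table_class, (match.group(1) if match else None)


for name in ("tcs_cat_ned_d_v10_2_0",
             "tcs_view_qso_sdss_spect_galaxies_qsos_dr12",
             "tcs_helper_catalogue_tables_info"):
    print(name, describe_table(name))
```

Helper tables and most views carry no version suffix of their own, so None is returned for them.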
Maintenance and Updates of Catalogues Database
Todo
- write about marshall stream updates
- write about helper table updates
- write that some tasks need to be automated
There are various cron-scripts that run on PESSTO-VM03 to automate some tasks. These tasks include:
- updating the data-streams in the crossmatch-catalogues database, and
- updating certain helper tables in the crossmatch-catalogues database.
Currently there are scripts running every:
- 5 mins
- 30 mins
- 1 hr
- 3 hrs
- 12 hrs
- 24 hrs
Updating Catalogues and Adding New Catalogues to the Database
Todo
- list current catalogue importers and how to use them
- add tutorial about creating a new importer
- add steps for adding a catalogue to the database and the search algorithm
- add details about updating the column map
- write code into conf.py to generate tables for docs and link them from here (views, tables and streams)
Using the sherlock-importers command it’s possible to import and update various catalogues and data-streams, including the Milliquas, Veron AGN and NED-D catalogues (see Catalogue Importers for details).
The command to import new versions of catalogues and data streams into the crossmatch_catalogues database is:
Usage:
sherlock-importers cat <cat_name> <pathToDataFile> <cat_version> [-s <pathToSettingsFile>]
sherlock-importers stream <stream_name> [-s <pathToSettingsFile>]
For example:
> sherlock-importers cat milliquas ~/Desktop/milliquas.txt 4.5
1153111 / 1153111 milliquas data added to memory
1153111 / 1153111 rows inserted into tcs_cat_milliquas_v4_5
5694 / 5694 htmIds added to tcs_cat_milliquas_v4_5
The command currently supports imports for the following catalogues:
- Milliquas
- Veron AGN
- NED-D
Using the command:
sherlock-importers stream pessto
will import all of the various data-streams added to the PESSTO marshall (ASASSN, CRTS, LSQ, PSST …).
The column map is lifted from the tcs_helper_catalogue_tables_info table in the catalogue database (column names ending with ‘ColName’).
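As an illustration of what such a column map might look like, the sketch builds one from helper-table rows represented as dicts. The table_name key and the raColName-style column names are hypothetical examples; the only rule taken from the text is that the mapped columns are those whose names end with ‘ColName’.

```python
def build_column_map(helper_rows, suffix="ColName"):
    """Map each catalogue table to {role: column name} (hypothetical sketch)."""
    column_map = {}
    for row in helper_rows:
        column_map[row["table_name"]] = {
            key[:-len(suffix)]: value          # "raColName" -> "ra"
            for key, value in row.items()
            if key.endswith(suffix) and value  # skip empty column slots
        }
    return column_map


rows = [{"table_name": "tcs_cat_milliquas_v4_5", "raColName": "raDeg",
         "decColName": "decDeg", "redshiftColName": "z",
         "semiMajorColName": None, "version_number": "4.5"}]
print(build_column_map(rows))
```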
Installation ∞
Although you can get Sherlock from a simple pip
install, it’s best to install it within a Conda environment under Anaconda. If you’re not familiar with Anaconda, you’ll find a good tutorial here to get you up and running.
Once you have Anaconda installed, go ahead and create a new Conda environment to host Sherlock:
conda create -n sherlock python=2.7 pip numpy
Now activate the environment and install sherlock:
source activate sherlock
pip install qub-sherlock
At any point in the future you can upgrade to the latest version of sherlock with the command:
pip install qub-sherlock --upgrade
If instead you want to clone the github repo and install from a local version of the code:
git clone git@github.com:thespacedoctor/sherlock.git
cd sherlock
source activate sherlock
python setup.py install
Development ∞
If you want to tinker with the code, then install in development mode. This means you can modify the installed code from whereever you clone the repo to:
git clone git@github.com:thespacedoctor/sherlock.git
cd sherlock
source activate sherlock
python setup.py develop
Pull requests are welcomed!
Sublime Snippets ∞
If you use Sublime Text as your code editor, and you’re planning to develop your own python code with sherlock, you might find my Sublime Snippets useful.
Command-Line Usage ∞
# SHERLOCK #
: INFERING TRANSIENT-SOURCE CLASSIFICATIONS FROM SPATIALLY CROSS-MATCHED CATALOGUED SOURCES :
=============================================================================================
Documentation for sherlock can be found here: http://qub-sherlock.readthedocs.org/en/stable
.. todo ::
- docuument cl_utils module
- tidy usage text
Usage:
sherlock init
sherlock info [-s <pathToSettingsFile>]
sherlock [-NA] dbmatch [--update] [-s <pathToSettingsFile>]
sherlock [-bN] match -- <ra> <dec> [<pathToSettingsFile>]
sherlock clean [-s <pathToSettingsFile>]
sherlock wiki [-s <pathToSettingsFile>]
sherlock import ned <ra> <dec> <radiusArcsec> [-s <pathToSettingsFile>]
sherlock import cat <cat_name> <pathToDataFile> <cat_version> [-s <pathToSettingsFile>]
sherlock import stream <stream_name> [-s <pathToSettingsFile>]
Options:
init setup the sherlock settings file for the first time
match XXXX
dbmatch database match
clean XXXX
wiki XXXX
import XXXX
ned use the online NED database as the source catalogue
cat import a static catalogue into the sherlock-catalogues database
stream download/stream new data from a give source catalogue into the sherlock sherlock-catalogues database
info print an overview of the current catalogues, views and streams in the sherlock database ready for crossmatching
ra the right-ascension coordinate with which to perform a conesearch (sexegesimal or decimal degrees)
dec the declination coordinate with which to perform a conesearch (sexegesimal or decimal degrees)
radiusArcsec radius in arcsec of the footprint to download from the online NED database
cat_name name of the catalogue being imported (veron|ned_d)
stream_name name of the stream to import into the sherlock-catalogues database (ifs)
-N, --skipNedUpdate do not update the NED database before classification
-A, --skipMagUpdate do not update the peak magnitudes and human readable text annotations of objects (can eat up some time)
-h, --help show this help message
-s, --settings the settings file
-b, --verbose print more details to stdout
-l, --transientlistId the id of the transient list to classify
-u, --update update the transient database with new classifications and crossmatches
-v, --version print the version of sherlock
Documentation ∞
Documentation for sherlock is hosted by Read the Docs (last stable version and latest version).
Sherlock Tutorial ∞
Before you begin using sherlock you will need to populate some custom parameters within the sherlock settings file.
To setup the default settings file at ~/.config/sherlock/sherlock.yaml
run the command:
sherlock init
This should create and open a new config file; follow the instructions in the file to populate the missing parameters values (usually given an XXX
placeholder).
Todo
- add tutorial
Initialisation and Setup ∞
Populating Sherlock’s Settings File ∞
The settings file now contains every option required to change the way the code runs, including database settings and the actual search algorithm.
Database Settings ∞
database settings:
static catalogues:
db: crossmatch_catalogues
host: 127.0.0.1
user: pessto
password: p355t0
transients:
user: pessto
password: p355t0
db: ps13pipublic
host: 127.0.0.1
transient table: tcs_transient_objects
transient query: "select id as 'id', followup_id as 'alt_id', ra_psf 'ra', dec_psf 'dec', local_designation 'name', object_classification as 'object_classification'
from tcs_transient_objects
where detection_list_id = 2
and object_classification is null
order by followup_id"
transient id column: id
transient classification column: object_classification
pessto marshall:
user: pessto
password: p355t0
db: pessto_marshall
host: 127.0.0.1
The static catalogues
settings are the settings for connecting to the static catalogues database. Do not edit these settings unless you know what you’re doing. If you have your RSA key on starbase, the code will setup a ssh-tunnel for you so that you can connect to this database remotely.
The transients
settings are for the database you have your transients stored in. transient table
is the name of the table containing your transients, transient query
is the SQL query that need executed to get the following info for the transients needing classified:
id
- the primary ID for the transient in the databasealt_id
- human readable name (optional)ra
- the ra of the objectdec
- the dec of the objectname
- a further alt id (optional)
The transient id column
is the primary ID column in the transient database and transient classification column
is the column you wish to add the classification to.
The Search Algorithm ∞
The order searches appear in the search algorithm
section is the order they shall be run in the actual code:
search algorithm:
sdss qso:
angular radius arcsec: 2.0
transient classification: AGN
database table: tcs_cat_v_sdss_dr9_spect_qsos
milliquas:
angular radius arcsec: 3.0
transient classification: AGN
database table: tcs_cat_milliquas
veron:
angular radius arcsec: 2.0
transient classification: AGN
database table: tcs_veron_cat
ned qso:
angular radius arcsec: 2.0
transient classification: AGN
database table: tcs_cat_v_ned_qsos
ned nt:
angular radius arcsec: 3.0
physical radius kpc: 0.5
transient classification: NT
database table: tcs_cat_v_ned_galaxies
sdss spec nt:
angular radius arcsec: 3.0
physical radius kpc: 0.5
transient classification: NT
database table: tcs_cat_v_sdss_dr9_spect_galaxies
sdss phot nt:
angular radius arcsec: 0.5
transient classification: NT
database table: tcs_cat_v_sdss_dr9_galaxies_notspec
...
The first time you run sherlock
you will be told to add your settings to the empty settings file that’s been created in yaml
.
For details about all of the catalogue in the catalogues database, run:
sherlock info
Classifying Transients ∞
A Single Transient Classification ∞
Classifying Transients in a Transient Database ∞
The Classification Workflow ∞
Synonyms vs Associations ∞
Sherlock distinguishes between what it views as transient objects synonymous with a catalogued source (the same as or very closely linked to), synonyms, and those it deems as merely associated with the catalogued source, associations.
Examples of transient-synonym matches are CVs, AGN and variable stars (VS) that match within 1-2 arcsec of their catalogue counterpart. Stretching the definition of synonym a little, Sherlock will also match transients close to the centre of galaxies as synonyms[1]. Transient-associations include those transients that are located near, but not on top of, a catalogued source. Example of these associations are ‘transients’ matching close to bright-stars and are classified as bright-star artefacts (BS) resulting from poor image subtractions near bright stars (\(~>14-16^{th}\) mag) or transients matched near to a galaxy which may be classified as supernovae (SN). By definition synonyms are a more secure match than associations.
Each search algorithm module should contain a synonym and an association key-value sets. For example here is a Guide-Star Catalogue search module:
gsc bright stars:
angular radius arcsec: 100.0
synonym: VS
association: BS
database table: tcs_view_star_guide_star_catalogue_v2_3
bright mag column: B
bright limit: 16.
If a transient is matched on top of a source in the GSC it’s identified as a synonym and classified as a variable star, but if it is match near to the source but not co-located if may been identified as an association and classified as a potential bright-star artefact (BS).
There’s also a top-level synonym radius arcsec
parameter in the
Sherlock settings file that defines the maximum transient-catalogue
source separation that secures a synonym identification.
synonym radius arcsec: 0.5
Sherlock performs a two-staged catalogue match, first looking for synonym matches and then for associations. For an individual transient if a synonym match is found within the first search stage the second search stage for associations is skipped as it becomes irrelevant. For example consider the image below (transients marked in red):
The first stage search should match transients A, C and E as synonyms (NT, VS, VS), these transients are then removed from a further association search. The second stage search then flags B as associated with the large galaxy at the centre of the image and transient D as either associated with the bright-star in the bottom right corner of the image or with the galaxy in the centre.
[1] | could be classified as a nuclear transient or supernova depending on search algorithm parameters |
NED Stream Updater ∞
The settings in the settings file relating to the NED stream are:
ned stream search radius arcec: 300
first pass ned search radius arcec: 240
ned stream refresh rate in days: 90
To update the NED stream, for each transient coordinates the code does a conesearch on the tcs_helper_ned_query_history
table to see if a search has already been performed within the designated ned stream refresh rate in days
. If a match isn’t found then NED is queried and the tcs_helper_ned_query_history
is updated for the transient coordinates.
Search Algoritm ∞
The algorithm is written and modified within the sherlock.yaml
settings file. This means you can modify the algorithm without affecting anyone else’s search (as long as you are working off the different transient databases).
search algorithm:
sdss qso:
angular radius arcsec: 2.0
transient classification: AGN
database table: tcs_view_qso_sdss_spect_galaxies_qsos_dr12
stop algorithm on match: False
match nearest source only: False
milliquas:
angular radius arcsec: 3.0
transient classification: AGN
database table: tcs_view_agn_milliquas_v4_5
stop algorithm on match: False
match nearest source only: False
veron:
angular radius arcsec: 2.0
transient classification: AGN
database table: tcs_view_agn_veron_v13
stop algorithm on match: False
match nearest source only: False
ned qso:
angular radius arcsec: 2.0
transient classification: AGN
...
Note, to remove a module temporarily, simply comment it out in the settings file (yaml treats lines beginning with #
as comments).
Behind the scenes there are 2 types of searches performed on the catalogues.
- Angular Separation Search
- Physical Separation Search
Angular Separation Search ∞
An example of an angular separation search looks like this in the settings file:
milliquas:
    angular radius arcsec: 2.0
    transient classification: AGN
    database table: tcs_view_agn_milliquas_v4_5
    stop algorithm on match: False
    match nearest source only: False
The code performs a cone-search on the database table using the angular radius arcsec. If matches are found, the associated transient is given the transient classification and the results are added to the tcs_cross_matches table of the transients database. If stop algorithm on match is true, the code breaks out of the search algorithm and starts afresh with the next transient to be classified; otherwise the algorithm continues and all matches are recorded in the tcs_cross_matches table. If match nearest source only is true, only the closest match from each catalogue query is recorded in the tcs_cross_matches table.
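The module loop described above can be sketched against in-memory data. This is a hypothetical illustration, not Sherlock's code: plain dicts stand in for the database tables and for rows of tcs_cross_matches, and the helper names (angular_search, make_module) are invented for the example. The settings keys mirror the YAML, and Python dicts preserve insertion order, so modules run in the order they are listed.

```python
import math


def angular_separation_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle separation (haversine formula); degrees in, arcsec out."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(math.sqrt(min(1.0, h)))) * 3600.0


def angular_search(transient, search_algorithm, catalogues):
    """Run each module of the search algorithm, in order, against in-memory catalogues."""
    cross_matches = []  # stands in for rows of tcs_cross_matches
    for module, settings in search_algorithm.items():
        matches = []
        for source in catalogues.get(settings["database table"], []):
            sep = angular_separation_arcsec(
                transient["ra"], transient["dec"], source["ra"], source["dec"])
            if sep <= settings["angular radius arcsec"]:
                matches.append({
                    "module": module,
                    "source_id": source["id"],
                    "separation_arcsec": sep,
                    "classification": settings["transient classification"],
                })
        if not matches:
            continue
        if settings["match nearest source only"]:
            matches = [min(matches, key=lambda m: m["separation_arcsec"])]
        cross_matches.extend(matches)
        if settings["stop algorithm on match"]:
            break  # stop searching and move on to the next transient
    return cross_matches


def make_module(table, stop, nearest):
    """Build one module's settings with the same keys as the YAML above."""
    return {"angular radius arcsec": 2.0, "transient classification": "AGN",
            "database table": table, "stop algorithm on match": stop,
            "match nearest source only": nearest}


search_algorithm = {
    "milliquas": make_module("tcs_view_agn_milliquas_v4_5", stop=False, nearest=True),
    "veron": make_module("tcs_view_agn_veron_v13", stop=True, nearest=False),
    "sdss qso": make_module("tcs_view_qso_sdss_spect_galaxies_qsos_dr12", stop=False, nearest=False),
}
catalogues = {  # toy sources offset in declination only (1 arcsec = 1/3600 deg)
    "tcs_view_agn_milliquas_v4_5": [
        {"id": "m1", "ra": 10.0, "dec": -5.0 + 1.0 / 3600},
        {"id": "m2", "ra": 10.0, "dec": -5.0 + 1.5 / 3600},
        {"id": "m3", "ra": 10.0, "dec": -4.0},
    ],
    "tcs_view_agn_veron_v13": [{"id": "v1", "ra": 10.0, "dec": -5.0 + 0.5 / 3600}],
    "tcs_view_qso_sdss_spect_galaxies_qsos_dr12": [{"id": "s1", "ra": 10.0, "dec": -5.0}],
}
matches = angular_search({"ra": 10.0, "dec": -5.0}, search_algorithm, catalogues)
print([m["source_id"] for m in matches])  # ['m1', 'v1']
```

Here milliquas keeps only its nearest match (m1), veron matches and stops the algorithm, so the sdss qso module is never reached.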
Physical Separation Search ∞
If the physical radius kpc key is found in the conesearch module, a physical separation search is performed. First, an angular cone-search is performed at the transient coordinates using a suitably large search radius. A further search is then done on the physical distance parameters returned for each match (distance, physical separation distance, semi-major axis length …).
A physical match is found if:
- the transient falls within 1.5 x the semi-major axis of a galaxy
- the transient is within the physical radius kpc of a galaxy
As before, all matches are recorded in the tcs_cross_matches table.
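The two physical criteria can be sketched as below. This is a minimal sketch under several assumptions not stated in the text: distances are in Mpc, semi-major axes are in arcsec, either criterion on its own is taken to be sufficient, the projected separation uses the small-angle approximation, and the preliminary wide angular cone-search is assumed to have already happened. The function names are hypothetical.

```python
import math

ARCSEC_TO_RAD = math.pi / (180.0 * 3600.0)


def physical_separation_kpc(angular_sep_arcsec, distance_mpc):
    """Projected separation via the small-angle approximation (distance in Mpc)."""
    return distance_mpc * 1000.0 * angular_sep_arcsec * ARCSEC_TO_RAD


def is_physical_match(angular_sep_arcsec, distance_mpc, semi_major_axis_arcsec,
                      physical_radius_kpc):
    """Assumes either criterion alone is enough; missing values are passed as None."""
    within_major_axis = (semi_major_axis_arcsec is not None
                         and angular_sep_arcsec <= 1.5 * semi_major_axis_arcsec)
    within_radius = (distance_mpc is not None
                     and physical_separation_kpc(angular_sep_arcsec, distance_mpc)
                     <= physical_radius_kpc)
    return within_major_axis or within_radius


# A galaxy at 100 Mpc: 30 arcsec is roughly 14.5 kpc, 60 arcsec roughly 29.1 kpc
print(round(physical_separation_kpc(30.0, 100.0), 1))  # 14.5
print(is_physical_match(30.0, 100.0, None, 20.0))      # True  (inside 20 kpc)
print(is_physical_match(60.0, 100.0, 10.0, 20.0))      # False (beyond 15 arcsec and 20 kpc)
print(is_physical_match(12.0, None, 10.0, 20.0))       # True  (inside 1.5 x 10 arcsec)
```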
Classification Rankings ∞
If transients are found:
- within 2.0 arcsec of a source, OR
- within 20 kpc of host galaxy AND assigned a SN classification, OR
- within 1.2 times the semi-major axis of the host AND assigned a SN classification
they are all given the same top-level ranking for classification. After this, catalogue weights come into effect to determine the order of the classifications. The catalogue weights are found in the [tcs_helper_catalogue_tables_info](Crossmatch Catalogue Tables) table of the catalogues database and give an indication of the accuracy of the classifications of sources in the catalogue. For example, tcs_cat_sdss_spect_galaxies_qsos_dr12 is given a greater weight than tcs_cat_sdss_photo_stars_galaxies_dr12, as classifications based on spectroscopic observations are more accurate than those based on photometry alone.
Once the classifications for each individual transient are ranked, a final, ordered classification listing is given to the transient within its original database table. For example, SN/VARIABLE STAR means the transient is most likely a SN but may also be a variable star.
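The ranking rules above can be sketched as a two-part sort key: top-level ranking first, then catalogue weight. This is a hypothetical illustration, not Sherlock's code: the match dictionary keys, the "SN" label string and the numeric weights are all assumptions, and duplicate classifications are dropped so each appears once in the final listing.

```python
def is_top_ranked(match):
    """Apply the three top-level ranking criteria listed above."""
    if match["separation_arcsec"] <= 2.0:
        return True
    if match["classification"] != "SN":
        return False
    phys = match.get("physical_separation_kpc")
    major = match.get("semi_major_axis_arcsec")
    return ((phys is not None and phys <= 20.0)
            or (major is not None and match["separation_arcsec"] <= 1.2 * major))


def classification_listing(matches, catalogue_weights):
    """Order by top-level rank, then by descending catalogue weight."""
    ranked = sorted(matches, key=lambda m: (not is_top_ranked(m),
                                            -catalogue_weights.get(m["catalogue"], 0)))
    listing = []
    for m in ranked:
        if m["classification"] not in listing:
            listing.append(m["classification"])
    return "/".join(listing)


weights = {  # hypothetical weights; the real ones live in tcs_helper_catalogue_tables_info
    "tcs_cat_sdss_spect_galaxies_qsos_dr12": 10,
    "tcs_cat_sdss_photo_stars_galaxies_dr12": 4,
}
matches = [
    {"catalogue": "tcs_cat_sdss_photo_stars_galaxies_dr12",
     "classification": "VARIABLE STAR", "separation_arcsec": 1.0},
    {"catalogue": "tcs_cat_sdss_spect_galaxies_qsos_dr12",
     "classification": "SN", "separation_arcsec": 5.0, "physical_separation_kpc": 8.0},
    {"catalogue": "tcs_cat_sdss_spect_galaxies_qsos_dr12",
     "classification": "AGN", "separation_arcsec": 4.0},
]
print(classification_listing(matches, weights))  # SN/VARIABLE STAR/AGN
```

The AGN match comes from the higher-weighted catalogue but still lands last, because it fails all three top-level criteria and rank is compared before weight.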
A transient is matched against a source in the sherlock-catalogues because it is either synonymous with a point-like catalogue source (e.g. a variable star or an AGN) or it is hosted by the catalogue source (e.g. supernova, nuclear transient).
A synonymous crossmatch is always a simple angular crossmatch, with a search radius that reflects the RMS-combined astrometric errors of the transient source location and of the catalogue being matched against.
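As a worked example of the synonym search radius, the sketch below reads "RMS combined" as adding the two astrometric errors in quadrature. That reading, and both function names, are assumptions for illustration; the text does not give Sherlock's exact formula or any scaling factor.

```python
import math


def synonym_search_radius_arcsec(transient_error_arcsec, catalogue_error_arcsec):
    """Combine the two astrometric errors in quadrature (assumed reading of 'RMS combined')."""
    return math.hypot(transient_error_arcsec, catalogue_error_arcsec)


def is_synonym(separation_arcsec, transient_error_arcsec, catalogue_error_arcsec):
    """A source is a synonym candidate if it lies inside the combined-error radius."""
    return separation_arcsec <= synonym_search_radius_arcsec(
        transient_error_arcsec, catalogue_error_arcsec)


print(synonym_search_radius_arcsec(0.3, 0.4))  # approximately 0.5
print(is_synonym(0.45, 0.3, 0.4))              # True
print(is_synonym(0.60, 0.3, 0.4))              # False
```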
Sherlock’s Catalogue Database ∞
Database Table Naming Scheme ∞
There’s a [strict table naming syntax for the crossmatch-catalogues](Crossmatch-Catalogues Database Scheme) database to help deal with catalogue versioning (as updated versions of our sherlock-catalogues are released) and to help ease the burden of modifying the crossmatch algorithms employed.
[See here for an up-to-date list of the crossmatch-catalogues](Crossmatch Catalogue Tables) and the [views](Crossmatch Catalogue Views) found on those tables.
Table Classes ∞
There are 4 classes of tables in the crossmatch_catalogues database:
| Table class | Naming scheme / purpose | Example |
| tcs_cat | The table is named with the scheme tcs_cat_<catalogue name>_<version> | tcs_cat_ned_d_v10_2_0 |
| tcs_view | The view is named with the scheme tcs_view_<object type contained>_<source table name> | tcs_view_galaxies_ned_d |
| tcs_helper | Mostly used to store relational information, notes on database tables and book-keeping info | tcs_helper_catalogue_tables_info |
| legacy_tcs_ | Legacy tables used in previous incarnations of the transient classifier | legacy_tcs_cat_md01_chiappetti2005 |

Versioning ∞
Each catalogue is versioned by appending a version indicator to the end of the table name. There are 3 indicator types:
- _final to show that the catalogue is at its final version and shall remain unchanged
- _stream to show that the catalogue is constantly being updated
- _vX_X to show a version number for the catalogue, e.g. for v10.2 this would be _v10_2

We can also have data-release versions (e.g. _dr12).
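The naming and versioning rules above are regular enough to parse mechanically. The sketch below is a hypothetical helper, not part of the sherlock package: it maps a table name to its table class and to its version indicator type, returning None where a name (such as a view) carries no recognised suffix.

```python
import re

TABLE_CLASSES = {
    "tcs_cat_": "tcs_cat",
    "tcs_view_": "tcs_view",
    "tcs_helper_": "tcs_helper",
    "legacy_tcs_": "legacy_tcs_",
}

VERSION_PATTERNS = [
    ("final", re.compile(r"_final$")),
    ("stream", re.compile(r"_stream$")),
    ("version", re.compile(r"_v(\d+(?:_\d+)*)$")),
    ("data release", re.compile(r"_dr(\d+)$")),
]


def table_class(table_name):
    """Return the table class whose prefix the name starts with, or None."""
    for prefix, cls in TABLE_CLASSES.items():
        if table_name.startswith(prefix):
            return cls
    return None


def version_indicator(table_name):
    """Return (indicator type, version value) for the name's suffix, or (None, None)."""
    for kind, pattern in VERSION_PATTERNS:
        match = pattern.search(table_name)
        if match:
            # _v10_2_0 -> "10.2.0"; _final and _stream carry no value
            value = match.group(1).replace("_", ".") if match.groups() else None
            return kind, value
    return None, None


for name in ["tcs_cat_ned_d_v10_2_0", "tcs_view_galaxies_ned_d",
             "tcs_cat_sdss_spect_galaxies_qsos_dr12",
             "legacy_tcs_cat_md01_chiappetti2005"]:
    print(name, table_class(name), version_indicator(name))
```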
Maintenance and Updates of the Catalogues Database ∞
Todo
- write about marshall stream updates
- write about helper table updates
- write that some tasks need to be automated
There are various cron-scripts that run on PESSTO-VM03 to automate some tasks. These tasks include:
- updating the data-streams in the crossmatch-catalogues database, and
- updating certain helper tables in the crossmatch-catalogues database.
Currently there are scripts running every:
- 5 mins
- 30 mins
- 1 hr
- 3 hrs
- 12 hrs
- 24 hrs
Updating Catalogues and Adding New Catalogues to the Database ∞
Todo
- list current catalogue importers and how to use them
- add tutorial about creating a new importer
- add steps for adding a catalogue to the database and the search algorithm
- add details about updating the column map
- write code into conf.py to generate tables for docs and link them from here (views, tables and streams)
Using the sherlock-importers command it’s possible to import and update various catalogues and data-streams, including the Milliquas, Veron AGN and NED-D catalogues. [See here for details](Catalogue Importers).
The command to import new versions of catalogues and data-streams into the crossmatch_catalogues database is:
Usage:
sherlock-importers cat <cat_name> <pathToDataFile> <cat_version> [-s <pathToSettingsFile>]
sherlock-importers stream <stream_name> [-s <pathToSettingsFile>]
For example:
> sherlock-importers cat milliquas ~/Desktop/milliquas.txt 4.5
1153111 / 1153111 milliquas data added to memory
1153111 / 1153111 rows inserted into tcs_cat_milliquas_v4_5
5694 / 5694 htmIds added to tcs_cat_milliquas_v4_5
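The example output suggests the target table name is built from the catalogue name and version (milliquas with 4.5 gives tcs_cat_milliquas_v4_5), matching the tcs_cat naming scheme. The sketch below reproduces that inferred rule and assembles the argument list for the cat form of the usage line; both helpers are hypothetical and are not part of the sherlock package.

```python
def catalogue_table_name(cat_name, cat_version):
    """Hypothetical: tcs_cat_<catalogue name>_v<version>, with dots turned into underscores."""
    version = str(cat_version).strip().lower().lstrip("v").replace(".", "_")
    return f"tcs_cat_{cat_name.lower()}_v{version}"


def import_command(cat_name, path_to_data_file, cat_version, settings_path=None):
    """Build the argument list for 'sherlock-importers cat ...' as in the usage above."""
    args = ["sherlock-importers", "cat", cat_name, path_to_data_file, str(cat_version)]
    if settings_path is not None:
        args += ["-s", settings_path]
    return args


print(catalogue_table_name("milliquas", "4.5"))  # tcs_cat_milliquas_v4_5
print(import_command("milliquas", "~/Desktop/milliquas.txt", "4.5"))
```

The argument list could be passed to subprocess.run on a machine where sherlock is installed; it is printed here rather than executed.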
The command currently supports imports for the following catalogues:
- Milliquas
- Veron AGN
- NED-D
Using the command:
sherlock-importers stream pessto
will import all of the various data-streams added to the PESSTO Marshall (ASASSN, CRTS, LSQ, PSST …).
The column map is lifted from the tcs_helper_catalogue_tables_info table in the catalogues database (column names ending with ‘ColName’).
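The filtering step behind the column map can be sketched on plain dictionaries. This is only an illustration of the idea, not get_crossmatch_catalogues_column_map itself (which queries the database): the row layout, the table_name key and the example column names are hypothetical, and empty entries are assumed to be skipped.

```python
def column_map(rows, table_key="table_name"):
    """Map each catalogue table to its '...ColName' columns, skipping empty entries."""
    maps = {}
    for row in rows:
        maps[row[table_key]] = {
            column: value
            for column, value in row.items()
            if column.endswith("ColName") and value is not None
        }
    return maps


rows = [  # hypothetical helper-table row; the real columns may differ
    {"table_name": "tcs_cat_milliquas_v4_5", "raColName": "raDeg",
     "decColName": "decDeg", "redshiftColName": "redshift",
     "distanceColName": None, "object_types": "AGN"},
]
print(column_map(rows))
# {'tcs_cat_milliquas_v4_5': {'raColName': 'raDeg', 'decColName': 'decDeg', 'redshiftColName': 'redshift'}}
```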
Subpackages ∞
sherlock.commonutils |
common tools used throughout the package |
sherlock.imports |