CKG Builder¶

create_user.py¶

create_user_from_dict(driver, data)[source]¶

Creates graph database node for new user and adds properties to the node.

Parameters

driver – py2neo driver, which provides the connection to the neo4j graph database.
data (dict) – dictionary with the user information.

create_user_node(driver, data)[source]¶

Creates graph database node for new user and adds respective properties to node.

Parameters

driver (py2neo driver) – py2neo driver, which provides the connection to the neo4j graph database.
data (Series) – pandas Series with new user identifier and required user information (see set_arguments()).

create_user_from_command_line(args, expiration)[source]¶

Creates new user in the graph database and corresponding node, from a terminal window (command line), and adds the new user information to the users excel and import files. Arguments as in set_arguments().

Parameters

args (any object with __dict__ attribute) – object containing all the parameters necessary to create a user (‘username’, ‘name’, ‘email’, ‘secondary_email’, ‘phone_number’ and ‘affiliation’).
expiration (int) – number of days the user is given access.

Note

This function can be used directly with python create_user_from_command_line.py -u username -n user_name -e email -s secondary_email -p phone_number -a affiliation .
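As an illustration of how such command-line arguments could be collected into an object with a __dict__ attribute, here is a minimal sketch using argparse. The short flags are taken from the note above; the long option names, the choice of required fields and the example values are assumptions, not the actual definitions in set_arguments().

```python
import argparse


def build_parser():
    # Hypothetical parser: only the short flags come from the documentation;
    # the long names and the "required" settings are assumptions.
    parser = argparse.ArgumentParser(description="Create a new user (sketch)")
    parser.add_argument("-u", "--username", required=True)
    parser.add_argument("-n", "--name", required=True)
    parser.add_argument("-e", "--email", required=True)
    parser.add_argument("-s", "--secondary_email", default=None)
    parser.add_argument("-p", "--phone_number", default=None)
    parser.add_argument("-a", "--affiliation", default=None)
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch is reproducible.
args = build_parser().parse_args(
    ["-u", "jdoe", "-n", "Jane Doe", "-e", "jdoe@example.org", "-a", "Example Lab"]
)

# argparse.Namespace exposes __dict__, so vars() yields the user fields as a dict.
user_fields = vars(args)
```

The resulting namespace is the kind of object the args parameter describes: any object whose attributes can be read through __dict__.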

create_user_from_file(filepath, expiration)[source]¶

Creates new user in the graph database and corresponding node, from an excel file. Rows in the file must be users, and columns must follow set_arguments() fields.

Parameters

filepath (str) – filepath and filename containing users information.
output_file (str) – path to output csv file.
expiration (int) – number of days the user is given access.

Note

This function can be used directly with python create_user_from_file.py -f path_to_file .

validate_user(driver, username, email)[source]¶

get_new_user_id(driver)[source]¶

create_user(data, output_file, expiration=365)[source]¶

Creates new user in the graph database and corresponding node, through the following steps:

Checks if a user with given properties already exists in the database. If not:

Generates new user identifier

Creates new local user (access to graph database)

Creates new user node

Saves data to users.tsv

Parameters

data – pandas dataframe with users as rows and arguments as columns.
output_file (str) – path to output csv file.
expiration (int) – number of days the user is given access.

Returns

Writes relevant .tsv file for the users in data.
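The steps listed above can be sketched without a database as follows. This is an illustrative outline only: a Python set stands in for the graph database, the identifier scheme ("U" plus a running number), the output columns and the create_user_sketch name are all hypothetical, and the two database steps are reduced to a comment.

```python
import io
import csv
from datetime import date, timedelta


def create_user_sketch(rows, existing, out, expiration=365, today=None):
    """rows: list of dicts; existing: set of (username, email) pairs standing in for the database."""
    today = today or date.today()
    writer = csv.writer(out, delimiter="\t")
    created = []
    for row in rows:
        key = (row["username"], row["email"])
        # Check whether a user with these properties already exists.
        if key in existing:
            continue
        # Generate a new user identifier (hypothetical scheme).
        user_id = "U{}".format(len(existing) + 1)
        # Creating the local user and the user node would happen here;
        # the sketch only records the user in the stand-in set.
        existing.add(key)
        # Access ends `expiration` days from today.
        expires = today + timedelta(days=expiration)
        # Save the data as a tab-separated row.
        writer.writerow([user_id, row["username"], row["email"], expires.isoformat()])
        created.append(user_id)
    return created


existing_users = {("alice", "alice@example.org")}
new_rows = [
    {"username": "alice", "email": "alice@example.org"},
    {"username": "bob", "email": "bob@example.org"},
]
buffer = io.StringIO()
created_ids = create_user_sketch(
    new_rows, existing_users, buffer, expiration=30, today=date(2024, 1, 1)
)
```

Only the second row produces a new user here, because the first one matches an existing username and email pair.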

set_arguments()[source]¶: This function sets the arguments to be used as input for create_user.py in the command line.

importer.py¶

Generates all the import files: Ontologies, Databases and Experiments. The module is responsible for generating all the csv files that will be loaded into the graph database, and it also updates a stats object (hdf table) with the number of entities and relationships from each dataset imported. A new stats object is created the first time a full import is run.

ontologiesImport(importDirectory, ontologies=None, download=True, import_type='partial')[source]¶

Generates all the entities and relationships from the provided ontologies. If the ontologies list is not provided, then all the ontologies listed in the configuration will be imported (full_import). This function also updates the stats object with numbers from the imported ontologies.

Parameters

importDirectory (str) – path of the import directory where files will be created.
ontologies (list) – a list of ontology names to be imported.
download (bool) – whether database is to be downloaded.
import_type (str) – type of import ('full' or 'partial').

databasesImport(importDirectory, databases=None, n_jobs=1, download=True, import_type='partial')[source]¶

Generates all the entities and relationships from the provided databases. If the databases list is not provided, then all the databases listed in the configuration will be imported (full_import). This function also updates the stats object with numbers from the imported databases.

Parameters

importDirectory (str) – path of the import directory where files will be created.
databases (list) – a list of database names to be imported.
n_jobs (int) – number of jobs to run in parallel. 1 by default when updating one database.
import_type (str) – type of import ('full' or 'partial').

experimentsImport(projects=None, n_jobs=1, import_type='partial')[source]¶

Generates all the entities and relationships from the specified Projects. If the projects list is not provided, then all the projects in the experiments directory will be imported (full_import). Calls function experimentImport.

Parameters

projects (list) – list of project identifiers to be imported.
n_jobs (int) – number of jobs to run in parallel. 1 by default when updating one project.
import_type (str) – type of import ('full' or 'partial').

experimentImport(importDirectory, experimentsDirectory, project)[source]¶

Generates all the entities and relationships from the specified Project. Called from function experimentsImport.

Parameters

importDirectory (str) – path to the directory where all the import files are generated.
experimentsDirectory (str) – path to the directory where all the experiments are located.
project (str) – identifier of the project to be imported.

usersImport(importDirectory, import_type='partial')[source]¶

Generates User entities from excel file and grants access of new users to the database. This function also writes the relevant information to a tab-delimited file in the import directory.

Parameters

importDirectory (str) – path to the directory where all the import files are generated.
import_type (str) – type of import ('full' or 'partial').

fullImport(download=True, n_jobs=4)[source]¶: Calls the different importer functions: Ontologies, databases, experiments. The first step is to check if the stats object exists and create it otherwise. Calls setupStats.

generateStatsDataFrame(stats)[source]¶

Generates a dataframe with the stats from each import.

Parameters: stats (list) – a list with statistics collected from each importer function.

Returns: pandas dataframe with the collected statistics.

setupStats(import_type)[source]¶: Creates a stats object that will collect all the statistics collected from each import.

createEmptyStats(statsCols, statsFile, statsName)[source]¶

Creates an HDFStore object with an empty dataframe with the collected stats columns.

Parameters

statsCols (list) – a list of columns with the fields collected from the import statistics.
statsFile (str) – path where the object should be stored.
statsName (str) – name of the file containing the stats object.

writeStats(statsDf, import_type, stats_name=None)[source]¶

Appends the new collected statistics to the existing stats object.

Parameters

statsDf – a pandas dataframe with the new statistics from the importing.
stats_name (str) – name under which the statistics should be stored, if a specific one is required.

getStatsName(import_type)[source]¶

Generates the stats object name where to store the importing statistics from the CKG version, which is defined in the configuration.

Returns: statsName: key used to store in the stats object.
Return type: str

loader.py¶

Populates the graph database with all the files generated by the importer.py module: Ontologies, Databases and Experiments. The module loads all the entities and relationships defined in the importer files. It calls Cypher queries defined in the cypher.py module. Further, it generates an hdf object with the number of entities and relationships loaded for each Database, Ontology and Experiment. This module also generates a compressed backup file of all the loaded files.

There are two types of updates:

Full: all the entities and relationships in the graph database are populated
Partial: only the specified entities and relationships are loaded

The compressed files for each type of update are named accordingly and saved in the archive/ folder in data/.

load_into_database(driver, queries, requester)[source]¶

This function runs the queries provided in the graph database using a py2neo driver.

Parameters

driver (py2neo driver) – py2neo driver, which provides the connection to the neo4j graph database.
queries (list[dict]) – list of queries to be passed to the database.
requester (str) – identifier of the query.

updateDB(driver, imports=None, specific=[])[source]¶

Populates the graph database with information for each Database, Ontology or Experiment specified in imports. If imports is not defined, the function populates the entire graph database based on the graph variable defined in the grapher_config.py module. This function also updates the graph stats object with numbers from the loaded entities and relationships.

Parameters

driver (py2neo driver) – py2neo driver, which provides the connection to the neo4j graph database.
imports (list) – a list of entities to be loaded into the graph.

fullUpdate()[source]¶: Main method that controls the population of the graph database. Firstly, it gets a connection to the database (driver) and then initiates the update of the entire database getting all the graph entities to update from configuration. Once the graph database has been populated, the imports folder in data/ is compressed and archived in the archive/ folder so that a backup of the imports files is kept (full).

partialUpdate(imports, specific=[])[source]¶

Method that controls the update of the graph database with the specified entities and relationships. Firstly, it gets a connection to the database (driver) and then initiates the update of the specified graph entities. Once the graph database has been populated, the data files uploaded to the graph are compressed and archived in the archive/ folder (partial).

Parameters: imports (list) – list of entities to update

archiveImportDirectory(archive_type='full')[source]¶

This function creates the compressed backup imports folder with either the whole folder (full update) or with only the files uploaded (partial update). The folder or files are compressed into a gzipped tarball file and stored in the archive/ folder defined in the configuration.

Parameters: archive_type (str) – whether it is a full update or a partial update.
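The archiving behaviour described above can be approximated with the standard tarfile module. The following is a sketch under stated assumptions, not the module's implementation: the archive_imports name, its signature, the archive file naming and the temporary directories are all hypothetical.

```python
import os
import tarfile
import tempfile


def archive_imports(import_dir, archive_dir, archive_type="full", files=None):
    """Write a gzipped tarball of the whole folder (full) or of selected files (partial)."""
    os.makedirs(archive_dir, exist_ok=True)
    # Hypothetical naming scheme: one archive per update type.
    target = os.path.join(archive_dir, "imports_{}.tar.gz".format(archive_type))
    with tarfile.open(target, "w:gz") as tar:
        if archive_type == "full":
            # Add the complete imports folder recursively.
            tar.add(import_dir, arcname=os.path.basename(os.path.normpath(import_dir)))
        else:
            # Add only the files that were uploaded in this update.
            for path in files or []:
                tar.add(path, arcname=os.path.basename(path))
    return target


# Build a throwaway imports folder so the sketch can run anywhere.
workdir = tempfile.mkdtemp()
import_dir = os.path.join(workdir, "imports")
os.makedirs(import_dir)
for name in ("a.tsv", "b.tsv"):
    with open(os.path.join(import_dir, name), "w") as handle:
        handle.write("ID\tname\n")

archive_dir = os.path.join(workdir, "archive")
full_archive = archive_imports(import_dir, archive_dir, "full")
partial_archive = archive_imports(
    import_dir, archive_dir, "partial", files=[os.path.join(import_dir, "a.tsv")]
)
```

Keeping the archive folder outside the imports folder avoids the tarball being added to itself during a full archive.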

builder.py¶

Builds the database in two main steps:

Imports all the data from ontologies, databases and experiments
Loads these data into the database

The module can perform full updates, executing both steps for all the ontologies, databases and experiments or a partial update. Partial updates can execute step 1 or step 2 for specific data.
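The relationship between full and partial updates can be pictured with a small dispatch function. This is only a sketch: run_build, its parameters and the stand-in callables are hypothetical and do not reflect the actual command-line options of builder.py.

```python
def run_build(importer, loader, build_type="full", step=None, data=None):
    """Run both steps for a full update, or a single step for a partial update."""
    results = []
    if build_type == "full":
        # Full update: import everything, then load everything.
        results.append(importer(None))
        results.append(loader(None))
    elif build_type == "partial":
        if step == 1:
            results.append(importer(data))
        elif step == 2:
            results.append(loader(data))
        else:
            raise ValueError("partial updates need step 1 (import) or step 2 (load)")
    else:
        raise ValueError("build_type must be 'full' or 'partial'")
    return results


# Stand-in callables that only report what they were asked to process.
def fake_importer(data):
    return ("import", data)


def fake_loader(data):
    return ("load", data)


full_run = run_build(fake_importer, fake_loader, "full")
partial_run = run_build(fake_importer, fake_loader, "partial", step=2, data=["ontologies"])
```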

run_minimal_update(user, n_jobs=3)[source]¶

run_full_update(user, download, n_jobs=3)[source]¶

set_arguments()[source]¶: This function sets the arguments to be used as input for builder.py in the command line.