Uploading computational data (Celery tasks and django template)¶
Introducton¶
Uploading a Computed Set (see [link]) is a process that takes a long time. Typical requests handled in django applications usually only take milliseconds. Because this process takes a long time, we have implemented the required components as Celery tasks, and written views that can handle kicking off the relevant tasks, and retrieve the results from the tasks by pinging the message queue that brokers the tasks (Redis).
Celery is a task queuing software package, allowing execution of asynchronous workloads. A synchronous operation blocks a process till the operation completes. An asynchronous operation is non-blocking and only initiates the operation. In the context of a web-server, this means we can kick off an operation through an endpoint, and query another endpoint to periodically check if the task has completed, and when it has, retrieve the results.
This is advantageous when using processes that can take a long time, because browsers often have built in settings where the longest it will wait for the result of a request is around 30 seconds. If the task we want to run takes longer than that, the user will receive a timeout message.
Instead, we can use a django template that uses a bit of javascript to dynamically ping the endpoint that checks on the status of a job, and renders a result or message in the same page when it is completed or fails.
A good tutorial about using Celery and Redis with django can be found here: https://stackabuse.com/asynchronous-tasks-in-django-with-redis-and-celery/
Template - the upload page for computed sets¶
A django template (https://docs.djangoproject.com/en/3.1/topics/templates/) is an HTML document that can dynamically load content from django objects, such as models and views. For example, we could query the Target model, and dynamically create a list of all Target names in a template.
The template for the upload page for computed sets is found at viewer/templates/viewer/upload-cset.html. The
main parts of the template are as follows:
1. The upload form - this part of the template makes use of viewer.forms.CSetForm: a version of a Model
that is used to describe what information can be posted as a request through a form contained in a template.
-
class
viewer.forms.CSetForm(data=None, files=None, auto_id='id_%s', prefix=None, initial=None, error_class=<class 'django.forms.utils.ErrorList'>, label_suffix=None, empty_permitted=False, field_order=None, use_required_attribute=None, renderer=None)¶ A Django form used for uploading Computed sets at viewer/upload_cset
- Parameters
target_name (CharField) – The name of the target, as written in viewer.models.Targets that you want to upload a computed set for
sdf_file (FileField) – The sdf file that you want to upload, containing information about the 3D structure of all molecules in the computed set.
pdb_zip (FileField) – A zip file of apo pdb files referenced by the molecules in sdf_file (optional)
submit_choice (CharField) – Whether to validate (0) or validate and upload (1) - displayed as a radio button
upload_key (CharField) – The user-specific upload key generated by viewer.views.cset_key
The code that shows this form is:
<form method="post" enctype="multipart/form-data">
{% csrf_token %}
{{ form.as_ul }}
<button type="submit">Submit</button>
</form>
This code uses standard html tags to show that we’re using a form, and uses django’s templating language to insert the
form as a list ({{ form.as_ul}})
2. Dynamic loading of validate task status and results - this part of the template is included as a Javascript script, and uses a function to periodically ping the task endpoint, updating what is displayed on the page depending on the tasks status:
{% if validate_task_id %}
<script>
var content = document.getElementById('content');
content.innerHTML = "";
var taskUrl = "{% url 'validate_task' validate_task_id=validate_task_id %}";
var dots = 1;
var progressTitle = document.getElementById('progress-title');
updateProgressTitle();
var timer = setInterval(function() {
updateProgressTitle();
axios.get(taskUrl)
.then(function(response){
var taskStatus = response.data.validate_task_status
if (taskStatus === 'SUCCESS') {
var content = document.getElementById('links');
content.innerHTML = response.data.html;
clearTimer('');
}
else if (taskStatus === 'FAILURE') {
clearTimer('An error occurred - see traceback below');
var content = document.getElementById('links');
content.innerHTML = response.data.validate_traceback;
}
})
}, 800);
function updateProgressTitle() {
dots++;
if (dots > 3) {
dots = 1;
}
progressTitle.innerHTML = 'validating files';
for (var i = 0; i < dots; i++) {
progressTitle.innerHTML += '.';
}
}
function clearTimer(message) {
clearInterval(timer);
progressTitle.innerHTML = message;
}
</script>
{% endif %}
The {if validate_task_id} and { endif } use djangos templating language to make sure that the code
wrapped in <script>...</script> is only executed if there is a value for validate_task_id. This value
comes from the Validate task, which is described below.
The different variables (var:...) are used to decide what to render on the page in the different elements
defined in functions (e.g. document.getElementById('links') - which looks for the HTML div named links).
The most important variable for dynamic loading is the response.data.validate_task_status variable - this is the
status of the validate task from the Celery task, returned by the Validate task, which is described below.
The responses returned by the View are described in Computational Data (Views)
3. Dynamic loading of upload task status and results - this part of the template is included as a Javascript script, and uses a function to periodically ping the task endpoint, updating what is displayed on the page depending on the tasks status:
{% if upload_task_id %}
<script>
var content = document.getElementById('content');
content.innerHTML = "";
var taskUrl = "{% url 'upload_task' upload_task_id=upload_task_id %}";
var dots = 1;
var progressTitle = document.getElementById('progress-title');
updateProgressTitle();
var timer = setInterval(function() {
updateProgressTitle();
axios.get(taskUrl)
.then(function(response) {
var taskStatus = response.data.upload_task_status
if (taskStatus === 'SUCCESS') {
var validatedStatus = response.data.validated
if (validatedStatus === 'Not validated') {
var content = document.getElementById('links');
content.innerHTML = response.data.html;
clearTimer('');
}
if (validatedStatus === 'Validated') {
clearTimer('Your files were uploaded! The download links are:');
var url_a = response.data.results.cset_download_url;
var content = document.getElementById('links');
var a = document.createElement("a");
var link = document.createTextNode(" Compound Set ");
a.appendChild(link);
a.title = 'compound set';
a.href = url_a;
content.appendChild(a);
var br = document.createElement('br');
content.appendChild(br);
var url_b = response.data.results.pset_download_url;
var b = document.createElement("a");
var link_b = document.createTextNode(" Protein Set ");
b.appendChild(link_b);
b.title = 'protein set';
b.href = url_b;
content.appendChild(b);
}
var moleculesProcessed = response.data.processed
if (moleculesProcessed === 'None') {
var content = document.getElementById('links');
content.innerHTML = response.data.html;
clearTimer('');
}
}
else if (taskStatus === 'FAILURE') {
clearTimer('An error occurred - see traceback below');
var content = document.getElementById('links');
content.innerHTML = response.data.upload_traceback;
}
})
}, 800);
function updateProgressTitle() {
dots++;
if (dots > 3) {
dots = 1;
}
progressTitle.innerHTML = 'processing uploaded files';
for (var i = 0; i < dots; i++) {
progressTitle.innerHTML += '.';
}
}
function clearTimer(message) {
clearInterval(timer);
progressTitle.innerHTML = message;
}
</script>
{% endif %}
This code works in the same way as the Javascript code for the validate task, but instead uses the Upload task described in Computational Data (Views)
Celery task - validating uploaded data¶
The first task that has to be completed when uploading a computed set is validation of the data. This task checks the
format of the uploaded SDF file provided to viewer.views.UploadCSetView to make sure it is in the correct format
and contains all of the required information (specified here: [link]) to upload save the data to the database through
the viewer.views.UploadTaskView into the relevant models specified in
Computational Data (Models).
-
class
viewer.tasks.validate(*a, **kw)¶ Celery task to process validate the uploaded files for a computed set upload. SDF file is mandatory, zip file is optional
- Parameters
sdf_file (str) – filepath of the uploaded sdf file, which is saved to temporary storage by viewer.views.UploadCSet
target (str) – name of the target (viewer.models.Target.title) to add add the computed set to
zfile (dict) – dictionary where key is the name of the file minus extension and path, and value is the filename, which is saved to temporary storage by viewer.views.UploadCSet
- Returns
validate_output –
- contains the following:
validate dict (dict): dict containing any errors found during the calidation step
validated (bool): True if the file(s) were validated, False if not
filename (str): name of the uploaded sdf file
target (str): name of the target that the computed set is associated with
zfile (dict): dictionary where key is the name of the file minus extension and path, and value is the filename, which is saved to temporary storage by viewer.views.UploadCSet
submitter_name (str): name of the author of the computed set
submitter_method (str): name of the method used to generate the computed set
- Return type
tuple
Celery task - processing and saving uploaded data¶
The second task that has to be completed when uploading a computed set is the upload itself. This task checks takes the
output of viewer.tasks.validate - the uploaded files must be validated before their data can be saved to the
database.
-
class
viewer.tasks.process_compound_set(*a, **kw)¶ Celery task to process a computed set, that takes the output of the validation task, and uploads molecules to a new computed set if the uploaded files are valid
- Parameters
validate_output (tuple) –
- contains the following:
validate dict (dict): dict containing any errors found during the validation step
validated (bool): True if the file(s) were validated, False if not
filename (str): name of the uploaded sdf file
target (str): name of the target that the computed set is associated with
zfile (dict): dictionary where key is the name of the file minus extension and path, and value is the filename, which is saved to temporary storage by viewer.views.UploadCSet
submitter_name (str): name of the author of the computed set
submitter_method (str): name of the method used to generate the computed set
- Returns
compound_set.name – name of the computed set
- Return type
str