Hi Damian,
Sorry for the delay.
If I understand you correctly, the current system already supports it. Here is how I see it:
You have three data sources:
expression (partitioned into multiple datasets, e.g. expression1, expression2, expression3)
gene (single dataset)
anatomy (single dataset)
The expression data source is linked to the gene dataset via gene_id, and to the anatomy dataset via anatomic_term_id.
Now, create a config (config1) for expression and add gene_symbol as a pointer filter pointing to the gene_symbol filter in the gene dataset. Similarly, create another config (config2) for expression and add anatomical_term as a pointer filter pointing to the anatomical_term filter in the anatomy dataset.
Finally, the queries:
1. Give me all results from all the datasets for gene X:
<Dataset name="expression1,expression2,expression3" config="config1">
<Filter name="gene_symbol" value="gene X"/>
<Attribute name="xxxxxxx"/>
<!-- more expression attributes here -->
</Dataset>
2. Give me all results from all the datasets for anatomical term X:
<Dataset name="expression1,expression2,expression3" config="config2">
<Filter name="anatomical_term" value="term X"/>
<Attribute name="xxxxxxx"/>
<!-- more expression attributes here -->
</Dataset>
Both queries will return the union of results from the three expression datasets: expression1, expression2 and expression3. For the queries to work properly, the link does not need to be config-specific. When a pointer filter is used in a query, the BioMart query engine picks the correct link to perform the join.
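To make the behaviour described above concrete, here is a toy in-memory model of it in Python. It is only a sketch of the idea (a pointer filter resolved through a link key, then a union over the partitions) and is not how the BioMart query engine is implemented. The table contents, the "level" column and the names POINTER_FILTERS and run_query are all invented for illustration.

```python
# Toy model only: every value below is made up for illustration.
GENE = {"g1": "BRCA1", "g2": "TP53"}       # gene_id -> gene_symbol
ANATOMY = {"a1": "liver", "a2": "kidney"}  # anatomic_term_id -> anatomical_term

# Three partitions of the expression data source; each row carries both link keys.
EXPRESSION = {
    "expression1": [{"gene_id": "g1", "anatomic_term_id": "a1", "level": 5.0}],
    "expression2": [
        {"gene_id": "g2", "anatomic_term_id": "a1", "level": 2.5},
        {"gene_id": "g1", "anatomic_term_id": "a2", "level": 7.1},
    ],
    "expression3": [{"gene_id": "g2", "anatomic_term_id": "a2", "level": 0.3}],
}

# Each pointer filter names the linked table and the key used for the join.
# The links live here, next to the data, not inside any particular config.
POINTER_FILTERS = {
    "gene_symbol": (GENE, "gene_id"),
    "anatomical_term": (ANATOMY, "anatomic_term_id"),
}


def run_query(partitions, filter_name, value):
    """Return (partition, level) pairs from every listed partition that match."""
    table, link_key = POINTER_FILTERS[filter_name]
    # Resolve the filter in the linked dataset first ...
    matching_ids = {key for key, label in table.items() if label == value}
    # ... then join on the link key and take the union over all partitions.
    results = []
    for name in partitions:
        for row in EXPRESSION[name]:
            if row[link_key] in matching_ids:
                results.append((name, row["level"]))
    return results


ALL_PARTITIONS = ["expression1", "expression2", "expression3"]
by_gene = run_query(ALL_PARTITIONS, "gene_symbol", "BRCA1")
by_term = run_query(ALL_PARTITIONS, "anatomical_term", "liver")
```

In this model the choice of link follows from the filter that appears in the query, which is why the same set of partitions can serve both the gene use case and the anatomical term use case.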
These queries are similar to the following one, which is a real query from the ICGC data portal. It returns the methylation results of two cancers for genes involved in the 'Apoptosis' pathway.
<Dataset name="hsapiens_gene_ensembl_tcgaREAD,hsapiens_gene_ensembl_tcgaSTAD" config="gene_ensembl_config">
<Filter name="_displayname" value="Apoptosis"/> <!-- this is a pointer filter from pathway dataset, pathway and gene datasets are linked via ensembl_gene_id -->
<Attribute name="cancertype"/>
<Attribute name="ensembl_gene_id"/>
<Attribute name="hsapiens_gene_ensembl__methylation__dm__tumour_sample_id"/>
<Attribute name="hsapiens_gene_ensembl__methylation__dm__percent_methylation_1"/>
<Attribute name="hsapiens_gene_ensembl__methylation__dm__percent_methylation_2"/>
</Dataset>
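If queries like these are generated by a script, the comma-separated dataset list can be assembled programmatically. The following is a minimal sketch using only the Python standard library. The function name build_dataset_query is hypothetical, and the sketch produces just the Dataset element shown in this message; it does not send anything to a server and does not add any enclosing wrapper a server might expect.

```python
import xml.etree.ElementTree as ET


def build_dataset_query(datasets, config, filters, attributes):
    """Build a <Dataset> element like the ones above and return it as a string.

    datasets   -- list of dataset names, joined with commas in the name attribute
    config     -- name of the config to query through
    filters    -- dict mapping filter name to filter value
    attributes -- list of attribute names to return
    """
    dataset = ET.Element("Dataset", {"name": ",".join(datasets), "config": config})
    for name, value in filters.items():
        ET.SubElement(dataset, "Filter", {"name": name, "value": value})
    for name in attributes:
        ET.SubElement(dataset, "Attribute", {"name": name})
    return ET.tostring(dataset, encoding="unicode")


# Rebuild part of the ICGC query above from its components.
query_xml = build_dataset_query(
    datasets=["hsapiens_gene_ensembl_tcgaREAD", "hsapiens_gene_ensembl_tcgaSTAD"],
    config="gene_ensembl_config",
    filters={"_displayname": "Apoptosis"},
    attributes=["cancertype", "ensembl_gene_id"],
)
print(query_xml)
```

Building the element with ElementTree rather than string concatenation means that filter values containing characters such as quotes or ampersands are escaped automatically.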
Let me know if that makes sense, or if I just completely missed the point.
Cheers,
Junjun
Hi,
Just helping set up some new expression database BioMarts. We want to have one config where the datasets are linked by gene for the use case "Give me all results from all the datasets for gene X". But we also want to have another config linked by anatomical term to satisfy the query "Give me all results from all the datasets for anatomical term X".
But linking seems to be set up at the dataset rather than config level? Is there a way round this?
Thanks
Damian