sivan ophir | 2 Jun 2011 09:42

[BioMart Users] help connecting to DB

Hi,
I'm new to BioMart and am having a bit of a problem: I suspect I do not fully understand how to work with the registry files.
I have a large batch of queries that I would like to run using the Perl API, but when connecting through the URL using the following registry

<MartURLLocation
                    name         = "ensembl"
                    displayName  = "ensembl"
                    host         = "www.biomart.org"
                    port         = "80"
                    visible      = "1"
                    default      = ""
                     includeDatasets = "hsapiens_gene_ensembl"
                    martUser     = ""
/>


I get a timeout error. I understand that this may be because my query is very big: I have a few thousand proteins that I want to query. I read that one solution would be to connect to the database directly rather than via the URL.
I tried the following registry:

<MartDBLocation
                    name         = "ensembl"
                    displayName  = "ensembl"
                    databaseType = "mysql"
                    host         = "martdb.ensembl.org"
                    port         = "3306"
                    database     = "ensembl"
                    schema       = "ensembl"
                    user         = "anonymous"
                    password     = ""
                    visible      = "1"
                    default      = ""
                    includeDatasets = "hsapiens_gene_ensembl"
                    martUser     = ""
/>

but I was unable to connect to the Ensembl database to use the human gene dataset. What am I doing wrong?


--
Sivan Ophir Goren

_______________________________________________
Users mailing list
Users <at> biomart.org
https://lists.biomart.org/mailman/listinfo/users
Joachim Baran | 2 Jun 2011 15:52

Re: [BioMart Users] help connecting to DB

Hi!

On 11-06-02 3:42 AM, "sivan ophir" <sivan <at> ofranlab.org> wrote:
<MartDBLocation
                    name         = "ensembl"
                    displayName  = "ensembl"
                    databaseType = "mysql"
                    host         = "martdb.ensembl.org"
                    port         = "3306"
                    database     = "ensembl"
                    schema       = "ensembl"
                    user         = "anonymous"
                    password     = ""
                    visible      = "1"
                    default      = ""
                    includeDatasets = "hsapiens_gene_ensembl"
                    martUser     = ""
/>
  The port for the MySQL server on martdb.ensembl.org is actually 5316 (http://useast.ensembl.org/info/data/mysql.html).

  I am a bit uncertain, but if you then put "ensembl_mart_62" for database and schema, it should work. Perhaps leave includeDatasets blank to ensure all datasets are covered.
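
To make the suggested changes concrete, here is a minimal Python sketch that writes out the registry entry with the port, database and schema values proposed above and includeDatasets left blank. The helper name build_location is made up for illustration, and whether "ensembl_mart_62" is the right database name for a given release is an assumption that should be checked against the server.

```python
import xml.etree.ElementTree as ET

# Attribute values follow the suggestion above; "ensembl_mart_62" is an
# assumption to verify against the databases the server actually lists.
attrs = {
    "name": "ensembl",
    "displayName": "ensembl",
    "databaseType": "mysql",
    "host": "martdb.ensembl.org",
    "port": "5316",
    "database": "ensembl_mart_62",
    "schema": "ensembl_mart_62",
    "user": "anonymous",
    "password": "",
    "visible": "1",
    "default": "",
    "includeDatasets": "",
    "martUser": "",
}

def build_location(attributes):
    """Return a MartDBLocation element serialised as an XML string."""
    element = ET.Element("MartDBLocation", attributes)
    return ET.tostring(element, encoding="unicode")

registry_entry = build_location(attrs)
print(registry_entry)
```

The printed element can be pasted into the registry file in place of the original MartDBLocation entry.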

Best Wishes,
Joachim
Damian Smedley | 2 Jun 2011 18:16

Re: [BioMart Users] setting up multiple links between the same datasets


Hi Junjun,

Thanks for the tips.

It looks like, for the anatomical term query, either a MartReport solution or a simplified, combined, partitioned dataset containing the common fields may be the way to go.

Cheers
Damian

On Mon, May 30, 2011 at 4:36 AM, Junjun Zhang <Junjun.Zhang <at> oicr.on.ca> wrote:
Hi Damian,

OK, I see. So you basically want to apply the same filter (e.g. anatomic_term or gene) to three different expression datasets and retrieve results from all of them. It makes sense for this to run as three queries that return three separate sets of results. Running it as one query in which the three datasets are joined would be possible, but the results would be very difficult to understand. For example, if applying one anatomic_term returns 10 rows from dataset1, 15 rows from dataset2 and 20 rows from dataset3, joining them produces 10 x 15 x 20 = 3000 rows in the resulting table.
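
The row-count arithmetic can be illustrated with a small Python sketch. The row labels are made up; only the counts (10, 15 and 20) come from the example, so treat this purely as an illustration of why a join multiplies rows while separate queries merely add them.

```python
from itertools import product

# Hypothetical result rows for one anatomic_term in each dataset;
# only the counts (10, 15, 20) come from the example in the text.
dataset1 = [f"d1_row{i}" for i in range(10)]
dataset2 = [f"d2_row{i}" for i in range(15)]
dataset3 = [f"d3_row{i}" for i in range(20)]

# Joining on the shared term pairs every row with every row of the
# other two datasets, i.e. a Cartesian product.
joined_rows = list(product(dataset1, dataset2, dataset3))

# Running three separate queries simply yields the three result sets.
separate_rows = dataset1 + dataset2 + dataset3

print(len(joined_rows))    # 3000
print(len(separate_rows))  # 45
```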

Rather than joining the three datasets, it might be better to use MartReport to retrieve data from the three datasets and present them on the same page as different sections. Let's say we create an anatomy dataset, then link it to all three expression datasets via anatomic_term (or anatomy_id, or something similar). Next, we create a report for the anatomy dataset, choosing anatomic_term as the primary identifier. Finally, we add the three expression datasets to the report as three different sections. Here is an example of a report for a gene at the ICGC data portal: http://dcc.icgc.org/martreport/?report=report&mart=gene_report&ensembl_gene_id=ENSG00000141510

Hope this helps,

Junjun



From: Damian Smedley <damian <at> ebi.ac.uk>
Date: Sat, 28 May 2011 05:22:27 -0400
To: jzhang <junjun.zhang <at> oicr.on.ca>
Cc: "users <at> biomart.org" <users <at> biomart.org>
Subject: Re: [BioMart Users] setting up multiple links between the same datasets

Hi Junjun,

The problem here is that the three expression databases they host at MRC-HGU are from different projects and contain quite different data and have been rightly configured as separate datasets. Partitioning would not be desirable here. So the situation is 3 expression datasets. The results you would get from linking by gene and linking by anatomy will sometimes be very different but this is a valid use case.

Cheers
Damian


On Sat, May 28, 2011 at 3:26 AM, Junjun Zhang <Junjun.Zhang <at> oicr.on.ca> wrote:
Hi Damian,

Sorry for the delay.

If I understand you correctly, the current system already supports this. Here is how I see it:

You have three data sources:
  expression (partitioned with multiple datasets, eg, expression1, expression2, expression3)
  gene (single dataset)
  anatomy (single dataset)

Data source expression is linked with gene dataset via gene_id, and it is linked with anatomy via anatomic_term_id.

Now, create a config (config1) of expression and add gene_symbol as a pointer filter pointing to gene_symbol filter in gene dataset. Similarly create another config (config2) for expression and add anatomical_term as a pointer filter pointing to anatomical_term filter in anatomy dataset.

Finally, the queries:

1. Give me all results from all the datasets for gene X:

<dataset name="expression1,expression2,expression3" config="config1">
  <filter name="gene_symbol" value="gene X"/>
  <attribute name="xxxxxxx"/>
  <!-- more expression attributes here -->
</dataset>

2. Give me all results from all the datasets for anatomical term X:

<dataset name="expression1,expression2,expression3" config="config2">
  <filter name="anatomical_term" value="term X"/>
  <attribute name="xxxxxxx"/>
  <!-- more expression attributes here -->
</dataset>

Both queries will return the union of results from the three expression datasets: expression1, expression2 and expression3. For the queries to work properly, the link does not need to be config-specific. When a pointer filter is used in a query, the BioMart query engine picks the correct link to perform the join.
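
As a rough illustration, the two query fragments above could be generated programmatically. This Python sketch only assembles the XML text: build_query is a hypothetical helper, "xxxxxxx" is the same placeholder attribute name used above, and nothing here contacts a BioMart server.

```python
import xml.etree.ElementTree as ET

def build_query(datasets, config, filter_name, filter_value, attributes):
    """Assemble a <dataset> query fragment like the two shown above.

    Hypothetical helper: it only builds the XML string, mirroring the
    element and attribute names used in the examples.
    """
    dataset = ET.Element("dataset", {"name": ",".join(datasets), "config": config})
    ET.SubElement(dataset, "filter", {"name": filter_name, "value": filter_value})
    for attribute in attributes:
        ET.SubElement(dataset, "attribute", {"name": attribute})
    return ET.tostring(dataset, encoding="unicode")

expression_datasets = ["expression1", "expression2", "expression3"]

# Query 1: pointer filter on gene_symbol, using config1.
gene_query = build_query(expression_datasets, "config1", "gene_symbol", "gene X", ["xxxxxxx"])

# Query 2: pointer filter on anatomical_term, using config2.
term_query = build_query(expression_datasets, "config2", "anatomical_term", "term X", ["xxxxxxx"])

print(gene_query)
print(term_query)
```

The only differences between the two calls are the config and the pointer filter, which matches the point that the link itself does not have to be config-specific.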

These queries are similar to the following one, which is a real query from the ICGC data portal. It returns the methylation results of two cancers for genes involved in the 'Apoptosis' pathway.

<Dataset name="hsapiens_gene_ensembl_tcgaREAD,hsapiens_gene_ensembl_tcgaSTAD" config="gene_ensembl_config">
  <!-- pointer filter from the pathway dataset; the pathway and gene datasets are linked via ensembl_gene_id -->
  <Filter name="_displayname" value="Apoptosis"/>
  <Attribute name="cancertype"/>
  <Attribute name="ensembl_gene_id"/>
  <Attribute name="hsapiens_gene_ensembl__methylation__dm__tumour_sample_id"/>
  <Attribute name="hsapiens_gene_ensembl__methylation__dm__percent_methylation_1"/>
  <Attribute name="hsapiens_gene_ensembl__methylation__dm__percent_methylation_2"/>
</Dataset>


Let me know if that makes sense or I just completely missed the point.

Cheers,
Junjun


From: Damian Smedley <damian <at> ebi.ac.uk>
Date: Thu, 26 May 2011 09:14:55 -0400
To: "users <at> biomart.org" <users <at> biomart.org>
Subject: [BioMart Users] setting up multiple links between the same datasets

Hi,

I am helping to set up some new expression database BioMarts. We want one config in which the datasets are linked by gene, for the use case "Give me all results from all the datasets for gene X". We also want another config linked by anatomical term, to satisfy the query "Give me all results from all the datasets for anatomical term X".

But linking seems to be set up at the dataset rather than config level? Is there a way round this?

Thanks
Damian



Junjun Zhang | 2 Jun 2011 20:48

Re: [BioMart Users] setting up multiple links between the same datasets

Cool, whichever option fits your needs better is a good choice.
Cheers,
Junjun


Zhiliang | 3 Jun 2011 00:00

[BioMart Users] quick install errors


I tried to install BioMart on a new RHEL5 server and ran into the following problem.

I followed the Quick Start steps in the BioMart 0.8 User Manual:
> svn checkout https://code.oicr.on.ca/svn/biomart/biomart-java/branches/release-0_8-candidate_6/

At this step:
> ant
I got the following errors:

compile.main:
     [javac] /usr/local/biomart/build.xml:33: warning: 'includeantruntime' was not set, defaulting to build.sysclasspath=last; set to false for repeatable builds
     [javac] Compiling 527 source files to /usr/local/biomart/build/classes/main
     [javac] ----------
     [javac] 1. ERROR in /usr/local/biomart/src/org/biomart/api/enums/Operation.java (at line 1)
     [javac]     package org.biomart.api.enums;
     [javac]     ^^
     [javac] The type Enum is not generic; it cannot be parameterized with arguments <Operation>
     [javac] ----------
     [javac] 1 problem (1 error)

BUILD FAILED

I am not a Java person. What went wrong, and what should I do?

Thank you,
Zhiliang

Peter Ansell | 3 Jun 2011 00:13

Re: [BioMart Users] quick install errors

What version of Java are you using? This code may not be compatible
with old versions.

You can check by running "java -version".

For example, on my Mac OS X machine, it returns

$ java -version
java version "1.6.0_24"
Java(TM) SE Runtime Environment (build 1.6.0_24-b07-334-10M3326)
Java HotSpot(TM) 64-Bit Server VM (build 19.1-b02-334, mixed mode)
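
Enums and generics were introduced in Java 5 (version string 1.5), so the error above suggests the compiler may be working at an older language level. The Python sketch below parses the first line of "java -version" output and checks that threshold. The function names are illustrative, and the minimum version BioMart 0.8 actually requires is not stated in this thread, so the 1.5 cut-off is only the lower bound implied by the error message.

```python
import re

def parse_java_version(version_output):
    """Extract (major, minor) from `java -version` output, or None."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    major = int(match.group(1))
    minor = int(match.group(2)) if match.group(2) else 0
    return (major, minor)

def supports_enums_and_generics(version):
    """Enums and generics arrived in Java 5, which reports itself as 1.5."""
    return version is not None and version >= (1, 5)

# Sample taken from the output quoted above.
sample = 'java version "1.6.0_24"'
version = parse_java_version(sample)
print(version, supports_enums_and_generics(version))
```

To check a live machine, pass in the text printed by "java -version"; note that the JVM writes it to standard error rather than standard output.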

Cheers,

Peter

Howsun Jow | 3 Jun 2011 01:27

[BioMart Users] Ensembl Plants biomart - Changes between version 8 and version 9

Hello,
A few months ago I wrote an R script that used the Ensembl Plants BioMart (version 8, dataset Arabidopsis thaliana) to get the corresponding UniProt identifiers for a list of refseq_dna identifiers, e.g.

library(biomaRt)
Arabidopsis <- useMart("plant_mart_9", dataset = "athaliana_eg_gene")
x <- getBM(attributes = c("refseq_dna", "uniprot_swissprot_accession"),
           filters = c("refseq_dna"), values = c("NM_122767"),
           mart = Arabidopsis)


This, however, no longer works for version 9 of the database, as the "uniprot_swissprot_accession" and "uniprot_sptrembl" attributes no longer exist in version 9 of the BioMart. Is this a deliberate omission and, if so, can version 8 of the Ensembl Plants BioMart be accessed elsewhere?

Thank you.

Dr. Howsun Jow. 

Elena Rivkin | 3 Jun 2011 14:42

Re: [BioMart Users] Ensembl Plants biomart - Changes between version 8 and version 9

Dr. Howsun Jow, 

Yes, you are right: the attributes UniProtKB/SwissProt Accession and UniProtKB/TrEMBL Accession are not available in the latest Ensembl Plants Mart release 9. For information on why they were not included, please contact the Ensembl helpdesk at helpdesk <at> ensembl.org.

Thank you,
Elena

Isaac cano | 3 Jun 2011 15:09

[BioMart Users] Data presentation

Dear BioMart users,


I'm using the BioMart graphical web interface to query a local database that will later be federated with other marts. Several attributes in different databases are encoded as Booleans (0/1), for example, but I would like to display them as true/false (or other strings that convey their meaning). Is it possible to configure BioMart to do so? If so, please let me know how.

Thanks!

Isaac Cano 
Dan Staines | 3 Jun 2011 15:19

Re: [BioMart Users] Ensembl Plants biomart - Changes between version 8 and version 9


On 06/03/2011 01:42 PM, Elena Rivkin wrote:
>
> Yes you are right, the attributes UniProtKB/SwissProt Accession, and
> UniProtKB/TrEMBL Accession are not available in the latest Ensembl
> Plants Mart release 9. For information on why it was not included,
> please contact Ensembl helpdesk at helpdesk <at> ensembl.org

Apologies for this - there was a problem with our production process in
the last release. This is now fixed on our marts at plants.ensembl.org.

Best,

Dan.

-- 
Dan Staines, PhD               Ensembl Genomes Technical Coordinator
EMBL-EBI                       Tel: +44-(0)1223-492507
Wellcome Trust Genome Campus   Fax: +44-(0)1223-494468
Cambridge CB10 1SD, UK         http://www.ensemblgenomes.org/