Rose, Peter | 2 Jul 07:26 2015

[Job] Java Web Developer Positions at RCSB Protein Data Bank, UC San Diego

The Research Collaboratory for Structural Bioinformatics (RCSB) Protein
Data Bank (www.rcsb.org) is seeking exceptional developers, and we know
we're not alone in our search. So why choose to work with us? Our team
values open discussion and contribution. Starting from your first day,
you will shape software and services used by thousands of people around
the world. RCSB PDB can trace its lineage back to the 1970s, but we
still operate like a start-up. Have a great idea? Let's hear it. Want to
try a new technology? Let's learn it. Want to write code at scale? Let's
do it. Everyone at our organization is passionate about what we do, and
that is why we are leaders in our field. We want to hear from skilled
developers: people passionate about their craft and what they can bring
to the field. We are currently looking for experienced developers to
join our team of agile software developers at the University of
California, San Diego.

The RCSB Protein Data Bank (www.rcsb.org) is one of the world's leading
biological databases, with more than 300,000 unique users per month from
over 160 countries. It provides access to the singular global archive of
the three-dimensional structures of proteins and nucleic acids, is a key
resource for the design of new medicines, biofuels, and nanomaterials,
and enables fundamental discoveries in biology and medicine.
The experienced web developer will be part of our agile team of
scientists and software developers who expand the capabilities of our
databases, search engines, websites, web services, and visualization and
analysis tools. The candidate will implement cutting-edge technology on
our in-house cloud using industry standards and web development best
practices.


Rose, Peter | 27 Jun 19:30 2015

[Job] Postdoctoral positions in Big Data/Structural Bioinformatics at University of California, San Diego

Summary: We are looking for two highly motivated post-docs as part of our
new project “Compressive Structural Bioinformatics” funded by the US
National Institutes of Health (NIH) Big Data to Knowledge (BD2K)
initiative. 

The Challenge: To enable efficient research on the rapidly growing number
of 3D molecular structures of ever increasing size and complexity. Develop
highly scalable 3D structural search, analysis, workflow, data-exchange,
and visualization tools.

Qualifications: Ph.D. in structural bioinformatics, structural biology,
bioinformatics, computational biology or chemistry, computer science, or
related discipline. Experience with scientific software development as
demonstrated by publications or participation in open source software
projects. Experience with several programming languages, including Java,
JavaScript, C++, or Python, and software development tools. Strong skills
in applied mathematics and algorithm design are required. Experience with
distributed parallel computing or 3D visualization applications is a
plus. Excellent interpersonal, written, and oral presentation skills are
essential.

Note: this position is reviewed annually on the basis of performance and
can be renewed for a maximum of three years.

Our Environment:

The Structural Bioinformatics Group (http://bioinformatics.sdsc.edu) at
the San Diego Supercomputer Center (SDSC) (http://www.sdsc.edu) is
involved in research and development activities centered around 3D
structures of proteins and nucleic acids, the integration of structural

Rose, Peter | 27 Jun 19:26 2015

[Job] Java Web Developers at RCSB PDB, University of California, San Diego

The RCSB PDB is seeking exceptional developers, and we know we're not alone in our search. So why choose to work with us? Our team values open discussion and contribution. Starting from your first day, you will shape software and services used by thousands of people around the world. Our organization can trace its lineage back to the 1970s, but we still operate like a start-up. Have a great idea? Let's hear it. Want to try a new technology? Let's learn it. Want to write code at scale? Let's do it. Everyone at our organization is passionate about what we do, and that is why we are leaders in our field. We want to hear from skilled developers: people passionate about their craft and what they can bring to the field.

We are looking for two experienced developers to join our team of agile software developers at the University of California, San Diego. By joining our team, a successful applicant would be able to contribute to a variety of projects, including:

  • Front end development using HTML, CSS, JavaScript, JSP, and Node.js

    • Our core business is our website and web services

  • Middleware development that leverages Memcached, Hibernate and RabbitMQ

    • How we scale to meet tens of thousands of unique users every day

  • Back end development using Java, MySQL/MariaDB, and NoSQL solutions

    • How we incorporate and add value to the scientific community

  • Special projects

    • Search using Apache Solr

    • Scalable solutions built on top of OpenStack, Hadoop, and Spark


The RCSB Protein Data Bank (www.rcsb.org) is one of the world’s leading biological databases, with more than 300,000 unique users per month from over 160 countries. It provides access to the singular global archive of the three-dimensional structures of proteins and nucleic acids, is a key resource for the design of new medicines, biofuels, and nanomaterials, and enables fundamental discoveries in biology and medicine.

Requirements:

  • BS degree in Computer Science or related field

  • A minimum of 2 years of experience developing dynamic, highly scalable, database-driven web applications using HTML, CSS, JavaScript and Java/JSP

  • Demonstrable experience with database design and systems

    • Experience with NoSQL database systems, object-relational mapping using Hibernate and distributed parallel computing is a plus

  • Citable experience using agile software development and test-driven design


For more requirements or to apply, please view the UCSD job page.

--

Peter Rose, Ph.D.

Site Head, RCSB Protein Data Bank West (http://www.rcsb.org)

Principal Investigator, Structural Bioinformatics Laboratory (http://bioinformatics.sdsc.edu)

San Diego Supercomputer Center (http://www.sdsc.edu)

University of California, San Diego

+1-858-822-5497

_______________________________________________
Biojava-l mailing list  -  Biojava-l <at> mailman.open-bio.org
http://mailman.open-bio.org/mailman/listinfo/biojava-l
Andreas Prlic | 27 Jun 00:37 2015

BioJava 4.1.0 released

BioJava 4.1.0 has been released and is available using Maven from Maven Central as well as through manual download.

This release contains over 240 commits from 8 authors.
https://github.com/biojava/biojava/compare/5131f3aaff5a5bbbf221f5f52cfe3b849a002d87...biojava-4.1.0

BioJava 4.1.0 offers a few new features, as well as several bug fixes.

New Features:

* New algorithm for multiple structure alignments
* Improved visualization of structural alignments in Jmol
* Support for the ECOD protein classification
* Better mmCIF support: limited write support, better parsing

About BioJava:

BioJava is a mature open-source project that provides a framework for processing of biological data. BioJava contains powerful analysis and statistical routines, tools for parsing common file formats, and packages for manipulating sequences and 3D structures. It enables rapid bioinformatics application development in the Java programming language.

Happy BioJava-ing,

Andreas
Guoying Qi | 25 Jun 15:38 2015

Partial ORGANISM_SCIENTIFIC name returned from PDB 4xcw

For PDB 4xcw, only a partial organism scientific name is returned by the getOrganismScientific() method of org.biojava.nbio.structure.Compound:

HELICOBACTER PYLORI (STRAIN J99 / ATCC 7

I traced it back to PDBFileParser: it seems this PDB file is being treated as legacy format, and the lines are being trimmed to 72 characters.

Can this problem be fixed somehow?

Thanks.

Guoying Qi
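A minimal stdlib sketch of why the name comes back cut short, assuming (as the trace above suggests) that legacy handling keeps only columns 1-72; in older PDB files, columns 73-80 could carry ID/serial data rather than record content. The SOURCE line below is hypothetical (it is not the 4xcw record), and `LegacyTrimDemo`, `legacyTrim` and `tokenValue` are illustrative names, not BioJava methods.

```java
// Sketch only: LegacyTrimDemo and its methods are hypothetical helpers, not BioJava API.
public class LegacyTrimDemo {

    /** Legacy-style cut: keep only columns 1-72 of a record line. */
    public static String legacyTrim(String line) {
        return line.length() > 72 ? line.substring(0, 72) : line;
    }

    /** Value of a "KEY: value;" token in a SOURCE/COMPND line (record text starts at column 11). */
    public static String tokenValue(String line, String key) {
        if (line.length() <= 10) return null;
        String body = line.substring(10);
        int k = body.indexOf(key + ":");
        if (k < 0) return null;
        String v = body.substring(k + key.length() + 1).trim();
        return v.endsWith(";") ? v.substring(0, v.length() - 1) : v; // drop the token terminator
    }

    public static void main(String[] args) {
        // Hypothetical 75-column SOURCE line whose organism text runs past column 72.
        String line = "SOURCE   2 " + "ORGANISM_SCIENTIFIC: " + "EXAMPLE ORGANISM (STRAIN X1 / ATCC 123456);";
        System.out.println("full:    " + tokenValue(line, "ORGANISM_SCIENTIFIC"));
        System.out.println("trimmed: " + tokenValue(legacyTrim(line), "ORGANISM_SCIENTIFIC"));
    }
}
```

A parser-side fix would presumably need to apply the 72-column cut only when a file really is in the legacy layout; how PDBFileParser makes that decision is not shown here.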
Toorn, H.W.P. van den (Henk | 15 Jun 10:58 2015

Fasta parsing question

Dear List,

I've just started using BioJava 4.0.0 in my projects, and wanted to ask 
a question about parsing large Fasta files. There is the option to read 
parts of the fasta file.

FastaReader.process(number)

The problem I have is that it's not documented what happens if the file 
is read in its entirety. I was expecting a null or an empty map, or even 
some exception, but none happened and the parser kept on producing 
(empty) sequences.

Could anyone enlighten me? I'm probably missing the point here. Maybe 
there is a better way to do this (there used to be the SequenceIterator 
if I remember correctly, but I can't find that in version 4.0).

Regards, Henk

My setup: windows 7 64-bit, java 1.8.0_45 64 bit, BioJava 4.0.0 via Maven.
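For comparison, here is a minimal stdlib sketch of the chunked-reading contract the question expects: `process(max)` returns up to `max` records and `null` once the input is used up, so a `while` loop ends cleanly. `ChunkedFastaReader` is a hypothetical class written for illustration. It is not BioJava's `FastaReader` and says nothing about how BioJava 4.0.0 actually behaves at end of file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch only: ChunkedFastaReader is a hypothetical stdlib class, not BioJava's FastaReader.
public class ChunkedFastaReader {
    private final BufferedReader in;
    private String pendingHeader; // header read ahead when the previous chunk filled up

    public ChunkedFastaReader(BufferedReader in) {
        this.in = in;
    }

    /** Returns up to max records (header -> sequence), or null once the input is exhausted. */
    public LinkedHashMap<String, String> process(int max) {
        if (max < 1) throw new IllegalArgumentException("max must be >= 1");
        LinkedHashMap<String, String> out = new LinkedHashMap<>();
        String header = pendingHeader;
        pendingHeader = null;
        StringBuilder seq = new StringBuilder();
        try {
            String line;
            while ((line = in.readLine()) != null) {
                line = line.trim();
                if (line.isEmpty()) continue;
                if (line.startsWith(">")) {
                    if (header != null) {
                        out.put(header, seq.toString());
                        seq.setLength(0);
                        if (out.size() == max) {
                            pendingHeader = line.substring(1); // first record of the next chunk
                            return out;
                        }
                    }
                    header = line.substring(1);
                } else {
                    seq.append(line); // multi-line sequences are concatenated
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (header != null) out.put(header, seq.toString()); // last record at end of input
        return out.isEmpty() ? null : out;
    }

    public static void main(String[] args) {
        String fasta = ">a\nAC\nGT\n>b\nTT\n>c\nGG\n";
        ChunkedFastaReader reader = new ChunkedFastaReader(new BufferedReader(new StringReader(fasta)));
        LinkedHashMap<String, String> chunk;
        while ((chunk = reader.process(2)) != null) {
            for (Map.Entry<String, String> e : chunk.entrySet()) {
                System.out.println(e.getKey() + " " + e.getValue());
            }
        }
    }
}
```

As with any map keyed on the header line, a duplicate header overwrites the earlier entry.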
simon rayner | 4 Jun 08:57 2015

Fwd: GenBank parsing

We resolved things, sort of, but at some point we fell off the mailing list. Here is the full message chain:

thanks again to all for the help

Andreas, repeating my question here: would it be of any use if I added a more complete code sample to the tutorial showing how to pull the Feature information out of a GenBank file?

cheers

Simon


---------- Forwarded message ----------
From: Paolo Pavan <paolo.pavan <at> gmail.com>
Date: Wed, Jun 3, 2015 at 5:34 PM
Subject: Re: [Biojava-l] GenBank parsing
To: simon rayner <simon.rayner.cn <at> gmail.com>


Oh, I'm realizing now that we went outside of the mailing list.
You can forward all the conversation to the list and ask for Andreas there.

Paolo

2015-06-03 17:29 GMT+02:00 Paolo Pavan <paolo.pavan <at> gmail.com>:
Simon,
As far as I have read on the mailing list, Andreas Prlic is interested in this kind of collaboration. I think he will answer you shortly.

Bye bye!

2015-06-03 17:14 GMT+02:00 simon rayner <simon.rayner.cn <at> gmail.com>:
Hi Paolo

I think it's okay. For now, perhaps it would be good to clarify this somewhere (perhaps in the tutorial sample?). And would it be of any use if I added a more complete code sample to the tutorial showing how to pull the Feature information out of a GenBank file?

Simon

On Wed, Jun 3, 2015 at 5:11 PM, Paolo Pavan <paolo.pavan <at> gmail.com> wrote:
Hi Simon,
Now I see what you mean, and unfortunately I must say that retrieving that information is not supported yet. It isn't in the part of the code I worked on, and I must say that I wasn't actually aware of that.

The file responsible for this behaviour is GenbankSequenceParser.java. I don't know whether any of the original authors are still around and could add something.

Bad luck, I'm afraid; let me know if I can be of any more help.
Paolo

2015-06-03 15:55 GMT+02:00 simon rayner <simon.rayner.cn <at> gmail.com>:
Hi Paolo

sequence.getFeaturesByType("source");

will return the 'source' entry at the top of the FEATURE tree, but it won't help me retrieve anything outside the FEATURE tree (from the top of the file and at the bottom before the sequence)

For example, in the following GenBank file

LOCUS       AY102993 400 bp mRNA linear VRL 22-FEB-2006
DEFINITION  Rabies virus isolate RV61 nucleoprotein mRNA, partial cds.
ACCESSION   AY102993 AY247649
VERSION     AY102993.2 GI:34099643
KEYWORDS    .
SOURCE      Rabies virus
  ORGANISM  Rabies virus
            Viruses; ssRNA viruses; ssRNA negative-strand viruses; Mononegavirales; Rhabdoviridae; Lyssavirus.
REFERENCE   1 (bases 1 to 400)
  AUTHORS   Smith,J., McElhinney,L., Parsons,G., Brink,N., Doherty,T., Agranoff,D., Miranda,M.E. and Fooks,A.R.
  TITLE     Case report: rapid ante-mortem diagnosis of a human case of rabies imported into the UK from the Philippines
  JOURNAL   J. Med. Virol. 69 (1), 150-155 (2003)
   PUBMED   12436491
REFERENCE   2 (bases 1 to 400)
     .
     .
     .

COMMENT     On Aug 22, 2003 this sequence version replaced gi:25986720.
FEATURES             Location/Qualifiers
     source          1..400
                     /organism="Rabies virus"
                     /mol_type="mRNA"
                     /isolate="RV61"
                     /host="Homo sapiens"
                     /db_xref="taxon:11292"
                     /country="United Kingdom"
                     /note="isolated in 1987"
     CDS             1..>400

sequence.getFeaturesByType("source");

will return the portion

     source          1..400
                     /organism="Rabies virus"
                     /mol_type="mRNA"
                     /isolate="RV61"
                     /host="Homo sapiens"
                     /db_xref="taxon:11292"
                     /country="United Kingdom"
                     /note="isolated in 1987"


which is important data, but what about the KEYWORDS, SOURCE and REFERENCE information at the top and the COMMENT at the bottom?

I can use the following calls to get some information

getOriginalHeader() -> LOCUS
getDescription() -> DEFINITION
getAccession() -> ACCESSION 

What am I missing here?

thanks

Simon
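As a stopgap while these header fields are not exposed, the KEYWORDS, SOURCE, REFERENCE and COMMENT text can be pulled out of the raw file with plain string handling. Below is a minimal stdlib sketch. It assumes the usual GenBank layout: the keyword sits in the first 12 columns, the value starts at column 13, and continuation lines are indented by 12 spaces. `GenbankHeaderSketch` is a hypothetical helper, not part of BioJava.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch only: GenbankHeaderSketch is a hypothetical stdlib helper, not part of BioJava.
public class GenbankHeaderSketch {

    /** One header field: keyword from columns 1-12 and its value, continuation lines joined. */
    public static final class Field {
        public final String keyword;
        public final boolean subKeyword; // indented keywords such as "  AUTHORS" or "   PUBMED"
        private final StringBuilder value = new StringBuilder();

        private Field(String keyword, boolean subKeyword) {
            this.keyword = keyword;
            this.subKeyword = subKeyword;
        }

        public String value() {
            return value.toString();
        }
    }

    /** Parses header lines in file order, stopping at FEATURES, ORIGIN or the "//" terminator. */
    public static List<Field> parseHeader(List<String> lines) {
        List<Field> fields = new ArrayList<>();
        for (String line : lines) {
            if (line.startsWith("FEATURES") || line.startsWith("ORIGIN") || line.startsWith("//")) break;
            String key = line.length() >= 12 ? line.substring(0, 12) : line;
            String text = line.length() > 12 ? line.substring(12).trim() : "";
            if (!key.trim().isEmpty()) {
                Field f = new Field(key.trim(), line.startsWith(" "));
                f.value.append(text);
                fields.add(f);
            } else if (!fields.isEmpty() && !text.isEmpty()) {
                // Continuation line: append to the most recent field.
                StringBuilder v = fields.get(fields.size() - 1).value;
                if (v.length() > 0) v.append(' ');
                v.append(text);
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        List<String> header = Arrays.asList(
                "KEYWORDS    .",
                "SOURCE      Rabies virus",
                "  ORGANISM  Rabies virus",
                "            Viruses; ssRNA viruses.",
                "REFERENCE   1  (bases 1 to 400)",
                "   PUBMED   12436491",
                "FEATURES             Location/Qualifiers");
        for (Field f : parseHeader(header)) {
            System.out.println((f.subKeyword ? "  " : "") + f.keyword + " = " + f.value());
        }
    }
}
```

Sub-keywords such as AUTHORS or PUBMED are kept in file order, so each one can be tied to the REFERENCE that precedes it.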

On Wed, Jun 3, 2015 at 3:22 PM, Paolo Pavan <paolo.pavan <at> gmail.com> wrote:
Can't you find that information in the "source" feature? Check this list:
List l = sequence.getFeaturesByType("source");

This comes from the fact that in newer versions of the GenBank format, source is a mandatory feature, and a lot of information has been moved from the top-level tags into the qualifiers of the "source" feature.

Let us know,
Paolo


2015-06-03 14:29 GMT+02:00 simon rayner <simon.rayner.cn <at> gmail.com>:
Thanks to all for taking the time to answer. 

I had already got as far as parsing out the feature information using something like

LinkedHashMap<String, DNASequence> dnaSequences = GenbankReaderHelper.readGenbankDNASequence(dnaFile);
for (DNASequence sequence : dnaSequences.values()) {
    List<FeatureInterface<AbstractSequence<NucleotideCompound>, NucleotideCompound>> fl = sequence.getFeatures();
    for (FeatureInterface fi : fl) {
        HashMap<String, Qualifier> quals = fi.getQualifiers();
        for (Map.Entry<String, Qualifier> entry : quals.entrySet()) {
            logger.info("--\t" + entry.getKey() + "\t|\t" + entry.getValue().getName()
                    + "  /  " + entry.getValue().getValue() + "\\" + entry.getValue().toString());
        }
        logger.info("SHORT\t" + fi.getShortDescription());
        logger.info("SOURCE\t" + fi.getSource());
        logger.info("TYPE\t" + fi.getType());
        logger.info("HASHCODE\t" + fi.hashCode());
        logger.info("-");
    }
}

But I am still stumped as to how to access the annotation information at the top of a GenBank file. 

For example, getAccession gets me the accession number of the sequence, but what about all the other data that is there (e.g. the pubmed records)?

In BJ3, there was a RichAnnotation class, but I don't see anything equivalent in BJ4.

cheers

Simon



On Wed, Jun 3, 2015 at 12:39 PM, Paolo Pavan <paolo.pavan <at> gmail.com> wrote:
Hi Simon,
I took care of the latest updates to the GenBank parser (reader). Currently, there are two ways to read annotated GenBank files: via GenbankReader and via GenbankProxySequenceReader.

The first one:
GenbankReader<ProteinSequence, AminoAcidCompound> GenbankProtein
                = new GenbankReader<ProteinSequence, AminoAcidCompound>(
                        inStream,
                        new GenericGenbankHeaderParser<ProteinSequence, AminoAcidCompound>(),
                        new ProteinSequenceCreator(AminoAcidCompoundSet.getAminoAcidCompoundSet())
                );
LinkedHashMap<String, ProteinSequence> proteinSequences = GenbankProtein.process();
        inStream.close();


The second one is:

GenbankProxySequenceReader<AminoAcidCompound> genbankProteinReader
                = new GenbankProxySequenceReader<AminoAcidCompound>("/my_directory", "NP_000257", AminoAcidCompoundSet.getAminoAcidCompoundSet());
        ProteinSequence proteinSequence = new ProteinSequence(genbankProteinReader);


Just keep in mind to use NucleotideCompound and a DNASequenceCreator(DNACompoundSet.getDNACompoundSet()) if you need to parse GenBank nucleotide files.

You can access the stored annotation via the getFeatures() family of methods on the sequence object that was read. Also note that features have qualifiers (those starting with / in the GenBank file), and they must be accessed from the feature object with getQualifiers().
Also note that a feature can have a complex location (rare, but present); in this case you will find nested locations in the retrieved feature.

Does this answer your question?
Bye bye,
Paolo






2015-06-03 10:27 GMT+02:00 Jose Manuel Duarte <jose.duarte <at> psi.ch>:
I can't offer much help regarding GenBank parsing itself, but I would at least like to clarify the situation with the different (indeed confusing) versions:

BJ4 is the current release, well maintained and under development. BJ3 has been completely superseded by BJ4. That means that BJ4 does everything that BJ3 did. In the cookbook and tutorials everything that refers to BJ3 should work in BJ4, with the only difference that the namespace of packages has changed from org.biojava.bio/org.biojava3 to org.biojava.nbio.

BJ1 and BJX are both legacy projects, with some maintenance but not much active development. I believe that some of the features in them were not ported to BJ3+.

Cheers

Jose



On 02.06.2015 11:40, Simon Rayner wrote:
Hi

I'm coming back to BioJava (BJ) after a couple of years away and am somewhat confused by the current collection of cookbooks, tutorials and APIs. There appear to be a few examples for handling protein structure data, but relatively little for more mainstream stuff such as parsing Genbank files, which I first need to get the information I want to investigate protein structure. But when I look at the relevant code samples to do this, they refer back to BJ3, BJ1, or even BJX. Even the Wiki page still refers to BJ3 despite the release of BJ4 back in Feb 2015.

I have everything working for parsing GenBank data, but I'm still trying to get the Annotation information out of the top of a GenBank file, and can't find any way of doing this using BJ4 - the BJ4 API appears to refer to the RichAnnotation type in BJX release. Can anyone clarify what you are supposed to do here? Start mixing in some BJX? (and is BJX still active?) or should I still be using BJ3 until BJ4 stabilizes. I realise this is an open source project, but some clarification on the current status of things would be handy if the project is going to appeal to a larger community :)

Thanks!



Simon Rayner | 2 Jun 11:40 2015

GenBank parsing

Hi

I'm coming back to BioJava (BJ) after a couple of years away and am somewhat confused by the current
collection of cookbooks, tutorials and APIs. There appear to be a few examples for handling protein
structure data, but relatively little for more mainstream stuff such as parsing Genbank files, which I
first need to get the information I want to investigate protein structure. But when I look at the relevant
code samples to do this, they refer back to BJ3, BJ1, or even BJX. Even the Wiki page still refers to BJ3
despite the release of BJ4 back in Feb 2015.

I have everything working for parsing GenBank data, but I'm still trying to get the Annotation information
out of the top of a GenBank file, and can't find any way of doing this using BJ4 - the BJ4 API appears to refer to
the RichAnnotation type in BJX release. Can anyone clarify what you are supposed to do here? Start mixing
in some BJX? (and is BJX still active?) or should I still be using BJ3 until BJ4 stabilizes. I realise this is
an open source project, but some clarification on the current status of things would be handy if the
project is going to appeal to a larger community :)

Thanks!


Rose, Peter | 27 May 19:23 2015

[Job] Postdoctoral Fellows Computational Biology - UC San Diego

University of California, San Diego
San Diego Supercomputer Center
Postdoctoral Fellow

Summary: We are looking for two highly motivated post-docs as part of our new project “Compressive Structural Bioinformatics” funded by the US National Institutes of Health (NIH) Big Data to Knowledge (BD2K) initiative. 

The Challenge: To enable efficient research on the rapidly growing number of 3D molecular structures of ever increasing size and complexity. Develop highly scalable 3D structural search, analysis, workflow, data-exchange, and visualization tools. 

Qualifications: Ph.D. in structural bioinformatics, structural biology, bioinformatics, computational biology or chemistry, computer science, or related discipline. Experience with scientific software development as demonstrated by publications or participation in open source software projects. Experience with several programming languages, including Java, JavaScript, C++, or Python, and software development tools. Strong skills in applied mathematics and algorithm design are required. Experience with distributed parallel computing or 3D visualization applications is a plus. Excellent interpersonal, written, and oral presentation skills are essential.

Note: this position is reviewed annually on the basis of performance and can be renewed for a maximum of three years.

Our Environment:

The Structural Bioinformatics Group (http://bioinformatics.sdsc.edu) at the San Diego Supercomputer Center (SDSC) (http://www.sdsc.edu) is involved in research and development activities centered around 3D structures of proteins and nucleic acids, the integration of structural data with other domains such as Medicine, Genomics, Biology, Drug Discovery, and the development of scalable solutions to Big Data problems in Structural Bioinformatics. Our group leads the RCSB Protein Data Bank (PDB) west-coast operations. The RCSB PDB (http://www.rcsb.org) represents the preeminent source of experimentally determined macromolecular structure information for research and teaching in biology, biological chemistry, and medicine. With over 300,000 unique users from over 160 countries around the world, the RCSB PDB is one of the leading worldwide Biological Databases. Our group is involved in the National Institutes of Health (NIH) Big Data to Knowledge (BD2K) initiative.

As an Organized Research Unit of UC San Diego, SDSC is a world leader in data-intensive computing and cyber infrastructure, providing resources, services, and expertise to the national research community, including industry and academia.

To apply, please send cover letter and resume to Dr. Peter Rose (pwrose <at> ucsd.edu).

--
Peter Rose, Ph.D.
Site Head, RCSB Protein Data Bank West (http://www.rcsb.org)
Principal Investigator, Structural Bioinformatics Laboratory (http://bioinformatics.sdsc.edu)
San Diego Supercomputer Center (http://www.sdsc.edu)
University of California, San Diego
+1-858-822-5497
Jose Colbes | 19 May 20:00 2015

Possible Issue with Calc.angle() and Calc.torsion()

Greetings,

As part of my work I use these methods a lot, but I get NaN when the vectors are parallel.

I know that it is a common issue (rounding errors and acos()) and that there is information on the web about how to get around the problem, but I wanted you to know anyway.

Best regards,
Jose

PS: Thank you for BioJava.
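The usual workaround is to clamp the cosine into [-1, 1] before calling acos, or to compute the angle with atan2, which needs no clamping. Below is a minimal stdlib sketch of both approaches, plus an atan2-based torsion angle. `SafeAngles` and its methods are hypothetical helpers. Nothing here is meant to describe how BioJava's `Calc.angle()` or `Calc.torsion()` are implemented, or which sign convention they use.

```java
// Sketch only: SafeAngles is a hypothetical stdlib helper, not BioJava's Calc class.
public class SafeAngles {

    public static double[] sub(double[] a, double[] b) {
        return new double[] {a[0] - b[0], a[1] - b[1], a[2] - b[2]};
    }

    public static double dot(double[] a, double[] b) {
        return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
    }

    public static double[] cross(double[] a, double[] b) {
        return new double[] {
            a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]
        };
    }

    public static double norm(double[] a) {
        return Math.sqrt(dot(a, a));
    }

    /** acos-based angle in degrees; the cosine is clamped so rounding cannot push it outside [-1, 1]. */
    public static double angleAcos(double[] a, double[] b) {
        double cos = dot(a, b) / (norm(a) * norm(b));
        cos = Math.max(-1.0, Math.min(1.0, cos)); // still NaN for a zero-length vector
        return Math.toDegrees(Math.acos(cos));
    }

    /** atan2(|a x b|, a . b) in degrees: well conditioned near 0 and 180 degrees. */
    public static double angleAtan2(double[] a, double[] b) {
        return Math.toDegrees(Math.atan2(norm(cross(a, b)), dot(a, b)));
    }

    /** Dihedral angle p0-p1-p2-p3 in degrees, in (-180, 180], via atan2. */
    public static double torsion(double[] p0, double[] p1, double[] p2, double[] p3) {
        double[] b1 = sub(p1, p0), b2 = sub(p2, p1), b3 = sub(p3, p2);
        double[] n2 = cross(b2, b3);
        double y = norm(b2) * dot(b1, n2);
        double x = dot(cross(b1, b2), n2);
        return Math.toDegrees(Math.atan2(y, x));
    }

    public static void main(String[] args) {
        double[] a = {1, 2, 3};
        double[] b = {2, 4, 6}; // parallel to a
        System.out.println("acos (clamped): " + angleAcos(a, b));
        System.out.println("atan2:          " + angleAtan2(a, b));
        double[] p0 = {1, 0, 0}, p1 = {0, 0, 0}, p2 = {0, 1, 0}, p3 = {0, 1, 1};
        System.out.println("torsion:        " + torsion(p0, p1, p2, p3));
    }
}
```

Clamping only fixes the rounding case: the acos version still returns NaN for a zero-length vector. A torsion is undefined when three consecutive points are collinear, and the atan2 version then returns a meaningless value rather than NaN.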
Andreas Prlic | 15 May 00:46 2015

Fwd: Obtaining second structure with biojava

Hi Mohammad,

Please don't send BioJava related questions to me directly, but to the mailing list.

The secondary structure assignment code in BioJava is still in beta. If you want to get the author's assignment of secondary structure (most of the time the same as DSSP assignments), you can take a look at the tutorial for how to access it. Check the section "Working with groups" for an example.


Hope that helps,

Andreas



---------- Forwarded message ----------
From: Mohammad Taheri <mo.taheri.ledari <at> gmail.com>
Date: Thu, May 14, 2015 at 2:01 AM
Subject: Obtaining second structure with biojava
To: andreas.prlic <at> gmail.com


Hello Mr Andreas Prlic.

I am using BioJava to load and analyze protein structures, but I have a problem obtaining the secondary structure of a protein. I use the code here to get the secondary structure, but it is not giving me the right locations of alpha helices. For example, it considers some 3/10 helices to be alpha helices.
I even tried using the raw PDB file to obtain alpha helices and beta sheets with the toPDB() method of the structure object, but this method does not give me the original PDB file with the HELIX and SHEET sections; it only gives me the atoms section.
Could you tell me how I can obtain the right, exact secondary structure of a protein chain using BioJava?

Thank you in advance for your help.
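Since the deposited PDB file carries HELIX records, one way to separate alpha from 3-10 helices is to read those records directly. In the PDB format specification, the helix class field (columns 39-40) is 1 for right-handed alpha helices and 5 for right-handed 3-10 helices. Below is a minimal stdlib sketch, assuming a PDB-format file that actually contains HELIX records. `HelixRecordSketch` and its methods are hypothetical, not BioJava API, and treating a blank class field as 1 is an assumption.

```java
// Sketch only: HelixRecordSketch is a hypothetical stdlib helper, not BioJava API.
// Columns used: helixID 12-14, initChainID 20, initSeqNum 22-25, endSeqNum 34-37, helixClass 39-40.
public class HelixRecordSketch {

    public static final class Helix {
        public final String id;
        public final char chain;
        public final int start;
        public final int end;
        public final int helixClass; // 1 = right-handed alpha, 5 = right-handed 3-10

        Helix(String id, char chain, int start, int end, int helixClass) {
            this.id = id;
            this.chain = chain;
            this.start = start;
            this.end = end;
            this.helixClass = helixClass;
        }
    }

    /** Parses one HELIX line using fixed PDB columns (0-based substring indices). */
    public static Helix parse(String line) {
        if (!line.startsWith("HELIX ") || line.length() < 40) {
            throw new IllegalArgumentException("not a HELIX record: " + line);
        }
        String cls = line.substring(38, 40).trim();
        return new Helix(
                line.substring(11, 14).trim(),
                line.charAt(19),
                Integer.parseInt(line.substring(21, 25).trim()),
                Integer.parseInt(line.substring(33, 37).trim()),
                cls.isEmpty() ? 1 : Integer.parseInt(cls)); // blank treated as class 1 (assumption)
    }

    /** Builds a HELIX line with the same column layout (demo and test use only). */
    public static String format(int ser, String id, String initRes, char chain, int start,
                                String endRes, int end, int helixClass) {
        return String.format("%-6s %3d %3s %3s %c %4d  %3s %c %4d %2d",
                "HELIX", ser, id, initRes, chain, start, endRes, chain, end, helixClass);
    }

    public static void main(String[] args) {
        String[] lines = {
            format(1, "1", "SER", 'A', 10, "ALA", 20, 1),
            format(2, "2", "GLY", 'A', 35, "LEU", 38, 5)
        };
        for (String line : lines) {
            Helix h = parse(line);
            String kind = h.helixClass == 1 ? "alpha" : h.helixClass == 5 ? "3-10" : "class " + h.helixClass;
            System.out.println(h.chain + " " + h.start + "-" + h.end + " " + kind);
        }
    }
}
```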

