Andreas Prlic | 24 May 20:30 2016

Re: Documentation, tutorials and cookbook about pairwise sequence alignment

Hi Andrea,

Please don't email me directly with questions; keep them on the list for the benefit of other users who might be in the same situation.

You are right: the signature of SubstitutionMatrix changed at some point in the past. Try this:

SubstitutionMatrix<AminoAcidCompound> matrix = SubstitutionMatrixHelper.getBlosum65();

There is also another example in the "demo" package of the alignment module that you could look at.
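The code in this thread requests a LOCAL pairwise alignment that combines a substitution matrix with a gap penalty. The following is a minimal, stdlib-only Smith-Waterman sketch of what such an alignment computes. It is not BioJava's implementation, and these names are hypothetical:

- `LocalAlignmentSketch`, `score` and `GAP` are invented for the example.
- `score` uses toy match/mismatch values in place of a real matrix such as BLOSUM65.
- The gap penalty is linear, which keeps the sketch short. BioJava's `SimpleGapPenalty` models gap opening and extension separately.

Separately, the interfaces that failed to import appear to live in BioJava's core module in 4.x (`org.biojava.nbio.core.alignment.template`). Please check the javadoc for your version before relying on that.

```java
/** Minimal Smith-Waterman local alignment score (illustrative sketch, not BioJava code). */
final class LocalAlignmentSketch {

    // Hypothetical toy scores standing in for a real substitution matrix such as BLOSUM65.
    static int score(char a, char b) {
        return a == b ? 2 : -1;
    }

    // Hypothetical linear gap cost; BioJava's SimpleGapPenalty uses separate open/extension costs.
    static final int GAP = 2;

    /** Best score over all local alignments of s and t. */
    static int localScore(String s, String t) {
        int[][] h = new int[s.length() + 1][t.length() + 1]; // row 0 and column 0 stay 0
        int best = 0;
        for (int i = 1; i <= s.length(); i++) {
            for (int j = 1; j <= t.length(); j++) {
                int diag = h[i - 1][j - 1] + score(s.charAt(i - 1), t.charAt(j - 1));
                int up = h[i - 1][j] - GAP;   // gap in t
                int left = h[i][j - 1] - GAP; // gap in s
                // Clamping at 0 lets an alignment start anywhere: this is what makes it local.
                h[i][j] = Math.max(0, Math.max(diag, Math.max(up, left)));
                best = Math.max(best, h[i][j]);
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(localScore("XXAAAXX", "YAAAY")); // prints 6
    }
}
```

A global (Needleman-Wunsch) variant differs mainly in two ways:

- The first row and column are initialized with cumulative gap penalties.
- Scores are not clamped at 0.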

About the tutorial: once you have the basics figured out, contributions and pull requests to improve the tutorial are welcome! ;-)

A


On Tue, May 24, 2016 at 5:51 AM, Andrea Battistelli <andreyas.1688 <at> gmail.com> wrote:
Hi Andreas.

I looked at the code on the page you suggested.
I ran into errors with these imports:

import org.biojava.nbio.alignment.template.SequencePair;
import org.biojava.nbio.alignment.template.SubstitutionMatrix;

All the other imports work without problems.
As a result, I get errors in this part of the code:

SubstitutionMatrix<AminoAcidCompound> matrix = new SimpleSubstitutionMatrix<AminoAcidCompound>();
List<SequencePair<ProteinSequence, AminoAcidCompound>> alig = Alignments.getAllPairsAlignments(lst,
        PairwiseSequenceAlignerType.LOCAL, new SimpleGapPenalty(), matrix);
for (SequencePair<ProteinSequence, AminoAcidCompound> pair : alig) {
    System.out.printf("%n%s vs %s%n%s", pair.getQuery().getAccession(), pair.getTarget().getAccession(), pair);
}

The IDE tells me to create these two classes.

I am using the latest version of BioJava (4.2.1) through Maven, as described on the website.
Perhaps those classes no longer exist in that version. Could this be the problem?
How can I solve this?

Thanks a lot.

P.S.
Regarding the "BioJavaTutorial page - book 2: The Alignment module" (https://github.com/biojava/biojava-tutorial/blob/master/alignment/README.md), would it be possible to add explanations to its pages?
Unfortunately, it is the only part of the tutorial without any explanation.


2016-05-24 14:12 GMT+02:00 Andrea Battistelli <andreyas.1688 <at> gmail.com>:
Hi Andreas.

Thank you very much for the fixes; they will surely help me.
I understand the problem you ran into with the old site.
Now I will try to apply them.

Thanks again,
Andrea

2016-05-23 19:27 GMT+02:00 Andreas Prlic <andreas <at> sdsc.edu>:
Hi Andrea,

Sorry for the inconvenience. We had to turn the old site off because there were too many security breaches, which resulted in spam content. This affects all the bio* projects...

I fixed the links on the CookBook4 page. Some of the related pages still have a lot of formatting issues, and it will take a while to fix them all. However, each page has an "edit this page" link at the bottom, which gives easy access to the raw markdown content, where the code is more readable. For now, I have also fixed up this page: http://biojava.org/wikis/BioJava:CookBook3:PSA/

Hope that helps,

Andreas



On Sat, May 21, 2016 at 4:09 AM, Andrea Battistelli <andreyas.1688 <at> gmail.com> wrote:
Hello everyone.

I have been studying pairwise sequence alignment for a project I need to complete for an exam.
In particular, I am interested in global and local alignment and everything related to them (how they are implemented, how to generate a pairwise alignment, etc.).
I would like to implement my project in Java, using BioJava as a starting point.
I have set up the Maven project in Eclipse correctly, so there are no problems on that front.

A few months ago, all the documentation, tutorials and specifications for the BioJava project were available on the wiki pages.
I see that everything has been migrated to a new website, but now I cannot find any documentation, tutorial or cookbook on the topic I am interested in.

- In the BioJavaTutorial, book 2, the alignment module has no explanatory pages.
- In CookBook4.0, all the links related to pairwise sequence alignment return an "address uninterpretable" message.
- Finally, the Wiki Pages section has many links. Some of them look relevant to me, but the code they contain is badly formatted.

I know it may take time to migrate everything, but in the meantime:
Is there any way to access the old website?
Can anyone point me to good documentation, tutorials, etc. on pairwise sequence alignment with BioJava?

Thanks a lot.

_______________________________________________
Biojava-l mailing list  -  Biojava-l <at> mailman.open-bio.org
http://mailman.open-bio.org/mailman/listinfo/biojava-l



--
-----------------------------------------------------------------------
Dr. Andreas Prlic
RCSB PDB Protein Data Bank
Technical & Scientific Team Lead
University of California, San Diego

Editor Software Section 
PLOS Computational Biology

BioJava Project Lead
-----------------------------------------------------------------------






Andrea Battistelli | 21 May 13:09 2016

Documentation, tutorials and cookbook about pairwise sequence alignment

Hello everyone.

I have been studying pairwise sequence alignment for a project I need to complete for an exam.
In particular, I am interested in global and local alignment and everything related to them (how they are implemented, how to generate a pairwise alignment, etc.).
I would like to implement my project in Java, using BioJava as a starting point.
I have set up the Maven project in Eclipse correctly, so there are no problems on that front.
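This message asks how global alignment is implemented, so here is a minimal, stdlib-only Needleman-Wunsch sketch with traceback. It is an illustration under stated assumptions, not BioJava's aligner:

- `GlobalAlignmentSketch`, `Result`, `MATCH`, `MISMATCH` and `GAP` are hypothetical names.
- The scores are toy values rather than a substitution matrix.
- The gap penalty is linear.

```java
/** Minimal Needleman-Wunsch global alignment with traceback (illustrative sketch). */
final class GlobalAlignmentSketch {

    // Hypothetical toy scores; a real aligner would look these up in a substitution matrix.
    static final int MATCH = 1;
    static final int MISMATCH = -1;
    static final int GAP = -1; // linear penalty per gap position

    /** One optimal alignment and its score. */
    static final class Result {
        final String alignedS;
        final String alignedT;
        final int score;

        Result(String alignedS, String alignedT, int score) {
            this.alignedS = alignedS;
            this.alignedT = alignedT;
            this.score = score;
        }
    }

    static int sub(char a, char b) {
        return a == b ? MATCH : MISMATCH;
    }

    static Result align(String s, String t) {
        int n = s.length(), m = t.length();
        int[][] f = new int[n + 1][m + 1];
        // Unlike local alignment, leading gaps are penalized and scores are never clamped at 0.
        for (int i = 1; i <= n; i++) f[i][0] = i * GAP;
        for (int j = 1; j <= m; j++) f[0][j] = j * GAP;
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                int diag = f[i - 1][j - 1] + sub(s.charAt(i - 1), t.charAt(j - 1));
                int up = f[i - 1][j] + GAP;   // gap in t
                int left = f[i][j - 1] + GAP; // gap in s
                f[i][j] = Math.max(diag, Math.max(up, left));
            }
        }
        // Trace back from the bottom-right cell to recover one optimal alignment.
        StringBuilder a = new StringBuilder();
        StringBuilder b = new StringBuilder();
        int i = n, j = m;
        while (i > 0 || j > 0) {
            if (i > 0 && j > 0 && f[i][j] == f[i - 1][j - 1] + sub(s.charAt(i - 1), t.charAt(j - 1))) {
                a.append(s.charAt(--i));
                b.append(t.charAt(--j));
            } else if (i > 0 && f[i][j] == f[i - 1][j] + GAP) {
                a.append(s.charAt(--i));
                b.append('-');
            } else {
                a.append('-');
                b.append(t.charAt(--j));
            }
        }
        return new Result(a.reverse().toString(), b.reverse().toString(), f[n][m]);
    }

    public static void main(String[] args) {
        Result r = align("ACGT", "AGT");
        System.out.println(r.alignedS + "\n" + r.alignedT + "\nscore " + r.score);
    }
}
```

The traceback returns only one of possibly several optimal alignments. Ties are broken in favour of the diagonal, then a gap in `t`.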

A few months ago, all the documentation, tutorials and specifications for the BioJava project were available on the wiki pages.
I see that everything has been migrated to a new website, but now I cannot find any documentation, tutorial or cookbook on the topic I am interested in.

- In the BioJavaTutorial, book 2, the alignment module has no explanatory pages.
- In CookBook4.0, all the links related to pairwise sequence alignment return an "address uninterpretable" message.
- Finally, the Wiki Pages section has many links. Some of them look relevant to me, but the code they contain is badly formatted.

I know it may take time to migrate everything, but in the meantime:
Is there any way to access the old website?
Can anyone point me to good documentation, tutorials, etc. on pairwise sequence alignment with BioJava?

Thanks a lot.
Rose, Peter | 17 May 22:49 2016

Postdoctoral Fellow Position in Structural Bioinformatics at RCSB PDB/UC San Diego

Summary: We are looking for a talented and highly motivated postdoc to join our multidisciplinary team at UC San Diego.

The Challenge: Develop innovative analysis, data integration, query, and visualization tools for 3D biomolecular structures to help accelerate research and training in biology, medicine, and related disciplines. In this project, we will employ the latest advances in computer science to develop highly interactive features and scalable services and workflows for the RCSB PDB website (http://www.rcsb.org).

This position is a unique opportunity to engage in leading-edge research, development, and outreach activities of the RCSB PDB with worldwide impact.

Qualifications: Ph.D. in one or more of the following research areas:

·       Structural Bioinformatics, or a related field, with a focus on software development

·       Structural Biology with a focus on software development

·       Computer Science with a focus on bioinformatics algorithm development or visualization

Demonstrated proficiency in a high-level programming language, such as Java or Python, and experience with state-of-the-art software development tools. Experience with front-end programming languages (JavaScript) and libraries. Strong skills in problem solving and algorithm design are required. High productivity demonstrated by publications and contributions to open source software projects. Experience in the development of modern web applications, user interface design, or scientific visualization is a plus. Excellent written and oral communication skills.

Note: this position is reviewed annually on the basis of performance and can be renewed.

Our Environment:

The Structural Bioinformatics Group (http://bioinformatics.sdsc.edu) at the San Diego Supercomputer Center (SDSC) (http://www.sdsc.edu) is involved in research and development activities centered around 3D structures of proteins and nucleic acids, the integration of structural data with other domains such as Medicine, Genomics, Biology, and Drug Discovery, and the development of scalable solutions to Big Data problems in Structural Bioinformatics. Our group leads the RCSB Protein Data Bank (PDB) west-coast operations. The RCSB PDB (http://www.rcsb.org) is the preeminent source of experimentally determined macromolecular structure information for research and teaching in biology, biological chemistry, and medicine. With over 300,000 unique users from over 160 countries, the RCSB PDB is one of the world's leading biological databases. Our group is also involved in the National Institutes of Health (NIH) Big Data to Knowledge (BD2K) initiative.

As an Organized Research Unit of UC San Diego, SDSC is a world leader in data-intensive computing and cyberinfrastructure, providing resources, services, and expertise to the national research community, including industry and academia.

To apply, please send a cover letter and resume to Dr. Peter Rose (peter.rose <at> rcsb.org).

 

--

Peter Rose, Ph.D.

Site Head, RCSB Protein Data Bank West (http://www.rcsb.org)

Principal Investigator, Structural Bioinformatics Laboratory (http://bioinformatics.sdsc.edu)

San Diego Supercomputer Center (http://www.sdsc.edu)

University of California, San Diego

+1-858-822-5497

Jose Duarte | 4 May 02:08 2016

BioJava 4.2.1 released

BioJava 4.2.1 has been released and is available using Maven from Maven Central as well as manual download. This is purely a bug fix release correcting issues found since the 4.2.0 release.

Please see the full release notes at:




About BioJava:

BioJava is a mature open-source project that provides a framework for processing biological data. BioJava contains powerful analysis and statistical routines, tools for parsing common file formats, and packages for manipulating sequences and 3D structures. It enables rapid bioinformatics application development in the Java programming language.

Happy BioJava-ing!

Jose
Rose, Peter | 2 May 23:46 2016

Postdoc position: Structural Bioinformatics/Big Data at UC San Diego

Summary: We are looking for a highly motivated postdoc as part of our new project “Compressive Structural Bioinformatics”, funded by the US National Institutes of Health (NIH) Big Data to Knowledge (BD2K) initiative.

The Challenge: To enable efficient research on the rapidly growing number of 3D molecular structures of ever-increasing size and complexity. Current algorithms used in Structural Bioinformatics do not scale well with the rapid growth in structural data. In this project, we will employ the latest advances in computer science to develop highly scalable, distributed parallel algorithms to overcome these limitations.

Qualifications: Ph.D. in one or more of the following research areas:

·      Computer Science with a focus on large-scale scientific computing

·      Structural Bioinformatics with a focus on new methods development

·      Computational Structural Biology with a focus on improved structure solution methods

Experience with the development and performance optimization of scientific software. Demonstrated proficiency in a high-level programming language, such as Java or Python, and experience with state-of-the-art software development tools. Strong skills in applied mathematics and algorithm design are required. High productivity demonstrated by publications and contributions to open source software projects. Experience in the development and application of modern distributed parallel computing environments, such as Apache Big Data projects including Apache Spark, is a plus. Excellent interpersonal, written, and oral presentation skills are essential.

Note: this position is reviewed annually on the basis of performance and can be renewed for a maximum of two years.

Our Environment:

The Structural Bioinformatics Group (http://bioinformatics.sdsc.edu) at the San Diego Supercomputer Center (SDSC) (http://www.sdsc.edu) is involved in research and development activities centered around 3D structures of proteins and nucleic acids, the integration of structural data with other domains such as Medicine, Genomics, Biology, and Drug Discovery, and the development of scalable solutions to Big Data problems in Structural Bioinformatics. Our group leads the RCSB Protein Data Bank (PDB) west-coast operations. The RCSB PDB (http://www.rcsb.org) is the preeminent source of experimentally determined macromolecular structure information for research and teaching in biology, biological chemistry, and medicine. With over 300,000 unique users from over 160 countries, the RCSB PDB is one of the world's leading biological databases. Our group is involved in the National Institutes of Health (NIH) Big Data to Knowledge (BD2K) initiative.

As an Organized Research Unit of UC San Diego, SDSC is a world leader in data-intensive computing and cyberinfrastructure, providing resources, services, and expertise to the national research community, including industry and academia.

To apply, please send a cover letter and resume to Dr. Peter Rose (pwrose <at> ucsd.edu).

 

Andreas Prlic | 9 Apr 21:08 2016

biojava wiki update

Hi,

The BioJava wiki at biojava.org went down a few days ago due to a server hack that is still being investigated. In the meantime, we have switched the homepage


to the new development homepage, which is built on Jekyll and GitHub Pages, at


I just did another conversion of the MediaWiki database to Markdown and pushed it there. A lot of cleanup is still required, but I feel that we should move forward and make this the new, permanent biojava.org site.

Some problems are still unsolved, e.g. how to host the Javadocs and the downloads we used to provide. Most downloads nowadays happen via Maven Central, but we still have a few custom files.

Andreas

Spencer Bliven | 15 Mar 15:58 2016

BioJava Wiki

I have finished upgrading the BioJava wiki. In an effort to curb spam, users are now required to log in using a Google account. This is more secure for you, since your password is seen only by Google and not by this wiki, and also greatly reduces spam accounts.

If you previously used the email address associated with your Google account on this wiki, your account should be connected automatically. If not, a new user will be created automatically based on your email address. If you previously used a non-Google email address with your account and would like to keep the same username, DO NOT log in. Instead, email spencer.bliven <at> gmail.com with your username and Google-associated email address for manual conversion.

Sorry for the hassle, and please let me know if anything appears broken.


-Spencer

Jose Duarte | 11 Mar 20:53 2016

BioJava 4.2.0 released

BioJava 4.2.0 has been released and is available using Maven from Maven Central as well as through manual download.

This release contains over 750 commits from 10 contributors.


BioJava 4.2.0 offers a few new features as well as many bug fixes. It is the first release that requires Java 1.7 as a minimum.

New Features:

* Secondary structure assignment full implementation (DSSP compatible)
* New SearchIO framework to interface with BLAST (or BLAST-like) searches
* Unified StructureIdentifier framework
* More complete mmCIF and chemical components parsing: bonds, sites, charges
* Many improvements in symmetry detection code


About BioJava:

BioJava is a mature open-source project that provides a framework for processing biological data. BioJava contains powerful analysis and statistical routines, tools for parsing common file formats, and packages for manipulating sequences and 3D structures. It enables rapid bioinformatics application development in the Java programming language.

Happy BioJava-ing!

Jose

Rose, Peter | 1 Mar 07:41 2016

[Job] Postdoctoral position: Big Data/Computational Biology at UC San Diego

Summary: We are looking for a highly motivated postdoc as part of our new project “Compressive Structural Bioinformatics”, funded by the US National Institutes of Health (NIH) Big Data to Knowledge (BD2K) initiative.

The Challenge: To enable efficient research on the rapidly growing number of 3D molecular structures of ever-increasing size and complexity. Current algorithms used in Structural Bioinformatics do not scale well with the rapid growth in structural data. In this project, we will employ the latest advances in computer science to develop highly scalable, distributed parallel algorithms to overcome these limitations.

Qualifications: Ph.D. in one or more of the following research areas:

·       Computer Science with a focus on large-scale scientific computing

·       Structural Bioinformatics with a focus on new methods development

·       Computational Structural Biology with a focus on improved structure solution methods

Experience with the development and performance optimization of scientific software. Demonstrated proficiency in a high-level programming language, such as Java or Python, and experience with state-of-the-art software development tools. Strong skills in applied mathematics and algorithm design are required. High productivity demonstrated by publications and contributions to open source software projects. Experience in the development and application of modern distributed parallel computing environments, such as Apache Big Data projects including Apache Spark, is a plus. Excellent interpersonal, written, and oral presentation skills are essential.

Note: this position is reviewed annually on the basis of performance and can be renewed for a maximum of two years.

Our Environment:

The Structural Bioinformatics Group (http://bioinformatics.sdsc.edu) at the San Diego Supercomputer Center (SDSC) (http://www.sdsc.edu) is involved in research and development activities centered around 3D structures of proteins and nucleic acids, the integration of structural data with other domains such as Medicine, Genomics, Biology, and Drug Discovery, and the development of scalable solutions to Big Data problems in Structural Bioinformatics. Our group leads the RCSB Protein Data Bank (PDB) west-coast operations. The RCSB PDB (http://www.rcsb.org) is the preeminent source of experimentally determined macromolecular structure information for research and teaching in biology, biological chemistry, and medicine. With over 300,000 unique users from over 160 countries, the RCSB PDB is one of the world's leading biological databases. Our group is involved in the National Institutes of Health (NIH) Big Data to Knowledge (BD2K) initiative.

As an Organized Research Unit of UC San Diego, SDSC is a world leader in data-intensive computing and cyberinfrastructure, providing resources, services, and expertise to the national research community, including industry and academia.

To apply, please send a cover letter and resume to Dr. Peter Rose (pwrose <at> ucsd.edu).

 

Spencer Bliven | 29 Feb 18:15 2016

Row order from multiple sequence alignment

I'm creating a multiple sequence alignment using Alignments.getMultipleSequenceAlignment, as described in the cookbook. The problem is that the returned profile has its rows in a different order from the input array of sequences. How can I map from an index in the inputs to the rows of the profile?
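One possible workaround is sketched below. It assumes each input sequence carries a unique identifier, such as its accession, that can also be read back from each row of the returned profile. How to read that identifier from a profile row depends on the BioJava version, so the sketch works on plain strings. `RowOrderSketch`, `rowToInputIndex` and `inInputOrder` are hypothetical helpers, not BioJava API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Sketch: map alignment rows back to input positions via unique identifiers. */
final class RowOrderSketch {

    /** For each profile row r, returns the index of the input sequence with the same id. */
    static int[] rowToInputIndex(List<String> inputIds, List<String> rowIds) {
        Map<String, Integer> indexOf = new HashMap<>();
        for (int i = 0; i < inputIds.size(); i++) {
            // The approach only works if identifiers are unique, so fail loudly otherwise.
            if (indexOf.put(inputIds.get(i), i) != null) {
                throw new IllegalArgumentException("duplicate id: " + inputIds.get(i));
            }
        }
        int[] result = new int[rowIds.size()];
        for (int r = 0; r < rowIds.size(); r++) {
            Integer idx = indexOf.get(rowIds.get(r));
            if (idx == null) {
                throw new IllegalArgumentException("row id not among inputs: " + rowIds.get(r));
            }
            result[r] = idx;
        }
        return result;
    }

    /** Reorders per-row data so that element i belongs to input i (assumes a permutation). */
    static <T> List<T> inInputOrder(List<T> rows, int[] rowToInput) {
        List<T> ordered = new ArrayList<>(rows);
        for (int r = 0; r < rows.size(); r++) {
            ordered.set(rowToInput[r], rows.get(r));
        }
        return ordered;
    }

    public static void main(String[] args) {
        int[] map = rowToInputIndex(Arrays.asList("P1", "P2", "P3"), Arrays.asList("P3", "P1", "P2"));
        System.out.println(Arrays.toString(map)); // prints [2, 0, 1]
    }
}
```

Keying on identifiers rather than on sequence strings avoids ambiguity when two inputs happen to share the same residues.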

Thanks,
Spencer

Andreas Prlic | 12 Jan 19:06 2016

Re: [Biojava-dev] Increasing Java version requirement for BioJava

Based on RCSB PDB analytics data, I'd estimate that about 2/3 of all users are already on 1.8. However, a significant number of users are still on 1.7 (somewhere around 1/4).

As such, my vote is to upgrade to 1.7 for now and move to 1.8 at some point in the future, once 1.7 usage has declined further.
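To make the 1.7 versus 1.8 trade-off concrete, here is a small, made-up illustration; it is not BioJava code, and `JavaVersionSketch` and its methods are hypothetical names. The two methods perform the same filter:

- The first uses the diamond operator, which requires Java 7.
- The second uses a lambda and the Stream API, which require Java 8 and would not compile at a Java 7 source level.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

/** Illustrative only: the same filter written against a Java 7 and a Java 8 baseline. */
final class JavaVersionSketch {

    // Java 7 baseline: the diamond operator (new ArrayList<>()) plus an explicit loop.
    static List<String> filterByLengthJava7(List<String> seqs, int minLength) {
        List<String> result = new ArrayList<>();
        for (String s : seqs) {
            if (s.length() >= minLength) {
                result.add(s);
            }
        }
        return result;
    }

    // Java 8 baseline: the same filter as a lambda passed to the Stream API.
    static List<String> filterByLengthJava8(List<String> seqs, int minLength) {
        return seqs.stream()
                .filter(s -> s.length() >= minLength)
                .collect(Collectors.toList());
    }
}
```

Both methods return the same list. Only the minimum language level each one needs differs.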

Andreas






On Tue, Jan 12, 2016 at 8:56 AM, Terry Casstevens <tmc46 <at> cornell.edu> wrote:
Dear Spencer,

I'm the lead developer for the Tassel software, and we use the Biojava
libraries.  We've required Java 8 for Tassel since August 2014.  If
you change, some users will need to upgrade Java regardless.  I
recommend going to Java 8.

maizegenetics.net/tassel

Best,

Terry


On Tue, Jan 12, 2016 at 7:16 AM, Spencer Bliven
<spencer.bliven <at> gmail.com> wrote:
> There has been some informal discussion of increasing the Java version
> requirement for BioJava from the current Java 6 to either 7 or 8. It would
> be great to hear from the larger BioJava community about whether this would
> be a welcome change or not.
>
> I will start the discussion by listing what I see as the pros and cons of
> setting each version as the minimum requirement for BioJava.
>
> Java 6:
> ---------
> + Greatest backwards compatibility
> - No updates since Feb 2013*
> - Some dependencies are not compatible, requiring the use of older versions
> (currently only log4j, but could be others in the future)
>
> Java 7:
> ---------
> + Most popular version currently
> + Some minor language features added
> - No updates since Apr 2015*
>
> Java 8:
> ---------
> + Tons of awesome new programming features, e.g. lambda functions
> + Only version supported by Oracle
> - Not available for many systems
>
> * Note that all versions are backwards compatible, so you can always use a
> more up-to-date JDK for downstream projects. Running outdated software is
> generally a bad idea, so users are encouraged to use the Java 8 JRE,
> regardless of the minimum BioJava requirement.
>
>
> One thing I would like to get a sense of is how many BioJava users are still
> using 6 and 7. @emckee2006 mentioned on GitHub that they still have some
> servers on 6. I know that getting Java 8 installed on CentOS is rather
> painful, so probably some users haven't yet updated to 8.
>
> Let me know if I missed anything!
>
>
> Cheers,
>
> Spencer
>
>
>
> _______________________________________________
> biojava-dev mailing list
> biojava-dev <at> mailman.open-bio.org
> http://mailman.open-bio.org/mailman/listinfo/biojava-dev



