12th Annual Bioinformatics Open Source Conference
BOSC 2011
Vienna, Austria, July 15-16, 2011
http://www.open-bio.org/wiki/BOSC_2011
Welcome to BOSC 2011! The Bioinformatics Open Source Conference, established in 2000, is
held every year as a Special Interest Group (SIG) meeting in conjunction with the Intelligent
Systems for Molecular Biology (ISMB) Conference.
BOSC is sponsored by the Open Bioinformatics Foundation (O|B|F), a non-profit group
dedicated to promoting the practice and philosophy of Open Source software development
within the biological research community.
We have an exciting lineup of topics and speakers, including keynote speakers Matt Wood (the
Technology Evangelist for Amazon Web Services) and Lawrence Hunter (director of the
Computational Bioscience Program at the University of Colorado, and one of the founders of
ISMB). A major theme of BOSC this year is cloud-based approaches to improving software and
data accessibility. Other sessions focus on approaches to organizing and analyzing
high-throughput 'omics and next-generation sequencing data; projects that involve the semantic web;
and data visualization tools. The second day features a panel discussion about the challenges
of inter-institutional collaborations, which are very relevant to many of us who work on Open
Source projects.
This year, BOSC includes posters as well as talks. There are three scheduled poster sessions.
We have space for several last-minute posters in addition to those listed in the program.
Also new is the chance to vote for your favorite BOSC talk. The talk with the most votes will be
announced at the Awards session that closes the meeting. Please fill out your ballot and return
it to us before the panel discussion (4:30 on the second day).
Thanks to generous support from Eagle Genomics and another sponsor, we were able to offer
Student Travel Awards to the authors of the three best student abstracts. Congratulations to the
student winners: Florian P. Breitwieser, Kerensa McElroy, and Konstantin Okonechnikov.
BOSC is a community effort. We thank the organizing committee, the program committee, the
session chairs, and the ISMB SIG chair for their help. If you are interested in participating in the
organization of BOSC 2012 (which will take place in July 2012 in Long Beach, California) please
email bosc@open-bio.org.
2011 Organizing Committee:
Nomi Harris (Co-Chair), Peter Rice (Co-Chair), Brad Chapman, Peter Cock, Kam Dahlquist,
Erwin Frise, Darin London, Ron Taylor
2011 Program Committee:
Jan Aerts, Enis Afgan, Tiago Antao, Kazuharu Arakawa, Brad
Chapman, Peter Cock, Kam Dahlquist, Heiko Dietze, Thomas Down,
Erwin Frise, Cyrus Harmon, Nomi Harris, Michael Heuer, Richard
Holland, Alex Lancaster, Hilmar Lapp, Heikki Lehvaslaiho, Darin
London, Scott Markel, Hervé Ménager, Dave Messina, Jim Procter,
Peter Rice, Olivier Sallou, Martin Senger, William Spooner, Ronald
Taylor, Mark Wilkinson, Christian Zmasek
BOSC 2011 Schedule
Day 1 (Friday, July 15, 2011)
Time | Title | Speaker or Session Chair
9:00-9:15 | Introduction | Nomi Harris (Co-Chair, BOSC 2011)
9:15-10:15 | Keynote: The role of openness in knowledge-based systems for biomedicine | Larry Hunter
10:15-10:45 | Coffee Break |
10:45-12:30 | Session: Genome Content Management | Chair: Peter Rice
10:45-11:05 | Unipro UGENE: an open source toolkit for complex genome analysis | Konstantin Okonechnikov
11:05-11:25 | Exploring the genome with Dalliance | Thomas Down
11:25-11:45 | InterMine - Using RESTful Webservices for Interoperability | Alex Kalderimis
11:45-11:55 | easyDAS: Automatic creation of DAS servers | Bernat Gel
11:55-12:05 | Enacting Taverna Workflows through Galaxy | Kostas Karasavvas
12:05-12:15 | Mobyle 1.0: new features, new types of services | Hervé Ménager
12:15-12:25 | BioMart 0.8 offers new tools, more interfaces, and increased flexibility through plug-ins | Junjun Zhang
12:25-12:32 | Running Workflows Through Taverna Server | Donal Fellows
12:30-2:00 | Lunch |
1:30-2:30 | Poster Session |
2:30-3:30 | Session: Visualization | Chair: Jan Aerts
2:30-2:50 | Cytoscape 3.0: Architecture for Extension | Michael Smoot
2:50-3:10 | Applying Visual Analytics to Extend the Genome Browser from Visualization Tool to Analysis Tool | Jeremy Goecks
3:10-3:20 | WebApollo: A web-based sequence annotation editor for community annotation | Nomi Harris
3:20-3:30 | The isobar R package: Analysis of quantitative proteomics data | Florian P. Breitwieser
3:30-4:00 | Coffee Break |
4:00-5:30 | Session: Next-Generation Sequencing | Chair: Thomas Down
4:00-4:20 | Stacks: building and genotyping loci de novo from short-read sequences | Julian Catchen
4:20-4:40 | Large scale NGS pipelines using the MOLGENIS platform: processing the Genome of the Netherlands | Morris Swertz
4:40-4:50 | Bio-NGS: BioRuby plugin to conduct programmable workflows for Next Generation Sequencing data | Raoul Bonnal
4:50-5:00 | Goby framework: native support in GSNAP, BWA and IGV 2.0 | Kevin C. Dorff
5:00-5:10 | A Scalable Multicore Implementation of the TEIRESIAS Algorithm | Frank Drews
5:10-5:20 | Biomanycores, open-source parallel code for manycore bioinformatics | Jean-Frédéric Berthelot
5:20-5:30 | GemSIM: General, Error-Model Based Simulator of next-generation sequencing | Kerensa McElroy
5:30-6:30 | Poster Session and BOFs |
7:00 | Optional dinner for BOSC attendees | Location TBA
Day 2 (Saturday, July 16, 2011)
Time | Title | Speaker or Session Chair
8:45-8:50 | Announcements | Nomi Harris and Peter Rice
8:50-9:50 | Keynote: Into the Wonderful | Matt Wood
9:50-10:15 | Securing and sharing bioinformatics in the cloud | Richard Holland
10:15-10:45 | Coffee Break |
10:45-12:30 | Session: Cloud Computing | Chair: Brad Chapman
10:45-11:05 | Mygene.info: Gene Annotation as a Service - GAaaS | Chunlei Wu
11:05-11:25 | Cloud BioLinux: open source, fully-customizable bioinformatics computing on the cloud for the genomics community and beyond | Konstantinos Krampis
11:25-11:35 | OBIWEE : an open source bioinformatics cloud environment | Olivier Sallou
11:35-11:45 | SeqWare: Analyzing Whole Human Genome Sequence Data on Amazon's Cloud | Brian O'Connor
11:45-12:05 | Sequencescape - a cloud enabled Laboratory Information Management Systems (LIMS) for second and third generation sequencing | Lars Jorgensen
12:05-12:15 | Enabling NGS Analysis with(out) the Infrastructure | Enis Afgan
12:15-12:25 | Hadoop-BAM: A Library for Genomic Data Processing | Aleksi Kallio
12:30-2:00 | Lunch |
1:00-2:00 | Poster Session |
2:00-3:30 | Session: Semantic Web and Misc. Open Source Projects | Chair: Peter Cock
2:00-2:10 | SADI for GMOD: Bringing Model Organism Data onto the Semantic Web | Ben Vandervalk
2:10-2:20 | Scufl2: Because a workflow is more than its definition | Stian Soiland-Reyes
2:20-2:30 | OntoCAT - an integrated programming toolkit for common ontology application tasks | Tomasz Adamusiak
2:30-2:50 | Debian Med: individuals' expertise and their sharing of package build instructions | Steffen Möller
2:50-3:10 | The BALL project: The Biochemical Algorithms Library (BALL) for Rapid Application Development in Structural Bioinformatics and its graphical user interface BALLView | Andreas Hildebrandt
3:10-3:20 | Biopython Project Update | Peter Cock
3:20-3:27 | What's new with GMOD | Scott Cain
3:27-3:34 | Exploring human variation data with Clojure | Brad Chapman
3:30-4:00 | Coffee Break |
4:00-4:30 | Session: Misc. Open Source Projects | Chair: Jim Procter
4:00-4:10 | EMBOSS: New developments and extended data access | Peter Rice
4:10-4:20 | G-language Project: the last 10 years and beyond | Kazuharu Arakawa
4:20-4:30 | A Framework for Bioinformatics on the Microsoft Platform | Simon Mercer
4:30-5:20 | Panel: Multi-Institution Collaboration | Moderator: Brad Chapman; Panelists: Richard Holland, Hilmar Lapp, Jean Peccoud, Peter Rice
5:20-5:30 | Presentation of awards | Nomi Harris
5:30-6:30 | BOFs |
* Any last-minute schedule updates will be posted at http://www.open-bio.org/wiki/BOSC_2011_Schedule
Keynote Speakers
Lawrence Hunter
Lawrence Hunter is Professor of Pharmacology and Computer Science at the University of Colorado
and director of the Computational Bioscience Program at the School of Medicine. He is one of the
founders of ISMB, a fellow of the ISCB, and is well known for contributions in a broad range of
problems in computational biology.
Dr. Hunter will be giving a talk entitled "The role of openness in knowledge-based systems for
biomedicine". Knowledge-based approaches to the analysis of genome-scale data require the
extraction, sharing and use of very large amounts of knowledge about biomedicine. Developments
such as the open source software movement, the Open Biomedical Ontologies, Semantic Web
standards such as OWL and SPARQL, and the spread of open access publishing are creating the
potential for powerful knowledge-based computer systems that may play an important role in the
future of biomedical research. Yet several critical challenges remain before this vision can be
realized. Dr. Hunter will discuss some relevant recent resources developed in his lab, some of the
socio-political barriers that remain, and what you can do to overcome them.
Matt Wood
As the Technology Evangelist for Amazon Web Services, Matt discusses the technical and
organisational aspects of cloud computing across the world. With a background in the life sciences,
Matt is interested in helping teams of all sizes bring their ideas to life through technology. Before
joining Amazon he built web-scale search engines at Cornell University, sequenced DNA in Hinxton
and developed scientific software in Cambridge. He is a frequent speaker at international
conferences, a blogger, published author and an advocate of research productivity.
Matt's talk, Into the Wonderful, will feature a discussion of the constraints of working with the size,
scope and complexity of modern research data, and how cloud computing can help accelerate
academic research. We'll take a look at the current state of the art, the role cloud computing plays in
increasing the impact of open source tools, the use of public hosted data in the cloud and how
academic cloud platforms can help promote collaboration, reproducibility and reuse across
disciplines.
Audience Favorite Talk
This year, BOSC attendees can vote for their favorite talk. Ballots (included in the printed
program) must be turned in before the panel session on the second day at 4:30pm. The talk
that receives the most votes will be announced in the closing session.
O|B|F Membership
Professionals, scientists, students, and others active in the Open Source Software arena in the
life sciences are invited to join the Open Bioinformatics Foundation (the O|B|F). The
membership body was formally established at the 2005 Board of Directors meeting. As laid out
in the bylaws, officers in the Board of Directors are elected by the membership among
nominees, and candidates for future Directors will be nominated from the membership when
seats are added or a term expires.
The eligibility criteria are met by anyone who is "interested in the objectives of the OBF", and
there are no dues at present. You can join the O|B|F at BOSC by filling out the application form
included in this program, signing it, and giving it to a Board member. You may also e-mail the
scanned form to the current Parliamentarian, Hilmar Lapp, at hlapp@drycafe.net. (The O|B|F is
legally required to have signatures on record for all members.)
If you are interested in meeting and talking to some of the O|B|F Directors and members, please
join us at the no-host dinner (location TBA) the evening of the first day of the conference.
Talk and Poster Abstracts
Talk abstracts are included in this program in the order in which they will be presented at the
conference. Some, but not all, of the talks will also be presented as posters. Abstracts for
posters that are not talks appear after the talk abstracts, ordered by poster number.
All posters should be put up before the first poster session (2:00 on the first day). After that
time, unused poster slots will be made available for last-minute posters. Please be sure to use
the poster slot that is assigned to you.
O|B|F - Open Bioinformatics Foundation
Membership Application
I wish to apply for membership in the Open Bioinformatics Foundation (O|B|F).
First and Last Name: ________________________________________________________
Street Address: _____________________________________________________________
City, State, Zip Code: ________________________________________________________
Country of Residence: _______________________________________________________
Email Address: ______________________________________________________________
All fields are mandatory. The O|B|F will treat all personal information as strictly confidential and will
not share personal information with anyone except members of the O|B|F Board of Directors, or
entities or persons appointed by the Board to administer membership communication. This may be
subject to change; please see below.
I am an attendee of BOSC 201___:
[ ] Yes
[ ] No
If you answered No, please state why you meet the membership eligibility
requirement of being interested in the objectives of the O|B|F:
(Use back of page if you need more space)
I understand that membership rights and duties are laid down in the O|B|F
Bylaws which may be downloaded from the O|B|F homepage at
http://www.open-bio.org/. I understand that if the O|B|F’s privacy statement
changes I will be notified at my email address (as known to O|B|F), and if I do
not express disagreement with the proposed change(s) by terminating my
membership within 10 days of receipt of the notification, I consent to the
change(s).
Signature
Bioinformatics Open Source Conference 2011
BOSC 2011
Talk and Poster Abstracts
Unipro UGENE: an open source toolkit for complex genome analysis
Konstantin Okonechnikov1, Olga Golosova2, Alexey Varlamov2, Mikhail Fursov2
1 Novosibirsk State University, Russia
k.okonechnikov@gmail.com
2 Center of Information Technologies "Unipro", Novosibirsk, Russia
Project web-site: http://ugene.unipro.ru
Source code: https://ugene.unipro.ru/svn/ugene/trunk
License: GPL v2
Unipro UGENE is an open-source software package for molecular biologists. The main goal of
the project is to integrate popular bioinformatics tools and algorithms, providing both graphical
user and command line interfaces. UGENE allows constructing reusable workflows and running
them on a local machine or in an HPC environment. The software package is written in C++ using
the Qt framework and incorporates a built-in plugin system, which allows extending the project
with new functionality. UGENE is freely available for MS Windows, Linux and Mac OS X.
The integrated tools and algorithms solve a variety of bioinformatics tasks that include pattern
search, local sequence alignment (Smith-Waterman), multiple sequence alignment (MUSCLE,
Kalign), HMMER, restriction site analysis, short read alignment (Bowtie) and many others. A
key advantage of UGENE is that most of the algorithms are integrated into the source package
and modified to use the internal UGENE data model. This allows one to avoid manual data
conversion between the tools' input and output. Some of the algorithms are optimized for
multicore environments and have GPU implementations. UGENE supports reading and writing for
more than 20 biological data formats and allows visualizing such biological objects as annotated
DNA/RNA and protein sequences, multiple sequence alignments, macromolecular 3D structures and
DNA assemblies. UGENE is also capable of querying key biological online databases such as NCBI
GenBank, PDB and others.
One of UGENE's main components is the Workflow Designer, a visual tool for building complex
analysis pipelines. Since UGENE is a stand-alone application, one doesn't need to install any
additional components or upload and download any data in order to use the Workflow Designer.
This factor, along with an intuitive and user-friendly interface, makes the entry threshold for new
users lower in comparison to similar projects (such as Taverna or Galaxy). However, for
advanced users the Workflow Designer provides capabilities to create custom workflow elements
either in C++ or in the QtScript programming language and to customize various aspects of schema
execution. Every workflow schema allows setting special aliases for its parameters; these aliases
can be used to provide custom arguments when running the schema as a command line tool. This
feature makes it easy to run workflows in an HPC environment or include them in users' scripts.
The Workflow Designer has experimental support for launching computational workflows on
Amazon EC2 servers.
UGENE is a part of the official Ubuntu and Fedora Linux distributions and has a growing community
of users around the world. The project is constantly evolving. One of the major priorities currently
is the ability to work with huge datasets. This includes integration of next-generation sequencing
analysis and visualization methods, thus one of the main further development directions is the
UGENE Assembly Browser.
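The abstract above lists Smith-Waterman local alignment among the integrated algorithms. As a rough illustration of what that algorithm computes (this is not UGENE's C++ or GPU implementation), the following Python sketch scores the best local alignment between two sequences with a linear gap penalty. The function name and the scoring values (match 2, mismatch -1, gap -1) are assumptions chosen for the example.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best local alignment score between sequences a and b.

    Illustrative scoring scheme with a linear gap penalty; only the score
    is computed, not the alignment itself.
    """
    cols = len(b) + 1
    prev = [0] * cols          # previous row of the dynamic-programming matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * cols
        for j in range(1, cols):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


print(smith_waterman_score("TTACGTAA", "GGACGTCC"))
```

Keeping only two rows of the matrix is enough when just the score is needed; recovering the aligned region itself would require the full matrix or a traceback structure.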
Poster 1
Exploring the genome with Dalliance
Thomas A. Down1 and Tim J. P. Hubbard2
1 Wellcome Trust/CRUK Gurdon Institute, University of Cambridge, UK
2 Wellcome Trust Sanger Institute, Cambridge, UK
Project website: http://www.biodalliance.org/
Source download: http://github.com/dasmoth/dalliance
License: BSD
Genome browsers are a vital part of the genomics workflow. Eyeballing data has proved
indispensable for spotting unexpected correlations, formulating new hypotheses, or simply
sanity-checking newly collected data. The sequencing revolution has made this even more important.
Today, even small labs are routinely applying techniques like ChIP-seq, RNA-seq, or exome
resequencing, often with limited bioinformatic support. To keep pace with these trends, we need
powerful visualization and data integration tools which make loading large datasets, either
directly or as an automatic final step in analysis workflows, as simple as possible.
Dalliance [1] is a new genome browser which makes aggressive use of HTML5 technologies to
offer a high level of interactivity and powerful navigation and exploration tools while running
within a normal web browser. The display can be freely scrolled and zoomed with mouse
gestures and keyboard controls. Shortcuts for navigation between features allow sparse datasets
to be rapidly explored. We have adopted a fully distributed approach, with no master server
hosting the primary dataset. Data can be integrated using either the standard DAS protocol [2],
or from data packaged in a standard indexed binary format (currently BigWig, BigBed [3], and
BAM [4] are supported) on a standard web server. The latter option makes integrating new
genomic datasets, particularly the results of high-throughput sequencing experiments, very quick,
and accessible to "occasional bioinformaticians" who often have limited sysadmin experience
(and limited enthusiasm for installing extra server software!). It also allows instant access to
datasets in this format from remote web servers without time-consuming downloading (e.g.
ENCODE datasets from UCSC). Unusually for a web-based application, Dalliance also allows
viewing of data directly from local disk on your own machine.
[1] Down TA, Piipari M, Hubbard TJ. Dalliance: interactive genome viewing on the web. Bioinformatics
(2011) Epub ahead of print
[2] Jenkinson A et al. Integrating biological data - the Distributed Annotation System. BMC
Bioinformatics (2008) 9(Suppl 8):S3
[3] Kent WJ et al. BigWig and BigBed: enabling browsing of large distributed datasets. Bioinformatics
(2010) 26:2204-2207
[4] Li H et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics (2009) 25:2078-2079
Poster 2
InterMine - Using RESTful Webservices for Interoperability
AUTHORS: Alex Kalderimis, Daniela Butano, Adrian Carr, Sergio Contrino, Hu Fengyuen, Mike
Lyne, Richard Smith, Radek Štěpán, Julie Sullivan, Gos Micklem (PI)
AFFILIATION: University of Cambridge, alex@flymine.org
PROJECT URL: www.intermine.org
SOURCE CODE: http://www.intermine.org/browser, http://www.intermine.org/wiki/SVNCheckout
LICENSE: LGPL
FUNDING: Wellcome Trust and the NIH/NHGRI.
InterMine - An Open Source Data-Warehouse and Query interface
InterMine is an actively developed open-source data-warehousing system, which allows you to:
- Integrate data from diverse sources.
- Run complicated, custom queries over multiple datasets quickly by taking advantage of
  query optimisation.
- Write pre-defined query templates in a graphical interface.
- Upload and analyse lists of data.
- Explore and visualise data through a web interface.
- Access data programmatically through published APIs.
While generic in its underlying design, InterMine comes with a wide range of powerful
bioinformatics-specific tools to handle tasks including data loading (from standard formats such as
GFF3, FASTA, chado, as well as Uniprot and Ensembl data), and data analysis (including statistical
enrichment analysis and interaction visualisation).
Current Implementations & InterMod
Originally developed for FlyMine, there are now many InterMine implementations in operation. In
particular, InterMine is used by several Model Organism Databases (MODs), including SGD
(YeastMine), RGD (RatMine), and ZFIN (ZFINmine), the founding members of the InterMOD project
(MOD mines for worm, mouse and S. pombe are currently in development).
InterMine is also the engine behind several dedicated data-mining projects, including
metabolicMine and the modEncode project. A number of private and independent mines also exist.
Interoperability
The increasing number of InterMine implementations has allowed for greater interoperation,
which is mediated through webservices over a published RESTful API. This allows sites to query a
mine's data, and for any website to embed tables of data natively in their page. The foundational
technologies behind this are webservices that return JSON, JSONP for cross-domain
communication, and a JavaScript library for web client access.
Webservice interoperability enables mines to display homology data and provide links to data in
different species. MODs can also replace their custom database queries with more performant
InterMine ones. End users can automate workflows using client libraries. Mines can integrate with
work-flow and data-analysis projects, such as Galaxy.
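The JSON and JSONP responses mentioned in the abstract differ only in packaging: JSONP wraps the JSON body in a call to a function named by the client, so that a page on another domain can load the data with a script tag. The Python sketch below illustrates that wrapping in general terms. The `render_response` function, the callback name, and the sample rows are hypothetical and are not taken from InterMine's code or API.

```python
import json
import re

# Only plain identifiers are accepted as callback names, so a caller
# cannot smuggle arbitrary script into the response.
_CALLBACK_NAME = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*")


def render_response(payload, callback=None):
    """Return (content_type, body) for a JSON or JSONP response."""
    body = json.dumps(payload)
    if callback is None:
        return "application/json", body
    if not _CALLBACK_NAME.fullmatch(callback):
        raise ValueError("callback must be a plain identifier")
    # JSONP: the same JSON, passed as the argument of a function call.
    return "application/javascript", f"{callback}({body});"


# Made-up result rows, for illustration only.
rows = {"results": [["gene1", "ID0001"], ["gene2", "ID0002"]]}
json_type, json_body = render_response(rows)
jsonp_type, jsonp_body = render_response(rows, callback="showTable")
print(jsonp_body)
```

A page embedding a table would define the callback function itself and request the data with that function's name, which is why the service has to validate the name before echoing it back.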
Poster 3
easyDAS: Automatic creation of DAS servers
Bernat Gel1,2,*, Andrew M Jenkinson3, Rafael C Jimenez3, Xavier Messeguer Peypoch1 and
Henning Hermjakob3
1 Software Department, UPC-BarcelonaTech, Barcelona, Spain.
2 Hereditary Cancer Program, Institute for Personalised and Predictive Medicine of Cancer, Badalona,
Spain.
3 European Bioinformatics Institute, Hinxton, Cambridge, UK.
* E-mail: bgel@lsi.upc.edu
Project URL: http://www.ebi.ac.uk/panda-srv/easydas/
Code URL: http://code.google.com/p/easydas/
License: GNU Lesser General Public License (LGPL)
Abstract
Background: The Distributed Annotation System (DAS) has proven to be a successful way to publish
and share biological data. Although there are more than 1000 registered servers, setting up a DAS server
involves a fair amount of work and requires specific skills, such as programming and server management,
as well as a reliable infrastructure to support it.
Many research groups do not have easy access to people proficient enough in programming
to implement the required data access layer, or able to set up and manage an internet-accessible machine
to host the server. Those difficulties can represent too big an overhead for many data generators when
publishing their data, particularly for those with small data sets.
Given the clear advantage that widespread sharing of relevant biological data offers the research
community, it would be desirable to convert all those data generators into data providers, increasing the
amount and variety of the biological data available to the scientific community and contributing to the
collective annotation of biological sequences.
Results: easyDAS is a web-based, ready-to-use system for biological data sharing using DAS. The
user only needs to upload a text file (GFF or CSV) with the data into easyDAS and set a few configuration
options with a wizard-like interface, mainly stating what the data represents, and the system will
automatically create a new DAS source serving that data. Although the DAS source will be automatically created
and managed by easyDAS, the user will retain full control over the data and will be able to modify or delete
it at any point using the same web interface.
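To make the upload step more concrete, the sketch below writes a small annotation file in the general nine-column, tab-separated GFF3 layout. It is only an illustration: the abstract does not say which GFF version or columns easyDAS expects, and the feature records, the `my_lab` source name, and the `gff3_line` helper are invented for the example.

```python
def gff3_line(seqid, source, ftype, start, end,
              score=".", strand=".", phase=".", attributes=None):
    """Format one feature as a nine-column, tab-separated GFF3 line."""
    if start > end:
        raise ValueError("GFF3 requires start <= end")
    # Column 9 holds semicolon-separated key=value pairs, or "." if empty.
    attrs = ";".join(f"{k}={v}" for k, v in (attributes or {}).items()) or "."
    cols = [seqid, source, ftype, str(start), str(end),
            str(score), strand, str(phase), attrs]
    return "\t".join(cols)


# Hypothetical features; names and coordinates are made up.
features = [
    ("chr1", "my_lab", "binding_site", 1000, 1050, {"ID": "site1", "Name": "example_site"}),
    ("chr1", "my_lab", "binding_site", 2300, 2340, {"ID": "site2"}),
]

gff_text = "##gff-version 3\n" + "\n".join(
    gff3_line(seqid, src, ftype, start, end, attributes=attrs)
    for seqid, src, ftype, start, end, attrs in features
) + "\n"
print(gff_text)
```

A file like this carries the coordinates and feature types; the coordinate system and ontology terms described in the abstract would still be chosen in the web interface.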
Sources created with easyDAS are fully compliant with the latest specification of the standard, DAS
1.6, and can be integrated into any DAS client. easyDAS encourages exhaustive meta-annotation of the data
source, offering both a list of available coordinate systems and an ontology browser interface based on the
Ontology Lookup Service to specify an ontology term for each feature type.
easyDAS has been written in Perl and JavaScript. Data is stored in a MySQL database and served using
ProServer (the Perl DAS server) and its hydra functionality.
An instance of easyDAS is running at http://www.ebi.ac.uk/panda-srv/easydas/ and is freely available to
any researcher wanting to create a new DAS source. That instance is running at the EBI and takes advantage
of its high storage capacity and connectivity. The code is available at http://code.google.com/p/easydas/
and can be easily installed at any other institution.
Conclusions: easyDAS is an automated DAS source creation system which can help many researchers
in sharing their biological data, potentially increasing the amount of relevant biological data available to the
scientific community.
Poster 4
Enacting Taverna Workflows through Galaxy
Konstantinos Karasavvas1,*, Christine Chichester1, Don Cruickshank3, Robert Haines4,
Donal Fellows4 and Marco Roos1,2
1 Netherlands Bioinformatics Centre, Geert Grooteplein 28, 6525 GA Nijmegen, The Netherlands
2 BioSemantics group, Human Genetics Department, Leiden University Medical Centre, Albinusdreef 2, 2333
ZA Leiden, The Netherlands
3 School of Electronics and Computer Science, University of Southampton, Highfield Campus, University Road,
Southampton SO17 1BJ, UK
4 School of Computer Science, University of Manchester, Oxford Road, Manchester, M13 9PL, UK
Presenting author: kostas.karasavvas@nbic.nl
Project Site: https://trac.nbic.nl/elabfactory/wiki/eGalaxy
Code (MIT license): https://trac.nbic.nl/svn/elabfactory
Abstract
The diversity and availability of bioinformatics tools have increased in recent years. Some
tools deal with the same problem using a different approach, others provide different access
mechanisms to the same resources. Another class of tools provides aggregation mechanisms to
make use of a number of tools in a uniform way. For the latter, software pipelining systems
are becoming the norm in bioinformatics. Widely used examples of such systems are Galaxy,
a web-portal and framework, and Taverna, a workflow management system. The goal of both
of these systems is to provide a platform that bioinformaticians and biologists can use to
describe their experiments in a number of well-described processing steps, i.e. workflows.
Both of the aforementioned systems provide a wide range of functionality out of the box.
There is some overlap, but the real added value would be if we could combine that
functionality and try to get the best of both systems. This work aims to bridge the two by
allowing the more expressive but complex Taverna workflows (which, for example, support iterations
and conditionals) to execute via the simple and intuitive interface that Galaxy offers. To this end
we built a Galaxy tool generator that, given a Taverna workflow description and a Taverna
Server, constructs a Galaxy tool to enable the enactment of that workflow via Galaxy. Due to
current limitations of Galaxy, this new tool needs to be installed into Galaxy manually.
The generator is available for download, but is also part of myExperiment.org, a workflow
repository and community web site for computational scientists. Here, the bioinformatician
will just browse the repository to identify Taverna workflows of interest and will then be able
to download a workflow as a Galaxy tool and install the tool into a Galaxy server in the usual
way.
The generator is implemented in Ruby and disseminated as a gem. It creates two files: 1) an
XML document that is used by Galaxy to identify the executable, its inputs and outputs, and
the layout for the Galaxy user interface, and 2) a Ruby script that can access a Taverna server
and enact the workflow. Because the REST interface of myExperiment is used, it would be
relatively easy to extend the generator to create scripts that access other workflow engines.
There are plans to further automate this procedure by accessing the myExperiment site as a
Galaxy external interface, which will dynamically add the workflows as new tools from
within Galaxy. For this to work seamlessly, Galaxy needs the following new functionality:
dynamically adding new tools programmatically, and association of user roles to specific
tools for security. All issues mentioned have been communicated to the Galaxy development
team.
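The first of the two generated files described in this abstract is an XML tool description for Galaxy. The Python sketch below builds a minimal document of that general shape with the standard library. It is only an illustration: the actual generator is written in Ruby, and the tool id, script name, and parameter names here are invented. Only the element names (tool, description, command, inputs/param, outputs/data) follow Galaxy's tool-description conventions.

```python
import xml.etree.ElementTree as ET


def build_tool_xml(tool_id, name, script, inputs, outputs):
    """Return a Galaxy-style tool description as an XML string.

    inputs:  mapping of parameter name -> label (rendered as text params here)
    outputs: mapping of output name -> Galaxy data format
    """
    tool = ET.Element("tool", {"id": tool_id, "name": name})
    ET.SubElement(tool, "description").text = "Enact a Taverna workflow (illustrative)"

    # Galaxy substitutes $name references in the command with the chosen values.
    refs = " ".join("$" + n for n in list(inputs) + list(outputs))
    ET.SubElement(tool, "command").text = f"ruby {script} {refs}"

    inputs_el = ET.SubElement(tool, "inputs")
    for pname, label in inputs.items():
        ET.SubElement(inputs_el, "param", {"name": pname, "type": "text", "label": label})

    outputs_el = ET.SubElement(tool, "outputs")
    for oname, fmt in outputs.items():
        ET.SubElement(outputs_el, "data", {"name": oname, "format": fmt})

    return ET.tostring(tool, encoding="unicode")


# Hypothetical workflow with two inputs and one output.
xml_text = build_tool_xml(
    tool_id="example_workflow",
    name="Example workflow",
    script="run_workflow.rb",
    inputs={"sequence_id": "Sequence identifier", "database": "Database name"},
    outputs={"report": "txt"},
)
print(xml_text)
```

In this arrangement the XML file only describes the form and the command line; contacting the Taverna server is left to the script named in the command, which matches the two-file split the abstract describes.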
Poster 5
Mobyle 1.0: new features, new types of services
Hervé Ménager1,*, Bertrand Néron1, Vivek Gopalan4,
Sandrine Larroudé1, Julien Maupetit2,3, Adrien Saladin2,
Pierre Tufféry2,3, Yentram Huyen4, Bernard Caudron1
1 Groupe Projets et Développements en Bioinformatique, Institut Pasteur,
{hmenager,bneron}@pasteur.fr; * : presenting author
2 MTi, INSERM UMR-S 973, Université Paris Diderot (Paris 7), Paris, France,
3 RPBS, Université Paris Diderot (Paris 7), Paris, France,
{julien.maupetit,adrien.saladin,pierre.tuffery}@univ-paris-diderot.fr
4 Bioinformatics and Computational Biosciences Branch, OCICB, NIAID, NIH, Bethesda, MD 20892, USA
{gopalanv,huyeny}@niaid.nih.gov
website: https://projets.pasteur.fr/wiki/mobyle,
downloads: ftp://ftp.pasteur.fr/pub/gensoft/projects/mobyle/
open source license being used: GNU GPLv2
Performing bioinformatics analyses requires the selection and combination of tools and data to answer
a given scientific question. Many bioinformatics applications are command-line only, and researchers are
often hesitant to use them because of installation issues and complex command requirements. Mobyle is a
framework and web portal specifically aimed at the integration of bioinformatics software and databanks.
It allows users to run analyses through a web interface without installing anything locally. In addition to
a web interface to command-line tools, the latest release of Mobyle, version 1.0, offers the possibility
to execute predefined workflows, and enhances visualisation possibilities with browser-embedded client
components, the viewers. We focus here on these major improvements.
Chaining automation with Workflows. Mobyle uses an XML-based service description system,
where parameters and user data include a description of the nature and format of the information they
convey, which makes it possible to determine the compatibility between them. In the interface, this makes
it possible to (1) suggest the relevant options to interactively chain successive programs using an
intelligent piping suggestion system, and (2) facilitate the reuse of data over successive analyses by
storing data bookmarks that can be directly loaded into a form.
To enable the automation of these chainings, the data model has been extended to incorporate Workflows, which define a dataflow-based coordination of programs that run successive and/or parallel tasks
to perform an analysis. Similarly to programs, workflows are viewed as services, sharing most of their description with programs, with the exception of the execution, which consists of a coordination of subtasks
rather than the generation and execution of a command line.
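The compatibility check behind the chaining suggestions can be pictured with a small sketch. This is not Mobyle code: the `Port` and `Service` classes, the `suggest_chains` function, and the example service names are hypothetical, and the rule used (same data type plus at least one shared format) is an assumption about what "compatible" means here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Port:
    """A typed input or output of a service (illustrative, not Mobyle's model)."""
    name: str
    datatype: str        # nature of the data, e.g. "Alignment"
    formats: frozenset   # formats accepted (input) or produced (output)


@dataclass(frozen=True)
class Service:
    name: str
    inputs: tuple


def compatible(out_port, in_port):
    """Assumed rule: same data type and at least one format in common."""
    return (out_port.datatype == in_port.datatype
            and bool(out_port.formats & in_port.formats))


def suggest_chains(result_port, services):
    """List (service, input) names that could consume a job result."""
    return [(service.name, port.name)
            for service in services
            for port in service.inputs
            if compatible(result_port, port)]


# A job result and two hypothetical downstream services.
alignment = Port("alignment", "Alignment", frozenset({"CLUSTAL", "FASTA"}))
services = [
    Service("tree_builder",
            (Port("msa", "Alignment", frozenset({"PHYLIP", "FASTA"})),)),
    Service("primer_design",
            (Port("seq", "Sequence", frozenset({"FASTA"})),)),
]
suggestions = suggest_chains(alignment, services)
```

Only `tree_builder` is suggested: `primer_design` accepts FASTA too, but for a different data type.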
Data visualisation with Viewers. When running an analysis in Mobyle, job result files can be directly
previewed in the portal. However, understanding a result is still often hindered by the
need to browse potentially large and complex text-based files. To overcome this limitation, we created
a specific type of service, Viewers. Viewers are a way to embed type-dependent visualization components
for the data displayed in the Mobyle Portal. As opposed to programs and workflows, viewers are not
executed on the server side, but rather rely entirely on browser-embedded code. The XML description
files provide a way to incorporate custom interface code that will display data of a given type in the
browser, incorporating HTML-embeddable components such as Java or Flash applets, JavaScript code,
etc.
The new version of Mobyle, v1.0, extends the spectrum of services available to include workflows and
viewers. Current and future work includes (1) the development of an interface that allows the "de novo"
creation of workflows directly by users, and the automation of interactive chainings into workflows, and
(2) the extension of the integration capabilities for client-side components beyond simple visualisation,
to the editing of user data.
Poster 6
BioMart 0.8 offers new tools, more interfaces, and increased
flexibility through plug-ins
J Zhang, J Baran, A Cros, JM Guberman, S Haider, J Hsu, Y Liang, E Rivkin, J Wang, M Wong-Erasmus, L Yao, A Kasprzyk
Ontario Institute for Cancer Research, Toronto, Ontario M5G 0A3, Canada
Email: Junjun.zhang@oicr.on.ca
web: http://www.biomart.org/
svn: https://code.oicr.on.ca/svn/biomart/biomart-java/branches/ (release 0.8 candidate branch)
license: GNU Lesser General Public License v2.1
Abstract
BioMart is an open-source data management and federation system used by dozens of
biological databases, many of which are available through BioMart Central Portal. For the latest
release of BioMart, version 0.8, the software has been completely rewritten, incorporating many
new features and optimizations.
BioMart 0.8 is an integrated Java application, making it possible to build a data source,
configure querying and presentation interfaces, and deploy a BioMart server from a single tool.
On the database side, BioMart now supports more relational database systems, adding SQL
Server and DB2 as well as continuing to support MySQL, PostgreSQL, and Oracle. In addition
to querying “mart” databases, BioMart can now query directly against any normalized database.
New optimizations have been added to the querying, including parallelization of query execution
and the ability to create indices for links between datasets, leading to faster-than-ever data
retrieval. The BioMart server now includes built-in support for the HTTPS protocol, as well as
OpenID and OAuth-based user authentication, so BioMart can now be easily configured to
handle complex access control for sensitive data.
The BioMart user experience has been improved by the addition of several new graphical user
interfaces (GUIs). There are four database search GUIs, each tailored for a different degree of
complexity and flexibility in querying. There is also a new way of presenting data, the
MartReport GUI, where information about one single data entity (usually a biological entity such
as a gene) can be collated from several different resources.
In addition to the built-in GUIs, BioMart 0.8 adds a robust plug-in framework to support the
development of novel data analysis and visualization tools. Several such plug-ins have been
developed and used in BioMart Central Portal and ICGC Data Portal, including tools for
visualizing the pathways most frequently affected by somatic mutations; an ID conversion tool;
and a gene sequence retrieval module.
The latest version of BioMart continues to offer developers programmatic access through
several application programming interfaces (APIs). REST- and SOAP-based querying continue
to be offered, alongside a new Java API and a SPARQL interface for semantic web querying.
All of these access methods are integrated with the interactive querying, so that after a query
has been constructed interactively, the same query can be presented in API format at the push
of a button.
With these new features and its extensibility, BioMart remains a top choice for managing data-intensive collaborative projects.
Poster 7
Running Workflows through Taverna Server
Donal Fellows, Robert Haines, Alan Williams, Aleksandra Nenadic, Carole Goble
School of Computer Science, University of Manchester, UK
{donal.k.fellows, robert.haines, alan.r.williams,
a.nenadic, carole.a.goble}@manchester.ac.uk
Project's Web site: http://www.taverna.org.uk/
Source code: http://rubygems.org/gems/t2-server;
http://code.google.com/p/taverna/source/browse/taverna/products/uk.org.taverna.server
License: LGPL 2.1 (server), BSD (client)
Taverna is an open source and domain-independent suite of tools used to design and execute scientific workflows. Taverna
Workbench is a desktop client application written in Java that combines a graphical interface to create the workflows with a
workflow execution engine. Taverna is also available as a command line tool for executing workflows from a terminal.
As of version 2.2, Taverna is also available as a server to allow remote execution of workflows. Taverna 2 Server enables
easier integration of Taverna's workflow execution engine with Web portals and other applications, which can communicate
with the Server via standard interfaces.
Taverna 2 Server exposes REST and SOAP APIs; either can be used to access the functionality of the Server. Internally, the
current version of the Server is based on JAX-RS and JAX-WS (for REST and SOAP support, respectively) with the basic
workflow execution functionality handled by delegation to the Taverna command line tool. To the basic execution model of
the command line tool, the Server adds file and directory management, as well as looking after deleting all resources
associated with the server once done. The Server is implemented using Apache CXF inside a Spring bean container, designed
to be deployable in any Java servlet container (e.g., Tomcat, Jetty, Glassfish).
The client starts the interaction with the Server by submitting the definition file of the workflow to be executed. This creates
a "workflow run" on the Server and returns the run identifier to the client. Next, the client sets up the input data for the
workflow's input ports, if any, by either providing values directly or uploading files where inputs are to be read from. The
workflow is now ready for execution: setting the workflow's status to "Operating" on the Server initiates the workflow run.
Next, the client polls the run's status on the Server, waiting for it to finish. Alternatively, they can watch the Atom feed for
notification of termination, or register a receiver for one of the other supported protocols. Finally, the client collects the
result data either individually for each output port or collectively as one XML or zip file.
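The client-server interaction described above (submit, set inputs, start, poll, collect) can be sketched against an in-memory stand-in. Everything here is hypothetical and for illustration only: `InMemoryRunServer`, its method names, and the "Initialized" and "Finished" status strings are assumptions, not the Taverna Server REST or SOAP API; only the "Operating" status is named in the abstract. The "workflow" is a plain Python callable.

```python
import time
import uuid


class InMemoryRunServer:
    """Toy stand-in for a workflow server (not the Taverna Server API)."""

    def __init__(self):
        self._runs = {}

    def submit(self, workflow):
        """Create a workflow run and return its identifier."""
        run_id = str(uuid.uuid4())
        self._runs[run_id] = {"workflow": workflow, "inputs": {},
                              "status": "Initialized", "outputs": {}}
        return run_id

    def set_input(self, run_id, port, value):
        self._runs[run_id]["inputs"][port] = value

    def set_status(self, run_id, status):
        run = self._runs[run_id]
        run["status"] = status
        if status == "Operating":
            # A real server would execute asynchronously; this toy runs
            # the callable immediately and marks the run as finished.
            run["outputs"] = run["workflow"](**run["inputs"])
            run["status"] = "Finished"

    def get_status(self, run_id):
        return self._runs[run_id]["status"]

    def get_outputs(self, run_id):
        return dict(self._runs[run_id]["outputs"])


def run_workflow(server, workflow, inputs, poll_interval=0.0):
    run_id = server.submit(workflow)                # submit definition, get run id
    for port, value in inputs.items():              # set up the input ports
        server.set_input(run_id, port, value)
    server.set_status(run_id, "Operating")          # initiate the run
    while server.get_status(run_id) != "Finished":  # poll until it finishes
        time.sleep(poll_interval)
    return server.get_outputs(run_id)               # collect all outputs


def join_workflow(left, right):
    """Example 'workflow' with two input ports and one output port."""
    return {"joined": left + right}


results = run_workflow(InMemoryRunServer(), join_workflow,
                       {"left": "AC", "right": "GT"})
```

A real client would replace each method call with a request to the Server and would typically sleep between polls.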
To complement the Taverna 2 Server, we have also developed client libraries in Ruby and Java that expose the Server's
capabilities and enable people to quickly integrate Taverna within their applications. Both client libraries internally access
the Server's REST interface. The Ruby client is available as a gem and has been used by the Taverna-Galaxy integration to
enable wrapping Taverna workflows as Galaxy tools and calling them from the Galaxy workflows. Internally, the Galaxy tool
sends the wrapped Taverna workflow for execution on a Taverna Server and fetches results from it.
Various other projects are utilising Taverna 2 Server in their infrastructures:
• Shared Genomics (https://www.nibhi.org.uk/sharedgenomics/default.aspx): using Taverna 2 Server to run
genetic data workflows
• HELIO (http://www.helio-vo.eu): using Taverna 2 Server to run heliophysics workflows
• caGrid (http://www.cagrid.org): executing analytic and data services in cancer-predicting workflows on Taverna
2 Server running on caBIG's caGrid platform
• NeISS (www.neiss.org.uk): running population and traffic simulation workflows from a portal
The next release of the Taverna 2 Server (scheduled for June 2011) will have improved handling of large input and output
data, fully integrated security support (with client authentication to the Server managed by Spring Security, authenticated
service invocations from inside executing workflows, and workflow run access control so that users can permit their
collaborators to see their results directly), improved job management and accounting, semantic result data description and
various notification mechanisms for letting users know when a workflow run has completed (including an Atom feed and
support for Twitter, e-mail and Jabber messaging).
!"#$%$&'(&)*%&+,-.&!/0&12+345&6789%:);&<7=#)&#8>&?@ABCBD&=#$&)*%&,E-.&FFGHI&1G(.J5&6789%:);&<7=#)&#8>&FFK4LLMC@0KND>&
Poster 8
Cytoscape 3.0: Architecture for Extension

Michael Smoot, University of California San Diego (msmoot@ucsd.edu)
Project website: http://cytoscape.org
Project source code: http://chianti.ucsd.edu/svn/cytoscape
Open Source License: LGPL

Cytoscape is a popular, open source desktop application for visualizing and analyzing
biological networks. Cytoscape 2.X consists of a core application that provides
visualization and analysis capabilities along with an API for extending Cytoscape's
functionality through "plugins." Scientists and other Cytoscape users benefit from the
analytical depth provided by the plugins, while plugin authors benefit from the core
Cytoscape functionality and the framework for distributing and advertising plugins. This
mutually beneficial relationship has resulted in over 100 plugins
(http://cytoscape.org/plugins.html) along with dozens of publications about the plugins
themselves.

Cytoscape 3.0 represents an attempt to refactor Cytoscape to make plugin writing simpler
while at the same time providing more stability, power, and flexibility to the system as a
whole. First and foremost, Cytoscape 3.0 has been modularized with the API cleanly
separated from the implementation. This modularity is being facilitated and enforced with
OSGi (http://osgi.org), a popular modularization framework. OSGi's micro service
architecture allows private implementation code to remain private by registering micro
services, which rely only on the public API. This means that any plugin only has the
opportunity to depend on the public API, which will hopefully clarify and simplify what is
needed to write a plugin. We have also begun using the Semantic Versioning standard
(http://semver.org) for Cytoscape code to make clear how and when a public API may
change. This will all go towards helping the Cytoscape core maintain backwards
compatibility, which will greatly increase plugin stability. All "plugins" in 3.0 can be
written as OSGi bundles, just like the Cytoscape core modules. This means that plugins will
now have the opportunity to register their own public API, eliminating the distinction
between core and plugin and resulting in a much more powerful and flexible system. While
the architecture of Cytoscape 3.0 relies on OSGi, very little code does. We have leveraged
the Spring Dynamic Modules system to eliminate nearly all reliance on the OSGi API in the
core application. At the cost of XML configuration files, all code ends up being "plain old
Java objects" which should hopefully make the core code much easier to understand.
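To make the Semantic Versioning point concrete, here is a minimal sketch of the compatibility rule that standard implies for a plugin built against one API version and run on another: the major version must match, and the provided minor.patch must be at least the required one. The function names are hypothetical, the check is not taken from Cytoscape, and it ignores pre-release tags and the special rules for major version 0.

```python
def parse_version(text):
    """Split 'MAJOR.MINOR.PATCH' into a tuple of three integers."""
    major, minor, patch = (int(part) for part in text.split("."))
    return major, minor, patch


def is_compatible(required, provided):
    """True if code built against API `required` can run on API `provided`.

    Under Semantic Versioning, a new major version may break the API,
    while minor and patch releases only add to it or fix bugs.
    """
    req = parse_version(required)
    prov = parse_version(provided)
    return prov[0] == req[0] and prov[1:] >= req[1:]


print(is_compatible("3.0.0", "3.1.2"))  # newer minor release of the same major
print(is_compatible("3.2.0", "3.1.9"))  # plugin needs a newer minor than provided
print(is_compatible("3.0.0", "4.0.0"))  # major version changed
```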

While not without risk, we believe that Cytoscape 3.0 will enable a new generation of
Cytoscape plugins as well as much greater opportunity for collaboration with different
systems. This talk will elaborate on the new Cytoscape architecture including its benefits,
challenges, and risks.
Poster 9
Applying Visual Analytics to Extend the Genome Browser from
Visualization Tool to Analysis Tool
Jeremy Goecks (jeremy.goecks@emory.edu)1, Kanwei Li1, The Galaxy Team2, and James Taylor1
1 Departments of Biology and Math & Computer Sciences, Emory University
2http://galaxyproject.org
Website: http://galaxyproject.org, Code: http://bitbucket.org/galaxy/galaxy-central/
License: Academic Free License
Genome browsers play a central role in genomic research by enabling scientists to visualize large textual
and numerical datasets in a biologically meaningful way, making it possible to observe patterns both
within and across datasets. We have applied the principles of visual analytics to develop the Galaxy Track
Browser (GTB), a Web-based genome browser integrated into the Galaxy platform (see Figure). GTB
utilizes a client-server model to scale efficiently and support the large amounts of data produced by
experiments using next-generation sequencing technologies. This model ensures that GTB users can
completely customize the display of each data track. GTB leverages the Galaxy platform to combine data
visualization and data analysis; GTB users can run tools to produce and visualize new data and
dynamically filter data. Using GTB, a user can specify parameters and run a tool on the subset of data that
is visible; GTB renders the tool’s output when it has completed. Running a tool and viewing its output can
be done interactively (quickly) because a tool runs only on the subset of data visible to the user. GTB also
supports real-time data filtering. Users can dynamically filter their data by using sliders to specify ranges
for attribute values; data with values outside an attribute range are hidden as a user makes changes. GTB
is available on every Galaxy server. Users can create visualizations for both standard and custom genome
builds and can also share visualizations with colleagues or publish them on the Web. While GTB leverages
many of Galaxy’s features, it is modular and can be configured to work outside of Galaxy and with
different data providers. GTB is available in the stable Galaxy distribution.
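The two ideas in this abstract, working only on the visible subset of data and hiding items whose attributes fall outside slider ranges, can be sketched as follows. This is an illustration under stated assumptions, not GTB code: the `Feature` class, both function names, and the `fpkm` attribute are hypothetical, and features lacking a filtered attribute are assumed to be hidden.

```python
from dataclasses import dataclass, field


@dataclass
class Feature:
    chrom: str
    start: int
    end: int
    attrs: dict = field(default_factory=dict)


def visible(features, chrom, view_start, view_end):
    """Features overlapping the current viewport (half-open coordinates)."""
    return [f for f in features
            if f.chrom == chrom and f.start < view_end and f.end > view_start]


def apply_filters(features, ranges):
    """Keep features whose attributes all fall inside the slider ranges."""
    def in_range(feature):
        return all(name in feature.attrs
                   and low <= feature.attrs[name] <= high
                   for name, (low, high) in ranges.items())
    return [f for f in features if in_range(f)]


transcripts = [
    Feature("chr1", 100, 500, {"fpkm": 12.0}),
    Feature("chr1", 450, 900, {"fpkm": 0.4}),
    Feature("chr1", 5000, 5600, {"fpkm": 30.0}),
]
in_view = visible(transcripts, "chr1", 0, 1000)        # two of three features
shown = apply_filters(in_view, {"fpkm": (1.0, 100.0)})  # low-value one hidden
```

Restricting work to `in_view` is what keeps both filtering and tool runs fast enough to feel interactive.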
Figure. Using the Galaxy Track Browser for analysis of ENCODE RNA-seq data. From top to bottom: (1) partial view of
mapped RNA-seq reads from ENCODE cell line h1-hESC; (2) a form for running Cufflinks, a tool for assembling mapped
reads into transcripts; (3) first attempt at transcript assembly; (4-6) improving the assembly using different parameter
values for Cufflinks; (7) filtering assembled transcripts from the GM12878 cell line using transcript attributes.
Poster 10
WebApollo: A Web-Based Sequence Annotation Editor for Community Annotation
Ed Lee1, Gregg Helt1, Nomi Harris1, Mitch Skinner2, Christopher Childers3, Justin Reese3, Jay
Sandaram3, Christine Elsik3, Ian Holmes2, Suzanna Lewis1
1 Berkeley Bioinformatics Open-source Projects, Lawrence Berkeley National Laboratory, Berkeley, CA,
94720, USA
2 Department of Bioengineering, University of California at Berkeley, Berkeley, CA, 94720, USA
3 Georgetown University, Washington DC, 20057, USA
Contact: Ed Lee (elee@berkeleybop.org)
As technical advances make sequencing faster and cheaper there are more and more
community annotation efforts, augmenting the traditional centralized model where curators for a
given genome project are all at the same physical location. This trend is particularly strong for
smaller genome projects that largely rely on contributions from geographically dispersed
community experts. WebApollo was designed to provide an easy-to-use web-based
environment that allows multiple distributed users to edit and share sequence annotations.
WebApollo comprises three components: a web-based client, a server-side annotation
editing engine, and a server-side service for providing the client with data from different sources,
including databases at the University of California at Santa Cruz, Ensembl, and Chado.
The web-based client is designed as an extension to JBrowse, a Javascript-based genome
browser that provides a fast, highly interactive interface for visualizing genomic data. This
JBrowse extension provides the gestures needed for editing annotations, such as dragging and
dropping features to create new annotations, dragging to change boundaries of existing
annotations, and using context-specific menus for modifying features. The extension offers
access to the annotation-editing service and the data-providing services as well.
The server-side annotation-editing engine is written in Java. It handles all the logic for editing
and deals with the complexities of modifications in a biological context, where a single change
(for example, splitting or merging transcripts) can have multiple cascading effects. Edits are
stored persistently in the server, allowing users to quickly recover their data, should either their
browser or the server crash. The server makes use of the Comet model to provide
synchronized updating over multiple browser instances, so that if one user edits an annotation,
anyone who is viewing that annotation sees the changes instantly in their browser window.
The server-side service for providing data to the client is built on top of Trellis, a DAS server
framework. It uses format injection to provide JBrowse-supported JSON data structures, rather
than the more verbose DAS XML. We also developed a Trellis plugin to access data from the
UCSC MySQL genome database, which provides quick access to that popular data source.
All three components are open source and provided under the BSD License.
Source code and demo:
https://github.com/berkeleybop/jbrowse (client side code)
http://code.google.com/p/apollo-web (annotation editing engine server code)
http://code.google.com/p/gbol (data model and I/O layer code used by edit engine)
http://code.google.com/p/genomancer (Trellis server code)
http://icebox.lbl.gov:8080/ApolloWebDemo (demo)
The isobar R package: Analysis of quantitative proteomics data
Florian P. Breitwieser and Jacques Colinge
Authors Affiliation: CeMM - Research Center for Molecular Medicine
of the Austrian Academy of Sciences, Vienna, Austria
Presenting Author Email: fbreitwieser@cemm.oeaw.ac.at
Project Web Site: http://bioinformatics.cemm.oeaw.ac.at/isobar
Project Code: http://bioinformatics.cemm.oeaw.ac.at/isobar/isobar.tar.gz
Open Source License: LGPLv2.1
Background
Mass spectrometers (MS) allow the identification of thousands of proteins in biological samples via
mass-vs-intensity spectra of their peptides. Quantitative differences in protein abundance can be measured by labelling
peptides with isobaric stable isotopes (iTRAQ, TMT), which produce signature ions whose intensities are
compared. On the bioinformatics side, quantitative information has to be extracted and statistical methods
used to map ratios measured at the individual spectrum level to the protein level and, finally, compare
biological samples.
Description
The isobar package implements fundamental proteomics data representation in S4 classes based on
Bioconductor. It parses protein identification results (Mascot and Phenyx; MzIdentML in development) and
determines protein groups and peptides shared by multiple proteins. Novel statistical models then capture
technical as well as biological sources of data variability. Protein ratios are reported with P-values and
confidence intervals. Computations are performed in the R environment, which facilitates data exploration and
visualization. User-oriented LaTeX and Excel quality-control and analysis reports can easily be generated
via scripts.
[Figure: example isobar output. Spectra measured in Control and Treatment channels (ch1, ch2) are grouped by peptide and protein group, and a ratio is reported for each protein.]
Conclusions
We devised statistical methods and implemented the package isobar for R, allowing accurate analysis and
visualization of iTRAQ and TMT data. Isobar contains fundamental classes and methods for MS data
representation that can be expanded to provide a generic MS framework in R, independent from quantitative
proteomics specific needs.
Poster 11
Title: Stacks: building and genotyping loci de novo from short-read sequences
Authors: Julian M. Catchen*, Angel Amores§, Paul Hohenlohe*, William Cresko*,
and John H. Postlethwait§
* University of Oregon, Center for Ecology and Evolutionary Biology, Eugene OR
97403 USA
§ University of Oregon, Institute of Neuroscience, Eugene OR 97403 USA
Email: jcatchen@uoregon.edu
Project: http://creskolab.uoregon.edu/stacks/
License: GNU GPL
Abstract
Advances in sequencing technology provide special opportunities for
genotyping individuals rapidly and cheaply, but the lack of software for the
automated calling of tens of thousands of genotypes over hundreds of individuals
has hindered progress. Stacks is a software system that identifies and genotypes
loci in a set of individuals from short-read sequence data, either de novo or by
comparison to a reference genome. Using Stacks to analyze reduced
representation Illumina sequence data, such as RAD-tags, can recover
thousands of single nucleotide polymorphism (SNP) markers that can be used for
the genetic analysis of crosses or populations. Stacks can generate markers for
ultra-dense genetic maps, can facilitate the examination of population
phylogeography, and can help in assembly of reference genomes. We report
here the algorithms implemented in Stacks and demonstrate its efficacy by
constructing loci from simulated RAD-tags taken from the stickleback reference
genome and by recapitulating and improving a genetic map of the zebrafish,
Danio rerio. Further, we demonstrate the application of Stacks in non-model
organisms to develop a genetic map and assemble an emerging reference
genome in two teleost fishes.
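The general idea of building loci de novo can be illustrated with a toy sketch: identical reads are piled into stacks, shallow stacks are discarded as likely errors, and stacks that differ at very few positions are treated as alleles of one locus, with the differing positions reported as candidate SNPs. This is a simplification written for this illustration, not the algorithm implemented in Stacks; the function names and the thresholds `min_depth` and `max_mismatches` are assumptions.

```python
from collections import Counter
from itertools import combinations


def build_stacks(reads, min_depth=2):
    """Group identical reads; keep sequences seen at least min_depth times."""
    counts = Counter(reads)
    return {seq: depth for seq, depth in counts.items() if depth >= min_depth}


def mismatch_positions(a, b):
    """Indices at which two equal-length sequences differ."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


def call_snps(stacks, max_mismatches=1):
    """Pair stacks that differ at few positions and report those positions."""
    snps = []
    for a, b in combinations(sorted(stacks), 2):
        if len(a) != len(b):
            continue
        diff = mismatch_positions(a, b)
        if 0 < len(diff) <= max_mismatches:
            snps.append((a, b, diff))
    return snps


reads = ["ACGTAC"] * 3 + ["ACGTTC"] * 2 + ["GGGGGG"] * 4 + ["ACGAAC"]
stacks = build_stacks(reads)   # the singleton read is dropped
snps = call_snps(stacks)       # one candidate SNP, at index 4
```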
Large scale NGS pipelines using the MOLGENIS platform:
processing the Genome of the Netherlands
Heorhiy V Byelas1*, Danny Arends2*, Freerk van Dijk1,4, K Joeri van der Velde2, Laurent Francioli1,4,
Martijn Dijkstra1,4, Alexandros Kanterakis1, Ishtiaq Ahmad3,5, David van Enckvoort5, Leon Mei5, Peter
Horvatovich3,5, many other members of BBMRI-NL4, NBIC5 and Target6, Morris A. Swertz1,2,4-6
1 Genomics Coordination Center, University Medical Center Groningen, 2 Groningen Bioinformatics Center,
University of Groningen, 3 Pharmacy Department, University of Groningen, 4 BBMRI-NL biobank consortium,
5 Netherlands Bioinformatics Center, NBIC, 6 Groningen Centre for IT & Target/LifeLines infrastructure project
*equal contribution. Contact: Morris Swertz (m.a.swertz@rug.nl)
Project website: http://www.molgenis.org
Code: http://www.molgenis.org/svn/molgenis_apps/
License: LGPLv3
Abstract
Many groups have embarked on large-scale next-generation sequencing and GWAS imputation studies. However,
running all the necessary analysis scripts on large, parallel compute clusters and keeping track of which protocols
were used to produce a particular sequence or quality control result has become a huge challenge. Last year we
presented the open source MOLGENIS database toolkit to address data management challenges around the creation,
management and reporting of genomic data1-4. Here, we present a new flexible tool, 'MOLGENIS/compute', that
extends MOLGENIS to define, run and manage large analysis pipelines for next-generation sequencing,
imputation and other bioinformatics analyses on large-scale ICT infrastructures5-7.
Processing 750 Dutch genomes
An example of a large-scale analysis challenge is
the Genome of the Netherlands project5,6, a Dutch
national initiative of five universities to sequence
750 Dutch individuals in 250 parent/child trios to
establish a 'hapmap' of the Dutch population.
The first phase of this project has already been a
major data management and compute challenge,
with tens of thousands of compute jobs for alignment
and SNP calling of 2250 lanes/750 samples, each of
which requires more than 10 analysis steps (bwa,
realignment, quality recalibration, etc.), as well as
tracking of all biomaterials involved (sequence lanes,
samples, cohorts, individuals, etc.). This hapmap will
subsequently be used to impute all existing Dutch
cohorts, up to 100,000 individuals, which will be
similarly challenging in terms of managing GWAS
quality control and imputation (impute2 and beagle
imputation, etc.). Proteomics mass spectrometry
analyses are also targeted by this project.
Define compute protocols and pipelines
To address these challenges, bioinformaticians can
use the MOLGENIS/compute web user interface to
specify all the compute protocols needed inside the
database, i.e., executable scripts written in shell or
R, together with their input and output parameters
and the data sets needed, e.g. 'bwa alignment'. Each
script can use REST and R programming interfaces to
upload (references to) result data and link it to its
analysis targets, e.g. individuals, cohorts, flowcells,
lanes, samples, etc. A template mechanism allows
for the flexible scripting necessary for parallelization.
Complete analysis pipelines can be made by
chaining compute protocols into simple workflows,
e.g. 'alignment and QC', which biologists can then
use just like individual compute protocols.
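How a template mechanism supports parallelization can be shown with a short sketch: one protocol template is expanded into one script per analysis target, and the scripts can then be submitted independently. This uses Python's standard `string.Template` rather than the MOLGENIS/compute template engine; the `generate_jobs` function, the placeholder names, and the file names are hypothetical, and the command line is only an illustrative bwa invocation.

```python
from string import Template

# One compute protocol, written once with placeholders (illustrative only).
BWA_ALIGN = Template("bwa aln ${reference} ${lane}.fq > ${lane}.sai\n")


def generate_jobs(protocol, targets, **shared_parameters):
    """Expand a protocol template into one script per analysis target."""
    return {target: protocol.substitute(lane=target, **shared_parameters)
            for target in targets}


jobs = generate_jobs(BWA_ALIGN, ["lane1", "lane2", "lane3"],
                     reference="ref.fa")
for lane, script in jobs.items():
    print(lane, "->", script.strip())
```

Storing each generated script before submission, as the abstract describes, doubles as a record of exactly what was run for each target.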
Poster 12
Run and monitor analyses at a large scale
Researchers can select, run and monitor
computational protocols. When a compute protocol
is selected (e.g. bwa-align-pe), the user gets an
auto-generated dialog box to fill in all analysis
parameters (e.g. bwa-reference-genome) and
analysis targets (e.g. illumina lane), based on the
bioinformatics definition described above. Then
the user clicks 'start' and all the necessary compute
jobs are generated as actual scripts, which are
first stored in the database as an analysis logbook and
then automatically sent to a compute cluster or cloud
for execution. Currently supported are local PCs,
LANs, PBS clusters and Amazon EC2 (alpha); MOTEUR
and Galaxy exports are under development. During
analysis, the user can monitor all scripts submitted.
Integrated analysis and data management
Finally, the user can view all results integrated with
genomic data management, such as quality reports
and SNP calls for all 750 genomes of the Genome of
the Netherlands. MOLGENIS/compute is available free
for all to use and co-develop as open source.
References
1. http://www.molgenis.org
2. Swertz MA, Jansen RC (2007) Beyond standardization: dynamic software infrastructures for systems genetics. Nature Reviews Genetics. 8(3).
3. Swertz et al (2010) XGAP: a uniform and extensible data model and software platform for genotype and phenotype experiments. Genome Biology. 9;11(3):R2
4. Swertz et al (2010) The MOLGENIS toolkit: rapid prototyping of biosoftware at the push of a button. BMC Bioinformatics 11(Suppl 12):S12
5. BBMRI-NL Genome of NL. http://tinyurl.com/bbmri-gonl
6. BBMRI-NL bioinformatics team. http://www.bbmriwiki.nl
7. Target infrastructure. http://www.rug.nl/target/infrastructuur/computing
Bio-NGS: BioRuby plugin to conduct programmable workflows for Next
Generation Sequencing data
Bonnal R.1, Strozzi F.2, Katayama T.3, Stella A.2, Pagani M.1
1 Istituto Nazionale Genetica Molecolare (INGM), Via F. Sforza 28, Milan 20122, Italy,
bonnal@ingm.org
2 Parco Tecnologico Padano, Via Einstein Loc. Cascina Codazza 26900 Lodi Italy
3 Laboratory of Genome Database, Human Genome Center, Institute of Medical Science, University
of Tokyo, Japan
Project
Home: http://bioruby.open-bio.org/wiki/Next_Generation_Sequencing
Code base: http://github.com/helios/bioruby-ngs
License: The Ruby License
BioRuby is a well-established bioinformatics library for the Ruby 1.9 programming
language. Here we present a new package, Bio-NGS, for BioRuby to perform Next
Generation Sequencing (NGS) analyses based on a recently introduced plugin
system (Bio-Gem), which allows users to extend the core library for adding new
functionalities. Tools and libraries have been written in other languages using
different approaches, but Ruby will facilitate the development of a light, flexible and
customizable solution to face the new challenges of NGS data analysis. This NGS
plugin can handle standard software like BWA, Bowtie, TopHat, Cufflinks, SAMtools
and many others, in a common way. Third-party tools and applications can be
integrated with Bio-NGS in different manners, by wrapping command line
applications or by binding low-level libraries/functionalities. Wrapping is the easiest
way to support and maintain third-party software while binding offers the benefit to
use some internal functionality which is not exposed to the end user. The
applications for which we provide a wrapper will be included in the package when
possible, as Linux and OSX binaries usually pre-compiled at 64 bit. This approach
gives flexibility to choose the best solution and provide a ready to run NGS-analysis
tool. Bindings for BWA and SAMtools (bio-bwa and bio-samtools, respectively) are
available as separate plugins as well.
The plugin provides a task management framework and a set of predefined tasks to
use third-party applications and trigger specific procedures, including pre- and
post-processing operations, which are often required to filter the data by quality control on the input
or to visualize the outputs of the analyses. The Ruby programming language is
perfectly suited to define pipelines and procedures given its flexible and clean
syntax. Users of Bio-NGS will be able to add new tasks and develop their custom
workflows with popular Ruby DSLs like Rake and Thor. Every operation submitted is
recorded using a built-in history manager that will store all the settings and
parameters used for a given procedure. Tasks can also be submitted as parallel jobs
on different environments like multi-core machines, computer clusters and in the
near future in the cloud as well. A monitoring system which tracks the tasks is under
consideration. Bio-NGS provides a reusable yet customizable command-line
system for demanding NGS data analysis and workflow management. Bio-NGS is
developed following Ruby 1.9 specifications and works on MRI 1.9.2 and JRuby
1.6.0.
Poster 13
Goby framework: native support in GSNAP, BWA and IGV 2.0
Kevin C. Dorff1 (kcd2001@med.cornell.edu), Nyasha Chambwe1,2, Thomas D. Wu3, Jim T. Robinson4, Fabien
Campagne1,2* (fac2003@med.cornell.edu)
1 The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Medical College of Cornell
University, New York, NY 10021; 2 Department of Physiology and Biophysics, Weill Medical College of Cornell University, New York,
NY 10021; 3 Genentech, Inc., South San Francisco, CA 94080, USA; 4 Broad Institute of Massachusetts Institute of Technology and
Harvard, Cambridge, Massachusetts, USA.
Project URL: http://goby.campagnelab.org/
Source Code URL: http://campagnelab.org/software/goby/download-goby/
Open Source License: GNU General Public License
We will present an update on the Goby framework, a set of high-performance APIs and tools that support a variety
of data analyses for Next-Generation Sequencing (NGS) projects. We reported last year that the framework offered
file formats 80 to 90% smaller than BAM files, while retaining sufficient information to perform RNA-Seq data
analysis. We now report that we have extended three widely used NGS open-source tools with native support for
Goby file formats. The following describes our rationale for targeting these complementary sets of tools. BWA is a fast
aligner to map reads derived from genomic DNA [1]. In contrast to Bowtie, BWA can align reads that contain
insertions and deletions. GSNAP is an aligner that can map reads which span exon-exon junctions as well as perform
SNP-tolerant mapping [2]. The first feature is useful for RNA-Seq applications, while the second helps avoid
reference bias in alignments. GSNAP also supports the alignment of reads derived from bisulfite treated DNA
samples used in the RRBS or Methyl-Seq protocols. IGV is a widely used genome viewer that supports data
integration across modalities [3].
GSNAP and BWA support. We have extended GSNAP and BWA to read Goby compact read files natively and to
write Goby alignment files. These extensions support aligning an arbitrary slice of a very large read file and are key to
parallelizing alignments efficiently (the input file is partitioned into non-overlapping slices, the slices are aligned in
parallel, and the results are concatenated). Using these techniques, we have developed a parallel alignment tool that aligns
about 100 million single-end 100 bp reads per hour on a three-node cluster (24 threads on each node). Such a
cluster could be bought for less than $24,000 in 2011. The alignment time reported includes the time to generate
wiggle tracks and various statistics of alignment quality.
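The slice-and-concatenate scheme can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: reads are held in an in-memory list rather than a Goby compact-reads file, `align_slice` is a hypothetical placeholder for a call to an aligner such as GSNAP or BWA, and none of the names below come from the Goby API.

```python
from concurrent.futures import ThreadPoolExecutor


def slice_bounds(total, n_slices):
    """Split range(total) into n_slices contiguous, non-overlapping [start, end) slices."""
    base, extra = divmod(total, n_slices)
    bounds, start = [], 0
    for i in range(n_slices):
        end = start + base + (1 if i < extra else 0)
        bounds.append((start, end))
        start = end
    return bounds


def align_slice(reads, start, end):
    """Hypothetical stand-in for running an aligner on reads[start:end]."""
    return [(index, "aligned:" + reads[index]) for index in range(start, end)]


def parallel_align(reads, n_slices):
    """Align every slice in parallel, then concatenate the results in slice order."""
    bounds = slice_bounds(len(reads), n_slices)
    with ThreadPoolExecutor(max_workers=n_slices) as pool:
        # pool.map yields results in the order of `bounds`, so concatenation
        # preserves the original read order.
        parts = pool.map(lambda b: align_slice(reads, *b), bounds)
        return [hit for part in parts for hit in part]


reads = ["ACGT", "TTGA", "CCGA", "GATC", "AGGT"]
alignments = parallel_align(reads, n_slices=3)
```

Because the slices do not overlap and are concatenated in order, the combined output contains each read exactly once. At the reported rate, 100 million reads per hour spread over 3 x 24 = 72 threads works out to roughly 1.4 million reads per thread per hour, or about 390 reads per second per thread.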
IGV support. We have extended IGV to load and display Goby alignment files. Alignments that have been sorted can
be loaded into IGV along with other data tracks. Visualization shows mapped reads individually as well as differences
between the read and the reference sequence. The version of IGV that supports Goby alignments is currently
available for download as an Early Access Version (http://www.broadinstitute.org/software/igv/download).
Extensions to the Goby read format. We have added a number of features to the Goby formats over the last year.
Read files were extended to support paired-end reads (both pairs of reads are represented in a single Goby reads
file). We also developed a mechanism to store meta-information about the reads in the reads file. This can be used
for instance to document the organism the reads were derived from, the type of sequencing instrument, or the date
the run was performed (useful to detect batch effects). Arbitrary key-value pairs make it possible to encode user-defined meta-information about a collection of reads.
Extensions to the Goby alignment format. We have extended the alignment format to store paired-end alignments,
as well as alignments of reads that map across exon-exon junctions. These extensions were designed to remain
backward and forward compatible with previous versions of the file format and of the framework.
Extensions to the Goby toolbox. We have extended the Goby toolbox with high-performance, well documented and
simple to use tools for calling genotypes (DNA-Seq), estimating allele frequencies (RNA-Seq), finding variants that
differ between groups of samples, and estimating methylation rates at CpG sites (RRBS or Methyl-Seq). Several of
these new applications will be discussed.
Citations.
1. Li, H. and R. Durbin, Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics, 2009. 25(14): p. 1754-1760.
2. Wu, T.D. and S. Nacu, Fast and SNP-tolerant detection of complex variants and splicing in short reads. Bioinformatics, 2010. 26(7): p. 873-881.
3. Robinson, J.T., et al., Integrative genomics viewer. Nat Biotechnol, 2011. 29(1): p. 24-26.
Poster 14
A Scalable Multicore Implementation of the TEIRESIAS Algorithm
Lee Nau1, Frank Drews1, and Lonnie Welch1
1 School of Electrical Engineering and Computer Science, Ohio University, Athens, OH 47501, USA
E-mail: welch@ohio.edu
Project URL: http://code.google.com/p/pteir/
Code URL: http://code.google.com/p/pteir/source/checkout
Open Source License: MIT License
Abstract
The TEIRESIAS algorithm is a combinatorial pattern (motif) discovery technique originally published
by IBM Research. The lack of an available, open-source implementation of this algorithm motivated this
open-source project. This algorithm is computationally expensive, and thus in order to reduce run times
and achieve scalable performance, parallel processing techniques were used to make this implementation
modern and efficient.
This algorithm is an important one in the field of motif discovery, allowing the discovery of variable-length
and potentially long patterns. TEIRESIAS defines the notion of maximal patterns, which cannot be made
more specific without reducing their number of occurrences. These have inherent value and might
contain more useful information than a simple collection of strings. TEIRESIAS has been used in the
discovery of long biological patterns, and as the basis of subsequent, modified algorithms.
This fast, parallel, memory-efficient version of the algorithm has been implemented in C and uses the
OpenMP parallel programming model. Experiments were conducted using up to sixteen threads on the
Ohio Supercomputer Center’s Glenn cluster using System x3755 compute nodes (quad socket, quad core
2.4 GHz Opterons).
Speedups reached a maximum of approximately 10x when scaling to sixteen cores/threads
in total. While not linear, this still represents a significant improvement in run times without much
additional memory overhead. This project details the parallel processing techniques used to achieve this
performance increase and characterizes the scalability on several real biological datasets.
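As a rough way to read these numbers, the sketch below computes the parallel efficiency implied by a 10x speedup on sixteen threads and the serial fraction that would produce it. Amdahl's law is an assumption of this example; the abstract does not name a scaling model, and the function names are illustrative only.

```python
def parallel_efficiency(speedup, threads):
    """Fraction of the ideal linear speedup that was actually achieved."""
    return speedup / threads


def amdahl_serial_fraction(speedup, threads):
    """Serial fraction f that solves speedup = 1 / (f + (1 - f) / threads)."""
    return (1.0 / speedup - 1.0 / threads) / (1.0 - 1.0 / threads)


efficiency = parallel_efficiency(10.0, 16)
serial_fraction = amdahl_serial_fraction(10.0, 16)
```

Under that assumption the efficiency at sixteen threads is 62.5%, and about 4% of the single-threaded run time behaves as work that does not parallelize.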
Specifically, there are two stages in the TEIRESIAS algorithm: scanning and convolution. Both are
distinct steps and have different processing techniques. The scan phase builds a database of candidate
patterns from the input sequence(s). It is an embarrassingly parallelizable problem and was therefore
straightforward to decompose into parallel tasks.
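The decomposition of the scan phase can be illustrated with a simplified Python sketch. It is not the project's C/OpenMP code: it assumes elementary patterns are defined by L literal characters within a span of at most W positions, counts support as distinct (sequence, offset) occurrences with threshold K, and uses a thread pool only to show how the work splits per sequence (CPython threads do not give the CPU parallelism that OpenMP does).

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def elementary_patterns(seq_id, seq, L, W):
    """Return (pattern, (seq_id, offset)) pairs for one sequence.

    Each pattern starts and ends with a literal, has exactly L literals
    (L >= 2), spans at most W positions, and uses '.' as a wildcard.
    """
    found = []
    for offset in range(len(seq)):
        for width in range(L, W + 1):
            if offset + width > len(seq):
                break
            window = seq[offset:offset + width]
            # The two end positions are always literals; choose the others.
            for interior in combinations(range(1, width - 1), L - 2):
                kept = {0, width - 1, *interior}
                pattern = "".join(c if i in kept else "."
                                  for i, c in enumerate(window))
                found.append((pattern, (seq_id, offset)))
    return found


def scan(sequences, L, W, K):
    """Scan each sequence independently, then merge and keep patterns with support >= K."""
    occurrences = defaultdict(set)
    with ThreadPoolExecutor() as pool:
        per_sequence = pool.map(
            lambda item: elementary_patterns(item[0], item[1], L, W),
            enumerate(sequences))
        for found in per_sequence:
            for pattern, where in found:
                occurrences[pattern].add(where)
    return {p: sorted(o) for p, o in occurrences.items() if len(o) >= K}


candidates = scan(["ABCAB", "XABCY"], L=3, W=4, K=2)
```

Each sequence is scanned without reference to any other, so the only shared step is the final merge of per-sequence results.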
The convolution phase is a more complex stage and parallelization strategies are not immediately
obvious. This stage is recursive, and contains strict data and processing ordering dependencies. However,
there exist certain categories of patterns which belong to the same equivalence class. That is, the patterns can be
segmented into discrete, equivalent groups. Because these classes do not depend on one another, they may be
processed in parallel. Statistical analysis of the classes resulting from real genome sequences showed that
sufficient potential concurrency exists to process these sets simultaneously.
Combined, these parallelization strategies resulted in improved run times and achieved a consistent
speedup across organisms for the entire algorithm pipeline. For cases that tax either one stage or both,
a significant improvement is seen in completion times, within a reasonable memory envelope.
Biomanycores, open-source parallel code for many-core bioinformatics
Mathieu Giraud1, Stéphane Janot1, Jean-Frédéric Berthelot1, Charles Deltel2,
Laetitia Jourdan1, Dominique Lavenier2, Hélène Touzet1, Jean-Stéphane Varré1
1 LIFL, UMR CNRS 8022, Université Lille 1 and INRIA Lille, France
2 IRISA, UMR CNRS 6074, Université Rennes 1, ENS Cachan, and INRIA Rennes, France
contact@biomanycores.org
URL: http://www.biomanycores.org
Licenses: Various open-source licenses
Graphics processing units (GPUs) enable efficient parallel processing at a very low cost, and are a first
step towards the generalization of massively manycore architectures. Since CUDA (in 2007) and the
standard OpenCL (2009), many GPU bioinformatics applications have been developed, from sequence
alignment to proteomics or phylogenetics (review in [4]).
Biomanycores is a collection of bioinformatics tools, designed to bridge the gap between research
in OpenCL/CUDA high-performance computing on GPUs and other “manycore processors” and everyday
bioinformaticians and biologists.
The main goal is to gather parallel programs and interface them with the BioJava [2], BioPerl [3] and
Biopython [1] frameworks. We will also provide benchmarks to show in which cases it is worth using
parallel versions of the programs.
Biomanycores was presented at BOSC 2009. Since November 2010, a developer has been working full-time on
this project, redesigning, extending and documenting Biomanycores. Release 1.1104 of Biomanycores
(April 15) includes applications for sequence alignment, sequence processing, and RNA folding.
Next releases are likely to include tools for proteomics, phylogenetics and genome-wide association studies.
The design of Biomanycores makes it easy to alter existing pipelines and scripts in order to use a parallel
application instead of the standard one. In some cases, this leads to large improvements. For
example, the complexity of the brute-force Position Weight Matrix (PWM) scan algorithm is in O(nm),
where n is the sequence length and m the matrix size. PWM performance can be measured in millions
of operations per second, one operation being the scoring of one sequence character against one matrix
column. While the native Biopython search_pwm method peaks at 2 Mop/s, the Biomanycores GPU
version (TFM-CUDA) peaks at more than 2 Gop/s, 1000 times faster, even when the overhead
due to the Python interpreter is included.
In the coming months, we plan to integrate new applications into Biomanycores. Moreover, we wish
to receive critical feedback from the BioJava, BioPerl and Biopython communities and users in order to
improve our interfaces, and to discuss further integration into these frameworks.
People who are developing CUDA or OpenCL bioinformatics code (available under a free license)
are welcome to get involved in the project; we are willing to help them integrate their code. Please
contact us at contact@biomanycores.org.
References
[1] P. J. A. Cock, T. Antao, J. T. Chang, et al. Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics, page btp163, 2009.
[2] R. C. G. Holland, T. A. Down, M. Pocock, et al. BioJava: an open-source framework for bioinformatics. Bioinformatics, 24(18):2096–2097, 2008.
[3] J. E. Stajich, D. Block, K. Boulez, et al. The Bioperl toolkit: Perl modules for the life sciences. Genome Research, 12(10):1611–1618, 2002.
[4] J.-S. Varré, B. Schmidt, S. Janot, and M. Giraud. Advances in Genomic Sequence Analysis and Pattern Discovery, chapter Manycore high-performance computing in bioinformatics. World Scientific, 2011.
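The brute-force PWM scan mentioned in the Biomanycores abstract, and its O(nm) operation count, can be made concrete with a short Python sketch. This is a plain reference loop written for illustration; it is neither the Biopython search_pwm method nor the TFM-CUDA kernel, and the small matrix below is a made-up example.

```python
def pwm_scan(sequence, pwm, alphabet="ACGT"):
    """Score every window of len(pwm) characters against the matrix.

    pwm[col][k] is the weight of alphabet[k] at column col. One "operation"
    is the scoring of one sequence character against one matrix column.
    """
    index = {base: k for k, base in enumerate(alphabet)}
    m = len(pwm)
    scores = []
    for start in range(len(sequence) - m + 1):
        score = 0.0
        for col in range(m):
            score += pwm[col][index[sequence[start + col]]]
        scores.append(score)
    return scores


def operations(n, m):
    """Number of character-versus-column scorings for sequence length n, matrix size m."""
    return (n - m + 1) * m


# Hypothetical 3-column matrix that favours the word "ACG".
pwm = [[2, -1, -1, -1],
       [-1, 2, -1, -1],
       [-1, -1, 2, -1]]
scores = pwm_scan("TACGA", pwm)
```

For a sequence of one million characters and a ten-column matrix, `operations` gives about 10^7 scorings: roughly 5 seconds at 2 Mop/s and roughly 5 milliseconds at 2 Gop/s.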
Poster 15
GemSIM – Generic, Error-Model based SIMulator of next-generation sequencing
Kerensa McElroy a,*, Fabio Luciani b, Torsten Thomas a
a Centre for Marine Bio-Innovation and School of Biotechnology and Biomolecular
Sciences, UNSW, Sydney, NSW Australia, 2052.
b Inflammatory Diseases Research Unit, School of Medical Sciences, UNSW, Sydney,
NSW Australia 2052.
Email: kerensa@unsw.edu.au
Project page: http://sourceforge.net/projects/gemsim/
Download: http://sourceforge.net/projects/gemsim/files/GemSIM_V1/
License: GNU General Public License version 3
Summary: Next-generation sequencing (NGS) has unprecedented potential for
assessing genetic diversity; however, extracting true variants from errors is challenging
due to high NGS error rates, multiple sequencing platforms with varied error profiles,
and an ever-increasing variety of downstream analysis choices. While simulation can
facilitate analysis, existing simulators are limited by simplistic error-models,
unrealistic quality score information, or restricted platform applicability. GemSIM, or
General Error-Model based SIMulator, is a next-generation sequencing simulator
capable of generating single or paired-end reads for any sequencing technology
compatible with the generic formats SAM and FASTQ (including Illumina and Roche 454).
By creating and using empirically derived error models and quality score
distributions, GemSIM realistically emulates individual sequencing runs and/or
technologies. GemSIM draws reads from either a single genome or a haplotype set,
facilitating simulation of either individual or population level sequencing projects.
Here, we demonstrate GemSIM’s value for next-generation sequencing projects, by
simulating reads from a set of known, related bacterial haplotypes and optimising a
parameter for the popular SNP-calling program VarScan. Reads simulated using error
models derived from Illumina paired-end reads required different SNP calling
parameters to those simulated using Roche-454 derived models, demonstrating the
need for simulation when designing and analysing NGS projects.
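The idea of drawing reads through an error model can be sketched as follows. This is a deliberately minimal Python illustration and not GemSIM's code: it assumes a substitution-only model with one error probability per read position, whereas GemSIM derives its models empirically and also produces quality scores. All function and parameter names are hypothetical.

```python
import random


def simulate_read(template, error_probs, rng, alphabet="ACGT"):
    """Copy the template, substituting base i with probability error_probs[i]."""
    read = []
    for base, p_error in zip(template, error_probs):
        if rng.random() < p_error:
            base = rng.choice([b for b in alphabet if b != base])
        read.append(base)
    return "".join(read)


def draw_reads(genome, read_length, n_reads, error_probs, seed=0):
    """Sample read start positions uniformly and pass each read through the error model."""
    rng = random.Random(seed)
    reads = []
    for _ in range(n_reads):
        start = rng.randrange(len(genome) - read_length + 1)
        template = genome[start:start + read_length]
        reads.append((start, simulate_read(template, error_probs, rng)))
    return reads


genome = "ACGTTGCAAGGCTTACGATCGGATCCTAGA"
# Hypothetical profile: the error probability rises towards the end of the read.
profile = [0.001 * (i + 1) for i in range(10)]
reads = draw_reads(genome, read_length=10, n_reads=5, error_probs=profile)
```

Swapping in a different `error_probs` profile changes the simulated platform while the sampling code stays the same, which is the property that makes platform-specific parameter tuning possible.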
References
Koboldt et al. (2009). “VarScan: variant-detection in massively-parallel sequencing of
individual and pooled samples.” Bioinformatics 25(17):2283-2285.
Poster 16
Securing and sharing open-source bioinformatics in the cloud
Richard Holland (1)(3), Madhu Donepudi (1), Nick James (1), William Spooner (1), Sowmyanarayan Srinivasan (2)
(1) Eagle Genomics Ltd., Babraham Research Campus, Cambridge CB22 3AT, UK
(2) Cognizant Technology Solutions, Bangalore, India
(3) Presenting author: Richard Holland, holland@eaglegenomics.com
Project site is currently https://www.eaglecognizant.com/; however, this site is
not publicly accessible (logins are required). The talk is about the mechanics
of how we built it, not the actual product.
There is no code to access.
It is not open-source itself, but uses open-source components to power it.
Eagle Genomics Ltd. and Cognizant Technology Solutions jointly developed a
proof-of-concept system for the Pistoia Alliance whereby key open-source
bioinformatics services were to be hosted in a secure shared environment on the
public cloud for private commercial use. The system designed by the two
companies brings together a number of open- and closed-source infrastructure
components that enable the open-source tools to be secured properly and hosted
in a multi-tenancy environment that auto-scales according to demand.
Key issues faced during development included keeping multiple sources of
commercial data separated in a multi-tenancy environment, and ensuring that
the open-source tools hosted (including Ensembl) were secure enough to
prevent hackers gaining access to the system.
This talk will include a brief contextual outline of the system architecture and
how it blends closed- and open-source components with features from Amazon
EC2 to build an acceptably robust solution. However, the main focus will be a
discussion of the challenges we faced in attempting to make open-source
bioinformatics software secure enough to survive penetration testing by a team
of professional ethical hackers, including some hints and tips on securing other
projects. Whilst the system we built is not open, the lessons we learnt along the
way most definitely are, and all our security findings have already been reported
back to the Ensembl project for their consideration.
Title: Mygene.info: Gene Annotation as a Service - GAaaS
Authors: Chunlei Wu and Andrew I. Su (presenting author underlined)
Email: cwu@iscb.org asu@scripps.edu
Affiliations: Genomics Institute of the Novartis Research Foundation, 10675 John Jay Hopkins DR, San Diego, CA 92121
project web site: http://mygene.info
Source code: https://bitbucket.org/newgene/genedoc/src
Open Source License being used: Apache License
Considered for a talk, a poster, or both: both
Web applications are a common and convenient way for biologists to present their large data sets to the
scientific community. Almost all of these sites allow users to query and retrieve data for their favorite
genes. These search interfaces usually utilize a dedicated gene annotation database to translate user
queries into the appropriate identifiers and to obtain specific gene annotations. Setting up a database
server and keeping gene annotation data updated are time-consuming and cumbersome tasks,
especially for smaller developer groups. And more costly, every local gene annotation database requires
the duplication of essentially these same maintenance tasks.
Following a growing trend to provide basic functionality via web services, Mygene.info
(http://mygene.info) was created to provide "Gene Annotation as a Service" (GAaaS). Developers can
utilize Mygene.info for basic search and annotation retrieval using a simple web service interface.
Developers can then focus on presenting their own novel data instead of duplicating common
functionality. Mygene.info particularly emphasizes simplicity and performance.
Two simple REST (Representational State Transfer) web services are provided from Mygene.info: one for
gene queries and the other for gene annotation retrieval. Both return JSON (JavaScript Object Notation)
formatted data, making them easy to use in web applications. The gene query service allows querying by
more than [?] commonly used identifiers or by genome intervals. The database currently contains [?]k
genes from eight common species. The gene annotation service can access any of [?] types of
annotations by either Entrez or Ensembl gene ids (including retired gene ids). Gene annotation data are
regularly updated once per month.
Mygene.info is built on CouchDB, a document-based database. Unlike more commonly used relational
database systems (e.g., Oracle, MySQL), data are stored as "key-document" pairs. The "document" is a
JSON-formatted gene annotation object, while the "key" is a gene ID (Entrez or Ensembl). The
hierarchical structure of gene annotation data can be represented naturally in this key-document model.
Using pre-indexed "views", CouchDB consistently performs [?] times better than a relational structure
in Oracle in our tests. The simple object structure in CouchDB also greatly simplified both data loading
and data queries.
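The key-document model used by Mygene.info can be pictured with a small Python sketch. It is only a stand-in: a dictionary plays the role of the document store, the field names are hypothetical, and nothing here uses the CouchDB or Mygene.info interfaces. The example document uses Entrez Gene ID 1017 (human CDK2) purely for illustration.

```python
import json

# In-memory stand-in for a key-document store: gene ID -> JSON text.
store = {}


def put(gene_id, document):
    """Store one JSON-serialisable annotation document under a gene ID."""
    store[str(gene_id)] = json.dumps(document)


def get(gene_id, path=""):
    """Fetch a document, optionally descending a dotted path such as 'genomic_pos.chr'."""
    value = json.loads(store[str(gene_id)])
    for key in filter(None, path.split(".")):
        value = value[key]
    return value


# Hypothetical document layout; nested fields mirror the hierarchy of the annotation.
put(1017, {"symbol": "CDK2", "taxid": 9606, "genomic_pos": {"chr": "12"}})
chromosome = get(1017, "genomic_pos.chr")
```

A lookup is a single fetch by key followed by navigation inside one document, with no joins across tables, which is the sense in which hierarchical annotation data fit this model naturally.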
Mygene.info is hosted in the cloud by Amazon Web Services. We have used Mygene.info as the
underlying gene annotation database for our popular BioGPS gene portal (http://biogps.org) for almost
a year. We also make our Amazon instance image available for those who want to run their dedicated
host.
Mygene.info does not replace existing services provided by large gene annotation providers. Instead,
these web services are optimized for powering gene-centric web resources. We believe that these
services will lower the bar for biologists to publish their data and improve the efficiency of knowledge
exchange.
Poster 17
Cloud BioLinux: open source, fully-customizable bioinformatics computing on the cloud
for the genomics community and beyond.
Konstantinos Krampis1,*, Tim Booth2, Brad Chapman3, Bela Tiwari2, Mesude Bicak2, Dawn
Field2 and Karen E. Nelson1.
1 J. Craig Venter Institute, 9704 Medical Center Drive, Rockville, MD 20850 USA, 2 CEH Wallingford, Benson
Lane, Wallingford, UK, 3 Harvard School of Public Health Bioinformatics Core, 655 Huntington Avenue
Boston, MA 02115
* Presenting author: agbiotec@gmail.com, project website: http://www.cloudbiolinux.org/
Cloud BioLinux is an open-source virtual machine (VM) image that enables scientists to quickly
provision on-demand infrastructures for high-performance bioinformatics computing using cloud
platforms. Upon deployment users have instant access to more than 100 bioinformatics packages,
including the blastall and blast+ NCBI applications, the Staden, EMBOSS, hmmer, and phylip collections
of software, and many stand-alone applications for tasks such as sequence alignment, clustering,
assembly, display, editing, and phylogeny, as well as working with next generation sequencing data. In
addition we have installed data analysis and bioinformatic code libraries from the Python, Perl, R, Ruby
programming languages, the FreeNX server for allowing remote access to the Cloud BioLinux desktop,
and we are also planning to include the Investigation/Study/Assay (ISA, http://isa-tools.org/) data
management software suite. Cloud BioLinux runs on the 64-bit version of Ubuntu Linux, and the
binaries for the bioinformatics packages are from NEBC BioLinux 6.0 repository
(http://nebc.nerc.ac.uk/tools/BioLinux/). Users can start instances of Cloud BioLinux on the Amazon EC2
cloud via the console (http://aws.amazon.com/console), and access the tools through a remote desktop
connection from their local computer. We have documented the steps for this in detail
(http://tinyurl.com/cloud-BioLinux-tutorial/), in addition to the procedure for transferring and storing data
on the cloud, troubleshooting for common problems and suggestions how to get the best value for money
when renting computational capacity from cloud providers.
At the core of Cloud BioLinux is an open-source, automated system configuration tool (available from
https://github.com/chapmanb/cloudbiolinux, MIT licence). This tool simplifies the process of building
customized bioinformatic VM images on the cloud, sharing the images among researchers, and deploying
them on different cloud computing platforms. Our implementation is based on the Python Fabric
(http://fabfile.org/) software management framework and Ubuntu Advanced Packaging Tool (APT,
http://www.ubuntu.com), and is composed of a main driver script and a set of plain-text configuration
files the users edit in order to specify the bioinformatics software that will be included in a build of Cloud
BioLinux. The driver script reads the configuration files, and installs the specified software from the APT-based
NERC BioLinux 6.0 software repository, while additional APT-based bioinformatic or scientific
software repositories such as the one available by the Debian-Med community (Moller et al. 2010) can be
added in the configuration files (a detailed repositories list is available at the Ubuntu Science page
http://tinyurl.com/science-ubuntu). Users can create their own customized version of the Cloud BioLinux
VM by simply editing the files to mix and match software from the different repositories.
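The pattern of a driver script that reads plain-text configuration files and installs the listed software through APT can be sketched as below. This is an assumption-laden illustration, not the Cloud BioLinux code: the one-package-per-line file format and the function names are invented here, Fabric is not used, and the command is only built, never executed.

```python
def read_package_list(text):
    """Parse a plain-text list: one package per line, '#' starts a comment."""
    packages = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if line:
            packages.append(line)
    return packages


def install_command(packages):
    """Build (but do not run) a non-interactive apt-get invocation."""
    return ["apt-get", "install", "-y", *packages]


# Hypothetical configuration file content edited by the user.
config = """
# sequence analysis
emboss
hmmer   # profile HMM searches
"""
command = install_command(read_package_list(config))
```

Keeping the package selection in a text file means a customized image differs from the default only by an edited list, which is what makes builds easy to share and reproduce.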
Cloud BioLinux is a community effort, sponsored by the J. Craig Venter Institute
(http://www.jcvi.org/cms/research/projects/jcvi-cloud-BioLinux/overview/) and NERC Environmental
Bioinformatics Centre (http://nebc.nerc.ac.uk/), in addition to an educational grant through Amazon Web
Services. We invite open participation via the cloudbiolinux.org website, while our community effort is
coordinated through a public mailing list available through the same site, and developer meetings such as
the Hackathon events preceding the 2010 and 2011 BOSC meetings.
OBIWEE: an open source bioinformatics cloud environment
Jonathan Piat, François Moreews, Olivier Sallou
Symbiose team
INRIA
Rennes, France
jonathan.piat@inria.fr
francois.moreews@irisa.fr
olivier.sallou@irisa.fr
project website: http://vapor.gforge.inria.fr/
source access: http://vapor.gforge.inria.fr/
license: http://www.cecill.info/licences/Licence_CeCILL-C_V1-en.html
Bioinformatics applications are often structured as workflows that are composed
of a set of operations to perform on large data sets. These workflows are deployed as
complex scripts that handle the sequence of program calls with their relevant inputs
and try to take advantage of a computer cluster using a scheduler. Their performance
relies on the user's ability to analyze the potential parallelism in the workflow. SLICEE
(Service Layer for Intensive Computation Execution Environment) abstracts the
scheduler cluster calls by handling command submission, parallelism extraction and
data management. A workflow client orchestrates the SLICEE services that exploit
the data parallelism, and takes care of the data routing between tasks. Thus the
execution of workflow tasks takes advantage of the parallelism available on the cluster
with minimum user intervention.
Maintaining a cluster architecture is expensive and its processing power is hard
to scale over time. Cloud computing proposes to virtualize a computer architecture
and deploy it on available physical computing resources. Therefore the physical
architecture is shared and the virtual processing power can be scaled to meet the
user demand. OBIWEE (On Demand Bioinformatics Intensive Workflow Execution
Environment) is a bioinformatics intensive workflow execution environment
preconfigured on a Linux virtual cluster, that can be deployed either on a private cloud
or a public cloud service like Amazon EC2. The virtual cluster architecture is scaled
to meet the workflow requirement, and a master node runs the SLICEE middleware.
Each node of the cluster runs a bioinformatics-specific Linux distribution
to provide access to a wide range of bioinformatics applications. The virtual cluster
has been tested on a private cloud using OpenNebula and KVM; the next step is
Amazon EC2 integration. All steps, from the cluster configuration to the
workflow design and execution, are performed through a web browser.
The open source OBIWEE bioinformatics cloud service has been designed to
allow groups with low IT support or poor computing infrastructure to analyze their
own data. It also helps to meet the increasing demand for intensive bioinformatics
treatments, in the context of the wide dissemination of sequencing technologies.
SeqWare: Analyzing Whole Human Genome Sequence Data on Amazon's Cloud
Brian D. O'Connor1,2, Barry Merriman3, and Timothy Harkins3
1. Lineberger Comprehensive Cancer Center, UNC, Chapel Hill, NC,
brianoc@email.unc.edu
2. Ontario Institute for Cancer Research, Toronto, ON M5G 1L7, Canada
3. Life Technologies, Inc., Carlsbad, CA, USA
website: http://seqware.sourceforge.net
code: http://sourceforge.net/projects/seqware/develop
license: GPLv3
The current operating costs of next-gen sequencers enable routine generation of the
amount of data needed to resequence a human genome. However, the informatics
infrastructure needed to efficiently process this data into useful information is still
evolving, and this has become a major barrier to adoption of genome sequencing in the
research and clinical communities. An ideal "pushbutton" solution would eliminate both
the burden of maintaining a computer installation and of retaining a team of
informaticians. Our open source SeqWare project is one such solution and is designed
on the principles of cloud computing, data security, automated analysis, and databasing
of results. Relying on commercial cloud computing services removes the computer
hardware and system administration burdens, open source software allows for
transparency and leverages the largest development community, data security provides
the IRB and HIPAA compliance researchers and clinicians require, and a fully
automated pipeline gives fast, consistent and auditable results. Furthermore, the
scalable variant database supports interactive querying of the results, aggregation of
information across subjects, and diverse downstream analyses. Here we describe the
port of the SeqWare project to the Amazon Elastic Compute Cloud (EC2). From the
user's perspective, it provides a pushbutton workflow for human genome or exome
sequencing, currently optimized for the SOLiD platform: starting with sending an
encrypted hard drive of raw data to EC2, it performs alignment, re-alignment, variant
detection, databasing, and variant annotation. From the developer's perspective,
SeqWare provides both a framework for sequence analysis module and workflow
creation along with the tools needed to deploy custom or off-the-shelf workflows on
EC2. One of the biggest successes of this project is that SeqWare is abstracted from
the actual execution environment jobs run in, meaning the port to EC2 required no
modifications to the existing workflows and modules at the core of the pipeline. We
present an overview of the system design, its port to EC2, an analysis of the costs of
processing genomes in this way, and details on the processing workflow currently
available for human genome analysis. At the time this abstract was prepared, we had
used SeqWare in the EC2 environment to process [?] whole human genomes including a
31[?]x coverage HuRef dataset, the Lucier genome, and other interesting high-profile
genome projects. Users are free to create their own instance of the open source
system on EC2 using the project's public AMI images, or utilize a full-service,
commercially-supported instance.
Sequencescape - a cloud enabled Laboratory Information Management Systems (LIMS)
for second and third generation sequencing
Authors: Andrew Page, Beth Jones, Maxime Bourget, Sean Dunn, Matthew Denner, Kate
Taylor, Lars Jorgensen
Contact: lj3@sanger.ac.uk
Software: http://github.com/sanger
License: GPL
LIMS have generally been a closed area for many of the large genomic research centers.
We wish to change this. By releasing our software as open source, there is now a tried,
tested and open alternative to the commercial software solutions that are currently on the
market. This will reduce the workload for labs starting to use 2nd (Solexa, SOLiD, 454) and
3rd generation sequencing (PacBio).
LIMS have historically been seen as very institute specific. To tackle this we have built our
system on a framework that will be applicable to other users. The system can be deployed
in the cloud and is highly extensible. Sequencescape has been tested in a large number of
labs and research projects inside our institute and handles a very large number of samples.
The key problem when designing this system was to come up with a good data model to
achieve this. On one hand it needs to be very generic to support many different lab
workflows. On the other, it needs to be explicit to support performance and ease of
development. Sequencescape is focused around two key concepts: Assets (plates and
tubes) and Requests (work orders). These concepts are central to the system. They enable
us to support the wide-ranging requirements that we have encountered using the system
internally.
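The two central concepts can be pictured with a small sketch. It is written in Python for illustration and is not taken from the Sequencescape code base; the class fields, state names and the example workflow are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Asset:
    """A physical item tracked by the LIMS, such as a plate or a tube."""
    barcode: str
    kind: str  # for example "tube" or "plate"


@dataclass
class Request:
    """A work order: one kind of lab work applied to a source asset."""
    request_type: str
    source: Asset
    target: Optional[Asset] = None
    state: str = "pending"

    def complete(self, target):
        """Record the asset produced by the work and mark the order as done."""
        self.target = target
        self.state = "passed"


def build_workflow(sample_tube, request_types):
    """Create one pending request per step; the same two classes serve any lab workflow."""
    return [Request(request_type, sample_tube) for request_type in request_types]


tube = Asset(barcode="T0001", kind="tube")
orders = build_workflow(tube, ["library_preparation", "sequencing"])
orders[0].complete(Asset(barcode="T0002", kind="tube"))
```

The generic part of the model is that a new lab process only adds a new request type, while the explicit Asset and Request records keep queries and development simple.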
Key Features included in Sequencescape are:
* Work order tracking
* Sample and project management
* Capacity management for pipelines
* Accounting
* Accessioning for samples and studies at the EBI ENA/EGA/ArrayExpress
* Dynamically defined workflows for labs with support for custom processes
* Freezer tracking for tubes and plates
* API support for 3rd party applications
* Data warehousing
The current installation supports over a million samples and 1.3 million tubes and plates and is
used in an organisation of 900 people.
The talk will describe both the application of the software inside Sanger and the approach
we are using to develop it. There will also be a live demo of the software and a
demonstration of how to quickly construct a new lab workflow.
Sequencescape is part of the informatics ecosystem that exists at Sanger. Many of the
other components are openly available from the Sanger website (http://www.sanger.ac.uk/
resources/software/).
Title: Enabling NGS Analysis with(out) the Infrastructure
Authors: Enis Afgan1, Dannon Baker1, Nate Coraor3, Anton Nekrutenko3, James Taylor1
Author affiliations:
1
Department of Biology and Department of Mathematics & Computer Science, Emory University {E.A.
email: eafgan@emory.edu}
2
http://galaxyproject.org
3
Huck Institutes of the Life Sciences and Department of Biochemistry and Molecular Biology, The
Pennsylvania State University
Project website: http://usegalaxy.org/cloud
Project source code: http://bitbucket.org/galaxy/cloudman
Open Source License used: Academic Free License
Abstract:
Running tools and performing analyses to transform sequence data into biologically meaningful
information requires sophisticated computational infrastructure and support. The size of the required
computational infrastructure is outpacing what individual researchers, many labs, and even universities
are able to support. In addition, the setup and maintenance associated with a computational infrastructure
presents significant problems for individual investigators and small labs that may not have the necessary
informatics support. Fortunately, cloud computing provides unique capabilities for transparent scaling and
sharing of computational infrastructures. Built on the Galaxy CloudMan platform, we have enabled the
entire Galaxy application - completely configured with a range of tools and reference genomes - to
transparently utilize AWS cloud resources. The presented solution delivers a fully functional
infrastructure capable of performing complex genomic analyses in a matter of minutes.
This talk presents key new features of Galaxy CloudMan that focus on extension, transparency, and
automation. Namely, we have automated the process of deploying CloudMan on a cloud infrastructure
with the accompanying data, tools, and applications, making it completely transparent, reproducible, and
accessible. Any individual instance of CloudMan is now self-contained, meaning that it does not
require an external broker or service to operate. Moreover, this enables each instance of CloudMan to be
customized by deploying new or alternative tools, configurations, and data, thus supporting the widely
varied needs of individual investigators and labs. CloudMan now supports setup of different cluster
modes, allowing one to utilize all of CloudMan’s infrastructure management features (e.g., cluster
setup, NFS setup, data persistence, (automatically) adding/removing instances, sharing) but without
setting up Galaxy. Coupled with the CloudBioLinux AMI that CloudMan builds upon, this feature allows
any of the tools in NERC BioLinux to be run on a cluster managed by CloudMan without any additional
setup. Additionally, any tool or application that can utilize a general purpose cluster can be installed on
the deployed cluster while allowing CloudMan to manage the infrastructure. CloudMan now supports
sharing of cloud cluster instances. This functionality allows an analysis to remain in the cloud (i.e., no
need to download results and make available elsewhere) while minimizing the expense incurred by
resources that need to be provided by the analysis owner. In addition to enabling publishing and sharing
of data analyses, this feature allows sharing of customized instances of CloudMan where tools and/or data
have been modified. This functionality minimizes repeat effort and offers tool developers a platform for
easily distributing their tools while minimizing any otherwise required setup (for both developers and
users). Lastly, continuing the automation effort, CloudMan now supports the notion of infrastructure
autoscaling. This feature allows a user to specify bounds for the size of their cluster while letting
CloudMan automatically adjust the current number of the compute resources to match the current system
load, thus taking maximum advantage of the elastic infrastructure underlying the computation. This
feature supports the set-it-and-forget-it paradigm of providing a compute infrastructure for users without
requiring them to manage it. This talk will highlight each of these major advancements in CloudMan and
showcase their impact on user experience when using Galaxy and cloud computing resources.
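The autoscaling behaviour described above (user-specified bounds, cluster size following the current load) can be sketched roughly as follows. This is only an illustration of the idea, not CloudMan's actual code: the function names, the `jobs_per_worker` parameter, and the use of outstanding job counts as the load measure are all assumptions made for the example.

```python
def target_worker_count(queued_jobs, running_jobs, min_workers, max_workers,
                        jobs_per_worker=1):
    """Return the number of workers a simple autoscaler would aim for.

    Hypothetical policy: one worker per `jobs_per_worker` outstanding jobs,
    clamped to the bounds chosen by the user.
    """
    if min_workers > max_workers:
        raise ValueError("min_workers must not exceed max_workers")
    demand = queued_jobs + running_jobs
    # Ceiling division: workers needed to run all outstanding jobs at once.
    needed = -(-demand // jobs_per_worker)
    # Never leave the user-specified bounds.
    return max(min_workers, min(max_workers, needed))


def scaling_action(current_workers, target_workers):
    """Translate a target size into an (action, count) pair."""
    if target_workers > current_workers:
        return ("add", target_workers - current_workers)
    if target_workers < current_workers:
        return ("remove", current_workers - target_workers)
    return ("hold", 0)
```

With bounds of 1 and 8 workers, twelve outstanding jobs would lead to a target of 8, while an idle cluster would shrink back to a single worker.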
Poster 18
Hadoop-BAM: A Library for Genomic Data Processing
Matti Niemenmaa, André Schumacher, Keijo Heljanko
Aalto University School of Science
Firstname.Lastname@tkk.fi
Aleksi Kallio, Petri Klemelä, Taavi Hupponen, Eija Korpelainen
CSC–IT Center for Science
Firstname.Lastname@csc.fi
Next generation sequencing (NGS) technologies have redefined the requirements for data processing in
bioinformatics. To cope with the massive influx of data, cloud computing technologies have been proposed
and evaluated. The initial experiences have been positive, as the independent nature of deep sequencing
reads allows them to be effectively processed in the loosely coupled cloud computing framework. We
see that the next crucial step is to develop generic libraries that facilitate the creation of mature cloud
computing applications for NGS.
We have developed Hadoop-BAM, a library for the manipulation of BAM (Binary Alignment/Map) files
using the Hadoop MapReduce framework. Our library builds on top of the Picard SAM JDK. The
library was released under the permissive MIT open source license and can be found on Sourceforge
(http://sourceforge.net/projects/hadoop-bam/).
We demonstrate the usability of the library by building a preprocessing stage for BAM file visualization, which was integrated into the Chipster Genome Browser. Chipster is a versatile open source
platform that provides tools for data analysis and visualization. This preprocessing tool uses the “Google
Earth” style MIP mapping technique to condense BAM files into summary files. Using the multilevel
summaries the genome browser can quickly navigate between different zoom levels.
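The multilevel ("MIP mapping") summaries mentioned above can be illustrated with a small sketch. It assumes, purely for illustration, that the summarized quantity is a per-bin coverage value and that each coarser level averages groups of adjacent bins; the function names are hypothetical and this is not the Hadoop-BAM or Chipster implementation.

```python
def build_summary_levels(coverage, factor=2):
    """Build progressively coarser summaries of a per-bin coverage list.

    Level 0 is the input; each further level averages `factor` adjacent bins
    of the previous level, until a single bin remains.
    """
    if factor < 2:
        raise ValueError("factor must be at least 2")
    levels = [list(coverage)]
    while len(levels[-1]) > 1:
        previous = levels[-1]
        coarser = []
        for start in range(0, len(previous), factor):
            chunk = previous[start:start + factor]
            coarser.append(sum(chunk) / len(chunk))
        levels.append(coarser)
    return levels


def pick_level(levels, max_points):
    """Return the finest level that has at most `max_points` bins."""
    for level in levels:
        if len(level) <= max_points:
            return level
    return levels[-1]
```

A browser zoomed far out would request a level with few bins, and a browser zoomed in would request a finer one, so no view has to scan the full-resolution data.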
The use of Hadoop-BAM yielded a significant decrease in running time of the preprocessing stage
for 50 GB data sets and also made the genome browser significantly faster, more usable, and more scalable. The
computing cluster used for our evaluation consists of 112 AMD Opteron 2.6 GHz compute nodes, each
equipped with 12 cores and 32-64 GB memory, resulting in a total size of 1344 cores. The nodes are
interconnected via an Infiniband and 1 GBit Ethernet infrastructure. The nodes have a joint total local
disk capacity of approx. 30 TB, in addition to 40 TB of work space on a central network file server.
[Figure 1, two panels: "Speedup of sorting times versus #worker nodes" and "Speedup of summarizing times versus #worker nodes". x-axis: Workers (1, 2, 4, 8); y-axis: speedup (0 to 16). Series: Ideal, Input file import, Sorting (left panel) or Summarizing (right panel), Output file export, Total elapsed.]
Figure 1: Speedup for increasing number of compute nodes
Figure 1 shows that in our experiments the speedup for sorting BAM files is approximately linear,
while the speedup for the computation of summary statistics is marginally worse. Preprocessing such
data without the distributed Hadoop backend would have been practically impossible on a modern
workstation alone. When considering Figure 1, it is interesting to note that as the number of worker
nodes increases, it becomes clear that one of the main bottlenecks of the system is in fact the import
and export of data to and from Hadoop. In the future, one may want to let large datasets reside inside
the cloud in order to avoid this overhead.
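The effect of the import/export bottleneck on the total speedup can be made concrete with a small calculation. The sketch below assumes, as a simplification not taken from the abstract, that the compute phase scales ideally with the number of workers while the transfer time stays constant; the timings used in the usage note are hypothetical.

```python
def speedup(time_one_worker, time_n_workers):
    """Classic speedup: single-worker time divided by n-worker time."""
    return time_one_worker / time_n_workers


def total_speedup(compute_time, transfer_time, workers):
    """Speedup of total elapsed time when only the compute phase scales.

    Assumption: `transfer_time` (import plus export) does not shrink as
    workers are added, while `compute_time` is divided evenly among them.
    """
    serial_total = compute_time + transfer_time
    parallel_total = compute_time / workers + transfer_time
    return speedup(serial_total, parallel_total)
```

For example, with a hypothetical 80 time units of computation and 20 units of data transfer, eight workers would give a total speedup of about 3.3 rather than 8, which is why keeping the data inside the cluster is attractive.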
We have also started to integrate Hadoop-BAM with the Pig query language. Preliminary results
show that in some cases it is possible to achieve performance that matches custom Java Hadoop code by
using the higher level Pig language.
Poster 19
SADI for GMOD: Bringing Model Organism Data onto the Semantic Web
Ben Vandervalk, E. Luke McCarthy, Mark D. Wilkinson
James Hogg Research Centre, Heart + Lung Institute, University of British Columbia
email: ben.vvalk@gmail.com
SADI Website: http://sadiframework.org
SHARE Website: http://biordf.net/cardioSHARE/query
Check out Perl Code: svn co http://sadi.googlecode.com/svn/trunk/Perl/sadi.gmod
Software License: GPL. However, Java code for SADI is released under the New BSD License.
In this talk, we will present a new extension for GMOD (Generic Model Organism Database)
called SADI for GMOD, which allows model organism sites to publish their data in RDF with
minimal effort. The RDF adaptor layer for the database is implemented using web services
conforming to the SADI (Semantic Automated Discovery and Integration) standard, which
consists of a set of best practices for implementing interoperable services. Among other
advantages, adhering to SADI ensures that services can be discovered in a predictable
and semantically-enriched way; applicable services for in-hand data can be automatically
detected and invoked; multiple inputs to a service can be batched into a single HTTP
request to minimize network traffic; and the behaviour of asynchronous services is standard
and conformant with HTTP Protocol header specifications.
The main value of SADI for GMOD is that it facilitates automated integration of bioinformatics
data and software across multiple sites. In this presentation, we will demonstrate how the SADI
for GMOD services can be used to assemble data across MODs and other bioinformatics
resources without the need to write custom Perl scripts. We will then proceed to a more
sophisticated demonstration where we will show how a software-based analysis of distributed
data from disparate GMOD sites (e.g. retrieve and align homologues of gene X in rat and
mouse) can be described and executed as a SPARQL query using our SHARE (Semantic
Health and Research Environment) query engine.
References
Mark D Wilkinson, Luke McCarthy, Benjamin Vandervalk, David Withers, Edward Kawas and
Soroush Samadian. “SADI, SHARE, and the in silico scientific method”, BMC Bioinformatics,
vol. […], Suppl. […]
Poster 20
Scufl2: because a workflow is more than its definition
Stian Soiland-Reyes, Alan R Williams, Stuart Owen, David Withers and Carole Goble
School of Computer Science, University of Manchester, UK
{soiland-reyes, alan.r.williams, stuart.owen, david.withers, carole.a.goble}@manchester.ac.uk
Project site: http://www.taverna.org.uk/
Source code: https://github.com/mygrid/scufl2 http://taverna.googlecode.com/
License: GNU Lesser General Public License (LGPL) 2.1
Taverna is a scientific workflow management system that has gained popularity amongst scientists in bioinformatics,
chemistry, astronomy and other domains. Since its inception in 2003, Taverna's workflow language has evolved from Scufl,
which was designed for the FreeFluo workflow engine. Scufl defines a directed acyclic graph of the data flow between
services which receive and produce data at ports. Scufl workflows are stored in a lightweight XML format.
t2flow adapted these concepts for the Taverna 2 workflow engine, including new capabilities such as extensible execution
control. The t2flow XML format was created as a serialisation of internal Java beans, which unfortunately makes it difficult
to consume or produce by non-Taverna clients compared to Scufl. There have also been growing demands for bundling a
workflow with related resources such as example data and semantic annotations. Addressing these issues culminated in
forming a new workflow language, Scufl2, which is presented here.
By adapting linked data technology and preservation methodologies for research objects, Scufl2 is not only a platform-
independent workflow language that can be inspected, modified, created and executed by third-party tools and systems; it
is extensible to allow the capture of workflow execution inputs and outputs, reference data sets, provenance, annotations,
documentation, publications, alternative formats or representations, and even embedded service implementations.
Scufl2 comes with a Java API independent of Taverna that can be used for programmatic access to read and write Scufl2
workflow bundles. A workflow bundle is a structured ZIP-file based on Adobe UCF, ePub OCF and the Open Document
Format (ODF), with the workflow definitions included as XML Schema-conformant documents which are also valid
RDF/XML. The constraints from the schema allow clients to read and write Scufl2 workflow definitions as regular
structured XML. Richer RDF-enabled clients may link workflow definitions with external resources and additional
annotations stored in separate RDF files in the bundle, perhaps using vocabularies such as Dublin Core.
The workflow structure is defined using an OWL ontology, and annotated with URIs so that third parties can form
semantic statements about any component of any Scufl2 workflow, for instance to say that a particular service produces
outputs of a certain type, or that a given data link was added by a different researcher to the builder of the rest of the
workflow. An unzipped workflow bundle can be made public using any standard web server and become part of the
Linked Open Data cloud.
Semantic annotations and a manifest for the bundle declare the purpose of, and links between, the different components
forming a workflow. This allows third parties to extract and append annotations about data and services used by the
workflow. The evolution of the workflow definition itself can also be included in these annotations, with references to
previous versions and authors.
In recent years, Taverna has become an interesting target for researchers investigating published workflows, such as those
found on myExperiment. Exploring these workflow definitions using the Scufl2 API can reveal compatibilities and implicit
data types of web services, providing new annotations for service registries like BioCatalogue. Sites and tools can add
additional descriptions of the workflow and the experiment directly to the Scufl2 workflow bundle without affecting
execution behaviour or requiring any deeper understanding of the format. They may for instance automatically insert links
to the original source and author in a downloaded bundle, which can be persisted even if the workflow is evolved further
and distributed elsewhere.
Using these capabilities to their full extent, a workflow bundle is a research object, including everything required for a full
re-enactment of the virtual experiment (possibly a full virtual machine in OVF format), full provenance and data sets of a
published workflow execution, and semantic descriptions of what are the purposes and origins of the produced values. By
also including a PDF representation of the paper arising from the experiment, Scufl2 captures a fully repeatable and
reproducible e-science publication.
As a general workflow language, scufl and t2flow have been the target of translations and generation, for example by
BioMoby's Seahawk, and executed on alternative engines, like MOTEUR. By defining the workflow language independently
from the execution engine, Scufl2 encourages such extensions and wider use. This presentation demonstrates how Scufl2
can be used from Ruby and Clojure programs to generate a Scufl2 workflow that can be executed by Taverna.
Funded by: EPSRC EP/G026238/1, European Commission's 7th FWP FP7-ICT-2007-6 270192, FP7-ICT-2007-6 270137
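The idea of a workflow bundle as a structured ZIP file, in the style of UCF/OCF/ODF containers, can be sketched with Python's standard `zipfile` module. This is a rough illustration rather than the Scufl2 API: the media type string and the `workflow/` and `annotation/` paths are assumptions made for the example, and a real bundle would also carry a manifest.

```python
import io
import zipfile

# Assumed media type for the bundle; treat it as illustrative.
MIMETYPE = "application/vnd.taverna.scufl2.workflow-bundle"


def write_bundle(workflow_rdf_xml, annotations=None):
    """Return the bytes of a minimal ZIP container for a workflow.

    Follows the UCF/ODF convention of storing an uncompressed `mimetype`
    entry first, then the workflow definition and any annotation files.
    The internal paths are hypothetical.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as bundle:
        # First entry, stored without compression so tools can sniff the type.
        bundle.writestr("mimetype", MIMETYPE,
                        compress_type=zipfile.ZIP_STORED)
        bundle.writestr("workflow/main.rdf", workflow_rdf_xml,
                        compress_type=zipfile.ZIP_DEFLATED)
        for name, rdf in (annotations or {}).items():
            # Annotations live in separate RDF files inside the bundle.
            bundle.writestr("annotation/" + name, rdf,
                            compress_type=zipfile.ZIP_DEFLATED)
    return buffer.getvalue()
```

Because annotations are separate files, a third-party tool could append a new annotation entry without touching the workflow definition, which is the property the abstract emphasizes.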
Poster 21
OntoCAT — an integrated programming toolkit for common ontology
application tasks
Tomasz Adamusiak∗1, Natalja Kurbatova1, Morris A. Swertz1,2, and Helen Parkinson1
1 European Bioinformatics Institute, Wellcome Trust Genome Campus, Cambridge, CB10 1SD, United Kingdom
2 Genomics Coordination Center, Department of Genetics, University Medical Center Groningen and Groningen
Bioinformatics Center, University of Groningen, P.O. Box 30001, 9700 RB, Groningen, The Netherlands
Availability
Website: www.ontocat.org
Source: www.ontocat.org/svn
License: LGPLv.3
1 Introduction
Ontologies are essential to data integration, query expansion,
and modelling biological knowledge in life sciences. Two major public ontology repositories provide programmatic access: the EBI Ontology Lookup Service (OLS) [1] and the
NCBO BioPortal [5]. Many users also develop local ontologies, so it is important to integrate queries to local files.
However, it is relatively difficult to connect to each of them,
in particular because these resources are still evolving or require considerable experience with ontologies themselves.
Therefore, we developed OntoCAT, a software toolkit
that provides high level abstraction for interacting with ontology resources including local files in standard OWL and
OBO formats (via OWL API [2]), and public ontology repositories: EBI OLS and NCBO BioPortal. The requirements
for these were based on our own use cases of Experimental
Factor Ontology (EFO) development, ArrayExpress and
MOLGENIS data annotation and analysis, and on user feedback. Since its inception in 2010, the Java package alone has
seen 22 releases. The most recent progress includes the implementation of reasoning for querying relations other than
subsumption (e.g. partonomy). This is enabled for local ontologies via the HermiT reasoner [4], which supports knowledge
bases expressed in SROIQ(D), the description logic underpinning OWL 2 (see also www.ontocat.org/wiki/Reasoning),
and for OLS, which provides a dedicated web service.
2 Implementation
The library is implemented in Java 6 and is available under
the permissive LGPLv3 license. OntoCAT can also be used
via other interfaces including a web-based ontology database
and browser, scriptable REST service, and Google App application.
OntoCAT was designed to support simple use cases in
an easy to implement way, while still enabling the implementation of advanced algorithms.
Many such common tasks are demonstrated in code examples available at
www.ontocat.org. A complete list of available ontology,
term, and hierarchy methods, named in a self-describing
manner, includes: getOntologies(), getOntology(), searchAll(), searchOntology(), getTerm(),
getAllTerms(), getAnnotations(), getSynonyms(), getDefinitions(), getRootTerms(), getTermPath(),
getChildren(), getParents(), getAllChildren(), getAllParents(), getRelations().
OntoCAT follows the convention over configuration
design approach, i.e., requiring minimal configuration where
possible. FileOntologyService, OlsOntologyService, and BioportalOntologyService are the core
objects for working with OWL and OBO ontologies, EBI OLS, and NCBO
BioPortal respectively. Because each ontology service implements the same OntologyService
interface, these core services can then be combined or extended to provide additional behaviour
by adding a wrapper (decorator), e.g.:
combination of multiple ontology resources into one service (CompositeServiceDecorator), limiting and ranking of
search results (SortedSubsetDecorator), translating one ontology namespace to another (TranslatedOntologyService),
Ehcache-based enterprise-grade caching (CachedServiceDecorator), or enabling reasoner support (ReasonedFileOntologyService).
The current repertoire of supported ontology resources
could easily be extended for other resources such as DAML,
Protégé-OWL API, ONKI API, or OntoSelect. Such services
would only need to implement the OntologyService interface
to immediately become aligned with pre-existing resources
and allow for their seamless interchangeability.
3 Applications
OntoCAT is being used by the ontocat Bioconductor/R
package [3] and the concept recognition tool Zooma
(zooma.sf.net).
Acknowledgements
This work was supported by the European Community’s
Seventh Framework Programmes GEN2PHEN [grant number 200754], SLING [grant number 226073], and SYBARIS
[grant number 242220]; The European Molecular Biology
Laboratory; the Netherlands Organisation for Scientific Research [NWO/Rubicon grant number 825.09.008]; and the
Netherlands Bioinformatics Centre [BioAssist/Biobanking
platform and BioRange grant SP1.2.3].
Bibliography
[1] Côté, R. G., Jones, P., Martens, L., Apweiler, R., and Hermjakob, H. (2008). The Ontology Lookup Service: more data
and better tools for controlled vocabulary queries. Nucleic
Acids Res, 36(Web Server issue), W372–W376.
[2] Horridge, M. and Bechhofer, S. (2009). The OWL API: A
Java API for Working with OWL 2 Ontologies. In OWLED
2009, 6th OWL Experiences and Directions Workshop,
Chantilly, Virginia.
[3] Kurbatova, N., Adamusiak, T., Kurnosov, P., Swertz, M. A.,
and Kapushesky, M. (in press). ontoCAT: an R package for
ontology traversal and search. Bioinformatics.
[4] Motik, B., Shearer, R., and Horrocks, I. (2009). Hypertableau Reasoning for Description Logics. Journal of Artificial Intelligence Research, 36, 165–228.
[5] Noy, N. F., Shah, N. H., Whetzel, P. L., Dai, B., Dorf, M.,
Griffith, N., Jonquet, C., Rubin, D. L., Storey, M.-A., Chute,
C. G., and Musen, M. A. (2009). BioPortal: ontologies and
integrated data resources at the click of a mouse. Nucleic
Acids Res, 37(Web Server issue), W170–W173.
∗ To whom correspondence should be addressed: tomasz@ebi.ac.uk
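The decorator-based composition described in the OntoCAT abstract (every service implementing one common interface, with wrappers adding behaviour) can be sketched as follows. OntoCAT itself is a Java library; this is a Python analogue written only to illustrate the pattern, and every class and method name below is hypothetical rather than part of the OntoCAT API.

```python
class OntologyService:
    """Common interface: every service answers search_all(query)."""

    def search_all(self, query):
        raise NotImplementedError


class InMemoryService(OntologyService):
    """Stand-in for a concrete resource (a local file or a remote repository)."""

    def __init__(self, terms):
        self._terms = list(terms)

    def search_all(self, query):
        needle = query.lower()
        return [term for term in self._terms if needle in term.lower()]


class CompositeDecorator(OntologyService):
    """Combines several services into one by concatenating their results."""

    def __init__(self, *services):
        self._services = services

    def search_all(self, query):
        results = []
        for service in self._services:
            results.extend(service.search_all(query))
        return results


class CachedDecorator(OntologyService):
    """Remembers earlier answers so repeated queries skip the wrapped service."""

    def __init__(self, service):
        self._service = service
        self._cache = {}

    def search_all(self, query):
        if query not in self._cache:
            self._cache[query] = self._service.search_all(query)
        return list(self._cache[query])
```

Because each wrapper is itself an `OntologyService`, wrappers can be stacked in any order, for example `CachedDecorator(CompositeDecorator(local, remote))`, and client code does not need to know which resources sit underneath.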
Poster 22
Debian Med: individuals' expertise and their sharing of package build instructions
Steffen Möller1,*, Tim Booth2, Pjotr Prins3, Hajo Krabbenhöft1, Alan Williams4, Alex Mestiashvili5, Peter Rice6, Olivier
Sallou7, Yaroslav O. Halchenko8, Richard Holland9, William Spooner9, James Procter10, Thorsten Alteholz*, Charles
Plessy*, Aaron M. Ucko*, Andreas Tille*
1 University of Lübeck, Department of Dermatology, Ratzeburger Allee 160, 23538 Lübeck, Germany
2 NERC Environmental Bioinformatics Centre, Centre for Ecology and Hydrology, Maclean Bldg, Benson Lane, Crowmarsh Gifford, Wallingford OX10 8BB, UK
3 Wageningen University, Laboratory of Nematology, Droevendaalsesteeg 1, 6708 PB Wageningen, The Netherlands
4 School of Computer Science, University of Manchester, Oxford Road, Manchester M13 9PL, UK
5 Biotec, TU Dresden, Germany
6 European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
7 Symbiose, INRIA/Irisa - Campus de Beaulieu, 35042 Rennes, France
8 Dartmouth College, Psychology and Brain Sciences Department, Hinman Box 6207, Hanover, NH 03755, USA
9 Eagle Genomics, Babraham Research Campus, Cambridge CB22 3AT, UK
10 College of Life Sciences, University of Dundee, Dundee, Scotland, DD1 5EH, UK
* Debian Society
Project URL: http://wiki.debian.org/DebianMed
License: Various, all DFSG compliant (http://www.debian.org/social_contract#guidelines)
“Debian” is the name for a distribution of readily usable software for the Linux or FreeBSD kernels. It is
maintained by a world-wide network of volunteers that forms the “Debian Society”. Open to all individuals
interested in contributing to a steadily improving compute environment, it has found many users including
those involved in scientific computing. This presentation gives an update on the latest bioinformatics
packages that have been added to Debian and ongoing developments to foster collaboration.
The more we know about cellular biology and the better our community becomes at abstracting and formally
storing our insights and findings, the more software is created to aid research and understanding.
The Debian Med community works to bring those tools from researchers' websites to widespread application
by researchers running Debian in dry- and wet-labs. We also help users find the tools they need, e.g.
by pointing to tools suitable as companions to every provided software package and by including
documentation in a standard location. The Debian Med task pages and mailing list provide information on the
preferred robust tools and a forum for discussing specific problems. Making the tools and information directly
available to students of all ages and backgrounds opens the door for talented individuals to develop
bioinformatics skills.
BOSC 2010, its pre-meetings and a Debian-organised meeting on bioinformatics have provided impetus
for our efforts. Our newly-found industrial partner, Eagle Genomics, academic groups like NERC with
NEBC Bio-Linux, system administrators and researchers alike all contribute to the same Subversion and
Git source trees to co-maintain their selected packages. Last year, we packaged Ensembl, additional
genome assemblers like Mira, many tools for evolutionary sequence analysis and next-generation
sequencing centring on Qiime, and more. Many libraries have been added to the archive as run- and/or
build-time dependencies. Existing packages were updated or improved to address outstanding issues, such as
build failures on more exotic platforms, compatibility with current compilers, documentation, translations.
Where appropriate, functional patches were sent to the upstream developers.
Most Debian Med packages for computational biology become directly installable through the regular
servers of Debian. Many others, however, were prepared to satisfy an immediate local need and have not
been finalised for upload to the Debian distribution. Interestingly, the build instructions of many such
packages were nonetheless found to be of strong interest to the community. This is especially true when
the software is no longer maintained and requires patching for compatibility with modern compilers, and it
gives additional confidence to have someone with whom to exchange concerns about the effect of installing
alternative libraries or a speed-up by optimizing compiler switches for a particular platform.
Publicly sharing build instructions with Debian Med is also possible for software that is not redistributed
with Debian, e.g. because of a restrictive license. Prime examples are seen in structural biology with VMD
or Rosetta. Those tools are powerful, but have an enormous source tree which provides many
opportunities to make errors during their installation. Access to the immediately executable build instructions at
Debian Med, therefore, saves time (regardless of one's experience) - much more time than it would cost
to send in an improvement. Everybody is invited to contribute according to their ability and interest.
Poster 23
The BALL project: The Biochemical Algorithms Library
(BALL) for Rapid Application Development in Structural
Bioinformatics and its graphical user interface BALLView
A. Hildebrandt3,4, A. K. Dehof1, D. Stöckel1, S. Nickels1, S. Müller1, M. Schumann2,
H. P. Lenhof1, O. Kohlbacher2
1 Center for Bioinformatics, Saarland University, 2 Center for Bioinformatics,
University of Tübingen, 3 Johannes-Gutenberg-Universität Mainz,
4 ahildebr@uni-mainz.de
Project website: www.ball-project.org
License: LGPL (BALL), GPL (BALLView)
Abstract: Developing programs for structural bioinformatics is a difficult and often tedious
task. Even if the algorithms have been carefully designed, the programmer has to solve a
variety of complex and recurring problems not fundamentally related to the algorithm at hand,
but necessary for real-world applications. With the Biochemical Algorithms Library (BALL),
we present a versatile C++ class library for structural bioinformatics that is supplemented
with a Python interface for scripting functionality and a number of applications like the
molecular modeler BALLView.
In recent years, BALL has seen a significant increase in functionality and substantial usability
improvements. It has been ported to further operating systems; indeed, it currently supports
all major ones. Moreover, BALL has evolved from a commercial product into free-of-charge, open source software licensed under the GNU Lesser General Public License (LGPL).
Recently, binary packages for BALL have been accepted for inclusion into the Debian
distribution, enabling simple installation on Debian platforms and all of its derivatives.
The current version (1.4.0 at the time of writing) contains more than 730 classes and more
than 700,000 lines of code. The provided functionality covers a large number of common file
formats in structural bioinformatics (pdb, mol2, hin to name a few), an extensive set of data
structures and algorithms targeting molecular modeling and computational biology as well as
several force field implementations.
The graphical user interface BALLView provides access to a large part of the functionality of
the underlying library BALL and its VIEW component, which focuses on molecular
visualization. Recently, the RTFact library was integrated into BALLView, allowing for
real-time ray tracing within the modeller. High-quality stereoscopic visualization is
provided as well. BALLView is often used for teaching purposes, as the integrated Python
interpreter allows direct access to the presented data.
BALL and BALLView are currently developed at the Center for Bioinformatics of Saarland
University, the Center for Bioinformatics of the University of Tübingen, and the Johannes-Gutenberg University Mainz.
References:
A. Hildebrandt, A.K. Dehof, A. Rurainski, A. Bertsch, M. Schumann, N.C. Toussaint, A. Moll,
D. Stockel, S. Nickels, S.C. Mueller, H.P. Lenhof, O. Kohlbacher: "BALL - Biochemical
Algorithms Library 1.3", 2010, BMC Bioinformatics, 11:531
A. Moll, A. Hildebrandt, H.P. Lenhof, and O. Kohlbacher: "BALLView: A tool for research
and education in molecular modeling", 2006, Bioinformatics, 22(3):365-366
Poster 24
Biopython Project Update
Peter Cock∗, Brad Chapman†, et al.
Bioinformatics Open Source Conference (BOSC) 2011, Vienna, Austria
Website: http://biopython.org
Repository: https://github.com/biopython/biopython
License: Biopython License Agreement (MIT style, see http://www.biopython.org/DIST/LICENSE)
In this talk we present the current status of the Biopython project, a long-running distributed collaboration
producing a freely available Python library for biological computation (Cock et al., 2009). Biopython is
supported by the Open Bioinformatics Foundation (OBF).
Since BOSC 2010, we have made three releases. Touching on key functionality, Biopython 1.55 (August
2010) made our command line tool wrappers directly executable, Biopython 1.56 (November 2010) added
a UniProt XML parser, and Biopython 1.57 (April 2011) added an SQLite based indexer for sequence flat
files which allows the use of indexes too big to hold in memory. All releases have seen more unit tests, more
documentation, and more new contributors.
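The idea behind an SQLite-based index for sequence flat files, where record offsets are kept on disk instead of in memory, can be sketched with the standard library alone. Biopython's own entry point for this feature is `Bio.SeqIO.index_db`; the code below is not that implementation, only a minimal illustration for FASTA files with hypothetical function and table names.

```python
import sqlite3


def build_index(fasta_path, index_path):
    """Record the byte offset of every FASTA header in an SQLite table."""
    conn = sqlite3.connect(index_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS offsets "
        "(record_id TEXT PRIMARY KEY, file_offset INTEGER)"
    )
    with open(fasta_path, "rb") as handle:
        while True:
            offset = handle.tell()
            line = handle.readline()
            if not line:
                break
            if line.startswith(b">"):
                parts = line[1:].split()
                if parts:
                    # The identifier is the first word of the header line.
                    conn.execute(
                        "INSERT OR REPLACE INTO offsets VALUES (?, ?)",
                        (parts[0].decode("ascii"), offset),
                    )
    conn.commit()
    return conn


def fetch_record(fasta_path, conn, record_id):
    """Seek straight to one record and return its sequence as a string."""
    row = conn.execute(
        "SELECT file_offset FROM offsets WHERE record_id = ?", (record_id,)
    ).fetchone()
    if row is None:
        raise KeyError(record_id)
    with open(fasta_path, "rb") as handle:
        handle.seek(row[0])
        handle.readline()  # skip the header line
        chunks = []
        for line in handle:
            if line.startswith(b">"):
                break
            chunks.append(line.strip())
    return b"".join(chunks).decode("ascii")
```

Only the identifier-to-offset table is stored, so the index stays small relative to the sequence file and lookups do not require loading the file or the index into memory.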
In summer 2010 we had one Google Summer of Code (GSoC) project student, Joao Rodrigues, who worked
on protein structure (PDB) code for Biopython. Some of Joao’s work has already been included in Biopython
releases, and he and his mentor Eric Talevich (himself a GSoC 2009 student) are working to merge the rest
of this work. Three new students have been accepted to work on Biopython for GSoC 2011.
For the last six months, we have been running a BuildBot server (see http://buildbot.net/), with
the offline parts of the Biopython unit test suite scheduled every night on build slaves running on Linux,
Windows and Mac OS X, and both “C” Python and Jython (using the Java virtual machine). This has
been beneficial in catching platform specific regressions – for example, under Python 3 which we are still
working towards fully supporting. The BuildBot server is running on an OBF-maintained Amazon cloud
server, while the slaves are initially all machines maintained by individual Biopython developers.
References
Cock, P.J.A., Antao, T., Chang, J.T., Chapman, B.A., Cox, C.J., Dalke, A., Friedberg, I., Hamelryck, T.,
Kauff, F., Wilczynski, B., de Hoon, M.J. (2009) Biopython: freely available Python tools for computational
molecular biology and bioinformatics. Bioinformatics 25(11) 1422-3. doi:10.1093/bioinformatics/btp163
∗ Plant Pathology, James Hutton Institute (formerly SCRI), Invergowrie, Dundee DD2 5DA, UK – p.j.a.cock@googlemail.com
† Bioinformatics Core Facility, Harvard School of Public Health, Harvard University, Boston, MA, USA
What's New with GMOD
Scott Cain1 and the GMOD Consortium
1 The Ontario Institute for Cancer Research, Toronto, ON, Canada
Here we present an update for the GMOD project, and its central software product, the Chado database
schema. Chado is an organism-agnostic database schema that interoperates with several other GMOD
software tools, including GBrowse, Apollo, Tripal, and MAKER. Recent additions to Chado and the
GMOD project include a natural diversity module for Chado, a GMOD on the cloud effort, and
increased coverage of Chado in Tripal.
http://gmod.org/
Downloads:
Chado: http://sourceforge.net/projects/gmod/files/, Artistic License […]
Tripal: http://tripal.sourceforge.net/?q=download[…]Download, GNU v.[…]
Poster 25
Title: Exploring human variation data with Clojure
Author: Brad Chapman
Affiliation: Harvard School of Public Health, Boston, MA
Contact: chapmanb@50mail.com
URL: http://ourvar.com
Code: https://github.com/chapmanb/r-var
License: MIT
Direct-to-consumer genetics companies such as 23andMe give non-specialists
access to sequence data. This is a powerful way to democratize research, and
I undertook a small project to make autoimmune disease variation associations
more accessible to the general public. The goal was to provide a system on
top of Ensembl, SNPedia and PubMed that prioritizes interesting variants and
allows users to share their experiences in the context of these variations.
The system is implemented on Google App Engine using Clojure; Clojure is a
functional programming language with a Lisp syntax built on top of the Java
Virtual Machine.
In the process of implementing the prototype, I discovered several characteristics
of the Clojure language that influenced my work in Python and other languages,
and wanted to share these with the open bioinformatics community:
• Easy incorporation of external libraries; this makes re-use simple and encourages
community development of small packages that can be linked together.
• Emphasis on functional code free of side effects. This eases the transition
to parallel programming using multiple cores and larger frameworks such
as Hadoop map/reduce.
• The code structure encourages a separation of assignment and action. This
change in philosophy enables production of more configurable and
easier-to-understand code.
• An emphasis on domain-specific languages facilitates development of
higher-level APIs and abstraction, improving code re-use.
By sharing, I hope to encourage you to explore Clojure and incorporate some
of its better features into your language of choice.
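As an illustration of the second point in the list above, carried over to Python, the following minimal sketch shows why side-effect-free functions parallelize easily. The names gc_content and map_pure are hypothetical and are not taken from the r-var code base; the point is only that a pure function can be mapped serially or through an executor without changing its result.

```python
from concurrent.futures import ThreadPoolExecutor


def gc_content(seq):
    """Pure function: the result depends only on the argument; nothing is mutated."""
    if not seq:
        return 0.0
    upper = seq.upper()
    return (upper.count("G") + upper.count("C")) / len(upper)


def map_pure(func, items, executor=None):
    """Apply func to every item, serially or through a concurrent.futures executor."""
    if executor is None:
        return [func(item) for item in items]
    return list(executor.map(func, items))


sequences = ["GGCC", "ATGC", "ATAT"]

# Serial evaluation.
serial = map_pure(gc_content, sequences)

# The same call routed through a worker pool; no locking is needed
# because gc_content shares no state between calls.
with ThreadPoolExecutor(max_workers=2) as pool:
    pooled = map_pure(gc_content, sequences, executor=pool)
```

A ProcessPoolExecutor exposes the same map interface and could be passed instead to spread CPU-bound work over several cores, provided the mapped function is defined at module level.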
EMBOSS: New developments and extended data access
Peter Rice (pmr@ebi.ac.uk), Alan Bleasby, Jon Ison, Mahmut Uludag
European Bioinformatics Institute, Hinxton, Cambridge, CB10 1SD, United Kingdom.
The European Molecular Biology Open Software Suite (EMBOSS) is a mature
package of software tools developed for the molecular biology community. It
includes a comprehensive set of applications for molecular sequence analysis
and other tasks and integrates popular third-party software packages
under a consistent interface. EMBOSS includes extensive C programming
libraries and is a platform to develop and release software in the true open
source spirit.
A major new stable version is released each year and the current source code
tree can be downloaded via CVS. All code is open source and licensed for use
by everyone under the GNU Software licenses (GPL with LGPL library code).
There have been many tens of thousands of downloads including site-wide
installations all over the world since the project inception. EMBOSS is used
extensively in production environments reflecting its mature status and has been
incorporated into many web-based, standalone graphical and workflow interfaces
including Galaxy, wEMBOSS, EMBOSS Explorer, JEMBOSS, SoapLab, SRS,
Taverna and several commercial workflow packages.
EMBOSS 6.4 will be released on 15th July 2011 (we always release on 15th
July). We have made a major effort to add new features and applications for this
release. New features include:
• Use of ontologies - fully integrated new EDAM ontology for data types and methods in
EMBOSS, and for metadata annotation of public data resources
• Standard definitions for 1000+ data resources for all users
• Definition of "servers" as common access to multiple data resources
• Simple alias names for command-line access to data resources
• Catalogue of public data resources with web APIs
• Integration of NCBI taxonomy with taxon annotation of data resources and entries
• Retrieval of multiple datatypes - sequence, feature, ontology term, taxonomy, data
resource description, sequence assembly … and plain text from any other source
• SOAP, REST, DAS, BioMart, Ensembl protocol support
• New query language combining multiple queries for any data resource
• Full support for integration in Galaxy
• Improved adherence to data format standards, e.g. GFF3
• Applications to index, retrieve, utilize and analyze new data types
• Books published by Cambridge University Press
• E-Learning courses reusing the book text, supplemented by new material.
Project home page: http://emboss.open-bio.org/
Release download site: ftp://emboss.open-bio.org/pub/EMBOSS/
Anonymous CVS server: http://www.open-bio.org/wiki/SourceCode
Poster 26
G-language Project: the last 10 years and beyond
Kazuharu Arakawa1 (gaou@sfc.keio.ac.jp)
1 Institute for Advanced Biosciences, Keio University, Fujisawa, 252-8520, Japan
URL(project): http://www.g-language.org/
URL(code): http://sourceforge.jp/projects/glang/releases/
License: GNU General Public License v.2
Started in 2001, the G-language Project has been developing a series of open-source
software tools for bioinformatics research on genomes, with a particular focus on those of bacteria.
The main software projects are:
• Genome Analysis Environment (http://www.g-language.org/) provides Perl libraries and a UNIX
shell interface for basic I/O of biological data, as well as more than 100 analysis tools.
• Genome Projector (http://www.g-language.org/GenomeProjector/) provides an online zoomable
browser of bacterial genomes using the Google Maps API.
• Pathway Projector (http://www.g-language.org/PathwayProjector/) provides an online zoomable
browser of biochemical pathways using KEGG maps and the Google Maps API.
• G-language Bookmarklet (http://www.g-language.org/wiki/bookmarklet) is a bookmarklet that
enables quick navigation to online bioinformatics resources, including database searches and the
use of web services.
• Keio Bioinformatics Web Service (http://www.g-language.org/kbws/) provides a collection of
about 50 web service tools as EMBOSS commands.
We would like to give an overview of our achievements over the last 10 years, and discuss future
directions of the project.
References:
1. Arakawa K, Mori K, Ikeda K, Matsuzaki T, Kobayashi Y, Tomita M, "G-language Genome
Analysis Environment: a workbench for nucleotide sequence data mining", Bioinformatics,
2003, 19(2):305-306.
2. Arakawa K, Tomita M, "G-language System as a platform for large-scale analysis of
high-throughput omics data", Journal of Pesticide Science, 2006, 31(3):282-288.
3. Arakawa K, Suzuki H, Tomita M, "Computational Genome Analysis Using The G-language
System", Genes, Genomes and Genomics, 2008, 2(1): 1-13.
4. Arakawa K, Kido N, Oshita K, Tomita M, "G-language genome analysis environment with
REST and SOAP web service interfaces", Nucleic Acids Res., 2010, 38 Suppl:W700-705
A Framework for Bioinformatics on the Microsoft Platform
Simon Mercer, Michael Zyskowski and Bob Davidson
Microsoft Research Connections, One Microsoft Way, Redmond, WA 98052, USA
(simon.mercer@microsoft.com)
Main project webpage: http://research.microsoft.com/bio
Source code available at: http://mbf.codeplex.com
Open Source license: Apache 2.0

Microsoft Research began the development of a library of basic bioinformatics functions in 2008,
written in C# and built on the .NET framework. The first version of the Microsoft Biology Foundation
(MBF) was launched in July 2010, and is designed to serve the needs of the bioinformatician by
providing pre-written basic functionality such as file parsers, implementations of common algorithms for
the manipulation of DNA, RNA and protein sequences, and connectors to commonly-used web services.
MBF was conceived of as an open source project from the beginning, and released under the OSI-approved
MS-PL license. Work has continued since the launch of version 1.0, with the establishment of a
growing community of users, some of whom are playing an active role in development through the
contribution of source code, bug reports and patches.
Work on a second version of the library is ongoing, and will result in the launch of version 2.0 in the
summer of 2011. In addition to enhancements to the performance and capacity of the basic features
contained in version 1, the new version will provide a range of new features and demo applications
including access to advanced math functions, a comparative DNA sequence assembler, a range of
command-line tools and a tool showing how the DeepZoom visualization technology can be used to
browse a genome quickly and intuitively.
Our ongoing efforts to promote the growth of an open source community around MBF include the
establishment of community forums at http://mbf.codeplex.com and the creation of a Technical
Advisory Board consisting of academic and commercial groups using MBF in their work. Most
importantly, we plan to transfer ownership of the project from Microsoft to the OuterCurve
Foundation (http://www.outercurve.org/), making Microsoft one contributor among many to a common
open source project.
This presentation will report on the growth of the Microsoft Biology Foundation community, the
challenges of running an open source project in a commercial environment, and plans for the next phase
of development.

A tool kit for pre-processing genome annotations in Generic Feature Format
(GFF)

Vipin T. Sreedharan
Machine learning in Biology AG-Raetsch
Friedrich Miescher Laboratory of the Max Planck Society
Spemannstrasse 39, 72076
Tuebingen, Germany
vipin.ts@tuebingen.mpg.de

URL for GFF toolkit: http://galaxy.tuebingen.mpg.de/  -> GFF ToolKit
URL for source code: http://community.g2.bx.psu.edu/

The Generic Feature Format (GFF) is a tab-delimited flat file format
describing a hierarchical grouping of genomic features and sub-features annotated
for a particular genome, routinely provided by different centers around the globe.
Although this is a widely accepted format, there are slight differences between the
files from different centers, which will in turn lead to interruptions in the
downstream analysis. Today, in the next generation sequencing era, the
developments in deep-sequencing technologies like RNA-Sequencing (RNA-Seq)
provide us with a more precise measurement of levels of the transcriptome in
different developmental stages. Programs such as rQuant, rDiff, Cuffdiff, mTIM and
Cufflinks can be used for estimating the abundance of a given set of transcripts,
differential transcript expression testing and de-novo assembly of transcripts from
RNA-Seq data. Often, the above programs require a little prior knowledge of an
organism's known transcripts in a GFF-related format.

Here, we are addressing two main issues that arise at the initial
stage of the above types of RNA-Seq analysis work; they are:
1) the above programs are quite strict about the specific format of transcript
annotation they accept, and
2) if biologists would like to evaluate the methods above, which require different file
formats, the compatibility of formats is crucial.

To tackle the above problems we integrated a collection of tools in our Galaxy
service, which gives you a better understanding of the annotation files before you
proceed with RNA-Seq analysis. The tools in the first section validate the GFF format
and return a report that gives details on the file contents. For example, the
feature-identifier mapping in GFF files has a potential role in connecting sub-features to the
related main feature. The available GFF parsing programs are heavily dependent on
feature-identifier mapping to classify different levels of features and sub-features.
Moreover, measuring the differential gene expression or read abundance for a set
of transcripts based on files from the UCSC Genome Center seems to be difficult
because in those GFF files, different transcript annotations from the same loci of the
genome are not merged, even though they may belong to the same gene. Our
program called MergeLoci is able to merge several transcripts from a single locus to
Poster 27
ADDAPTS: A Data-Driven Automated Pipeline and Tracking System
Risha Narayan1,2, Kim Rutherford1, Rishi Nag1, Krys Kelly1
1 Department of Plant Sciences, University of Cambridge,
Downing Street, Cambridge, CB2 3EA, UK
2 rvn21@cam.ac.uk
Project website & source code: http://www.sirocco-project.eu/pipeline/
Open Source License used: GPL v3
Much effort goes into handling the large amount of data produced by current sequencing
technologies. In order to manage the raw data and the data about the data (the metadata),
we saw a need for a database-backed automated processing pipeline with a web front end.
We are developing ADDAPTS, a Data-Driven Automated Pipeline and Tracking System.
The ADDAPTS relational database stores metadata about the samples and their associated
data files. It interacts with the local sequencing centre's LIMS to initiate sequencing
runs and later, to retrieve sequencing reads.
A controller process monitors the database and starts new pipeline jobs when appropriate.
The dependencies between pipeline tasks are configured in the database itself, and can be
modified without pausing the pipeline.
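A minimal sketch of this idea follows, assuming a much simpler schema than ADDAPTS actually uses (the abstract does not describe it). The table names task and dependency, the status values and the functions runnable_tasks and controller_step are all hypothetical. It shows how a polling controller can derive "what may run now" from rows in the database, so that adding a dependency row changes behaviour on the next poll without stopping anything.

```python
import sqlite3

# Hypothetical schema: one row per task, one row per "task requires other task" edge.
SCHEMA = """
CREATE TABLE task (
    name   TEXT PRIMARY KEY,
    status TEXT NOT NULL DEFAULT 'pending'
);
CREATE TABLE dependency (
    task     TEXT NOT NULL REFERENCES task(name),
    requires TEXT NOT NULL REFERENCES task(name)
);
"""


def make_db():
    """Create an in-memory database with the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def runnable_tasks(conn):
    """Pending tasks whose prerequisites are all marked 'done'."""
    rows = conn.execute(
        """
        SELECT t.name FROM task t
        WHERE t.status = 'pending'
          AND NOT EXISTS (
              SELECT 1 FROM dependency d
              JOIN task r ON r.name = d.requires
              WHERE d.task = t.name AND r.status != 'done'
          )
        ORDER BY t.name
        """
    )
    return [row[0] for row in rows]


def controller_step(conn, run):
    """One polling cycle: run every currently runnable task and mark it done."""
    started = runnable_tasks(conn)
    for name in started:
        run(name)
        conn.execute("UPDATE task SET status = 'done' WHERE name = ?", (name,))
    conn.commit()
    return started
```

Because the dependency graph is read afresh on every cycle, inserting a new task or dependency row between two calls to controller_step takes effect at the next call.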
The pipeline takes raw sequencing output files in FASTQ format as input; the full analysis
results in alignments viewable in the GBrowse genome viewer. The pipeline carries out
the following processes: it de-multiplexes files from multiplexed sequencing runs; removes
small RNA adapters or clips sequence reads as appropriate; filters reads by size; generates
output files and statistics for each stage of the analysis; aligns the reads against an
appropriate reference genome; converts the resulting alignment files into several formats,
such as GFF3, SAM and BAM (using SAMtools); and generates BAM indexes for
GBrowse. It supports data generated using Illumina and 454 sequencing technologies, and
various sequencing applications (e.g. small RNA, RNA-seq, ChIP-seq, genomic DNA).
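Two of these stages, adapter removal and size filtering, can be sketched in a few lines of Python. This is an illustrative simplification rather than the ADDAPTS implementation: the function names are hypothetical, the adapter is matched exactly (real trimmers tolerate mismatches and partial adapters), and the adapter sequence and size limits are supplied by the caller.

```python
def parse_fastq(lines):
    """Yield (name, sequence, quality) tuples from four-line FASTQ records."""
    it = iter(lines)
    while True:
        header = next(it, None)
        if header is None:
            return
        header = header.rstrip("\n")
        if not header:
            continue  # tolerate blank lines between records
        rest = [next(it, None) for _ in range(3)]
        if any(line is None for line in rest):
            raise ValueError("truncated FASTQ record")
        seq, plus, qual = (line.rstrip("\n") for line in rest)
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError("malformed FASTQ record: " + header)
        if len(seq) != len(qual):
            raise ValueError("sequence/quality length mismatch: " + header)
        yield header[1:], seq, qual


def clip_adapter(seq, qual, adapter):
    """Cut the read at the first exact occurrence of the adapter, if present."""
    pos = seq.find(adapter)
    if pos == -1:
        return seq, qual
    return seq[:pos], qual[:pos]


def filter_by_size(reads, min_len, max_len):
    """Keep reads whose length lies in the closed interval [min_len, max_len]."""
    for name, seq, qual in reads:
        if min_len <= len(seq) <= max_len:
            yield name, seq, qual


def process(lines, adapter, min_len, max_len):
    """Clip adapters, then filter by size; returns a list of surviving reads."""
    clipped = (
        (name, *clip_adapter(seq, qual, adapter))
        for name, seq, qual in parse_fastq(lines)
    )
    return list(filter_by_size(clipped, min_len, max_len))
```

Writing each stage as a generator keeps memory use flat for large files and lets per-stage statistics be collected by wrapping any stage in a counter.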
The tracking system provides a web front end for entering, viewing and editing metadata. It
also generates charts and statistics for each sample, provides links to the files generated
from the pipeline analysis, and generates global reports for all samples.
Poster 28
BALLView: A versatile molecular visualization and modeling tool
S. Nickels1,2, A.K. Dehof1, D. Stöckel1, S. C. Müller1,2, L. Marsalek2, I. Georgiev2,
H.P. Lenhof1, O. Kohlbacher1,3, A. Hildebrandt2,4
1 Zentrum für Bioinformatik Saar, 2 Intel Visual Computing Institute,
3 Eberhard Karls Universität Tübingen, 4 Johannes-Gutenberg-Universität Mainz
Project website: http://www.ballview.org
Source code: http://www.ballview.org/Downloads or http://ball-trac.bioinf.uni-sb.de/browser
License: BALLView is licensed under GNU Public License (GPL)
Molecular viewing and editing tools are an important part of many application scenarios and
processes in structural bioinformatics, computational chemistry, and pharmacy. Successful molecular
modeling tools provide reliable modeling functionality, an intuitive graphical user interface,
state-of-the-art graphics and extensive documentation.
Here, we present BALLView [1,2], a versatile and flexible molecular visualization and editing tool
that is based on the Biochemical Algorithms Library (BALL) [3]. BALLView supports all major
molecular file formats and is able to visualize molecular structures using a variety of different
representations such as ball-and-stick, cartoon, surface, and volumetric models. Via its GUI,
BALLView provides modeling functionality from the underlying BALL library like structure
validation processors and molecular mechanics computations using different force fields. In addition,
the full potential of the BALL library can be accessed through a Python scripting interface.
An extended version of BALLView [4] uses real-time ray tracing for displaying molecular structures.
The use of real-time ray tracing in combination with 3D stereo visualization provides a much better
structural perception and allows for fast and direct creation of publication quality images.
The current version of BALLView (1.4.0) as well as the BALL library are available for all major
platforms including Windows, MacOS X, and Linux. As part of the Debian-Med project, BALLView
is directly available from package repositories of current Debian and Ubuntu releases. BALL and
BALLView are open source projects and are licensed under the GPL (BALLView) and LGPL (BALL)
licenses.
[1] Moll, A., Hildebrandt, A., Lenhof, H.-P., and Kohlbacher, O., Bioinformatics, 2006, 22(3), 365-366
[2] Moll, A., Hildebrandt, A., Lenhof, H.-P., and Kohlbacher, O., Journal of Computer-Aided Molecular Design, 2005, 19(11), 791-800
[3] A. Hildebrandt, A.K. Dehof, A. Rurainski, A. Bertsch, M. Schumann, N.C. Toussaint, A. Moll, D. Stockel, S. Nickels, S.C. Mueller,
H.P. Lenhof, and O. Kohlbacher, BMC Bioinformatics, 2010, 11, 531
[4] L. Marsalek, I. Georgiev, A. K. Dehof, H.P. Lenhof, P. Slusallek and A. Hildebrandt , Information Visualization in Biomedical
Informatics (IVBI) London, 2010, p. 239 - 245
Poster 29
Beyond Web Services:
Calling External Programs from within Taverna

Alan Williams1, Hajo Krabbenhöft2, Steffen Möller2, Aleksandra Nenadic1,
Khalid Belhajjame1, Stian Soiland-Reyes1 and Carole Goble1
{alan.r.williams, a.nenadic, khalid.belhajjame, soiland-reyes, carole.a.goble}@manchester.ac.uk,
{krabbenh, moeller}@inb.uni-luebeck.de

1 School of Computer Science, University of Manchester, UK
2 Institute for Neuro- and Bioinformatics, University of Lübeck, Germany
Project's Web site: http://www.taverna.org.uk/
Source code:
http://code.google.com/p/taverna/source/browse/taverna#taverna%2Fengine%2Fnet.sf.taverna.t2.activities%2Fbranches%2Fmaintenance%2Fexternal-tool-activity
Licence: GNU Lesser General Public License (LGPL) 2.1

Taverna is an application that supports design and execution of bioinformatics and other types of workflows.
It allows users to create complex and effective data pipelines by combining remote Web services with local services
written in Java.
Recently, we have extended the capabilities of Taverna to cater for the invocation of applications that are
neither provided as Web services nor written in Java, but rather are available as external command line tools.
Owing to this extension, scientists can design workflows that use and invoke virtually any functionality that is
accessible either locally, e.g., using a command line, or remotely, e.g., on a grid node to which the user can
authenticate using grid security mechanisms or through an SSH channel.
The external tool service lets your workflow call tools written in any language and pass data into and out of
those tools. This is an easy way to include calls of Perl, Python or any other command line script within
Taverna. The Perl script can even be dynamically downloaded from a library of scripts. The external tool
service also lets users interact with graphical tools as part of the workflow.
To add an external tool as a service within a Taverna workflow, a user either chooses a tool from a registry of
tools (for example the tools available in a Debian Med installation), or specifies explicitly the command to be
executed. In both cases Taverna automatically creates a service within the workflow corresponding to an
invocation of the tool. The service has input and output ports derived from the tool description as well as
ports for the tool's standard output and error streams. The ports are used to pass data to/from an invocation
of the tool.
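The relationship between a tool description and the ports of the resulting service can be sketched outside Taverna in a few lines of Python. This is not Taverna's API: ToolDescription, invoke and the port names STDOUT, STDERR and exit_code are hypothetical, and the sketch covers only local execution of a command whose placeholders stand for input ports.

```python
import subprocess
import sys
from dataclasses import dataclass, field


@dataclass
class ToolDescription:
    """Hypothetical tool description: an argv template plus named input ports."""
    command: list                      # arguments; "{port}" marks an input port
    inputs: list = field(default_factory=list)


def invoke(tool, values):
    """Run the tool once, mapping input port values in and output streams out."""
    missing = [port for port in tool.inputs if port not in values]
    if missing:
        raise ValueError("missing input ports: " + ", ".join(missing))
    argv = []
    for part in tool.command:
        # A placeholder occupies a whole argument, so no shell quoting is involved.
        if part.startswith("{") and part.endswith("}") and part[1:-1] in tool.inputs:
            argv.append(str(values[part[1:-1]]))
        else:
            argv.append(part)
    proc = subprocess.run(argv, capture_output=True, text=True, check=False)
    # Output "ports": the standard streams plus the exit status.
    return {"STDOUT": proc.stdout, "STDERR": proc.stderr, "exit_code": proc.returncode}


# Example tool: a one-line Python script standing in for any command line program.
upper_tool = ToolDescription(
    command=[sys.executable, "-c", "import sys; print(sys.argv[1].upper())", "{text}"],
    inputs=["text"],
)
result = invoke(upper_tool, {"text": "hello"})
```

Keeping the description (what to run, which ports exist) apart from invoke (where and how it runs) mirrors the separation the abstract describes: a remote variant would replace only the subprocess call.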
In Taverna, the invocation mechanism of an external tool is separated from the tool's specification and the
user can choose where the tool will be executed. For example, the external command can be executed on a
local machine or on a remote machine to which Taverna can open an ssh connection on behalf of the user. The
specification of the tool can be changed independently from the choice of where it is executed. The invocation
mechanism is designed so that developers can add in their own execution environments.
Taverna's external tools capabilities have been demonstrated within KnowARC's parallel medical image
analysis at the Geneva University Hospitals' internal grid cluster (http://www.knowarc.eu/). There are also
plans to utilize it in digital preservation, characterization and migration workflows, as part of the SCAPE
project (http://www.scape-project.eu/).
The external tool plugin was originally developed for Taverna 1.7 as part of the KnowARC project for
compatibility with the NorduGrid (http://www.nordugrid.org). As of Taverna 2.3, this service type will be
included in the standard Taverna distribution. After the release of Taverna 2.3, one focus will be on further
improving the flexibility of remote invocation mechanisms for external tools, i.e. for grid computing, on a
Condor cluster and on AWS-type clouds.
This work was enabled by the EU 6th Framework Program in the context of the KnowARC project (IST 032691),
and the myGrid platform grant (EP/G026238/1) from the UK's EPSRC.
Poster 30
BioMart combines the power of the semantic web with the speed of
relational database querying
J Baran, A Cros, JM Guberman, S Haider, J Hsu, Y Liang, E Rivkin, J Wang, M Wong-Erasmus,
L Yao, J Zhang, A Kasprzyk
Ontario Institute for Cancer Research, Toronto, Ontario M5G 0A3, Canada
web: http://www.biomart.org
svn: https://code.oicr.on.ca/svn/biomart/biomart-java/branches/release-0_8-candidate_6
license: GNU Lesser General Public License v2.1
Abstract
The semantic web holds great potential for use with biological data, facilitating the creation of
new tools that make use of and find connections between data sets in ways that are currently
not possible. However, the utility of the semantic web for biological research is currently
hindered by two major obstacles: slow querying speeds and a small set of semantically-annotated
data. To address both of these obstacles, we have added semantic web capabilities
to the latest version of the BioMart database management system, thereby taking advantage of
its query optimization features, as well as the wide variety of data available through BioMart
databases.
The latest release of BioMart allows semantic querying through the integration of two accepted
standards of the semantic web: OWL for describing ontologies and SPARQL for executing
queries. When a new mart is created, an OWL ontology is automatically generated. This can be
accessed and queried using a REST interface. Users can also create queries interactively
through the web GUI and convert that to a SPARQL query with the click of a button.
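To make the query path concrete, the sketch below builds a small SPARQL SELECT query and encodes it into a GET request URL. Everything specific in it is an assumption: the endpoint URL, the class and property IRIs and the function names are placeholders, not BioMart's actual interface. The only convention relied on is that of the W3C SPARQL Protocol, which passes the query text in a parameter named query; whether BioMart's REST interface follows that convention is not stated in the abstract.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_sparql(class_iri, properties, limit=10):
    """Build a SELECT query for instances of class_iri.

    properties maps a variable name to the IRI of the property to fetch.
    """
    variables = " ".join("?" + name for name in properties)
    patterns = "\n".join(
        "  ?item <{}> ?{} .".format(iri, name) for name, iri in properties.items()
    )
    return "SELECT {}\nWHERE {{\n  ?item a <{}> .\n{}\n}}\nLIMIT {}".format(
        variables, class_iri, patterns, int(limit)
    )


def rest_url(endpoint, query):
    """Encode the query for an HTTP GET, using the SPARQL Protocol 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})


# Placeholder vocabulary and endpoint, for illustration only.
query = build_sparql(
    "http://example.org/mart#Gene",
    {
        "gene_id": "http://example.org/mart#geneId",
        "chromosome": "http://example.org/mart#chromosome",
    },
    limit=5,
)
url = rest_url("http://example.org/sparql", query)
```

In a real deployment the class and property IRIs would come from the OWL ontology generated for the mart, which is what makes the query discoverable rather than hand-written.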
Currently, semantic web features are being incorporated into all of the members of BioMart
Central Portal, a repository incorporating a large number of biological databases, thereby
creating a huge repository of semantically-enabled biological data. The semantic web access is
still under development, with several plans for the future: BioMart will incorporate existing
biomedical ontologies, and will also support the definition of custom semantic relationships
between ontologies.
Poster 31
BioSharing:
standards, policies and communication in bioscience

Susanna-Assunta Sansone1,2,3, Dawn Field2,4, Annapaola Santarsiero5, Eamonn Maguire1,
Philippe Rocca-Serra1, Chris Taylor1,6, Lee Harland7 and the BioSharing communities.

1. University of Oxford, UK; 2. MIBBI; 4. NERC-NEBC, UK; 5. The Mario Negri Institute,
Italy; 7. Pfizer Ltd, UK; 6. EBI, UK; 3 sa.sansone@gmail.com
Project website and communities: www.biosharing.org
Catalogue's code: http://drupal.org
!
!
:&+&-#'4!'$001.,(,&+;!<1./,.5!-5&.',&+;!-./!%$1#.-7+!8-#(,',8-(&!,.!(4&!/&=&7$80&.(!$<!
8+7*8)3&=! $)%&5%85$! M*8! )?+! V3*$A3+&A+! 5*;%3&>! ($! &.+1#&! (4-(! +4-#&/! &?8&#,0&.(+! -#&!
#&8$#(&/! ),(4! &.$154! ,.<$#0-(,$.! ($! *&! '$08#&4&.+,*7&! -./! '-.! *&! @,.! 8#,.',87&A!
#&8#$/1'&/;!'$08-#&/!$#!,.(&5#-(&/3!B,0,7-#!(#&./+!&?,+(!,.!*$(4!(4&!#&517-($#C!-#&.-+J=JD!
-./!'$00&#',-7!+',&.'&+J=JE3!"#$7LIHUDWLRQRIVWDQGDUGVLVDSRVLWLYHVLJQRIVWDNHKROGHUV¶
&.5-5&0&.(;!*1(!4$)!01'4!/$!)&!F.$)!-*$1(!(4&+&!+(-./-#/+G!H4,'4!$.&+!-#&!0-(1#&!
-./! +(-*7&! &.$154! ($! 1+&! $#! #&'$00&./G! H4,'4! /$0-,.@+A! /$! (4&C! '$=&#G! H4,'4! ($$7+!
-./!/-(-*-+&+!,087&0&.(!)4,'4!+(-./-#/@+AG!
I,$B4-#,.5! ,+! -! '$77-*$#-(,=&! 8#$%&'(! (4-(! )$#F+! -(! (4&! 57$*-7! 7&=&7! ($! *1,7/! +(-*7&!
7,.F-5&+!*&()&&.!%$1#.-7+;!<1./&#+!@,087&0&.(,.5!51,/-.'&!($!-1(4$#+!-./!/-(-!+4-#,.5!
8$7,',&+;! #&+8&'(,=&7CA! -./! )&77J'$.+(,(1(&/! +(-./-#/,K-(,$.! &<<$#(+! ,.! (4&! *,$+',&.'&+!
/$0-,.3! L.! /$,.5! +$;! ,(! )$#F+! ($! &?8&/,(&! (4&! '$001.,'-(,$.! -./! (4&! 8#$/1'(,$.! $<! -.!
,.(&5#-(&/! +(-./-#/+J*-+&/! <#-0&)$#F! <$#! (4&! '-8(1#&! -./! +4-#,.5! $<! 4,54J(4#$15481(!
5&.$0,'+! -./! <1.'(,$.-7! 5&.$0,'! *,$+',&.'&! /-(-;! ,.! 8-#(,'17-#3! M! 8#$($(C8&! $<! (4&!
!"#$%&'"()+2&-&1#)53!,+!7,=&!-./!(4-(!,.!8-#(.&#+4,8!),(4!F&C!87-C&#+!+J=JN!,(!-,0+!($2!
x A+&)8%43T+! '$001.,(CJ/&=&7$8&/! *,$+',&.'&! +(-./-#/+! @<,51#&! *&7$)A;! '7-++,<,&/! ,.($!
(4#&&! (C8&+2! !"#$!%&'() !"*+&!","'%-! @0,.,0-7! ,.<$#0-(,$.! '4&'F7,+(+! ($! #&8$#(! $<!
(4&! +-0&! '$#&! +&(! ,.<$#0-(,$.A;! %"!,&'$.$(&/0.) 0!%&10/%-! @+1'4! -+! '$.(#$77&/!
=$'-*17-#,&+!-./!$.($7$5,&+!($!/&+'#,*&!(4&!,.<$#0-(,$.A;!-./!"2/30'(")1$!,0%-!@($!
'$001.,'-(&!(4&!,.<$#0-(,$.AO!
x 43&W! ($! 8$7,',&+;! $(4&#! 8$#(-7+! +J=H-E-X! $8&.! -''&++! #&+$1#'&+! +J=JP! -./! 7,+(+! $<! ($$7+! -./!
/-(-*-+&+!,087&0&.(,.5!(4&!+(-./-#/+!+J=JQO!
x 5+L+4*7!-./!0-,.(-,.!-!+&(!$<!'#,(&#,-!<$#!-++&++,.5!(4&!R1-7,(C!-./!<$#0-7!#,5$#!$<!(4&!
+(-./-#/+;!*1(!-7+$!(4&!,.(&#$8&#-*,7,(C!-./!#&7-(,$.+!-0$.5!(4&0O!
x M*$)+8! ,.(&#$8&#-*,7,(C;! -//#&++,.5! $=&#7-8+! -./! /187,'-(,$.! $<! &<<$#(+! (4-(! 4-08&#!
(4&,#!),/&#!18(-F&!-./!,.(&#<&#&!),(4!(4&!'#&-(,$.!$<!+(-./-#/+J'$087,-.(!+C+(&0+3!
!
78! S,&7/! T3;! B-.+$.&! BM;! +)! %4;! "A3+&A+;! DUUQO! 98! 86 )'$ Ві&'(5 'DWD 6WDQGDUGV 3ODQ 9HUVLRQ!
ВґO! :8! V3! I-#.&+! +)! %4;! S%)! @+L! 08#=! 03$A*L-! DUUQO! ;8! IV6! :&+&-#'4! W$(&+2!
)))3*,$0&/'&.(#-73'$09*0'#&+.$(&+9+&#,&+9/-(-+4-#,.5O!<8!VLIIL2!)))30,**,3$#5O!=8!I,$"$#(-72!
*,$8$#(-73*,$$.($7$5C3$#5O! >8! XIX! S$1./#C2! )))3$*$<$1./#C3$#5O! ?8! W-(1#&! "#&'&/,.5+2!
8#&'&/,.5+3.-(1#&3'$0O!@8!W&1#$+',&.'&!L.<$#0-(,$.!S#-0&)$#F2!)))3.&1,.<$3$#5+
Poster 32
!"#$%&!"#$%"&'(")*$%+,"-./"0,12*"02.23-./+,#2*1"4,#,"
ƒ–ïæ!ƒŽƒæ"#$%#!&'(!&)*+,-./(("#!012+3!ƒ”–ƒæ‡�‹«‹ó–·4#!56-27+/86,!9(3*:6,+;#!<-=2*!>?8@,-"#A#!B3*!56-27+23*!9-C*, D#!
B/*!E7/*F#!G-27+/@@,-!H383:I24#! 3*1!E*J,!B/*377,* "#$!
%=3+)7KI3(37LM::7K)2MK*/!
"5/=8)+3+2/*3(!92/(/JC!N*2+#!N*2!5/=8)+2*J#!9,-J,*#!O/-P3CK!$Q,83-+=,*+!/@!E*@/-=3+2:7#!N*2.,-72+C!/@!9,-J,*#!O/-P3CK!45,*+,-!@/-!
92/(/J2:3(! R,S),*:,! <*3(C727#! >,:6*2:3(! N*2.,-72+C! /@! Q,*=3-I#! G/*J,*7! TC*JMC#! Q,*=3-IK! ;E*7+2+)+! 1,!92/(/J2,!,+!562=2,!1,7!
&-/+U2*,7#! 5OHR#! N*2.,-72+U! TC/*! "#! V-3*:,K! AE*7+2+)+,! @/-! 92/2*@/-=3+2:7#! 92,(,@,(1! N*2.,-72+C#! W,-=3*CK! DE*7+2+)+,! @/-! 53*:,-!
H,7,3-:6#!X7(/!N*2.,-72+C!Y/782+3(#!O/-P3CK!F0)-/8,3*!92/2*@/-=3+2:7!E*7+2+)+,#!0Z9T#!Y2*[+/*#!NGK!
!"#$%&'#()*
$##5!6602.7148./96:2.&);<=8=0%#,=8714"
<.32(3M(,!)*1,-!+6,!>/%,#2?%">.++.31":@<A;"B8C!(2:,*:,!P2+6!3112+2/*3((C!3((/P,1!2*:()72/*#!,[+,*72/*7!3*1!-,7+-2:+2/*7!2*!—•‡”ǯ•!
\ZT!*3=,783:,K!5/*+-2M)+2/*7!+/!*,P!:3*/*2:3(!.,-72/*7#!2*!+6,!!"#$%&'#()!\ZT!*3=,783:,#!3-,!P,(:/=,!)*1,-!7)8,-.272/*!/@!+6,!
92/\RQ!:/*7/-+2)=!]2*!/-1,-!+/!I,,8!92/\RQ!3!:/==/*#!:3*/*2:3(!13+3!=/1,(^K!
<!:/==/*!\ZT!R:6,=3!]\RQ^#!1,@2*2*J!3!:3*/*2:3(!\ZT!
@/-=3+#! 27! 2=8/-+3*+! @/-! 7=//+6! :/=83+2M2(2+C! /@!
6,+,-/J,*,/)7! +//(7! 3*1! 13+3! -,7/)-:,7#! 3*1! 2*!
83-+2:)(3-! @/-! :/==)*2:3+2/*! P2+6! _,M! 7,-.2:,7K! >6,!
(3:I! /@! 3! :/==/*! \RQ`M37,1! @/-=3+! @/-! +6,! M372:!
M2/2*@/-=3+2:7! +C8,7! /@! 13+3! 637! =/+2.3+,1! +6,!
1,.,(/8=,*+!/@!92/\RQ!a"bK!
92/\RQ! 27! 3*! \ZT! R:6,=3! 1,@2*2*J! @/-=3+7! /@!+6,!=32*!
M2/2*@/-=3+2:7! +C8,7! /@! 13+3! +63+! 3-,! */+! =/1,((,1! MC!
3*C! 78,:23(27,1! 7+3*13-1! \ZT! R:6,=37! 7):6! 37! @/-!
,[3=8(,! R9ZT#! &Q9ZT#! Z<W0`ZT#! ZEV#! W5QZT#! /-!
86C(/\ZT! a$`FbK! 92/\RQ! +6)7! @/:)7,7! /*! M2/=/(,:)(3-!
7,S),*:,7#! 3(2J*=,*+7#! 3*1! 3**/+3+2/*! MC! 3*C! I2*1! /@!
@,3+)-,7! /-! 8-/8,-+2,7K! >6,7,! =32*! +C8,7! 3-,!
3::/=83*2,1! MC! 1,@2*2+2/*7! /@! 13+3`-,7/)-:,! 3*1!
/*+/(/JC! -,@,-,*:,! @/-=3+7#! 8-/.,*3*:,! =,+313+3#!
7:/-,7#! 3*1!/+6,-K!92/\RQ!637!M,,*!://-12*3+,1!P2+6!+6,!
0)-/8,3*! 0Z9H<50! 8-/c,:+! @/:)7,1! /*! 8-3:+2:3(!
2*+,-/8,-3M2(2+C!3=/*J!M2/2*@/-=3+2:7!+//(7! adbK!
>6,! 32=! /@! 92/\RQ! 27! +/! M,:/=,! )7,1! 37! 3! :3*/*2:3(#!
Dz•–ƒ†ƒ”†dz 13+3! @/-=3+! @/-! 7,S),*:,! 13+3! 3*1! J,*,-2:!
@,3+)-,! 3**/+3+2/*7K! E+! 1/,7! */+! =,3*! +63+! 92/\RQ!
76/)(1!M,!e+6,!/*(C!@/-=3+e#!M)+!3*!,[:63*J,!@/-=3+!+63+!
:3*! M,! :/==/*! +/! 7,.,-3(! +//(7! ]37! /*,! /@! =)(+28(,!
@/-=3+7! +6,! +//(7! 3-,! 7)88/-+2*J^K! >//(7! :3*! 8-/1):,!
3*1! :/*7)=,! 92/\RQ! 12-,:+(C#! /-! 92/\RQ!:3*!M,!)7,1!37!
3*! 2*+,-=,123+,! :3*/*2:3(! @/-=3+!-2:6!,*/)J6!+/!,*3M(,!
:/*.,-72/*7! 3=/*J! 12.,-7,! @/-=3+7K! 92/\RQ! +C8,7! :3*!
M,!12-,:+(C!2*:()1,1!2*+/!/+6,-!\ZT!R:6,=37#!/-!+6,C!:3*!
M,! @)-+6,-! ,[+,*1,1! /-! -,7+-2:+,1! 2*! 3! 72=2(3-! P3C! +/!
/Mc,:+`/-2,*+,1! 8-/J-3==2*J! :(377,7K! >6,! \ZT! R:6,=3!
:3*!7,-.,!37!3!78,:2@2:3+2/*!@/-!J,*,-3+2*J!=/-,!,@@2:2,*+!
M2*3-C! -,8-,7,*+3+2/*7! 7):6! 37! 0\E! afbK! 92/\RQ! @/-=3+7!
!
63.,! 1,+32(,1! 7+-):+)-,! 3*1! 3-,! -2:6!,*/)J6!+/!7)88/-+!
.3-C2*J!-,S)2-,=,*+7!@/-!=3:62*,`)*1,-7+3*13M(,!13+3!
3*1! =,+313+3! -,8-,7,*+3+2/*#! M)+! 3+! +6,! 73=,! +2=,!
+-C2*J! */+! +/! M,! +//! :/=8(2:3+,1K! R,=3*+2:7! /@! +6,!
7C*+3:+2:! 92/\RQ! +C8,7! 27! 1,@2*,1! .23! R<_RQT!
3**/+3+2/*!P2+6!:/*:,8+7!@-/=!+6,!0Q<Z!/*+/(/JC!a"gbK!
The highlights of the BioXSD format itself are: structured metadata of simple sequence records
(as opposed to FASTA deflines), provenance metadata in all types of processed data and references,
structured references to data resources and ontology concepts including a meaning of the relation,
complex relations between sequence features, feature model data, multiple complex scores with
meanings, formalised annotation of related positions outside of the annotated sequence, and more.
The new BioXSD version 1.1 has optimised the syntax for feature annotation, scores, semantic and
data references. It allows annotation of dense sequence features applicable to whole-genome
annotations exchanged for example in the standardised binary EXI format. The first beta version of
BioXSD 1.1 has been released in May 2011. Extensive support for whole-genome alignments,
individual genomics, and sequence profiles will be added in the next beta versions.
In future, BioXSD must be maintained and regularly refined in order to fit the diverse and
changing needs of the bioinformatics community. Involvement of the community is essential both for
the uptake and the further development which must be coordinated by the emerging BioXSD
consortium. To enable larger-scale adoption, supportive programmatic and interactive tools have to
be developed, including format converters and integration with the O|B|F Bio* frameworks.
[1] Kalaš, M. et al. (2010) BioXSD: the common data-exchange format for everyday bioinformatics web services. Bioinformatics, 26, i540-i546.
[2] Hucka, M. et al. (2003) The systems biology markup language (SBML): a medium for representation and exchange of biochemical network models. Bioinformatics, 19, 524-531.
[3] Westbrook, J. et al. (2005) PDBML: the representation of archival macromolecular structure data in XML. Bioinformatics, 21, 988-992.
[4] Spellman, P.T. et al. (2002) Design and implementation of microarray gene expression markup language (MAGE-ML). Genome Biol., 3, research0046.1-0046.9.
[5] Hermjakob, H. et al. (2004) The HUPO PSI's molecular interaction format - a community standard for the representation of protein interaction data. Nat. Biotechnol., 22, 177-183.
[6] Kottmann, R. et al. (2008) A standard MIGS/MIMS compliant XML schema: toward the development of the Genomic Contextual Data Markup Language (GCDML). OMICS, 12, 115-121.
[7] Han, M.V. and Zmasek, C.M. (2009) phyloXML: XML for evolutionary biology and comparative genomics. BMC Bioinformatics, 10, 356.
[8] Pettifer, S. et al. (2010) The EMBRACE Web service collection. Nucleic Acids Res., 38, W683-W688.
[9] Efficient XML Interchange (EXI) Format 1.0. http://www.w3.org/TR/exi/
[10] http://edamontology.sourceforge.net
Poster 33
Cloudgene - A graphical MapReduce interface for cloud computing
L. Forer1,*, S. Schönherr1,2,*, H. Weißensteiner1, F. Kronenberg1, G. Specht2, A. Kloss-Brandstätter1
1 Division of Genetic Epidemiology; Department of Medical Genetics, Molecular and Clinical Pharmacology;
Innsbruck Medical University, Innsbruck, Austria
2 Department of Database and Information Systems; Institute of Computer Science;
University of Innsbruck, Innsbruck, Austria
* contributed equally
Computer science plays an important role in today's genetic research. Next generation
sequencing technologies produce an enormous amount of data, thereby pushing genetic
laboratories to the limits of data storage and computational power. New informatics approaches
are therefore needed to eliminate these shortcomings, provide possibilities to apply current
algorithms in the area of Bioinformatics and improve their usability.
We present Cloudgene, a system to simplify the access to computational resources and
associated computational models of cluster architectures (public and private clouds). Cloudgene
assists end users in executing and monitoring diverse algorithms via a dynamic graphical web
interface and provides an XML interface to add future developments or any kind of programs.
We demonstrate on existing algorithms (e.g. CloudBurst [1]) how the integration can be done
with little effort, thereby making Cloudgene especially useful for the evaluation of different
approaches, for the reproduction of previous results and for a simplified usage of current
algorithms. Based on open source frameworks like Apache Hadoop [2] and Apache Whirr [3], we
implemented a prototype to validate our approach (see Figure 1). In a first step, the user is able
to set up a cluster architecture via an intuitive web interface. A fully operable and customized
cluster is then provided, including all necessary user data and services (like Apache Hadoop) on
it. In a subsequent step a graphical interface (called Elastic MapReduce Interface, EMI) is
automatically launched on the master node of the cluster. EMI can be seen as an abstraction of
the underlying system architecture from the end user, as it lies on top of the integrated programs
and allows the user to communicate and interact with the cluster as well as to receive feedback
of currently executed workflows. Cloudgene can be operated via configuration files with clearly
defined input and output variables to minimize configuration issues for end users. To sum up,
Cloudgene is a free software solution that allows the user to access infrastructure, execute new
or existing algorithms in a user friendly way, reproduce results and monitor and pipeline
MapReduce jobs.
Figure 1 - Cloudgene Prototype in Action
Project site: http://cloudgene.uibk.ac.at
Source code: http://cloudgene.uibk.ac.at/downloads.html
License: GNU GPL v3
[1] M. C. Schatz. CloudBurst: highly sensitive read mapping with MapReduce. Bioinformatics, 25:1363-1369, Jun 2009.
[2] Apache Hadoop. http://hadoop.apache.org/
[3] Apache Whirr. http://incubator.apache.org/whirr/
Poster 34
The RapidMiner Plugin for Taverna: bringing Data Mining Tools to
Bioinformatics Workflows
Simon Jupp1, James Eales1, Simon Fischer2, Sebastian Land2,
Rishi Ramgolam1, Alan Williams1 and Robert Stevens1
{simon.jupp, james.eales, rishi.ramgolam, alan.r.williams, robert.stevens}@manchester.ac.uk,
{fischer, land}@rapid-i.com
1 School of Computer Science, University of Manchester, Oxford Road, Manchester M13 9PL, UK
2 Rapid-I GmbH, Stockumer Str. 475, 44227 Dortmund, Germany
Project Site: http://www.e-lico.eu/
Source code: http://taverna.googlecode.com/svn/unsorted/taverna-elico/
Licence: GNU Lesser General Public License (LGPL) 2.1
Knowledge discovery through pattern finding in data is central to modern molecular biology, which now has thousands
of databases and similar numbers of tools for processing those data. Any data analysis in molecular biology involves
gathering and processing data from many sources, even before the analysis for the central biological question takes place.
Taverna (http://www.taverna.org.uk) is a workflow workbench that allows bioinformaticians to create data pipelines
involving distributed Web services and other forms of tool; these workflows gather and manage data in order to perform
analyses that ask biological questions.
RapidMiner (RM) is an open source, cross-platform application, released under the AGPLv3, that brings a large suite of
data processing, visualization and data mining tools to bear upon tables of data, such as those that can be gathered by
Taverna. A typical task for RM is to apply a series of operators to a table of gene products, their functions and locations
to perform simple correlations of location and function. More sophisticated tasks will involve training classifiers over a
set of features, selecting the best features and then applying the classifier to test data. The RapidAnalytics enterprise
server from Rapid-I (http://rapid-i.com/) provides a platform for interacting with these data mining operators from RM
via the RapidAnalytics execution service.
Through the RM plugin for Taverna we have combined the ability to gather and process data from many molecular
biological sources with RM’s data mining capabilities to provide a powerful tool for scientific analysis.
The RapidAnalytics execution service is a single polymorphic WSDL service. It takes a reference to the input file
locations for the operator as input, along with a set of parameters including the name of the operator to execute. The
polymorphic functionality, however, can make it difficult to work with in an environment like Taverna. For this reason
the RM-specific plugin was developed. Using the plugin, the available operators within RM are exposed within Taverna.
In addition we provide dialog-based interactions for setting input file locations and operator invocation parameters.
RapidAnalytics requires the data being processed to sit within the RapidAnalytics repository. This reduces the need to
pass large amounts of data between services and cuts the file-transfer overhead associated with
running distributed workflows. Consequently, before data can be analyzed in Taverna, they must first be uploaded onto the
RapidAnalytics server. When any RM operator in Taverna is used, the user is given a configuration dialog. From this
dialog a user can access the RapidAnalytics server in order to browse or upload new data to the repository.
The first release of the plugin is currently limited to a subset of RM operators. Simple operators that work on input files
and generate a set of output files are easily handled. However, RM also contains some specialized operators, which we call
dominating operators, that control the execution of one or more data mining sub-processes. This control currently
requires some logic that cannot be expressed in Taverna via web services alone. These operators are important in data
mining, so work is underway to extend the RM plugin to support them.
The RM plugin already makes available a large number of data processing, visualization and data mining tools for
bioinformatics analyses implemented as workflows within Taverna. In order to illustrate the benefits the RM plugin
brings, we have developed several workflows to demonstrate its functionality in some typical bioinformatics tasks. These
workflows are available for download from myExperiment at http://www.myexperiment.org/groups/402.html
This work is funded by the EU/FP7/ICT-2007.4.4 e-LICO project.
Poster 35
Distributed Web Services for Bioinformatics: Multiple Sequence Alignment
Peter V. Troshin, James B. Procter and Geoffrey J. Barton
College of Life Sciences, University of Dundee, Dundee, UK, DD1 4HN
Email: p.v.troshin@dundee.ac.uk
Project: http://www.compbio.dundee.ac.uk/jabaws
Source Code: http://www.compbio.dundee.ac.uk/jabaws/archive/jaba-project.zip
Licence: Apache 2
We'd like the abstract to be considered for the talk and the poster sessions.
Webservices technology has become a popular method to make bioinformatics
techniques widely available in a programmable or scriptable form. However,
execution limits of public web services are often too restrictive and the data privacy of
their users cannot be guaranteed. In addition, the accessibility of web services as
well as the frequency of their interface changes is outside of the users' control.
Finally, many web services provide very limited access to the command-line
parameters of the tools they wrap.
Here we present the JAva Bioinformatics Analyses Web Services (JABAWS)
system, a collection of SOAP web services which overcomes these problems.
JABAWS lets a normal user deploy, operate and configure webservices on their own
hardware. The JABAWS installation procedure is simple and accessible for an
average user without programming experience. JABAWS can run on a single
computer, or a cluster managed by Oracle Grid Engine, Load Sharing Facility and
other systems via DRMAA. Configuration files specify how jobs should be routed to
the local machine or cluster depending on load or size of job.
JABAWS currently provides programmatic access to the common multiple
sequence alignment methods Probcons, T-coffee, Muscle, Mafft, and ClustalW and
can be accessed from the popular Jalview multiple alignment analysis workbench or
the command-line client that is distributed with JABAWS. Although currently focusing
on multiple alignment, the JABAWS framework is flexible and can be easily extended
with other web services. We plan to add disorder, conservation and secondary
structure prediction web services to JABAWS soon.
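The JABAWS abstract states that configuration files decide whether a job runs on the local machine or on a cluster, depending on load or job size. The Python sketch below only illustrates that kind of rule. It is not JABAWS code or its configuration format: the `RoutingPolicy` fields, the thresholds and the `choose_executor` function are hypothetical names chosen for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RoutingPolicy:
    """Hypothetical thresholds; a real deployment would read these from configuration."""
    max_local_sequences: int = 50     # larger alignments go to the cluster
    max_local_running_jobs: int = 2   # a busy local machine hands work to the cluster


def choose_executor(num_sequences: int, running_local_jobs: int,
                    policy: RoutingPolicy, cluster_available: bool = True) -> str:
    """Return "local" or "cluster" for one alignment job."""
    if not cluster_available:
        return "local"                # nothing else to route to
    if num_sequences > policy.max_local_sequences:
        return "cluster"              # routed by job size
    if running_local_jobs >= policy.max_local_running_jobs:
        return "cluster"              # routed by current load
    return "local"


policy = RoutingPolicy()
print(choose_executor(10, 0, policy))    # small job, idle machine
print(choose_executor(500, 0, policy))   # large job
print(choose_executor(10, 2, policy))    # small job, busy machine
```

Keeping the thresholds in one small policy object mirrors the idea that routing is a matter of configuration rather than code.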
Poster 36
Endrov - an open source framework for image processing and analysis
Johan Henriksson1, Thomas R. Bürglin
Dept. of Biosciences and Nutrition and Center for Biosciences,
Karolinska Institutet, Hälsovägen 7, Novum, Huddinge, Sweden
1 mahogny@areta.org
http://www.endrov.net
Most of the code is under the […]-clause BSD license. Single files are GPL.
While building Virtual-Worm Base, a database of spatio-temporal gene expression patterns
during C. elegans development, we found that the current tool chain for microscopy was
insufficient. Commercial software could not be modified for our purposes since we lacked the
source code. ImageJ, the defacto-standard image software, was not designed for modern
needs. We found that we were not the only ones running into limitations and decided to do
something about it.
To solve these problems we have developed a new platform - Endrov - that covers the entire
microscopy workflow. It is a plug-in framework consisting of 150 000 lines of Java, running
on all operating systems. The following are Endrov's features:
• It has advanced control of the microscope hardware (via Micro-manager)
• It can view, annotate, process and analyze recordings. New applications are
introduced with the graphical programming language (flows), interpreted Java or new
plugins (written in Java). Endrov can also be used as a library, making heavy data
analysis simple.
• There are at the moment about 1[…]0 image processing filters and […]0 plugins
• A new schema for data storage which handles arbitrary metadata, 5D recordings,
mixed resolutions and compression (both lossy and lossless).
• Large datasets are handled gracefully. Moreover, the speed of most operations is
unaffected by the size of datasets. Endrov supports most file formats (via Bio-formats)
Our framework is open and free of charge.
It is highly flexible and can be adapted to
any needs in current research. As an
application, we have used our framework
to quantify a large number of genes'
expression levels.
Poster 37
fastapl – a utility for processing fasta format data
Paul Horton
AIST, Computational Biology Research Center
Tokyo, Japan
horton-p@aist.go.jp
Introduction
Processing of sequence and annotation data in multifasta format is a common task in bioinformatics. Existing
resources to facilitate this task either provide a predefined set of configurable scripts to cover many common
cases (e.g. EMBOSS tools) or provide libraries (e.g. BioPerl) which can be called from user programs.
fastapl (FASTA Perl Loop, pronounced like "fasta apple") is a new and complementary approach,
motivated by the observation that the tasks users require are often simple but ad hoc. Some of them
would be almost trivial to do with standard Linux tools such as grep, sed, wc, etc., except for the fact that
these tools are line based, rather than fasta record based. fastapl provides functionality analogous to perl
used with its -n and related switches, but looping over fasta records instead of lines.
Examples
fastapl is best described by listing examples:
Reformat sequence lines to have max line length 100.
% fastapl -p -l 100
Truncate sequences to maximum sequence length of 39.
% fastapl -p -e '$seq = substr( $seq, 0, 39 )'
Reverse complement DNA sequences.
% fastapl -p -e '$seq = reverse $seq; $seq =~ tr/acgtACGT/tgcaTGCA/'
Print records of sequences not starting with methionine.
% fastapl -g -e '$seq !~ /^M/'
Randomly shuffle sequences.
% fastapl -M 'List::Util qw(shuffle)' -p -e '$seq = join( "", shuffle(@seq) )'
Sort records by id.
% fastapl --sort -e '$id1 cmp $id2'
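To make the record-based looping idea concrete, here is a minimal Python sketch of the same pattern: iterate over FASTA records rather than lines, apply a per-record edit, and print the result. It is an illustration only and says nothing about how fastapl itself is implemented; the function names and the two-record input are made up for this example.

```python
from typing import Iterable, Iterator, List, Tuple


def fasta_records(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs, one per FASTA record."""
    header = None
    chunks: List[str] = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


_COMPLEMENT = str.maketrans("acgtACGT", "tgcaTGCA")


def reverse_complement(seq: str) -> str:
    """Reverse the sequence, then complement each base."""
    return seq[::-1].translate(_COMPLEMENT)


def wrap(seq: str, width: int = 100) -> List[str]:
    """Split a sequence into lines of at most `width` characters."""
    return [seq[i:i + width] for i in range(0, len(seq), width)]


# Made-up two-record input, with the first sequence spread over two lines.
example = [">seq1 demo", "ACGT", "TTGA", ">seq2", "ggcc"]

records = list(fasta_records(example))
for header, seq in records:
    print(">" + header)
    print("\n".join(wrap(reverse_complement(seq))))
```

The point of the sketch is the loop boundary: because the sequence of a record is joined before the edit runs, operations such as reversing or truncating behave correctly even when a sequence spans several lines.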
Code Generation
An unusual and useful feature of fastapl is that it can generate a standalone program in standard perl for
any command it can handle (-n option). This is very helpful when debugging user scripts. Moreover, it
eases the transition from quick and dirty one-liners to more robust programs – which often happens when
we find ourselves repeating tasks we initially thought would only need to be done once or twice.
Availability and Future Directions
fastapl may be used by anyone under the GNU General Public License (GPLv3). More examples, documentation, and the source code are available at http://seq.cbrc.jp/fastapl. We are currently developing
fastqpl, which is closely analogous to fastapl, but for use with fastq format files. In addition, we plan to
distribute this software as a Debian package.
Poster 38
FLAME: An Open Source Agent-Based Modelling Platform
for High Performance Computing of Biological Systems
Mesude Bicak, Simon Coakley, Mike Holcombe, Dawn Field
Email: mbicak@ceh.ac.uk
NERC Centre for Ecology and Hydrology, Maclean Building, Benson Lane, Wallingford, UK, OX10 8BB
Department of Computer Science, Regent Court, Portobello Street, University of Sheffield, Sheffield, S1 4DP
FLAME is distributed with a GPLv[…] License via http://sourceforge.net/projects/flameabm/files/project
The ideologies surrounding Cellular Automata gave birth to the concept of agent-based
modelling (ABM). Introduced by Reynolds in […], ABM recently became the driving force in
various research areas, especially after the advent of powerful parallel computers. Rather than
looking at systems as a whole, ABM encourages bottom-up approaches, allowing focus on the
emergence of complexity as a consequence of interactions between individual agents. The
principal components of a system are represented as autonomous agents, which are assigned
characteristic properties and behavioural rules within an environment. They encapsulate a state
that changes based on the exchange of information during local interactions. Thus, over time,
the system evolves from micro-level to macro-level and complex behaviour is observed at the
population level. Systems based on differential equations tend to approximate actual behaviour,
whereas agent-based systems capture differences between individuals, supporting flexibility
and variability.
FLAME, developed at the University of Sheffield, is an open source ABM framework that allows
modellers from all disciplines to easily write agent-based models. Based on XML and C, it
supports various levels of complexity, from modelling molecules to complete communities, by
only varying agent definitions and functions. X-machines are used as the agent architecture,
which provide agents with memory, states and transition functions. Agents communicate via
a Message Passing Interface (MPI) handled by an intelligent message board, which allows
filtering of messages, thereby improving performance. FLAME uses a distributed model, Single
Program Multiple Data (SPMD), and handles deadlocks through synchronisation points.
Modellers are hindered by complexities of porting models on parallel platforms versus the time
taken to run large simulations on a single machine. FLAME stands out from others as it allows
complex models to be run in parallel over High Performance Computing (HPC) grids via HPCx
libraries by automatically generating threaded code. As a result, simulations with as many as
[…] agents were successfully performed within […] minutes, enhancing research in terms
of time and complexity of models. This work was done in collaboration with the Science and
Technology Facilities Council at the Rutherford Appleton Labs as part of the EURACE project.
FLAME has been used in modelling various biological systems, ranging from foraging strategies
of individual ants in large colonies, behaviour of E. coli bacteria in deoxygenated environments,
intercellular bonding, migration, proliferation in urothelial tissues, to the effect of genome size
on phage and bacterial interactions. Funded by the EPSRC, FLAME aims to provide improved
HPC performance by allowing dynamic distribution of agents during parallel simulations.
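The agent architecture this abstract describes (agents with memory, states and transition functions, a message board that filters messages, and synchronisation points between phases) can be sketched in a few lines of single-process Python. This is a toy illustration, not FLAME's XML/C model format or its MPI layer: every class and function name below is hypothetical, and the "move towards nearby agents" rule is invented for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Message:
    sender: int
    position: float


class MessageBoard:
    """Collects messages and lets readers filter them with a predicate."""

    def __init__(self) -> None:
        self._messages: List[Message] = []

    def post(self, message: Message) -> None:
        self._messages.append(message)

    def read(self, predicate: Callable[[Message], bool]) -> List[Message]:
        return [m for m in self._messages if predicate(m)]

    def clear(self) -> None:
        self._messages.clear()


@dataclass
class Agent:
    agent_id: int
    state: str                 # current state of the agent's state machine
    memory: Dict[str, float]   # agent-local variables


def announce(agent: Agent, board: MessageBoard) -> None:
    """Transition start -> announced: publish the current position."""
    board.post(Message(agent.agent_id, agent.memory["position"]))
    agent.state = "announced"


def move_towards_neighbours(agent: Agent, board: MessageBoard, radius: float = 2.0) -> None:
    """Transition announced -> start: read only nearby messages, then move halfway to their mean."""
    pos = agent.memory["position"]
    nearby = board.read(lambda m: m.sender != agent.agent_id and abs(m.position - pos) <= radius)
    if nearby:
        mean = sum(m.position for m in nearby) / len(nearby)
        agent.memory["position"] = pos + 0.5 * (mean - pos)
    agent.state = "start"


def step(agents: List[Agent], board: MessageBoard) -> None:
    for agent in agents:
        announce(agent, board)
    # Synchronisation point: every message is posted before any agent reads.
    for agent in agents:
        move_towards_neighbours(agent, board)
    board.clear()


agents = [Agent(0, "start", {"position": 0.0}),
          Agent(1, "start", {"position": 1.0}),
          Agent(2, "start", {"position": 10.0})]
board = MessageBoard()
step(agents, board)
print([a.memory["position"] for a in agents])
```

Filtering at read time is what keeps each agent's work proportional to its neighbourhood rather than to the whole population, which is the performance argument the abstract makes for the message board.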
Poster 39
GeneNomenclatureUtils: Tools for annotating genes and comparing gene lists with
community resources.
Mike D.R. Croning* and Seth G.N. Grant*†
* Genes to Cognition Programme, Wellcome Trust Sanger Institute, Hinxton, Cambridge, CB10 1SA, UK
† Division of Clinical Neuroscience, Royal Infirmary of Edinburgh, 51 Little France Crescent,
Old Dalkeith Road, Edinburgh, EH16 4SA.
Email: mdr@sanger.ac.uk
Project URL: www.genes2cognition.org/software/GeneNomenclatureUtils
Code URL: https://github.com/mdrc/GeneNomenclatureUtils
License: Artistic License 2.0, http://www.opensource.org/licenses/artistic-license-2.0.php
Verifying, annotating, storing, and comparing lists of genes is now an essential task in modern
experimental and computational biology. These lists can be derived from so-called 'omics'
technologies that allow one to investigate gene (transcriptomics) or protein expression (proteomics)
across the genome under defined conditions such as drug treatment, developmental stage, or clinical
status. Similarly, lists of candidate disease genes are being generated from the huge quantities of
sequence data produced by geneticists and clinicians employing new sequencing technologies to
identify sequence variation in large cohorts of human patient samples.
Comparing lists of genes produced by different labs from these disparate experimental sources
remains an arduous, but obligatory task for the bioinformatician. This is because gene and protein
identifiers are not stable, as they are continually revised and withdrawn over time by the gene
nomenclature committees (such as HGNC) and database identifiers provided by other community
genomic resources change similarly. This 'identifier creep' is compounded by the fact that gene list
comparisons often have to be made across species and model organisms.
Here we provide an extensive suite of 40+ command-line driven utilities that are designed to simplify
this process. We provide an automated means to fetch the key nomenclature and other genomic,
bibliographic and disease annotation resources from: HGNC, MGI, NCBI Entrez, OMIM, PubMed and
UniProt, and store them on local disk. Required MEDLINE records are cached in a local MySQL
database. This ensures high availability of all the resources, and allows a consistent set of
nomenclature and annotation to be used across the lifetime of an ongoing experiment or analysis.
Scripts are provided to verify the gene symbols, and species-specific database identifiers to which
they are associated, and to project across genomes to orthologous genes and identifiers. Once a list is
verified it can be quickly integrated into the list comparison engine, which can store and compare any
number of such gene lists in a particular identifier-space. This makes it straightforward to manage
hundreds of lists, compare them, and subsequently recheck their nomenclature. Other tools allow
look-up of human disease association in OMIM, of which genes have reported mouse knockout models,
and the frequency of occurrence of protein domains, as assessed from a list of genes.
The component Perl scripts can quite simply be chained together to create repeatable workflows, and
some care has been taken to use a consistent (and minimal) set of well-documented command line
parameters for this purpose. The tools perform input/output using tab-delimited text files, a widely accepted format supported across the community. The package incorporates a number of accessory
scripts to manipulate and check tab-delimited files, and to interconvert with proprietary spreadsheet
formats, often used in experimental research environments.
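The core operation described above, comparing gene lists held in tab-delimited files within one identifier space, can be sketched in a few lines of Python. This is an illustration only and does not reproduce the GeneNomenclatureUtils Perl scripts or their command-line parameters; the function names, the column layout and the `GENE:000x` identifiers are made up for the example.

```python
import csv
import io
from typing import Dict, List, Set, TextIO


def read_gene_ids(handle: TextIO, column: int = 0, skip_header: bool = True) -> Set[str]:
    """Collect the identifiers found in one column of a tab-delimited file."""
    reader = csv.reader(handle, delimiter="\t")
    if skip_header:
        next(reader, None)
    return {row[column].strip() for row in reader
            if len(row) > column and row[column].strip()}


def compare_lists(first: Set[str], second: Set[str]) -> Dict[str, List[str]]:
    """Report shared identifiers and those unique to each list."""
    return {
        "shared": sorted(first & second),
        "only_first": sorted(first - second),
        "only_second": sorted(second - first),
    }


# Made-up lists; in practice these would be files on disk.
list_a = io.StringIO("gene_id\tsymbol\nGENE:0001\tAaa1\nGENE:0002\tBbb2\nGENE:0003\tCcc3\n")
list_b = io.StringIO("gene_id\tsymbol\nGENE:0002\tBbb2\nGENE:0004\tDdd4\n")

result = compare_lists(read_gene_ids(list_a), read_gene_ids(list_b))
for key, ids in result.items():
    print(key, "\t".join(ids), sep="\t")
```

Comparing on a stable identifier column rather than on gene symbols is the design choice that matters here, since the abstract's whole argument is that symbols drift over time.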
GeneNomenclatureUtils is free software, collaboratively-developed, easy to install, and should prove
useful to both bioinformaticians and experimental investigators working with lists of genes.
Poster 40
GenomeTools – a versatile and efficient
bioinformatics toolkit
Gordon Gremme, Sascha Steinbiss, and Stefan Kurtz
Center for Bioinformatics, University of Hamburg, Bundesstraße 43, 20146 Hamburg, Germany
Contact E-mail: steinbiss@zbh.uni-hamburg.de
URL of the project web site: http://genometools.org
Git source repository: git://genometools.org/genometools.git
License: ISC (BSD-like)
While most bioinformatics software is written to be efficient in terms of space and time, other aspects of the
software, like extensibility and portability, are mostly neglected. Often, applications are developed to serve only
one small task, raising the need for a myriad of 'glue' scripts to integrate single tools into a specific work flow.
Comprehensive toolkits for building bioinformatics applications exist (e.g. BioPerl, BioJava, ...) but they are often
tied to a specific language, limiting their reusability in other contexts.
To address these problems we have developed the GenomeTools, a free software toolkit for tasks relevant when
working with large genomes. The GenomeTools provide an extensive software library for storage, indexing, and
processing of both genomic sequences and annotations using an object-oriented interface. Written in C, the library
is accessible via bindings to a variety of script-programming languages.
Based on the library, the GenomeTools contain a collection of advanced bioinformatics tools for sequence handling
and analysis, transposon prediction, and annotation visualization, some of which have already been published
separately [1–6]. The tools show how the GenomeTools library can be used to write well-maintainable, clean
software without compromising efficiency.
References
[1] S. Gräf, F.G.G. Nielsen, S. Kurtz, M.A. Huynen, E. Birney, H. Stunnenberg, and P. Flicek. Optimized design
and assessment of whole genome tiling arrays. Bioinformatics, 23 ISMB/ECCB 2007:i195–i204, 2007.
[2] D. Ellinghaus, S. Kurtz, and U. Willhoeft. LTRharvest, an efficient and flexible software for de novo detection
of LTR retrotransposons. BMC Bioinformatics, 9:18, 2008.
[3] S. Kurtz, A. Narechania, J.C. Stein, and D. Ware. A new method to compute K-mer frequencies and its
application to annotate large repetitive plant genomes. BMC Genomics, 9:517, 2008.
[4] S. Steinbiss, G. Gremme, C. Schärfer, M. Mader, and S. Kurtz. AnnotationSketch: a genome annotation
drawing library. Bioinformatics, 25(4):533–534, 2009.
[5] S. Steinbiss, U. Willhoeft, G. Gremme, and S. Kurtz. Fine-grained annotation and classification of de novo
predicted LTR retrotransposons. Nucleic Acids Res, 37(21):7002–7013, 2009.
[6] D. J. Schmitz-Hübsch and S. Kurtz. MetaGenomeThreader: A software tool for predicting genes in DNA sequences of metagenome projects. In Streit, W. and Daniel, R., editors, Metagenomics. Methods and Protocols,
Methods in Molecular Biology. Springer, Berlin, 2010.
Poster 41
InterMine - An Open Source Data Warehouse and Query Interface
AUTHORS: Fengyuan Hu, Alex Kalderimis, Daniela Butano, Adrian Carr, Sergio Contrino,
Mike Lyne, Richard Smith, Radek Štěpán, Julie Sullivan, Gos Micklem (PI)
AFFILIATION: University of Cambridge, fh[…]@cam.ac.uk
PROJECT URL: www.intermine.org
SOURCE CODE: http://www.intermine.org/browser, http://www.intermine.org/wiki/SVNCheckout
LICENSE: LGPL
FUNDING: Wellcome Trust and the NIH/NHGRI.
InterMine - An Open Source Data-Warehouse and Query interface
InterMine (www.intermine.org) is a powerful open source system for building query-optimised
data warehouses. It supports data integration from standard biological formats
and makes it easy to add your own data. A sophisticated web application provides flexible
query access for any data model. Data can be accessed programmatically by building
and executing queries via a web-service API, whose client code can even be generated within
the web application in various programming languages.
Poster 42
Jalview 2: Transitions
J. B. Procter, P. V. Troshin, D. M. A. Martin, G. J. Barton
College of Life Sciences, University of Dundee, UK. contact: j.procter@dundee.ac.uk
Website: http://www.jalview.org (source, documentation and related software links)
Source: http://www.jalview.org/source/source.html
License: Jalview is implemented in Java and is made available under the GPL v3.
The widely used multiple sequence alignment visualization, annotation and editing tool - Jalview,
will soon enter a transition period. The current Jalview 2 codebase, which supports a java applet
and stand-alone desktop application, was developed in 200[…]. Since then, a great number of
features and improvements have been made to it (see
http://www.jalview.org/releaseHistory.html for details), and this development activity has
undoubtedly contributed to its continued success. However, whilst Jalview 2 meets the needs of
many biologists and bioinformaticians, its underlying software architecture has become
increasingly complex. To alleviate these issues, we will soon begin a major revision of the current
architecture to create Jalview version 3, which will facilitate the introduction of new features as
plugins developed by a wider development community.

Jalview Version 2.[…] is expected to be the penultimate major release based on the Version 2
codebase. It will incorporate a number of new features, including improved support for
visualizing and analyzing RNA sequence alignments, thanks to contributions from our 2010
Google Summer of Code student, Lauren Lui (http://jalview-rna.blogspot.com). Linked alignment
and structure visualizations have been improved, allowing a set of complexes or multi-domain
protein structures to be coloured using alignments associated with individual domains or
proteins within the scene. JalviewLite, the applet version of Jalview, has also been extended with
alignment annotation visualization capabilities that were previously only available in the Jalview
Desktop. Its public API has also been extended, and a prototype javascript library developed to
facilitate tight integration, including the exchange of mouse overs, selection and visualization
attributes between the applet and other web based visualization components on the page.

A number of Jalview community events have also taken place, providing us with valuable
feedback regarding Jalview's usefulness and relevance to biologists. Over seven Jalview tutorials
have happened over the last 12 months, including two residential courses, held at the EMBL-EBI
in Cambridge, UK and EMBL Heidelberg in Germany. These courses cover all aspects of Jalview's
use, including the installation of JABAWS servers (Troshin et al., 2011 and
http://www.compbio.dundee.ac.uk/jabaws), which enables Jalview users to perform
computationally intensive analysis tasks on their local machine or cluster. We have also
experimented with new software deployment approaches, including the provision of off-line
mirrors of the Jalview and JABAWS web sites, to allow users in remote locations with slow or
unstable internet connections to access all Jalview web resources locally.

Jalview usage statistics suggest the desktop application is now used over […]000 times a month
worldwide. It is therefore critical that the version 2 series of Jalview continues to be maintained
and improved whilst the version 3 codebase is developed. Some of this maintenance effort comes
from the Jalview user community - which is diverse, and includes a number of expert software
developers who contribute valuable patches to its codebase. However, these contributions are
hampered by Jalview's current source distribution model, which involves publishing successive
releases as a source archive. To lower the barrier to future contribution, we have now made
Jalview's version control repository publicly accessible. By opening the project repository and
integrating it with Jalview's existing bug tracker, we hope to foster an increase in contribution to
Jalview from the open source bioinformatics software development community.

References
Troshin, PV, Procter, JB, and Barton, G (2011). "JABAWS:MSA Distributed Web Services for
Bioinformatics: Multiple Sequence Alignment". To be published in Bioinformatics.
Poster 43
jsDAS and Dasty3, enabling DAS protein visualisation
Leyla Garcia1*, Bernat Gel2,3, Rafael C. Jimenez1, Jose M. Villaveces1, Gustavo A. Salazar4,1, Nicola Mulder4, Maria Martin1, Alexander Garcia5 and Henning Hermjakob1
1 European Bioinformatics Institute, Hinxton, UK.
2 Software Department, UPC-BarcelonaTech, Barcelona, Spain.
3 Hereditary Cancer Program, Institute for Predictive and Personalised Medicine of Cancer, Badalona, Spain.
4 Computational Biology Group, Department of Clinical Laboratory Sciences, University of Cape Town, South Africa.
5 University of Arkansas, Biomedical Informatics, Medical Center, Arkansas, USA
* ljgarcia@ebi.ac.uk
jsDAS: JavaScript client library for the Distributed Annotation System; it is an open source tool freely available under the terms of the GNU Lesser
General Public License. Project Web site and source code available at http://code.google.com/p/jsdas/
Dasty3: Dasty3 is an extensible Web-based framework that supports protein visualization. Dasty3 is an open source tool freely available under the terms
of the GNU General Public License. Official Web site at http://www.ebi.ac.uk/dasty/. Project Web site and source code available at
http://code.google.com/p/dasty/
ABSTRACT
The Distributed Annotation System (DAS) defines a communication protocol to exchange annotations on genomic or protein
sequences (http://www.biodas.org). Its client-server architecture involves servers that manage the data distribution, clients
that handle the data manipulation and visualisation, and a registry that provides a repository for registration and discovery of
DAS services. A DAS server can comprise more than one data source that actually provides the information on sequences,
i.e. reference data sources, or annotations, i.e. annotation data sources. The current specification can be found at the official
Wiki page (http://www.biodas.org/wiki/DAS1.6).
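As a rough illustration of the client side of this protocol, the sketch below builds DAS request URLs in Python. It assumes the usual DAS convention of appending a command (such as `features` or `sequence`) to the data source URL and passing the region as a `segment` parameter. The helper name `das_request_url` and the accession P05067 are illustrative choices only; the specification linked above remains the authority on the exact syntax.

```python
from urllib.parse import urlencode


def das_request_url(source_url, command, segment, start=None, stop=None):
    """Build a DAS request URL for one segment (hypothetical helper)."""
    # A segment is either a bare identifier or "id:start,stop".
    if start is not None and stop is not None:
        segment = f"{segment}:{start},{stop}"
    # safe=":," keeps the segment separators readable in the query string.
    query = urlencode({"segment": segment}, safe=":,")
    return f"{source_url.rstrip('/')}/{command}?{query}"


# UniProt DAS reference source as cited in this abstract.
UNIPROT_DAS = "http://www.ebi.ac.uk/das-srv/uniprot/das/uniprot"
print(das_request_url(UNIPROT_DAS, "features", "P05067"))
print(das_request_url(UNIPROT_DAS, "sequence", "P05067", 1, 50))
```

A client library would wrap such URLs together with the HTTP request and the parsing of the XML response.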
Although DAS clients can be developed from scratch, it is also possible to use a client library. A DAS client library
encapsulates the communication with the server as well as the data parsing. Currently there are client libraries in Perl
(Bio::DAS::Lite), JavaScript (jsDAS), and Java (Dasobert and JDas).
jsDAS is a JavaScript DAS client library that manages all the aspects of DAS data discovery and retrieval; it is fully
compliant with the latest specification: DAS 1.6. This library queries the DAS sources; once the data is retrieved, it parses
the XML document and generates JavaScript objects representing it. jsDAS is completely asynchronous; it relies on callback
functions to provide feedback on success or failure. It is Cross Origin Resource Sharing (CORS) compatible, which means it
can access third party servers directly from the browser without a proxy, although it includes a fallback for servers not yet
CORS-enabled. jsDAS modules are delivered as an API providing the core functionality: URL parsing, XML parsing,
connection managing, etc.
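The parse-then-notify pattern described for the client library can be sketched in Python as follows. This is not jsDAS code (jsDAS is JavaScript) and none of the names below belong to its API. The sample document is hand-written, and its element layout is an assumption modelled loosely on DAS features responses. The call itself is synchronous; only the callback interface is illustrated.

```python
import xml.etree.ElementTree as ET

# Hand-written sample; the layout is assumed, not copied from a real server.
SAMPLE_XML = """<DASGFF><GFF><SEGMENT id="P05067" start="1" stop="770">
  <FEATURE id="f1" label="Signal peptide">
    <TYPE id="signal_peptide">signal_peptide</TYPE>
    <START>1</START><END>17</END>
  </FEATURE>
</SEGMENT></GFF></DASGFF>"""


def parse_features(xml_text, on_success, on_failure):
    """Parse the document and report through callbacks instead of returning."""
    try:
        root = ET.fromstring(xml_text)
        features = [
            {
                "id": feature.get("id"),
                "label": feature.get("label"),
                "type": feature.findtext("TYPE"),
                "start": int(feature.findtext("START")),
                "end": int(feature.findtext("END")),
            }
            for feature in root.iter("FEATURE")
        ]
    except (ET.ParseError, TypeError, ValueError) as error:
        # Malformed XML or a missing/non-numeric coordinate ends up here.
        on_failure(error)
    else:
        on_success(features)


parse_features(SAMPLE_XML, on_success=print, on_failure=print)
```

Passing both callbacks explicitly keeps the caller's success and failure paths separate, which is the feedback style the abstract attributes to jsDAS.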
Dasty3 is a Web-based framework based on its predecessor, Dasty2, and developed upon jsDAS. It allows visualising and
manipulating proteins from DAS sources as well as from third party providers. Dasty3 relies on a modular architecture that
makes it easier to integrate new plug-ins; it also offers a public API that comprises methods for integrating, visualising, and
manipulating information from different data sources.
Users can perform searches based on a protein accession or identifier. Dasty3 uses PICR (http://www.ebi.ac.uk/Tools/picr)
to match the query to a UniProt entry; the protein sequence is retrieved from the UniProt DAS reference server
(http://www.ebi.ac.uk/das-srv/uniprot/das/uniprot). Protein information from all other data sources configured in Dasty3 is
then retrieved; data sources can be easily added or removed. The set of predefined plug-ins provides the user with a unified,
organised, and interactive view: (i) ontology filter plug-in to navigate and filter the ontology terms used in DAS (BioSapiens,
Sequence, Protein Modification, and Evidence Code ontologies), (ii) 3D view of the protein structures from the JMOL applet, (iii)
positional features plug-in to display annotations related to particular positions of amino acids in the protein sequence,
including information about the type and the method used to generate the annotation, labels, the graphical representation of
the annotation, the data source providing it, and the category (inferred from manual or electronic means), (iv) writeback plug-in
to allow users to create new annotations and modify existing ones, (v) sequence plug-in to display the sequence and highlight the
amino acids from the annotations selected in the positional features plug-in, (vi) non-positional features plug-in to display
annotations related to the whole protein such as publications, and (vii) interactions plug-in to show information about the
molecular interactions of the protein.
Dasty3 is highly modular and extensible; new components can be easily added to the framework. The interoperability
delivered by the API facilitates the flow of data across plug-ins. The architecture also facilitates the organisation of the front-end
by means of templates, easing in this way the definition of the layout and improving the user experience.
Poster 44
KNIME-CADDSuite - integration of the computer aided drug design
suite (CADDSuite) into the Konstanz Information Miner (KNIME)
Marc Röttig1,*, Marcel Schumann1 and Oliver Kohlbacher1
1 Applied Bioinformatics, Center for Bioinformatics, Department of Computer Science,
University of Tübingen, Sand 14, 72076 Tübingen, Germany
*Corresponding author: roettig@informatik.uni-tuebingen.de
Project-URL: http://abi.inf.uni-tuebingen.de/Software/knimenodes/caddsuite
License: LGPL
The Biochemical Algorithms Library (BALL) is an application framework in C++ that has been
specifically designed to allow rapid software prototyping and to significantly reduce
development times in the field of computational molecular biology and molecular modeling.
It provides an extensive set of data structures as well as classes for molecular mechanics,
advanced solvation methods, comparison and analysis of protein structures, file
import/export, and visualization [1]. On top of BALL we developed the computer aided drug
design suite (CADDSuite), a comprehensive collection of tools which aims to ease the
application of common computer-aided drug design tasks by providing all necessary tools
and algorithms, which are useable in a simple and consistent manner [2]. Here, we present a
novel integration mechanism of CADDSuite into the KNIME workflow system to further ease
the application of those tools within large workflows, allowing the user to create and launch
workflows directly from a rich client user interface. The Konstanz Information Miner (KNIME)
is a modular data exploration platform that allows modeling of workflows for data analysis
[3]. The basic unit of execution within KNIME is a node, thus additional functionality can be
added to KNIME by creating new KNIME nodes. The generation of KNIME-CADDSuite nodes
is carried out in a generic fashion by building KNIME nodes automatically from an XML-based
node specification (describing required input files, parameters, ...) that each CADDSuite tool
supplies. Generally, any collection of tools adhering to this node specification mechanism
can now be integrated into KNIME and we already have integrated one of our other libraries,
namely the OpenMS library for mass spectrometry analysis [4]. In the future, other tool sets
will be integrated using the proposed generic KNIME node generation mechanism.
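The idea of generating workflow nodes from a per-tool XML description can be sketched as follows. The descriptor format, the tool name `ExampleDocker` and both helper functions are invented for this illustration; they are not the CADDSuite or KNIME schema or API, and real KNIME nodes are Java classes rather than Python dictionaries.

```python
import xml.etree.ElementTree as ET

# Hypothetical tool descriptor; element and attribute names are invented
# for this sketch and do not reproduce the CADDSuite/KNIME specification.
TOOL_SPEC = """<tool name="ExampleDocker">
  <input name="receptor" type="file"/>
  <input name="ligands" type="file"/>
  <parameter name="runs" type="int" default="10"/>
  <output name="poses" type="file"/>
</tool>"""


def build_node(spec_xml):
    """Turn one tool descriptor into a generic node description."""
    tool = ET.fromstring(spec_xml)
    return {
        "name": tool.get("name"),
        "in_ports": [e.get("name") for e in tool.findall("input")],
        "out_ports": [e.get("name") for e in tool.findall("output")],
        "settings": {
            p.get("name"): p.get("default") for p in tool.findall("parameter")
        },
    }


def command_line(node, files, overrides=None):
    """Assemble the call a generated node would make to the wrapped tool."""
    settings = {**node["settings"], **(overrides or {})}
    args = [node["name"]]
    for port in node["in_ports"] + node["out_ports"]:
        args += [f"-{port}", files[port]]
    for name, value in settings.items():
        args += [f"-{name}", str(value)]
    return args


node = build_node(TOOL_SPEC)
print(command_line(node, {"receptor": "r.pdb", "ligands": "l.sdf", "poses": "out.sdf"}))
```

Because the node is derived entirely from the descriptor, any other tool collection that ships such descriptors could be wrapped by the same generic code, which is the point made in the abstract.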

1. Hildebrandt et al., BALL - Biochemical algorithms library 1.3. BMC bioinformatics, 2010. 11: p. 531.
2. Schumann et al. CADDSuite: a flexible and open framework and workflow system for computer-aided drug design. (manuscript in preparation)
3. www.knime.org
4. Sturm et al. OpenMS - an open-source software framework for mass spectrometry. BMC bioinformatics, 2008. 9: p. 163.
Poster 45
OncoPortal: the power of BioMart for oncology research
M Wong-Erasmus, J Baran, A Cros, JM Guberman, S Haider, J Hsu, Y Liang, E
Rivkin, J Wang, B Whitty, L Yao, J Zhang, A Kasprzyk
Ontario Institute for Cancer Research, Toronto, Ontario M5G 0A3, Canada
Email: marie.wong-erasmus@oicr.on.ca
web: http://www.biomart.org/
svn: https://code.oicr.on.ca/svn/biomart/biomart-java/branches/release-[?]_[?]-candidate_[?]
license: GNU Lesser General Public License v[?].[?]
Abstract
OncoPortal is the collaborative vision of both academic and industry partners to build
a data integration and management system tailored to the oncology arena. To
achieve this goal, OncoPortal uses the BioMart framework that is known for its ability
to seamlessly integrate disparate data sources and allow for cross data-mining. The
idea is to provide researchers with a pre-configured BioMart that has been
customized to manage oncology data in a distributed environment that is typical of
collaborative projects. The advantage of using BioMart as a starting point is that it is
free, open-source and most importantly, it enables the researcher to capitalize on the
rich capabilities of the system such as installing a local version of the portal,
controlling the web interface look and feel as well as how the data is linked, queried
and protected if necessary. In addition, the wealth of available resources provided by
the BioMart architecture could be drawn upon to offer annotations that would be
useful in complementing the oncology data.
One of the first incarnations of this tool is the International Cancer Genome
Consortium (ICGC) portal which can be found at http://dcc.icgc.org/ and has been
available to the wider research community for some time. This unified point of access
enables cross-comparisons of high-complexity data from 24 cancer studies from 12
countries consisting of 3,478 genomes broken down into 13 cancer types and
subtypes. Currently, 7 genomic data types are cohesively and transparently handled.
These data types include (i) simple somatic mutations, (ii) copy number alterations,
(iii) structural rearrangements, (iv) gene expression, (v) miRNA, (vi) DNA
methylation, and (vii) exon junctions. At present, a researcher is able to mine each
data type across one or more datasets as well as run simple analyses based on a
selection of genes, mutations and pathways. Besides ICGC data, there is also
federated access to public external annotation resources such as the Kyoto
Encyclopedia of Genes and Genomes (KEGG), Reactome and the Catalogue of
Somatic Mutations in Cancer (COSMIC) just to name a few.
In the future, OncoPortal will not only provide data federation and mining but
also leverage the breadth of BioMart's functionality, such as its new plug-in feature,
which aims to give researchers a way to run more complex data analyses across
one or more of the datasets. With these capabilities in place, the portal is intended to
bring data rapidly to the forefront of oncology research and to serve as a model for
oncology data mining and analysis for cancer researchers worldwide.
Poster 46
SEQCRAWLER - A CLOUD READY INDEXING PLATFORM
Biological data indexing and browsing platform
Authors: Olivier Sallou
Affiliation: GenOuest BioInformatics Platform,IRISA, Rennes, France, Olivier.sallou@irisa.fr
URL: http://seqcrawler.sourceforge.net
Code: http://sourceforge.net/projects/seqcrawler/
License: CeCill v2 (GNU like) (http://www.cecill.info/licences/Licence_CeCILL_V2-en.html)
I. INTRODUCTION
Seqcrawler takes its roots in software like SRS or
Lucegene. Its goal is to provide an indexing platform to ease
the search of data in biological banks. Besides metadata
information search, it can store raw data such as original
sequence and transcripts. Lastly, it provides a sequence
browser to visualize results (genes on chromosomes, etc.).
The software integrates different technologies and software
and ties them together in a coherent and scalable platform,
ready to run on a single computer or on a cloud, in a fully
scalable architecture.
II. ARCHITECTURE
The software runs on Linux systems. It is composed of 3
components:
• The indexer/search interface
• The genome browser (GBrowse2), linked to the
index via a specific DBI interface
• The raw data storage backend (Riak or MongoDB), a
NOSQL database.
The software is coded in Java and Perl scripts (based on
BioPerl). It runs on Linux systems and can be either
installed manually or installed with a Live DVD or a virtual
machine containing all required software and components.
The platform ties those 3 components to share and extract
data. However, each component can be queried or activated
independently on a system. Each component is scalable and
can be extended to reach expected dimensions. We could
have for example 2 index shards on 2 servers and 1 browser
collocated with 1 storage system.
A. Indexer/search platform
The component provides a program to index data from
different formats (GenBank, EMBL, GFF, etc.). The source data
is split into key/value pairs that can be used as query
parameters. Specific fields can be defined with a special
format in the index configuration (dates, spatial, etc.). This
allows complex queries on fields with AND/OR conditions.
Additionally, full text search can be done. The index engine
is Solr based. Readseq is used to parse the various formats for
which there is no dedicated implementation.
The index engine supports index sharding (the possibility to
cut a large index into smaller parts) and transparently queries all
the shards over the network. Sharding adds the possibility to
extend/specialize the indexes over several servers. Any
server can be queried; all shards will be triggered and the results
merged on the queried server. It also provides a REST interface
to trigger it and render results (cf. APIs).
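The scatter-gather behaviour described above (every shard answers, and the queried server merges the answers) can be illustrated with a small merge routine. This is a sketch of the general technique only, not Solr's or Seqcrawler's code; the hit structure with `id` and `score` fields is an assumption made for the example.

```python
import heapq
from itertools import islice


def merge_shard_results(shard_results, page_size):
    """Merge per-shard hit lists, each already sorted by descending score."""
    # heapq.merge expects ascending keys, so the score is negated.
    merged = heapq.merge(*shard_results, key=lambda hit: -hit["score"])
    return list(islice(merged, page_size))


shard_a = [{"id": "gene1", "score": 0.9}, {"id": "gene4", "score": 0.3}]
shard_b = [{"id": "gene2", "score": 0.7}, {"id": "gene3", "score": 0.5}]
top = merge_shard_results([shard_a, shard_b], page_size=3)
print([hit["id"] for hit in top])  # ['gene1', 'gene2', 'gene3']
```

Because each shard returns its hits already ranked, the merging server only needs to interleave the lists lazily and can stop as soon as one page is filled.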
Indexed fields can be indexed and/or stored for later
retrieval. GFF documents are fully indexed/stored; some
other formats index most of the fields but store only minimal
information. A link to the original document allows
extraction of the source (part of the document is extracted
via an extraction script). The index can be built or updated offline
(for initial creation or large uploads) or online, and supports
replication.
B. Genome browser
Genome browsing is done using the GMOD
GBrowse2 Perl software. A DBI interface has been created
to link GBrowse2 to the index. All indexed GFF files
(the format required by GBrowse) can be visualized, independently
of the index location and splitting. GBrowse queries an index
master (possibly with load sharing), and the index master
sends the merged results to the DBI interface. The interface
is in charge of querying the remote index and looping over
all the results that should be displayed (meaning that further
"pages" of the index may be required).
To get "document" details (a gene for example), the
index is queried again to get the full available information and
the NOSQL storage is also queried to get the raw data
corresponding to the element.
Any server can be queried separately from the index
servers. From a general perspective, any kind of data
browser could be plugged in here; only a driver to the index is
required (via a simple HTTP interface).
C. NOSQL storage
The raw data is stored in a NOSQL backend. The software
supports Riak and MongoDB, but can be extended to other
systems via an interface. NOSQL storage eases the
horizontal extension of the data among many servers. Data
can be replicated for security and easily scaled with data
sharding.
With the NOSQL database, the data is stored as a
document with a unique id and its metadata, accessible from
any node of the system. In Seqcrawler, data may be cut into
smaller chunks to fit the database requirements. The
program then takes charge of collecting all the chunks to
recreate the original data with the help of the stored metadata.
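A minimal sketch of the chunking scheme just described is given below. The key layout (`id:index`), the metadata fields and the in-memory dictionary standing in for the NOSQL backend are all assumptions made for illustration; the actual storage format used by Seqcrawler is not specified in this abstract.

```python
def split_into_chunks(doc_id, data, chunk_size):
    """Cut raw data into chunk records plus one metadata record."""
    chunks = {
        f"{doc_id}:{index}": data[offset:offset + chunk_size]
        for index, offset in enumerate(range(0, len(data), chunk_size))
    }
    metadata = {"id": doc_id, "length": len(data), "chunks": len(chunks)}
    return metadata, chunks


def reassemble(metadata, store):
    """Rebuild the original data from the stored chunks, using the metadata."""
    parts = [store[f"{metadata['id']}:{i}"] for i in range(metadata["chunks"])]
    data = "".join(parts)
    assert len(data) == metadata["length"], "a chunk is missing or truncated"
    return data


store = {}  # stands in for the NOSQL backend
meta, chunks = split_into_chunks("seq42", "ACGTACGTAC", chunk_size=4)
store.update(chunks)
print(meta, reassemble(meta, store))
```

Keeping the chunk count and total length in the metadata lets the reader fetch exactly the right keys and detect an incomplete document.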
III. APIS
The index component can be queried via HTTP REST to
get index page results in JSON or XML formats. Page size
and page number can be specified in the query. The NOSQL
storage can be queried either via HTTP GET (Riak) or
specific drivers (Perl, Java, etc. for MongoDB) using the id
of the document.
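Since the index engine is Solr based, a paged query could plausibly be expressed with Solr-style parameters, as in the sketch below. The parameter names `q`, `start`, `rows` and `wt` are standard Solr request parameters, but whether Seqcrawler exposes them unchanged is an assumption, and the host, path and field names (`feature`, `chr`) are placeholders.

```python
from urllib.parse import urlencode


def index_query_url(base_url, query, page, page_size, fmt="json"):
    """Build a paged query URL using Solr-style parameter names."""
    params = {
        "q": query,
        "start": (page - 1) * page_size,  # offset of the first hit on the page
        "rows": page_size,
        "wt": fmt,  # "json" or "xml"
    }
    return f"{base_url}?{urlencode(params)}"


# The host and path are placeholders, not a real Seqcrawler deployment.
url = index_query_url(
    "http://localhost:8080/solr/select",
    "feature:gene AND chr:chr1",
    page=2,
    page_size=20,
)
print(url)
```

Translating a page number into an offset on the client side keeps the server interface stateless: each request fully describes the slice of results it wants.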
Poster 47
SeqGI: Sequence Read Enrichment at Genomic Intervals
Inês de Santiago, Tom Carroll, Ana Pombo
MRC Clinical Sciences Centre, Imperial College School of Medicine, London, UK.
desantiago.ines07@csc.mrc.ac.uk
http://seqgi.sourceforge.net/
GNU Lesser General Public License
The visualisation and statistical evaluation of read profiles over genomic features are core
components in the interpretation of high-throughput sequencing data. These processes have
largely remained disparate, which has led to the use of multiple software packages requiring inter-conversion between differing file formats. Furthermore, the increasing use of multiple
biological samples in ChIP-Seq studies demands statistical and computational methods
suitable for the assessment of biological variation. SeqGI is open source software that
provides a GUI framework for the simultaneous visualisation and testing of sequence read
distributions both between and within classes of user-defined genomic features. The software
is written in Python and R, and runs on all standard operating systems.
SeqGI can be used to intersect BED, WIG or output files from standard aligners with a set of
specified genomic features in order to calculate and illustrate the read density at these genomic
intervals or at some distance from known regions/features. Profile plots and heatmaps are
used to visualise single or multiple read distributions across features, whereas scatter and box
plots allow for the identification of differential read densities between individual features or
classes of genomic features as well as between conditions. Alongside normalisation,
transformation and classical parametric and non-parametric tests, SeqGI's statistical
framework allows for the analysis of differential read densities as count data using
methodology implemented in the DESeq Bioconductor package. SeqGI provides users with
an intuitive graphical interface, combining both visualisation and statistical tools, and so
assists in the rapid interpretation of sequencing data.
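The core calculation mentioned above, read density at genomic intervals, can be illustrated with a few lines of Python. This is a simplified sketch and not SeqGI's implementation: it treats each read as a single start coordinate on one chromosome, uses half-open intervals, and reports reads per kilobase without any library-size normalisation. The feature names and coordinates are invented.

```python
from bisect import bisect_left


def read_density(read_starts, intervals):
    """Return reads per kilobase for each named half-open interval [begin, end)."""
    starts = sorted(read_starts)
    density = {}
    for name, (begin, end) in intervals.items():
        # Number of sorted start positions falling inside [begin, end).
        count = bisect_left(starts, end) - bisect_left(starts, begin)
        density[name] = count / ((end - begin) / 1000)
    return density


reads = [100, 150, 900, 1200, 1250, 1300, 5000]
features = {"promoter": (0, 1000), "gene_body": (1000, 3000)}
print(read_density(reads, features))  # {'promoter': 3.0, 'gene_body': 1.5}
```

Sorting the read positions once and using binary search keeps the cost per interval logarithmic, which matters when many features are intersected with millions of reads.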
Poster 48
Web based pipeline frameworks for data processing
Kyle Ellrott, Jing Zhu, David Haussler
University of California, Santa Cruz
kellrott@soe.ucsc.edu
http://genome-cancer.ucsc.edu https://github.com/kellrott/Remus
Apache License, Version 2.0
One of the common tasks that bioinformatics research labs face is the need to perform a set of standard
calculations on changing data at regular intervals. If care is not taken, the task can become a convoluted web of
scripts living in an array of random directories that overwrite the data produced by previous runs. There exist
frameworks and methodologies that can help stave off these bad research habits. From roll-your-own scripts
that use web based NoSQL databases like CouchDB to more integrated solutions like Galaxy or Firehose, there
is a growing culture of bioinformatics web based pipeline engines.
One practical example of the need for a web based pipeline engine is the preprocessing that is done for the
UCSC Cancer Genome browser (https://genome-cancer.ucsc.edu), which provides a visualization portal of
the publicly accessible data from The Cancer Genome Atlas (TCGA) project. This data currently covers over [?]0
different experimental platforms and over [?]000 samples from [?][?] different types of cancers. The scale of the
data is expected to grow dramatically further in the next few years while TCGA ramps up data production.
This constantly growing set of experiments requires regular updates to catch newly added data. In addition,
because the compiled data becomes the basis for other analysis, it is important to maintain archives of results
so that previous runs can be reexamined if needed.
Remus (https://github.com/kellrott/Remus) is a system developed at UCSC to help formalize TCGA data
input. It was built to ensure that all job management, data access and generated reports are accessible
via a web based interface. It allows the users to define pipelines comprised of sets of connected key/value data
stacks filled with JSON data. These data stacks are tied together by map/reduce style operations which operate
on the JSON formatted data stored in the stacks or attached binary files.
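To make the idea of key/value stacks tied together by map/reduce style operations more concrete, here is a small sketch in Python. It does not use Remus itself: the function `run_map_reduce`, the stack layout and the sample records are invented for illustration, and a real engine would persist each stack rather than keep it in memory.

```python
import json
from collections import defaultdict


def run_map_reduce(stack, mapper, reducer):
    """Apply a map step to every key/value pair, group by key, then reduce."""
    grouped = defaultdict(list)
    for key, value in stack.items():
        for out_key, out_value in mapper(key, value):
            grouped[out_key].append(out_value)
    return {key: reducer(key, values) for key, values in grouped.items()}


# Input stack: sample id -> JSON-style record (invented example data).
samples = {
    "s1": {"cancer": "GBM", "platform": "expression"},
    "s2": {"cancer": "GBM", "platform": "methylation"},
    "s3": {"cancer": "OV", "platform": "expression"},
}

# Output stack: cancer type -> number of samples.
counts = run_map_reduce(
    samples,
    mapper=lambda key, value: [(value["cancer"], 1)],
    reducer=lambda key, values: sum(values),
)
print(json.dumps(counts, sort_keys=True))  # {"GBM": 2, "OV": 1}
```

The output of one step is itself a key/value stack of JSON-serialisable values, so it can feed the next step or be served directly over a web interface.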
Remus emphasizes the importance of instanced data, so that each new instance of a pipeline run exists in its
own space. An instance can be dumped to an archive, stored on disk and removed from the active database.
Should old results need to be reexamined, they can be reloaded into the active database. There is also an
attempt to store the code associated with a pipeline when it was first instanced, so that if pipeline code has
changed over time, the code associated with older results can be reexamined.
All data associated with the pipeline, and all of its instances, is accessible via a RESTful interface, with all
data presented in JSON format. This means that any piece of data stored in the data stacks is accessible by
AJAX query. So the definition of the pipeline structure becomes the definition of the web API used to access
it. Thus, the development of the pipeline leads to the simultaneous development of the webapp used to access
data. Binary files can be attached to values in the data stack, so the JSON data stacks can be used to describe
the meta-information about a piece of binary data. This way the pipeline can guide which programs are run on
sets of files, without having to convert high density data to JSON.
In the world of bioinformatics research there is a growing number of developing pipeline engines. Remus's
strict adherence to a RESTful JSON model means it is simple to bridge to other software solutions and prevent
product lock-in. Whatever solution a lab does choose to tackle the need for workflow organization, there are
certain concepts that must be prioritized. Web accessibility, data archivability, and tool interface
standardization are the cornerstones of Remus.
Poster 49
UniPAX - an integrative environment for biological networks
Andreas Gerasch1, Jan Küntzer2, Peter Niermann1,
Michael Kaufmann1, Oliver Kohlbacher1, Hans-Peter Lenhof3

1 University of Tübingen, Faculty of Science, Dept. for Computer Science, Germany
2 Roche Diagnostics GmbH, Pharma Research & Early Development Informatics, Penzberg, Germany
3 Saarland University, Center for Bioinformatics, Germany
Email: gerasch@informatik.uni-tuebingen.de URL: http://unipax.sf.net/
License: GNU Lesser General Public License v3

In the last ten years different data formats have been developed for representing biological networks. Two of
them evolved to be quasi standards: BioPAX [1] as exchange format in the proteomics, and SBML [2] for
metabolic modeling in systems biology. Both formats are XML based and describe biological relationships
between entities. Today, most tools in these research areas provide import and export functionality for at least
one of these formats. Since BioPAX and SBML overlap partly, it is possible and reasonable to combine both
formats and to integrate them without loss of information.
The goal of UniPAX is to provide a data warehouse for biological networks, where different biological databases
are fully semantically integrated, using a fusion of the essentials of both data formats BioPAX and SBML. We
completely re-designed our existing data warehouse BN++ [3], which was introduced a decade ago. We
replaced the old data model BioCore by a new one, combining both formats. For this, we analyzed the model
mapping of some tools, which can convert one format into the other one. The BioPAX community is also
working on incorporating parts of SBML into the next BioPAX level 4, which is still in early definition phase.
With this new internal data model we fully support the current standardization efforts in systems biology.
We have implemented the data model of UniPAX in C++ and Java, and created an object-relational mapping for
SQL databases. We adapted our existing BN++ data warehouse to the new data model and rewrote the most
important importers. Since our new data model is based on BioPAX and SBML, most databases can be directly
imported. We also introduced a mechanism for storing different versions of source databases using a graph
based versioning concept. This enables us to load only subsets of the complete network by defining explicit
versions of source databases.
A new data model requires a lot of adaptations to the surrounding code, which we are currently working on. In
the future, the server will provide different web services for searching, analyzing, and visualizing the network. On
the client side, we are implementing a new version of our comprehensive visualization and analysis tool BiNA
[3]. Because of having different versions of the source databases integrated, we expect a massive enlargement
of the stored data. Therefore, we plan to switch to an RDF database, which is more optimized for graph-based
data. Most suitable for our purposes is RDF-3X [4], which outperforms popular relational databases vastly.

1. Demir E, Cary MP, Paley S, Fukuda K, Lemer C, Vastrik I, Wu G, D'Eustachio P, Schaefer C, Luciano J et al: The BioPAX community standard for pathway data sharing. Nat Biotechnol 2010, 28(9):935-942.
2. Hucka M, Finney A, Sauro HM, Bolouri H, Doyle JC, Kitano H, Arkin AP, Bornstein BJ, Bray D, Cornish-Bowden A et al: The systems biology markup language (SBML): a medium for representation and exchange of biochemical network models. Bioinformatics 2003, 19(4):524-531.
3. Küntzer J, Backes C, Blum T, Gerasch A, Kohlbacher O, Kaufmann M, Lenhof H-P: BNDB - The Biochemical Network Database. BMC Bioinformatics 2007, 8(1):367.
4. Neumann T, Weikum G: RDF-3X: a RISC-style engine for RDF. Proc VLDB Endow 2008, 1(1):647-659.

The project is supported by the Deutsche Forschungsgemeinschaft (SPP 1335 "Scalable Visual Analytics").
Poster 50
Expanding Bio-Linux to create a platform for analysis of community diversity and
function
Tim Booth, Mesude Bicak, Dan Pass, Pete Kille, Dawn Field
NERC Centre for Ecology and Hydrology, Maclean Building, Benson Lane, Wallingford, UK, OX10 8BB
Cardiff School of Biosciences, Cardiff University, Main building, Park Place, Cardiff, CF10 3AT
See the Bio-Linux homepage for all links and more info: http://nebc.nerc.ac.uk/tools/bio-linux
Environmental microbial diversity revealed by mass sequencing has been a hot topic since
the groundbreaking Sargasso Sea study nearly a decade ago (Venter et al., Science). Now, with high throughput sequencing (HTS), even a modest lab can produce that volume
of sequence and, from polluted mine water to the human gut to the waters of the Antarctic,
researchers are ready to turn this powerful spotlight on new environments and discover the
diversity and function of organisms that live there.
A variety of specialist open source tools (QIIME, AmpliconNoise, Mothur, jMOTU/taxonerator,
the VEGAN R package, etc.) are rapidly being developed to make sense of the data and perform
the many steps necessary to go from raw sequence of amplicons (i.e. [?]S ribosomal barcodes,
etc.) to meaningful analyses like diversity indices, rarefaction curves, rank abundance. As well
as analysis, users must QC, manipulate, store and submit their data to public repositories,
requiring knowledge of yet more software. Keeping pace with all the tools needed can be a
bottleneck to researchers even as the actual sequencing becomes easier.
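As a small worked example of the kind of end-point analysis mentioned above, the Shannon diversity index can be computed from per-taxon read counts in a few lines of Python. This is a generic textbook formula, H' = -sum(p_i ln p_i), and not code taken from any of the listed tools; the counts are invented.

```python
import math


def shannon_index(counts):
    """Shannon diversity H' = -sum(p_i * ln p_i) over taxa with non-zero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


otu_counts = [50, 30, 15, 5]  # invented read counts per taxon
print(round(shannon_index(otu_counts), 3))
```

A community of two equally abundant taxa gives ln 2, and a community with a single taxon gives 0, which are convenient sanity checks for any implementation.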
The Bio-Linux platform, based on Ubuntu, has been developed for over eight years as a free
turnkey analysis solution for bioinformatics. To meet the needs of environmental researchers to
cope with data from HTS studies, we now include a bundle of key tools and analysis pipelines
in Bio-Linux to support amplicon sequence analysis. We curate documentation and maintain
a table of capabilities for each tool to allow users to select the appropriate ones and find which
can work together. We work with Debian-Med (Möller et al., BMC Bio) to ensure our
packaging work is contributed to core Debian and Ubuntu repositories, and our packages are
also added to Cloud BioLinux (http://cloudbiolinux.org).
We have also been part of the development of a new minimum information standard for
describing gene marker sequence data (MIMARKS, Yilmaz et al., Nature Biotechnology).
In particular, Bio-Linux now includes software from the ISA project (http://isa-tools.org). This
supports both the capture of MIMARKS compliant annotations and also the creation of online
catalogues of datasets and assisted submission to the public repositories (here EMBL SRA).
Continued updating of the Bio-Linux amplicon analysis expert bundle will benefit the expanding
community of researchers working in this domain. We maintain the software as a community
project and invite participation from interested parties.
Poster 51
Study capturing: from research question to sample annotation
Kees van Bochove1,2, Jeroen Wesbeek8, Tjeerd Abma3, Jahn-Takeshi Saito4, Robert
Horlings1, Siemen Sikkema7, Chris Evelo4, Ben van Ommen8 and Jildau Bouwman8
1The Hyve, Utrecht, 2NBIC BioAssist Engineering Team, 3UMCU Metabolomics Centre, UMC Utrecht, 4BiGCaT Department
of Bioinformatics, Maastricht University, 5Leiden/Amsterdam Centre for Drug Research, 6Plant Research International,
Wageningen University, 7Biosystems Data Analysis Group, Swammerdam Institute for Life Sciences, University of
Amsterdam, 8TNO Quality of Life, Zeist.
Correspondence to Kees van Bochove - kees@thehyve.nl.
Abstract
A serious challenge of bioinformatics is the exchange of bioinformatics data and the
accompanying scientific collaboration. The extent of data exchange differs for various domains,
for example, for transcriptomics data, there are well-established repositories, whereas for e.g.
metabolomics these are hard to find. However, there is one commonality between all
bioinformatics investigations involving biology, and that is the presence of a study design, which
is hard to capture in a database. Especially with complex studies, such as certain nutrigenomics
intervention studies, with different timepoints, organs, samples, assays and other study factors,
it becomes hard to keep track of what a datapoint or set of measurements means in the context
of the study design. Therefore, we developed an open source web application (GSCF) that allows
biologists to enter or upload their study design, along with other study metadata such as subject
information, events or sample characteristics. We also developed a simple REST interface for
hooking up actual 'omics' data to this study design, so that cross-omics queries can be
established, and built technology modules implementing this interface for transcriptomics,
metabolomics, next generation sequencing and clinical chemistry. This application originated in
the NuGO dbNP and NMC DSP projects, and serves as a precursor and infrastructure node for the
semantic web technologies that are developed within the OpenPHACTS project.
Screenshots

a. Study overview of an existing study | b. Subject information inclusion | c. An Excel importer for large datasets

Website: http://dbnp.org - Source code: http://trac.nbic.nl/gscf (Apache 2.0 license)

Poster 52