12th Annual Bioinformatics Open Source Conference
BOSC 2011
Vienna, Austria, July 15-16, 2011
http://www.open-bio.org/wiki/BOSC_2011

Welcome to BOSC 2011!

The Bioinformatics Open Source Conference, established in 2000, is held every year as a Special Interest Group (SIG) meeting in conjunction with the Intelligent Systems for Molecular Biology (ISMB) Conference. BOSC is sponsored by the Open Bioinformatics Foundation (O|B|F), a non-profit group dedicated to promoting the practice and philosophy of Open Source software development within the biological research community.

We have an exciting lineup of topics and speakers, including keynote speakers Matt Wood (the Technology Evangelist for Amazon Web Services) and Lawrence Hunter (director of the Computational Bioscience Program at the University of Colorado, and one of the founders of ISMB). A major theme of BOSC this year is cloud-based approaches to improving software and data accessibility. Other sessions focus on approaches to organizing and analyzing high-throughput 'omics and next-generation sequencing data; projects that involve the semantic web; and data visualization tools. The second day features a panel discussion about the challenges of inter-institutional collaborations, which are very relevant to many of us who work on Open Source projects.

This year, BOSC includes posters as well as talks. There are three scheduled poster sessions. We have space for several last-minute posters in addition to those listed in the program.

Also new is the chance to vote for your favorite BOSC talk. The talk with the most votes will be announced at the Awards session that closes the meeting. Please fill out your ballot and return it to us before the panel discussion (4:30 on the second day).

Thanks to generous support from Eagle Genomics and another sponsor, we were able to offer Student Travel Awards to the authors of the three best student abstracts. Congratulations to the student winners: Florian P.
Breitwieser, Kerensa McElroy, and Konstantin Okonechnikov.

BOSC is a community effort. We thank the organizing committee, the program committee, the session chairs, and the ISMB SIG chair for their help. If you are interested in participating in the organization of BOSC 2012 (which will take place in July 2012 in Long Beach, California) please email bosc@open-bio.org.

2011 Organizing Committee: Nomi Harris (Co-Chair), Peter Rice (Co-Chair), Brad Chapman, Peter Cock, Kam Dahlquist, Erwin Frise, Darin London, Ron Taylor

2011 Program Committee: Jan Aerts, Enis Afgan, Tiago Antao, Kazuharu Arakawa, Brad Chapman, Peter Cock, Kam Dahlquist, Heiko Dietze, Thomas Down, Erwin Frise, Cyrus Harmon, Nomi Harris, Michael Heuer, Richard Holland, Alex Lancaster, Hilmar Lapp, Heikki Lehvaslaiho, Darin London, Scott Markel, Hervé Ménager, Dave Messina, Jim Procter, Peter Rice, Olivier Sallou, Martin Senger, William Spooner, Ronald Taylor, Mark Wilkinson, Christian Zmasek

BOSC 2011 Schedule

Day 1 (Friday, July 15, 2011)

Time    Title    Speaker or Session Chair
9:00-9:15    Introduction    Nomi Harris (Co-Chair, BOSC 2011)
9:15-10:15    Keynote: The role of openness in knowledge-based systems for biomedicine    Larry Hunter
10:15-10:45    Coffee Break
10:45-12:30    Session: Genome Content Management    Chair: Peter Rice
10:45-11:05    Unipro UGENE: an open source toolkit for complex genome analysis    Konstantin Okonechnikov
11:05-11:25    Exploring the genome with Dalliance    Thomas Down
11:25-11:45    InterMine - Using RESTful Webservices for Interoperability    Alex Kalderimis
11:45-11:55    easyDAS: Automatic creation of DAS servers    Bernat Gel
11:55-12:05    Enacting Taverna Workflows through Galaxy    Kostas Karasavvas
12:05-12:15    Mobyle 1.0: new features, new types of services    Hervé Ménager
12:15-12:25    BioMart 0.8 offers new tools, more interfaces, and increased flexibility through plug-ins    Junjun Zhang
12:25-12:32    Running Workflows Through Taverna Server    Donal Fellows
12:30-2:00    Lunch
1:30-2:30    Poster Session
2:30-3:30    Session: Visualization    Chair: Jan Aerts
2:30-2:50    Cytoscape 3.0: Architecture for Extension    Michael Smoot
2:50-3:10    Applying Visual Analytics to Extend the Genome Browser from Visualization Tool to Analysis Tool    Jeremy Goecks
3:10-3:20    WebApollo: A web-based sequence annotation editor for community annotation    Nomi Harris
3:20-3:30    The isobar R package: Analysis of quantitative proteomics data    Florian P. Breitwieser
3:30-4:00    Coffee Break
4:00-5:30    Session: Next-Generation Sequencing    Chair: Thomas Down
4:00-4:20    Stacks: building and genotyping loci de novo from short-read sequences    Julian Catchen
4:20-4:40    Large scale NGS pipelines using the MOLGENIS platform: processing the Genome of the Netherlands    Morris Swertz
4:40-4:50    Bio-NGS: BioRuby plugin to conduct programmable workflows for Next Generation Sequencing data    Raoul Bonnal
4:50-5:00    Goby framework: native support in GSNAP, BWA and IGV 2.0    Kevin C. Dorff
5:00-5:10    A Scalable Multicore Implementation of the TEIRESIAS Algorithm    Frank Drews
5:10-5:20    Biomanycores, open-source parallel code for manycore bioinformatics    Jean-Frédéric Berthelot
5:20-5:30    GemSIM: General, Error-Model Based Simulator of next-generation sequencing    Kerensa McElroy
5:30-6:30    Poster Session and BOFs
7:00    Optional dinner for BOSC attendees    Location TBA

Day 2 (Saturday, July 16, 2011)

Time    Title    Speaker or Session Chair
8:45-8:50    Announcements    Nomi Harris and Peter Rice
8:50-9:50    Keynote: Into the Wonderful    Matt Wood
9:50-10:15    Securing and sharing bioinformatics in the cloud    Richard Holland
10:15-10:45    Coffee Break
10:45-12:30    Session: Cloud Computing    Chair: Brad Chapman
10:45-11:05    Mygene.info: Gene Annotation as a Service - GAaaS    Chunlei Wu
11:05-11:25    Cloud BioLinux: open source, fully-customizable bioinformatics computing on the cloud for the genomics community and beyond    Konstantinos Krampis
11:25-11:35    OBIWEE : an open source bioinformatics cloud environment    Olivier Sallou
11:35-11:45    SeqWare: Analyzing Whole Human Genome Sequence Data on Amazon's Cloud    Brian O'Connor
11:45-12:05    Sequencescape - a cloud enabled Laboratory Information Management Systems (LIMS) for second and third generation sequencing    Lars Jorgensen
12:05-12:15    Enabling NGS Analysis with(out) the Infrastructure    Enis Afgan
12:15-12:25    Hadoop-BAM: A Library for Genomic Data Processing    Aleksi Kallio
12:30-2:00    Lunch
1:00-2:00    Poster Session
2:00-3:30    Session: Semantic Web and Misc. Open Source Projects    Chair: Peter Cock
2:00-2:10    SADI for GMOD: Bringing Model Organism Data onto the Semantic Web    Ben Vandervalk
2:10-2:20    Scufl2: Because a workflow is more than its definition    Stian Soiland-Reyes
2:20-2:30    OntoCAT - an integrated programming toolkit for common ontology application tasks    Tomasz Adamusiak
2:30-2:50    Debian Med: individuals' expertise and their sharing of package build instructions    Steffen Möller
2:50-3:10    The BALL project: The Biochemical Algorithms Library (BALL) for Rapid Application Development in Structural Bioinformatics and its graphical user interface BALLView    Andreas Hildebrandt
3:10-3:20    Biopython Project Update    Peter Cock
3:20-3:27    What's new with GMOD    Scott Cain
3:27-3:34    Exploring human variation data with Clojure    Brad Chapman
3:30-4:00    Coffee Break
4:00-4:30    Session: Misc. Open Source Projects    Chair: Jim Procter
4:00-4:10    EMBOSS: New developments and extended data access    Peter Rice
4:10-4:20    G-language Project: the last 10 years and beyond    Kazuharu Arakawa
4:20-4:30    A Framework for Bioinformatics on the Microsoft Platform    Simon Mercer
4:30-5:20    Panel: Multi-Institution Collaboration    Moderator: Brad Chapman; Panelists: Richard Holland, Hilmar Lapp, Jean Peccoud, Peter Rice
5:20-5:30    Presentation of awards    Nomi Harris
5:30-6:30    BOFs

* Any last-minute schedule updates will be posted at http://www.open-bio.org/wiki/BOSC_2011_Schedule

Keynote Speakers

Lawrence Hunter

Lawrence Hunter is Professor of Pharmacology and Computer Science at the University of Colorado and director of the Computational Bioscience Program at the School of Medicine. He is one of the founders of ISMB, a fellow of the ISCB, and is well known for contributions in a broad range of problems in computational biology.

Dr. Hunter will be giving a talk entitled "The role of openness in knowledge-based systems for biomedicine". Knowledge-based approaches to the analysis of genome-scale data require the extraction, sharing and use of very large amounts of knowledge about biomedicine. Developments such as the open source software movement, the Open Biomedical Ontologies, Semantic Web standards such as OWL and SPARQL, and the spread of open access publishing are creating the potential for powerful knowledge-based computer systems that may play an important role in the future of biomedical research. Yet several critical challenges remain before this vision can be realized. Dr. Hunter will discuss some relevant recent resources developed in his lab, some of the socio-political barriers that remain, and what you can do to overcome them.

Matt Wood

As the Technology Evangelist for Amazon Web Services, Matt discusses the technical and organisational aspects of cloud computing across the world.
With a background in the life sciences, Matt is interested in helping teams of all sizes bring their ideas to life through technology. Before joining Amazon he built web-scale search engines at Cornell University, sequenced DNA in Hinxton and developed scientific software in Cambridge. He is a frequent speaker at international conferences, a blogger, published author and an advocate of research productivity. Matt's talk, Into the Wonderful, will feature a discussion of the constraints of working with the size, scope and complexity of modern research data, and how cloud computing can help accelerate academic research. We'll take a look at the current state of the art, the role cloud computing plays in increasing the impact of open source tools, the use of public hosted data in the cloud and how academic cloud platforms can help promote collaboration, reproducibility and reuse across disciplines. Audience Favorite Talk This year, BOSC attendees can vote for their favorite talk. Ballots (included in the printed program) must be turned in before the panel session on the second day at 4:30pm. The talk that receives the most votes will be announced in the closing session. O|B|F Membership Professionals, scientists, students, and others active in the Open Source Software arena in the life sciences are invited to join the Open Bioinformatics Foundation (the O|B|F). The membership body was formally established at the 2005 Board of Directors meeting. As laid out in the bylaws, officers in the Board of Directors are elected by the membership among nominees, and candidates for future Directors will be nominated from the membership when seats are added or a term expires. The eligibility criteria are met by anyone who is "interested in the objectives of the OBF", and there are no dues at present. You can join the O|B|F at BOSC by filling out the application form included in this program, signing it, and giving it to a Board member. 
You may also e-mail the scanned form to the current Parliamentarian, Hilmar Lapp, at hlapp@drycafe.net. (The O|B|F is legally required to have signatures on record for all members.) If you are interested in meeting and talking to some of the O|B|F Directors and members, please join us at the no-host dinner (location TBA) the evening of the first day of the conference.

Talk and Poster Abstracts

Talk abstracts are included in this program in the order in which they will be presented at the conference. Some, but not all, of the talks will also be presented as posters. Abstracts for posters that are not talks appear after the talk abstracts, ordered by poster number. All posters should be put up before the first poster session (2:00 on the first day). After that time, unused poster slots will be made available for last-minute posters. Please be sure to use the poster slot that is assigned to you.

O|B|F - Open Bioinformatics Foundation Membership Application

I wish to apply for membership in the Open Bioinformatics Foundation (O|B|F).

First and Last Name: ________________________________________________________
Street Address: _____________________________________________________________
City, State, Zip Code: ________________________________________________________
Country of Residence: _______________________________________________________
Email Address: ______________________________________________________________

All fields are mandatory. The O|B|F will treat all personal information as strictly confidential and will not share personal information with anyone except members of the O|B|F Board of Directors, or entities or persons appointed by the Board to administer membership communication. This may be subject to change; please see below.

I am an attendee of BOSC 201___: [ ] Yes [ ]
No

If you answered No, please state why you meet the membership eligibility requirement of being interested in the objectives of the O|B|F: (Use back of page if you need more space)

I understand that membership rights and duties are laid down in the O|B|F Bylaws which may be downloaded from the O|B|F homepage at http://www.open-bio.org/. I understand that if the O|B|F’s privacy statement changes I will be notified at my email address (as known to O|B|F), and if I do not express disagreement with the proposed change(s) by terminating my membership within 10 days of receipt of the notification, I consent to the change(s).

Signature

Bioinformatics Open Source Conference 2011

BOSC 2011 Talk and Poster Abstracts

Unipro UGENE: an open source toolkit for complex genome analysis
Konstantin Okonechnikov1, Olga Golosova2, Alexey Varlamov2, Mikhail Fursov2
1 Novosibirsk State University, Russia, k.okonechnikov@gmail.com
2 Center of Information Technologies "UniPro", Novosibirsk, Russia
Project web-site: http://ugene.unipro.ru
Source code: https://ugene.unipro.ru/svn/ugene/trunk
License: GPL v2

Unipro UGENE is an open-source software package for molecular biologists. The main goal of the project is to integrate popular bioinformatics tools and algorithms providing both graphical user and command line interfaces. UGENE allows constructing reusable workflows and running them on a local machine or in HPC environment. The software package is written in C++ using the Qt framework and incorporates a built-in plugin system, which allows extending the project with new functionality. UGENE is freely available for MS Windows, Linux and Mac OS X.

The integrated tools and algorithms solve a variety of bioinformatics tasks that include a pattern search, local sequence alignment (Smith-Waterman), multiple sequence alignment (MUSCLE, Kalign), HMMER, restriction sites analysis, short read alignment (Bowtie) and many others. A
key advantage of UGENE is that most of the algorithms are integrated into the source package and modified to use the internal UGENE data model. This allows one to avoid manual data conversion between the tools' input and output. Some of the algorithms are optimized for multicore environments and have GPU implementations. UGENE supports reading and writing for more than 20 biological data formats and allows visualizing such biological objects as annotated DNA/RNA and protein sequence, multiple sequence alignment, macromolecular 3D structure and DNA assembly. UGENE is also capable of querying key biological online databases such as NCBI Genbank, PDB and others.

One of UGENE's main components is the Workflow Designer, a visual tool for building complex analysis pipelines. Since UGENE is a stand-alone application one doesn't need to install any additional components or upload and download any data in order to use the Workflow Designer. This factor, along with an intuitive and user-friendly interface, makes the entry threshold for new users lower in comparison to similar projects (such as Taverna or Galaxy). However, for advanced users the Workflow Designer provides capabilities to create custom workflow elements either in C++ or in the QtScript programming language and to customize various aspects of schema execution. Every workflow schema allows setting special aliases for its parameters; these aliases can be used to provide custom arguments when running the schema as a command line tool. This feature makes it easy to run workflows in an HPC environment or include them in users' scripts. The Workflow Designer has experimental support for launching computational workflows on Amazon EC2 servers.

UGENE is a part of the official Ubuntu and Fedora Linux distributions and has a growing community of users around the world. The project is constantly evolving. One of the major priorities currently is the ability to work with huge datasets. This includes integration of next-generation sequencing analysis and visualization methods, thus one of the main further development directions is the UGENE Assembly Browser.

Poster 1
Exploring the genome with Dalliance
Thomas A. Down1 and Tim J. P. Hubbard2
1 Wellcome Trust/CRUK Gurdon Institute, University of Cambridge, UK
2 Wellcome Trust Sanger Institute, Cambridge, UK
Project website: http://www.biodalliance.org/
Source download: http://github.com/dasmoth/dalliance
License: BSD

Genome browsers are a vital part of the genomics workflow. Eyeballing data has proved indispensable for spotting unexpected correlations, formulating new hypotheses, or simply sanity-checking newly collected data. The sequencing revolution has made this even more important. Today, even small labs are routinely applying techniques like ChIP-seq, RNA-seq, or exome resequencing, often with limited bioinformatic support. To keep pace with these trends, we need powerful visualization and data integration tools which make loading large datasets, either directly or as an automatic final step in analysis workflows, as simple as possible.

Dalliance [1] is a new genome browser which makes aggressive use of HTML5 technologies to offer a high level of interactivity and powerful navigation and exploration tools while running within a normal web browser. The display can be freely scrolled and zoomed
with mouse gestures and keyboard controls. Shortcuts for navigation between features allow sparse datasets to be rapidly explored. We have adopted a fully distributed approach, with no master server hosting the primary dataset. Data can be integrated using either the standard DAS protocol [2], or from data packaged in a standard indexed binary format (currently BigWig, BigBed [3], and BAM [4] are supported) on a standard web server. The latter option makes integrating new genomic datasets, particularly the results of high-throughput sequencing experiments, very quick, and accessible to "occasional bioinformaticians" who often have limited sysadmin experience, and limited enthusiasm for installing extra server software! It also allows instant access to datasets in this format from remote web servers without time-consuming downloading (e.g. ENCODE datasets from UCSC). Unusually for a web-based application, Dalliance also allows viewing of data directly from local disk on your own machine.

[1] Down TA, Piipari M, Hubbard TJ. Dalliance: interactive genome viewing on the web. Bioinformatics (2011) Epub ahead of print
[2] Jenkinson A et al. Integrating biological data - the Distributed Annotation System. BMC Bioinformatics (2008) 9(Suppl 8):S3
[3] Kent WJ et al. BigWig and BigBed: enabling browsing of large distributed datasets. Bioinformatics (2010) 26:2204-2207
[4] Li H et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics (2009) 25:2078-2079

Poster 2
InterMine - Using RESTful Webservices for Interoperability
AUTHORS: Alex Kalderimis, Daniela Butano, Adrian Carr, Sergio Contrino, Hu Fengyuen, Mike Lyne, Richard Smith, Radek Štěpán, Julie Sullivan, Gos Micklem (PI)
AFFILIATION: University of Cambridge, alex@flymine.org
PROJECT URL: www.intermine.org
SOURCE CODE:
http://www.intermine.org/browser, http://www.intermine.org/wiki/SVNCheckout
LICENSE: LGPL
FUNDING: Wellcome Trust and the NIH/NHGRI.

InterMine - An Open Source Data-Warehouse and Query interface

InterMine is an actively developed open-source data-warehousing system, which allows you to:
- Integrate data from diverse sources.
- Run complicated, custom queries over multiple datasets quickly by taking advantage of query optimisation.
- Write pre-defined query templates in a graphical interface.
- Upload and analyse lists of data.
- Explore and visualise data through a web interface.
- Access data programmatically through published APIs.

While generic in its underlying design, InterMine comes with a wide range of powerful bioinformatics-specific tools to handle tasks including data loading (from standard formats such as GFF3, FASTA, chado, as well as Uniprot and Ensembl data), and data analysis (including statistical enrichment analysis and interaction visualisation).

Current Implementations - InterMod

Originally developed for FlyMine, there are now many InterMine implementations in operation. In particular, InterMine is used by several Model Organism Databases (MODs), including SGD (YeastMine), RGD (RatMine), and ZFIN (ZFINmine), the founding members of the InterMOD project (MOD mines for worm, mouse and S. pombe are currently in development).

InterMine is also the engine behind several dedicated data-mining projects, including metabolicMine and the modEncode project. A number of private and independent mines also exist.

Interoperability

The increasing number of InterMine implementations has allowed for greater interoperation, which is mediated through webservices over a published RESTful API. This allows sites to query a
mine's data, and for any website to embed tables of data natively in their page. The foundational technologies behind this are webservices that return JSON, JSONP for cross-domain communication, and a JavaScript library for web client access.

Webservice interoperability enables mines to display homology data and provide links to data in different species. MODs can also replace their custom database queries with more performant InterMine ones. End users can automate workflows using client libraries. Mines can integrate with work-flow and data-analysis projects, such as Galaxy.

Poster 3
easyDAS: Automatic creation of DAS servers
Bernat Gel1,2,*, Andrew M Jenkinson3, Rafael C Jimenez3, Xavier Messeguer Peypoch1 and Henning Hermjakob3
1 Software Department, UPC-BarcelonaTech, Barcelona, Spain.
2 Hereditary Cancer Program, Institute for Personalised and Predictive Medicine of Cancer, Badalona, Spain.
3 European Bioinformatics Institute, Hinxton, Cambridge, UK.
* E-mail: bgel@lsi.upc.edu
Project URL: http://www.ebi.ac.uk/panda-srv/easydas/
Code URL: http://code.google.com/p/easydas/
License: GNU Lesser General Public License (LGPL)

Abstract

Background: The Distributed Annotation System (DAS) has proven to be a successful way to publish and share biological data. Although there are more than 1000 registered servers, setting up a DAS server involves a fair amount of work and requires some specific skills such as programming and server managing and a reliable infrastructure supporting it. There are many research groups who will not have easy access to people proficient enough in programming to implement the required data access layer or able to set up and manage an internet accessible machine to host the server. Those difficulties can represent too big an overhead for many data generators when publishing their data, particularly for those with small data sets.
Given the clear advantage that the generalized sharing of relevant biological data is for the research community it would be desirable to convert all those data generators into data providers, increasing the amount and variety of the biological data available to the scientific community and contributing to the collective annotation of biological sequences. Results: easyDAS is a web-based and ready-to-use system for biological data sharing using DAS. The user only needs to upload a text file (GFF or CSV) with the data into easyDAS and set a few configuration options with a wizard-like interface, mainly stating what the data represents, and the system will automatically create a new DAS source serving that data. Although the DAS source will be automatically created and managed by easyDAS, the user will retain full control over the data and will be able to modify or delete it at any point using the same web interface. Sources created with easyDAS are fully compliant with the latest specification of the standard, DAS 1.6, and can be integrated on any DAS client. easyDAS encourages exhaustive meta-annotation of the data source offering both a list of available coordinate systems and an ontology browser interface based on the Ontology Lookup Service to specify an ontology term for each feature type. easyDAS has been written in perl and javascript. Data is stored in a MySQL database and uses ProServer -the perl DAS server- and its hydra functionality to serve DAS data. An instance of easyDAS is running at http://www.ebi.ac.uk/panda-srv/easydas/ and is freely available to any researcher wanting to create a new DAS source. That instance is running at the EBI and takes advantage of its high storage capacity and connectivity. The code is available at http://code.google.com/p/easydas/ and can be easily installed at any other institution. 
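The upload step described above starts from a plain GFF text file. As a minimal sketch of what reading such a file involves (not easyDAS's actual Perl code), the Python below parses the nine tab-separated GFF columns into feature records and counts the feature types that a wizard could then ask the user to annotate. The names `parse_gff`, `count_feature_types` and `SAMPLE_GFF` are hypothetical and exist only for this illustration.

```python
from collections import Counter
from typing import Dict, List

# The nine tab-separated columns shared by GFF files.
GFF_COLUMNS = [
    "seqid", "source", "type", "start", "end",
    "score", "strand", "phase", "attributes",
]


def parse_gff(text: str) -> List[Dict[str, object]]:
    """Parse GFF text into a list of feature records (hypothetical helper)."""
    features: List[Dict[str, object]] = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        # Skip blank lines and comment/directive lines such as "##gff-version 3".
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != len(GFF_COLUMNS):
            raise ValueError(
                f"line {line_number}: expected 9 tab-separated fields, "
                f"got {len(fields)}"
            )
        record: Dict[str, object] = dict(zip(GFF_COLUMNS, fields))
        start, end = int(fields[3]), int(fields[4])
        if start > end:
            raise ValueError(f"line {line_number}: start {start} is after end {end}")
        record["start"], record["end"] = start, end
        features.append(record)
    return features


def count_feature_types(features: List[Dict[str, object]]) -> Dict[str, int]:
    """Count features per type, e.g. to list the types needing an ontology term."""
    return dict(Counter(str(feature["type"]) for feature in features))


# Made-up example data, not taken from any real DAS source.
SAMPLE_GFF = (
    "##gff-version 3\n"
    "chr1\tdemo\tgene\t100\t900\t.\t+\t.\tID=gene1\n"
    "chr1\tdemo\texon\t100\t300\t.\t+\t.\tParent=gene1\n"
)
```

Calling `count_feature_types(parse_gff(SAMPLE_GFF))` gives one entry per feature type, which corresponds to the point in the abstract where each feature type is assigned an ontology term.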
Conclusions: easyDAS is an automated DAS source creation system which can help many researchers in sharing their biological data, potentially increasing the amount of relevant biological data available to the scientific community.

Poster 4
Enacting Taverna Workflows through Galaxy
Konstantinos Karasavvas1,*, Christine Chichester1, Don Cruickshank3, Robert Haines4, Donal Fellows4 and Marco Roos1,2
1 Netherlands Bioinformatics Centre, Geert Grooteplein 28, 6525 GA Nijmegen, The Netherlands
2 BioSemantics group, Human Genetics Department, Leiden University Medical Centre, Albinusdreef 2, 2333 ZA Leiden, The Netherlands
3 School of Electronics and Computer Science, University of Southampton, Highfield Campus, University Road, Southampton SO17 1BJ, UK
4 School of Computer Science, University of Manchester, Oxford Road, Manchester, M13 9PL, UK
* Presenting author: kostas.karasavvas@nbic.nl
Project site: https://trac.nbic.nl/elabfactory/wiki/eGalaxy
Code (license name illegible in the source): https://trac.nbic.nl/svn/elabfactory

Abstract

The diversity and availability of bioinformatics tools have increased in recent years. Some tools deal with the same problem using a different approach, others provide different access mechanisms to the same resources. Another class of tools provide aggregation mechanisms to make use of a number of tools in a uniform way. For the latter, software pipelining systems are becoming the norm in bioinformatics. Widely used examples of such systems are Galaxy, a web-portal and framework, and Taverna, a workflow management system. The goal of both of these systems is to provide a platform that bioinformaticians and biologists can use to describe their experiments in a number of well-described processing steps, i.e. workflows.

Both of the aforementioned systems provide a wide range of functionality out of the box. There is some overlap but the real added value would be if we could combine that functionality and try to get the best of both systems. This work aims to bridge the two by allowing the more expressive but complex Taverna workflows (e.g. support iterations and conditionals) to execute via the simple and intuitive interface that Galaxy offers. To this end we built a Galaxy tool generator that given a Taverna workflow description and a Taverna Server constructs a Galaxy tool to enable the enactment of that workflow via Galaxy. Due to current limitations of Galaxy, this new tool needs to be installed into Galaxy manually.

The generator is available for download, but is also part of myExperiment.org, a workflow repository and community web site for computational scientists. Here, the bioinformatician will just browse the repository to identify Taverna workflows of interest and will then be able to download a workflow as a Galaxy tool and install the tool into a Galaxy server in the usual way.

The generator is implemented in Ruby and disseminated as a gem. It creates two files: 1) an
XML document that is used by Galaxy to identify the executable, its inputs and outputs, and the layout for the Galaxy user interface, and 2) a ruby script that can access a Taverna server and enact the workflow. Because the REST interface of myExperiment is used, it would be relatively easy to extend the generator to create scripts that access other workflow engines.

There are plans to further automate this procedure by accessing the myExperiment site as a Galaxy external interface, which will dynamically add the workflows as new tools from within Galaxy. For this to work seamlessly, Galaxy needs the following new functionality: dynamically adding new tools programmatically, and association of user roles to specific tools for security. All issues mentioned have been communicated to the Galaxy development team.

Poster 5
Mobyle 1.0: new features, new types of services
Hervé Ménager1,*, Bertrand Néron1, Vivek Gopalan4, Sandrine Larroudé1, Julien Maupetit2,3, Adrien Saladin2, Pierre Tufféry2,3, Yentram Huyen4, Bernard Caudron1
1 Groupe Projets et Développements en Bioinformatique, Institut Pasteur, {hmenager,bneron}@pasteur.fr; * : presenting author
2 MTi, INSERM UMR-S 973, Université Paris Diderot (Paris 7), Paris, France
3 RPBS, Université Paris Diderot (Paris 7), Paris, France, {julien.maupetit,adrien.saladin,pierre.tuffery}@univ-paris-diderot.fr
4 Bioinformatics and Computational Biosciences Branch, OCICB, NIAID, NIH, Bethesda, MD 20892, USA {gopalanv,huyeny}@niaid.nih.gov
website: https://projets.pasteur.fr/wiki/mobyle, downloads: ftp://ftp.pasteur.fr/pub/gensoft/projects/mobyle/
open source license being used: GNU GPLv2

Performing bioinformatics analyses requires the selection and combination of tools and data to answer a given scientific question. Many bioinformatics applications are command-line only and researchers are often hesitant to use them based on installation issues and complex command requirements.
Mobyle is a framework and web portal specifically aimed at the integration of bioinformatics software and databanks. It allows users to run bioanalyses through a web interface without installing anything locally. In addition to a web interface to command-line tools, the latest release of Mobyle, version 1.0, offers the possibility to execute predefined workflows, and enhances visualisation possibilities with browser-embedded client components, the viewers. We focus here on these major improvements.

Chaining automation with Workflows

Mobyle uses an XML-based service description system, where parameters and user data include a description of the nature and format of the information they convey, which makes it possible to determine the compatibility between them. In the interface, this makes it possible to (1) suggest the relevant options to interactively chain successive programs using an intelligent piping suggestion system, and (2) facilitate the reuse of data over successive analyses by storing data bookmarks that can be directly loaded into a form. To enable the automation of these chainings, the data model has been extended to incorporate Workflows, which define a dataflow-based coordination of programs that run successive and/or parallel tasks to perform an analysis. Like programs, workflows are viewed as services, sharing most of their description with programs, with the exception of the execution, which consists of a coordination of subtasks rather than the generation and execution of a command line.

Data visualisation with Viewers

When running an analysis in Mobyle, job result files can be directly pre-visualized in the portal. However, the result is still often hard to understand because of the need to browse potentially large and complex text-based files. To overcome this limitation, we created a specific type of service, Viewers. Viewers are a way to embed type-dependent visualization components for the data displayed in the Mobyle Portal.
As opposed to programs and workflows, viewers are not executed on the server side, but rather rely entirely on browser-embedded code. The XML description files provide a way to incorporate custom interface code that will display data of a given type in the browser, incorporating HTML-embeddable components such as Java or Flash applets, JavaScript code, etc. The new version of Mobyle, v1.0, extends the spectrum of services available to include workflows and viewers. Current and future work includes (1) the development of an interface that allows the "de novo" creation of workflows directly by users, and the automation of interactive chainings into workflows, and (2) the extension of the integration capabilities for client-side components beyond simple visualisation, to the editing of user data.

Poster 6
BioMart 0.8 offers new tools, more interfaces, and increased flexibility through plug-ins
J Zhang, J Baran, A Cros, JM Guberman, S Haider, J Hsu, Y Liang, E Rivkin, J Wang, M Wong-Erasmus, L Yao, A Kasprzyk
Ontario Institute for Cancer Research, Toronto, Ontario M5G 0A3, Canada
Email: Junjun.zhang@oicr.on.ca
Web: http://www.biomart.org/
svn: https://code.oicr.on.ca/svn/biomart/biomart-java/branches/release-0_8-candidate_…
License: GNU Lesser General Public License v2.1

Abstract
BioMart is an open-source data management and federation system used by dozens of biological databases, many of which are available through BioMart Central Portal. For the latest release of BioMart, version 0.8, the software has been completely rewritten, incorporating many new features and optimizations. BioMart 0.8 is an integrated Java application, making it possible to build a data source, configure querying and presentation interfaces, and deploy a BioMart server from a single tool. On the database side, BioMart now supports more relational database systems, adding SQL Server and DB2 as well as continuing to support MySQL, PostgreSQL, and Oracle.
In addition to querying "mart" databases, BioMart can now query directly against any normalized database. New optimizations have been added to the querying, including parallelization of query execution and the ability to create indices for links between datasets, leading to faster-than-ever data retrieval. The BioMart server now includes built-in support for the HTTPS protocol, as well as OpenID and OAuth-based user authentication, so BioMart can now be easily configured to handle complex access control for sensitive data. The BioMart user experience has been improved by the addition of several new graphical user interfaces (GUIs). There are four database search GUIs, each tailored for a different degree of complexity and flexibility in querying. There is also a new way of presenting data, the MartReport GUI, where information about one single data entity (usually a biological entity such as a gene) can be collated from several different resources. In addition to the built-in GUIs, BioMart 0.8 adds a robust plug-in framework to support the development of novel data analysis and visualization tools. Several such plug-ins have been developed and used in BioMart Central Portal and ICGC Data Portal, including tools for visualizing the pathways most frequently affected by somatic mutations; an ID conversion tool; and a gene sequence retrieval module. The latest version of BioMart continues to offer developers programmatic access through several application programming interfaces (APIs). REST- and SOAP-based querying continues to be offered, as well as a new Java API and a SPARQL interface for semantic web querying. All of these access methods are integrated with the interactive querying, so that after a query has been constructed interactively, the same query can be presented in API format at the push of a button. With these new features and extensibility, BioMart remains a top choice for managing data-intensive collaborative projects.
Poster 7
Running Workflows through Taverna Server
Donal Fellows, Robert Haines, Alan Williams, Aleksandra Nenadic, Carole Goble
School of Computer Science, University of Manchester, UK
{donal.k.fellows, robert.haines, alan.r.williams, a.nenadic, carole.a.goble}@manchester.ac.uk
Project's Web site: http://www.taverna.org.uk/
Source code: http://rubygems.org/gems/t2-server; http://code.google.com/p/taverna/source/browse/taverna/products/uk.org.taverna.server
Licence: LGPL 2.1 (server), BSD (client)

Taverna is an open source and domain-independent suite of tools used to design and execute scientific workflows. Taverna Workbench is a desktop client application written in Java that combines a graphical interface to create the workflows with a workflow execution engine. Taverna is also available as a command line tool for executing workflows from a terminal. As of version 2.2, Taverna is also available as a server to allow remote execution of workflows. Taverna 2 Server enables easier integration of Taverna's workflow execution engine with Web portals and other applications, which can communicate with the Server via standard interfaces.

Taverna 2 Server exposes REST and SOAP APIs; either can be used to access the functionality of the Server. Internally, the current version of the Server is based on JAX-RS and JAX-WS (for REST and SOAP support, respectively) with the basic workflow execution functionality handled by delegation to the Taverna command line tool. To the basic execution model of the command line tool, the Server adds file and directory management, as well as looking after deleting all resources associated with the server once done. The Server is implemented using Apache CXF inside a Spring bean container, designed to be deployable in any Java servlet container (e.g., Tomcat, Jetty, Glassfish).

The client starts the interaction with the Server by submitting the definition file of the workflow to be executed. This creates a "workflow run" on the Server and returns the run identifier to the client. Next, the client sets up the input data for the workflow's input ports, if any, by either providing values directly or uploading files where inputs are to be read from. The workflow is now ready for execution; setting the workflow's status to "Operating" on the Server initiates the workflow run. Next, the client polls the run's status on the Server, waiting for it to finish. Alternatively, they can watch the Atom feed for notification of termination, or register a receiver for one of the other supported protocols. Finally, the client collects the result data either individually for each output port or collectively as one XML or zip file.

To complement the Taverna 2 Server, we have also developed client libraries in Ruby and Java that expose the Server's capabilities and enable people to quickly integrate Taverna within their applications. Both client libraries internally access the Server's REST interface. The Ruby client is available as a gem and has been used by the Taverna-Galaxy integration to enable wrapping Taverna workflows as Galaxy tools and calling them from the Galaxy workflows. Internally, the Galaxy tool sends the wrapped Taverna workflow for execution on a Taverna Server and fetches results from it.

Various other projects are utilising Taverna 2 Server in their infrastructures:
• Shared Genomics (https://www.nibhi.org.uk/sharedgenomics/default.aspx): using Taverna 2 Server to run genetic data workflows
• HELIO (http://www.helio-vo.eu): using Taverna 2 Server to run heliophysics workflows
• caGrid (http://www.cagrid.org): executing analytic and data services in cancer-predicting workflows on Taverna 2 Server running on caBIG's caGrid platform
• NeISS (www.neiss.org.uk): running population and traffic simulation workflows from a portal

The next release of the Taverna 2 Server (scheduled for June 2011) will have improved handling of large input and output data, fully-integrated security support (with client authentication to the Server managed by Spring Security, authenticated service invocations from inside executing workflows, and workflow run access control so that users can permit their collaborators to see their results directly), improved job management and accounting, semantic result data description and various notification mechanisms for letting users know when the workflow run has completed (including an Atom feed and support for Twitter, e-mail and Jabber messaging).

Funded by the EU's FP7 (HELIO project; grant no. 238969) and the UK's BBSRC (SysMO project; grant no. BB/I004637/1).

Poster 8
Cytoscape 3.0: Architecture for Extension
Michael Smoot - University of California San Diego (msmoot@ucsd.edu)
Project website: http://cytoscape.org
Project source code: http://chianti.ucsd.edu/svn/cytoscape
Open Source License: LGPL

Cytoscape is a popular, open source desktop application for visualizing and analyzing biological networks. Cytoscape 2.X consists of a core application that provides visualization and analysis capabilities along with an API for extending Cytoscape's functionality through "plugins." Scientists and other Cytoscape users benefit from the analytical depth provided by the plugins, while plugin authors benefit from the core Cytoscape functionality and the framework for distributing and advertising plugins. This mutually beneficial relationship has resulted in over 100 plugins (http://cytoscape.org/plugins.html) along with dozens of publications about the plugins themselves.

Cytoscape 3.0 represents an attempt to refactor Cytoscape to make plugin writing simpler while at the same time providing more stability, power, and flexibility to the system as a whole. First and foremost, Cytoscape 3.0 has been modularized with the API cleanly separated from the implementation. This modularity is being facilitated and enforced with OSGi (http://osgi.org), a popular modularization framework. OSGi's micro service architecture allows private implementation code to remain private by registering micro services, which rely only on the public API. This means that any plugin only has the opportunity to depend on the public API, which will hopefully clarify and simplify what is needed to write a plugin. We have also begun using the Semantic Versioning standard (http://semver.org) for Cytoscape code to make clear how and when a public API may change. This will all go towards helping the Cytoscape core maintain backwards compatibility, which will greatly increase plugin stability. All "plugins" in 3.0 can be written as OSGi bundles, just like the Cytoscape core modules. This means that plugins will now have the opportunity to register their own public API, eliminating the distinction between core and plugin and resulting in a much more powerful and flexible system. While the architecture of Cytoscape 3.0 relies on OSGi, very little code does. We have leveraged the Spring Dynamic Modules system to eliminate nearly all reliance on the OSGi API in the core application. At the cost of XML configuration files, all code ends up being "plain old Java objects" which should hopefully make the core code much easier to understand.

While not without risk, we believe that Cytoscape 3.0 will enable a new generation of Cytoscape plugins as well as much greater opportunity for collaboration with different systems. This talk will elaborate on the new Cytoscape architecture including its benefits, challenges, and risks.

Poster 9
Applying Visual Analytics to Extend the Genome Browser from Visualization Tool to Analysis Tool
Jeremy
Goecks (jeremy.goecks@emory.edu)1, Kanwei Li1, The Galaxy Team2, and James Taylor1
1 Departments of Biology and Math & Computer Sciences, Emory University
2 http://galaxyproject.org
Website: http://galaxyproject.org, Code: http://bitbucket.org/galaxy/galaxy-central/
License: Academic Free License

Genome browsers play a central role in genomic research by enabling scientists to visualize large textual and numerical datasets in a biologically meaningful way, making it possible to observe patterns both within and across datasets. We have applied the principles of visual analytics to develop the Galaxy Track Browser (GTB), a Web-based genome browser integrated into the Galaxy platform (see Figure). GTB utilizes a client-server model to scale efficiently and support the large amounts of data produced by experiments using next-generation sequencing technologies. This model ensures that GTB users can completely customize the display of each data track.

GTB leverages the Galaxy platform to combine data visualization and data analysis; GTB users can run tools to produce and visualize new data and dynamically filter data. Using GTB, a user can specify parameters and run a tool on the subset of data that is visible; GTB renders the tool's output when it has completed. Running a tool and viewing its output can be done interactively (quickly) because a tool runs only on the subset of data visible to the user. GTB also supports real-time data filtering. Users can dynamically filter their data by using sliders to specify ranges for attribute values; data with values outside an attribute range are hidden as a user makes changes.

GTB is available on every Galaxy server. Users can create visualizations for both standard and custom genome builds and can also share visualizations with colleagues or publish them on the Web. While GTB leverages many of Galaxy's features, it is modular and can be configured to work outside of Galaxy and with different data providers.
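The slider-based filtering described above can be sketched in a few lines. This is only an illustration in Python, not GTB code: the `Feature` record, the attribute names (`fpkm`, `coverage`) and the `visible_features` helper are hypothetical, and the sketch assumes that a feature lacking a filtered attribute is hidden.

```python
from dataclasses import dataclass, field


@dataclass
class Feature:
    """Hypothetical stand-in for one item drawn on a data track."""
    name: str
    start: int
    end: int
    attrs: dict = field(default_factory=dict)


def visible_features(features, view_start, view_end, ranges):
    """Return features that overlap the visible window and whose attribute
    values all fall inside the (low, high) ranges chosen with the sliders."""
    shown = []
    for f in features:
        # Only the subset of data in the visible region is considered.
        if f.end < view_start or f.start > view_end:
            continue
        # Hide a feature when any filtered attribute is missing or out of range.
        in_range = all(
            attr in f.attrs and low <= f.attrs[attr] <= high
            for attr, (low, high) in ranges.items()
        )
        if in_range:
            shown.append(f)
    return shown


transcripts = [
    Feature("t1", 100, 500, {"fpkm": 12.0, "coverage": 30}),
    Feature("t2", 300, 900, {"fpkm": 0.4, "coverage": 5}),
    Feature("t3", 5000, 6000, {"fpkm": 50.0, "coverage": 80}),
]

# Moving a slider amounts to calling the filter again with a new range.
shown = visible_features(transcripts, 0, 1000, {"fpkm": (1.0, 100.0)})
print([f.name for f in shown])
```

Because the filter only walks the features inside the visible window, re-running it on every slider change stays cheap, which is the same reason the abstract gives for running tools on the visible subset.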
GTB is available in the stable Galaxy distribution.

Figure. Using the Galaxy Track Browser for analysis of ENCODE RNA-seq data. From top to bottom: (1) partial view of mapped RNA-seq reads from ENCODE cell line h1-hESC; (2) a form for running Cufflinks, a tool for assembling mapped reads into transcripts; (3) first attempt at transcript assembly; (4-6) improving the assembly using different parameter values for Cufflinks; (7) filtering assembled transcripts from the GM12878 cell line using transcript attributes.

Poster 10
WebApollo: A Web-Based Sequence Annotation Editor for Community Annotation
Ed Lee1, Gregg Helt1, Nomi Harris1, Mitch Skinner2, Christopher Childers3, Justin Reese3, Jay Sandaram3, Christine Elsik3, Ian Holmes2, Suzanna Lewis1
1 Berkeley Bioinformatics Open-source Projects, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
2 Department of Bioengineering, University of California at Berkeley, Berkeley, CA, 94720, USA
3 Georgetown University, Washington DC, 20057, USA
Contact: Ed Lee (elee@berkeleybop.org)

As technical advances make sequencing faster and cheaper there are more and more community annotation efforts, augmenting the traditional centralized model where curators for a given genome project are all at the same physical location. This trend is particularly strong for smaller genome projects that largely rely on contributions from geographically dispersed community experts. WebApollo was designed to provide an easy-to-use web-based environment that allows multiple distributed users to edit and share sequence annotations. WebApollo comprises three components: a web-based client, a server-side annotation editing engine, and a server-side service for providing the client with data from different sources, including databases at the University of California at Santa Cruz, Ensembl, and Chado.
The web-based client is designed as an extension to JBrowse, a Javascript-based genome browser that provides a fast, highly interactive interface for visualizing genomic data. This JBrowse extension provides the gestures needed for editing annotations, such as dragging and dropping features to create new annotations, dragging to change boundaries of existing annotations, and using context-specific menus for modifying features. The extension offers access to the annotation-editing service and the data-providing services as well. The server-side annotation-editing engine is written in Java. It handles all the logic for editing and deals with the complexities of modifications in a biological context, where a single change (for example, splitting or merging transcripts) can have multiple cascading effects. Edits are stored persistently in the server, allowing users to quickly recover their data, should either their browser or the server crash. The server makes use of the Comet model to provide synchronized updating over multiple browser instances, so that if one user edits an annotation, anyone who is viewing that annotation sees the changes instantly in their browser window. The server-side service for providing data to the client is built on top of Trellis, a DAS server framework. It uses format injection to provide JBrowse-supported JSON data structures, rather than the more verbose DAS XML. We also developed a Trellis plugin to access data from the UCSC MySQL genome database, which provides quick access to that popular data source. All three components are open source and provided under the BSD License. 
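The synchronized updating described above can be illustrated with a toy publish/subscribe sketch. This is not WebApollo's code: the real engine is written in Java and pushes changes to browsers using the Comet model, whereas the hypothetical `AnnotationServer` and `BrowserView` classes below use in-process callbacks purely to show how one edit reaches every open view.

```python
class AnnotationServer:
    """Toy in-memory stand-in for a server that stores edits and pushes them."""

    def __init__(self):
        self.annotations = {}   # annotation id -> (start, end)
        self.subscribers = []   # callbacks, one per open browser view

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def edit(self, annotation_id, start, end):
        # Store the edit first, then notify every connected view.
        self.annotations[annotation_id] = (start, end)
        for notify in self.subscribers:
            notify(annotation_id, start, end)


class BrowserView:
    """Toy stand-in for one browser window showing annotations."""

    def __init__(self, server):
        self.seen = {}
        server.subscribe(self.on_update)

    def on_update(self, annotation_id, start, end):
        self.seen[annotation_id] = (start, end)


server = AnnotationServer()
view_a = BrowserView(server)
view_b = BrowserView(server)

# One user drags an annotation boundary; both views receive the change.
server.edit("transcript-1", 100, 250)
print(view_a.seen == view_b.seen)
```

Keeping the authoritative copy in `server.annotations` mirrors the abstract's point that edits live on the server, so a view that reconnects after a crash could be rebuilt from that state.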
Source code and demo:
https://github.com/berkeleybop/jbrowse (client side code)
http://code.google.com/p/apollo-web (annotation editing engine server code)
http://code.google.com/p/gbol (data model and I/O layer code used by edit engine)
http://code.google.com/p/genomancer (Trellis server code)
http://icebox.lbl.gov:8080/ApolloWebDemo (demo)

The isobar R package: Analysis of quantitative proteomics data
Florian P. Breitwieser and Jacques Colinge
Authors Affiliation: CeMM - Research Center for Molecular Medicine of the Austrian Academy of Sciences, Vienna, Austria
Presenting Author Email: fbreitwieser@cemm.oeaw.ac.at
Project Web Site: http://bioinformatics.cemm.oeaw.ac.at/isobar
Project Code: http://bioinformatics.cemm.oeaw.ac.at/isobar/isobar.tar.gz
Open Source License: LGPLv2.1

Background
Mass Spectrometers (MS) allow the identification of thousands of proteins in biological samples via mass-vs-intensity spectra of their peptides. Quantitative differences in protein abundance can be measured by labelling peptides with isobaric stable isotopes (iTRAQ, TMT), which produce signature ions whose intensities are compared. On the bioinformatics side, quantitative information has to be extracted and statistical methods used to map ratios measured at the individual spectrum level to the protein level and, finally, compare biological samples.

Description
The isobar package implements fundamental proteomics data representation in S4 classes based on Bioconductor. It parses protein identification results (Mascot and Phenyx; MzIdentML in development) and determines protein groups and peptides shared by multiple proteins. Novel statistical models then capture technical as well as biological sources of data variability. Protein ratios are reported with P-values and confidence intervals. Computations are performed in the R environment which facilitates data exploration and visualization. User-oriented LaTeX and Excel quality-control and analysis reports can easily be generated via scripts.

[Figure: example report table with columns protein group, peptides, spectra and ratio for Control and Treatment channels; rows include Serpina1e, Acaca, Atp5j, Postn and Myh7.]

Conclusions
We devised statistical methods and implemented the package isobar for R allowing accurate analysis and visualization of iTRAQ and TMT data. Isobar contains fundamental classes and methods for MS data representation that can be expanded to provide a generic MS framework in R, independent from quantitative proteomics specific needs.

Poster 11
Title: Stacks: building and genotyping loci de novo from short-read sequences
Authors: Julian M. Catchen*, Angel Amores§, Paul Hohenlohe*, William Cresko*, and John H. Postlethwait§
* University of Oregon, Center for Ecology and Evolutionary Biology, Eugene OR 97403 USA
§ University of Oregon, Institute of Neuroscience, Eugene OR 97403 USA
Email: jcatchen@uoregon.edu
Project: http://creskolab.uoregon.edu/stacks/
License: GNU GPL

Abstract
Advances in sequencing technology provide special opportunities for genotyping individuals rapidly and cheaply, but the lack of software for the automated calling of tens of thousands of genotypes over hundreds of individuals has hindered progress. Stacks is a software system that identifies and genotypes loci in a set of individuals from short-read sequence data, either de novo or by comparison to a reference genome. Using Stacks to analyze reduced representation Illumina sequence data, such as RAD-tags, can recover thousands of single nucleotide polymorphism (SNP) markers that can be used for the genetic analysis of crosses or populations.
Stacks can generate markers for ultra-dense genetic maps, can facilitate the examination of population phylogeography, and can help in assembly of reference genomes. We report here the algorithms implemented in Stacks and demonstrate its efficacy by constructing loci from simulated RAD-tags taken from the stickleback reference genome and by recapitulating and improving a genetic map of the zebrafish, Danio rerio. Further, we demonstrate the application of Stacks in non-model organisms to develop a genetic map and assemble an emerging reference genome in two teleost fishes.

Large scale NGS pipelines using the MOLGENIS platform: processing the Genome of the Netherlands
Heorhiy V Byelas1*, Danny Arends2*, Freerk van Dijk1,4, K Joeri van der Velde2, Laurent Francioli1,4, Martijn Dijkstra1,4, Alexandros Kanterakis1, Ishtiaq Ahmad3,5, David van Enckvoort5, Leon Mei5, Peter Horvatovich3,5, many other members of BBMRI-NL4, NBIC5 and Target6, Morris A. Swertz1,2,4-6
1 Genomics Coordination Center, University Medical Center Groningen
2 Groningen Bioinformatics Center, University of Groningen
3 Pharmacy Department, University of Groningen
4 BBMRI-NL biobank consortium
5 Netherlands Bioinformatics Center, NBIC
6 Groningen Centre for IT & Target/LifeLines infrastructure project
* equal contribution
Contact: Morris Swertz (m.a.swertz@rug.nl)
Project website: http://www.molgenis.org
Code: http://www.molgenis.org/svn/molgenis_apps/
License: LGPLv3

Abstract
Many have embarked on large-scale next generation sequencing and GWAS imputation studies. However, running all the necessary analysis scripts on large, parallel, compute clusters and keeping track of what protocols were used to produce a particular sequence or quality control has become a huge challenge. Last year we presented the open source MOLGENIS database toolkit to address data management challenges around creation, management and reporting of genomic data [1-4].
Here, we present a new flexible tool 'MOLGENIS/compute' that extends MOLGENIS to define, run and manage large analysis pipelines for next generation sequencing, imputation and other bioinformatics analyses on large scale ICT infrastructures [5-7].

Processing 750 Dutch genomes
An example of a large-scale analysis challenge is the Genome of the Netherlands project [5,6], a Dutch national initiative of five universities to sequence 750 Dutch individuals in 250 parent/child trios to establish a 'hapmap' of the Dutch population. Already the first phase of this project has been a major data management and compute challenge, with tens of thousands of compute jobs for alignment and SNP calling of 2250 lanes/750 samples, each of which requires more than 10 analysis steps (bwa, realignment, quality recalibration, etc.) and tracking of all biomaterials involved (sequence lanes, samples, cohorts, individuals, etc.). This hapmap will subsequently be used to impute all existing Dutch cohorts up to 100,000 individuals, which will be similarly challenging in managing GWAS quality control and imputation (impute2 and beagle imputation, etc.). Proteomics mass spectrometry analyses are also targeted by this project.

Define compute protocols and pipelines
To address these challenges, bioinformaticians can use the MOLGENIS/compute web user interface to specify all compute protocols needed inside the database, i.e., executable scripts written in shell or R, their input and output parameters and the data sets needed, e.g. 'bwa alignment'. Each script can use REST and R programming interfaces to upload (references to) result data and link it to their analysis targets, e.g. individuals, cohorts, flowcells, lanes, samples, etc. A template mechanism allows for flexible scripting necessary for parallelization. Complete analysis pipelines can be made by chaining compute protocols into simple workflows, e.g. 'alignment and QC', which can then be used by biologists just as individual compute protocols.
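The template mechanism and protocol chaining described above can be sketched as follows. This is an illustrative Python example, not MOLGENIS/compute code: the two protocol templates, the parameter names (`reference`, `lane`) and the `render_jobs` helper are assumptions made for the sketch, and the shell lines inside the templates are placeholders rather than a recommended pipeline.

```python
from string import Template

# Hypothetical protocol templates; parameter names are illustrative only.
BWA_ALIGN = Template("bwa aln ${reference} ${lane}.fq > ${lane}.sai\n")
QC_REPORT = Template("echo 'QC for ${lane}' > ${lane}.qc.txt\n")


def render_jobs(protocols, targets, params):
    """Render one script per analysis target by chaining protocol templates.

    Each target (here, a sequencing lane) gets its own independent script,
    which is what allows the jobs to be run in parallel.
    """
    jobs = {}
    for target in targets:
        values = dict(params, lane=target)
        jobs[target] = "".join(p.substitute(values) for p in protocols)
    return jobs


# An 'alignment and QC' workflow applied to two lanes.
jobs = render_jobs([BWA_ALIGN, QC_REPORT], ["lane1", "lane2"], {"reference": "ref.fa"})
print(jobs["lane1"])
```

Storing the rendered text of each job, rather than only the template, is one simple way to obtain the kind of analysis logbook the abstract mentions, since the exact commands run for each target remain available afterwards.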
Poster 12

Run and monitor analyses at a large scale
Researchers can select, run and monitor computational protocols. When a compute protocol is selected (e.g. bwa-align-pe), the user gets an auto-generated dialog box to fill in all analysis parameters (e.g. bwa-reference-genome) and analysis targets (e.g. illumina lane) based on a bioinformatics definition as described above. Then the user clicks 'start' and all the necessary compute jobs are generated as actual scripts, which are first stored in the database as an analysis logbook and then automatically sent to a compute cluster or cloud for execution. Currently supported are local PCs, LANs, PBS clusters and Amazon EC2 (alpha); MOTEUR and Galaxy exports are under development. During analysis, the user can monitor all scripts submitted.

Integrated analysis and data management
Finally, the user can view all results integrated with genomic data management, such as quality reports and SNP calls for all 750 genomes of the Genome of the Netherlands project. MOLGENIS/compute is available free for all to use and co-develop as open source.

References
1. http://www.molgenis.org
2. Swertz MA, Jansen RC (2007) Beyond standardization: dynamic software infrastructures for systems genetics. Nature Reviews Genetics. 8(3).
3. Swertz et al (2010) XGAP: a uniform and extensible data model and software platform for genotype and phenotype experiments. Genome Biology. 9;11(3):R2
4. Swertz et al (2010) The MOLGENIS toolkit: rapid prototyping of biosoftware at the push of a button. BMC Bioinformatics 11(Suppl 12):S12
5. BBMRI-NL Genome of NL. http://tinyurl.com/bbmri-gonl
6. BBMRI-NL bioinformatics team. http://www.bbmriwiki.nl
7. Target infrastructure. http://www.rug.nl/target/infrastructuur/computing

Bio-NGS: BioRuby plugin to conduct programmable workflows for Next Generation Sequencing data
Bonnal R.1, Strozzi F.2, Katayama T.3, Stella A.2, Pagani M.1
1 Istituto Nazionale Genetica Molecolare (INGM), Via F.
Sforza 28, Milan 20122, Italy, bonnal@ingm.org
2 Parco Tecnologico Padano, Via Einstein Loc. Cascina Codazza 26900 Lodi Italy
3 Laboratory of Genome Database, Human Genome Center, Institute of Medical Science, University of Tokyo, Japan
Project Home: http://bioruby.open-bio.org/wiki/Next_Generation_Sequencing
Code base: http://github.com/helios/bioruby-ngs
License: The Ruby License

BioRuby is a well-established bioinformatics library for the Ruby 1.9 programming language. Here we present a new package, Bio-NGS, for BioRuby to perform Next Generation Sequencing (NGS) analyses based on a recently introduced plugin system (Bio-Gem), which allows users to extend the core library with new functionality. Tools and libraries have been written in other languages using different approaches, but Ruby will facilitate the development of a light, flexible and customizable solution to face the new challenges of NGS data analysis. This NGS plugin can handle standard software like BWA, Bowtie, TopHat, Cufflinks, SAMtools and many others, in a common way. Third-party tools and applications can be integrated with Bio-NGS in different manners, by wrapping command line applications or by binding low-level libraries/functionalities. Wrapping is the easiest way to support and maintain third-party software, while binding offers the benefit of using internal functionality that is not exposed to the end user. The applications for which we provide a wrapper will be included in the package when possible, as Linux and OSX binaries, usually pre-compiled for 64-bit. This approach gives the flexibility to choose the best solution and provides a ready-to-run NGS analysis tool. Bindings for BWA and SAMtools (bio-bwa and bio-samtools, respectively) are available as separate plugins as well.
The plugin provides a task management framework and a set of predefined tasks to use third-party applications and trigger specific procedures, including pre- and postprocessing operations, often required to filter the data by quality control on the input or to visualize the outputs of the analyses. The Ruby programming language is perfectly suited to define pipelines and procedures given its flexible and clean syntax. Users of Bio-NGS will be able to add new tasks and develop their custom workflows with popular Ruby DSLs like Rake and Thor. Every operation submitted is recorded using a built-in history manager that will store all the settings and parameters used for a given procedure. Tasks can also be submitted as parallel jobs on different environments like multi-core machines, computer clusters and in the near future in the cloud as well. A monitoring system which tracks the tasks is under consideration. Bio-NGS provides a command-line reusable, yet customizable, system for demanding NGS data analysis and workflow management. Bio-NGS is developed following Ruby 1.9 specifications and is working on MRI 1.9.2 and JRuby 1.6.0.

Poster 13
Goby framework: native support in GSNAP, BWA and IGV 2.0
Kevin C. Dorff1 (kcd2001@med.cornell.edu), Nyasha Chambwe1,2, Thomas D. Wu3, Jim T. Robinson4, Fabien Campagne1,2* (fac2003@med.cornell.edu)
1 The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Medical College of Cornell University, New York, NY 10021
2 Department of Physiology and Biophysics, Weill Medical College of Cornell University, New York, NY 10021
3 Genentech, Inc., South San Francisco, CA 94080, USA
4 Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, Massachusetts, USA
Project URL: http://goby.campagnelab.org/
Source Code URL: http://campagnelab.org/software/goby/download-goby/
Open Source License: GNU General Public License
We will present an update about the Goby framework, a set of high performance APIs and tools that support a variety of data analysis for Next-Generation Sequencing (NGS) projects. We reported last year that the framework offered file formats 80 to 90% smaller than BAM files, while retaining sufficient information to perform RNA-Seq data analysis. We now report that we have extended three widely used NGS open-source tools with native support for Goby file formats. The following describes our rationale for targeting these complementary sets of tools. BWA is a fast aligner to map reads derived from genomic DNA [1]. In contrast to Bowtie, BWA can align reads that contain insertions and deletions. GSNAP is an aligner that can map reads which span exon-exon junctions as well as perform SNP-tolerant mapping [2]. The first feature is useful for RNA-Seq applications, while the second helps avoid reference bias in alignments. GSNAP also supports the alignment of reads derived from bisulfite treated DNA samples used in the RRBS or Methyl-Seq protocols. IGV is a widely used genome viewer that supports data integration across modalities [3].

GSNAP and BWA support. We have extended GSNAP and BWA to read Goby compact read files natively and to write Goby alignment files. These extensions support aligning an arbitrary slice of a very large read file and are key to parallelizing alignments efficiently (the input file is partitioned into non-overlapping slices, which are aligned in parallel, and the results are concatenated). Using these techniques, we have developed a parallel alignment tool that aligns about 100 million single end 100bp reads per hour on a three-node cluster (24 threads on each node). Such a cluster could be bought for less than $24,000 in 2011. The alignment time reported includes the time to generate wiggle tracks and various statistics of alignment quality.

IGV support. We have extended IGV to load and display Goby alignment files.
Alignments that have been sorted can be loaded into IGV along with other data tracks. Visualization shows mapped reads individually as well as differences between the read and the reference sequence. The version of IGV that supports Goby alignments is currently available for download as an Early Access Version (http://www.broadinstitute.org/software/igv/download).

Extensions to the Goby read format. We have added a number of features to the Goby formats over the last year. Read files were extended to support paired-end reads (both reads of a pair are represented in a single Goby reads file). We also developed a mechanism to store meta-information about the reads in the reads file. This can be used, for instance, to document the organism the reads were derived from, the type of sequencing instrument, or the date the run was performed (useful to detect batch effects). Arbitrary key-value pairs make it possible to encode user-defined meta-information about a collection of reads.

Extensions to the Goby alignment format. We have extended the alignment format to store paired-end alignments, as well as alignments of reads that map across exon-exon junctions. These extensions were made while preserving backward and forward compatibility with previous versions of the file format and of the framework.

Extensions to the Goby toolbox. We have extended the Goby toolbox with high-performance, well documented and simple to use tools for calling genotypes (DNA-Seq), estimating allele frequencies (RNA-Seq), finding variants that differ between groups of samples, and estimating methylation rates at CpG sites (RRBS or Methyl-Seq). Several of these new applications will be discussed.

Citations.
1. Li, H. and R. Durbin, Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics, 2009. 25(14): p. 1754-1760.
2. Wu, T.D. and S. Nacu, Fast and SNP-tolerant detection of complex variants and splicing in short reads. Bioinformatics, 2010. 26(7): p. 873-881.
3.
Robinson, J.T., et al., Integrative genomics viewer. Nat Biotechnol, 2011. 29(1): p. 24-26.

Poster 14
A Scalable Multicore Implementation of the TEIRESIAS Algorithm
Lee Nau 1, Frank Drews 1, and Lonnie Welch 1
1 School of Electrical Engineering and Computer Science, Ohio University, Athens, OH 45701, USA
E-mail: welch@ohio.edu
Project URL: http://code.google.com/p/pteir/
Code URL: http://code.google.com/p/pteir/source/checkout
Open Source License: MIT License

Abstract
The TEIRESIAS algorithm is a combinatorial pattern (motif) discovery technique originally published by IBM Research. The lack of an available open-source implementation of this algorithm motivated this project. The algorithm is computationally expensive, so parallel processing techniques were used to reduce run times and achieve scalable performance. TEIRESIAS is an important algorithm in the field of motif discovery, allowing the discovery of variable-length and potentially long patterns. It defines the notion of maximal patterns, which are as specific as possible without changing the number of occurrences. These have inherent value and might contain more useful information than a simple collection of strings. TEIRESIAS has been used in the discovery of long biological patterns and as the basis of subsequent, modified algorithms.

This fast, parallel, memory-efficient version of the algorithm has been implemented in C and uses the OpenMP parallel programming model. Experiments were conducted using up to sixteen threads on the Ohio Supercomputer Center’s Glenn cluster using System x3755 compute nodes (quad socket, quad core 2.4 GHz Opterons). The speedup reached a maximum of approximately 10x when scaling to sixteen cores/threads in total. While not linear, this still represents a significant improvement in run times without much additional memory overhead.
This project details the parallel processing techniques used to achieve such a performance increase and characterizes the scalability on several real biological datasets. Specifically, there are two stages in the TEIRESIAS algorithm: scanning and convolution. They are distinct steps and call for different processing techniques. The scan phase builds a database of candidate patterns from the input sequence(s). It is an embarrassingly parallel problem and was therefore straightforward to decompose into parallel tasks. The convolution phase is more complex, and parallelization strategies are not immediately obvious. This stage is recursive and contains strict data and processing ordering dependencies. However, certain categories of patterns belong to the same equivalence class; that is, the patterns can be segmented into discrete groups. Because these equivalence classes do not depend on one another, they may be processed in parallel. Statistical analysis of the classes resulting from real genome sequences showed that sufficient potential concurrency exists to process these sets simultaneously. Combined, these parallelization strategies resulted in improved run times and achieved a consistent speedup across organisms for the entire algorithm pipeline. For cases that tax either one stage or both, a significant improvement in completion times is seen, within a reasonable memory envelope.
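The scan phase described above can be illustrated with a short Python sketch. It is not the authors' C/OpenMP code: the function names, the parameter names `L`, `W` and `K`, the use of `.` as a wildcard, and counting support as the number of occurrences are all assumptions made for illustration. Each sequence is scanned independently of the others, which is what makes the phase easy to split into parallel tasks; the per-sequence results are merged afterwards.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def scan_sequence(seq_id, seq, L, W):
    """Enumerate candidate patterns in one sequence (hypothetical helper).

    A candidate here has exactly L fixed characters, starts and ends with
    a fixed character, and spans at most W positions. All other positions
    are wildcards, written as '.'.
    """
    found = defaultdict(list)
    for start in range(len(seq)):
        for width in range(L, W + 1):
            if start + width > len(seq):
                break
            # First and last positions are fixed; choose the other L - 2.
            for inner in combinations(range(1, width - 1), L - 2):
                fixed = {0, width - 1, *inner}
                pattern = "".join(
                    seq[start + i] if i in fixed else "."
                    for i in range(width)
                )
                found[pattern].append((seq_id, start))
    return found


def scan(sequences, L=3, W=5, K=2, workers=4):
    """Scan every sequence as an independent task, then merge the results.

    Only patterns with at least K occurrences are kept (assumed support
    definition). Requires L >= 2.
    """
    merged = defaultdict(list)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        jobs = [
            pool.submit(scan_sequence, i, s, L, W)
            for i, s in enumerate(sequences)
        ]
        for job in jobs:  # merge in submission order for stable output
            for pattern, offsets in job.result().items():
                merged[pattern].extend(offsets)
    return {p: offs for p, offs in merged.items() if len(offs) >= K}
```

Threads are used here only to show the task decomposition; in CPython they will not speed up this CPU-bound loop, which is one reason a C implementation with OpenMP, as described in the abstract, is the practical choice.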
Biomanycores, open-source parallel code for many-core bioinformatics
Mathieu Giraud 1, Stéphane Janot 1, Jean-Frédéric Berthelot 1, Charles Deltel 2, Laetitia Jourdan 1, Dominique Lavenier 2, Hélène Touzet 1, Jean-Stéphane Varré 1
1 LIFL, UMR CNRS 8022, Université Lille 1 and INRIA Lille, France
2 IRISA, UMR CNRS 6074, Université Rennes 1, ENS Cachan, and INRIA Rennes, France
contact@biomanycores.org
URL: http://www.biomanycores.org
Licenses: Various open-source licenses

Graphics processing units (GPUs) enable efficient parallel processing at a very low cost, and are a first step towards the generalization of massively manycore architectures. Since CUDA (in 2007) and the standard OpenCL (2009), many GPU bioinformatics applications have been developed, from sequence alignment to proteomics or phylogenetics (review in [4]).

Biomanycores is a collection of bioinformatics tools, designed to bridge the gap between research in OpenCL/CUDA high-performance computing on GPUs and other “manycore processors” and everyday bioinformaticians and biologists. The main goal is to gather parallel programs and interface them with the BioJava [2], BioPerl [3] and Biopython [1] frameworks. We will also provide benchmarks to show in which cases it is worth using parallel versions of the programs.

Biomanycores was presented at BOSC 2009. Since November 2010, a developer has been working full-time on this project, redesigning, extending and documenting Biomanycores. The release 1.1104 of Biomanycores (April 15) includes applications for sequence alignment, sequence processing, and RNA folding. Next releases are likely to include tools for proteomics, phylogenetics and genome-wide association studies.

The Biomanycores design makes it easy to alter existing pipelines and scripts in order to use a parallel application instead of the standard one. In some cases, this leads to large improvements. For example, the complexity of the brute-force Position Weight Matrix (PWM) scan algorithm is in O(nm), where n is the sequence length and m the matrix size. PWM performance can be measured in millions of operations per second, one operation being the scoring of one sequence character against one matrix column. While the native Biopython search_pwm method peaks at 2 Mop/s, the Biomanycores GPU version (TFM-CUDA) peaks at more than 2 Gop/s, 1000 times faster, even when the overhead of the Python interpreter is included.

In the coming months, we plan to integrate new applications into Biomanycores. Moreover, we wish to receive critical feedback from the BioJava, BioPerl and Biopython communities and users in order to improve our interfaces, and to discuss further integration into these frameworks. People who are developing CUDA or OpenCL bioinformatics code (available under a free license) are welcome to get involved in the project; we are willing to help them integrate their code. Please contact us at contact@biomanycores.org.

References
[1] P. J. A. Cock, T. Antao, J. T. Chang, et al. Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics, page btp163, 2009.
[2] R. C. G. Holland, T. A. Down, M. Pocock, et al. BioJava: an open-source framework for bioinformatics. Bioinformatics, 24(18):2096–2097, 2008.
[3] J. E. Stajich, D. Block, K. Boulez, et al. The Bioperl toolkit: Perl modules for the life sciences. Genome Research, 12(10):1611–1618, 2002.
[4] J.-S. Varré, B. Schmidt, S. Janot, and M. Giraud. Advances in Genomic Sequence Analysis and Pattern Discovery, chapter Manycore high-performance computing in bioinformatics. World Scientific, 2011.
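The O(nm) cost of the brute-force PWM scan, and the operation count used for the Mop/s figures in the Biomanycores abstract, can be made concrete with a small pure-Python sketch. This is neither the Biopython `search_pwm` method nor the TFM-CUDA code; the function name `pwm_scan` and the matrix layout (a list of per-column dictionaries) are assumptions chosen for readability.

```python
def pwm_scan(sequence, pwm):
    """Score every window of `sequence` against a position weight matrix.

    `pwm` is a list of m columns; each column maps a character to a score
    (hypothetical layout). Returns (scores, operations), where one
    operation is the scoring of one sequence character against one
    matrix column, as in the abstract's definition.
    """
    m = len(pwm)
    scores = []
    operations = 0
    for start in range(len(sequence) - m + 1):  # about n windows
        total = 0.0
        for j in range(m):                      # m columns per window
            total += pwm[j][sequence[start + j]]
            operations += 1
        scores.append(total)
    return scores, operations


def speedup(fast_ops_per_second, slow_ops_per_second):
    """Ratio between two throughputs measured in operations per second."""
    return fast_ops_per_second / slow_ops_per_second
```

For a sequence of length n and a matrix of m columns the loop performs (n - m + 1) * m operations, which is the O(nm) behaviour quoted in the abstract. With the reported peaks, `speedup(2e9, 2e6)` gives the factor of 1000 between 2 Gop/s and 2 Mop/s.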
Poster 15
GemSIM – Generic, Error-Model based SIMulator of next-generation sequencing
Kerensa McElroy a*, Fabio Luciani b, Torsten Thomas a
a Centre for Marine Bio-Innovation and School of Biotechnology and Biomolecular Sciences, UNSW, Sydney, NSW Australia, 2052.
b Inflammatory Diseases Research Unit, School of Medical Sciences, UNSW, Sydney, NSW Australia 2052.
Email: kerensa@unsw.edu.au
Project page: http://sourceforge.net/projects/gemsim/
Download: http://sourceforge.net/projects/gemsim/files/GemSIM_V1/
License: GNU General Public License version 3

Summary: Next-generation sequencing (NGS) has unprecedented potential for assessing genetic diversity; however, extracting true variants from errors is challenging due to high NGS error rates, multiple sequencing platforms with varied error profiles, and an ever-increasing variety of downstream analysis choices. While simulation can facilitate analysis, existing simulators are limited by simplistic error models, unrealistic quality score information, or restricted platform applicability. GemSIM, or General Error-Model based SIMulator, is a next-generation sequencing simulator capable of generating single or paired-end reads for any sequencing technology compatible with the generic formats SAM and FASTQ (including Illumina and Roche 454). By creating and using empirically derived error models and quality score distributions, GemSIM realistically emulates individual sequencing runs and/or technologies. GemSIM draws reads from either a single genome or a haplotype set, facilitating simulation of either individual or population level sequencing projects. Here, we demonstrate GemSIM’s value for next-generation sequencing projects by simulating reads from a set of known, related bacterial haplotypes and optimising a parameter for the popular SNP-calling program VarScan.
Reads simulated using error models derived from Illumina paired-end reads required different SNP calling parameters from those simulated using Roche 454 derived models, demonstrating the need for simulation when designing and analysing NGS projects.

References
Koboldt et al. (2009). “VarScan: variant-detection in massively-parallel sequencing of individual and pooled samples.” Bioinformatics 25(17):2283-2285.

Poster 16
Securing and sharing open-source bioinformatics in the cloud
Richard Holland (1)(3), Madhu Donepudi (1), Nick James (1), William Spooner (1), Sowmyanarayan Srinivasan (2)
(1) Eagle Genomics Ltd., Babraham Research Campus, Cambridge CB22 3AT, UK
(2) Cognizant Technology Solutions, Bangalore, India
(3) Presenting author: Richard Holland, holland@eaglegenomics.com

Project site is currently: https://www.eaglecognizant.com/ however this site is not publicly accessible (logins are required) - the talk is about the mechanics of how we built it, not the actual product.
There is no code to access.
It is not open-source itself, but uses open-source components to power it.

Eagle Genomics Ltd. and Cognizant Technology Solutions jointly developed a proof-of-concept system for the Pistoia Alliance whereby key open-source bioinformatics services were to be hosted in a secure shared environment on the public cloud for private commercial use. The system designed by the two companies brings together a number of open- and closed-source infrastructure components that enable the open-source tools to be secured properly and hosted in a multi-tenancy environment that auto-scales according to demand.

Key issues faced during development included keeping multiple sources of commercial data separated in a multi-tenancy environment, and ensuring that the open-source tools hosted (including Ensembl) were secure enough to prevent hackers gaining access to the system.
This talk will include a brief contextual outline of the system architecture and how it blends closed- and open-source components with features from Amazon EC2 to build an acceptably robust solution. However, the main focus will be a discussion of the challenges we faced in attempting to make open-source bioinformatics software secure enough to survive penetration testing by a team of professional ethical hackers, including some hints and tips on securing other projects. Whilst the system we built is not open, the lessons we learnt along the way most definitely are, and all our security findings have already been reported back to the Ensembl project for their consideration.

Title: Mygene.info: Gene Annotation as a Service - GAaaS
Authors: Chunlei Wu and Andrew I. Su (presenting author underlined)
Email: cwu@iscb.org asu@scripps.edu
Affiliations: Genomics Institute of the Novartis Research Foundation, 10675 John Jay Hopkins DR, San Diego, CA 92121
Project web site: http://mygene.info
Source code: https://bitbucket.org/newgene/genedoc/src
Open Source License being used: Apache License
Considered for a talk, a poster, or both: both

Web applications are a common and convenient way for biologists to present their large data sets to the scientific community. Almost all of these sites allow users to query and retrieve data for their favorite genes. These search interfaces usually utilize a dedicated gene annotation database to translate user queries into the appropriate identifiers and to obtain specific gene annotations. Setting up a database server and keeping gene annotation data updated are time-consuming and cumbersome tasks, especially for smaller developer groups. And more costly, every local gene annotation database requires the duplication of essentially these same maintenance tasks.

Following a growing trend to provide basic functionality via web services, Mygene.info
(http://mygene.info) was created to provide “Gene Annotation as a Service” (GAaaS). Developers can utilize Mygene.info for basic search and annotation retrieval using a simple web service interface. Developers can then focus on presenting their own novel data instead of duplicating common functionality. Mygene.info particularly emphasizes simplicity and performance.

Two simple REST (Representational State Transfer) web services are provided from Mygene.info: one for gene queries and the other for gene annotation retrieval. Both return JSON (JavaScript Object Notation) formatted data, making them easy to use in web applications. The gene query service allows querying by more than [illegible] commonly used identifiers or by genome intervals. The database currently contains [illegible]k genes from eight common species. The gene annotation service can access any of [illegible] types of annotations by either Entrez or Ensembl gene ids (including retired gene ids). Gene annotation data are regularly updated once per month.

Mygene.info is built on CouchDB, a document-based database. Unlike more commonly used relational database systems (e.g., Oracle, MySQL), data are stored as “key-document” pairs. The “document” is a JSON-formatted gene annotation object, while the “key” is a gene ID (Entrez or Ensembl). The hierarchical structure of gene annotation data can be represented naturally in this key-document model. Using pre-indexed “views”, CouchDB consistently performs [illegible] times better than a relational structure in Oracle in our tests. The simple object structure in CouchDB also greatly simplified both data loading and data queries.

Mygene.info is hosted in the cloud by Amazon Web Services. We have used Mygene.info as the underlying gene annotation database for our popular BioGPS gene portal (http://biogps.org) for almost a year. We also make our Amazon instance image available for those who want to run their dedicated host.
Mygene.info does not replace existing services provided by large gene annotation providers. Instead, these web services are optimized for powering gene-centric web resources. We believe that these services will lower the bar for biologists to publish their data and improve the efficiency of knowledge exchange.

Poster 17
Cloud BioLinux: open source, fully-customizable bioinformatics computing on the cloud for the genomics community and beyond.
Konstantinos Krampis 1,*, Tim Booth 2, Brad Chapman 3, Bela Tiwari 2, Mesude Bicak 2, Dawn Field 2 and Karen E. Nelson 1.
1 J. Craig Venter Institute, 9704 Medical Center Drive, Rockville, MD 20850 USA, 2 CEH Wallingford, Benson Lane, Wallingford, UK, 3 Harvard School of Public Health Bioinformatics Core, 655 Huntington Avenue Boston, MA 02115
*Presenting author: agbiotec@gmail.com, project website: http://www.cloudbiolinux.org/

Cloud BioLinux is an open-source virtual machine (VM) image that enables scientists to quickly provision on-demand infrastructures for high-performance bioinformatics computing using cloud platforms. Upon deployment users have instant access to more than 100 bioinformatics packages, including the blastall and blast+ NCBI applications, the Staden, EMBOSS, hmmer, and phylip collections of software, and many stand-alone applications for tasks such as sequence alignment, clustering, assembly, display, editing, and phylogeny, as well as working with next generation sequencing data. In addition we have installed data analysis and bioinformatic code libraries from the Python, Perl, R, Ruby programming languages, the FreeNX server for allowing remote access to the Cloud BioLinux desktop, and we are also planning to include the Investigation/Study/Assay (ISA, http://isa-tools.org/) data management software suite. Cloud BioLinux runs on the 64-bit version of Ubuntu
Linux, and the binaries for the bioinformatics packages are from the NEBC BioLinux 6.0 repository (http://nebc.nerc.ac.uk/tools/BioLinux/). Users can start instances of Cloud BioLinux on the Amazon EC2 cloud via the console (http://aws.amazon.com/console), and access the tools through a remote desktop connection from their local computer. We have documented the steps for this in detail (http://tinyurl.com/cloud-BioLinux-tutorial/), in addition to the procedure for transferring and storing data on the cloud, troubleshooting for common problems and suggestions on how to get the best value for money when renting computational capacity from cloud providers.

At the core of Cloud BioLinux is an open-source, automated system configuration tool (available from https://github.com/chapmanb/cloudbiolinux, MIT license). This tool simplifies the process of building customized bioinformatic VM images on the cloud, sharing the images among researchers, and deploying them on different cloud computing platforms. Our implementation is based on the Python Fabric (http://fabfile.org/) software management framework and the Ubuntu Advanced Packaging Tool (APT, http://www.ubuntu.com), and is composed of a main driver script and a set of plain-text configuration files that users edit in order to specify the bioinformatics software that will be included in a build of Cloud BioLinux. The driver script reads the configuration files and installs the specified software from the APT-based NERC BioLinux 6.0 software repository, while additional APT-based bioinformatic or scientific software repositories such as the one available from the Debian-Med community (Moller et al. 2010) can be added in the configuration files (a detailed repositories list is available at the Ubuntu Science page
http://tinyurl.com/science-ubuntu). Users can create their own customized version of the Cloud BioLinux VM by simply editing the files to mix and match software from the different repositories.

Cloud BioLinux is a community effort, sponsored by the J. Craig Venter Institute (http://www.jcvi.org/cms/research/projects/jcvi-cloud-BioLinux/overview/) and NERC Environmental Bioinformatics Centre (http://nebc.nerc.ac.uk/), in addition to an educational grant through Amazon Web Services. We invite open participation via the cloudbiolinux.org website, while our community effort is coordinated through a public mailing list available through the same site, and developer meetings such as the Hackathon events preceding the 2010 and 2011 BOSC meetings.

OBIWEE : an open source bioinformatics cloud environment
Jonathan Piat, François Moreews, Olivier Sallou
Symbiose Team, INRIA Rennes, France
jonathan.piat@inria.fr francois.moreews@irisa.fr olivier.sallou@irisa.fr
project website: http://vapor.gforge.inria.fr/
source access: http://vapor.gforge.inria.fr/
Open-source license: http://www.cecill.info/licences/Licence_CeCILL-C_V1-en.html

Bioinformatics applications are often structured as workflows that are composed of a set of operations to perform on large data sets. These workflows are deployed as complex scripts that handle the sequence of program calls with their relevant inputs and try to take advantage of a computer cluster using a scheduler. Their performance relies on the user's ability to analyze the potential parallelism in the workflow. SLICEE (Service Layer for Intensive Computation Execution Environment) abstracts
the scheduler cluster calls by handling command submission, parallelism extraction and data management. A workflow client orchestrates the SLICEE services that exploit the data parallelism, and takes care of the data routing between tasks. Thus the execution of workflow tasks takes advantage of the parallelism available on the cluster with minimum user intervention.

Maintaining a cluster architecture is expensive and its processing power is hard to scale over time. Cloud computing proposes to virtualize a computer architecture and deploy it on available physical computing resources. Therefore the physical architecture is shared and the virtual processing power can be scaled to meet the user demand. OBIWEE (On Demand Bioinformatics Intensive Workflow Execution Environment) is a bioinformatics intensive workflow execution environment preconfigured on a Linux virtual cluster that can be deployed either on a private cloud or on a public cloud service like Amazon EC2. The virtual cluster architecture is scaled to meet the workflow requirement, and a master node runs the SLICEE middleware. Each node of the cluster runs a bioinformatics-specific Linux distribution to provide access to a wide range of bioinformatics applications. The virtual cluster has been tested on a private cloud using OpenNebula and KVM; the following step is Amazon EC2 integration. All steps, from the cluster configuration to the workflow design and execution, are performed through a web browser.

The open source OBIWEE bioinformatics cloud service has been designed to allow groups with low IT support or poor computing infrastructure to analyze their own data. It also helps to face the increasing demand for bioinformatics intensive treatments, in a context of large dissemination of sequencing technology usage.
SeqWare: Analyzing Whole Human Genome Sequence Data on Amazon's Cloud
Brian D. O'Connor 1,2, Barry Merriman 3, and Timothy Harkins 3
1. Lineberger Comprehensive Cancer Center, UNC, Chapel Hill, NC, brianoc@email.unc.edu
2. Ontario Institute for Cancer Research, Toronto, ON M5G 1L7, Canada
3. Life Technologies, Inc., Carlsbad, CA, USA
website: http://seqware.sourceforge.net
code: http://sourceforge.net/projects/seqware/develop
license: GPLv3

The current operating costs of next-gen sequencers enable routine generation of the amount of data needed to resequence a human genome. However, the informatics infrastructure needed to efficiently process this data into useful information is still evolving, and this has become a major barrier to adoption of genome sequencing in the research and clinical communities. An ideal “pushbutton” solution would eliminate both the burden of maintaining a computer installation and of retaining a team of informaticians. Our open source SeqWare project is one such solution and is designed on the principles of cloud computing, data security, automated analysis, and databasing of results. Relying on commercial cloud computing services removes the computer hardware and system administration burdens, open source software allows for transparency and leverages the largest development community, data security provides the IRB and HIPAA compliance researchers and clinicians require, and a fully automated pipeline gives fast, consistent and auditable results. Furthermore, the scalable variant database supports interactive querying of the results, aggregation of information across subjects, and diverse downstream analyses. Here we describe the port of the SeqWare project to the Amazon Elastic Compute Cloud (EC2). From the user's perspective, it provides a pushbutton workflow for human genome or exome sequencing, currently optimized for the SOLiD platform: starting with sending an
encrypted hard drive of raw data to EC2, it performs alignment, re-alignment, variant detection, databasing, and variant annotation. From the developer's perspective SeqWare provides both a framework for sequence analysis module and workflow creation along with the tools needed to deploy custom or off-the-shelf workflows on EC2. One of the biggest successes of this project is that SeqWare is abstracted from the actual execution environment jobs run in, meaning the port to EC2 required no modifications to the existing workflows and modules at the core of the pipeline. We present an overview of the system design, its port to EC2, an analysis of the costs of processing genomes in this way, and details on the processing workflow currently available for human genome analysis. At the time this abstract was prepared, we have used SeqWare in the EC2 environment to process [illegible] whole human genomes including a 31[illegible]x coverage HuRef dataset, the Lucier genome, and other interesting high-profile genome projects. Users are free to create their own instance of the open source system on EC2 using the project's public AMI images, or utilize a full-service, commercially-supported instance.

Sequencescape - a cloud enabled Laboratory Information Management System (LIMS) for second and third generation sequencing
Authors: Andrew Page, Beth Jones, Maxime Bourget, Sean Dunn, Matthew Denner, Kate Taylor, Lars Jorgensen
Contact: lj3@sanger.ac.uk
Software: http://github.com/sanger
License: GPL

LIMS have generally been a closed area for many of the large genomic research centers. We wish to change this. By releasing our software as open source, there is now a tried, tested and open alternative to the commercial software solutions that are currently on the market. This will reduce the workload for labs starting to use 2nd (Solexa, SOLiD, 454) and 3rd generation sequencing (PacBio). LIMS have historically been seen as very institute-specific.
To tackle this we have built our system on a framework that will be applicable to other users. The system can be deployed in the cloud and is highly extensible. Sequencescape has been tested in a large number of labs and research projects inside our institute and handles a very large number of samples. The key problem when designing this system was to come up with a good data model to achieve this. On one hand it needs to be very generic to support many different lab workflows. On the other, it needs to be explicit to support performance and ease of development. Sequencescape is built around two key concepts: Assets (plates and tubes) and Requests (work orders). These concepts are central to the system. They enable us to support the wide-ranging requirements that we have encountered using the system internally.

Key features included in Sequencescape are:
* Work order tracking
* Sample and project management
* Capacity management for pipelines
* Accounting
* Accessioning for samples and studies at the EBI ENA/EGA/ArrayExpress
* Dynamically defined workflows for labs with support for custom processes
* Freezer tracking for tubes and plates
* API support for 3rd party applications
* Data warehousing

The current installation supports over a million samples and 1.3 million tubes and plates and is used in an organisation of 900 people. The talk will describe both the application of the software inside Sanger and the approach we are using to develop it. There will also be a live demo of the software and a demonstration of how to quickly construct a new lab workflow. Sequencescape is part of the informatics ecosystem that exists at Sanger. Many of the other components are openly available from the Sanger website (http://www.sanger.ac.uk/resources/software/).
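The two central concepts named in the Sequencescape abstract, Assets and Requests, can be pictured with a minimal Python sketch. It is purely illustrative and is not the Sequencescape schema: the class fields, the state names (`pending`, `started`, `passed`) and the helper `pending_requests` are all hypothetical, chosen only to show how a generic asset plus an explicit work order could be combined.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Asset:
    """A piece of labware: a tube or a plate (hypothetical fields)."""
    barcode: str
    kind: str  # "tube" or "plate"
    sample_names: List[str] = field(default_factory=list)


@dataclass
class Request:
    """A work order asking for some process to be applied to an asset."""
    request_type: str
    asset: Asset
    state: str = "pending"

    # Allowed transitions between the assumed states.
    _NEXT = {"pending": "started", "started": "passed"}

    def advance(self):
        """Move the request to its next state, if there is one."""
        if self.state not in self._NEXT:
            raise ValueError(f"request already finished: {self.state}")
        self.state = self._NEXT[self.state]
        return self.state


def pending_requests(requests):
    """Return the work orders that have not been started yet."""
    return [r for r in requests if r.state == "pending"]
```

Keeping the asset generic (one class for tubes and plates) while making the request explicit about its type and state mirrors the trade-off described above between a generic model and one that is explicit enough for performance and ease of development.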
Title: Enabling NGS Analysis with(out) the Infrastructure
Authors: Enis Afgan 1, Dannon Baker 1, Nate Coraor 3, Anton Nekrutenko 3, James Taylor 1
Author affiliations:
1 Department of Biology and Department of Mathematics & Computer Science, Emory University {E.A. email: eafgan@emory.edu}
2 http://galaxyproject.org
3 Huck Institutes of the Life Sciences and Department of Biochemistry and Molecular Biology, The Pennsylvania State University
Project website: http://usegalaxy.org/cloud
Project source code: http://bitbucket.org/galaxy/cloudman
Open Source License used: Academic Free License

Abstract: Running tools and performing analyses to transform sequence data into biologically meaningful information requires sophisticated computational infrastructure and support. The size of the required computational infrastructure is outpacing what individual researchers, many labs, and even universities are able to support. In addition, the setup and maintenance associated with a computational infrastructure present significant problems for individual investigators and small labs that may not have the necessary informatics support. Fortunately, cloud computing provides unique capabilities for transparent scaling and sharing of computational infrastructures. Building on the Galaxy CloudMan platform, we have enabled the entire Galaxy application, completely configured with a range of tools and reference genomes, to transparently utilize AWS cloud resources. The presented solution delivers a fully functional infrastructure capable of performing complex genomic analyses in a matter of minutes.

This talk presents key new features of Galaxy CloudMan that focus on extension, transparency, and automation. Namely, we have automated the process of deploying CloudMan on a cloud infrastructure with the accompanying data, tools, and applications, making it completely transparent, reproducible, and accessible.
Any individual instance of CloudMan is now self-contained, meaning that it does not require an external broker or service to operate. Moreover, this enables each instance of CloudMan to be customized by deploying new or alternative tools, configurations, and data, thus supporting the widely varied needs of individual investigators and labs.

CloudMan now supports setup of different cluster modes, allowing one to utilize all of CloudMan’s infrastructure management features (e.g., cluster setup, NFS setup, data persistence, automatically adding and removing instances, sharing) without setting up Galaxy. Coupled with the CloudBioLinux AMI that CloudMan builds upon, this feature allows any of the tools in NERC BioLinux to be run on a cluster managed by CloudMan without any additional setup. Additionally, any tool or application that can utilize a general purpose cluster can be installed on the deployed cluster while allowing CloudMan to manage the infrastructure.

CloudMan now supports sharing of cloud cluster instances. This functionality allows an analysis to remain in the cloud (i.e., there is no need to download results and make them available elsewhere) while minimizing the expense incurred by resources that need to be provided by the analysis owner. In addition to enabling publishing and sharing of data analyses, this feature allows sharing of customized instances of CloudMan where tools and/or data have been modified. This functionality minimizes repeated effort and offers tool developers a platform for easily distributing their tools while minimizing any otherwise required setup (for both developers and users).

Lastly, continuing the automation effort, CloudMan now supports the notion of infrastructure autoscaling.
This feature allows a user to specify bounds for the size of their cluster while letting CloudMan automatically adjust the number of compute resources to match the current system load, thus taking maximum advantage of the elastic infrastructure underlying the computation. It supports the set-it-and-forget-it paradigm of providing a compute infrastructure for users without requiring them to manage it. This talk will highlight each of these major advancements in CloudMan and showcase their impact on user experience when using Galaxy and cloud computing resources.

Poster 18
Hadoop-BAM: A Library for Genomic Data Processing
Matti Niemenmaa, André Schumacher, Keijo Heljanko
Aalto University School of Science
Firstname.Lastname@tkk.fi
Aleksi Kallio, Petri Klemelä, Taavi Hupponen, Eija Korpelainen
CSC - IT Center for Science
Firstname.Lastname@csc.fi

Next generation sequencing (NGS) technologies have redefined the requirements for data processing in bioinformatics. To cope with the massive influx of data, cloud computing technologies have been proposed and evaluated. The initial experiences have been positive, as the independent nature of deep sequencing reads allows them to be processed effectively in the loosely coupled cloud computing framework. We see the next crucial step as the development of generic libraries that facilitate the creation of mature cloud computing applications for NGS.

We have developed Hadoop-BAM, a library for the manipulation of BAM (Binary Alignment/Map) files using the Hadoop MapReduce framework. Our library builds on top of the Picard SAM JDK. The library was released under the permissive MIT open source license and can be found on SourceForge (http://sourceforge.net/projects/hadoop-bam/). We demonstrate the usability of the library by building a preprocessing stage for BAM file visualization, which was integrated into the Chipster Genome Browser.
Chipster is a versatile open source platform that provides tools for data analysis and visualization. This preprocessing tool uses the “Google Earth” style MIP mapping technique to condense BAM files into summary files. Using the multilevel summaries, the genome browser can quickly navigate between different zoom levels. The use of Hadoop-BAM yielded a significant decrease in the running time of the preprocessing stage for 50 GB data sets and also made the genome browser significantly faster, more usable, and more scalable.

The computing cluster used for our evaluation consists of 112 AMD Opteron 2.6 GHz compute nodes, each equipped with 12 cores and 32-64 GB memory, for a total of 1344 cores. The nodes are interconnected via an Infiniband and 1 GBit Ethernet infrastructure. The nodes have a joint total local disk capacity of approx. 30 TB, in addition to 40 TB of work space on a central network file server.

[Figure 1, two panels: "Speedup of sorting times versus #worker nodes" and "Speedup of summarizing times versus #worker nodes". Each panel plots speedup against the number of workers for input file import, sorting (or summarizing), output file export, and total elapsed time, alongside the ideal speedup.]
Figure 1: Speedup for increasing number of compute nodes

Figure 1 shows that in our experiments the speedup for sorting BAM files is approximately linear, while the speedup for the computation of summary statistics is marginally worse. Preprocessing such data without the distributed Hadoop backend would have been practically impossible on a modern workstation alone. Figure 1 also makes clear that, as the number of worker nodes increases, one of the main bottlenecks of the system is in fact the import and export of data to and from Hadoop. In the future, one may want to let large datasets reside inside the cloud in order to avoid this overhead.
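The multilevel summaries mentioned in this abstract can be illustrated with a small sketch. This is not Hadoop-BAM or Chipster code: the function names `build_summary_levels` and `pick_level`, the use of mean coverage per bin, and the default halving factor are all assumptions made for illustration. The point is only that each coarser level condenses the one below it, so a browser can read a small precomputed level for a zoomed-out view instead of scanning every position.

```python
from typing import List


def build_summary_levels(coverage: List[float], factor: int = 2) -> List[List[float]]:
    """Build successively coarser summaries of a per-position coverage track.

    Level 0 is the input itself. Each further level replaces every run of
    `factor` adjacent bins in the previous level by their mean, until a
    single bin remains. (Hypothetical helper, not part of Hadoop-BAM.)
    """
    if factor < 2:
        raise ValueError("factor must be at least 2")
    levels = [[float(x) for x in coverage]]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        coarser = []
        for start in range(0, len(prev), factor):
            chunk = prev[start:start + factor]  # the last chunk may be shorter
            coarser.append(sum(chunk) / len(chunk))
        levels.append(coarser)
    return levels


def pick_level(levels: List[List[float]], max_bins: int) -> List[float]:
    """Return the finest level that fits into `max_bins` bins (e.g. screen pixels)."""
    for level in levels:
        if len(level) <= max_bins:
            return level
    return levels[-1]


if __name__ == "__main__":
    track = [1, 3, 5, 7, 2, 4, 6, 8]
    levels = build_summary_levels(track)
    print([len(level) for level in levels])
    print(pick_level(levels, max_bins=4))
```

Because every level is computed once, the per-level work is independent per genomic region, which is the property that makes such a preprocessing step a natural fit for a MapReduce-style framework.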
We have also started to integrate Hadoop-BAM with the Pig query language. Preliminary results show that in some cases it is possible to achieve performance that matches custom Java Hadoop code by using the higher level Pig language.

Poster 19
SADI for GMOD: Bringing Model Organism Data onto the Semantic Web
Ben Vandervalk, E. Luke McCarthy, Mark D. Wilkinson
James Hogg Research Centre, Heart + Lung Institute, University of British Columbia
email: benvvalk@gmail.com
SADI Website: http://sadiframework.org
SHARE Website: http://biordf.net/cardioSHARE/query
Checkout Perl Code: svn co http://sadi.googlecode.com/svn/trunk/Perl/sadi.gmod
Software License: GPL (however, Java code for SADI is released under the New BSD License)

In this talk, we will present a new extension for GMOD (Generic Model Organism Database) called SADI for GMOD, which allows model organism sites to publish their data in RDF with minimal effort. The RDF adaptor layer for the database is implemented using web services conforming to the SADI (Semantic Automated Discovery and Integration) standard, which consists of a set of best practices for implementing interoperable services. Among other advantages, adhering to SADI ensures that services can be discovered in a predictable and semantically enriched way; applicable services for in-hand data can be automatically detected and invoked; multiple inputs to a service can be batched into a single HTTP request to minimize network traffic; and the behaviour of asynchronous services is standard and conformant with HTTP Protocol header specifications.

The main value of SADI for GMOD is that it facilitates automated integration of bioinformatics data and software across multiple sites. In this presentation, we will demonstrate how the SADI for GMOD services can be used to assemble data across MODs and other bioinformatics resources, without the need to write custom Perl scripts. We will then proceed to a more sophisticated demonstration, where we will show how a software-based analysis of distributed data from disparate GMOD sites (e.g., retrieve and align homologues of gene X in rat and mouse) can be described and executed as a SPARQL query using our SHARE (Semantic Health and Research Environment) query engine.

References
Mark D. Wilkinson, Luke McCarthy, Benjamin Vandervalk, David Withers, Edward Kawas, and Soroush Samadian, “SADI, SHARE, and the in silico scientific method”, BMC Bioinformatics, vol. …, Suppl. …

Poster 20
Scufl2: because a workflow is more than its definition
Stian Soiland-Reyes, Alan R Williams, Stuart Owen, David Withers and Carole Goble
School of Computer Science, University of Manchester, UK
{soiland-reyes, alan.r.williams, stuart.owen, david.withers, carole.a.goble}@manchester.ac.uk
Project site: http://www.taverna.org.uk/
Source code: https://github.com/mygrid/scufl2 and http://taverna.googlecode.com/
License: GNU Lesser General Public License (LGPL) 2.1

Taverna is a scientific workflow management system that has gained popularity amongst scientists in bioinformatics, chemistry, astronomy and other domains. Since its inception in 200?, Taverna's workflow language has evolved from Scufl, which was designed for the FreeFluo workflow engine. Scufl defines a directed acyclic graph of the data flow between services which receive and produce data at ports. Scufl workflows are stored in a lightweight XML format.

t2flow adapted these concepts for the Taverna 2 workflow engine, including new capabilities such as extensible execution control. The t2flow XML format was created as a serialisation of internal Java beans, which unfortunately makes it difficult to consume or produce by non-Taverna clients compared to Scufl. There have also been growing demands for bundling a workflow with related resources such as example data and semantic annotations. Addressing these issues culminated in forming a new workflow language, Scufl2, which is presented here.

By adapting linked data technology and preservation methodologies for research objects, Scufl2 is not only a platform-independent workflow language that can be inspected, modified, created and executed by third-party tools and systems; it is extensible to allow the capture of workflow execution inputs and outputs, reference data sets, provenance, annotations, documentation, publications, alternative formats or representations, and even embedded service implementations.

Scufl2 comes with a Java API independent of Taverna that can be used for programmatic access to read and write Scufl2 workflow bundles. A workflow bundle is a structured ZIP-file based on Adobe UCF, ePub OCF and the Open Document Format (ODF), with the workflow definitions included as XML Schema-conformant documents which are also valid RDF/XML. The constraints from the schema allow clients to read and write Scufl2 workflow definitions as regular structured XML. Richer RDF-enabled clients may link workflow definitions with external resources and additional annotations stored in separate RDF files in the bundle, perhaps using vocabularies such as Dublin Core.

The workflow structure is defined using an OWL ontology, and annotated with URIs so that third parties can form semantic statements about any component of any Scufl2 workflow, for instance to say that a particular service produces outputs of a certain type, or that a given data link was added by a different researcher to the builder of the rest of the workflow. An unzipped workflow bundle can be made public using any standard web server and become part of the Linked Open Data cloud.

Semantic annotations and a manifest for the bundle declare the purpose of, and links between, the different components forming a workflow. This allows third parties to extract and append annotations about data and services used by the workflow. The evolution of the workflow definition itself can also be included in these annotations, with references to previous versions and authors.

In recent years, Taverna has become an interesting target for researchers investigating published workflows, such as those found on myExperiment. Exploring these workflow definitions using the Scufl2 API can reveal compatibilities and implicit data types of web services, providing new annotations for service registries like BioCatalogue. Sites and tools can add additional descriptions of the workflow and the experiment directly to the Scufl2 workflow bundle without affecting execution behaviour or requiring any deeper understanding of the format. They may for instance automatically insert links to the original source and author in a downloaded bundle, which can be persisted even if the workflow is evolved further and distributed elsewhere.

Using these capabilities to their full extent, a workflow bundle is a research object, including everything required for a full re-enactment of the virtual experiment (possibly a full virtual machine in OVF format), full provenance and data sets of a published workflow execution, and semantic descriptions of what are the purposes and origins of the produced values. By also including a PDF representation of the paper arising from the experiment, Scufl2 captures a fully repeatable and reproducible e-science publication.

As a general workflow language, scufl and t2flow have been the target of translations and generation, for example by BioMoby's Seahawk, and executed on alternative engines, like MOTEUR. By defining the workflow language independently from the execution engine, Scufl2 encourages such extensions and wider use. This presentation demonstrates how Scufl2 can be used from Ruby and Clojure programs to generate a Scufl2 workflow that can be executed by Taverna.
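A workflow language of the kind discussed in this abstract models a workflow as a directed acyclic graph of data links between the output and input ports of services. The sketch below illustrates that idea only; it does not use the Scufl2 API or file format, and the service names, port names, and the function `execution_order` are hypothetical. It derives one valid execution order from the data links using Kahn's topological sort, and rejects link sets that contain a cycle.

```python
from collections import defaultdict, deque
from typing import List, Tuple

Port = Tuple[str, str]        # (service name, port name), both hypothetical
DataLink = Tuple[Port, Port]  # (source output port, target input port)


def execution_order(links: List[DataLink]) -> List[str]:
    """Return the services in an order that respects every data link."""
    services = set()
    downstream = defaultdict(set)
    for (src, _out_port), (dst, _in_port) in links:
        services.update((src, dst))
        downstream[src].add(dst)

    # Count, for each service, how many distinct services feed into it.
    indegree = {name: 0 for name in services}
    for targets in downstream.values():
        for dst in targets:
            indegree[dst] += 1

    ready = deque(sorted(name for name in services if indegree[name] == 0))
    order = []
    while ready:
        current = ready.popleft()
        order.append(current)
        for dst in sorted(downstream.get(current, ())):
            indegree[dst] -= 1
            if indegree[dst] == 0:
                ready.append(dst)

    if len(order) != len(services):
        raise ValueError("data links contain a cycle; not a valid workflow graph")
    return order


# Illustrative workflow: names are made up for this example.
demo_links = [
    (("fetch_sequence", "sequence"), ("run_blast", "query")),
    (("fetch_sequence", "sequence"), ("align_hits", "reference")),
    (("run_blast", "hits"), ("align_hits", "hits")),
]
```

Keeping the graph description separate from any engine, as in this sketch, is what lets third-party tools inspect or generate a workflow without executing it.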
Funded by: EPSRC EP/G026238/1, European Commission's 7th FWP FP7-ICT-2007-6 270192, FP7-ICT-2007-6 270137

Poster 21
OntoCAT - an integrated programming toolkit for common ontology application tasks
Tomasz Adamusiak∗1, Natalja Kurbatova1, Morris A. Swertz1,2, and Helen Parkinson1
1 European Bioinformatics Institute, Wellcome Trust Genome Campus, Cambridge, CB10 1SD, United Kingdom
2 Genomics Coordination Center, Department of Genetics, University Medical Center Groningen and Groningen Bioinformatics Center, University of Groningen, P.O. Box 30001, 9700 RB, Groningen, The Netherlands

Availability
Website: www.ontocat.org
Source: www.ontocat.org/svn
License: LGPLv.3

1 Introduction
Ontologies are essential to data integration, query expansion, and modelling biological knowledge in life sciences. Two major public ontology repositories provide programmatic access: the EBI Ontology Lookup Service (OLS) [1] and the NCBO BioPortal [5]. Many users also develop local ontologies, so it is important to integrate queries to local files. However, it is relatively difficult to connect to each of them, in particular because these resources are still evolving or require considerable experience with ontologies themselves. Therefore, we developed OntoCAT, a software toolkit that provides high level abstraction for interacting with ontology resources including local files in standard OWL and OBO formats (via OWL API [2]), and public ontology repositories: EBI OLS and NCBO BioPortal. The requirements for these were based on our own use cases of Experimental Factor Ontology (EFO) development, ArrayExpress and MOLGENIS data annotation and analysis, and on user feedback. Since its inception in 2010, the Java package alone has seen 22 releases. Most recent progress includes the implementation of reasoning for querying of relations other than subsumption (e.g. partonomy).
This is enabled for local ontologies via the HermiT reasoner [4], which supports knowledge bases expressed in SROIQ(D), the description logic underpinning OWL2 (see also www.ontocat.org/wiki/Reasoning), and for OLS, which provides a dedicated web service.

2 Implementation
The library is implemented in Java6 and is available under the permissive LGPLv3 license. OntoCAT can also be used via other interfaces including a web-based ontology database and browser, scriptable REST service, and Google App application. OntoCAT was designed to support simple use cases in an easy to implement way, while still enabling the implementation of advanced algorithms. Many such common tasks are demonstrated in code examples available at www.ontocat.org. A complete list of available ontology, term, and hierarchy methods, named in a self-describing manner, includes: getOntologies(), getOntology(), searchAll(), searchOntology(), getTerm(), getAllTerms(), getAnnotations(), getSynonyms(), getDefinitions(), getRootTerms(), getTermPath(), getChildren(), getParents(), getAllChildren(), getAllParents(), getRelations(). OntoCAT follows the convention over configuration design approach, i.e., requiring minimal configuration where possible. FileOntologyService, OlsOntologyService, and BioportalOntologyService are the core objects for working with OWL and OBO ontologies, EBI OLS, and NCBO BioPortal respectively. Because each ontology service implements the same OntologyService interface, these core services can then be combined or extended to provide additional behaviour by adding a wrapper (decorator), e.g.: combination of multiple ontology resources into one service (CompositeServiceDecorator), limiting and ranking of search results (SortedSubsetDecorator), translating one ontology namespace to another (TranslatedOntologyService), Ehcache-based enterprise-grade caching (CachedServiceDecorator), or enabling reasoner support (ReasonedFileOntologyService). The current repertoire of supported ontology resources could easily be extended for other resources such as DAML, Protégé-OWL API, ONKI API, or OntoSelect. Such services would only need to implement the OntologyService interface to immediately become aligned with pre-existing resources and allow for their seamless interchangeability.

3 Applications
OntoCAT is being used by the ontocat Bioconductor/R package [3] and the concept recognition tool Zooma (zooma.sf.net).

Acknowledgements
This work was supported by the European Community’s Seventh Framework Programmes GEN2PHEN [grant number 200754], SLING [grant number 226073], and SYBARIS [grant number 242220]; The European Molecular Biology Laboratory; the Netherlands Organisation for Scientific Research [NWO/Rubicon grant number 825.09.008]; and the Netherlands Bioinformatics Centre [BioAssist/Biobanking platform and BioRange grant SP1.2.3].

Bibliography
[1] Côté, R. G., Jones, P., Martens, L., Apweiler, R., and Hermjakob, H. (2008). The Ontology Lookup Service: more data and better tools for controlled vocabulary queries. Nucleic Acids Res, 36(Web Server issue), W372-W376.
[2] Horridge, M. and Bechhofer, S. (2009). The OWL API: A Java API for Working with OWL 2 Ontologies. In OWLED 2009, 6th OWL Experiences and Directions Workshop, Chantilly, Virginia.
[3] Kurbatova, N., Adamusiak, T., Kurnosov, P., Swertz, M. A., and Kapushesky, M. (in press). ontoCAT: an R package for ontology traversal and search. Bioinformatics.
[4] Motik, B., Shearer, R., and Horrocks, I. (2009). Hypertableau Reasoning for Description Logics. Journal of Artificial Intelligence Research, 36, 165-228.
[5] Noy, N. F., Shah, N. H., Whetzel, P. L., Dai, B., Dorf, M., Griffith, N., Jonquet, C., Rubin, D. L., Storey, M.-A., Chute, C. G., and Musen, M. A. (2009). BioPortal: ontologies and integrated data resources at the click of a mouse. Nucleic Acids Res, 37(Web Server issue), W170-W173.
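The wrapper (decorator) approach described in the OntoCAT Implementation section can be sketched in a few lines. The following is an illustration in Python, not OntoCAT's Java API: the class names `InMemoryOntologyService`, `CompositeService` and `CachedService`, the method `search_all`, and the `DEMO:` accessions are hypothetical stand-ins. It shows only why a shared interface makes services interchangeable and stackable.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Tuple

Term = Tuple[str, str]  # (accession, label)


class OntologyService(ABC):
    """Shared interface: every service answers the same search call."""

    @abstractmethod
    def search_all(self, query: str) -> List[Term]:
        ...


class InMemoryOntologyService(OntologyService):
    """Stand-in for a concrete resource such as a local ontology file."""

    def __init__(self, terms: Dict[str, str]):
        self._terms = terms  # accession -> label

    def search_all(self, query: str) -> List[Term]:
        needle = query.lower()
        return [(acc, label) for acc, label in sorted(self._terms.items())
                if needle in label.lower()]


class CompositeService(OntologyService):
    """Decorator that presents several services as a single one."""

    def __init__(self, *services: OntologyService):
        self._services = services

    def search_all(self, query: str) -> List[Term]:
        results: List[Term] = []
        for service in self._services:
            results.extend(service.search_all(query))
        return results


class CachedService(OntologyService):
    """Decorator that remembers earlier answers of the wrapped service."""

    def __init__(self, inner: OntologyService):
        self._inner = inner
        self._cache: Dict[str, List[Term]] = {}
        self.misses = 0  # number of calls forwarded to the wrapped service

    def search_all(self, query: str) -> List[Term]:
        if query not in self._cache:
            self.misses += 1
            self._cache[query] = self._inner.search_all(query)
        return list(self._cache[query])
```

Since each wrapper is itself an `OntologyService`, wrappers can be nested in any order, for example `CachedService(CompositeService(a, b))`, without the calling code changing.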
∗ To whom correspondence should be addressed: tomasz@ebi.ac.uk

Poster 22
Debian Med: individuals' expertise and their sharing of package build instructions
Steffen Möller1,*, Tim Booth2, Pjotr Prins3, Hajo Krabbenhöft1, Alan Williams4, Alex Mestiashvili5, Peter Rice6, Olivier Sallou7, Yaroslav O. Halchenko8, Richard Holland9, William Spooner9, James Procter10, Thorsten Alteholz*, Charles Plessy*, Aaron M. Ucko*, Andreas Tille*
1 University of Lübeck, Department of Dermatology, Ratzeburger Allee 160, 23538 Lübeck, Germany
2 NERC Environmental Bioinformatics Centre, Centre for Ecology and Hydrology, Maclean Bldg, Benson Lane, Crowmarsh Gifford, Wallingford OX10 8BB, UK
3 Wageningen University, Laboratory of Nematology, Droevendaalsesteeg 1, 6708 PB Wageningen, The Netherlands
4 School of Computer Science, University of Manchester, Oxford Road, Manchester M13 9PL, UK
5 Biotec, TU Dresden, Germany
6 European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
7 Symbiose, INRIA/Irisa - Campus de Beaulieu, 35042 Rennes, France
8 Dartmouth College, Psychology and Brain Sciences Department, Hinman Box 6207, Hanover, NH 03755, USA
9 Eagle Genomics, Babraham Research Campus, Cambridge CB22 3AT, UK
10 College of Life Sciences, University of Dundee, Dundee, Scotland, DD1 5EH, UK
* Debian Society
Project URL: http://wiki.debian.org/DebianMed
License: Various, all DFSG compliant (http://www.debian.org/social_contract#guidelines)

“Debian” is the name for a distribution of readily usable software for the Linux or FreeBSD kernels. It is maintained by a world-wide network of volunteers that forms the “Debian Society”. Open to all individuals interested in contributing to a steadily improving compute environment, it has found many users including those involved in scientific computing. This presentation gives an update on the latest bioinformatics packages that have been added to Debian and ongoing developments to foster collaboration.

The more we know about cellular biology and the better our community becomes at abstracting and formally storing our insights and findings, the more software is created to aid research and understanding. The Debian Med community works to bring those tools from researchers' websites to widespread application by researchers running Debian in dry- and wet-labs. We also help users find the tools they need, e.g. by pointing to tools suitable as companions to every provided software package and by including documentation in a standard location. The Debian Med task pages and mailing list provide information on the preferred robust tools and a forum for discussing specific problems. Making the tools and information directly available to students of all ages and backgrounds opens the door for talented individuals to develop bioinformatics skills.

BOSC 2010, its pre-meetings and a Debian-organised meeting on bioinformatics have provided impetus for our efforts. Our newly-found industrial partner, Eagle Genomics, academic groups like NERC with NEBC Bio-Linux, system administrators and researchers alike all contribute to the same Subversion and Git source trees to co-maintain their selected packages. Last year, we packaged Ensembl, additional genome assemblers like Mira, many tools for evolutionary sequence analysis and next-generation sequencing centring on Qiime, and more. Many libraries have been added to the archive as run- and/or build-time dependencies. Existing packages were updated or improved to address outstanding issues, such as build failures on more exotic platforms, compatibility with current compilers, documentation, and translations. Where appropriate, functional patches were sent to the upstream developers.

Most Debian Med packages for computational biology become directly installable through the regular servers of Debian. Many others, however, were prepared to satisfy an immediate local need and have not been finalised for upload to the Debian distribution. Interestingly, the build instructions of many such packages were nonetheless found to be of strong interest to the community. This is especially true when the software is no longer maintained and requires patching for compatibility with modern compilers, and it gives additional confidence to have someone with whom to exchange concerns about the effect of installing alternative libraries or a speed-up by optimizing compiler switches for a particular platform.

Publicly sharing build instructions with Debian Med is also possible for software that is not redistributed with Debian, e.g. because of a restrictive license. Prime examples are seen in structural biology with VMD or Rosetta. Those tools are powerful, but have an enormous source tree which provides many opportunities to make errors during their installation. Access to the immediately executable build instructions at Debian Med, therefore, saves time (regardless of one's experience) - much more time than it would cost to send in an improvement. Everybody is invited to contribute according to their ability and interest.

Poster 23
The BALL project: The Biochemical Algorithms Library (BALL) for Rapid Application Development in Structural Bioinformatics and its graphical user interface BALLView
A. Hildebrandt3,4, A.K. Dehof1, D. Stöckel1, S. Nickels1, S. Müller1, M. Schumann2, H.P.
Lenhof1, O. Kohlbacher2
1 Center for Bioinformatics, Saarland University, 2 Center for Bioinformatics, University of Tübingen, 3 Johannes-Gutenberg-Universität Mainz, 4 ahildebr@uni-mainz.de
Project website: www.ball-project.org
License: LGPL (BALL), GPL (BALLView)

Abstract: Developing programs for structural bioinformatics is a difficult and often tedious task. Even if the algorithms have been carefully designed, the programmer has to solve a variety of complex and recurring problems not fundamentally related to the algorithm at hand, but necessary for real-world applications. With the Biochemical Algorithms Library (BALL), we present a versatile C++ class library for structural bioinformatics that is supplemented with a Python interface for scripting functionality and a number of applications like the molecular modeler BALLView.

In recent years, BALL has seen a significant increase in functionality and substantial usability improvements. It has been ported to further operating systems; indeed, it currently supports all major ones. Moreover, BALL has evolved from a commercial product into free-of-charge, open source software licensed under the GNU Lesser General Public License (LGPL). Recently, binary packages for BALL have been accepted for inclusion into the Debian distribution, enabling simple installation on Debian platforms and all of its derivatives.

The current version (1.4.0 at the time of writing) contains more than 730 classes and more than 700,000 lines of code. The provided functionality covers a large number of common file formats in structural bioinformatics (pdb, mol2, hin to name a few), an extensive set of data structures and algorithms targeting molecular modeling and computational biology, as well as several force field implementations. The graphical user interface BALLView provides access to a large part of the functionality of the underlying library BALL and its VIEW component, which focuses on molecular visualization.
Recently, the RTFact library was integrated into BALLView, allowing for real time ray tracing within the modeller. High-quality stereoscopic visualization is provided as well. BALLView is often used for teaching purposes, as the integrated Python interpreter allows direct access to the presented data. BALL and BALLView are currently developed at the Center for Bioinformatics of Saarland University, the Center for Bioinformatics of the University of Tübingen, and the Johannes Gutenberg University Mainz.

References:
A. Hildebrandt, A.K. Dehof, A. Rurainski, A. Bertsch, M. Schumann, N.C. Toussaint, A. Moll, D. Stöckel, S. Nickels, S.C. Mueller, H.P. Lenhof, O. Kohlbacher: "BALL - Biochemical Algorithms Library 1.3", 2010, BMC Bioinformatics, 11:531
A. Moll, A. Hildebrandt, H.P. Lenhof, and O. Kohlbacher: "BALLView: A tool for research and education in molecular modeling", 2006, Bioinformatics, 22(3):365-366

Poster 24
Biopython Project Update
Peter Cock∗, Brad Chapman†, et al.
Bioinformatics Open Source Conference (BOSC) 2011, Vienna, Austria
Website: http://biopython.org
Repository: https://github.com/biopython/biopython
License: Biopython License Agreement (MIT style, see http://www.biopython.org/DIST/LICENSE)

In this talk we present the current status of the Biopython project, a long-running distributed collaboration producing a freely available Python library for biological computation (Cock et al., 2009). Biopython is supported by the Open Bioinformatics Foundation (OBF).

Since BOSC 2010, we have made three releases. Touching on key functionality, Biopython 1.55 (August 2010) made our command line tool wrappers directly executable, Biopython 1.56 (November 2010) added a UniProt XML parser, and Biopython 1.57 (April 2011) added an SQLite-based indexer for sequence flat files which allows the use of indexes too big to hold in memory. All releases have seen more unit tests, more documentation, and more new contributors.
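The idea behind an SQLite-based index for sequence flat files can be sketched with the Python standard library alone. The code below is not Biopython's implementation (in Biopython the feature is, as far as we know, exposed as `Bio.SeqIO.index_db`); the function names `build_index` and `fetch`, the table layout, and the restriction to FASTA are assumptions made for illustration. Record identifiers and byte offsets live in an on-disk SQLite table, so the index never has to fit in memory, and a lookup seeks straight to the record.

```python
import sqlite3


def build_index(fasta_path: str, index_path: str) -> sqlite3.Connection:
    """Store the byte offset of every FASTA record in an SQLite file."""
    con = sqlite3.connect(index_path)
    con.execute(
        "CREATE TABLE IF NOT EXISTS records (id TEXT PRIMARY KEY, byte_offset INTEGER)"
    )
    with open(fasta_path, "rb") as handle:
        while True:
            offset = handle.tell()  # position of the line about to be read
            line = handle.readline()
            if not line:
                break
            if line.startswith(b">"):
                fields = line[1:].split()
                if not fields:
                    continue  # header without an identifier; skipped in this sketch
                con.execute(
                    "INSERT OR REPLACE INTO records VALUES (?, ?)",
                    (fields[0].decode(), offset),
                )
    con.commit()
    return con


def fetch(fasta_path: str, con: sqlite3.Connection, record_id: str) -> str:
    """Return the sequence of one record by seeking to its stored offset."""
    row = con.execute(
        "SELECT byte_offset FROM records WHERE id = ?", (record_id,)
    ).fetchone()
    if row is None:
        raise KeyError(record_id)
    parts = []
    with open(fasta_path, "rb") as handle:
        handle.seek(row[0])
        handle.readline()  # skip the header line
        for line in handle:
            if line.startswith(b">"):
                break  # next record starts here
            parts.append(line.strip())
    return b"".join(parts).decode()
```

The index file can be reused across sessions, so the cost of scanning a large flat file is paid only once.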
In summer 2010 we had one Google Summer of Code (GSoC) project student, Joao Rodrigues, who worked on protein structure (PDB) code for Biopython. Some of Joao’s work has already been included in Biopython releases, and he and his mentor Eric Talevich (himself a GSoC 2009 student) are working to merge the rest of this work. Three new students have been accepted to work on Biopython for GSoC 2011. For the last six months, we have been running a BuildBot server (see http://buildbot.net/), with the offline parts of the Biopython unit test suite scheduled every night on build slaves running on Linux, Windows and Mac OS X, and both “C” Python and Jython (using the Java virtual machine). This has been beneficial in catching platform specific regressions – for example, under Python 3 which we are still working towards fully supporting. The BuildBot server is running on an OBF-maintained Amazon cloud server, while the slaves are initially all machines maintained by individual Biopython developers. References Cock, P.J.A., Antao, T., Chang, J.T., Chapman, B.A., Cox, C.J., Dalke, A., Friedberg, I., Hamelryck, T., Kauff, F., Wilczynski, B., de Hoon, M.J. (2009) Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics 25(11) 1422-3. 
doi:10.1093/bioinformatics/btp163

∗ Plant Pathology, James Hutton Institute (formerly SCRI), Invergowrie, Dundee DD2 5DA, UK, p.j.a.cock@googlemail.com
† Bioinformatics Core Facility, Harvard School of Public Health, Harvard University, Boston, MA, USA

What's New with GMOD
Scott Cain1 and the GMOD Consortium
1 The Ontario Institute for Cancer Research, Toronto, ON, Canada

Here we present an update for the GMOD project, and its central software product, the Chado database schema. Chado is an organism-agnostic database schema that interoperates with several other GMOD software tools, including GBrowse, Apollo, Tripal, and MAKER. Recent additions to Chado and the GMOD project include a natural diversity module for Chado, a GMOD on the cloud effort, and increased coverage of Chado in Tripal.

http://gmod.org/
Downloads:
Chado: http://sourceforge.net/projects/gmod/files/, Artistic License 2
Tripal: http://tripal.sourceforge.net/?q=download…Download, GNU v.3

Poster 25
Title: Exploring human variation data with Clojure
Author: Brad Chapman
Affiliation: Harvard School of Public Health, Boston, MA
Contact: chapmanb@50mail.com
URL: http://ourvar.com
Code: https://github.com/chapmanb/r-var
License: MIT

Direct-to-consumer genetics companies such as 23andMe give non-specialists access to sequence data. This is a powerful way to democratize research, and I undertook a small project to make autoimmune disease variation associations more accessible to the general public. The goal was to provide a system on top of Ensembl, SNPedia and PubMed that prioritizes interesting variants and allows users to share their experiences in the context of these variations. The system is implemented on Google App Engine using Clojure; Clojure is a functional programming language with a Lisp syntax built on top of the Java Virtual Machine.
In the process of implementing the prototype, I discovered several characteristics of the Clojure language that influenced my work in Python and other languages, and wanted to share these with the open bioinformatics community:
• Easy incorporation of external libraries; this makes re-use simple and encourages community development of small packages that can be linked together.
• Emphasis on functional code free of side effects. This eases the transition to parallel programming using multiple cores and larger frameworks such as Hadoop map/reduce.
• The code structure encourages a separation of assignment and action. This change in philosophy enables production of more configurable and easy to understand code.
• An emphasis on domain specific languages facilitates development of higher level APIs and abstraction, improving code re-use.
By sharing, I hope to encourage you to explore Clojure and incorporate some of its better features into your language of choice.

EMBOSS: New developments and extended data access
Peter Rice (pmr@ebi.ac.uk), Alan Bleasby, Jon Ison, Mahmut Uludag
European Bioinformatics Institute, Hinxton, Cambridge, CB10 1SD, United Kingdom.

The European Molecular Biology Open Software Suite (EMBOSS) is a mature package of software tools developed for the molecular biology community. It includes a comprehensive set of applications for molecular sequence analysis and other tasks and integrates popular third-party software packages under a consistent interface. EMBOSS includes extensive C programming libraries and is a platform to develop and release software in the true open source spirit. A major new stable version is released each year and the current source code tree can be downloaded via CVS. All code is open source and licensed for use by everyone under the GNU Software licenses (GPL with LGPL library code). There have been many tens of thousands of downloads, including site-wide installations all over the world, since the project inception.
EMBOSS is used extensively in production environments reflecting its mature status and has been incorporated into many web-based, standalone graphical and workflow interfaces including Galaxy, wEMBOSS, EMBOSS Explorer, JEMBOSS, SoapLab, SRS, Taverna and several commercial workflow packages.

EMBOSS 6.4 will be released on 15th July 2011 (we always release on 15th July). We have made a major effort to add new features and applications for this release. New features include:
• Use of ontologies - fully integrated new EDAM ontology for data types and methods in EMBOSS, and for metadata annotation of public data resources
• Standard definitions for 1000+ data resources for all users
• Definition of "servers" as common access to multiple data resources
• Simple alias names for command-line access to data resources
• Catalogue of public data resources with web APIs
• Integration of NCBI taxonomy with taxon annotation of data resources and entries
• Retrieval of multiple datatypes - sequence, feature, ontology term, taxonomy, data resource description, sequence assembly … and plain text from any other source
• SOAP, REST, DAS, BioMart, Ensembl protocol support
• New query language combining multiple queries for any data resource
• Full support for integration in Galaxy
• Improved adherence to data format standards, e.g. GFF3
• Applications to index, retrieve, utilize and analyze new data types
• Books published by Cambridge University Press
• E-Learning courses reusing the book text, supplemented by new material.
Project home page: http://emboss.open-bio.org/
Release download site: ftp://emboss.open-bio.org/pub/EMBOSS/
Anonymous CVS server: http://www.open-bio.org/wiki/SourceCode

Poster 26
G-language Project: the last 10 years and beyond
Kazuharu Arakawa1 (gaou@sfc.keio.ac.jp)
1 Institute for Advanced Biosciences, Keio University, Fujisawa, 252-8520, Japan
URL (project): http://www.g-language.org/
URL (code): http://sourceforge.jp/projects/glang/releases/
License: GNU General Public License v.2

Started in 2001, the G-language Project has been developing a series of open-source software tools for bioinformatics research on genomes, with a particular focus on bacterial genomes. The main software projects are:
• Genome Analysis Environment (http://www.g-language.org/) provides Perl libraries and a UNIX shell interface for basic I/O of biological data, as well as more than 100 analysis tools.
• Genome Projector (http://www.g-language.org/GenomeProjector/) provides an online zoomable browser of bacterial genomes using the Google Maps API.
• Pathway Projector (http://www.g-language.org/PathwayProjector/) provides an online zoomable browser of biochemical pathways using KEGG maps and the Google Maps API.
• G-language Bookmarklet (http://www.g-language.org/wiki/bookmarklet) is a bookmarklet that enables quick navigation to online bioinformatics resources, including database searches and the use of web services.
• Keio Bioinformatics Web Service (http://www.g-language.org/kbws/) provides a collection of about 50 web service tools as EMBOSS commands.
We would like to give an overview of our achievements over the last 10 years, and discuss future directions of the project.

References:
1. Arakawa K, Mori K, Ikeda K, Matsuzaki T, Kobayashi Y, Tomita M, "G-language Genome Analysis Environment: a workbench for nucleotide sequence data mining", Bioinformatics, 2003, 19(2):305-306.
2. Arakawa K, Tomita M, "G-language System as a platform for large-scale analysis of high-throughput omics data", Journal of Pesticide Science, 2006, 31(3):282-288.
3. Arakawa K, Suzuki H, Tomita M, "Computational Genome Analysis Using The G-language System", Genes, Genomes and Genomics, 2008, 2(1):1-13.
4. Arakawa K, Kido N, Oshita K, Tomita M, "G-language genome analysis environment with REST and SOAP web service interfaces", Nucleic Acids Res., 2010, 38 Suppl:W700-705.

A Framework for Bioinformatics on the Microsoft Platform
Simon Mercer, Michael Zyskowski and Bob Davidson
Microsoft Research Connections, One Microsoft Way, Redmond, WA 98052, USA (simon.mercer@microsoft.com)
Main project webpage: http://research.microsoft.com/bio
Source code available at: http://mbf.codeplex.com
Open Source license: Apache 2.0

Microsoft Research began the development of a library of basic bioinformatics functions in 2008, written in C# and built on the .NET framework. The first version of the Microsoft Biology Foundation (MBF) was launched in July 2010, and is designed to serve the needs of the bioinformatician by providing pre-written basic functionality such as file parsers, implementations of common algorithms for the manipulation of DNA, RNA and protein sequences, and connectors to commonly-used web services. MBF was conceived of as an open source project from the beginning, and released under the OSI-approved MS-PL license. Work has continued since the launch of version 1.0, with the establishment of a growing community of users, some of whom are playing an active role in development through the contribution of source code, bug reports and patches.

Work on a second version of the library is ongoing, and will result in the launch of version 2.0 in the summer of 2011. In addition to enhancements to the performance and capacity of the basic features contained in version 1, the new version will provide a range of new features and demo applications including access to advanced math functions, a comparative DNA sequence assembler, a range of command-line tools and a tool showing how the DeepZoom visualization technology can be used to browse a genome quickly and intuitively.

Our ongoing efforts to promote the growth of an open source community around MBF include the establishment of community forums at http://mbf.codeplex.com and the creation of a Technical Advisory Board consisting of academic and commercial groups using MBF in their work. Most importantly, we plan to transfer ownership of the project from Microsoft to the OuterCurve Foundation (http://www.outercurve.org/), making Microsoft one contributor among many to a common open source project.

This presentation will report on the growth of the Microsoft Biology Foundation community, the challenges of running an open source project in a commercial environment, and plans for the next phase of development.

A tool kit for pre-processing genome annotations in Generic Feature Format (GFF)
Vipin T. Sreedharan
Machine learning in Biology AG-Raetsch, Friedrich Miescher Laboratory of the Max Planck Society, Spemannstrasse 39, 72076 Tuebingen, Germany
vipin.ts@tuebingen.mpg.de
URL for GFF toolkit: http://galaxy.tuebingen.mpg.de/ -> GFF ToolKit
URL for source code: http://community.g2.bx.psu.edu/

The Generic Feature Format (GFF) is a tab-delimited flat file format, describing a hierarchical grouping of genomic features and sub-features annotated for a particular genome, routinely provided by different centers around the globe. Although this is a widely accepted format, there are slight differences between the files from different centers, which in turn lead to interruptions in the downstream analysis. Today, in the next generation sequencing era, the developments in deep-sequencing technologies like RNA-Sequencing (RNA-Seq) provide us with a more precise
measurement of levels of the transcriptome in different developmental stages. Programs such as rQuant, rDiff, Cuffdiff, mTIM and Cufflinks can be used to estimate the abundance of a given set of transcripts, differential transcript expression testing and de-novo assembly of transcripts from RNA-Seq data. Often, the above programs require a little prior knowledge of an organism's known transcripts in GFF-related format.

Here, we are addressing two main issues, which will come across the initial stage of the above types of RNA-Seq analysis work; they are:
1) the above programs are quite strict about the specific format of transcript annotation they accept, and
2) if biologists would like to evaluate the methods above, which require different file formats, the compatibility of formats is crucial.

To tackle the above problems we integrated a collection of tools in our Galaxy service, which gives you a better understanding of the annotation files before you proceed with RNA-Seq analysis. The tools in the first section validate the GFF format and return a report that gives details on the file contents. For example, the feature-identifier mapping in GFF files has a potential role in connecting sub-features to the related main feature. The available GFF parsing programs are heavily dependent on feature-identifier mapping to classify different levels of features and sub-features. Moreover, measuring differential gene expression or read abundance for a set of transcripts based on files from the UCSC Genome Center seems to be difficult because, in those GFF files, different transcript annotations from the same loci of the genome are not merged, even though they may belong to the same gene. Our program called MergeLoci is able to merge several transcripts from a single locus to
Poster 27
ADDAPTS: A Data-Driven Automated Pipeline and Tracking System
Risha Narayan1,2, Kim Rutherford1, Rishi Nag1, Krys Kelly1
1 Department of Plant Sciences, University of Cambridge, Downing Street, Cambridge, CB2 3EA, UK
2 rvn21@cam.ac.uk
Project website & source code: http://www.sirocco-project.eu/pipeline/
Open Source License used: GPL v3

Much effort goes into handling the large amount of data produced by current sequencing technologies. In order to manage the raw data and the data about the data (the metadata), we saw a need for a database-backed automated processing pipeline with a web front end. We are developing ADDAPTS, a Data-Driven Automated Pipeline and Tracking System.

The ADDAPTS relational database stores metadata about the samples and their associated data files. It interacts with the local sequencing centre LIMS to initiate sequencing runs and, later, to retrieve sequencing reads. A controller process monitors the database and starts new pipeline jobs when appropriate. The dependencies between pipeline tasks are configured in the database itself, and can be modified without pausing the pipeline.

The pipeline uses raw sequencing output files in FASTQ format as input; the full analysis results in alignments viewable from the GBrowse genome viewer. The pipeline carries out the following processes: it de-multiplexes files from multiplexed sequencing runs; removes small RNA adapters or clips sequence reads as appropriate; filters reads by size; generates output files and statistics for each stage of the analysis; aligns the reads against an appropriate reference genome; converts the resulting alignment files into several formats, such as GFF3, SAM and BAM (using SAMtools); and generates BAM indexes for GBrowse. It supports data generated using Illumina and 454 sequencing technologies, and various sequencing applications (e.g. small RNA, RNA-seq, ChIP-seq, genomic DNA).
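The controller behaviour described above, where task dependencies live in the database and a job starts once its prerequisites are complete, can be sketched as follows. This is only an illustrative sketch and not the ADDAPTS implementation: the `task` and `dependency` tables, their columns, and the task names are all hypothetical, and an in-memory SQLite database stands in for the real relational database.

```python
import sqlite3


def ready_tasks(conn):
    """Return pending tasks whose prerequisites (if any) are all done."""
    rows = conn.execute(
        """
        SELECT t.name FROM task t
        WHERE t.status = 'pending'
          AND NOT EXISTS (
              SELECT 1 FROM dependency d
              JOIN task p ON p.name = d.requires
              WHERE d.task = t.name AND p.status != 'done'
          )
        ORDER BY t.name
        """
    ).fetchall()
    return [row[0] for row in rows]


def run_pipeline(conn, run_task):
    """Poll the database and run tasks as they become ready."""
    order = []
    while True:
        batch = ready_tasks(conn)
        if not batch:  # nothing left to run (or remaining tasks are blocked)
            break
        for name in batch:
            run_task(name)
            conn.execute("UPDATE task SET status = 'done' WHERE name = ?", (name,))
            order.append(name)
    return order


def demo():
    # Hypothetical schema and task names, for illustration only.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE task (name TEXT PRIMARY KEY, status TEXT)")
    conn.execute("CREATE TABLE dependency (task TEXT, requires TEXT)")
    conn.executemany(
        "INSERT INTO task VALUES (?, 'pending')",
        [("demultiplex",), ("trim_adapters",), ("align",), ("index_bam",)],
    )
    conn.executemany(
        "INSERT INTO dependency VALUES (?, ?)",
        [
            ("trim_adapters", "demultiplex"),
            ("align", "trim_adapters"),
            ("index_bam", "align"),
        ],
    )
    return run_pipeline(conn, lambda name: None)  # no-op stand-in for real jobs
```

Because the dependency rows are re-read on every polling pass, editing them between passes changes what runs next without stopping the loop, which is the property the abstract highlights.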
The tracking system provides a web front end for entering, viewing and editing metadata. It also generates charts and statistics for each sample, provides links to the files generated from the pipeline analysis, and generates global reports for all samples.

Poster 28
BALLView: A versatile molecular visualization and modeling tool
S. Nickels1,2, A.K. Dehof1, D. Stöckel1, S. C. Müller1,2, L. Marsalek2, I. Georgiev2, H.P. Lenhof1, O. Kohlbacher1,3, A. Hildebrandt2,4
1 Zentrum für Bioinformatik Saar, 2 Intel Visual Computing Institute, 3 Eberhard Karls Universität Tübingen, 4 Johannes-Gutenberg-Universität Mainz
Project website: http://www.ballview.org
Source code: http://www.ballview.org/Downloads or http://ball-trac.bioinf.uni-sb.de/browser
License: BALLView is licensed under the GNU General Public License (GPL)

Molecular viewing and editing tools are an important part of many application scenarios and processes in structural bioinformatics, computational chemistry, and pharmacy. Successful molecular modeling tools provide reliable modeling functionality, an intuitive graphical user interface, state-of-the-art graphics and extensive documentation.

Here, we present BALLView [1,2], a versatile and flexible molecular visualization and editing tool that is based on the Biochemical Algorithms Library (BALL) [3]. BALLView supports all major molecular file formats and is able to visualize molecular structures using a variety of different representations such as ball-and-stick, cartoon, surface, and volumetric models. Via its GUI, BALLView provides modeling functionality from the underlying BALL library, such as structure validation processors and molecular mechanics computations using different force fields. In addition, the full potential of the BALL library can be accessed through a Python scripting interface. An extended version of BALLView [4] uses real-time ray tracing for displaying molecular structures.
The use of real-time ray tracing in combination with 3D stereo visualization provides a much better structural perception and allows for fast and direct creation of publication quality images.

The current version of BALLView (1.4.0) and the BALL library are available for all major platforms including Windows, MacOS X, and Linux. As part of the Debian-Med project, BALLView is directly available from package repositories of current Debian and Ubuntu releases. BALL and BALLView are open source projects and are licensed under LGPL (BALL) and GPL (BALLView) licenses.

[1] Moll, A., Hildebrandt, A., Lenhof, H.-P., and Kohlbacher, O., Bioinformatics, 2006, 22(3), 365-366
[2] Moll, A., Hildebrandt, A., Lenhof, H.-P., Kohlbacher, O., Journal of Computer-Aided Molecular Design, 2005, 19(11), 791-800
[3] A. Hildebrandt, A.K. Dehof, A. Rurainski, A. Bertsch, M. Schumann, N.C. Toussaint, A. Moll, D. Stöckel, S. Nickels, S.C. Mueller, H.P. Lenhof, and O. Kohlbacher, BMC Bioinformatics, 2010, 11, 531
[4] L. Marsalek, I. Georgiev, A. K. Dehof, H.P. Lenhof, P. Slusallek and A. Hildebrandt, Information Visualization in Biomedical Informatics (IVBI), London, 2010, pp. 239-245

Poster 29
Beyond Web Services: Calling External Programs from within Taverna
Alan R Williams1, Hajo Krabbenhöft2, Steffen Möller2, Aleksandra Nenadic1, Khalid Belhajjame1, Stian Soiland-Reyes1 and Carole Goble1
{alan.r.williams, a.nenadic, khalid.belhajjame, soiland-reyes, carole.a.goble}@manchester.ac.uk, {krabbenh, moeller}@inb.uni-luebeck.de
1 School of Computer Science, University of Manchester, UK
2 Institute for Neuro and Bioinformatics, University of Lübeck, Germany
Project's Web site: http://www.taverna.org.uk/
Source code: http://code.google.com/p/taverna/source/browse/taverna#taverna%2Fengine%2Fnet.sf.taverna.t2.activities%2Fbranches%2Fmaintenance%2Fexternal-tool-activity
Licence: GNU Lesser General Public License (LGPL) 2.1

Taverna is an application that supports design and execution of bioinformatics and other types of workflows. It allows users to create complex and effective data pipelines by combining remote Web services with local services written in Java.

Recently, we have extended the capabilities of Taverna to cater for the invocation of applications that are not provided as Web services or written in Java, but rather are available as external command line tools. Owing to this extension, scientists can design workflows that use and invoke virtually any functionality that is accessible either locally, e.g., using a command line, or remotely, e.g., on a grid node to which the user can authenticate using grid security mechanisms or through an SSH channel.

The external tool service lets your workflow call tools written in any language and pass data into and out of those tools. This is an easy way to include calls of Perl, Python or any other command line script within Taverna. The Perl script can even be dynamically downloaded from a library of scripts. The external tool service also lets users interact with graphical tools as part of the workflow.
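The idea of wrapping a command line tool so that data is passed in, and its standard output and error streams come back as separate values, can be sketched outside Taverna in a few lines. This is a hedged illustration rather than Taverna's own mechanism: the function name `invoke_tool` and the dictionary keys are hypothetical, and the "tool" is simply the running Python interpreter executing a one-line script.

```python
import subprocess
import sys


def invoke_tool(command, stdin_data=""):
    """Run an external command, feeding it text and collecting its streams."""
    completed = subprocess.run(
        command,
        input=stdin_data,      # data passed into the tool
        capture_output=True,   # collect standard output and standard error
        text=True,
        check=False,           # report the exit code instead of raising
    )
    return {
        "stdout": completed.stdout,
        "stderr": completed.stderr,
        "exit_code": completed.returncode,
    }


def demo():
    # A stand-in external tool: upper-case whatever arrives on standard input.
    command = [
        sys.executable,
        "-c",
        "import sys; sys.stdout.write(sys.stdin.read().upper())",
    ]
    return invoke_tool(command, "acgt")
```

Keeping the command (what to run) separate from the function that launches it (how and where to run) mirrors, in a very small way, the separation between a tool's specification and its invocation mechanism.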
To add an external tool as a service within a Taverna workflow, a user either chooses a tool from a registry of tools (for example the tools available in a Debian Med installation), or specifies explicitly the command to be executed. In both cases Taverna automatically creates a service within the workflow corresponding to an invocation of the tool. The service has input and output ports derived from the tool description, as well as ports for the tool's standard output and error streams. The ports are used to pass data to/from an invocation of the tool.

In Taverna, the invocation mechanism of an external tool is separated from the tool's specification, and the user can choose where the tool will be executed. For example, the external command can be executed on a local machine or on a remote machine to which Taverna can open an ssh connection on behalf of the user. The specification of the tool can be changed independently from the choice of where it is executed. The invocation mechanism is designed so that developers can add in their own execution environments.

Taverna's external tools capabilities have been demonstrated within the KnowARC's parallel medical image analysis at the Geneva University Hospitals' internal grid cluster (http://www.knowarc.eu/). There are also plans to utilize it in digital preservation, characterization and migration workflows, as part of the SCAPE project (http://www.scape-project.eu/).

The external tool plugin was originally developed for Taverna 1.7 as part of the KnowARC project for compatibility with the NorduGrid (http://www.nordugrid.org). As of Taverna 2.3, this service type will be included in the standard Taverna distribution. After the release of Taverna 2.3, one focus will be on further improving the flexibility for remote invocation mechanisms for external
tools, i.e. for grid computing, on a Condor cluster and on AWS-type clouds.

This work was enabled by the EU 6th Framework Program in the context of the KnowARC project (IST 032691), and the myGrid platform grant (EP/G026238/1) from the UK's EPSRC.

Poster 30
BioMart combines the power of the semantic web with the speed of relational database querying
J Baran, A Cros, JM Guberman, S Haider, J Hsu, Y Liang, E Rivkin, J Wang, M Wong-Erasmus, L Yao, J Zhang, A Kasprzyk
Ontario Institute for Cancer Research, Toronto, Ontario M5G 0A3, Canada
web: http://www.biomart.org
svn: https://code.oicr.on.ca/svn/biomart/biomart-java/branches/release-0_8-candidate_6
license: GNU Lesser General Public License v2.1

Abstract
The semantic web holds great potential for use with biological data, facilitating the creation of new tools that make use of and find connections between data sets in ways that are currently not possible. However, the utility of the semantic web for biological research is currently hindered by two major obstacles: slow querying speeds and a small set of semantically annotated data. To address both of these obstacles, we have added semantic web capabilities to the latest version of the BioMart database management system, thereby taking advantage of its query optimization features, as well as the wide variety of data available through BioMart databases.

The latest release of BioMart allows semantic querying through the integration of two accepted standards of the semantic web: OWL for describing ontologies and SPARQL for executing queries. When a new mart is created, an OWL ontology is automatically generated. This can be accessed and queried using a REST interface. Users can also create queries interactively through the web GUI and convert them to SPARQL queries with the click of a button.
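To make the SPARQL-over-REST idea concrete, the sketch below assembles a small SPARQL SELECT query and encodes it into a GET request URL using the `query` parameter defined by the SPARQL Protocol. It is a minimal sketch under stated assumptions: the endpoint URL, the class IRI and the property IRIs are invented for illustration and are not BioMart's actual addresses or vocabulary, and no network request is made.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_sparql(class_iri, properties, limit=10):
    """Build a SELECT query returning one variable per requested property."""
    variables = " ".join(f"?{name}" for name in properties)
    patterns = "\n".join(
        f"  ?row <{iri}> ?{name} ." for name, iri in properties.items()
    )
    return (
        f"SELECT {variables}\n"
        "WHERE {\n"
        f"  ?row a <{class_iri}> .\n"
        f"{patterns}\n"
        "}\n"
        f"LIMIT {limit}"
    )


def build_request_url(endpoint, query):
    """Encode the query as the 'query' parameter of a GET request."""
    return endpoint + "?" + urlencode({"query": query})


# Hypothetical endpoint and vocabulary, for illustration only.
EXAMPLE_ENDPOINT = "http://example.org/martsemantics/sparql"
EXAMPLE_QUERY = build_sparql(
    "http://example.org/ontology#Gene",
    {
        "gene": "http://example.org/ontology#geneId",
        "chromosome": "http://example.org/ontology#chromosomeName",
    },
    limit=5,
)
EXAMPLE_URL = build_request_url(EXAMPLE_ENDPOINT, EXAMPLE_QUERY)
```

Decoding the URL again with `parse_qs(urlsplit(EXAMPLE_URL).query)` recovers the original query text, which is a quick way to check that the encoding step is lossless.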
Currently, semantic web features are being incorporated into all of the members of BioMart Central Portal, a repository incorporating a large number of biological databases, thereby creating a huge repository of semantically-enabled biological data. The semantic web access is still under development, with several plans for the future: BioMart will incorporate existing biomedical ontologies, and will also support the definition of custom semantic relationships between ontologies.

Poster 31
BioSharing: standards, policies and communication in bioscience
Susanna-Assunta Sansone1,2,*, Dawn Field2,3, Annapaola Santarsiero4, Eamonn Maguire1, Philippe Rocca-Serra1, Chris Taylor1,6, Lee Harland5 and the BioSharing communities.
1. University of Oxford, UK; 2. MIBBI; 3. NERC-NEBC, UK; 4. The Mario Negri Institute, Italy; 5. Pfizer Ltd, UK; 6. EBI, UK; * sa.sansone@gmail.com
Project website and communities: www.biosharing.org
Catalogue's code: http://drupal.org

Research communities, funding agencies, and journals participate in the development of reporting standards for the bioscience domain [1] to ensure that shared experiments are reported with enough information to be comprehensible and can be (in principle) reproduced, compared or integrated. Similar trends exist in both the regulatory arena (e.g. [2]) and commercial science (e.g. [3]). Proliferation of standards is a positive sign of stakeholders' engagement, but how much do we know about these standards? Which ones are mature and stable enough to use or recommend? Which domain(s) do they cover? Which tools and databases implement which standard(s)?

BioSharing is a collaborative project that works at the global level to build stable linkages between journals, funders (implementing guidance to authors and data sharing policies, respectively) and well-constituted standardization efforts in the biosciences domain. In doing so, it works to expedite the
communication and the production of an integrated standards-based framework for the capture and sharing of high-throughput genomics and functional genomic bioscience data, in particular. A prototype of the BioSharing catalogue is live and, in partnership with key players (e.g. [4]), it aims to:
• Centralize community-developed bioscience standards (figure below), classified into three types: reporting requirements (minimal information checklists to report the same core set of information); terminological artifacts (such as controlled vocabularies and ontologies to describe the information); and exchange formats (to communicate the information);
• Link to policies, other portals (e.g. [5,6,7]), open access resources (e.g. [8]) and lists of tools and databases implementing the standards (e.g. [9]);
• Develop and maintain a set of criteria for assessing the quality and formal rigor of the standards, but also the interoperability and relations among them;
• Foster interoperability, addressing overlaps and duplication of efforts that hamper their wider uptake and interfere with the creation of standards-compliant systems.

1. Field D., Sansone SA, et al., Science, 2009;
2. US FDA, "CDER Data Standards Plan Version";
3. M. Barnes et al., Nat Rev Drug Discov, 2009;
4. BMC Research Notes: www.biomedcentral.com/bmcresnotes/series/datasharing;
5. MIBBI: www.mibbi.org;
6. BioPortal: bioportal.bioontology.org;
7. OBO Foundry: www.obofoundry.org;
8. Nature Precedings: precedings.nature.com;
9. Neuroscience Information Framework: www.neuinfo.org

Poster 32
BioXSD: the XML Schema for basic bioinformatics data
Matúš Kalaš1,2,*, Pål Puntervoll1, Edita Bartaševičiūtė3, Christophe Blanchet4, Armin Töpfer1,5, Jan Christian Bryne6, Jon Ison7, Kristoffer Rapacki3, and Inge Jonassen1,2
* matus.kalas@bccs.uib.no
1 Computational Biology Unit, Uni Computing, Bergen, Norway. 2 Department of Informatics, University of Bergen, Norway. 3 Center for Biological Sequence Analysis, Technical University of Denmark, Kongens Lyngby, Denmark. 4 Institut de Biologie et Chimie des Protéines, CNRS, Université Lyon 1, France. 5 Institute for Bioinformatics, Bielefeld University, Germany. 6 Institute for Cancer Research, Oslo University Hospital, Norway. 7 European Bioinformatics Institute, EMBL, Hinxton, UK.
BioXSD.org: http://bioxsd.org/BioXSD-1.1beta1.xsd
Available under the Creative Commons BY-ND 3.0 license, with additionally allowed inclusion, extensions and restrictions in the user's XML namespace. Contributions to new canonical versions, in the BioXSD.org XML namespace, are welcome under supervision of the BioXSD consortium (in order to keep BioXSD a common, canonical data model).

A common XML Schema (XSD), defining a canonical XML format, is important for smooth compatibility of heterogeneous tools and data resources, and in particular for communication with Web services. The lack of a common XSD-based format for the basic bioinformatics types of data has motivated the development of BioXSD [1].

BioXSD is an XML Schema defining formats of the main bioinformatics types of data that are not modelled by any specialised standard XML Schemas such as, for example, SBML, PDBML, MAGE-ML, MIF, GCDML, or phyloXML [2-7]. BioXSD thus focuses on biomolecular sequences, alignments, and annotation by any kind of features or properties. These main types are accompanied by definitions of data-resource and ontology reference formats, provenance metadata, scores, and others. BioXSD has been coordinated with the European EMBRACE project, focused on practical interoperability among bioinformatics tools [8].

The aim of BioXSD is to become used as a canonical,
Dz•–ƒ†ƒ”†dz 13+3! @/-=3+! @/-! 7,S),*:,! 13+3! 3*1! J,*,-2:! @,3+)-,! 3**/+3+2/*7K! E+! 1/,7! */+! =,3*! +63+! 92/\RQ! 76/)(1!M,!e+6,!/*(C!@/-=3+e#!M)+!3*!,[:63*J,!@/-=3+!+63+! :3*! M,! :/==/*! +/! 7,.,-3(! +//(7! ]37! /*,! /@! =)(+28(,! @/-=3+7! +6,! +//(7! 3-,! 7)88/-+2*J^K! >//(7! :3*! 8-/1):,! 3*1! :/*7)=,! 92/\RQ! 12-,:+(C#! /-! 92/\RQ!:3*!M,!)7,1!37! 3*! 2*+,-=,123+,! :3*/*2:3(! @/-=3+!-2:6!,*/)J6!+/!,*3M(,! :/*.,-72/*7! 3=/*J! 12.,-7,! @/-=3+7K! 92/\RQ! +C8,7! :3*! M,!12-,:+(C!2*:()1,1!2*+/!/+6,-!\ZT!R:6,=37#!/-!+6,C!:3*! M,! @)-+6,-! ,[+,*1,1! /-! -,7+-2:+,1! 2*! 3! 72=2(3-! P3C! +/! /Mc,:+`/-2,*+,1! 8-/J-3==2*J! :(377,7K! >6,! \ZT! R:6,=3! :3*!7,-.,!37!3!78,:2@2:3+2/*!@/-!J,*,-3+2*J!=/-,!,@@2:2,*+! M2*3-C! -,8-,7,*+3+2/*7! 7):6! 37! 0\E! afbK! 92/\RQ! @/-=3+7! ! 63.,! 1,+32(,1! 7+-):+)-,! 3*1! 3-,! -2:6!,*/)J6!+/!7)88/-+! .3-C2*J!-,S)2-,=,*+7!@/-!=3:62*,`)*1,-7+3*13M(,!13+3! 3*1! =,+313+3! -,8-,7,*+3+2/*#! M)+! 3+! +6,! 73=,! +2=,! +-C2*J! */+! +/! M,! +//! :/=8(2:3+,1K! R,=3*+2:7! /@! +6,! 7C*+3:+2:! 92/\RQ! +C8,7! 27! 1,@2*,1! .23! R<_RQT! 3**/+3+2/*!P2+6!:/*:,8+7!@-/=!+6,!0Q<Z!/*+/(/JC!a"gbK! >6,!62J6(2J6+7!/@!+6,!92/\RQ!@/-=3+!2+7,(@!3-,h!7+-):+)-,1! =,+313+3! /@! 72=8(,! 7,S),*:,! -,:/-17! ]37! /88/7,1! +/! V<R><! &*+,"-*7^#! 8-/.,*3*:,! =,+313+3! 2*! 3((! +C8,7! /@! 8-/:,77,1! 13+3!3*1!-,@,-,*:,7#!7+-):+)-,1!-,@,-,*:,7!+/! 13+3! -,7/)-:,7! 3*1! /*+/(/JC! :/*:,8+7! 2*:()12*J! 3! .*/-"-)! /@! +6,! -,(3+2/*#! :/=8(,[! -,(3+2/*7! M,+P,,*! 7,S),*:,!@,3+)-,7#!@,3+)-,!=/1,(!13+3#!=)(+28(,!:/=8(,[! 7:/-,7! P2+6! =,3*2*J7#! @/-=3(27,1! 3**/+3+2/*! /@! -,(3+,1! 8/72+2/*7! /)+721,! /@! +6,! 3**/+3+,1! 7,S),*:,#! 3*1!=/-,K! >6,! *,P! 92/\RQ! .,-72/*! "K"! 637! /8+2=27,1! +6,! 7C*+3[! @/-! @,3+)-,! 3**/+3+2/*#! 7:/-,7#! 7,=3*+2:! 3*1! 13+3! -,@,-,*:,7K! E+! 3((/P7! 3**/+3+2/*! /@! 1,*7,! 7,S),*:,! @,3+)-,7! 388(2:3M(,! +/! P6/(,`J,*/=,! 3**/+3+2/*7! ,[:63*J,1! @/-! ,[3=8(,! 2*! +6,! 7+3*13-127,1! M2*3-C! 0\E! @/-=3+K! >6,! @2-7+! M,+3! .,-72/*! /@! 92/\RQ! "K"! 
has been released in May 2011. Extensive support for whole-genome alignments, individual genomics, and sequence profiles will be added in the next beta versions.

In future, BioXSD must be maintained and regularly refined in order to fit the diverse and changing needs of the bioinformatics community. Involvement of the community is essential both for the uptake and the further development, which must be coordinated by the emerging BioXSD consortium. To enable larger-scale adoption, supportive programmatic and interactive tools have to be developed, including format converters and integration with the O|B|F Bio* frameworks.

[1] Kalaš, M. et al. (2010) BioXSD: the common data-exchange format for everyday bioinformatics web services. Bioinformatics, 26, i540-i546.
[2] Hucka, M. et al. (2003) The systems biology markup language (SBML): a medium for representation and exchange of biochemical network models. Bioinformatics, 19, 524-531.
[3] Westbrook, J. et al. (2005) PDBML: the representation of archival macromolecular structure data in XML. Bioinformatics, 21, 988-992.
[4] Spellman, P.T. et al. (2002) Design and implementation of microarray gene expression markup language (MAGE-ML). Genome Biol., 3, research0046.1-0046.9.
[5] Hermjakob, H. et al. (2004) The HUPO PSI's molecular interaction format - a community standard for the representation of protein interaction data. Nat. Biotechnol., 22, 177-183.
[6] Kottmann, R. et al. (2008) A standard MIGS/MIMS compliant XML schema: toward the development of the Genomic Contextual Data Markup Language (GCDML). OMICS, 12, 115-121.
[7] Han, M.V. and Zmasek, C.M. (2009) phyloXML: XML for evolutionary biology and comparative genomics. BMC Bioinformatics, 10, 356.
[8] Pettifer, S. et al. (2010) The EMBRACE Web service collection. Nucleic Acids Res., 38, W683-W688.
[9] Efficient XML Interchange (EXI) Format 1.0. http://www.w3.org/TR/exi/
[10] http://edamontology.sourceforge.net

Poster 33
Cloudgene - A graphical MapReduce interface for cloud computing
L. Forer1,*, S. Schönherr1,2,*, H. Weißensteiner1, F. Kronenberg1, G. Specht2, A. Kloss-Brandstätter1
1 Division of Genetic Epidemiology, Department of Medical Genetics, Molecular and Clinical Pharmacology, Innsbruck Medical University, Innsbruck, Austria
2 Department of Database and Information Systems, Institute of Computer Science, University of Innsbruck, Innsbruck, Austria
* contributed equally

Computer science plays an important role in today's genetic research. Next generation sequencing technologies produce an enormous amount of data, thereby pushing genetic laboratories to the limits of data storage and computational power. New informatics approaches are therefore needed to eliminate these shortcomings, provide possibilities to apply current algorithms in the area of Bioinformatics and improve their usability.

We present Cloudgene, a system to simplify the access to computational resources and associated computational models of cluster architectures (public and private clouds). Cloudgene assists end users in executing and monitoring diverse algorithms via a dynamic graphical web interface and provides an XML interface to add future developments or any kind of programs. We demonstrate on existing algorithms (e.g. CloudBurst [1]) how the integration can be done with little effort, thereby making Cloudgene especially useful for the evaluation of different approaches, for the reproduction of previous results and for a simplified usage of current algorithms. Based on open source frameworks like Apache Hadoop [2] and Apache Whirr [3], we implemented a prototype to validate our approach (see Figure 1): In a first step, the user is able to set up a cluster architecture via an intuitive web
interface. A fully operable and customized cluster is then provided, including all necessary user data and services (like Apache Hadoop) on it. In a subsequent step a graphical interface (called Elastic MapReduce Interface, EMI) is automatically launched on the master node of the cluster. EMI can be seen as an abstraction of the underlying system architecture from the end user, as it lies on top of the integrated programs and allows the user to communicate and interact with the cluster as well as to receive feedback on currently executed workflows. Cloudgene can be operated via configuration files with clearly defined input and output variables to minimize configuration issues for end users. To sum up, Cloudgene is a free software solution that allows the user to access infrastructure, execute new or existing algorithms in a user friendly way, reproduce results and monitor and pipeline MapReduce jobs.

Figure 1 - Cloudgene Prototype in Action

Project site: http://cloudgene.uibk.ac.at
Source code: http://cloudgene.uibk.ac.at/downloads.html
License: GNU GPL v3

[1] M. C. Schatz. CloudBurst: highly sensitive read mapping with MapReduce. Bioinformatics, 25:1363-1369, Jun 2009.
[2] Apache Hadoop. http://hadoop.apache.org/
[3] Apache Whirr. http://incubator.apache.org/whirr/

Poster 34
The RapidMiner Plugin for Taverna: bringing Data Mining Tools to Bioinformatics Workflows
Simon Jupp1, James Eales1, Simon Fischer2, Sebastian Land2, Rishi Ramgolam1, Alan Williams1 and Robert Stevens1
{simon.jupp, james.eales, rishi.ramgolam, alan.r.williams, robert.stevens}@manchester.ac.uk, {fischer, land}@rapid-i.com
1 School of Computer Science, University of Manchester, Oxford Road, Manchester M13 9PL, UK
2 Rapid-I GmbH, Stockumer Str. 475, 44227 Dortmund, Germany
Project Site: http://www.e-lico.eu/
Source code: http://taverna.googlecode.com/svn/unsorted/taverna-elico/
Licence: GNU Lesser General Public License (LGPL) 2.1

Knowledge discovery through pattern finding in data is central to modern molecular biology, which now has thousands of databases and similar numbers of tools for processing those data. Any data analysis in molecular biology involves gathering and processing data from many sources, even before the analysis for the central biological question takes place. Taverna (http://www.taverna.org.uk) is a workflow workbench that allows bioinformaticians to create data pipelines involving distributed Web services and other forms of tool; these workflows gather and manage data in order to perform analyses that ask biological questions.

RapidMiner (RM) is an open source, cross-platform application, released under the AGPLv3, that brings a large suite of data processing, visualization and data mining tools to bear upon tables of data, such as those that can be gathered by Taverna. A typical task for RM is to apply a series of operators to a table of gene products, their functions and locations to perform simple correlations of location and function. More sophisticated tasks will involve training classifiers over a set of features, selecting the best features and then applying the classifier to test data. The RapidAnalytics enterprise server from Rapid-I (http://rapid-i.com/) provides a platform for interacting with these data mining operators from RM via the RapidAnalytics execution service.

Through the RM plugin for Taverna we have combined the ability to gather and process data from many molecular biological sources with RM's data mining capabilities to provide a powerful tool for scientific analysis. The RapidAnalytics execution service is a single polymorphic WSDL service.
It takes as input a reference to the input file locations for the operator, along with a set of parameters that includes the name of the operator to execute. This polymorphism, however, can make the service difficult to work with in an environment like Taverna. For this reason the RM-specific plugin was developed. Using the plugin, the available operators within RM are exposed within Taverna. In addition we provide dialog-based interactions for setting input file locations and operator invocation parameters.

RapidAnalytics requires the data being processed to sit within the RapidAnalytics repository. This reduces the need to pass large amounts of data between services and so cuts the file transfer overhead associated with running distributed workflows. Before data can be analyzed in Taverna, they must therefore first be uploaded onto the RapidAnalytics server. When any RM operator in Taverna is used, the user is given a configuration dialog. From this dialog a user can access the RapidAnalytics server in order to browse the repository or upload new data to it.

The first release of the plugin is currently limited to a subset of RM operators. Simple operators that work on input files and generate a set of output files are easily handled. However, RM also contains some specialized operators, which we call dominating operators, that control the execution of one or more data mining sub-processes. This control currently requires some logic that cannot be expressed in Taverna via web services alone. These operators are important in data mining, so work is currently underway to extend the RM plugin.

The RM plugin already makes available a large number of data processing, visualization and data mining tools for bioinformatics analyses implemented as workflows within Taverna. To illustrate the benefits the RM plugin brings, we have developed several workflows that demonstrate its functionality in some typical bioinformatics tasks.
These workflows are available for download from myExperiment at http://www.myexperiment.org/groups/402.html

This work is funded by the EU/FP7/ICT-2007.4.4 e-LICO project.

Poster 35
Distributed Web Services for Bioinformatics: Multiple Sequence Alignment
Peter V. Troshin, James B. Procter and Geoffrey J. Barton
College of Life Sciences, University of Dundee, Dundee, UK, DD1 4HN
Email: p.v.troshin@dundee.ac.uk
Project: http://www.compbio.dundee.ac.uk/jabaws
Source Code: http://www.compbio.dundee.ac.uk/jabaws/archive/jaba-project.zip
License: Apache 2

We'd like the abstract to be considered for the talk and the poster sessions.

Webservices technology has become a popular method to make bioinformatics techniques widely available in a programmable or scriptable form. However, execution limits of public web services are often too restrictive and the data privacy of their users cannot be guaranteed. In addition, the accessibility of web services as well as the frequency of their interface changes is outside of the users' control. Finally, many web services provide very limited access to the command-line parameters of the tools they wrap.

Here we present the JAva Bioinformatics Analyses Web Services (JABAWS) system, a collection of SOAP web services which overcomes these problems. JABAWS lets a normal user deploy, operate and configure webservices on their own hardware. The JABAWS installation procedure is simple and accessible for an average user without programming experience. JABAWS can run on a single computer, or a cluster managed by Oracle Grid Engine, Load Sharing Facility and other systems via DRMAA. Configuration files specify how jobs should be routed to the local machine or cluster depending on load or size of job.

JABAWS currently provides programmatic access to the common multiple sequence alignment methods Probcons, T-coffee, Muscle, Mafft, and ClustalW and can be accessed from the popular Jalview multiple alignment analysis workbench or the command-line client that is distributed with JABAWS. Although currently focusing on multiple alignment, the JABAWS framework is flexible and can be easily extended with other web services. We plan to add disorder, conservation and secondary structure prediction web services to JABAWS soon.

Poster 36
Endrov - an open source framework for image processing and analysis
Johan Henriksson 1, Thomas R. Bürglin
Dept. of Biosciences and Nutrition and Center for Biosciences, Karolinska Institutet, Hälsovägen 7, Novum, Huddinge, Sweden
1 mahogny@areta.org
http://www.endrov.net
Most of the code is under the 3-clause BSD license. Single files are GPL.

While building Virtual-Worm Base, a database of spatio-temporal gene expression patterns during C. elegans development, we found that the current tool chain for microscopy was insufficient. Commercial software could not be modified for our purposes since we lacked the source code. ImageJ, the de facto standard image software, was not designed for modern needs. We found that we were not the only ones running into limitations and decided to do something about it.

To solve these problems we have developed a new platform, Endrov, that covers the entire microscopy workflow. It is a plug-in framework consisting of 1[?]0 000 lines of Java, running on all operating systems. The following are Endrov's features:
• It has advanced control of the microscope hardware (via Micro-manager)
• It can view, annotate, process and analyze recordings. New applications are introduced with the graphical programming language (flows), interpreted Java or new plugins (written in Java). Endrov can also be used as a library, making heavy data analysis simple.
• There are at the moment about 130 image processing filters and [?]0 plugins
• A new schema for data storage which handles arbitrary metadata, [?]D recordings, mixed resolutions and compression (both lossy and lossless).
• Large datasets are handled gracefully. Moreover, the speed of most operations is unaffected by the size of datasets. Endrov supports most file formats (via Bio-formats)

Our framework is open and free of charge. It is highly flexible and can be adapted to any needs in current research. As an application, we have used our framework to quantify a large number of genes' expression levels.

Poster 37
fastapl - a utility for processing fasta format data
Paul Horton
AIST, Computational Biology Research Center, Tokyo, Japan
horton-p@aist.go.jp

Introduction
Processing of sequence and annotation data in multifasta format is a common task in bioinformatics. Existing resources to facilitate this task either provide a predefined set of configurable scripts to cover many common cases (e.g. EMBOSS tools) or provide libraries (e.g. BioPerl) which can be called from user programs. fastapl (FASTA Perl Loop, pronounced like "fasta apple") is a new and complementary approach, motivated by the observation that the tasks users require are often simple but ad hoc. Some of them would be almost trivial to do with standard Linux tools such as grep, sed, wc, etc., except for the fact that these tools are line based rather than fasta record based. fastapl provides functionality analogous to perl used with its -n and related switches, but looping over fasta records instead of lines.

Examples
fastapl is best described by listing examples:

Reformat sequence lines to have max line length 100.
% fastapl -p -l 100

Truncate sequences to maximum sequence length of 39.
% fastapl -p -e '$seq = substr( $seq, 0, 39 )'

Reverse complement DNA sequences.
% fastapl -p -e '$seq = reverse $seq; $seq =~ tr/acgtACGT/tgcaTGCA/'

Print records of sequences not starting with methionine.
% fastapl -g -e '$seq !~ /^M/'

Randomly shuffle sequences.
% fastapl -M 'List::Util qw(shuffle)' -p -e '$seq = join( "", shuffle(@seq) )'

Sort records by id.
% fastapl --sort -e '$id1 cmp $id2'

Code Generation
An unusual and useful feature of fastapl is that it can generate a standalone program in standard perl for any command it can handle (-n option). This is very helpful when debugging user scripts. Moreover, it eases the transition from quick and dirty one-liners to more robust programs, which often happens when we find ourselves repeating tasks we initially thought would only need to be done once or twice.

Availability and Future Directions
fastapl may be used by anyone under the GNU general public license (GPLv3). More examples, documentation, and the source code are available at http://seq.cbrc.jp/fastapl. We are currently developing fastqpl, which is closely analogous to fastapl, but for use with fastq format files. In addition, we plan to package this software as a debian package.

Poster 38
FLAME: An Open Source Agent-Based Modelling Platform for High Performance Computing of Biological Systems
Mesude Bicak, Simon Coakley, Mike Holcombe, Dawn Field
Email: mbicak@ceh.ac.uk
NERC Centre for Ecology and Hydrology, Maclean Building, Benson Lane, Wallingford, UK, OX10 8BB
Department of Computer Science, Regent Court, Portobello Street, University of Sheffield, Sheffield, S1 4DP
FLAME is distributed with a GPL v[…] License via http://sourceforge.net/projects/flameabm/files/project

The ideologies surrounding Cellular Automata gave birth to the concept of agent-based modelling (ABM). Introduced by Reynolds in […], ABM recently became the driving force in various research areas, especially after the advent of powerful parallel computers. Rather than looking at systems as a whole, ABM encourages bottom-up approaches, allowing focus on the emergence of complexity as a consequence of interactions between individual agents. The principal components of a system are represented as autonomous agents, which are assigned characteristic properties and behavioural rules within an environment. They encapsulate a state that changes based on the exchange of information during local interactions. Thus, over time, the system evolves from micro-level to macro-level and complex behaviour is observed at the population level. Systems based on differential equations tend to approximate actual behaviour, whereas agent-based systems capture differences between individuals, supporting flexibility and variability.

FLAME, developed at the University of Sheffield, is an open source ABM framework that allows modellers from all disciplines to easily write agent-based models. Based on XML and C, it supports various levels of complexity, from modelling molecules to complete communities, by only varying agent definitions and functions. X-machines are used as the agent architecture, which provide agents with memory, states and transition functions. Agents communicate via a Message Passing Interface (MPI), handled by an intelligent message board, which allows filtering of messages, thereby improving performance. FLAME uses a distributed model, Single Program Multiple Data (SPMD), and handles deadlocks through synchronisation points.

Modellers are hindered by complexities of porting models on parallel platforms versus the time taken to run large simulations on a single machine. FLAME stands out from others as it allows complex models to be run in parallel over High Performance Computing (HPC) grids via HPCx libraries by automatically generating threaded code. As a result, simulations with as many as […] agents were successfully performed within minutes, enhancing research in terms of time and complexity of models. This work was done in collaboration with the Science and Technology Facilities Council at the Rutherford Appleton Labs as part of the EURACE project.

FLAME has been used in modelling various biological systems, ranging from foraging strategies of individual ants in large colonies, behaviour of E. coli bacteria in deoxygenated environments, intercellular bonding, migration, proliferation in urothelial tissues, to the effect of genome size on phage and bacterial interactions. Funded by the EPSRC, FLAME aims to provide improved HPC performance by allowing dynamic distribution of agents during parallel simulations.

Poster 39
GeneNomenclatureUtils: Tools for annotating genes and comparing gene lists with community resources.
Mike D.R. Croning* and Seth G.N. Grant*†
* Genes to Cognition Programme, Wellcome Trust Sanger Institute, Hinxton, Cambridge, CB10 1SA, UK and † Division of Clinical Neuroscience, Royal Infirmary of Edinburgh, 51 Little France Crescent, Old Dalkeith Road, Edinburgh, EH16 4SA.
Email: mdr@sanger.ac.uk
Project URL: www.genes2cognition.org/software/GeneNomenclatureUtils
Code URL: https://github.com/mdrc/GeneNomenclatureUtils
License: Artistic License 2.0, http://www.opensource.org/licenses/artistic-license-2.0.php

Verifying, annotating, storing, and comparing lists of genes is now an essential task in modern experimental and computational biology. These lists can be derived from so-called 'omics' technologies that allow one to investigate gene (transcriptomics) or protein expression (proteomics) across the genome under defined conditions such as drug treatment, developmental stage, or clinical status. Similarly, lists of candidate disease genes are being generated from the huge quantities of sequence data produced by geneticists and clinicians employing new sequencing technologies to identify sequence variation in large cohorts of human patient samples.

Comparing lists of genes produced by different labs from these disparate experimental sources remains an arduous, but obligatory task for the bioinformatician. This is because gene and protein identifiers are not stable: they are continually revised and withdrawn over time by the gene nomenclature committees (such as HGNC), and database identifiers provided by other community genomic resources change similarly. This 'identifier creep' is compounded by the fact that gene list comparisons often have to be made across species and model organisms.

Here we provide an extensive suite of 40+ command-line driven utilities that are designed to simplify this process.
We provide an automated means to fetch the key nomenclature and other genomic, bibliographic and disease annotation resources from HGNC, MGI, NCBI Entrez, OMIM, PubMed and UniProt, and store them on local disk. Required MEDLINE records are cached in a local MySQL database. This ensures high availability of all the resources, and allows a consistent set of nomenclature and annotation to be used across the lifetime of an ongoing experiment or analysis. Scripts are provided to verify the gene symbols, and the species-specific database identifiers with which they are associated, and to project across genomes to orthologous genes and identifiers. Once a list is verified it can be quickly integrated into the list comparison engine, which can store and compare any number of such gene lists in a particular identifier-space. This makes it straightforward to manage hundreds of lists, compare them, and subsequently recheck their nomenclature. Other tools allow look-up of human disease association in OMIM, of which genes have reported mouse knockout models, and of the frequency of occurrence of protein domains, as assessed from a list of genes.

The component Perl scripts can quite simply be chained together to create repeatable workflows, and some care has been taken to use a consistent (and minimal) set of well-documented command line parameters for this purpose. The tools perform input/output using tab-delimited text files, a widely accepted format supported across the community. The package incorporates a number of accessory scripts to manipulate and check tab-delimited files, and to interconvert with proprietary spreadsheet formats, often used in experimental research environments.

GeneNomenclatureUtils is free software, collaboratively developed, easy to install, and should prove useful to both bioinformaticians and experimental investigators working with lists of genes.
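The core list-comparison idea described above, reading gene lists from tab-delimited files and comparing them within a single identifier-space, can be illustrated with a minimal Python sketch. This is not the package's own Perl code: the column name `symbol`, the function names and the two tiny in-memory lists are all hypothetical, chosen only to make the example self-contained.

```python
import csv
import io


def read_identifiers(handle, column):
    """Return the set of non-empty values found in one column of a tab-delimited table."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row[column].strip() for row in reader if row[column].strip()}


def compare_lists(first, second):
    """Split two identifier sets into shared and list-specific members."""
    return {
        "shared": sorted(first & second),
        "only_first": sorted(first - second),
        "only_second": sorted(second - first),
    }


# Two tiny made-up gene lists (hypothetical content); a real run would
# open tab-delimited files on disk instead of these in-memory strings.
LIST_A = "symbol\tsource\nDLG4\tproteomics\nGRIN2B\tproteomics\nSYNGAP1\tproteomics\n"
LIST_B = "symbol\tsource\nGRIN2B\tsequencing\nSHANK3\tsequencing\nSYNGAP1\tsequencing\n"

result = compare_lists(
    read_identifiers(io.StringIO(LIST_A), "symbol"),
    read_identifiers(io.StringIO(LIST_B), "symbol"),
)
for category, symbols in result.items():
    print(category, ", ".join(symbols), sep="\t")
```

A comparison like this is only meaningful once both lists have been verified against the same nomenclature release, which is the step the utilities described above are designed to automate.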
Poster 40
GenomeTools - a versatile and efficient bioinformatics toolkit
Gordon Gremme, Sascha Steinbiss, and Stefan Kurtz
Center for Bioinformatics, University of Hamburg, Bundesstraße 43, 20146 Hamburg, Germany
Contact E-mail: steinbiss@zbh.uni-hamburg.de
URL of the project web site: http://genometools.org
Git source repository: git://genometools.org/genometools.git
License: ISC (BSD-like)

While most bioinformatics software is written to be efficient in terms of space and time, other aspects of the software, like extensibility and portability, are mostly neglected. Often, applications are developed to serve only one small task, raising the need for a myriad of 'glue' scripts to integrate single tools into a specific work flow. Comprehensive toolkits for building bioinformatics applications exist (e.g. BioPerl, BioJava, ...) but they are often tied to a specific language, limiting their reusability in other contexts.

To address these problems we have developed the GenomeTools, a free software toolkit for tasks relevant when working with large genomes. The GenomeTools provide an extensive software library for storage, indexing, and processing of both genomic sequences and annotations using an object-oriented interface. Written in C, the library is accessible via bindings to a variety of script-programming languages. Based on the library, the GenomeTools contain a collection of advanced bioinformatics tools for sequence handling and analysis, transposon prediction, and annotation visualization, some of which have already been published separately [1–6]. The tools show how the GenomeTools library can be used to write well-maintainable, clean software without compromising efficiency.

References
[1] S. Gräf, F.G.G. Nielsen, S. Kurtz, M.A. Huynen, E. Birney, H. Stunnenberg, and P. Flicek. Optimized design and assessment of whole genome tiling arrays. Bioinformatics, 23 ISMB/ECCB 2007:i195–i204, 2007.
[2] D. Ellinghaus, S. Kurtz, and U. Willhoeft. LTRharvest, an efficient and flexible software for de novo detection of LTR retrotransposons. BMC Bioinformatics, 9:18, 2008.
[3] S. Kurtz, A. Narechania, J.C. Stein, and D. Ware. A new method to compute K-mer frequencies and its application to annotate large repetitive plant genomes. BMC Genomics, 9:517, 2008.
[4] S. Steinbiss, G. Gremme, C. Schärfer, M. Mader, and S. Kurtz. AnnotationSketch: a genome annotation drawing library. Bioinformatics, 25(4):533–534, 2009.
[5] S. Steinbiss, U. Willhoeft, G. Gremme, and S. Kurtz. Fine-grained annotation and classification of de novo predicted LTR retrotransposons. Nucleic Acids Res, 37(21):7002–7013, 2009.
[6] D. J. Schmitz-Hübsch and S. Kurtz. MetaGenomeThreader: A software tool for predicting genes in DNA sequences of metagenome projects. In Streit, W. and Daniel, R., editors, Metagenomics. Methods and Protocols, Methods in Molecular Biology. Springer, Berlin, 2010.

Poster 41
InterMine - An Open Source Data Warehouse and Query Interface
AUTHORS: Fengyuan Hu, Alex Kalderimis, Daniela Butano, Adrian Carr, Sergio Contrino, Mike Lyne, Richard Smith, Radek Štěpán, Julie Sullivan, Gos Micklem (PI)
AFFILIATION: University of Cambridge, fh[?][?][?]@cam.ac.uk
PROJECT URL: www.intermine.org
SOURCE CODE: http://www.intermine.org/browser, http://www.intermine.org/wiki/SVNCheckout
LICENSE: LGPL
FUNDING: Wellcome Trust and the NIH/NHGRI.

InterMine (www.intermine.org) is a powerful open source system for building query-optimised data warehouses. It supports data integration from standard biological formats and makes it easy to add your own data. A sophisticated web application provides flexible query access for any data model. Data can be accessed programmatically by building and executing queries via a web-service API, for which client code can even be generated within the web application in various programming languages.

Poster 42
Jalview 2: Transitions
J. B. Procter, P. V. Troshin, D. M. A. Martin, G. J. Barton
College of Life Sciences, University of Dundee, UK. contact: j.procter@dundee.ac.uk
Website: http://www.jalview.org (source, documentation and related software links)
Source: http://www.jalview.org/source/source.html
License: Jalview is implemented in Java and is made available under the GPL.

The widely used multiple sequence alignment visualization, annotation and editing tool, Jalview, will soon enter a transition period. The current Jalview 2 codebase, which supports a Java applet and stand-alone desktop application, was developed in 2004. Since then, a great number of features and improvements have been made to it (see http://www.jalview.org/releaseHistory.html for details), and this development activity has undoubtedly contributed to its continued success. However, whilst Jalview 2 meets the needs of many biologists and bioinformaticians, its underlying software architecture has become increasingly complex. To alleviate these issues, we will soon begin a major revision of the current architecture to create Jalview version 3, which will facilitate the introduction of new features as plugins developed by a wider development community.

Jalview Version 2.7 is expected to be the penultimate major release based on the Version 2 codebase. It will incorporate a number of new features, including improved support for visualizing and analyzing RNA sequence alignments, thanks to contributions from our 2010 Google Summer of Code student, Lauren Lui (http://jalview-rna.blogspot.com). Linked alignment and structure visualizations have been improved, allowing a set of complexes or multi-domain protein structures to be coloured using alignments associated with individual domains or proteins within the scene. JalviewLite, the applet version of Jalview, has also been extended with alignment annotation visualization capabilities that were previously only available in the Jalview Desktop. Its public API has also been extended, and a prototype JavaScript library developed to facilitate tight integration, including the exchange of mouse overs, selection and visualization attributes between the applet and other web based visualization components on the page.

A number of Jalview community events have also taken place, providing us with valuable feedback regarding Jalview's usefulness and relevance to biologists. Over seven Jalview tutorials have happened over the last 12 months, including two residential courses, held at the EMBL-EBI in Cambridge, UK and EMBL Heidelberg in Germany. These courses cover all aspects of Jalview's use, including the installation of JABAWS servers (Troshin et al., 2011 and http://www.compbio.dundee.ac.uk/jabaws), which enable Jalview users to perform computationally intensive analysis tasks on their local machine or cluster. We have also experimented with new software deployment approaches, including the provision of off-line mirrors of the Jalview and JABAWS web sites, to allow users in remote locations with slow or unstable internet connections to access all Jalview web resources locally.

Jalview usage statistics suggest the desktop application is now used over 4000 times a month worldwide. It is therefore critical that the version 2 series of Jalview continues to be maintained and improved whilst the version 3 codebase is developed. Some of this maintenance effort comes from the Jalview user community, which is diverse and includes a number of expert software developers who contribute valuable patches to its codebase. However, these contributions are hampered by Jalview's current source distribution model, which involves publishing successive releases as a source archive. To lower the barrier to future contribution, we have now made Jalview's version control repository publicly accessible. By opening the project repository and integrating it with Jalview's existing bug tracker, we hope to foster an increase in contribution to Jalview from the open source bioinformatics software development community.

References
Troshin, PV, Procter, JB, and Barton, G (2011). "JABAWS:MSA Distributed Web Services for Bioinformatics: Multiple Sequence Alignment". To be published in Bioinformatics.

Poster 43
jsDAS and Dasty3, enabling DAS protein visualisation
Leyla Garcia1*, Bernat Gel2,3, Rafael C. Jimenez1, Jose M. Villaveces1, Gustavo A. Salazar4,1, Nicola Mulder4, Maria Martin1, Alexander Garcia5 and Henning Hermjakob1
1 European Bioinformatics Institute, Hinxton, UK.
2 Software Department, UPC-BarcelonaTech, Barcelona, Spain.
3 Hereditary Cancer Program, Institute for Predictive and Personalised Medicine of Cancer, Badalona, Spain.
4 Computational Biology Group, Department of Clinical Laboratory Sciences, University of Cape Town, South Africa.
5 University of Arkansas, Biomedical Informatics, Medical Center, Arkansas, USA
* ljgarcia@ebi.ac.uk

jsDAS: JavaScript client library for the Distributed Annotation System; it is an open source tool freely available under the terms of the GNU Lesser General Public License. Project Web site and source code available at http://code.google.com/p/jsdas/
Dasty3: Dasty3 is an extensible Web-based framework that supports protein visualization. Dasty3 is an open source tool freely available under the terms of the GNU General Public License. Official Web site at http://www.ebi.ac.uk/dasty/. Project Web site and source code available at http://code.google.com/p/dasty/

ABSTRACT
The Distributed Annotation System (DAS) defines a communication protocol to exchange annotations on genomic or protein sequences (http://www.biodas.org).
Its client-server architecture involves servers that manage the data distribution, clients that handle the data manipulation and visualisation, and a registry that provides a repository for registration and discovery of DAS services. A DAS server can comprise more than one data source that actually provides the information on sequences, i.e. reference data sources, or annotations, i.e. annotation data sources. The current specification can be found at the official Wiki page (http://www.biodas.org/wiki/DAS1.6).

Although DAS clients can be developed from scratch, it is also possible to use a client library. A DAS client library encapsulates the communication with the server as well as the data parsing. Currently there are client libraries in Perl (Bio::DAS::Lite), JavaScript (jsDAS), and Java (Dasobert and JDas).

jsDAS is a JavaScript DAS client library that manages all the aspects of DAS data discovery and retrieval; it is fully compliant with the latest specification: DAS 1.6. This library queries the DAS sources; once the data is retrieved, it parses the XML document and generates JavaScript objects representing it. jsDAS is completely asynchronous; it relies on callback functions to provide feedback on success or failure. It is Cross Origin Resource Sharing (CORS) compatible, which means it can access third party servers directly from the browser without a proxy (although it includes a fallback for servers not yet CORS-enabled). jsDAS modules are delivered as an API providing the core functionality: URL parsing, XML parsing, connection managing, etc.

Dasty3 is a Web-based framework based on its predecessor, Dasty2, and developed upon jsDAS. It allows visualising and manipulating proteins from DAS sources as well as from third party providers.
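The request-and-parse cycle that a DAS client library wraps can be sketched in a few lines. jsDAS itself is JavaScript; the Python below is only an illustration of the same steps (build a "features" request URL, turn the returned DASGFF XML into native objects, report the outcome through success and failure callbacks). The server address, source name, function names and sample document are all hypothetical, and no network request is made.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode


def features_url(server, source, segment, start=None, stop=None):
    """Build the URL of a DAS 'features' request for one segment."""
    region = segment if start is None else f"{segment}:{start},{stop}"
    query = urlencode({"segment": region}, safe=":,")
    return f"{server.rstrip('/')}/{source}/features?{query}"


def parse_features(xml_text):
    """Turn a DASGFF document into a list of plain dictionaries."""
    features = []
    root = ET.fromstring(xml_text)
    for segment in root.iter("SEGMENT"):
        for feature in segment.iter("FEATURE"):
            features.append({
                "segment": segment.get("id"),
                "id": feature.get("id"),
                "label": feature.get("label"),
                "type": feature.findtext("TYPE"),
                "start": int(feature.findtext("START")),
                "end": int(feature.findtext("END")),
            })
    return features


def handle_response(xml_text, on_success, on_error):
    """Callback-style wrapper: report parsed features or the parse error."""
    try:
        on_success(parse_features(xml_text))
    except ET.ParseError as error:
        on_error(error)


# Made-up response standing in for what a DAS annotation server might return.
SAMPLE = """<DASGFF><GFF><SEGMENT id="ACC1" start="1" stop="100">
  <FEATURE id="feat1" label="example region">
    <TYPE id="example_type">example_type</TYPE>
    <START>5</START><END>42</END>
  </FEATURE>
</SEGMENT></GFF></DASGFF>"""

collected, errors = [], []
print(features_url("http://das.example.org/das/", "example_source", "ACC1", 1, 100))
handle_response(SAMPLE, collected.extend, errors.append)
handle_response("<DASGFF>", collected.extend, errors.append)  # truncated document
print(len(collected), "feature(s) parsed;", len(errors), "error(s) reported")
```

In a browser client the two callbacks would be invoked asynchronously when the HTTP request completes; here they are called directly so the sketch stays self-contained.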
Dasty3 relies on a modular architecture that makes it easier to integrate new plug-ins; it also offers a public API that comprises methods for integrating, visualising, and manipulating information from different data sources. Users can perform searches based on a protein accession or identifier. Dasty3 uses PICR (http://www.ebi.ac.uk/Tools/picr) to match the query to a UniProt entry; the protein sequence is retrieved from the UniProt DAS reference server (http://www.ebi.ac.uk/das-srv/uniprot/das/uniprot). Protein information from all other data sources configured in Dasty3 is then retrieved; data sources can be easily added or removed.

The set of predefined plug-ins provides the user with a unified, organised, and interactive view: (i) an ontology filter plug-in to navigate and filter the ontology terms used in DAS (BioSapiens, Sequence, Protein Modification, and Evidence Code ontologies), (ii) a 3D view of the protein structures using the Jmol applet, (iii) a positional features plug-in to display annotations related to particular positions of amino acids in the protein sequence, including information about the type and the method used to generate the annotation, labels, the graphical representation of the annotation, the data source providing it, and the category (inferred from manual or electronic means), (iv) a writeback plug-in to allow users to create new annotations and modify existing ones, (v) a sequence plug-in to display the sequence and highlight the amino acids from the annotations selected in the positional features plug-in, (vi) a non-positional features plug-in to display annotations related to the whole protein, such as publications, and (vii) an interactions plug-in to show information about the molecular interactions of the protein.

Dasty3 is highly modular and extensible; new components can be easily added to the framework. The interoperability delivered by the API facilitates the flow of data across plug-ins.
The architecture also facilitates the organisation of the frontend by means of templates, which eases the definition of the layout and improves the user experience.

Poster 44
KNIME-CADDSuite - integration of the computer aided drug design suite (CADDSuite) into the Konstanz Information Miner (KNIME)
Marc Röttig 1,*, Marcel Schumann 1 and Oliver Kohlbacher 1
1 Applied Bioinformatics, Center for Bioinformatics, Department of Computer Science, University of Tübingen, Sand 14, 72076 Tübingen, Germany
* Corresponding author: roettig@informatik.uni-tuebingen.de
Project-URL: http://abi.inf.uni-tuebingen.de/Software/knimenodes/caddsuite
License: LGPL

The Biochemical Algorithms Library (BALL) is an application framework in C++ that has been specifically designed to allow rapid software prototyping and to significantly reduce development times in the field of computational molecular biology and molecular modeling. It provides an extensive set of data structures as well as classes for molecular mechanics, advanced solvation methods, comparison and analysis of protein structures, file import/export, and visualization [1]. On top of BALL we developed the computer aided drug design suite (CADDSuite), a comprehensive collection of tools which aims to ease the application of common computer-aided drug design tasks by providing all necessary tools and algorithms, which are useable in a simple and consistent manner [2].

Here, we present a novel integration mechanism of CADDSuite into the KNIME workflow system to further ease the application of those tools within large workflows, allowing the user to create and launch workflows directly from a rich client user interface. The Konstanz Information Miner (KNIME) is a modular data exploration platform that allows modeling of workflows for data analysis [3]. The basic unit of execution within KNIME is a node, thus additional functionality can be added to KNIME by creating new KNIME nodes. The generation of KNIME-CADDSuite nodes is carried out in a generic fashion by building KNIME nodes automatically from an XML-based node specification (describing required input files, parameters, …) that each CADDSuite tool supplies.

Generally, any collection of tools adhering to this node specification mechanism can now be integrated into KNIME and we already have integrated one of our other libraries, namely the OpenMS library for mass spectrometry analysis [4]. In the future, other tool sets will be integrated using the proposed generic KNIME node generation mechanism.

1. Hildebrandt et al., BALL - biochemical algorithms library 1.3. BMC bioinformatics, 2010. 11: p. 531.
2. Schumann et al., CADDSuite: a flexible and open framework and workflow system for computer-aided drug design. (manuscript in preparation)
3. www.knime.org
4. Sturm et al., OpenMS - an open-source software framework for mass spectrometry. BMC bioinformatics, 2008. 9: p. 163.

Poster 45
OncoPortal: the power of BioMart for oncology research
M Wong-Erasmus, J Baran, A Cros, JM Guberman, S Haider, J Hsu, Y Liang, E Rivkin, J Wang, B Whitty, L Yao, J Zhang, A Kasprzyk
Ontario Institute for Cancer Research, Toronto, Ontario M5G 0A3, Canada
Email: marie.wong-erasmus@oicr.on.ca
web: http://www.biomart.org/
svn: https://code.oicr.on.ca/svn/biomart/biomart-java/branches/release-[?]_[?]-candidate_[?]
license: GNU Lesser General Public License v[?].[?]

Abstract
OncoPortal is the collaborative vision of both academic and industry partners to build a data integration and management system tailored to the oncology arena. To achieve this goal, OncoPortal uses the BioMart framework, which is known for its ability to seamlessly integrate disparate data sources and allow for cross data-mining.
The idea is to provide researchers with a pre-configured BioMart that has been customized to manage oncology data in a distributed environment that is typical of collaborative projects. The advantage of using BioMart as a starting point is that it is free, open-source and most importantly, it enables the researcher to capitalize on the rich capabilities of the system such as installing a local version of the portal, controlling the web interface look and feel as well as how the data is linked, queried and protected if necessary. In addition, the wealth of available resources provided by the BioMart architecture could be drawn upon to offer annotations that would be useful in complementing the oncology data. One of the first incarnations of this tool is the International Cancer Genome Consortium (ICGC) portal which can be found at http://dcc.icgc.org/ and has been available to the wider research community for some time. This unified point of access enables cross-comparisons of high-complexity data from 24 cancer studies from 12 countries consisting of 3,478 genomes broken down into 13 cancer types and subtypes. Currently, 7 genomic data types are cohesively and transparently handled. These data types include (i) simple somatic mutations, (ii) copy number alterations, (iii) structural rearrangements, (iv) gene expression, (v) miRNA, (vi) DNA methylation, and (vii) exon junctions. At present, a researcher is able to mine each data type across one or more datasets as well as run simple analyses based on a selection of genes, mutations and pathways. Besides ICGC data, there is also federated access to public external annotation resources such as the Kyoto Encyclopedia of Genes and Genomes (KEGG), Reactome and the Catalogue of Somatic Mutations in Cancer (COSMIC) just to name a few. 
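To make the idea of mining one data type across several datasets by a selection of genes more concrete, here is a minimal Python sketch. It does not use the BioMart API: the dataset names, the record fields, and the `mine` helper are hypothetical, and the records are made-up placeholders rather than ICGC data.

```python
from collections import defaultdict

# Hypothetical, hand-made records standing in for federated datasets.
DATASETS = {
    "pancreatic_study": [
        {"gene": "KRAS", "data_type": "simple_somatic_mutation", "donor": "D1"},
        {"gene": "TP53", "data_type": "copy_number_alteration", "donor": "D2"},
    ],
    "breast_study": [
        {"gene": "TP53", "data_type": "simple_somatic_mutation", "donor": "D7"},
        {"gene": "BRCA1", "data_type": "simple_somatic_mutation", "donor": "D8"},
    ],
}


def mine(datasets, data_type, genes):
    """Collect (dataset, donor) hits per gene for one data type across datasets."""
    wanted = set(genes)
    hits = defaultdict(list)
    for name, records in datasets.items():
        for rec in records:
            if rec["data_type"] == data_type and rec["gene"] in wanted:
                hits[rec["gene"]].append((name, rec["donor"]))
    return dict(hits)


# Mine simple somatic mutations for two genes across both placeholder datasets.
result = mine(DATASETS, "simple_somatic_mutation", ["KRAS", "TP53"])
```

In a real federated setting each dataset would live on a different server; the point of the sketch is only that one query (data type plus gene selection) is applied uniformly to every source and the hits are combined.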
In the future, OncoPortal will not only provide data federation and mining but also leverage the breadth of BioMart's functionality, such as its new plug-in feature, which aims to give researchers a way to run more complex data analyses across one or more of the datasets. With these capabilities in place, the portal aims to bring data rapidly to the forefront of oncology research and to serve as a model for oncology data mining and analysis for cancer researchers worldwide.

Poster 46
SEQCRAWLER – A CLOUD READY INDEXING PLATFORM
Biological data indexing and browsing platform
Authors: Olivier Sallou
Affiliation: GenOuest BioInformatics Platform, IRISA, Rennes, France, Olivier.sallou@irisa.fr
URL: http://seqcrawler.sourceforge.net
Code: http://sourceforge.net/projects/seqcrawler/
License: CeCILL v2 (GNU-like) (http://www.cecill.info/licences/Licence_CeCILL_V2-en.html)

I. INTRODUCTION
Seqcrawler has its roots in software such as SRS and Lucegene. Its goal is to provide an indexing platform that eases the search of data in biological banks. Besides searching metadata, it can store raw data such as original sequences and transcripts. Lastly, it provides a sequence browser to visualize results (genes on chromosomes etc…). The software integrates different technologies and programs and ties them together in a coherent, scalable platform that is ready to run on a single computer or on a cloud.

II. ARCHITECTURE
The software runs on Linux systems. It is composed of 3 components:
• The indexer/search interface
• The genome browser (GBrowse2), linked to the index via a specific DBI interface
• The raw data storage backend (Riak or MongoDB), a NoSQL database.
The software is written in Java and Perl (based on BioPerl). It can be installed manually, or from a Live DVD or a virtual machine containing all required software and components.
The platform ties these 3 components together to share and extract data. However, each component can be queried or activated independently on a system. Each component is scalable and can be extended to reach the expected dimensions. For example, one could have 2 index shards on 2 servers and 1 browser co-located with 1 storage system.

A. Indexer/search platform
This component provides a program to index data from different formats (Genbank, Embl, GFF…). The source data is split into key/value pairs that can be used as query parameters. Specific fields (dates, spatial…) can be defined with a special format in the index configuration. This allows complex queries on fields with AND/OR conditions. Full-text search is also available. The index engine is based on Solr. For formats without a dedicated implementation, Readseq is used to parse the data.
The index engine supports index sharding (the possibility to split a large index into smaller parts) and transparently queries all the shards over the network. Sharding makes it possible to extend or specialize the indexes over several servers. Any server can be queried; all shards are then triggered and the results are merged on the queried server. The engine also provides a REST interface to trigger queries and render results (cf. APIs).
Fields can be indexed and/or stored for later retrieval. GFF documents are fully indexed and stored; for some other formats, most fields are indexed but only minimal information is stored. A link to the original document allows extraction of the source (part of the document is extracted via an extraction script).
The index can be built or updated offline (for initial creation or large uploads) or online, and supports replication.

B. Genome browser
Genome browsing is done with the GMOD GBrowse2 Perl software. A DBI interface has been created to link GBrowse2 to the index. All indexed GFF files (the format required by GBrowse) can be visualized, independently of the index location and splitting.
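The sharded search described in section A, which is what makes index location and splitting invisible to the browser, can be sketched roughly as follows. This is an illustrative Python sketch, not Seqcrawler's or Solr's code: the in-memory `SHARDS`, the field names, and the `matches`, `search_shard`, and `search_all` helpers are assumptions, and real shards would be queried over HTTP rather than held in one process.

```python
import heapq
from itertools import islice

# Hypothetical shards: each holds key/value documents with a relevance score.
SHARDS = [
    [{"id": "g1", "feature": "gene", "chr": "chr1", "score": 0.9},
     {"id": "g2", "feature": "mRNA", "chr": "chr1", "score": 0.4}],
    [{"id": "g3", "feature": "gene", "chr": "chr2", "score": 0.7},
     {"id": "g4", "feature": "gene", "chr": "chr1", "score": 0.5}],
]


def matches(doc, all_of=None, any_of=None):
    """AND conditions must all hold; at least one OR condition must hold."""
    all_ok = all(doc.get(k) == v for k, v in (all_of or {}).items())
    any_ok = not any_of or any(doc.get(k) == v for k, v in any_of)
    return all_ok and any_ok


def search_shard(shard, **query):
    """Query one shard; hits come back sorted by descending score."""
    hits = [d for d in shard if matches(d, **query)]
    return sorted(hits, key=lambda d: d["score"], reverse=True)


def search_all(shards, page=0, page_size=2, **query):
    """Fan the query out to every shard, merge by score, return one page."""
    per_shard = [search_shard(s, **query) for s in shards]
    merged = heapq.merge(*per_shard, key=lambda d: d["score"], reverse=True)
    start = page * page_size
    return list(islice(merged, start, start + page_size))


first_page = [d["id"] for d in search_all(SHARDS, all_of={"feature": "gene"})]
second_page = [d["id"] for d in search_all(SHARDS, page=1, all_of={"feature": "gene"})]
```

Because the caller only ever sees merged, paged results, a client such as a genome browser does not need to know how many shards exist or where they run; it simply asks for further pages until it has everything it wants to display.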
GBrowse queries an index master (possibly with load sharing), and the index master sends the merged results to the DBI interface. The interface is in charge of querying the index remotely and looping over all the results that should be displayed (meaning that further "pages" of the index may be required). To get the details of a "document" (a gene, for example), the index is queried again to get the full available information, and the NoSQL storage is also queried to get the raw data corresponding to the element. Any server can be queried separately from the index servers. More generally, any kind of data browser could be used here; only a driver to the index is required (via a simple HTTP interface).

C. NoSQL storage
The raw data is stored in a NoSQL backend. The software supports Riak and MongoDB, but can be extended to other systems via an interface. NoSQL storage eases the horizontal extension of the data across many servers. Data can be replicated for security and easily scaled with data sharding. In the NoSQL database, the data is stored as a document with a unique id and its metadata, accessible from any node of the system. In Seqcrawler, data may be cut into smaller chunks to fit the database requirements. The program then collects all the chunks and recreates the original data with the help of the stored metadata.

III. APIS
The index component can be queried via HTTP REST to get index page results in JSON or XML format. Page size and page number can be specified in the query. The NoSQL storage can be queried either via HTTP GET (Riak) or via specific drivers (Perl, Java etc… for MongoDB), using the id of the document.

Poster 47
SeqGI: Sequence Read Enrichment at Genomic Intervals
Inês de Santiago, Tom Carroll, Ana Pombo
MRC Clinical Sciences Centre, Imperial College School of Medicine, London, UK.
desantiago.ines07@csc.mrc.ac.uk
http://seqgi.sourceforge.net/
GNU Lesser General Public License

The visualisation and statistical evaluation of read profiles over genomic features are core components in the interpretation of high-throughput sequencing data. These processes have largely remained separate, which has led to the use of multiple software packages requiring interconversion between differing file formats. Furthermore, the increasing use of multiple biological samples in ChIP-Seq studies demands statistical and computational methods suitable for the assessment of biological variation. SeqGI is an open source software package that provides a GUI framework for the simultaneous visualisation and testing of sequence read distributions both between and within classes of user-defined genomic features. The software is written in Python and R, and runs on all standard operating systems. SeqGI can be used to intersect BED, WIG or output files from standard aligners with a set of specified genomic features in order to calculate and illustrate the read density at these genomic intervals or at some distance from known regions/features. Profile plots and heatmaps are used to visualise single or multiple read distributions across features, whereas scatter and box plots allow for the identification of differential read densities between individual genomic features or classes of features, as well as between conditions. Alongside normalisation, transformation and classical parametric and non-parametric tests, SeqGI's statistical framework allows for the analysis of differential read densities as count data using methodology implemented in the DESeq Bioconductor package. SeqGI provides users with an intuitive graphical interface that combines visualisation and statistical tools, and so assists in the rapid interpretation of sequencing data.
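The core calculation mentioned above, read density at genomic intervals or at some distance from them, can be illustrated with a short sketch. This is not SeqGI's implementation; the function names, the BED-like tuples, the choice to count reads by their start position, and the reads-per-kilobase normalisation are all assumptions made for illustration, and the reads are made-up values.

```python
from bisect import bisect_left
from collections import defaultdict


def index_reads(reads):
    """Group read start positions by chromosome and sort them for bisection."""
    starts = defaultdict(list)
    for chrom, start, _end in reads:
        starts[chrom].append(start)
    for positions in starts.values():
        positions.sort()
    return starts


def read_density(starts, chrom, begin, end, flank=0):
    """Reads per kilobase whose start falls in [begin - flank, end + flank)."""
    lo, hi = max(0, begin - flank), end + flank
    positions = starts.get(chrom, [])
    count = bisect_left(positions, hi) - bisect_left(positions, lo)
    return count / ((hi - lo) / 1000.0)


# Made-up reads and features in BED-like (chrom, start, end) form.
reads = [("chr1", 100, 136), ("chr1", 150, 186), ("chr1", 900, 936), ("chr2", 50, 86)]
features = {"geneA": ("chr1", 0, 500), "geneB": ("chr1", 500, 1000)}

starts = index_reads(reads)
densities = {name: read_density(starts, *iv) for name, iv in features.items()}
```

Sorting the read starts once and bisecting per interval keeps each lookup logarithmic in the number of reads, which matters when many features are intersected with the same alignment; the `flank` argument is one simple way to look at regions at some distance from a feature.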
Poster 48
Web based pipeline frameworks for data processing
Kyle Ellrott, Jing Zhu, David Haussler
University of California, Santa Cruz
kellrott@soe.ucsc.edu
http://genome-cancer.ucsc.edu https://github.com/kellrott/Remus
Apache License, Version 2.0

One of the common tasks that bioinformatics research labs face is the need to perform a set of standard calculations on changing data at regular intervals. If care is not taken, the task can become a convoluted web of scripts living in an array of random directories that overwrite the data produced by previous runs. There exist frameworks and methodologies that can help stave off these bad research habits. From roll-your-own scripts that use web based NoSQL databases like CouchDB to more integrated solutions like Galaxy or Firehose, there is a growing culture of bioinformatics web based pipeline engines.

One practical example of the need for a web based pipeline engine is the preprocessing that is done for the UCSC Cancer Genome browser (https://genome-cancer.ucsc.edu), which provides a visualization portal for the publicly accessible data from The Cancer Genome Atlas (TCGA) project. This data currently covers over ?0 different experimental platforms and over ?000 samples from ?? different types of cancer. The scale of the data is expected to grow dramatically in the next few years as TCGA ramps up data production. This constantly growing set of experiments requires regular updates to catch newly added data. In addition, because the compiled data becomes the basis for other analyses, it is important to maintain archives of results so that previous runs can be reexamined if needed.

Remus (https://github.com/kellrott/Remus) is a system developed at UCSC to help formalize TCGA data input. It was built to ensure that all job management, data access and generated reports are accessible via a web based interface. It allows users to define pipelines comprised of sets of connected key/value data stacks filled with JSON data. These data stacks are tied together by map/reduce style operations, which operate on the JSON formatted data stored in the stacks or on attached binary files.

Remus emphasizes the importance of instanced data, so that each new instance of a pipeline run exists in its own space. Instances can be dumped to an archive, stored on disk, and removed from the active database. Should old results need to be reexamined, they can be reloaded into the active database. There is also an attempt to store the code associated with a pipeline when it was first instanced, so that if the pipeline code has changed over time, the code associated with older results can be reexamined.

All data associated with the pipeline, and all of its instances, is accessible via a RESTful interface, with all data presented in JSON format. This means that any piece of data stored in the data stacks is accessible by AJAX query. So the definition of the pipeline structure becomes the definition of the web API used to access it. Thus, the development of the pipeline leads to the simultaneous development of the webapp used to access the data. Binary files can be attached to values in the data stack, so the JSON data stacks can be used to describe the meta-information about a piece of binary data. This way the pipeline can guide which programs are run on sets of files, without having to convert high density data to JSON.

In the world of bioinformatics research there is a growing number of pipeline engines under development. Remus's strict adherence to a RESTful JSON model means it is simple to bridge to other software solutions and prevent product lock-in. Whatever solution a lab chooses to tackle the need for workflow organization, there are certain concepts that must be prioritized. Web accessibility, data archivability, and tool interface standardization are the cornerstones of Remus.

Poster 49
UniPAX - an integrative environment for biological networks
Andreas Gerasch1, Jan Küntzer2, Peter Niermann1, Michael Kaufmann1, Oliver Kohlbacher1, Hans-Peter Lenhof3
1) University of Tübingen, Faculty of Science, Dept. for Computer Science, Germany
2) Roche Diagnostics GmbH, Pharma Research & Early Development Informatics, Penzberg, Germany
3) Saarland University, Center for Bioinformatics, Germany
Email: gerasch@informatik.uni-tuebingen.de
URL: http://unipax.sf.net/
License: GNU Lesser General Public License v3

In the last ten years, different data formats have been developed for representing biological networks. Two of them have evolved into quasi-standards: BioPAX [1] as an exchange format in proteomics, and SBML [2] for metabolic modeling in systems biology. Both formats are XML based and describe biological relationships between entities. Today, most tools in these research areas provide import and export functionality for at least one of these formats. Since BioPAX and SBML partly overlap, it is possible and reasonable to combine both formats and to integrate them without loss of information.
The goal of UniPAX is to provide a data warehouse for biological networks, where different biological databases are fully semantically integrated, using a fusion of the essentials of both data formats, BioPAX and SBML. We completely re-designed our existing data warehouse BN++ [3], which was introduced a decade ago. We replaced the old data model, BioCore, with a new one that combines both formats. For this, we analyzed the model mapping of some tools that can convert one format into the other. The BioPAX community is also working on incorporating parts of SBML into the next BioPAX level 4, which is still in an early definition phase.
With this new internal data model we fully support the current standardization efforts in systems biology.
We have implemented the data model of UniPAX in C++ and Java, and created an object-relational mapping for SQL databases. We adapted our existing BN++ data warehouse to the new data model and rewrote the most important importers. Since our new data model is based on BioPAX and SBML, most databases can be directly imported. We also introduced a mechanism for storing different versions of source databases using a graph based versioning concept. This enables us to load only subsets of the complete network by defining explicit versions of source databases.
A new data model requires a lot of adaptations to the surrounding code, which we are currently working on. In the future, the server will provide different web services for searching, analyzing, and visualizing the network. On the client side, we are implementing a new version of our comprehensive visualization and analysis tool BiNA [3]. Because different versions of the source databases are integrated, we expect a massive enlargement of the stored data. Therefore, we plan to switch to an RDF database, which is better optimized for graph-based data. Most suitable for our purposes is RDF-3X [4], which vastly outperforms popular relational databases.

1. Demir E, Cary MP, Paley S, Fukuda K, Lemer C, Vastrik I, Wu G, D'Eustachio P, Schaefer C, Luciano J et al: The BioPAX community standard for pathway data sharing. Nat Biotechnol 2010, 28(9):935-942.
2. Hucka M, Finney A, Sauro HM, Bolouri H, Doyle JC, Kitano H, Arkin AP, Bornstein BJ, Bray D, Cornish-Bowden A et al: The systems biology markup language (SBML): a medium for representation and exchange of biochemical network models. Bioinformatics 2003, 19(4):524-531.
3. Küntzer J, Backes C, Blum T, Gerasch A, Kohlbacher O, Kaufmann M, Lenhof H-P: BNDB - The Biochemical Network Database. BMC Bioinformatics 2007, 8(1):367.
4. Neumann T, Weikum G: RDF-3X: a RISC-style engine for RDF. Proc VLDB Endow 2008, 1(1):647-659.

The project is supported by the Deutsche Forschungsgemeinschaft (SPP 1335 'Scalable Visual Analytics').

Poster 50
Expanding Bio-Linux to create a platform for analysis of community diversity and function
Tim Booth, Mesude Bicak, Dan Pass, Pete Kille, Dawn Field
NERC Centre for Ecology and Hydrology, Maclean Building, Benson Lane, Wallingford, UK, OX10 8BB
Cardiff School of Biosciences, Cardiff University, Main Building, Park Place, Cardiff, CF10 3AT
See the Bio-Linux homepage for all links and more info: http://nebc.nerc.ac.uk/tools/bio-linux

Environmental microbial diversity revealed by mass sequencing has been a hot topic since the groundbreaking Sargasso Sea study nearly a decade ago (Venter et al., Science). Now, with high throughput sequencing (HTS), even a modest lab can produce that volume of sequence and, from polluted mine water to the human gut to the waters of the Antarctic, researchers are ready to turn this powerful spotlight on new environments and discover the diversity and function of organisms that live there.

A variety of specialist open source tools (QIIME, AmpliconNoise, Mothur, jMOTU/taxonerator, the VEGAN R package, etc.) are rapidly being developed to make sense of the data and perform the many steps necessary to go from raw sequence of amplicons (i.e. 16S ribosomal barcodes, etc.) to meaningful analyses like diversity indices, rarefaction curves, and rank abundance. As well as analysis, users must QC, manipulate, store, and submit their data to public repositories, requiring knowledge of yet more software. Keeping pace with all the tools needed can be a bottleneck to researchers even as the actual sequencing becomes easier.

The Bio-Linux platform, based on Ubuntu, has been developed for over eight years as a free turnkey analysis solution for bioinformatics. To meet the needs of environmental researchers who must cope with data from HTS studies, we now include a bundle of key tools and analysis pipelines in Bio-Linux to support amplicon sequence analysis. We curate documentation and maintain a table of capabilities for each tool to allow users to select the appropriate ones and find which can work together. We work with Debian-Med (Möller et al., BMC Bio) to ensure our packaging work is contributed to the core Debian and Ubuntu repositories, and our packages are also added to CloudBioLinux (http://cloudbiolinux.org).

We have also been part of the development of a new minimum information standard for describing gene marker sequence data (MIMARKS; Yilmaz et al., Nature Biotechnology). In particular, Bio-Linux now includes software from the ISA project (http://isa-tools.org). This supports both the capture of MIMARKS-compliant annotations and the creation of online catalogues of datasets and assisted submission to the public repositories (here, EMBL-SRA).

Continued updating of the Bio-Linux amplicon analysis expert bundle will benefit the expanding community of researchers working in this domain. We maintain the software as a community project and invite participation from interested parties.

Poster 51
Study capturing: from research question to sample annotation
Kees van Bochove1,2, Jeroen Wesbeek8, Tjeerd Abma3, Jahn-Takeshi Saito4, Robert Horlings1, Siemen Sikkema7, Chris Evelo4, Ben van Ommen8 and Jildau Bouwman8
1 The Hyve, Utrecht, 2 NBIC BioAssist Engineering Team, 3 UMCU Metabolomics Centre, UMC Utrecht, 4 BiGCaT Department of Bioinformatics, Maastricht University, 5 Leiden/Amsterdam Centre for Drug Research, 6 Plant Research International, Wageningen University, 7 Biosystems Data Analysis Group, Swammerdam Institute for Life Sciences, University of Amsterdam, 8 TNO Quality of Life, Zeist.
Correspondence to Kees van Bochove - kees@thehyve.nl.

Abstract
A serious challenge of bioinformatics is the exchange of bioinformatics data and the accompanying scientific collaboration. The extent of data exchange differs between domains: for example, for transcriptomics data there are well-established repositories, whereas for e.g. metabolomics these are hard to find. However, there is one commonality between all bioinformatics investigations involving biology, and that is the presence of a study design, which is hard to capture in a database. Especially with complex studies, such as certain nutrigenomics intervention studies, with different timepoints, organs, samples, assays and other study factors, it becomes hard to keep track of what a datapoint or set of measurements means in the context of the study design. Therefore, we developed an open source web application (GSCF) that allows biologists to enter or upload their study design, along with other study metadata such as subject information, events or sample characteristics. We also developed a simple REST interface for hooking up actual 'omics' data to this study design, so that cross-omics queries can be established, and built technology modules implementing this interface for transcriptomics, metabolomics, next generation sequencing and clinical chemistry. This application originated in the NuGO dbNP and NMC DSP projects, and serves as a precursor and infrastructure node for the semantic web technologies that are developed within the OpenPHACTS project.

Screenshots
a. Study overview of an existing study | b. Subject information inclusion | c. An Excel importer for large datasets

Website: http://dbnp.org - Source code: http://trac.nbic.nl/gscf (Apache 2.0 license)

Poster 52