Katie Baker

2015 MECEA PhD winner

So you want to learn bioinformatics?

When I started my PhD at the University of Dundee, I spent about a year in the lab developing a ChIP-seq (chromatin immunoprecipitation followed by Next Generation Sequencing) protocol for barley. Foreseeing a bioinformatics* bottleneck, with the encouragement of my supervisor Prof. Andy Flavell, I set about learning Linux command line computing, data mining and the Java programming language. Here, I’d like to describe some of my experiences and relay some advice as a biologist-turned-bioinformatician.

My first encounter with Linux came when I used a program called MCScanX to identify ancient gene pairs in barley. Quite unusually, the program installed without a hitch, ran quickly and efficiently and produced beautiful results (a large proportion of the time spent doing bioinformatics goes on installing, trialling and debugging programs that have terrible documentation, devour RAM or just will not work). I was hooked from that moment onwards and I took every opportunity I could to develop my bioinformatics, data analysis and programming skills.

For a biologist learning bioinformatics, there is no fast track. Courses can be great for focusing the mind, but the real work lies in teaching yourself how to use bioinformatics programs and how to write scripts and code. I was lucky to have the guidance of Dr. Micha Bayer at the James Hutton Institute, who set me off on the track of read mapping, SNP calling and Java programming. Being able to discuss data analysis or programming issues with an experienced bioinformatician is one of the most valuable sources of help for anyone learning bioinformatics.

The most important attribute for success in bioinformatics is perseverance. Trial and error, identifying mistakes and debugging programs were an important part of learning the trade. Given enough time and effort, any willing biologist is capable of performing command line NGS analysis. Bioinformatics is a fast-paced field, with new programs and methodologies being developed all the time to keep up with data generation. Awareness of the available tools is critical, but if you can’t find a tool for a task, with the right skills you can build your own. During my final year I developed multiple tools for different tasks, including ordering barley contigs, estimating recombination rates and plotting distributions of transposable elements.

Bioinformatics should be an integral part of an undergraduate education, arming the next generation of biologists with the tools to handle “big data”. Future challenges in crop improvement may be met with the help of NGS data analysis. Indeed, these skills are valuable in many career paths, and data scientists are in high demand. Monogram 2015 was great for relating my NGS research to the wider cereal community and discussing the opportunities bioinformatics may hold for crop improvement. I feel lucky to be working in such a dynamic, exciting field and I encourage interested biologists to take the plunge and develop their bioinformatics skills.

* Bioinformatics is a large field, but my specialisation is NGS, so when I refer to “bioinformatics” I mean the computational approaches to NGS data handling and analysis.