Dr. Jason Gallant was a featured guest on the Mid-Michigan Pet Expert Talk Show, hosted by Lee Cohen and Rick Preuss. The show was broadcast on September 12th, 2015, on 1320 AM WILS Lansing, and the interview segments are also posted here. Thanks Rick and Lee for a great conversation!
We are pleased to announce that our new postdoctoral research associate, Will Pitchers, joined the lab on August 1st, 2015. Will comes from Ian Dworkin’s lab, where he worked on the evolutionary genomics of wing shape in Drosophila. He received his Ph.D. from the University of Exeter in 2010 for work on crickets, with a thesis entitled “Trait Integration as a Constraint on Phenotypic Evolution”. Will has published many scientific papers on genetics and evolution and will be a very valuable member of our team. Welcome Will!
A National Science Foundation postdoctoral position is now available in the Electric Fish Laboratory in the Department of Integrative Biology at Michigan State University (http://efish.zoology.msu.edu) to study the genomic basis of electric signal diversity in mormyrid electric fish. Electric signals are crucial to the evolution of reproductive isolation within the rapidly diverging genus Paramormyrops. This interdisciplinary project will combine gene expression analysis (RNA-seq), population genomics, and new and exciting transgenic techniques in non-model systems. For an overview of our approaches, see Gallant et al. (2014, Science) and Gallant et al. (2014, Nature Communications).
This position is available as of May 2015, but start dates are negotiable. The appointment will initially be for 12 months, with the possibility of renewal for up to three years, contingent on the continued availability of funds and on satisfactory output from the successful applicant.
Duties: The successful applicant will be responsible for constructing genomic and transcriptomic libraries, as well as assembling and analyzing them with bioinformatics tools in a high-performance computing environment. The work will require developing new research methodologies and tools, and publishing results in high-impact journals. In addition, the successful applicant will be responsible for co-training undergraduate and graduate students and technicians, and for contributing to the overall research environment of the University. The successful applicant will be encouraged to participate in BEACON, a one-of-a-kind NSF-sponsored center for the study of evolution in action (http://bit.ly/GN0Rhx), for which MSU is the host institution. Professional development opportunities will be provided in conjunction with the MSU Office of Postdoctoral Training and the Center for Academic Excellence.
Required qualifications: Ph.D. or equivalent degree in biology, evolution, bioinformatics, genetics or a related field. Publication of work based on the Ph.D. thesis is required, as are teaching and supervisory experience. Applicants must have a strong working knowledge of next-generation sequencing technologies and proficiency with the UNIX/Linux command line, as well as competence in at least one scripting language (R, Python, Perl, MATLAB). Must be willing and able to perform fieldwork in Gabon, West-Central Africa.
Preferred qualifications: Experience with genomic assembly and analysis, population genetics, experience and background in communication systems and aquatic vertebrates, fluency or competency in French.
To apply, email the following: a cover letter, current CV, and the contact information (phone number and email address) of three referees to jgallant [at] msu.edu.
The reception featured a discussion by Patrick Hsu of Editas Medicine, formerly of Harvard (a featured speaker at last year’s CRISPR Symposium), as well as shorter talks by David Arnosti, Kathy Meek, Eran Andrechek, Keith Latham and Jason Gallant.
We’ve been trying to breed Pollimyrus isidori, which are reputed to be good parents, in the lab for the last couple of months. Notwithstanding some pH problems we’ve had over that time, we’ve finally witnessed the fruits of our labor. With another big clutch of eggs this morning, we now have more than 200 individuals!
To my surprise, the 50 or so fry we found yesterday afternoon all seemed to be hanging out near their dad– certainly not a random distribution. Watching dad for a few minutes revealed a very interesting behavior: whenever a fry (which at this early stage are quite pathetic swimmers) managed to get away from the nest, Dad would come by and scoop it up in his mouth. Initially, I thought he was eating the eggs, but then I saw that after a minute or so, he would swim back to his nest and spit it back out! Armed with my iPhone, I got some video of the pickup behavior. Most of the time, he put the eggs where I couldn’t record video, so the video shows the pickup of one egg and the dropoff of another:
This morning, Monica and I found many more fresh eggs. We took them up to count, then returned a portion of them (about 50 in all) to the tank in a petri dish to see what the father would do. We left the lights off for about 2 hours, and when I came back, all of the eggs were tucked safely back in the nest! Truly impressive behavior!
Savvas Constantinou (BS Trinity College ’12; MS Central Connecticut State University ’14) will be joining the Ph.D. program in Integrative Biology at Michigan State University in Fall 2015. Savvas did his master’s thesis working on classical conditioning in planarian flatworms, and is currently a technician in the laboratory of Terri Williams at Trinity College. We are very excited to have him join our lab, where he will be working on some exciting projects on the evolution and development of electric organs! Welcome Savvas!
I’ve been running an experiment to use AWS in some of our bioinformatics projects. After surveying some nice reviews (Yandell and Ence, 2012), I decided on the MAKER pipeline to get things started. The basic approach is to take orthologous proteins from other organisms, together with existing transcriptome data (assembled by Cufflinks), and use them to pull out putative coding regions that train an ab initio gene predictor (such as SNAP or Augustus) to build gene models. Because aligning and processing this data involves numerous BLAST steps, the folks in the Yandell lab parallelized MAKER using either OpenMPI or MPICH2. Although we have an installation of MAKER on campus, it hasn’t been updated in a while, and there is something of a queue to run very long jobs with lots of processors, so I wondered if there were other options.
Many folks have touted the virtues of Amazon Web Services (AWS) for bioinformatics (e.g. here)– it is great for making portable machines for bioinformatics analysis. For the uninitiated, AWS ‘leases’ computing time to anyone with a credit card for the purposes of data crunching. It’s tremendously handy, but bioinformaticians typically deploy only a single machine for their analysis, which doesn’t take advantage of MAKER’s MPI capabilities. Certainly, someone with a strong background in computer science or programming could set up a cluster on AWS, but who has the time? Some quick googling uncovered a tremendous software package called StarCluster, essentially a script that takes all of the pain out of this process. It was written by folks at MIT, and with some minor edits to a configuration file, you can get your own cluster up and running in a matter of minutes.
What follows is a collection of links and advice for getting MAKER running on AWS. The system I’ve assembled through a lot of trial and error works fairly reliably on large eukaryotic genomes. By my estimation, a full run (including repeat masking, protein BLASTs, ab initio gene prediction, etc.) takes about 7-8 hours on 200 nodes with 4 processors each. Not too shabby.
Step 1: Set up Starcluster and Amazon EC2
Rather than reinvent the wheel, check out Wes Kendall’s awesome tutorial on how to set up AWS and Starcluster to launch your first cluster.
Starcluster also maintains a quick start guide that is quite nice if you’d like to follow it instead, though it lacks information on getting set up with AWS if you’ve never used it before.
Once you’ve started and terminated your first cluster, come back here and we will play with the config files.
Step 2: Set up region and data volumes
You’ll want to put your data somewhere. The instances you create come with what is called “ephemeral” storage, meaning its contents do not survive once the instance is shut down. To keep your data, you’ll have to create a volume to store it, and EC2 makes this very easy. Go to your EC2 dashboard and click on “Volumes” under Elastic Block Store. Click “Create Volume” at the top. Choose a General Purpose (SSD) drive and request an appropriate size: it needs to accommodate both the data you are using in MAKER and the output data. My data drive is 300GB, which is more than ample room (probably overkill). Pricing is per GB and varies regionally; the information on pricing is located here. You’ll also need to choose an availability zone to work within, because EBS volumes are not accessible outside their availability zone. Which zone you choose doesn’t matter, but remember it. When you click “Create” you’ll be returned to a list of the volumes you’ve created; once creation is complete, you can give your volume a descriptive name (e.g. “data”). Note the volume ID.
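If you’d rather script this step than click through the console, here is a minimal Python sketch using boto3, Amazon’s Python SDK (which the rest of this post doesn’t use). It assumes boto3 is installed and your AWS credentials are configured; the zone, 300GB size and “data” name are just the example values from above. The EC2 client is passed in, so the request can be inspected before anything is created.

```python
def volume_request(zone="us-east-1d", size_gb=300, name="data"):
    """Build the keyword arguments for EC2's CreateVolume call."""
    return {
        "AvailabilityZone": zone,   # the volume is only usable from this zone
        "Size": size_gb,            # size in GiB; storage is billed per GB
        "VolumeType": "gp2",        # General Purpose (SSD)
        "TagSpecifications": [{
            "ResourceType": "volume",
            "Tags": [{"Key": "Name", "Value": name}],  # the descriptive name
        }],
    }


def create_data_volume(ec2_client, **kwargs):
    """Create the volume and return its ID (note it for the StarCluster config)."""
    response = ec2_client.create_volume(**volume_request(**kwargs))
    return response["VolumeId"]
```

For example, `create_data_volume(boto3.client("ec2", region_name="us-east-1"))` would create the volume and return its ID.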
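Next, you’ll want StarCluster to play nicely with your newly created volume. To do this, edit the region settings in the [aws info] section of your configuration file so that they point to the region containing your volume’s availability zone. Note that the region name omits the zone’s trailing letter: a volume in us-east-1d lives in region us-east-1.
[aws info]
# region containing your data volume's availability zone (us-east-1d here)
AWS_REGION_NAME = us-east-1
# endpoint for that region (see Amazon's list, linked below)
AWS_REGION_HOST = ec2.us-east-1.amazonaws.com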
The region hosts are maintained on this list by Amazon.
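Since it is easy to confuse a region with one of its availability zones, here is a small Python sketch (the helper names are hypothetical) that derives StarCluster’s two [aws info] region values from the zone your volume lives in. It assumes standard AWS naming, where a zone is its region name plus a trailing letter and the endpoint is ec2.<region>.amazonaws.com; some partitions, such as the China regions, use different endpoints.

```python
import re

# A zone name is a region name plus one trailing letter, e.g. "us-east-1" + "d".
_ZONE_PATTERN = re.compile(r"([a-z]{2}(?:-[a-z]+)+-\d+)[a-z]")


def region_for_zone(zone):
    """Return the region that contains an availability zone."""
    match = _ZONE_PATTERN.fullmatch(zone)
    if match is None:
        raise ValueError(f"not an availability-zone name: {zone!r}")
    return match.group(1)


def aws_info_settings(zone):
    """Build the StarCluster [aws info] region settings for a volume's zone."""
    region = region_for_zone(zone)
    return {
        "AWS_REGION_NAME": region,
        "AWS_REGION_HOST": f"ec2.{region}.amazonaws.com",
    }


if __name__ == "__main__":
    for key, value in aws_info_settings("us-east-1d").items():
        print(f"{key} = {value}")
```

Passing a region name such as us-east-1 raises a ValueError, which is exactly the mix-up this is meant to catch.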
Step 3: Configure MPI and Install MAKER onto an AMI
With that out of the way, we can get down to the business of running MAKER on your new cluster. To start with, we want to install MAKER on our machine. A nice overview of the process is located here:
The first 64-bit image should work perfectly fine for our purposes. Noting the AMI number (ami-3393a45a), we will tell starcluster to start a one-node cluster for us to download our software on. We’ll use an m1.xlarge machine for this, although it doesn’t really matter so long as it is a 64-bit machine.
Once starcluster is done, you can connect with the following command:
starcluster sshmaster software_install
It is important to note that you are logged in as the “root” user when connecting this way. The root user’s home directory (“/root”) is NOT shared via NFS with the other nodes in the cluster, which means trouble when using MPI with programs installed there. The home folder of the cluster user (set in your configuration file, “sgeadmin” by default) is, however, shared via NFS, so that’s where you want to put MAKER. Go to the following website to register and obtain the download link for MAKER:
We can see here that the path to libmpi.so is: “/usr/lib/openmpi/lib/libmpi.so”. You’ll need to set some environment variables next, before you go any further. Open ~/.bashrc with your favorite text editor and add the following lines at the bottom:
Close your .bashrc file and reload it by typing the following:
source ~/.bashrc
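Finding libmpi.so and producing the export line can also be scripted. The Python sketch below is only an illustration: it assumes the key variable is LD_PRELOAD pointed at libmpi.so, as is commonly done for MPI-enabled MAKER under OpenMPI, and the search directories are guesses for a typical Ubuntu-style image like the one used here.

```python
from pathlib import Path

# Guessed locations for an OpenMPI install; adjust for your image.
DEFAULT_ROOTS = ("/usr/lib", "/usr/lib64", "/usr/local/lib")


def find_libmpi(roots=DEFAULT_ROOTS):
    """Return the first libmpi.so found (searching recursively), or None."""
    for root in roots:
        base = Path(root)
        if not base.is_dir():
            continue  # skip directories that don't exist on this machine
        matches = sorted(base.rglob("libmpi.so"))
        if matches:
            return matches[0]
    return None


def preload_line(lib_path):
    """The line to append to ~/.bashrc so MAKER's MPI build loads libmpi."""
    return f"export LD_PRELOAD={lib_path}"
```

On the master node, `lib = find_libmpi()` followed by `print(preload_line(lib))` (when lib is not None) should print a line built from the same path found above.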
Now we’re ready to rock! You should be able to install Maker by following the standard build instructions included in the distribution:
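## In the maker/src directory of the MAKER distribution:
perl Build.PL
## Follow command prompts.
## Choose Y for Maker for MPI, and use default paths for MPI.
./Build installdeps
#Note that you'll have to register at repbase in order to download
#the repeatmasker libraries
./Build installexes
./Build install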
You should now have a working installation of MAKER!
Once you’re satisfied that everything is installed correctly, you’ll want to save the image in order to use it on your cluster. Go to the AWS EC2 control panel, and click on “Instances”. Select your “master” node from the list, and click “Actions>Image>Create Image”. Give your image a name and description, and click “Create Image”. Note that any volumes attached to the machine will also be imaged by default. You can remove these safely to make the smallest image possible (click the little x’s).
Once the image is completed, you’ll be able to find the AMI id under “Images>AMIs”.
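The same imaging step can be scripted with boto3, again assuming it is installed and configured. In this sketch the instance ID, image name and device names are placeholders; each device listed in drop_devices is mapped to NoDevice, which leaves that extra volume out of the image (the scripted equivalent of clicking the little x’s).

```python
def image_request(instance_id, name, description, drop_devices=()):
    """Build CreateImage arguments, excluding the listed block devices."""
    request = {"InstanceId": instance_id, "Name": name, "Description": description}
    if drop_devices:
        # An empty NoDevice string omits that device from the new image.
        request["BlockDeviceMappings"] = [
            {"DeviceName": device, "NoDevice": ""} for device in drop_devices
        ]
    return request


def create_maker_image(ec2_client, **kwargs):
    """Start imaging the master node and return the new AMI ID."""
    response = ec2_client.create_image(**image_request(**kwargs))
    return response["ImageId"]
```

For example, `create_maker_image(boto3.client("ec2"), instance_id="i-0123456789abcdef0", name="maker-mpi", description="MAKER with OpenMPI", drop_devices=["/dev/sdz"])`, where every value is hypothetical.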
Step 4: Configure Your Cluster
You’re going to want to make some changes to the starcluster configuration file. Cluster configuration is based on a template system which is fully extensible, which means you don’t have to retype all of the configuration options, only the ones you want to change. We’ll base our new template on the “default” small cluster template. Inside your “Additional Cluster Templates” section, add the following:
# insert the AMI ID of the image containing your MAKER installation
NODE_IMAGE_ID = XX
In addition, you’ll want to tell starcluster where to find the EBS data volume you created earlier. Find the “Configuring EBS Volumes” section in your config file and add the following:
# volume ID for your EBS 'data' volume
VOLUME_ID =
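For reference, here is a Python sketch that writes out one plausible version of the complete template and volume sections using the standard configparser module. The section and option names are StarCluster’s, but the values are assumptions pieced together from elsewhere in this post (a 200-node m1.xlarge cluster based on smallcluster, with the volume mounted at /mnt/data), and the AMI and volume IDs are placeholders. The output is meant to be pasted into ~/.starcluster/config.

```python
import configparser
import io


def makercluster_config(ami_id, volume_id, size=200, instance_type="m1.xlarge"):
    """Return config text for a 'makercluster' template plus a 'data' volume."""
    config = configparser.ConfigParser()
    config.optionxform = str  # keep StarCluster's upper-case option names
    config["cluster makercluster"] = {
        "EXTENDS": "smallcluster",          # inherit everything else from the default
        "CLUSTER_SIZE": str(size),
        "NODE_INSTANCE_TYPE": instance_type,
        "NODE_IMAGE_ID": ami_id,            # the AMI with MAKER installed
        "VOLUMES": "data",                  # refers to the [volume data] section
    }
    config["volume data"] = {
        "VOLUME_ID": volume_id,
        "MOUNT_PATH": "/mnt/data",
    }
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


if __name__ == "__main__":
    print(makercluster_config("ami-xxxxxxxx", "vol-xxxxxxxx"))
```

Because the template EXTENDS smallcluster, anything not listed here (keys, user name, and so on) still comes from the default template.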
You’re ready to rock!
Step 5: Setup Your Cluster
Notes on “Hardware” and Pricing
First things first, you’ll want to select the “hardware” (known as an instance) that you’ll be making your cluster out of. The specs and pricing per hour are located here:
This can be somewhat overwhelming at first– but worry not! For my experimentation purposes, I’ve been using m1.xlarge resources, which feature four virtual CPUs, about 15GB of RAM and decent network performance. You can get pretty crazy with RAM and CPUs, but this typically suffices. Feel free to play with this as you see fit!
You have the option of paying the listed “on-demand” rates, meaning that as long as you want to use the machine, it’s yours! But as scientists with ever-shrinking grant funds, it’s a good idea to minimize costs whenever possible. Enter Amazon’s “Spot Pricing”. Here you bid on computing time, and as long as the market price does not rise above your bid, you can use the resources you request. There is a bit of an art to selecting the proper price so your jobs don’t die, but thankfully Amazon keeps track of what historical pricing has been. StarCluster makes it pretty easy to check this for your desired hardware:
Which will open up a webpage with pricing over the last 30 days, as well as print a summary of the pertinent information to your terminal:
>>> Fetching spot history for m1.large (VPC)
So, were you to bid on these computers, you would pay $0.0161 per instance per hour– a significant savings off the current “on-demand” price of $0.175 per hour (roughly a tenth of the price, in fact!). If demand for these computers shoots up (as you can see happens periodically in the graph), the price goes up. If your bid still exceeds the new price, you keep running at the new price; if not, your instance is terminated. The trick is to bid high enough to keep your jobs running. I’ve typically been bidding approximately double the current price, and that seems to work fairly well.
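The arithmetic behind this strategy is easy to sketch in Python. The prices are the m1.large figures quoted above (spot $0.0161 vs. on-demand $0.175 per hour at the time of writing), the 200 nodes and 8 hours come from the run estimate at the top of this post, and the doubling is the rule of thumb just described; since the cluster here actually uses m1.xlarge machines, treat the totals as illustrative. Note that the bid is a ceiling: while it holds, you pay the going spot price, not your bid.

```python
def spot_bid(current_price, multiplier=2.0):
    """Bid a multiple of the current spot price (rounded to 1/100 of a cent)."""
    return round(current_price * multiplier, 4)


def cluster_cost(price_per_hour, nodes=200, hours=8):
    """Cost of running `nodes` instances for `hours` hours at a fixed hourly price."""
    return nodes * hours * price_per_hour


if __name__ == "__main__":
    spot, on_demand = 0.0161, 0.175
    bid = spot_bid(spot)
    print(f"bid: ${bid:.4f}/hour per instance")
    print(f"at the spot price: ${cluster_cost(spot):.2f}")
    print(f"worst case at bid: ${cluster_cost(bid):.2f}")
    print(f"on-demand:         ${cluster_cost(on_demand):.2f}")
```

With these numbers it prints a bid of $0.0322 and $25.76 at the spot price (at most $51.52 if the price climbed all the way to the bid), versus $280.00 on demand.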
Launch your cluster with the following command. Remember, we’ve set a 200-node cluster in our configuration file. The -b flag sets your bid price, and the -c tells starcluster to use your makercluster template. Because there are so many nodes, this might take a while!
Once everything is started up, you are ready to run MAKER. You can upload data from your personal hard drive to the /mnt/data storage area with the following command:
starcluster put makercluster /local/path/to/datafile /mnt/data
Once you’ve uploaded your data, go ahead and log in to your master node:
starcluster sshmaster makercluster
Create a directory for your MAKER data to live in, change into it, and run the following to generate your MAKER control files:
maker -CTL
Specifically for your maker cluster, you want to reduce the chunk size of the DNA to be more RAM-efficient and set the temporary directory to be a location specific to each node, rather than using NFS (which slows things down and creates headaches). In the maker_opts.ctl file, set the following options:
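These edits can be made by hand or with a small Python sketch like the one below. It assumes the two relevant options in maker_opts.ctl are max_dna_len (the length contigs are chunked into, which governs memory use) and TMP (the temporary directory). The values in the usage example, 50000 and /tmp, are hypothetical, with /tmp assumed to be local to each node rather than NFS-shared. Trailing # comments on each option line are kept.

```python
import re

# key=value, optionally followed by a "#" comment, as in MAKER's .ctl files
_OPTION_LINE = re.compile(r"^(\w+)=(.*?)(\s*#.*)?$")


def set_ctl_options(text, **options):
    """Return control-file text with the named options set to new values."""
    lines = []
    for line in text.splitlines():
        match = _OPTION_LINE.match(line)
        if match and match.group(1) in options:
            key, comment = match.group(1), match.group(3) or ""
            line = f"{key}={options[key]}{comment}"
        lines.append(line)
    return "\n".join(lines) + "\n"


def update_ctl_file(path, **options):
    """Rewrite a control file such as maker_opts.ctl in place."""
    with open(path) as handle:
        text = handle.read()
    with open(path, "w") as handle:
        handle.write(set_ctl_options(text, **options))
```

For example: `update_ctl_file("maker_opts.ctl", max_dna_len=50000, TMP="/tmp")`.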
Finally, you’ll have to generate a hosts file so that OpenMPI knows how to farm out the jobs:
Which looks like:
# The following lines are desirable for IPv6 capable hosts
Copy this into a text editor and remove all of the lines before the line containing “master”. Then remove all of the IP addresses, so that what remains is a plain list of hostnames, one per line:
Save this text file as “hosts.txt” in the maker output directory.
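The same cleanup can be done with a short Python sketch. It assumes each entry in the hosts output has the form “IP-address hostname [aliases]” and keeps the first name on every entry from the master line onward. The optional slots argument, which writes OpenMPI’s “hostname slots=N” form, is an addition not used above; it may help if your OpenMPI version would otherwise assume one slot per node.

```python
def cluster_hostnames(etc_hosts_text):
    """Hostnames from the 'master' entry onward, with IP addresses dropped."""
    names, started = [], False
    for line in etc_hosts_text.splitlines():
        fields = line.split("#", 1)[0].split()  # ignore comments
        if len(fields) < 2:
            continue  # blank or comment-only line
        if not started and "master" not in fields[1:]:
            continue  # still above the master entry
        started = True
        names.append(fields[1])  # first hostname after the IP address
    return names


def hostfile_text(names, slots=None):
    """One hostname per line, optionally with an OpenMPI slot count."""
    if slots:
        return "".join(f"{name} slots={slots}\n" for name in names)
    return "".join(f"{name}\n" for name in names)
```

On the master node, read /etc/hosts, pass its text to cluster_hostnames, and write hostfile_text(names) to hosts.txt in the MAKER output directory.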
Once you’re satisfied that everything is set up correctly, you should be able to run MAKER with a command like the following:
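As a rough sketch of such a command, here is a Python helper (Python 3.8+ for shlex.join) that assembles and launches it. It assumes OpenMPI’s mpiexec with its -n and -hostfile options, that maker is on the PATH, and that it is run from the directory holding the maker_*.ctl files and hosts.txt. One MPI process per virtual CPU gives 200 × 4 = 800 processes for the cluster described here.

```python
import shlex
import subprocess


def maker_mpi_command(nodes=200, cpus_per_node=4, hostfile="hosts.txt"):
    """mpiexec command line with one MAKER process per virtual CPU."""
    return ["mpiexec", "-n", str(nodes * cpus_per_node), "-hostfile", hostfile, "maker"]


def run_maker(**kwargs):
    """Launch MAKER under MPI and wait for it to finish (raises on failure)."""
    command = maker_mpi_command(**kwargs)
    print("running:", shlex.join(command))
    subprocess.run(command, check=True)
```

Printing `shlex.join(maker_mpi_command())` gives `mpiexec -n 800 -hostfile hosts.txt maker`, which can also be typed directly at the shell.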
Jason Gallant, assistant professor of Zoology, received a $699,000 grant from the National Science Foundation to investigate the genomic basis of electric signal diversity in mormyrid electric fish. Electric fish, such as mormyrids, produce weak electric fields for communication and for navigating their environments. As part of this three-year research project starting in May 2015, Gallant will leverage his recent discovery of a ‘hybrid zone’ between populations of electric fish with distinct electric signals to identify the genes responsible for differences in those signals. The project will draw on next-generation genomic sequencing technologies, as well as the development of new transgenic techniques in electric fish. This research is important because electric discharges are a critical component in the speciation of mormyrid electric fish. Identifying genes responsible for behavioral differences within species will ultimately help biologists understand how changes in behavior can facilitate, or perhaps cause, one species to become multiple species. In connection with the work, Gallant’s laboratory will bring a new educational outreach program, focusing on “forms of energy”, to middle school students in Olivet, MI.
Perhaps the nerdiest Christmas decoration I’ve ever put up: here is the official electric fish Christmas tree!
This Christmas tree is not “powered” by an electric fish; rather, it is controlled by one. Weakly electric fish continuously produce pulses of electricity for communication and navigation in their environments, and every time the fish produces a pulse, the tree lights up.
This is done with a little RadioShack magic– an Arduino board with a relay shield is all you need!
(1) Arduino Uno (or equivalent)
(2) Seeed Studio Relay Shield
(3) Old extension cable (two-prong)
(4) Amplifier & Electrode (great info here on some low-cost DIY options)
(5) Code below (suggestions for improvement are welcome!)
Take an old indoor extension cable and carefully splice one of the relays into it. Next, connect a simple amplifier to an electrode and drop the electrode in the water with the electric fish. Connect the amplifier output to one of the Arduino’s analog input pins and to its ground. Fire up your Arduino editor and load the following code:
Electric Fish Christmas Tree 1.0
Analog Input with prescale change (thanks to jmknapp)
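The sketch’s header credits a prescale change to analog input. As a hedged illustration (a Python model, not the original Arduino code), here are the two ideas involved. First, on an Arduino Uno (ATmega328P at 16 MHz) each analog conversion takes 13 ADC clock cycles, and the Arduino core’s default prescaler of 128 caps analogRead at roughly 9.6 thousand conversions per second, which may be too slow to catch a brief electric organ discharge; a smaller prescaler such as 16 raises that to roughly 77 thousand, at some cost in resolution. Second, the trigger: the relay fires each time the amplified signal rises through a threshold. The threshold and sample readings below are hypothetical.

```python
CPU_CLOCK_HZ = 16_000_000     # Arduino Uno system clock
CYCLES_PER_CONVERSION = 13    # ADC clocks per normal conversion on the ATmega328P


def adc_sample_rate(prescaler):
    """Upper bound on ADC conversions per second for a given prescaler."""
    return CPU_CLOCK_HZ / prescaler / CYCLES_PER_CONVERSION


def pulse_onsets(samples, threshold):
    """Indices where the signal rises through the threshold (one per pulse)."""
    return [
        i for i in range(1, len(samples))
        if samples[i - 1] < threshold <= samples[i]
    ]


if __name__ == "__main__":
    for prescaler in (128, 16):
        print(f"prescaler {prescaler}: ~{adc_sample_rate(prescaler):,.0f} samples/s")
    # Hypothetical 10-bit readings containing two discharges
    readings = [512, 515, 890, 700, 520, 511, 905, 530]
    print("relay pulses at samples", pulse_onsets(readings, threshold=800))
```

Triggering only on the upward crossing means one relay pulse per discharge, even when a discharge spans several samples above the threshold.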