Homework
1. Linux install and BASH
Installation of a Linux environment. Most bioinformatics is done on the Linux operating system rather than, for example, Windows. Ubuntu is a modern Linux distribution which is free and widely used. If you run Windows, you can install an Ubuntu terminal directly with the Windows Subsystem for Linux (WSL).
- Open the Windows Store on your machine and install Ubuntu 22.04 (not 20.04 and not 24.04!).
- Press the Windows key, type ‘cmd’, then press Enter. This will open the Windows command line.
- Type this and then enter:
- wsl --unregister ubuntu
- Type this and then enter:
- wsl --install
- Enter a username and password you can remember (mine is the same as my university account).
- Now open your Ubuntu terminal!
When you have successfully installed Ubuntu, make sure you update it with these two commands:
sudo apt update
sudo apt upgrade
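If you want to double-check which release you ended up with, the sketch below may help. It assumes the standard /etc/os-release file that Ubuntu ships; check_release is a made-up helper name for this handout, not a built-in command.

```shell
#!/usr/bin/env bash
# check_release is a hypothetical helper (not a standard command).
# It reads an os-release style file and reports the release number.
check_release() {
    local file="${1:-/etc/os-release}"   # default: the real system file
    if [ ! -r "$file" ]; then
        echo "cannot read $file"
        return 1
    fi
    # Use a subshell so the variables from the file do not leak
    # into your terminal session.
    (
        . "$file"
        if [ "$VERSION_ID" = "22.04" ]; then
            echo "OK: release $VERSION_ID"
        else
            echo "Unexpected release: ${VERSION_ID:-unknown}"
        fi
    )
}
```

After pasting the function into your terminal, run check_release with no arguments to test your own system.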
**IMPORTANT**
THE COMMANDS OF THE COURSE ARE SPECIFICALLY MADE FOR A WINDOWS SYSTEM RUNNING THE WINDOWS SUBSYSTEM FOR LINUX (WSL) AS INSTALLED ABOVE – I STRONGLY SUGGEST THAT YOU ONLY USE THAT
IF YOU HAVE AN OLD VERSION OF UBUNTU ON YOUR WINDOWS, GET RID OF IT AND INSTALL AS ABOVE
IF YOU HAVE UBUNTU THROUGH VIRTUAL BOX, GET RID OF IT AND INSTALL AS ABOVE
IF YOU HAVE ANYTHING OTHER THAN AS DESCRIBED ABOVE, GET RID OF IT AND REINSTALL AS ABOVE
WE ALWAYS SPEND WAY TOO MUCH TIME TRYING TO MAKE ALTERNATIVES WORK, SO PLEASE JUST DO AS INSTRUCTED
**EVEN MORE IMPORTANT**
PLEASE JUST DO THE ABOVE, IT WILL SAVE YOU SO MUCH TIME
For macOS users: macOS is not Linux, but it is built on a Unix-like system, so the command line works in much the same way. You have access to the command line through a standard program called ‘Terminal’. The work we will be doing should all work here (with some minor differences).
Email me if you have any issues.
Regardless of your operating system, make sure you have a working terminal and then do a Linux tutorial here:
https://app.datacamp.com/learn/courses/introduction-to-bash-scripting
The basic programming language of the Linux command line is called bash, and we will be using this extensively. If you show up with no knowledge of this, you will probably not learn a whole lot, so do the tutorial and play around as much as you can.
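To give a feel for what playing around can look like, here is a small sketch. The file name samples.txt and the sample names are made up for illustration; everything happens in a temporary folder so nothing in your home directory is touched.

```shell
#!/usr/bin/env bash
# Work in a throwaway directory.
workdir=$(mktemp -d)
cd "$workdir"

# Write three lines into a file: '>' overwrites, '>>' appends.
echo "sample_A" > samples.txt
echo "sample_B" >> samples.txt
echo "sample_C" >> samples.txt

# Count the lines by feeding the file into wc, and store the result.
n_samples=$(wc -l < samples.txt)
echo "Number of samples: $n_samples"

# Loop over the file one line at a time.
while read -r name; do
    echo "Processing $name"
done < samples.txt
```

Try changing the names, adding a fourth sample, or replacing the echo inside the loop with another command.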
2. Conda installation
One of the trickiest parts of bioinformatics is the installation of packages. Package A needs version X of package B, but package C needs version Y of package B, and the two versions cannot be installed side by side. Luckily, the conda package manager takes care of this for us by working out the details of these dependencies, and it allows us to make individual ‘environments’ for each set of packages for each analysis.
Follow the instructions here (go all the way to the bottom and remember you are now running Linux and not Windows). Copy and paste (and run!) each command individually.
https://docs.anaconda.com/miniconda/
For Mac users, you should obviously use the macOS instructions.
Close your terminal and open it again to finalize the installation.
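A quick way to see whether the installation worked is sketched below. It only assumes that the installer put the conda command on your PATH; the variable name conda_status is made up for this example.

```shell
#!/usr/bin/env bash
# Check whether the conda command can be found in this terminal.
if command -v conda >/dev/null 2>&1; then
    conda_status="found"
    conda --version     # print the installed conda version
    conda env list      # list existing environments (at least 'base')
else
    conda_status="missing"
    echo "conda not found: reopen the terminal or repeat the installation"
fi
echo "conda is $conda_status"
```

Once conda is found, you can try making an empty environment with `conda create --name homework_test`, enter it with `conda activate homework_test`, and leave it again with `conda deactivate`. The name homework_test is just an example.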
3. Installation of R and Rstudio
We need the programming language R for the metataxonomy. R is a statistically minded programming language, which is great for statistics and for plotting – all my statistics and plots are made in R. We will use a great integrated development environment (IDE) for R called RStudio. RStudio makes writing and running code exceptionally easy. RStudio runs directly in Windows (or macOS), so don’t mix it up with Linux!
First you install R:
Then you install RStudio