Python FAQ
Workshop: using Python at IAC
Slides of the Using Python at IAC workshop (11.12.2019)
General
Warning: do not automatically activate an environment in .bashrc
- Automatically activating an environment (e.g.
source activate iacpy3_2019
) in your .bashrc
can break your login
- If you cannot log in and suspect it is because of this issue
- open an alternative (non-graphical) shell with
CTRL + ALT + F1
- edit your
.bashrc
and remove the line with source activate environment
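To check whether a .bashrc contains such a line before it causes trouble, a small scan can help. This is a minimal sketch, not an official IAC tool: the function name `find_activate_lines` and the pattern are assumptions made for illustration, and it only looks for uncommented `source activate`, `conda activate` or `mamba activate` lines.

```python
import re

# Matches uncommented lines that activate an environment, e.g.
# "source activate iacpy3_2019" or "conda activate myenv".
ACTIVATE_PATTERN = re.compile(r"^\s*(source|conda|mamba)\s+activate\b")


def find_activate_lines(text):
    """Return (line_number, line) pairs for lines that activate an environment."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        if ACTIVATE_PATTERN.match(line):
            hits.append((number, line.strip()))
    return hits
```

Passing the contents of your .bashrc (for example read with `pathlib.Path.home().joinpath(".bashrc").read_text()`) returns the offending lines together with their line numbers.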
Also check the docs on JupyterHub!
JupyterHub
Should I use Python?
Tutorials
Which version should I use? python 2 or python 3?
- You should use python 3 for every new project.
- Note: Python 2 reached its end-of-life on 1 January 2020. It no longer receives bugfixes. Most projects (numpy, matplotlib, ...) have stopped supporting python 2!
What's the recommended way to use python at IAC?
- We recommend using
conda
to manage packages and environments. See below.
What about other solutions?
- Python installed on the machines?
It is discouraged to use the python installation on the machines. Package versions can change unexpectedly which may lead to code incompatibilities. This solution is not viable for long-term reproducibility.
- Virtual environments?
Conda is superior to virtual environments because it also handles non-python dependencies (such as the netCDF library) and can therefore offer a more stable environment.
What are Best Practices when using python?
- This file has to be made executable (
chmod +x startup
) and needs to be invoked as source startup
.
- See this gist for a more complete example
Why is it important to work with a fixed environment?
- Python packages are developed rapidly, and newer versions may become incompatible with your script. If you want to re-run your analysis at a later stage, you need the same package versions you used originally. Therefore it is important that you know which environment you used.
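One way to keep track of what an analysis ran with is to write the package versions next to the results. The following is a minimal sketch using only the standard library; the function names and the package list are illustrative assumptions, not an IAC convention.

```python
import sys
from importlib import metadata


def installed_versions(packages):
    """Map each package name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def environment_report(packages):
    """Return a short text report: Python version plus one line per package."""
    lines = [f"python {sys.version.split()[0]}"]
    for name, version in sorted(installed_versions(packages).items()):
        shown = version if version is not None else "not installed"
        lines.append(f"{name} {shown}")
    return "\n".join(lines)


print(environment_report(["numpy", "xarray", "matplotlib"]))
```

Saving this report together with your output files makes it easier to rebuild a matching environment later.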
Using Conda (on Linux)
What is conda?
- conda is a program that manages (python) packages and environments. It makes it possible to use a centralized installation while still giving users flexibility in terms of package installation.
What is mamba?
- mamba is a drop-in replacement for conda with much faster dependency solving. You can replace almost every
conda command
by the corresponding mamba command
What is a conda environment?
- From the conda documentation: A conda environment is a directory that contains a specific collection of conda packages that you have installed. For example, you may have one environment with NumPy 1.7 and its dependencies, and another environment with NumPy 1.6 for legacy testing. If you change one environment, your other environments are not affected. You can easily activate or deactivate environments, which is how you switch between them. You can also share your environment with someone by giving them a copy of your environment.yaml file.
What is a conda package?
- A conda package is a compressed tarball file that contains system-level libraries, Python or other modules, executable programs and other components. Conda keeps track of the dependencies between packages and platforms. Thus, it can handle not only python packages but also other dependencies (e.g. the netCDF C library).
How do I use conda?
- Load the module
module load conda
- View all environments
conda env list
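The list of environments can also be read from a script. The sketch below assumes that conda is on your PATH (i.e. the module is loaded) and relies on the `--json` output of `conda env list`, which contains an "envs" list of environment paths. The helper names are made up for illustration.

```python
import json
import subprocess
from pathlib import PurePosixPath


def env_names(json_text):
    """Extract environment names from the output of `conda env list --json`.

    The name is taken as the last path component, so the base environment
    shows up under the name of its installation directory.
    """
    paths = json.loads(json_text).get("envs", [])
    return [PurePosixPath(path).name for path in paths]


def list_conda_envs():
    """Run conda and return the environment names (requires conda on PATH)."""
    result = subprocess.run(
        ["conda", "env", "list", "--json"],
        capture_output=True,
        text=True,
        check=True,
    )
    return env_names(result.stdout)
```

Calling `list_conda_envs()` after loading the module returns the names as a plain Python list.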
What environments are available?
- You can also make all environments available with
module load conda
Older environments
-
module load miniconda3
is equivalent to module load conda
I am missing a package - what can I do?
- Write to iac-linux@env.ethz.ch - we can generally add single packages to the existing environments. If you are impatient, see below.
How can I install a single package?
- You will need to create your own environment (see below), and then add the package with
mamba install <package>
How can I create my own environment?
- conda lets you manage environments without being root!
Create a new environment
- NB: don't forget to include ipython in your new environment
Clone and tweak an existing environment
Background information about the differences between pip, conda and anaconda
How can I create an executable python script when using a conda environment?
- To run a python script directly from the command line (
./script.py
) you need to add the following at the top of your script
#!/usr/bin/env python
- However, the environment needs to be loaded before the script is executed.
- Of course the file needs to be executable (
chmod +x script.py
)
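To confirm that such a script really runs with the interpreter of the loaded environment, it can print where it is running from. This is a small sketch; it assumes that conda sets the CONDA_DEFAULT_ENV variable when an environment is active, and the function name is illustrative.

```python
#!/usr/bin/env python
"""Print which interpreter and conda environment run this script."""
import os
import sys


def describe_interpreter():
    # CONDA_DEFAULT_ENV is set by conda while an environment is active.
    env = os.environ.get("CONDA_DEFAULT_ENV", "<no conda environment active>")
    return f"interpreter: {sys.executable}\nconda environment: {env}"


if __name__ == "__main__":
    print(describe_interpreter())
```

If the interpreter path does not point into the expected environment, the environment was probably not loaded before the script was started.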
Run spyder remotely
Spyder 3 shows strange symbols
- Something goes wrong with the font in spyder3. The workaround is to use the symbols of spyder2.
- In spyder3 go to
Tools
> General (Appearance)
> Icon Theme
: Change to Spyder 2
and restart spyder.
User packages no longer available in conda environments and JupyterHub
- Packages installed with
pip install <package> --user
(site packages) are no longer available in personal conda environments.
- This should not affect most users.
Why was this change introduced?
- Conda environments should be self-contained and reproducible. Site packages 'pollute' the clean workspace (i.e. there could be packages in an environment that were never installed into it).
- You don't need
--user
to install packages with conda+pip.
- Most often you will be able to install packages directly with conda (
conda install <package>
)
- When using conda+pip it is not necessary to install packages with
--user
; you can do this with pip install <package>
I am missing my self-installed packages, what can I do?
What has changed?
-
module load conda
now does the following: export PYTHONNOUSERSITE=1
. This keeps the user site-packages directory out of the python search path, as explained in the documentation.
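Whether the user site-packages directory is active in a given session can be checked from Python itself. The following sketch uses only the standard `os` and `site` modules; the function name `user_site_status` is an illustrative choice.

```python
import os
import site


def user_site_status():
    """Report whether the per-user site-packages directory is in use."""
    return {
        # Value of the environment variable, or None if it is not set.
        "PYTHONNOUSERSITE": os.environ.get("PYTHONNOUSERSITE"),
        # False when the user site directory has been disabled.
        "user_site_enabled": bool(site.ENABLE_USER_SITE),
        # Where `pip install --user` would place packages.
        "user_site_dir": site.getusersitepackages(),
    }


for key, value in user_site_status().items():
    print(f"{key}: {value}")
```

With PYTHONNOUSERSITE set, `user_site_enabled` is expected to be False, so packages in the user site directory are not importable.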
Packages
xarray
How do I prevent xarray from automatically adding _FillValue
to coordinates or variables without NaN
?
- Specify the encoding when saving to disk (xarray version 0.10.2)
# ds is an existing xarray.Dataset. The encoding dictionary tells xarray not to
# write _FillValue for the listed coordinates or variables.
encoding = {'lat': {'_FillValue': None}, 'lon': {'_FillValue': None}, 'time': {'_FillValue': None}}
ds.to_netcdf('./outfile.nc', format='NETCDF4', encoding=encoding)
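When many coordinates or variables are involved, the encoding dictionary can be built from a list of names instead of being typed by hand. This is a minimal sketch; `no_fill_encoding` is a hypothetical helper, not part of xarray, and `ds` in the comment is assumed to be an existing dataset.

```python
def no_fill_encoding(names):
    """Build an encoding dict that disables _FillValue for the given names."""
    return {name: {"_FillValue": None} for name in names}


encoding = no_fill_encoding(["lat", "lon", "time"])
print(encoding)

# With an existing xarray.Dataset called ds (assumed here), this would be used as:
# ds.to_netcdf("./outfile.nc", format="NETCDF4", encoding=encoding)
```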
Why did xarray change the type of an nc attribute to string, causing other software to crash on the resulting nc file?
- The attribute probably contained a special (non-ASCII) character (e.g. 'ü'). Two workarounds are suggested (xarray version 0.10.2):
# ds is an existing xarray.Dataset
# Workaround A) write the file as 'NETCDF4_CLASSIC'
ds.to_netcdf('./outfile.nc', format='NETCDF4_CLASSIC')
# Workaround B) replace the special character in the attribute value
ds.attrs['institution'] = 'IAC ETH Zuerich'  # instead of 'IAC ETH Zürich'
ds.to_netcdf('./outfile.nc', format='NETCDF4')
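To find out which attributes might trigger this behaviour, the attribute dictionary can be scanned for non-ASCII characters before writing. The sketch below works on any plain dictionary such as the `attrs` of a dataset; `non_ascii_attrs` is a hypothetical helper name, and the sample attributes are made up.

```python
def non_ascii_attrs(attrs):
    """Return the names of string attributes containing non-ASCII characters."""
    return [
        name
        for name, value in attrs.items()
        if isinstance(value, str) and not value.isascii()
    ]


attrs = {"institution": "IAC ETH Zürich", "title": "test data"}
print(non_ascii_attrs(attrs))  # → ['institution']
```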
matplotlib
I don't see a figure when typing plt.show()
?
There are two ways to achieve this:
More information can be found here:
http://matplotlib.sourceforge.net/users/customizing.html
DEPRECATED - NO LONGER RELEVANT
How to run jupyter notebook on a server?
- See below how to run a notebook on a server from your personal computer.
- Here we set up a jupyter notebook running on a server and use it from your computer.
- It will make use of tmux on the server to keep a session running even if you are not logged in anymore.
- For security purposes, the jupyter notebook generates a token at startup; you need to copy this token (password) into the login screen of the notebook.
- Open the browser and go to: http://127.0.0.1:8888
- Copy the token given by the jupyter notebook on the server and paste it in the login field.
- This also works to tunnel a notebook from a linux machine to a windows machine
- e.g. via (WSL [windows subsystem for linux]), or Start -> cmd
- You can recover the full address including the token on the SERVER by pressing
CTRL C
once.
Access notebook from your personal computer
- Please don't run jupyter/python on fog or fog2. These are login nodes that don't have many resources.
You cannot directly ssh
to our servers. Therefore, the above solution does not work from your personal computer. There are two possibilities.
1) Configure fog as a 'jumphost'
2) USE VPN