How to create a Python library
Ever wanted to create a Python library, whether for your team at work or for an open source project online? In this blog post, you will learn how!
The tutorial is easiest to follow when you use the same tools, but you can also use different ones.
The tools used in this tutorial are:
- Linux command prompt
- Visual Studio Code
Step 1: Create a directory in which you want to put your library
Open your command prompt and create a folder in which you will create your Python library.
Remember:
- With pwd you can see your present working directory.
- With ls you can list the folders and files in your directory.
- With cd <path> you can change your present working directory.
- With mkdir <folder> you can create a new folder in your working directory.
In my case, the folder I will be working with is mypythonlibrary. Change the present working directory to be your folder.
Step 2: Create a virtual environment for your folder
When starting your project, it is always a good idea to create a virtual environment to encapsulate your project. A virtual environment consists of a certain Python version and some libraries.
Virtual environments help you avoid dependency issues later on. For example, in older projects you might have worked with older versions of the numpy library. Some old code that once worked beautifully might stop working once you update numpy, because parts of the new version are no longer compatible with other parts of your program. Creating a separate virtual environment for each project prevents this. Virtual environments are also useful when you are collaborating with someone else and you want to make sure that your application works on their computer, and vice versa.
(Make sure you have changed the present working directory to the folder you are going to create your Python library in, using cd <path/to/folder>.)
Go ahead and create a virtual environment by typing:
> python3 -m venv venv
Once it is created, you must now activate the environment by using:
> source venv/bin/activate
Activating a virtual environment modifies the PATH and shell variables to point to the specific isolated Python set-up you created. PATH is an environment variable in Linux and other Unix-like operating systems that tells the shell which directories to search for executable files (i.e., ready-to-run programs) in response to commands issued by a user. The command prompt will change to indicate which virtual environment you are currently in by prepending its name in parentheses, in this case (venv).
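If you want to confirm from inside Python that the environment is really active, one quick check (a sketch, assuming the environment was created with the standard venv module as above) is to compare sys.prefix with sys.base_prefix: inside a venv they differ, outside they are equal. The helper name in_virtualenv is only for illustration.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when running inside a venv-created virtual environment."""
    # venv points sys.prefix at the environment folder, while
    # sys.base_prefix keeps pointing at the base Python installation
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print(f"Python executable: {sys.executable}")
    print(f"Inside a virtual environment: {in_virtualenv()}")
```

Run it with the python from your activated environment; the executable path printed should then lie inside your venv folder.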
In your environment, use pip to install wheel, setuptools and twine. We will need them later to build our Python library.
> pip install wheel
> pip install setuptools
> pip install twine
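To check which versions of these tools ended up in the active environment, a small script using the standard importlib.metadata module (Python 3.8+) could look like the sketch below; the function name build_tool_versions is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def build_tool_versions(names=("wheel", "setuptools", "twine")) -> dict:
    """Map each package name to its installed version, or None if missing."""
    versions = {}
    for name in names:
        try:
            versions[name] = version(name)
        except PackageNotFoundError:
            # The package is not installed in the active environment
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, ver in build_tool_versions().items():
        print(f"{name}: {ver or 'NOT INSTALLED'}")
```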
Step 3: Create a folder structure
In Visual Studio Code, open your folder mypythonlibrary (or whatever name you have given your folder). At this point it contains only the venv folder.
You now can start adding folders and files to your project. You can do this either through the command prompt or in Visual Studio Code itself.
- Create an empty file called setup.py. This is one of the most important files when creating a Python library!
- Create an empty file called README.md. This is the place where you can write Markdown to describe the contents of your library for other users.
- Create a folder called mypythonlib, or whatever you want your Python library to be called when you pip install it. (The name should be unique on PyPI if you want to publish it later.)
- Create an empty file inside mypythonlib called __init__.py. Basically, any folder that has an __init__.py file in it will be included in the library when we build it. Most of the time, you can leave the __init__.py files empty. Upon import, the code within __init__.py gets executed, so it should contain only the minimal amount of code needed to run your project. For now, we will leave them as is.
- Also, in the same folder, create a file called myfunctions.py.
- And, finally, create a folder tests in your root folder. Inside, create an empty __init__.py file and an empty test_myfunctions.py.
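If you prefer to script this step rather than create each item by hand, here is a minimal sketch using pathlib, assuming you run it from the project root. create_skeleton is a hypothetical helper, not part of any tool used in this tutorial.

```python
from pathlib import Path


def create_skeleton(root: Path, package: str = "mypythonlib") -> list:
    """Create the empty files and folders described in Step 3 under root."""
    files = [
        root / "setup.py",
        root / "README.md",
        root / package / "__init__.py",
        root / package / "myfunctions.py",
        root / "tests" / "__init__.py",
        root / "tests" / "test_myfunctions.py",
    ]
    for path in files:
        # Create any missing parent folders, then an empty file;
        # touch() does not overwrite the contents of existing files
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch(exist_ok=True)
    return files


if __name__ == "__main__":
    # Assumes the current directory is the project root (e.g. mypythonlibrary/)
    for path in create_skeleton(Path.cwd()):
        print(path)
```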
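Your set-up should now look something like this:
mypythonlibrary/
├── mypythonlib/
│   ├── __init__.py
│   └── myfunctions.py
├── tests/
│   ├── __init__.py
│   └── test_myfunctions.py
├── venv/
├── README.md
└── setup.py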
Step 4: Create content for your library
To put functions inside your library, you can place them in the myfunctions.py file. For example, copy the haversine function into your file:
from math import radians, cos, sin, asin, sqrt


def haversine(lon1: float, lat1: float, lon2: float, lat2: float) -> float:
    """
    Calculate the great circle distance between two points on the
    earth (specified in decimal degrees) and return the distance in
    meters.

    :param lon1: longitude of first place
    :param lat1: latitude of first place
    :param lon2: longitude of second place
    :param lat2: latitude of second place
    :return: distance in meters between the two sets of coordinates
    """
    # Convert decimal degrees to radians
    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])

    # Haversine formula
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlon/2)**2
    c = 2 * asin(sqrt(a))
    r = 6371000  # Radius of earth in meters
    return c * r
This function will give us the distance in meters between two points given by their longitude and latitude. Note the argument order: longitude first, then latitude.
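For reference, the function implements the haversine formula below, where φ₁, φ₂ are the latitudes and λ₁, λ₂ the longitudes (already converted to radians), Δφ = φ₂ − φ₁, Δλ = λ₂ − λ₁, and R is the earth's mean radius (6 371 000 m in the code). The last line is a quick sanity check you can compare the function against: two points on the same meridian, one degree of latitude apart.

```latex
\begin{aligned}
a &= \sin^2\!\left(\frac{\Delta\varphi}{2}\right) + \cos\varphi_1 \cos\varphi_2 \,\sin^2\!\left(\frac{\Delta\lambda}{2}\right),\\
c &= 2 \arcsin\sqrt{a}, \qquad d = R\,c,\\[4pt]
\Delta\lambda = 0,\ \Delta\varphi = \tfrac{\pi}{180}
  &\;\Rightarrow\; c = 2\arcsin\!\left(\sin\tfrac{\pi}{360}\right) = \tfrac{\pi}{180},
  \quad d = 6\,371\,000 \cdot \tfrac{\pi}{180} \approx 111\,195\ \text{m}.
\end{aligned}
```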
Whenever you write any code, it is highly encouraged to also write tests for this code. For testing with Python you can use the libraries pytest and pytest-runner. Install them in your virtual environment:
> pip install pytest==4.4.1
> pip install pytest-runner==4.4
Let’s create a small test for the haversine function. Copy the following and place it inside the test_myfunctions.py file:
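from pytest import approx
from mypythonlib import myfunctions

def test_haversine():
    # Amsterdam -> Berlin, arguments in (lon1, lat1, lon2, lat2) order: ~577 km.
    # Compare with a tolerance rather than exact float equality.
    assert myfunctions.haversine(4.895168, 52.370216, 13.404954,
                                 52.520008) == approx(577_000, abs=1_000)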
Finally, let’s fill in the setup.py file, which will help us build the library. A limited version of setup.py will look something like this:
from setuptools import find_packages, setup

setup(
    name='mypythonlib',
    packages=find_packages(),
    version='0.1.0',
    description='My first Python library',
    author='Me',
    license='MIT',
)
The name argument of setup() holds whatever name you want your package (and its wheel file) to have. To keep things simple, we will give it the same name as the folder.
Set the packages you would like to create
While in principle you could use find_packages() without any arguments, this can potentially result in unwanted packages being included. This can happen, for example, if you included an __init__.py in your tests/ directory (which we did). Alternatively, you can use the exclude argument to explicitly prevent the inclusion of tests in the package, but this is slightly less robust. Let’s change it to the following:
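To see the difference concretely, here is a sketch (assuming setuptools is installed, as in Step 2) that recreates the mypythonlib/ and tests/ packages in a temporary folder and asks find_packages() which packages it would pick up with no arguments, with include, and with exclude.

```python
import tempfile
from pathlib import Path

from setuptools import find_packages

with tempfile.TemporaryDirectory() as root:
    # Recreate the two packages from Step 3: mypythonlib/ and tests/
    for pkg in ("mypythonlib", "tests"):
        (Path(root) / pkg).mkdir()
        (Path(root) / pkg / "__init__.py").touch()

    # Without arguments, every folder containing an __init__.py is found
    all_packages = sorted(find_packages(where=root))
    # With include, only the listed package is picked up
    included = find_packages(where=root, include=["mypythonlib"])
    # exclude is the alternative mentioned above
    excluded = sorted(find_packages(where=root, exclude=["tests"]))

print(all_packages)  # ['mypythonlib', 'tests']
print(included)      # ['mypythonlib']
print(excluded)      # ['mypythonlib']
```

Note that include=['mypythonlib'] matches that exact name only; if you later add subpackages, a pattern such as 'mypythonlib*' would be needed to pick them up as well.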
from setuptools import find_packages, setup

setup(
    name='mypythonlib',
    packages=find_packages(include=['mypythonlib']),
    version='0.1.0',
    description='My first Python library',
    author='Me',
    license='MIT',
)
Set the requirements your library needs
Note that pip does not use requirements.yml / requirements.txt when your project is installed as a dependency by others. Generally, for that, you will have to specify dependencies in the install_requires and tests_require arguments in your setup.py file.
The install_requires argument should be limited to the list of packages that are absolutely needed, because you do not want to make users install unnecessary packages. Also note that you do not need to list packages that are part of the Python standard library.
Since we have only defined the haversine function so far, and it only uses the math module (which is part of the standard library and always available in Python), we can leave this argument empty.
You may remember that we installed the pytest library before. Of course, you do not want to add pytest to your dependencies in install_requires: it isn’t required by the users of your package. In order to have it installed automatically only when you run tests, you can add the following to your setup.py:
from setuptools import find_packages, setup

setup(
    name='mypythonlib',
    packages=find_packages(include=['mypythonlib']),
    version='0.1.0',
    description='My first Python library',
    author='Me',
    license='MIT',
    install_requires=[],
    setup_requires=['pytest-runner'],
    tests_require=['pytest==4.4.1'],
    test_suite='tests',
)
Running:
> python setup.py pytest
will execute all tests stored in the tests folder.
Step 5: Build your library
Now that all the content is there, we want to build our library. Make sure your present working directory is /path/to/mypythonlibrary (the root folder of your project). In your command prompt, run:
> python setup.py bdist_wheel
Your wheel file is stored in the “dist” folder that has now been created. You can install your library by using:
> pip install /path/to/wheelfile.whl
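A wheel is an ordinary zip archive, so one way to confirm that only mypythonlib (and not tests) ended up in it is to list its contents with the standard zipfile module. The sketch below assumes the wheel sits in dist/; the helper names wheel_contents and latest_wheel are hypothetical.

```python
import zipfile
from pathlib import Path


def wheel_contents(wheel_path) -> list:
    """Return the sorted file names stored in a wheel (a zip archive)."""
    with zipfile.ZipFile(wheel_path) as wheel:
        return sorted(wheel.namelist())


def latest_wheel(dist_dir="dist") -> Path:
    """Return the most recently modified .whl file in dist_dir."""
    wheels = sorted(Path(dist_dir).glob("*.whl"), key=lambda p: p.stat().st_mtime)
    if not wheels:
        raise FileNotFoundError(f"no .whl files found in {dist_dir!r}")
    return wheels[-1]


if __name__ == "__main__":
    wheel = latest_wheel()
    print(wheel.name)
    for name in wheel_contents(wheel):
        print("  ", name)
```

For a real build, you should see the files under mypythonlib/ plus a mypythonlib-0.1.0.dist-info/ folder holding the package metadata, and nothing from tests/.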
Note that you could also publish your library to an internal file system on your workplace intranet, or to the official PyPI repository, and install it from there.
Once you have installed your Python library, you can import it using:
import mypythonlib
from mypythonlib import myfunctions
# Import pandas
import pandas as pd
# Load the data
data = pd.read_csv('Iris.csv')
List of Top Libraries in Python
Now that we understand a bit about what Python libraries are, let us take a closer look at some of the most commonly used ones:
1. Pandas
Pandas is a BSD (Berkeley Software Distribution) licensed open-source library. This popular library is widely used in the field of data science, primarily for data analysis, manipulation and cleaning. Pandas allows for simple data modeling and data analysis operations without the need to switch to another language such as R. Pandas is well suited to the following types of data:
- Tabular data with columns of different types, as in an SQL table or a spreadsheet.
- Time series data, both ordered and unordered.
- Matrix data with labelled rows and columns.
- Unlabelled data.
- Any other type of statistical data.
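To give a feel for the data-analysis style pandas enables, here is a small sketch that builds a DataFrame in memory (so it does not depend on an Iris.csv file being present), then filters and groups it. The column names and values are illustrative only.

```python
import pandas as pd

# A tiny hand-made table in the spirit of the Iris data set; the column
# names and values are illustrative, not read from Iris.csv
data = pd.DataFrame(
    {
        "species": ["setosa", "setosa", "virginica", "virginica"],
        "petal_length": [1.4, 1.5, 5.5, 6.0],
    }
)

# Boolean filtering: keep the rows whose petal_length exceeds 2
long_petals = data[data["petal_length"] > 2]

# Group rows by a column and aggregate each group
mean_length = data.groupby("species")["petal_length"].mean()

print(long_petals)
print(mean_length)
```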
2. NumPy
NumPy is one of the most widely used open-source Python libraries, focusing on scientific computation. It features built-in mathematical functions for quick computation and supports large matrices and multidimensional data. The name “NumPy” stands for “Numerical Python.” It can be used for linear algebra, as a multi-dimensional container for generic data, and as a random number generator, among other things. Some of the important functions in NumPy are arcsin(), arccos(), tan(), radians(), etc. A NumPy array is a Python object that represents an N-dimensional array, such as a matrix with rows and columns. For numerical work, NumPy arrays are preferred over Python lists because they take up less memory and are faster and more convenient to use.
Features:
- Interactive: NumPy is a very interactive and user-friendly library.
- Mathematics: NumPy simplifies the implementation of difficult mathematical equations.
- Intuitive: It makes coding and understanding topics a breeze.
- Community: Because it is so widely utilised, NumPy has a large community and a lot of open-source contribution.
NumPy arrays can be used to represent images, sound waves, and other binary raw streams as N-dimensional arrays of real values, for example for visualization. A working knowledge of NumPy is therefore important for developers who build machine learning applications.
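As an illustration of the functions and array behaviour described above, here is a sketch of a vectorized variant of the haversine function from Step 4. np.radians, np.sin, np.cos and np.arcsin work element-wise on whole arrays, so many distances are computed in one call instead of a Python loop. The name haversine_np is hypothetical; it is not part of NumPy or of mypythonlib.

```python
import numpy as np

EARTH_RADIUS_M = 6_371_000  # mean earth radius in meters


def haversine_np(lon1, lat1, lon2, lat2):
    """Vectorized haversine: inputs in decimal degrees, result in meters."""
    # np.radians converts every element at once; scalars and arrays broadcast
    lon1, lat1, lon2, lat2 = map(np.radians, (lon1, lat1, lon2, lat2))
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_M * np.arcsin(np.sqrt(a))


# Distances from (0, 0) to points 0, 1 and 2 degrees of latitude further north
print(haversine_np(0.0, 0.0, np.zeros(3), np.array([0.0, 1.0, 2.0])))
```

Each degree of latitude along a meridian corresponds to roughly 111 km, so the three distances grow linearly from zero.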
3. Keras
Keras is a Python-based open-source neural network library that lets us experiment with deep neural networks quickly. With deep learning becoming more common, Keras emerges as a great option because, according to its creators, it is an API (Application Programming Interface) designed for humans, not machines. Keras has a higher adoption rate in the industry and research community than TensorFlow or Theano. It is recommended that you install the TensorFlow backend engine before installing Keras.
Features:
- It runs smoothly on both the CPU (Central Processing Unit) and the GPU (Graphics Processing Unit).
- Keras supports nearly all kinds of neural network models, including fully connected, convolutional, pooling, recurrent and embedding layers. These models can also be combined to create more sophisticated models.
- Keras’s modular design makes it very expressive, adaptable and well suited to cutting-edge research. Because Keras is a Python-based framework, it is simple to debug and to explore different models and projects.