prep
Overview description of the prep work for the sprint
Single-use data analysis programs
Learning Objectives
We’ve seen two different ways of analysing some input to produce an output.
Sometimes we can use, or combine, existing tools to get answers. For instance, we can count the words in a file with wc.
Sometimes we can write custom tools when existing tools don’t quite do what we want. For instance, we wrote a program to count specific words.
When we want to answer some question, sometimes it’s useful to write a program we may only use once. (Or we may re-use it in the future.)
It’s not always obvious whether it’s easier to try to use tools that already exist, or to write our own.
Tip
This is like in real life! Imagine if you had two differently sized bottles, and wanted to pour all of the liquid from one to the other without spilling any.
You can imagine making the perfect tube that has exactly the right size connector at each end to connect to the bottles.
Or maybe you already have a funnel that’s about the right size - not perfect, but close enough, and you can probably use it.
But if you got a really wide or really narrow bottle, maybe that funnel wouldn’t be good enough and you would need to make a custom solution.
Sometimes the format of our data makes it easier or harder to use existing tools.
Let’s look at some sample data:
[
  {
    "name": "Daniel",
    "score": 100
  },
  {
    "name": "Kristina",
    "score": 120
  },
  {
    "name": "Iulia",
    "score": 95
  },
  {
    "name": "Aleks",
    "score": 190
  },
  {
    "name": "Daniel",
    "score": 80
  },
  {
    "name": "Fatima",
    "score": 110
  }
]
Here are a few questions we may want to answer about this data:
- What was the name of the first person to play the game?
- What was the name of the last person to play the game?
- Who had the highest score?
- What are the names of everyone who played the game directly after Daniel?
We can probably answer all of these questions with jq. We can also definitely write a program to answer all of these questions for us.
The first three are similarly hard to solve in jq or with a programming language.
The last one is quite hard to solve in jq.
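As a starting point for the programming-language route, here is a minimal Python sketch that loads the sample data and answers only the first question. It assumes the list is stored in the order the games were played. The names raw, games and first_player are made up for this example, and in practice we would probably read the JSON from a file rather than a string.

```python
import json

# The sample data from above, inlined as a string so the sketch is self-contained.
raw = """
[
  {"name": "Daniel", "score": 100},
  {"name": "Kristina", "score": 120},
  {"name": "Iulia", "score": 95},
  {"name": "Aleks", "score": 190},
  {"name": "Daniel", "score": 80},
  {"name": "Fatima", "score": 110}
]
"""

# json.loads turns the JSON text into a Python list of dictionaries.
games = json.loads(raw)

# Assumption: the list is in play order, so the first element is the first player.
first_player = games[0]["name"]
print(first_player)  # Daniel
```

The other questions can be approached from the same games list.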
Exercise
Solve all of the first three questions in both jq and your choice of JavaScript or Python.
Which approach do you think is quicker to write? Which is easier to think about?
Exercise
Solve the fourth question in your choice of JavaScript or Python.
Now spend no more than 20 minutes trying to solve it with jq.
What do you think makes this harder to solve in jq?
Comparing JavaScript and Python
JavaScript and Python have many things in common.
Most differences are “cosmetic”. Here are some examples of cosmetic differences:
- Some functions and operators have different names. But often there are functions/operators which do exactly the same thing.
- JavaScript uses curly braces ({}) around blocks of code and we choose if we indent code. Python uses a colon (:) and indentation is required.
- In JavaScript we choose to name variables in camelCase, whereas in Python we choose to name variables in snake_case. In both languages we could do either; this is called a convention.
Convention: A convention is something a group (maybe a team, or a company, or most users of a programming language) agrees to do. It’s not quite a rule - things could work another way. But we agree one way we’ll all do it anyway.
e.g. in Python you could name one variable firstName and another middle_name and another LASTname, but if everyone agrees to use first_name and middle_name and last_name it makes it a bit easier for everyone to read because they know what to expect.
Recall our “count containing words” JavaScript code. Now think about what it would look like in Python.
import { program } from "commander";
import { promises as fs } from "node:fs";
import process from "node:process";

program
  .name("count-containing-words")
  .description("Counts words in a file that contain a particular character")
  .option("-c, --char <char>", "The character to search for", "e");

program.parse();

const argv = program.args;
if (argv.length != 1) {
  console.error(`Expected exactly 1 argument (a path) to be passed but got ${argv.length}.`);
  process.exit(1);
}
const path = argv[0];

const char = program.opts().char;

const content = await fs.readFile(path, "utf-8");
const countOfWordsContainingChar = content
  .split(" ")
  .filter((word) => word.includes(char))
  .length;
console.log(countOfWordsContainingChar);
Think about what we’re doing in this code.
Try to list the high-level ideas. This means describing in English what we’re achieving, using sentences like “Reading a file”.
We’re not trying to think about the programming concepts we’re doing here (we aren’t talking about things like “Assigning a variable” or “An if statement”). Think about what a non-programmer would want to understand about our program.
You may have slightly different answers, but the program is doing roughly the following things:
- Parsing command line flags - writing down what flags we expect to be passed, and reading values for them based on the actual command line.
- Validating the flags (i.e. checking that exactly one path was passed).
- Reading a file.
- Splitting the content of the file up into words.
- Counting how many of the words contained a particular character.
- Printing the count.
These are the meaningful things we needed to do. To solve the same problem with Python, we’d still do all of these things.
We did some other things in our code to make it work. For example, we imported some modules. To write this code in Python, we might need modules or we might not. Importing modules isn’t one of our goals, it was just something we needed to do to help us.
We split up things we need to do into two categories: essential and accidental.
Essential means it is a core part of the problem. e.g. in order to count how many words are in a file, it is essential that we read the file.
Accidental means it isn’t what we care about doing, but we may need to do it anyway. e.g. importing the process module isn’t essential to our problem, but we needed to do it anyway so we could report errors.
Think about real life
Imagine we want to post a parcel, so we take the bus to the post office.
Essential to our goal is getting the parcel to someone who will deliver it.
Accidental to this, we took the bus. There may be ways we could achieve our essential goal without getting the bus. Maybe we could walk or cycle to the post office. Maybe we could arrange for someone from the post office to come to our home and collect the parcel.
The accidental things we did were important - they helped us get our essential goal done. But we shouldn’t get too attached to the accidental things - maybe we will replace them later.
When we’re thinking about how we use different languages, it’s useful to think about what parts of our problem are essential (we’ll need to do them in any language), and which parts are accidental (it’s just something we had to do on the way to achieve our aim).
Whether we write the JavaScript someArray.length or the Python len(some_array) isn’t a big difference. Both lines do the same thing, they just express it differently.
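Here is a small sketch of a few such cosmetic differences. Each comment shows JavaScript we might have written, and the line below it is a Python equivalent. The variable names and values are made up for illustration.

```python
some_array = ["apple", "banana", "cherry"]

# JavaScript: someArray.length
count = len(some_array)

# JavaScript: "banana".includes("n")
contains_n = "n" in "banana"

# JavaScript: someArray.join(", ")
joined = ", ".join(some_array)

print(count)       # 3
print(contains_n)  # True
print(joined)      # apple, banana, cherry
```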
Converting JavaScript to Python
Parsing command line flags
In JavaScript, we wrote this code:
import { program } from "commander";

program
  .name("count-containing-words")
  .description("Counts words in a file that contain a particular character")
  .option("-c, --char <char>", "The character to search for", "e");

program.parse();

const argv = program.args;
const path = argv[0];
const char = program.opts().char;
Which of the following are essential goals in this code, and which are accidental goals?
- … -c argument (defaulting to e if they don’t).
- … const variable called argv.
- … program from the commander library.
- … 0 in the program.args array.
- … --help implementation to help a user if they don’t know how to use our tool.
- … program.name().
If we want to work out how to do this in Python, we should focus on the essential goals. We may want to search for things like “Parse command line flags Python” and “Default argument values Python” because they get to the essential problems we’re trying to solve.
Searching Google for “Parse command line flags Python” brought us to the Python argparse documentation. The example code looks pretty similar to what we were doing in JavaScript. We can probably write something like:
import argparse

parser = argparse.ArgumentParser(
    prog="count-containing-words",
    description="Counts words in a file that contain a particular character",
)

parser.add_argument("-c", "--char", help="The character to search for", default="e")
parser.add_argument("path", help="The file to search")

args = parser.parse_args()
There are some differences here.
- With commander we were calling functions on a global program, whereas with argparse we construct a new ArgumentParser which we use.
- add_argument takes separate parameters for the short (-c) and long (--char) forms of the option - commander expected them in one string.
- The Python version uses a lot of named arguments (e.g. add_argument(...) took help=, default=), whereas the JavaScript version (option(...)) used a lot of positional ones.
- The Python version handles positional arguments itself as arguments with names (path), whereas the JavaScript version just gives us an array of positional arguments and leaves us to understand them.
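One way to get a feel for these differences is to hand parse_args a list instead of letting it read the real command line. The sketch below does that with the same parser. The file name words.txt and the variable names defaults and custom are made up for this example; argparse does not check that the file exists.

```python
import argparse

parser = argparse.ArgumentParser(
    prog="count-containing-words",
    description="Counts words in a file that contain a particular character",
)
parser.add_argument("-c", "--char", help="The character to search for", default="e")
parser.add_argument("path", help="The file to search")

# parse_args() normally reads sys.argv[1:].
# Passing a list lets us try out different command lines.
defaults = parser.parse_args(["words.txt"])
print(defaults.char, defaults.path)  # e words.txt

custom = parser.parse_args(["-c", "i", "words.txt"])
print(custom.char, custom.path)  # i words.txt
```

Notice that the positional argument is available by name (path), and the default for --char is filled in when the flag is left out.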
Validating command line flags
In our JavaScript code, we needed to check that there was exactly one positional argument.
We don’t need to do this in our Python code. Because argparse treats positional arguments as arguments, it actually already errors if we pass no positional arguments, or more than one.
So we can tick this essential requirement off our list. Different languages or libraries do things differently, and that’s ok!
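We can check this behaviour for ourselves with a small sketch. When argparse rejects a command line it prints a usage message and exits the program by raising SystemExit (with exit code 2), so we catch that here in order to look at the result. The helper exit_code_for and the file names are made up for this example.

```python
import argparse

parser = argparse.ArgumentParser(prog="count-containing-words")
parser.add_argument("-c", "--char", default="e")
parser.add_argument("path")


def exit_code_for(command_line):
    """Return the exit code argparse would use for this command line, or 0 if it parses."""
    try:
        parser.parse_args(command_line)
    except SystemExit as error:
        return error.code
    return 0


print(exit_code_for(["words.txt"]))           # 0: exactly one path is fine
print(exit_code_for([]))                      # 2: no path was passed
print(exit_code_for(["one.txt", "two.txt"]))  # 2: too many paths were passed
```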
Tip
We don’t need to convert every line.
We’re trying to convert essential requirements.
Reading a file
In JavaScript we wrote:
import { promises as fs } from "node:fs";
const content = await fs.readFile(path, "utf-8");
If we search Google for “Read file Python”, we get an example which suggests we can write something like:
with open(args.path, "r") as f:
    content = f.read()
Comparing these shows some interesting differences, particularly around scope.
Scope
In Python, we made our content variable in an indented block.
In JavaScript this wouldn’t have worked - in JavaScript when we declare a variable with const it only exists in the scope where it was defined.
In Python, the content variable can be used for the rest of the function it’s declared in. We call this hoisting.
with blocks
In Python, there’s this with construct. Instead of writing f = open(args.path, "r") we wrote with open(args.path, "r") as f:.
This has two interesting effects:
One is that it signals where we intend to use f: inside the with block. Unlike a JavaScript const, the name f isn’t actually removed at the end of the block, but by then it refers to a closed file, so we can’t read from it any more.
The other is that at the end of the with block, the file is closed. Some code gets run to clean up the resources f was using.
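The sketch below shows both ideas in one place. It writes a throwaway file first so that it can run anywhere; the temporary file and its contents are made up for this example. Note that the name f is still bound after the block. What changes is that the file it refers to has been closed.

```python
import os
import tempfile

# Create a small throwaway file so the sketch can run anywhere.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("some example words")
    path = tmp.name

with open(path, "r") as f:
    content = f.read()
    print(f.closed)  # False: the file is open inside the block

# content was assigned inside the block, but we can still use it here.
print(content)  # some example words

# The file was closed automatically when the block ended.
print(f.closed)  # True

os.remove(path)  # tidy up the throwaway file
```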
Splitting the content of the file up into words
In JavaScript we wrote:
content.split(" ")
Googling for “Python split string” suggests we can write exactly the same code!
content.split(" ")
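It can be worth trying this on a small string to see exactly what we get back. The sketch below uses a made-up string containing a double space and a newline. Splitting on a single space treats only that one character as a separator. Python’s split() with no argument is a different option that splits on any run of whitespace; whether that is what we want depends on how we decide to define a “word”.

```python
content = "this is a  list\nof words"

# Splitting on a single space, as in the example above.
# Two spaces in a row produce an empty string, and the newline is not a separator.
print(content.split(" "))
# ['this', 'is', 'a', '', 'list\nof', 'words']

# Calling split() with no argument splits on any run of whitespace instead.
print(content.split())
# ['this', 'is', 'a', 'list', 'of', 'words']
```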
Counting words containing a character
In JavaScript we wrote:
content
  .split(" ")
  .filter((word) => word.includes(char))
  .length
What JavaScript calls arrays, Python calls lists. Arrays and lists are basically the same.
Googling for “Python filter list” suggests there are two things we can use - a filter function, or something called a “list comprehension”. Some people prefer one, other people prefer the other.
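To show the general shape of each approach, here is a sketch that filters a list of numbers rather than words. The scores list and the variable names are made up for this example.

```python
scores = [100, 120, 95, 190, 80, 110]

# List comprehension: build a new list from the items that pass the test.
high_with_comprehension = [score for score in scores if score > 100]

# filter: takes a function and something to loop over. It returns a lazy
# iterator, so we wrap it in list() to get a list back.
high_with_filter = list(filter(lambda score: score > 100, scores))

print(high_with_comprehension)  # [120, 190, 110]
print(high_with_filter)         # [120, 190, 110]
```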
Let’s try out both approaches. We can do this in a standalone program, rather than in the whole word-counting program. This gives us a lot more control, and makes it easier for us to experiment.
Exercise
Create a new file, filter.py. Start it with:
content = "this is a list of words"
char = "i"
filtered = TODO
print(filtered)
Now fill in the TODO. First, use a list comprehension. Run the file and make sure you get the expected output.
Next, replace your list comprehension with some code that calls the global function filter. (filter takes a function, and it may be useful to know that lambda is a keyword for making an anonymous function in Python, similar to arrow functions in JavaScript.) Run the file and make sure you get the expected output.
Now that we’ve learnt how to do the filtering, we can apply what we’ve learnt to the program we’re converting.
Putting it all together
Instead of calling console.log, in Python we call print.
import argparse

parser = argparse.ArgumentParser(
    prog="count-containing-words",
    description="Counts words in a file that contain a particular character",
)

parser.add_argument("-c", "--char", help="The character to search for", default="e")
parser.add_argument("path", help="The file to search")

args = parser.parse_args()

with open(args.path, "r") as f:
    content = f.read()
count_of_words_containing_char = len([word for word in content.split(" ") if args.char in word])
print(count_of_words_containing_char)
This looks similar to the JavaScript version. The shape is the same, but every line is a little bit different.
Some programming languages are very different, as different as Mandarin and English. But JavaScript and Python are, essentially, quite similar, like Spanish and Portuguese.
Virtual environments
We often use libraries in Python.
Python handles dependencies differently from JavaScript, but they have similarities.
We’ve seen that in JavaScript we write down what dependencies we need in a package.json file, and when we run npm install they will get fetched into a folder called node_modules.
In Python, we write down what dependencies we need in a file called requirements.txt. It doesn’t contain JSON, it just contains a list of dependencies, one per line.
Virtual environments
To install the dependencies, we need to make something called a virtual environment, where they will get installed to.
Comparing virtual environments and node_modules
A virtual environment is like a node_modules folder - it contains all of the dependencies that are installed into it.
When we run node, node automatically looks for a node_modules folder to find dependencies in.
When we run python3, python3 doesn’t automatically look for a virtual environment. We need to activate it - tell python3 which virtual environment we want to use.
There are trade-offs here:
- node uses a convention to locate installed dependencies - we don’t need to do anything except put the folder in the right place, and it will automatically get used.
- python3 uses configuration to locate installed dependencies - we need to configure which virtual environment it should use by activating it.
One of the benefits of using configuration is that we could have different virtual environments with different versions of the same dependencies (e.g. to test a version upgrade), and we can switch between them.
One of the drawbacks of using configuration is that we need to do the configuration - we need to explicitly activate the virtual environment for it to be used.
First we need to create the virtual environment. We do this by running python3 -m venv .venv. This will create a virtual environment in a directory named .venv. We could actually create it anywhere, e.g. we could run python3 -m venv /tmp/python_modules to create it in a directory named /tmp/python_modules. We tend to just use a directory called .venv at the root of our project.
Note
This is another example of a convention - you could name your virtual environment anything, but if we all agree to call it .venv then we all know what this directory is when we see it.
It also means we can write scripts, or .gitignore file entries, assuming that’s where the virtual environment will be.
Next we need to activate the virtual environment. We do this by running . .venv/bin/activate (yes, the command we’re running is . with a path as an argument - the . is important). This will only activate the virtual environment for the terminal window we’re in - if you’re using more than one terminal window, you’ll need to activate it in each of them.
Finally we need to install our dependencies into the virtual environment. We do this by running pip install -r requirements.txt. This is saying “Please install all of the dependencies listed in requirements.txt into the currently active virtual environment”.
After we’ve done this, we should be able to import any installed dependencies into our Python code. This will work as long as we have activated the virtual environment in the terminal window where we’re running our program.
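If we’re ever unsure whether our program is running inside a virtual environment, we can ask Python itself. The sketch below is one way to check, assuming the environment was made with python3 -m venv as above. The function name in_virtual_environment is made up for this example.

```python
import sys


def in_virtual_environment():
    """Return True if this Python is running from a virtual environment."""
    # Inside a venv, sys.prefix points at the virtual environment directory,
    # while sys.base_prefix points at the Python installation it was created from.
    return sys.prefix != sys.base_prefix


print(in_virtual_environment())
print(sys.prefix)
```

Running this with and without the virtual environment activated should print different results.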
Using Python dependencies
Let’s create a small program which uses a dependency.
We’re going to use the cowsay library to make a program which outputs a picture of a cow saying something.
First let’s create a Python file which tries to use cowsay:
Note
It’s important that you don’t name your Python file the same as the name of a library you’re trying to import.
We can’t call our file cowsay.py because we’re going to try to import cowsay.
We can call it main.py or hello.py or cow.py. Just not cowsay.py.
In this example, we’ll call it cow.py.
import cowsay
Run python3 cow.py. It will trigger this error:
ModuleNotFoundError: No module named 'cowsay'
This is because we haven’t installed cowsay yet.
Installing our dependency
We will create a virtual environment, activate it, and install cowsay to it:
% python3 -m venv .venv
% . .venv/bin/activate
(.venv) % echo cowsay > requirements.txt
(.venv) % pip install -r requirements.txt
When we activate a virtual environment, its name gets shown before our terminal prompt. This is a useful reminder that we’re in a virtual environment!
Running our program
Run python3 cow.py. We don’t get an error because we installed cowsay into our active virtual environment.
Open a new terminal and run python3 cow.py. You will get an error again! This is because we haven’t activated a virtual environment.
Run . .venv/bin/activate and then python3 cow.py. It will start working again.
Now we can finish our program - let’s have the cow say the arguments back to the user (joining together the arguments with spaces). We need to use a slice to skip the first argument, which is our program name:
import cowsay
import sys
cowsay.cow(" ".join(sys.argv[1:]))
Notice how import cowsay and import sys look the same - as long as we’ve installed dependencies, we can import them just like we can import things that are built into Python.
(.venv) % python3 cow.py Hello friend
  ____________
| Hello friend |
  ============
            \
             \
               ^__^
               (oo)\_______
               (__)\       )\/\
                   ||----w |
                   ||     ||
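The slice and the join in cow.py can also be tried on their own, without cowsay. In this sketch, example_argv is a made-up stand-in for sys.argv, as if we had run python3 cow.py Hello friend.

```python
# A made-up stand-in for sys.argv.
example_argv = ["cow.py", "Hello", "friend"]

# [1:] is a slice meaning "everything from index 1 onwards",
# which skips the program name at index 0.
words = example_argv[1:]
print(words)  # ['Hello', 'friend']

# " ".join(...) glues the words together with a space between each.
message = " ".join(words)
print(message)  # Hello friend
```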