🧑🏾‍💻 prep

Overview description of the prep work for the sprint

💻 Single-use data analysis programs

We’ve seen two different ways of analysing some input to produce an output.

Sometimes we can use, or combine, existing tools to get answers. For instance, we can count the words in a file with wc.

Sometimes we can write custom tools when existing tools don’t quite do what we want. For instance, we wrote a program to count specific words.

When we want to answer some question, sometimes it’s useful to write a program we may only use once. (Or we may re-use it in the future.)

It’s not always obvious whether it’s easier to try to use tools that already exist, or to write our own.

💡 Tip

This is like in real life! Imagine if you had two differently sized bottles, and wanted to pour all of the liquid from one to the other without spilling any.

You can imagine making the perfect tube that has exactly the right size connector at each end to connect to the bottles.

Or maybe you already have a funnel that’s about the right size - not perfect, but close enough that you can probably use it.

But if you got a really wide or really narrow bottle, maybe that funnel wouldn’t be good enough and you would need to make a custom solution.

Sometimes the format of our data makes it easier or harder to use existing tools.

Let’s look at some sample data:

[
    {
        "name": "Daniel",
        "score": 100
    },
    {
        "name": "Kristina",
        "score": 120
    },
    {
        "name": "Iulia",
        "score": 95
    },
    {
        "name": "Aleks",
        "score": 190
    },
    {
        "name": "Daniel",
        "score": 80
    },
    {
        "name": "Fatima",
        "score": 110
    }
]

Here are a few questions we may want to answer about this data:

  1. What was the name of the first person to play the game?
  2. What was the name of the last person to play the game?
  3. Who had the highest score?
  4. What are the names of everyone who played the game directly after Daniel?

We can probably answer all of these questions with jq. We can also definitely write a program to answer all of these questions for us.

The first three are similarly hard to solve in jq or with a programming language.

The last one is quite hard to solve in jq.
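As a starting point for the Python side of these questions, here is a minimal sketch of loading data shaped like the sample and reading one value from it. It assumes the JSON is held in a string (in practice it would probably be read from a file), and the variable names are only illustrative. It deliberately doesn't answer any of the questions.

```python
import json

# A shortened copy of the sample data, held in a string for illustration.
raw = '[{"name": "Daniel", "score": 100}, {"name": "Kristina", "score": 120}]'

# json.loads turns the JSON text into Python lists and dictionaries.
games = json.loads(raw)

# Each element is a dictionary, so we can look values up by key.
first_score = games[0]["score"]
print(len(games))
print(first_score)
```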

Exercise

Solve all of the first three questions in both jq and your choice of JavaScript or Python.

Which approach do you think is quicker to write? Which is easier to think about?

Exercise

Solve the fourth question in your choice of JavaScript or Python.

Now spend no more than 20 minutes trying to solve it with jq.

What do you think makes this harder to solve in jq?

What heuristics can you think of about when to use existing tools vs writing your own? (A heuristic is a guideline. It’s not an exact rule, but a “good enough” idea to guess what approach you should use to answer a question.)

📖 Comparing JavaScript and Python

JavaScript and Python have many things in common.

Most differences are “cosmetic”. Here are some examples of cosmetic differences:

  • Some functions and operators have different names. But often there are functions/operators which do exactly the same thing.
  • JavaScript uses {} around blocks of code and we choose if we indent code. Python uses : and indentation is required.
  • In JavaScript we choose to name variables in camelCase, whereas in Python we choose to name variables in snake_case. In both languages we could do either; this is called a convention.

    A convention is something a group (maybe a team, or a company, or most users of a programming language) agrees to do. It’s not quite a rule - things could work another way. But we agree on one way we’ll all do it anyway.

    e.g. in Python you could name one variable firstName and another middle_name and another LASTname, but if everyone agrees to use first_name and middle_name and last_name it makes it a bit easier for everyone to read because they know what to expect.
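To make these cosmetic differences concrete, here is a small Python sketch. The function and variable names are made up for illustration and don't come from the word-counting program.

```python
# Python marks a block with ":" and indentation, where JavaScript would use {}.
def describe_word_count(word_count):
    if word_count == 1:
        return "1 word"
    return f"{word_count} words"

# By convention, Python variable names use snake_case rather than camelCase.
total_words = 3
print(describe_word_count(total_words))
```

In JavaScript the same function would probably be called describeWordCount and wrap its body in braces, but it would do the same job.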

Recall our “count containing words” JavaScript code. Now think about what it would look like in Python.

import { program } from "commander";
import { promises as fs } from "node:fs";
import process from "node:process";

program
    .name("count-containing-words")
    .description("Counts words in a file that contain a particular character")
    .option("-c, --char <char>", "The character to search for", "e");

program.parse();

const argv = program.args;
if (argv.length != 1) {
    console.error(`Expected exactly 1 argument (a path) to be passed but got ${argv.length}.`);
    process.exit(1);
}
const path = argv[0];
const char = program.opts().char;

const content = await fs.readFile(path, "utf-8");
const countOfWordsContainingChar = content
  .split(" ")
  .filter((word) => word.includes(char))
  .length;
console.log(countOfWordsContainingChar);

Think about what we’re doing in this code.

Try to list the high-level ideas. This means describing in English what we’re achieving, using sentences like “Reading a file”.

We’re not trying to think about the programming concepts we’re doing here (we aren’t talking about things like “Assigning a variable” or “An if statement”). Think about what a non-programmer would want to understand about our program.

You may have slightly different answers, but the program is doing roughly the following things:

  • Parsing command line flags - writing down what flags we expect to be passed, and reading values for them based on the actual command line.
  • Validating the flags (i.e. checking that exactly one path was passed).
  • Reading a file.
  • Splitting the content of the file up into words.
  • Counting how many of the words contained a particular character.
  • Printing the count.

These are the meaningful things we needed to do. To solve the same problem with Python, we’d still do all of these things.

We did some other things in our code to make it work. For example, we imported some modules. To write this code in Python, we might need modules or we might not. Importing modules isn’t one of our goals, it was just something we needed to do to help us.

We split up things we need to do into two categories: essential and accidental.

Essential means it is a core part of the problem. e.g. in order to count how many words are in a file, it is essential that we read the file.

Accidental means it isn’t what we care about doing, but we may need to do it anyway. e.g. importing the process module isn’t essential to our problem, but we needed to do it anyway so we could report errors.

Think about real life

Imagine we want to post a parcel, so we take the bus to the post office.

Essential to our goal is getting the parcel to someone who will deliver it.

Accidental to this, we took the bus. There may be ways we could achieve our essential goal without getting the bus. Maybe we could walk or cycle to the post office. Maybe we could arrange for someone from the post office to come to our home and collect the parcel.

The accidental things we did were important - they helped us get our essential goal done. But we shouldn’t get too attached to the accidental things - maybe we will replace them later.

When we’re thinking about how we use different languages, it’s useful to think about what parts of our problem are essential (we’ll need to do them in any language), and which parts are accidental (it’s just something we had to do on the way to achieve our aim).

Whether we write the JavaScript someArray.length or the Python len(some_array) isn’t a big difference. Both lines do the same thing, they just express it differently.
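As a rough illustration, here are two of the operations from our JavaScript code written in Python. The list contents are made up for this example.

```python
some_array = ["this", "is", "a", "list"]

# JavaScript: someArray.length
print(len(some_array))

# JavaScript: "this".includes("i")
print("i" in "this")
```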

📖 Converting JavaScript to Python

Parsing command line flags

In JavaScript, we wrote this code:

import { program } from "commander";

program
    .name("count-containing-words")
    .description("Counts words in a file that contain a particular character")
    .option("-c, --char <char>", "The character to search for", "e");

program.parse();

const argv = program.args;
const path = argv[0];
const char = program.opts().char;

Which of the following are essential goals in this code, and which are accidental goals?

Decide whether each of the following goals is essential or accidental:

Allow a user to pass a -c argument (defaulting to e if they don’t).
Made a const variable called argv.
Import program from the commander library.
Allow a user to pass a path as a positional argument.
Looked up element 0 in the program.args array.
Supply a nice --help implementation to help a user if they don’t know how to use our tool.
Use the commander library.
Called the function program.name().

If we want to work out how to do this in Python, we should focus on the essential goals. We may want to search for things like “Parse command line flags Python” and “Default argument values Python” because they get to the essential problems we’re trying to solve.

Searching Google for “Parse command line flags Python” brought us to the Python argparse documentation. The example code looks pretty similar to what we were doing in JavaScript. We can probably write something like:

import argparse

parser = argparse.ArgumentParser(
    prog="count-containing-words",
    description="Counts words in a file that contain a particular character",
)

parser.add_argument("-c", "--char", help="The character to search for", default="e")
parser.add_argument("path", help="The file to search")

args = parser.parse_args()

There are some differences here.

  • With commander we were calling functions on a global program, whereas with argparse we construct a new ArgumentParser which we use.
  • add_argument takes separate parameters for the short (-c) and long (--char) forms of the option - commander expected them in one string.
  • The Python version uses a lot of named arguments (e.g. add_argument(...) took help=, default=), whereas the JavaScript version (option(...)) used a lot of positional ones.
  • The Python version handles positional arguments itself as arguments with names (path), whereas the JavaScript version just gives us an array of positional arguments and leaves us to understand them.
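One way to experiment with the parser is to hand parse_args a list, which stands in for what a user would type on the command line. This is only a sketch for trying things out: the file name example.txt is made up, and no file is actually opened.

```python
import argparse

parser = argparse.ArgumentParser(prog="count-containing-words")
parser.add_argument("-c", "--char", help="The character to search for", default="e")
parser.add_argument("path", help="The file to search")

# Passing a list stands in for the real command line arguments.
args = parser.parse_args(["--char", "a", "example.txt"])
print(args.char)  # the value passed with --char
print(args.path)  # the positional argument, available by name

# Without -c or --char, the default value is used.
default_args = parser.parse_args(["example.txt"])
print(default_args.char)
```

When parse_args is called with no list, as in our program, it reads the real command line instead.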

Validating command line flags

In our JavaScript code, we needed to check that there was exactly one positional argument.

We don’t need to do this in our Python code. Because argparse treats positional arguments as arguments, it actually already errors if we pass no positional arguments, or more than one.

So we can tick this essential requirement off our list. Different languages or libraries do things differently, and that’s ok!
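We can check this behaviour for ourselves. When argparse rejects the arguments, it prints a usage message and exits the program. The sketch below catches that exit so we can see the exit code. The helper name try_parse is made up for this example.

```python
import argparse

parser = argparse.ArgumentParser(prog="count-containing-words")
parser.add_argument("path", help="The file to search")

def try_parse(arguments):
    """Return the exit code argparse would use, or None if parsing succeeds."""
    try:
        parser.parse_args(arguments)
        return None
    except SystemExit as error:
        return error.code

print(try_parse(["one.txt"]))             # exactly one path: parsing succeeds
print(try_parse([]))                      # no path: argparse exits with an error
print(try_parse(["one.txt", "two.txt"]))  # too many paths: argparse exits with an error
```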

💡 Tip

We don’t need to convert every line.

We’re trying to convert essential requirements.

Exercise

Identify all of the essential requirements from our JavaScript program, and finish implementing the Python version.

📖 Reading a file

In JavaScript we wrote:

import { promises as fs } from "node:fs";

const content = await fs.readFile(path, "utf-8");

If we search Google for “Read file Python”, we get an example which suggests we can write something like:

with open(args.path, "r") as f:
    content = f.read()

Comparing these shows some interesting differences, particularly around scope. Scope is where a variable can be accessed from.

Scope

In Python, we made our content variable in an indented block.

In JavaScript this wouldn’t have worked - in JavaScript when we declare a variable with const it only exists in the scope where it was defined.

In Python, the content variable can be used for the rest of the function it’s declared in (or, at the top level of a file like ours, for the rest of the file). We call this hoisting. Hoisting is where a variable is considered to exist at a broader scope than where it was declared.
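Here is a small sketch of the same idea using an if statement instead of a file. The function name and messages are made up for illustration.

```python
def describe(score):
    if score > 100:
        # message is first assigned inside this indented block...
        message = "high score"
    else:
        message = "keep practising"
    # ...but it can still be used here, after the block has ended.
    return message

print(describe(120))
```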

with blocks

In Python, there’s this with construct. Instead of writing f = open(args.path, "r") we wrote with open(args.path, "r") as f:.

This has two interesting effects:

One is that at the end of the with block, the file is closed. Some code automatically gets run to clean up the resources f was using, even if an error happened inside the block.

The other is about scope. Like content, the variable f is not limited to the with block - it still exists afterwards. But it now refers to a closed file, so trying to read from it will give an error. We should only use f inside the with block.
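We can check what has happened to the file once the with block has ended. This sketch first creates a small temporary file so that it can run on its own. The temporary file is an assumption for the example and isn't part of our program.

```python
import os
import tempfile

# Create a small temporary file so the example is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("hello from a file")
    path = tmp.name

with open(path, "r") as f:
    content = f.read()

# After the block: the file has been closed, but content is still available.
print(f.closed)
print(content)

# Tidy up the temporary file.
os.remove(path)
```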

📖 Splitting the content of the file up into words

In JavaScript we wrote:

content.split(" ")

Googling for “Python split string” suggests we can write exactly the same code!

content.split(" ")
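It can be worth checking what split(" ") does with text that contains newlines or repeated spaces. The sketch below uses a made-up string. Whether the difference matters depends on what we want to count as a word.

```python
content = "one two\nthree  four"

# Splitting on a single space keeps the newline inside a "word",
# and two spaces in a row produce an empty string.
print(content.split(" "))

# Calling split() with no argument splits on any run of whitespace.
print(content.split())
```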

📖 Counting words containing a character

In JavaScript we wrote:

content
  .split(" ")
  .filter((word) => word.includes(char))
  .length

What JavaScript calls arrays, Python calls lists. Arrays and lists are basically the same.

Googling for “Python filter list” suggests there are two things we can use - a filter function, or something called a “list comprehension”. Some people prefer one, other people prefer the other.

Let’s try out both approaches. We can do this in a standalone program, rather than in the whole word-counting program. This gives us a lot more control, and makes it easier for us to experiment.
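To show the shape of each approach without solving the word problem, here is a sketch that filters a made-up list of numbers instead.

```python
numbers = [1, 2, 3, 4, 5, 6]

# List comprehension: build a new list from the items that pass the test.
evens_from_comprehension = [n for n in numbers if n % 2 == 0]

# filter: takes a function and something to loop over. It returns an
# iterator, so we wrap it in list() to get a list back.
evens_from_filter = list(filter(lambda n: n % 2 == 0, numbers))

print(evens_from_comprehension)
print(evens_from_filter)
```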

Exercise

Create a new file, filter.py. Start it with:

content = "this is a list of words"
char = "i"

filtered = TODO

print(filtered)

Now fill in the TODO. First, use a list comprehension. Run the file and make sure you get the expected output.

Next, replace your list comprehension with some code that calls the global function filter. (filter takes a function, and it may be useful to know that lambda is a keyword for making an anonymous function in Python, similar to arrow functions in JavaScript). Run the file and make sure you get the expected output.

Now that we’ve learnt how to do the filtering, we can apply what we’ve learnt to the program we’re converting.

📖 Putting it all together

Instead of calling console.log, in Python we call print.

import argparse

parser = argparse.ArgumentParser(
    prog="count-containing-words",
    description="Counts words in a file that contain a particular character",
)

parser.add_argument("-c", "--char", help="The character to search for", default="e")
parser.add_argument("path", help="The file to search")

args = parser.parse_args()

with open(args.path, "r") as f:
    content = f.read()
count_of_words_containing_char = len([word for word in content.split(" ") if args.char in word])
print(count_of_words_containing_char)

This looks similar to the JavaScript version. The shape is the same, but every line is a little bit different.

Some programming languages are very different, as different as Mandarin and English. But JavaScript and Python are, essentially, quite similar, like Spanish and Portuguese.

📖 Virtual environments

We often use libraries in Python.

Python handles dependencies differently from JavaScript, but they have similarities.

We’ve seen that in JavaScript we write down what dependencies we need in a package.json file, and when we run npm install they will get fetched into a folder called node_modules.

In Python, we write down what dependencies we need in a file called requirements.txt. It doesn’t contain JSON, it just contains a list of dependencies, one per line.

Virtual environments

To install the dependencies, we need to make something called a virtual environment, where they will get installed to.

Comparing virtual environments and node_modules

A virtual environment is like a node_modules folder - it contains all of the dependencies that are installed into it.

When we run node, node automatically looks for a node_modules folder to find dependencies in.

When we run python3, python3 doesn’t automatically look for a virtual environment. We need to activate it - tell python3 which virtual environment we want to use.

There are trade-offs here:

  • node uses a convention to locate installed dependencies - we don’t need to do anything except put the folder in the right place, and it will automatically get used.
  • python3 uses configuration to locate installed dependencies - we need to configure which virtual environment it should use by activating it.

One of the benefits of using configuration is that we could have different virtual environments with different versions of the same dependencies (e.g. to test a version upgrade), and we can switch between them.

One of the drawbacks of using configuration is that we need to do the configuration - we need to explicitly activate the virtual environment for it to be used.

First we need to create the virtual environment. We do this by running python3 -m venv .venv. This will create a virtual environment in a directory named .venv. We could actually create it anywhere, e.g. we could run python3 -m venv /tmp/python_modules to create it in a directory named /tmp/python_modules. We tend to just use a directory called .venv at the root of our project.

📝 Note

This is another example of a convention - you could name your virtual environment anything, but if we all agree to call it .venv then we all know what this directory is when we see it.

It also means we can write scripts, or .gitignore file entries, assuming that’s where the virtual environment will be.

Next we need to activate the virtual environment. We do this by running . .venv/bin/activate (yes, the command we’re running is . with a path as an argument - the . is important). This will only activate the virtual environment for the terminal window we’re in - if you’re using more than one terminal window, you’ll need to activate it in each of them.

Finally we need to install our dependencies into the virtual environment. We do this by running pip install -r requirements.txt. This is saying “Please install all of the dependencies listed in requirements.txt into the currently active virtual environment”.

After we’ve done this, we should be able to import any installed dependencies into our Python code. This will work as long as we have activated the virtual environment in the terminal window where we’re running our program.
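If we’re not sure whether a virtual environment is active, one way to check from inside Python is to compare sys.prefix with sys.base_prefix. This is a small sketch, and the function name in_virtual_environment is made up for this example.

```python
import sys

def in_virtual_environment():
    # In a virtual environment, sys.prefix points at the environment,
    # while sys.base_prefix points at the Python it was created from.
    return sys.prefix != sys.base_prefix

print(in_virtual_environment())
print(sys.prefix)
```

If we run this with python3 after activating .venv, we’d expect the first line printed to be True.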

📖 Using Python dependencies

Let’s create a small program which uses a dependency.

We’re going to use the cowsay library to make a program which outputs a picture of a cow saying something.

First let’s create a Python file which tries to use cowsay:

📝 Note

It’s important that you don’t name your Python file the same as the name of a library you’re trying to import.

We can’t call our file cowsay.py because we’re going to try to import cowsay.

We can call it main.py or hello.py or cow.py. Just not cowsay.py.

In this example, we’ll call it cow.py.

import cowsay

Run python3 cow.py. It will trigger this error:

ModuleNotFoundError: No module named 'cowsay'

This is because we haven’t installed cowsay yet.

Installing our dependency

We will create a virtual environment, activate it, and install cowsay to it:

% python3 -m venv .venv
% . .venv/bin/activate
(.venv) % echo cowsay > requirements.txt
(.venv) % pip install -r requirements.txt

When we activate a virtual environment, its name gets shown before our terminal prompt. This is a useful reminder that we’re in a virtual environment!

Running our program

Run python3 cow.py. We don’t get an error because we installed cowsay into our active virtual environment.

Open a new terminal and run python3 cow.py. You will get an error again! This is because we haven’t activated a virtual environment.

Run . .venv/bin/activate and then python3 cow.py. It will start working again.

Now we can finish our program - let’s have the cow say the arguments back to the user (joining together the arguments with spaces). We need to use a slice to skip the first argument, which is our program name:

import cowsay
import sys

cowsay.cow(" ".join(sys.argv[1:]))

Notice how import cowsay and import sys look the same - as long as we’ve installed dependencies, we can import them just like we can import things that are built into Python.
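To see what the slice and the join are doing without needing cowsay, here is a sketch that uses a made-up list in place of sys.argv.

```python
# A stand-in for sys.argv when running: python3 cow.py Hello friend
example_argv = ["cow.py", "Hello", "friend"]

# [1:] is a slice meaning "everything from index 1 onwards".
words = example_argv[1:]

# join puts the words back together with a space between each one.
message = " ".join(words)
print(message)
```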

(.venv) % python3 cow.py Hello friend
  ____________
| Hello friend |
  ============
            \
             \
               ^__^
               (oo)\_______
               (__)\       )\/\
                   ||----w |
                   ||     ||