How to import IMDB Database using imdbpy

How to import the tsv.gz files from IMDB to PostgreSQL using imdbpy

#dev, #docker

In this post, I am going to show you how to download the tsv.gz dataset files from IMDB and import them into PostgreSQL using imdbpy.

Start your server

If you don’t have a PostgreSQL server running on your machine, you can start one using Docker:

$ docker run -v imdb:/var/lib/postgresql/data --name imdbpg --rm postgres

If the container exits right away complaining that the superuser password is not specified, add -e POSTGRES_PASSWORD=<your password> to the command. Recent versions of the official postgres image require it when initializing a new data directory.

Basic dependencies

We will need python3, git, and wget.

If you are using Docker, open a shell inside the container from a second terminal and refresh the package index before installing the dependencies:

$ docker exec -it imdbpg bash
# apt-get update

Then install the dependencies using apt-get:

# apt-get --yes install python3-dev python3-pip wget git postgresql-server-dev-all
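
Before moving on, it can help to confirm that the tools are actually on the PATH. Here is a minimal sketch using only the Python standard library; the helper name missing_tools is my own and not part of imdbpy:

```python
import shutil


def missing_tools(tools):
    """Return the names in `tools` that cannot be found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


# The three tools this post relies on; an empty list means all were found.
print(missing_tools(["python3", "git", "wget"]))
```

If the printed list is not empty, install the missing packages before continuing.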

Step 1: Install the required tools

# pip3 install git+https://github.com/alberanid/imdbpy psycopg2

Step 2: Download the tsv.gz files

You can use wget's accept list (-A) with a glob pattern to download only the files whose names end in tsv.gz:

# wget -A "*tsv.gz" --mirror "https://datasets.imdbws.com/"
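
If you want to look inside one of the downloaded files before importing it, something like the sketch below should work. It assumes the files are UTF-8, tab-separated, start with a header line, and use \N for missing values, which is how IMDB describes its datasets. The function name peek_tsv_gz is hypothetical.

```python
import csv
import gzip
from itertools import islice


def peek_tsv_gz(path, limit=5):
    """Return the header and the first `limit` rows of a gzipped TSV file."""
    with gzip.open(path, mode="rt", encoding="utf-8", newline="") as handle:
        # QUOTE_NONE keeps double quotes inside titles as literal characters.
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        header = next(reader)
        rows = [
            # Map the \N marker to None so missing values are easy to spot.
            [None if value == r"\N" else value for value in row]
            for row in islice(reader, limit)
        ]
    return header, rows
```

For example, peek_tsv_gz("datasets.imdbws.com/title.basics.tsv.gz") reads the first rows of the title.basics file, assuming wget placed it in that directory.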

Step 3: Import the data

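Create the database as the postgres user, then run the s32imdbpy.py script that imdbpy installs. The dataset path assumes wget was run from /. Use the postgresql:// scheme in the URI, because SQLAlchemy 1.4 and later reject postgres://.

# su -c "createdb imdb" postgres
# s32imdbpy.py --verbose /datasets.imdbws.com/ postgresql://postgres@localhost/imdb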
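
Once the import finishes, a quick row count is an easy sanity check. The sketch below works with any DB-API 2.0 connection, so it is not tied to a specific driver. The helper name count_rows is mine, and the table name title_basics in the usage note is an assumption: I expect the tables to be named after the files with dots replaced by underscores, but you can confirm the real names with \dt in psql.

```python
def count_rows(connection, table_names):
    """Return {table_name: row_count} using a DB-API 2.0 connection."""
    counts = {}
    cursor = connection.cursor()
    try:
        for name in table_names:
            # Table names cannot be passed as query parameters, so only
            # accept plain identifiers before formatting them into SQL.
            if not name.isidentifier():
                raise ValueError(f"unsafe table name: {name!r}")
            cursor.execute(f"SELECT COUNT(*) FROM {name}")
            counts[name] = cursor.fetchone()[0]
    finally:
        cursor.close()
    return counts
```

With psycopg2 installed as in Step 1, you could call it as count_rows(psycopg2.connect("dbname=imdb user=postgres host=localhost"), ["title_basics"]). A count of zero, or an error about a missing table, suggests the import did not complete.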

That's all.
