How to import the IMDb database using IMDbPY
In this post, I am going to show you how to download the tsv.gz dataset
files from IMDb and import them into a PostgreSQL database.
Start your server
If you don't have a PostgreSQL server running on your machine, you can start one using Docker:
$ docker run -e POSTGRES_PASSWORD=postgres -v imdb:/var/lib/postgresql/data --name imdbpg --rm postgres
Basic dependencies
We will need python3, git, and wget.
If you are using Docker, open a shell in the container and update the package index before installing the dependencies:
$ docker exec -it imdbpg bash
# apt-get update
Then install the dependencies using apt-get:
# apt-get --yes install python3-dev python3-pip wget git postgresql-server-dev-all
Step 1: Install the required tools
# pip3 install git+https://github.com/alberanid/imdbpy psycopg2
Step 2: Download the tsv.gz files
You can use wget's accept list (-A), which takes a shell-style pattern rather than a regex, to download only the files whose names end in tsv.gz:
# wget -A "*tsv.gz" --mirror "https://datasets.imdbws.com/"
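Before importing, it can be worth peeking at one of the downloaded files to confirm the download worked. Here is a minimal Python sketch that returns the header and the first few rows of a gzip-compressed TSV file. The function name peek_tsv_gz and the script name used below are my own choices for illustration, and I am assuming the files are UTF-8 encoded, tab-separated, and start with a header row.

```python
import csv
import gzip
import itertools
import sys


def peek_tsv_gz(path, limit=3):
    """Return (header, first `limit` data rows) of a gzip-compressed TSV file."""
    with gzip.open(path, mode="rt", encoding="utf-8", newline="") as handle:
        # Assumption: plain tab-separated text with no quoting rules, so
        # quote characters inside a field are kept literally.
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        header = next(reader)
        rows = list(itertools.islice(reader, limit))
    return header, rows


# Only runs when a file path is passed on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    header, rows = peek_tsv_gz(sys.argv[1])
    print(header)
    for row in rows:
        print(row)
```

If you save this as peek.py (a hypothetical file name), you could run it against one of the mirrored files, for example:
# python3 peek.py /datasets.imdbws.com/title.basics.tsv.gz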
Step 3: Import the data
# su -c "createdb imdb" postgres
# s32imdbpy.py --verbose /datasets.imdbws.com/ postgres://postgres@localhost/imdb
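Once the import finishes, a quick way to check the result is to list the tables that now exist in the imdb database. The sketch below uses psycopg2, which was installed in Step 1. The helper name list_tables and the script name used below are my own, and I am assuming the tables were created in the default public schema.

```python
import sys

# Standard information_schema query; assumes the default "public" schema.
LIST_TABLES_SQL = """
    SELECT table_name
    FROM information_schema.tables
    WHERE table_schema = 'public'
    ORDER BY table_name
"""


def list_tables(conn):
    """Return the names of all tables in the public schema, sorted by name."""
    with conn.cursor() as cur:
        cur.execute(LIST_TABLES_SQL)
        return [row[0] for row in cur.fetchall()]


# Only runs when a connection URI is passed on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    import psycopg2  # imported here so the helper above has no hard dependency

    conn = psycopg2.connect(sys.argv[1])
    try:
        for name in list_tables(conn):
            print(name)
    finally:
        conn.close()
```

Saved as check_import.py (a hypothetical file name), it can be run with the same URI that was passed to the import script:
# python3 check_import.py postgres://postgres@localhost/imdb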
That's all.