Using Python to convert a CSV file to SQL database

For a project that I’m planning I’m going to need to make an API that can look up city names and return some coordinates in latitude and longitudinal form. Here’s how I did it for you to follow along and (maybe) learn something in the process assuming you haven’t already done this a million times in another capacity 🙂

Begin by grabbing the Basic edition of the world cities and towns database from the Simple Maps website:

https://simplemaps.com/data/world-cities

Now, using an editor of your choice create a project folder with the following form and place the downloaded worldcities.csv inside it along with a new python file:

project-folder/
├─ converter.py
├─ worldcities.csv

Open up converter.py and import the following:

import csv
import sqlite3

We need to read the worldcities.csv into memory, I did this defining a new function as follows:

def readCSV(path):
        with open(path, 'r') as csv_file:
            csv_reader = csv.reader(csv_file, delimiter=',')

That’s okay to start with, but how do we cast it into a list object? It turns out that files loaded with csv.reader method are iterables, that means they are amenable to the python “for loop” syntax, which will return a list with each cell corresponding to a column of a given row at any given point of the loop.

def readCSV(path):

        with open(path, 'r') as csv_file:
            csv_reader = csv.reader(csv_file, delimiter=',')
            
            for row in csv_reader:
                  print(row)

readCSV('worldcities.csv')

This will flood your terminal with about 44,000 lines of data! Not exactly what we want but it proves we’re on the right track.

Let’s make the function do something useful, like cast the data to a list and then return the list:

def readCSV(path):
        new_list = []
        with open(path, 'r') as csv_file:
            csv_reader = csv.reader(csv_file, delimiter=',')
            
            for row in csv_reader:
                  new_list.append(row)
        return new_list

We can test it works by printing the first entry in the list:

result = readCSV('worldcities.csv')
print(result[1])

For me this produces the following in the terminal:

['Tokyo', 'Tokyo', '35.6897', '139.6922', 'Japan', 'JP', 'JPN', 'Tōkyō', 'primary', '37732000', '1392685764']

So far so good. But not all that data is useful for me. The Simple Maps documentation structures the data into this schematic:

citycity_asciilatlongcountryiso2iso3admin_namepopulationid

I only need the city names (both the full name and the ascii approximation minus diacritical marks), the coordinates and iso3 country code. We can do this by selecting only the data corresponding to columns we want. Here’s a new function:

def readCSV(path):
        
        """This will only work for the basic version of the 
        world cities database csv"""

        filtered_cities = []

        with open(path, 'r') as csv_file:
            csv_reader = csv.reader(csv_file, delimiter=',')

            iterationID = 0
            for row in csv_reader:
                  new_row=[]
                  #adding id, city, latitude, longitude, country and iso3 country code to new_row
                  new_row.extend((iterationID, row[0], row[1], row[2], row[3], row[4], row[6]))
                  filtered_cities.append(new_row)
                  iterationID += 1

        return filtered_cities

This uses the list method extend in order to conveniently add multiple values in one go to a list. IterationID is something that will be used later in combination with the SQL database.

Now begins the second stage of the operation: creating an SQL database for this data. Add a new python file with the name database_create.py to your project:

project-folder/
├─ converter.py
├─ worldcities.csv
├─ database_create.py

Open up database_create.py and add the following:

import sqlite3

DB_Connection = sqlite3.connect('cities.db')

DB_Connection.execute('''
    CREATE TABLE citiesDB
        (ID INT PRIMARY KEY NOT NULL,
        city TEXT NOT NULL,
        city_ascii TEXT,
        latitude FLOAT NOT NULL,
        longitude FLOAT NOT NULL,
        country TEXT NOT NULL,
        code TEXT NOT NULL);
    ''')

DB_Connection.close()

Running this script will automatically create a new database called cities.db and create a table within it that can hold the data we will assign to it in the next steps. If you’ve done an introduction to SQL this is all standard stuff, however if it is all a mystery to you then I recommend doing Learn SQL with Codeacademy.

Now you’re ready to add the data to your table, return to your first python file and after using the function created earlier to create a filtered list of cities, add the following commands:

Filtered_World_Cities = readCSV('worldcities.csv')

# Adding data to database
DB_Connection = sqlite3.connect('cities.db')
DB_Cursor = DB_Connection.cursor()
DB_Cursor.executemany('INSERT INTO citiesDB VALUES(?, ?, ?, ?, ?, ?, ?);', Filtered_World_Cities)
DB_Connection.commit()
DB_Connection.close()

In a split second the operation will conclude and you will have a database filled with the city name and location data ready for lookup.

sql table