Append CSV Files Together With Python (Skipping Headers)

Authors Note: This code has been updated and improved upon. I won’t remove this post, but I highly suggest you use the code from the improved code in the post found here.

Last night I had a project where the client needed 30 .csv files compiled into a single sheet. I figured that I could concatenate them together using Python real quick. Python makes tasks like this fairly trivial, but I like to experiment a bit so rather than just write a loop appending each file I wrote a class to do it with some methods. It’s reusable for anyone who has a similar task. I had wanted to add a utility which would compare the headers of 1 file with the headers of the previous file but I was on the clock so I’ll save that bit for another day.

Python has a csv module but I don’t use it in this class yet. The Python csv module is really just a utility in itself to make working with CSV text files easier. Since there is no difference between a regular text file and a csv file and I know the format of the files being fed to the class I simply used Pythons open method to grab a handle to my files. I did import the csv python module though to remind me to use it when I extend this in the future.

Authors Note: This code has been updated and improved upon. I won’t remove this post, but I highly suggest you use the code from the improved code in the post found here.

Last night I had a project where the client needed 30 .csv files compiled into a single sheet. I figured that I could concatenate them together using Python real quick. Python makes tasks like this fairly trivial, but I like to experiment a bit so rather than just write a loop appending each file I wrote a class to do it with some methods. It’s reusable for anyone who has a similar task. I had wanted to add a utility which would compare the headers of 1 file with the headers of the previous file but I was on the clock so I’ll save that bit for another day.

Python has a csv module but I don’t use it in this class yet. The Python csv module is really just a utility in itself to make working with CSV text files easier. Since there is no difference between a regular text file and a csv file and I know the format of the files being fed to the class I simply used Pythons open method to grab a handle to my files. I did import the csv python module though to remind me to use it when I extend this in the future.


Python class to append csv files together

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
"""
Merge together .csv files. I'm creating this because I have 30 .csv files which need to be concatenated
together. This uses the headers from 1 file and then will concate w/o headers 1 through n such as :
"customer-data.csv", "customer-data (1).csv" ... "customer-data (n).csv"
"""
# Todo: make use of the csv module and check headers from each file to make sure data lines up.
# Todo: allow explicit filenames to be passed in as well as lists of files
import csv
class CSV_Monster:
current_filename = ""
file_h = None
i = None
with_headers = False
def __init__(self, base_name, last_i, first_i=None, save_name="final-csv-output.csv"):
"""
Prepare to write the files. Open the save name and wait for run() to be called on the
utility
:base_name str: The common file path and name of csv files to parse
:last_i int: The highest integer found in file names. such as base/name-csv-file (30).csv
:first_i int|None: My files start without a number so None is default.
:save_name str: The base name of the final output .csv
"""
self.base_name = base_name
self.save_name = save_name
self.last_i = last_i
self.i = first_i
# --- Open up the output file for writing
print("Preparing to write to file: %s" % (self.save_name,))
self.output_file = open(self.save_name, 'w+')
def open(self, filename=None):
"""Open the self.current_filename for reading As of now this method is overkill,
there is no way to explicitly tell the class to append a file which isn't named
using the self.base_name convention. But I would like to extend the object to
be able to allow the user to pass in specific names.
:filename str: Pass filename to explicitly open. Default - self.current_filename
:{return} file_handler:
"""
if filename is not None:
self.current_filename = filename
self.file_h = open(self.current_filename, 'r')
print "opening up file: %s" % (self.current_filename,)
return self.file_h
def next(self):
"""Setup the next file to be read then return file handler by calling self.open()
#Todo: next() should be able to consume a list as well.
:{return}:file_handler:
"""
if self.i > self.last_i:
return False
self.current_filename = self.base_name
if self.i is None:
self.i = 0
self.with_headers = True
else:
self.current_filename += " ("+str(self.i)+")"
self.with_headers = False
# complete the filenamme for the current file
self.current_filename += ".csv"
# incriment for the next file
self.i = self.i + 1
return self.open()
def run(self):
"""Roll through each file and append lines to the output."""
while self.next():
# should skip the headers unless self.with_headers is True
if not self.with_headers:
self.file_h.next()
# Append each line in current file to the output file
for line in self.file_h:
self.output_file.write(line)
self.file_h.close()
print("Closing the main file... completing operations.")
self.output_file.close()
def check_headers(self):
# TODO: Add header checks
pass
# This will concate w/o headers customer-data.csv, customer-data (1).csv ... customer-data (30).csv
csv_monster = CSV_Monster("files/customer-data", 30)
csv_monster.run()

Here is a link to the Gist on Github.

A few things I would like to do is add header checking which can notify the user if the headers from 1 file don’t match up with the headers from the next. In cases like that it would be good for the user to be able to set options of what the class should do in that case. It could either 1. Terminate, 2. Ignore, 3. Append with headers, 4. Write to another file. With option 4 it could be possible to pass a folder with 100 files into the class and it could append all files with common headers together. Another thought would be to add max file (row) size so the user could handle overflow.