Replace comma with semicolon when creating CSV from DataFrame - Python

I have code that creates a CSV file. When I first open it, everything is in one column, so I have to do the usual routine: go to the Data tab and split it manually (screenshot: https://i.stack.imgur.com/OtxO4.png). The data is then split into columns.
I work with Office 365, and recently I was told that if I replace the commas with semicolons, Excel will automatically open the newly created CSV file already separated into columns.
I’m asking for some advice here, since having to do this process for every created CSV file is really time consuming.
I'm looking for a way to alter my code so it does this automatically, perhaps by separating columns with semicolons instead of commas, just to try if this works out.
with open('created.csv', 'w', newline='') as f:
    writer = csv.writer(f)

If you want to transform an existing file, you can do it like this:
with open('created.csv', 'r', encoding='utf-8') as f_in, open('outfile.csv', 'w') as f_out:
    for line in f_in:
        # Plain text replacement; note this ignores CSV quoting, so it would
        # also replace commas inside quoted fields.
        line = line.split(",")
        line = ";".join(line)
        f_out.write(line)
If you already have a DataFrame, you can do it as @jezrael said in the comments:
df.to_csv('created.csv', sep=';')
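For context, a minimal sketch of that pandas approach (the column names and values here are invented for illustration):
import pandas as pd

# Hypothetical example data; in practice this would be your existing DataFrame.
df = pd.DataFrame({'date': ['2021-01-01', '2021-01-02'], 'value': [10, 20]})

# Write the file with ';' as the field separator instead of the default ','.
df.to_csv('created.csv', sep=';', index=False)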
As mentioned in the comments, you are already using the csv module to write your file. You only have to change this line in your code:
writer = csv.writer(f)
to
writer = csv.writer(f, delimiter=';')
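Put together, a minimal sketch of writing with a semicolon delimiter (the rows here are placeholders for your own data):
import csv

# Placeholder rows for illustration.
rows = [['date', 'price'], ['2021-01-01', '34000']]

with open('created.csv', 'w', newline='') as f:
    writer = csv.writer(f, delimiter=';')  # ';' instead of the default ','
    writer.writerows(rows)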
On my machine, if I open a CSV separated with "," I have to do the manual splitting you described in your question, but if I open a CSV separated with ";" it is already in the right columns.
This behaviour (for Windows users at least) depends on your region and language settings, so it can differ from machine to machine.
You can check them here and also change them if you want:
https://www.itsupportguides.com/knowledge-base/office-2013/excel-20132016-how-to-change-csv-delimiter-character/

Related

I need to edit a python script to remove quotes from a csv, then write back to that same csv file, quotes removed

I have seen similar posts to this, but they all seem to end with print statements (just viewing the cleaned data) rather than overwriting the original csv with the cleaned data, so I am stuck. When I tried to write back to the csv myself, it just deleted everything in the file. Here is the format of the csv:
30;"unemployed";"married";"primary";"no";1787;"no";"no";"cellular";19;"oct";79;1;-1;0;"unknown";"no"
33;"services";"married";"secondary";"no";4747;"yes";"cellular";11;"may";110;1;339;2;"failure";"no"
35;"management";"single";"tertiary";"no";1470;"yes";"no";"cellular";12;"apr"185;1;330;1;"failure";"no"
It is delimited by semicolons, which is fine, but all text is wrapped in quotes and I only want to remove the quotes and write back to the file. Here is the code I reverted back to that successfully reads the file, removes all quotes, and then prints the results:
import csv

f = open("bank.csv", 'r')
try:
    for row in csv.reader(f, delimiter=';', skipinitialspace=True):
        print(' '.join(row))
finally:
    f.close()
Any help on properly writing back to the csv would be appreciated, thanks!
See here: Python CSV: Remove quotes from value
I've done this basically two different ways, depending on the size of the csv.
1. You can read the entire csv into a Python object (a list), do some things, and then overwrite the existing file with the cleaned version.
2. As in the link above, you can use one reader and one writer: create a new file, write line by line as you clean the input from the csv reader, then delete the original csv and rename the new one to replace the old file (a sketch of this follows below).
In my opinion option #2 is vastly preferable, as it avoids the possibility of data loss if your script hits an error part way through writing. It also has lower memory usage.
Finally: it may be possible to open a file as read/write and iterate line by line, overwriting as you go, but that would leave you with half of your file having quotes and half not if your script crashes part way through.
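A minimal sketch of option #2, assuming the bank.csv format shown in the question (the temporary file name is made up for illustration):
import csv
import os

# Read from the original file and write the cleaned rows to a temporary file.
with open("bank.csv", "r") as infile, open("bank_clean.tmp", "w", newline="") as outfile:
    reader = csv.reader(infile, delimiter=';', skipinitialspace=True)
    # QUOTE_NONE stops the writer from putting quotes back around text fields
    # (it would raise an error if a field itself contained ';').
    writer = csv.writer(outfile, delimiter=';', quoting=csv.QUOTE_NONE)
    for row in reader:
        writer.writerow(row)

# Replace the original file with the cleaned version.
os.replace("bank_clean.tmp", "bank.csv")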
You could do something like this: read it all in, then write it back out using quoting=csv.QUOTE_NONE.
import csv

f = open("bank.csv", 'r')
inputCSV = []
try:
    for row in csv.reader(f, delimiter=';', skipinitialspace=True):
        inputCSV.append(row)
finally:
    f.close()

with open('bank.csv', 'w', newline='') as csvfile:
    csvwriter = csv.writer(csvfile, delimiter=';', quoting=csv.QUOTE_NONE)
    for row in inputCSV:
        csvwriter.writerow(row)

Reading and Writing into CSV file at the same time

I want to read some input from a csv file, modify it, and replace the old values with the new ones. I first read the values, but then I'm stuck, as I want to modify all the values present in the file.
So is it possible to open the file in 'r' mode in one for loop and then immediately in 'w' mode in another loop to write the modified data?
If there is a simpler way to do this, please help me out.
Thank you.
Yes, you can open the same file in different modes in the same program. Just be sure not to do it at the same time. For example, this is perfectly valid:
with open("data.csv") as f:
# read data into a data structure (list, dictionary, etc.)
# process lines here if you can do it line by line
# process data here as needed (replacing your values etc.)
# now open the same filename again for writing
# the main thing is that the file has been previously closed
# (after the previous `with` block finishes, python will auto close the file)
with open("data.csv", "w") as f:
# write to f here
As others have pointed out in the comments, reading and writing on the same file handle at the same time is generally a bad idea and won't work as you expect (unless for some very specific use case).
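A minimal concrete sketch of that pattern, assuming a simple transformation (upper-casing every field) just for illustration:
import csv

# First pass: read everything into memory; the file is closed when the block ends.
with open("data.csv", newline="") as f:
    rows = list(csv.reader(f))

# Modify the data while no file handle is open.
rows = [[cell.upper() for cell in row] for row in rows]

# Second pass: reopen the same file for writing and save the modified rows.
with open("data.csv", "w", newline="") as f:
    csv.writer(f).writerows(rows)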
You can open the file with mode "r+", e.g. open("data.csv", "r+"); this gives you a single handle you can both read from and write to (note that "rw" is not a valid mode).
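A minimal sketch of that, assuming you read the whole file first and then rewrite it in place (the comma-to-semicolon replacement is just an illustrative edit):
# Open for reading and writing on the same handle.
with open("data.csv", "r+", newline="") as f:
    lines = f.readlines()        # read the current contents
    f.seek(0)                    # rewind to the beginning
    for line in lines:
        f.write(line.replace(",", ";"))  # illustrative modification
    f.truncate()                 # discard any leftover bytes from the old content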
As others have mentioned, using the same file as both input and output without any backup is a terrible idea, especially for a .csv file, which is usually more structured than a plain .txt file. But if you insist, you might try the following:
import csv

file_path = 'some.csv'

with open(file_path, 'rw', newline='') as csvfile:
    read_file = csv.reader(csvfile)
    write_file = csv.writer(csvfile)
Note that the code above will trigger an error with the message ValueError: must have exactly one of create/read/write/append mode.
For safety, I prefer to split it into two different files:
import csv

in_path = 'some.csv'
out_path = 'Out.csv'

with open(in_path, 'r', newline='') as inputFile, open(out_path, 'w', newline='') as writerFile:
    read_file = csv.reader(inputFile)
    write_file = csv.writer(writerFile, delimiter=' ', quotechar='|', quoting=csv.QUOTE_MINIMAL)
    for row in read_file:
        # your modifying input data code here
        ........
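For instance, a minimal sketch that fills in that placeholder, assuming you simply want to strip surrounding whitespace from every field (the file names are the same placeholders as above, and the writer options are simplified here):
import csv

in_path = 'some.csv'
out_path = 'Out.csv'

with open(in_path, 'r', newline='') as inputFile, open(out_path, 'w', newline='') as writerFile:
    read_file = csv.reader(inputFile)
    write_file = csv.writer(writerFile)
    for row in read_file:
        # Illustrative modification: strip whitespace from every field.
        write_file.writerow([cell.strip() for cell in row])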

add row to a csv file with python

I have two csv files that each have 2 columns, one of them being the date. I want to add the second column of the second file to the first file, resulting in a file with 3 columns.
I did it by creating a new file and appending the data to it this way:
import csv

coinsfile = open('total-bitcoins.csv', newline='')
pricesfile = open('market-price.csv', newline='')

coins = csv.reader(coinsfile, delimiter=',')
prices = csv.reader(pricesfile, delimiter=',')

with open('result.csv', 'w') as res:
    for coin_row, price_row in zip(coins, prices):
        line = str(coin_row[0]) + ',' + str(coin_row[1]) + ',' + str(price_row[1])
        res.append(line)
The code runs without any errors but the result is a csv file which is completely empty.
Where am I making the mistake, or is there a better way to do this job?
res is a file handle, so it has no append method. That raises an AttributeError while the output file is open, which leaves you with an empty output file (or, possibly, one of the input files is empty and zip ends immediately; but this answer explains how to fix the next issues as well).
A quickfix would be:
res.write(line+"\n")
but the best way would be to combine the rows from both readers and feed them to a csv.writer object (using a generator expression that builds each output row by concatenating the two input rows)
import csv

with open('result.csv', 'w', newline="") as res, open('total-bitcoins.csv', newline='') as coinsfile, open('market-price.csv', newline='') as pricesfile:
    coins = csv.reader(coinsfile)
    prices = csv.reader(pricesfile)
    cw = csv.writer(res)
    cw.writerows(coin_row + price_row for coin_row, price_row in zip(coins, prices))
Note that newline="" is required when writing your files (Python 3) to avoid the infamous extra blank line "bug" on Windows.
I have added the input files to the with statement to ensure that they are closed when exiting it, and removed the delimiter parameter since the comma is the default.
The easiest way to satisfy this need would be to use a library like pandas. With pandas, adding a column to an existing file is as easy as loading the file into a DataFrame and adding the required column to it in one line; the addition can be done by plain assignment or through the join/merge methods.
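A minimal sketch of that idea, assuming both files have a date column followed by a value column and no header row (the column names passed to read_csv are invented for illustration):
import pandas as pd

# Read both files; names= supplies column labels since the files have no header row.
coins = pd.read_csv('total-bitcoins.csv', names=['date', 'coins'])
prices = pd.read_csv('market-price.csv', names=['date', 'price'])

# Merge on the shared date column, then write the three-column result.
result = coins.merge(prices, on='date')
result.to_csv('result.csv', index=False, header=False)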

Merging several csv files and storing the file names as a variable - Python

I am trying to append several csv files into a single csv file using Python, while adding the file name (or, even better, a sub-string of the file name) as a new variable. All files have headers. The following script does the trick of merging the files, but does not handle the file-name-as-variable part:
import glob

filenames = glob.glob("/filepath/*.csv")
outputfile = open("out.csv", "a")
for line in open(str(filenames[1])):
    outputfile.write(line)
for i in range(1, len(filenames)):
    f = open(str(filenames[i]))
    f.next()
    for line in f:
        outputfile.write(line)
outputfile.close()
I was wondering if there are any good suggestions. I have about 25k small size csv files (less than 100KB each).
You can use Python's csv module to parse the CSV files for you, and to format the output. Example code (untested):
import csv

with open(output_filename, "wb") as outfile:
    writer = None
    for input_filename in filenames:
        with open(input_filename, "rb") as infile:
            reader = csv.DictReader(infile)
            if writer is None:
                field_names = ["Filename"] + reader.fieldnames
                writer = csv.DictWriter(outfile, field_names)
                writer.writeheader()
            for row in reader:
                row["Filename"] = input_filename
                writer.writerow(row)
A few notes:
Always use with to open files. This makes sure they will get closed again when you are done with them. Your code doesn't correctly close the input files.
CSV files should be opened in binary mode (this applies to Python 2; in Python 3, open them in text mode with newline='' instead).
Indices start at 0 in Python. Your code skips the first file and includes the lines from the second file twice. If you just want to iterate over a list, you don't need to bother with indices in Python; simply use for x in my_list instead.
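If you are on Python 3, a minimal variant of the sketch above (also untested, and assuming the same filenames list and output_filename):
import csv

with open(output_filename, "w", newline="") as outfile:
    writer = None
    for input_filename in filenames:
        with open(input_filename, newline="") as infile:
            reader = csv.DictReader(infile)
            if writer is None:
                # Prepend a column that will hold the source file name.
                writer = csv.DictWriter(outfile, ["Filename"] + reader.fieldnames)
                writer.writeheader()
            for row in reader:
                row["Filename"] = input_filename
                writer.writerow(row)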
Simple changes will achieve what you want.
For the header line, change
outputfile.write(line) -> outputfile.write(line.rstrip('\n') + ',file\n')
and later, for the data lines,
outputfile.write(line.rstrip('\n') + ',' + filenames[i] + '\n')
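Put together, a minimal sketch of the modified script under those assumptions (it keeps the structure of the original but also fixes the off-by-one indexing mentioned in the other answer; take a sub-string of name if you prefer):
import glob

filenames = glob.glob("/filepath/*.csv")

outputfile = open("out.csv", "a")

# Copy the header from the first file and add the new 'file' column.
with open(filenames[0]) as f:
    header = next(f)
    outputfile.write(header.rstrip('\n') + ',file\n')

# Append the data rows from every file, tagging each row with its file name.
for name in filenames:
    with open(name) as f:
        next(f)  # skip that file's header
        for line in f:
            outputfile.write(line.rstrip('\n') + ',' + name + '\n')

outputfile.close()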

Trying to import a list of words using csv (Python 2.7)

import csv, Tkinter

with open('most_common_words.csv') as csv_file:  # Opens the file in a 'closure' so that when it's finished it's automatically closed
    csv_reader = csv.reader(csv_file)  # Create a csv reader instance
    for row in csv_reader:  # Read each line in the csv file into 'row' as a list
        print row[0]  # Print the first item in the list
I'm trying to import this list of most common words using csv. It continues to give me the same error
for row in csv_reader: # Read each line in the csv file into 'row' as a list
Error: new-line character seen in unquoted field - do you need to open the file in universal-newline mode?
I've tried a couple different ways to do it as well, but they didn't work either. Any suggestions?
Also, where does this file need to be saved? Is it okay just being in the same folder as the program?
You should open a CSV file in binary mode ('rb') or universal-newline mode ('rU') in Python 2, or with newline='' in Python 3. Also, make sure the delimiter and quote characters are , and ", or you'll need to specify them explicitly:
with open('most_common_words.csv', 'rb') as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=';', quotechar='"')  # for EU CSV
You can save the file in the same folder as your program. If you don't, you can provide the correct path to open() as well. Be sure to use raw strings if you're on Windows, otherwise the backslashes may trick you: open(r"C:\Python27\data\table.csv")
It seems you have a file with one column as you say here:
It is a simple list of words. When I open it up, it opens into Excel
with one column and 500 rows of 500 different words.
If so, you don't need the csv module at all:
with open('most_common_words.csv') as f:
    rows = list(f)
Note in this case, each item of the list will have the newline appended to it, so if your file is:
apple
dog
cat
rows will be ['apple\n', 'dog\n', 'cat\n']
If you want to strip the end of line, then you can do this:
with open('most_common_words.csv') as f:
    rows = [i.rstrip() for i in f]
