csv.writer writing each character of word in separate column/cell - python

Objective: To extract the text from the anchor tag inside all lines in models and put it in a csv.
I'm trying this code:
with open('Sprint_data.csv', 'ab') as csvfile:
    spamwriter = csv.writer(csvfile)
    models = soup.find_all('li', {"class": "phoneListing"})
    for model in models:
        model_name = unicode(u' '.join(model.a.stripped_strings)).encode('utf8').strip()
        spamwriter.writerow(unicode(u' '.join(model.a.stripped_strings)).encode('utf8').strip())
It's working fine except each cell in the csv contains only one character.
Like this:
| S | A | M | S | U | N | G |
Instead of:
|SAMSUNG|
Of course I'm missing something. But what?

.writerow() expects a sequence (a string, tuple, or list) and writes each item in the sequence to its own column of the row, in order. If your desired string is not wrapped inside a sequence, writerow() will iterate over each letter of the string, and each letter will be written to your CSV in a separate cell.
After you import csv, suppose this is your list:
myList = ['Diamond', 'Sierra', 'Crystal', 'Bridget', 'Chastity', 'Jasmyn', 'Misty', 'Angel', 'Dakota', 'Asia', 'Desiree', 'Monique', 'Tatiana']
listFile = open('Names.csv', 'wb')
writer = csv.writer(listFile)
for item in myList:
    writer.writerow(item)
The above script will produce the following CSV:
Names.csv
D,i,a,m,o,n,d
S,i,e,r,r,a
C,r,y,s,t,a,l
B,r,i,d,g,e,t
C,h,a,s,t,i,t,y
J,a,s,m,y,n
M,i,s,t,y
A,n,g,e,l
D,a,k,o,t,a
A,s,i,a
D,e,s,i,r,e,e
M,o,n,i,q,u,e
T,a,t,i,a,n,a
If you want each name in its own cell, the solution is simply to place your string (item) in a sequence. Here I use square brackets []:
listFile2 = open('Names2.csv', 'wb')
writer2 = csv.writer(listFile2)
for item in myList:
    writer2.writerow([item])
The script with .writerow([item]) produces the desired results:
Names2.csv
Diamond
Sierra
Crystal
Bridget
Chastity
Jasmyn
Misty
Angel
Dakota
Asia
Desiree
Monique
Tatiana
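Equivalently, writerows can write all of the one-item rows in a single call; a minimal sketch of that variation (Names3.csv is just an example file name):
import csv

myList = ['Diamond', 'Sierra', 'Crystal']  # shortened for brevity
with open('Names3.csv', 'w', newline='') as listFile3:
    writer3 = csv.writer(listFile3)
    # One [name] per row; writerows writes them all at once.
    writer3.writerows([name] for name in myList)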

writerow accepts a sequence. You're giving it a single string, so it's treating that as a sequence, and strings act like sequences of characters.
What else do you want in this row? Nothing? If so, make it a list of one item:
spamwriter.writerow([u' '.join(model.a.stripped_strings).encode('utf8').strip()])
(By the way, the unicode() call is completely unnecessary since you're already joining with a unicode delimiter.)
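Applied to the loop from the question, a minimal sketch of the fix (soup is the BeautifulSoup object from the question; Python 2, matching the original code) could be:
import csv

with open('Sprint_data.csv', 'ab') as csvfile:
    spamwriter = csv.writer(csvfile)
    models = soup.find_all('li', {"class": "phoneListing"})
    for model in models:
        model_name = u' '.join(model.a.stripped_strings).encode('utf8').strip()
        # Wrap the string in a list so the whole name lands in one cell.
        spamwriter.writerow([model_name])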

This is usually the solution I use:
import csv
with open("output.csv", 'w', newline= '') as output:
wr = csv.writer(output, dialect='excel')
for element in list_of_things:
wr.writerow([element])
output.close()
This should provide you with an output of all your list elements in a single column rather than a single row.
The key point here is to iterate over the list and wrap each element in a list ([element]) to avoid the csv.writer sequencing issue described above.
Hope this is of use!

Just surround it with a list literal (i.e. []):
writer.writerow([str(one_column_value)])


Same python code block gives different outputs at different time

I want to create a word dictionary. The dictionary looks like
words_meanings = {
    "rekindle": "relight",
    "pesky": "annoying",
    "verge": "border",
    "maneuver": "activity",
    "accountability": "responsibility",
}
keys_letter = []
for x in words_meanings:
    keys_letter.append(x)
print(keys_letter)
Output: rekindle, pesky, verge, maneuver, accountability
Here rekindle, pesky, verge, maneuver, and accountability are the keys, and relight, annoying, border, activity, and responsibility are the values.
Now I want to create a csv file and my code will take input from the file.
The file looks like
rekindle | pesky | verge | maneuver | accountability
relight | annoying| border| activity | responsibility
So far I use this code to load the file and read data from it.
from google.colab import files
uploaded = files.upload()
import pandas as pd
data = pd.read_csv("words.csv")
data.head()
import csv
reader = csv.DictReader(open("words.csv", 'r'))
words_meanings = []
for line in reader:
    words_meanings.append(line)
print(words_meanings)
This is the output of print(words_meanings)
[OrderedDict([('\ufeffrekindle', 'relight'), ('pesky', 'annoying')])]
It looks very odd to me.
keys_letter = []
for x in words_meanings:
    keys_letter.append(x)
print(keys_letter)
Now I create an empty list and want to append only the keys. But the output is [OrderedDict([('\ufeffrekindle', 'relight'), ('pesky', 'annoying')])]
I am confused. In the first code block it included only the keys, but now it includes both the keys and their values. How can I overcome this situation?
I would suggest that you format your csv with each key and value on the same row, like this:
rekindle,relight
pesky,annoying
verge,border
This way the following code will work.
words_meanings = {}
with open(file_name, 'r') as file:
    for line in file.readlines():
        key, value = line.split(",")
        words_meanings[key] = value.rstrip("\n")
If you want a list of the keys:
list_of_keys = list(words_meanings.keys())
To add keys and values to the file:
def add_values(key: str, value: str, file_name: str):
    with open(file_name, 'a') as file:
        file.writelines(f"\n{key},{value}")

key = input("Input the key you want to save: ")
value = input(f"Input the value you want to save to {key}: ")
add_values(key, value, file_name)
You run the same block of code, but you use it with different objects, and this gives different results.
First you use a normal dictionary (check type(words_meanings)):
words_meanings = {
    "rekindle": "relight",
    "pesky": "annoying",
    "verge": "border",
    "maneuver": "activity",
    "accountability": "responsibility",
}
and the for loop gives you the keys of this dictionary.
You could get the same with
keys_letter = list(words_meanings.keys())
or even
keys_letter = list(words_meanings)
Later you use a list with a single dictionary inside it (check type(words_meanings)):
words_meanings = [OrderedDict([('\ufeffrekindle', 'relight'), ('pesky', 'annoying')])]
and the for loop gives you the elements of this list, not the keys of the dictionary inside it. So you move the whole dictionary from one list to another.
You could get the same with
keys_letter = words_meanings.copy()
or, equivalently,
keys_letter = list(words_meanings)
from collections import OrderedDict

words_meanings = {
    "rekindle": "relight",
    "pesky": "annoying",
    "verge": "border",
    "maneuver": "activity",
    "accountability": "responsibility",
}

print(type(words_meanings))

keys_letter = []
for x in words_meanings:
    keys_letter.append(x)
print(keys_letter)

#keys_letter = list(words_meanings.keys())
keys_letter = list(words_meanings)
print(keys_letter)

words_meanings = [OrderedDict([('\ufeffrekindle', 'relight'), ('pesky', 'annoying')])]
print(type(words_meanings))

keys_letter = []
for x in words_meanings:
    keys_letter.append(x)
print(keys_letter)

#keys_letter = words_meanings.copy()
keys_letter = list(words_meanings)
print(keys_letter)
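If what you actually want are the keys of the dictionary inside that list, a minimal sketch (indexing into the list first; the stray \ufeff is the BOM issue addressed in the next answer) could be:
from collections import OrderedDict

words_meanings = [OrderedDict([('\ufeffrekindle', 'relight'), ('pesky', 'annoying')])]

# Take the single dictionary out of the list, then list its keys.
keys_letter = list(words_meanings[0].keys())
print(keys_letter)  # ['\ufeffrekindle', 'pesky']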
The default field separator for the csv module is a comma. Your CSV file uses the pipe or bar symbol |, and the fields also seem to be fixed width. So, you need to specify | as the delimiter to use when creating the CSV reader.
Also, your CSV file is encoded as Big-endian UTF-16 Unicode text (UTF-16-BE). The file contains a byte-order-mark (BOM) but Python is not stripping it off, so you will notice the string '\ufeffrekindle' contains the FEFF UTF-16-BE BOM. That can be dealt with by specifying encoding='utf16' when you open the file.
import csv
with open('words.csv', newline='', encoding='utf-16') as f:
    reader = csv.DictReader(f, delimiter='|', skipinitialspace=True)
    for row in reader:
        print(row)
Running this on your CSV file produces this:
{'rekindle ': 'relight ', 'pesky ': 'annoying', 'verge ': 'border', 'maneuver ': 'activity ', 'accountability': 'responsibility'}
Notice that there is trailing whitespace in the key and values. skipinitialspace=True removed the leading whitespace, but there is no option to remove the trailing whitespace. That can be fixed by exporting the CSV file from Excel without specifying a field width. If that can't be done, then it can be fixed by preprocessing the file using a generator:
import csv
def preprocess_csv(f, delimiter=','):
    # assumes that fields can not contain embedded new lines
    for line in f:
        yield delimiter.join(field.strip() for field in line.split(delimiter))

with open('words.csv', newline='', encoding='utf-16') as f:
    reader = csv.DictReader(preprocess_csv(f, '|'), delimiter='|', skipinitialspace=True)
    for row in reader:
        print(row)
which now outputs the stripped keys and values:
{'rekindle': 'relight', 'pesky': 'annoying', 'verge': 'border', 'maneuver': 'activity', 'accountability': 'responsibility'}
Since no one was able to help me with an answer, I am finally posting the answer here. Hope this helps others.
import csv
file_name="words.csv"
words_meanings = {}
with open(file_name, newline='', encoding='utf-8-sig') as file:
for line in file.readlines():
key, value = line.split(",")
words_meanings[key] = value.rstrip("\n")
print(words_meanings)
This is the code to transfer a csv to a dictionary. Enjoy!!!

IndexError: tuple index out of range in showing columns of CSV

I am new to Python and I don't know how to solve this problem. Thanks for the help.
import csv

with open("ict.csv", 'r') as csvFile:
    csvRead = csv.reader(csvFile)
    print(csvRead)
    # for line in csvRead:
    #     print(line)
    header = csvFile.readline().strip().split(',')
    print(header)
    entries = []
    for line in csvFile:
        parts = line.strip().split(',')
        row = dict()
        for i, h in enumerate(header):
            row[h] = parts[i]
        # print(row)
        entries.append(row)

entries.sort(key=lambda r: r['Gen. Ave.'])
for e in entries[:12]:
    print('{0}Student No.,Gen. Ave. {10:,}'.format(
        e['Student No.'], e['Gen. Ave.']
    ))
Student No. | Gen. Ave. | Program
1 | 90.5 | CS
The problem, as pointed out in the comments, is that one of your format specifiers - {10:,} - is wrong. The leading 10 tells Python to use the argument at index 10 passed to format, but you have only provided two, hence the IndexError.
You actually want the second element of the tuple, at index 1, so change {10:,} to {1:,}. Also, the comma (,) in the format spec - telling the formatter to use a comma as the thousands separator - can only be used on numeric values. The value of e['Gen. Ave.'] is a string, because it was read from a file, so you need to convert it to a number first. This code should work:
for e in entries[:12]:
    print('{0}Student No.,Gen. Ave. {1:,}'.format(
        e['Student No.'], float(e['Gen. Ave.'])
    ))
However, the position specifiers in your format string can be removed completely, because Python will apply the arguments to format in the order they are written, so you can have:
for e in entries[:12]:
    print('{}Student No.,Gen. Ave. {:,}'.format(
        e['Student No.'], float(e['Gen. Ave.'])
    ))
Finally, you can avoid manually building dicts for each row in your csv by using the csv module's DictReader class, which will create a dict for each row as it's read, leaving your code looking like this:
with open("ict.csv", 'r') as csvFile:
csvRead = csv.DictReader(csvFile)
entries = []
for line in csvRead:
entries.append(line)
entries.sort(key=lambda r: r['Gen. Ave.'])
for e in entries[:12]:
print('{}Student No.,Gen. Ave. {:,}'.format(e['Student No.'], int(e['Gen. Ave.'])))
First of all, you are not using the csvRead instance. You should be reading from it instead of the csvFile.
Example:
I have the following something.csv CSV file:
76.94,76.944,76.945
76.97,76.979,76.980
77.025,77.025,77.025
77.063,77.264,77.064
77.1,77.64,77.3
Now if I do:
import csv
pf = open("something.csv", "r")
read = csv.reader(pf)
for r in read:
    print(r)
pf.close()
You will get the following output:
python your_script.py
['76.94', '76.944', '76.945']
['76.97', '76.979', '76.980']
['77.025', '77.025', '77.025']
['77.063', '77.264', '77.064']
['77.1', '77.64', '77.3']

Python reading in integers from a csv file into a list

I am having some trouble trying to read a particular column in a csv file into a list in Python. Below is an example of my csv file:
Col 1 Col 2
1,000,000 1
500,000 2
250,000 3
Basically I am wanting to add column 1 into a list as integer values and am having a lot of trouble doing so. I have tried:
for row in csv.reader(csvfile):
    list = [int(row.split(',')[0]) for row in csvfile]
However, I get a ValueError that says "invalid literal for int() with base 10: '"1'
I then tried:
for row in csv.reader(csvfile):
    list = [(row.split(',')[0]) for row in csvfile]
This time I don't get an error however, I get the list:
['"1', '"500', '"250']
I have also tried changing the delimiter:
for row in csv.reader(csvfile):
    list = [(row.split(' ')[0]) for row in csvfile]
This almost gives me the desired list; however, the list includes the second column as well as a "\n" after each value:
['"1,000,000", 1\n', etc...]
If anyone could help me fix this it would be greatly appreciated!
Cheers
You should choose your delimiter wisely: if your numbers use . as the decimal mark, a , delimiter is fine, but if your numbers themselves contain , (as they do here), use another delimiter such as ;.
Moreover, as described in the documentation for csv.reader, you can use the delimiter= argument to set your delimiter, like so:
with open('myfile.csv', 'r') as csvfile:
    mylist = []
    for row in csv.reader(csvfile, delimiter=';'):
        mylist.append(row[0])  # careful here with [0]
or the short version:
with open('myfile.csv', 'r') as csvfile:
    mylist = [row[0] for row in csv.reader(csvfile, delimiter=';')]
To parse your number to a float, you will have to do
float(row[0].replace(',', ''))
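Putting the two together, a minimal sketch (assuming a ;-delimited myfile.csv with a header row, which is an assumption about how the file was exported) might look like:
import csv

with open('myfile.csv', 'r', newline='') as csvfile:
    reader = csv.reader(csvfile, delimiter=';')
    next(reader)  # skip the header row
    # Strip the thousands separators before converting to a number.
    mylist = [float(row[0].replace(',', '')) for row in reader]

print(mylist)  # e.g. [1000000.0, 500000.0, 250000.0]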
You can open the file and split at the space using regular expressions:
import re

file_data = [re.split(r'\s+', i.strip('\n')) for i in open('filename.csv')]
# strip the thousands separators before converting to int
final_data = [int(i[0].replace(',', '')) for i in file_data[1:]]
First of all, you must parse your data correctly, because it's not, in fact, CSV (Comma-Separated Values) but rather TSV (Tab-Separated Values), of which you should inform the CSV reader (I'm assuming it's tab, but you can theoretically use any whitespace with a few tweaks):
for row in csv.reader(csvfile, delimiter="\t"):
Second of all, you should strip your integer values of any commas as they don't add new information. After that, they can be easily parsed with int():
int(row[0].replace(',', ''))
Third of all, you really, really should not iterate over the same file in two loops at once. Use either a list comprehension or a normal for loop, not both at the same time with the same variable. For example, with a list comprehension:
from io import StringIO
import csv

csvfile = StringIO("Col 1\tCol 2\n1,000,000\t1\n500,000\t2\n250,000\t3\n")
reader = csv.reader(csvfile, delimiter="\t")
next(reader, None)  # skip the header
lst = [int(row[0].replace(',', '')) for row in reader]
Or with normal iteration:
csvfile = StringIO("Col 1\tCol 2\n1,000,000\t1\n500,000\t2\n250,000\t3\n")
reader = csv.reader(csvfile, delimiter="\t")
lst = []
for i, row in enumerate(reader):
    if i == 0:
        continue  # your custom header-handling code here
    lst.append(int(row[0].replace(',', '')))
In both cases, lst is set to [1000000, 500000, 250000] as it should. Enjoy.
By the way, shadowing the built-in name list with your own variable is an extremely bad idea.
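A quick sketch of why (illustrative only):
list = ['some', 'rows']  # shadows the built-in list type
try:
    list('abc')          # the built-in is no longer reachable by this name
except TypeError as err:
    print(err)           # 'list' object is not callable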
UPDATE. There's one more option that I find interesting. Instead of setting the delimiter explicitly, you can use csv.Sniffer to detect it, e.g.:
csvdata = "Col 1\tCol 2\n1,000,000\t1\n500,000\t2\n250,000\t3\n"
csvfile = StringIO(csvdata)
dialect = csv.Sniffer().sniff(csvdata)
reader = csv.reader(csvfile, dialect=dialect)
and then just like the snippets above. This will continue working even if you replace tabs with semicolons or commas (would require quotes around your weird integers) or, possibly, something else.
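To make that concrete, a minimal end-to-end sketch (reusing the sample data and the comma-stripping from above; the commented output is what the earlier snippets produce) could be:
import csv
from io import StringIO

csvdata = "Col 1\tCol 2\n1,000,000\t1\n500,000\t2\n250,000\t3\n"
csvfile = StringIO(csvdata)

# Let the Sniffer guess the dialect (delimiter, quoting, ...) from a sample.
dialect = csv.Sniffer().sniff(csvdata)
reader = csv.reader(csvfile, dialect=dialect)

next(reader, None)  # skip the header row
lst = [int(row[0].replace(',', '')) for row in reader]
print(lst)  # expected: [1000000, 500000, 250000]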

Remove double quotes from iterator when using csv writer

I want to create a csv from an existing csv, by splitting its rows.
Input csv:
A,R,T,11,12,13,14,15,21,22,23,24,25
Output csv:
A,R,T,11,12,13,14,15
A,R,T,21,22,23,24,25
So far my code looks like:
def update_csv(name):
    #load csv file
    file_ = open(name, 'rb')
    #init first values
    current_a = ""
    current_r = ""
    current_first_time = ""
    file_content = csv.reader(file_)
    #LOOP
    for row in file_content:
        current_a = row[0]
        current_r = row[1]
        current_first_time = row[2]
        i = 2
        #Write row to new csv
        with open("updated_"+name, 'wb') as f:
            writer = csv.writer(f)
            writer.writerow((current_a,
                             current_r,
                             current_first_time,
                             ",".join((row[x] for x in range(i+1,i+5)))
                             ))
        #do only one row, for debug purposes
        return
But the row contains double quotes that I can't get rid of:
A002,R051,02-00-00,"05-21-11,00:00:00,REGULAR,003169391"
I've tried to use writer = csv.writer(f,quoting=csv.QUOTE_NONE) and got a _csv.Error: need to escape, but no escapechar set.
What is the correct approach to delete those quotes?
I think you could simplify the logic to split each row into two using something along these lines:
def update_csv(name):
    with open(name, 'rb') as file_:
        with open("updated_" + name, 'wb') as f:
            writer = csv.writer(f)
            # read one row from input csv
            for row in csv.reader(file_):
                # write 2 rows to new csv
                writer.writerow(row[:8])
                writer.writerow(row[:3] + row[8:])
writer.writerow expects an iterable and writes each item within it as a separate field, separated by the appropriate delimiter, into the file. So:
writer.writerow([1, 2, 3])
would write "1,2,3\n" to the file.
Your call provides it with an iterable, one of whose items is a string that already contains the delimiter. It therefore needs some way to either escape the delimiter or a way to quote out that item. For example,
writer.writerow([1, '2,3'])
Doesn't just give "1,2,3\n", but e.g. '1,"2,3"\n' - the string counts as one item in the output.
Therefore if you want to not have quotes in the output, you need to provide an escape character (e.g. '/') to mark the delimiters that shouldn't be counted as such (giving something like "1,2/,3\n").
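For illustration, a small sketch of that escaping behaviour (the / escape character is just an example choice):
import csv
import sys

writer = csv.writer(sys.stdout, quoting=csv.QUOTE_NONE, escapechar='/')
writer.writerow([1, '2,3'])
# prints: 1,2/,3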
However, I think what you actually want to do is include all of those elements as separate items. Don't ",".join(...) them yourself, try:
writer.writerow((current_a, current_r,
current_first_time, *row[i+2:i+5]))
to provide the relevant items from row as separate items in the tuple.

Writing List of Strings to Excel CSV File in Python

I'm trying to create a csv file that contains the contents of a list of strings in Python, using the script below. However when I check my output file, it turns out that every character is delimited by a comma. How can I instruct csv.writer to delimit every individual string within the list rather than every character?
import csv

RESULTS = ['apple','cherry','orange','pineapple','strawberry']
result_file = open("output.csv",'wb')
wr = csv.writer(result_file, dialect='excel')
for item in RESULTS:
    wr.writerow(item)
I checked PEP 305 and couldn't find anything specific.
The csv.writer writerow method takes an iterable as an argument. Your result set has to be a list (rows) of lists (columns).
csvwriter.writerow(row)
Write the row parameter to the writer’s file object, formatted according to the current dialect.
Do either:
import csv

RESULTS = [
    ['apple','cherry','orange','pineapple','strawberry']
]
with open('output.csv','w') as result_file:
    wr = csv.writer(result_file, dialect='excel')
    wr.writerows(RESULTS)
or:
import csv

RESULT = ['apple','cherry','orange','pineapple','strawberry']
with open('output.csv','w') as result_file:
    wr = csv.writer(result_file, dialect='excel')
    wr.writerow(RESULT)
Very simple to fix, you just need to turn the parameter to writerow into a list.
for item in RESULTS:
    wr.writerow([item])
I know I'm a little late, but something I found that works (and doesn't require using csv) is to write a for loop that writes to your file for every element in your list.
# Define data
RESULTS = ['apple','cherry','orange','pineapple','strawberry']

# Open file
resultFyle = open("output.csv",'w')

# Write data to file
for r in RESULTS:
    resultFyle.write(r + "\n")
resultFyle.close()
I don't know if this solution is any better than the ones already offered, but it more closely reflects your original logic so I thought I'd share.
A sample that writes multiple rows with a boolean column (building on the examples above by GaretJax and Eran):
import csv

RESULT = [['IsBerry','FruitName'],
          [False,'apple'],
          [True, 'cherry'],
          [False,'orange'],
          [False,'pineapple'],
          [True, 'strawberry']]

with open("../datasets/dashdb.csv", 'wb') as resultFile:
    wr = csv.writer(resultFile, dialect='excel')
    wr.writerows(RESULT)
Result:
df_data_4 = pd.read_csv('../datasets/dashdb.csv')
df_data_4.head()
Output:
   IsBerry   FruitName
0    False       apple
1     True      cherry
2    False      orange
3    False   pineapple
4     True  strawberry
