Writing for loop outputs to CSV columns - python

I have a for loop that prints 4 details:
deats = soup.find_all('p')
for n in deats:
    print(n.text)
The output is 4 printed lines.
Instead of printing, I'd like each n written to a different column in a .csv file. When I use a regular .write(), everything ends up in the same column. In other words, how do I make each iteration of the loop write to the next column?

You would build the CSV row with a loop (or a list comprehension). I'll show the explicit loop for readability; you can turn it into a one-line list comprehension yourself.
row = []
for n in deats:
    row.append(n.text)
Now row is ready to write to the .csv file with csv.writer().

Hi, try it like this:
import csv

deats = soup.find_all('p')
with open("output.csv", "w", newline="") as f:  # output.csv is the output file name
    csv_output = csv.writer(f)
    csv_output.writerow(["Col1", "Col2", "Col3", "Col4"])  # first row: all column titles
    temp = []
    for n in deats:
        temp.append(n.text)
    csv_output.writerow(temp)  # second row: one <p> text per column

You use the csv module for this:
import csv

with open('output.csv', 'w', newline='') as csvfile:
    opwriter = csv.writer(csvfile, delimiter=',')
    opwriter.writerow([n.text for n in deats])

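extra_stuff = ["pie", "cake", "eat", "too"]  # placeholder values for anything else you want in the row
with open("output.csv", "w") as some_file:
    some_file.write(",".join(n.text for n in deats) + "," + ",".join(str(s) for s in extra_stuff) + "\n")
Is that all you are looking for?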
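Joining the texts with "," by hand works until one of them contains a comma or a quote. The sketch below uses made-up sample strings standing in for the n.text values; it shows how csv.writer quotes such fields so each stays in one column, while a plain join splits them.

```python
import csv
import io

# Hypothetical stand-ins for [n.text for n in deats]; the second one contains a comma.
texts = ["Name", "Smith, John", "42", 'He said "hi"']

# Plain join: the comma inside "Smith, John" creates an extra column.
naive_line = ",".join(texts)

# csv.writer (default QUOTE_MINIMAL) quotes fields containing the delimiter or quote character.
buffer = io.StringIO()
csv.writer(buffer).writerow(texts)
quoted_line = buffer.getvalue()

print(naive_line)            # Name,Smith, John,42,He said "hi"
print(quoted_line, end="")   # Name,"Smith, John",42,"He said ""hi"""
```

Reading quoted_line back with csv.reader gives the original four values again.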

Related

Csv, Python, separating elements in one column to different columns

I have a CSV file in which all the values of each row sit in one column, separated by commas. How can I separate them into different columns using Python, without the pandas library?
Here is an implementation that should work in Python 3.6+:
import csv

with open("input.csv", newline="") as inputfile:
    with open("output.csv", "w", newline="") as outputfile:
        reader = csv.DictReader(inputfile)  # reader
        fieldnames = reader.fieldnames
        writer = csv.DictWriter(outputfile, fieldnames=fieldnames)  # writer
        # make header
        writer.writeheader()
        # loop over each row in input CSV
        for row in reader:
            # get first column
            column: str = str(row[fieldnames[0]])
            numbers: list = column.split(",")
            if len(numbers) != len(fieldnames):
                print("Error: Lengths not equal")
            # write row in output CSV
            writer.writerow({field: num for field, num in zip(fieldnames, numbers)})
Explanation of the code:
The code above uses two files, input.csv and output.csv; the names are self-explanatory.
It reads each row from input.csv and writes the corresponding row to output.csv.
The last line is a "dictionary comprehension" combined with zip (similar to a list comprehension, but building a dict). It's a compact way to do a lot in a single line; the same code in expanded form looks like this:
row = {}
for field, num in zip(fieldnames, numbers):
    row[field] = num
writer.writerow(row)
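The same pass can be tried on in-memory data. This sketch assumes what the code above implies: the header row is already split into proper column names, and each data row carries all of its values as one quoted, comma-separated string in the first column. The function name split_first_column and the sample data are made up for illustration.

```python
import csv
import io


def split_first_column(inputfile, outputfile):
    """Spread the comma-separated first column of each row across all columns."""
    reader = csv.DictReader(inputfile)
    fieldnames = reader.fieldnames
    writer = csv.DictWriter(outputfile, fieldnames=fieldnames)
    writer.writeheader()
    for row in reader:
        numbers = str(row[fieldnames[0]]).split(",")
        if len(numbers) != len(fieldnames):
            print("Error: Lengths not equal")
        writer.writerow(dict(zip(fieldnames, numbers)))


# Hypothetical input: header already split, values packed into the first column.
source = io.StringIO('a,b,c\n"1,2,3",,\n"4,5,6",,\n')
result = io.StringIO()
split_first_column(source, result)
print(result.getvalue())
```

Taking file objects instead of file names makes the function easy to try with io.StringIO; with real files, open them with newline="" as the original code does.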
The data is already separated into different columns, with , as the separator, but European versions of Excel usually use ; as the separator. You can specify the separator when you import the CSV:
https://support.microsoft.com/en-us/office/import-or-export-text-txt-or-csv-files-5250ac4c-663c-47ce-937b-339e391393ba
If you really want to change the file content with Python, use the replace function to replace , with ; (see "How to search and replace text in a file?").
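A plain text replace also changes commas that sit inside quoted fields. As an alternative to replace(), a sketch using the csv module parses each row first and writes it back with delimiter=";". The function name convert_delimiter and the file names in the comment are placeholders.

```python
import csv
import io


def convert_delimiter(src, dst, new_delimiter=";"):
    """Copy CSV rows from src to dst, writing them with a different delimiter."""
    reader = csv.reader(src)                           # parses the comma-separated input
    writer = csv.writer(dst, delimiter=new_delimiter)  # writes e.g. semicolon-separated output
    writer.writerows(reader)


# In-memory demo: the quoted field "Smith, John" keeps its comma.
src = io.StringIO('name,city\n"Smith, John",Oslo\n')
dst = io.StringIO()
convert_delimiter(src, dst)
print(dst.getvalue())

# With real files (placeholder names), open both with newline="":
# with open("input.csv", newline="") as f_in, open("output_semicolon.csv", "w", newline="") as f_out:
#     convert_delimiter(f_in, f_out)
```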

Can I print lines randomly from a csv in Python?

I'm trying to print lines randomly from a CSV.
Let's say the CSV has the following 10 lines:
1,One
2,Two
3,Three
4,Four
5,Five
6,Six
7,Seven
8,Eight
9,Nine
10,Ten
If I write code like the following, it prints each line as a list, in the same order as in the CSV:
import csv

with open("MyCSV.csv") as f:
    reader = csv.reader(f)
    for row_num, row in enumerate(reader):
        print(row)
Instead, I'd like it to be random.
It's just a print for now; later I'll pass each line as a list to a function.
This should work. You can reuse the lines list in your code, since random.shuffle shuffles it in place.
import random

with open("tmp.csv", "r") as f:
    lines = f.readlines()  # each item is a raw line string, e.g. '1,One\n'
random.shuffle(lines)
print(lines)
import csv
import random

csv_elems = []
with open("MyCSV.csv") as f:
    reader = csv.reader(f)
    for row_num, row in enumerate(reader):
        csv_elems.append(row)
random.shuffle(csv_elems)
print(csv_elems[0])
As you can see, I'm only printing the first element; you can also iterate over the list and print each row, shuffling again whenever you want a new order.
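If you would rather keep the rows in their original order as well, random.sample can return a shuffled copy instead of shuffling in place. A sketch, with a few in-memory lines standing in for MyCSV.csv:

```python
import csv
import io
import random

# Stand-in for open("MyCSV.csv"): the first three lines of the example data.
f = io.StringIO("1,One\n2,Two\n3,Three\n")
rows = list(csv.reader(f))

# random.sample with k=len(rows) returns a new list in random order;
# rows itself is left untouched.
shuffled = random.sample(rows, k=len(rows))

for row in shuffled:
    print(row)  # each row is a list such as ['2', 'Two']
```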
Well, you can define a list (call it temp), append all the rows of the CSV file to it, then shuffle it and print them:
import csv
import random

temp = []
with open("your csv file.csv") as file:
    reader = csv.reader(file)
    for row_num, row in enumerate(reader):
        temp.append(row)
random.shuffle(temp)
for row in temp:
    print(row)
Why not use pandas to handle the CSV?
import pandas as pd
data = pd.read_csv("MyCSV.csv", header=None)  # the example file has no header row
To get the samples you are looking for, just write:
data.sample()  # one random row
data.sample(5)  # 5 random rows
If you also want to pass each line to a function:
data_after_function = data.apply(function_name, axis=1)  # axis=1 passes one row at a time
Inside the function, you can convert the row into a list with list().
Hope this helps!
There are a couple of things to do:
1. Store the CSV in a sequence of some sort
2. Get the data randomly
For 1, it's probably best to use some form of comprehension. I've gone for a list of tuples, as it seems you want the row numbers, and we can't use dictionaries with shuffle.
We can use the random module for number 2.
import random
import csv

with open("MyCSV.csv") as f:
    reader = csv.reader(f)
    my_csv = [(row_num, row) for row_num, row in enumerate(reader)]

# get only 1 item from the list at random
random_row = random.choice(my_csv)
# randomise the order of all the rows; shuffle works in place and returns None
random.shuffle(my_csv)

CSV writer goes to next line for each input value

I am new to Python. I am trying to write numbers to a CSV file. The first number should be the first element of the row, the second number the second element, and then a new row should start. However, with my code, instead of adding the second element to the same row, it starts a new row.
For instance, what I want is:
a1,b1
a2,b2
But what I get is:
a1
b1
a2
b2
I use a loop to continuously write values into a CSV file:
n = Ratio  # calculated in each loop
with open('ex1.csv', 'ab') as f:
    writer = csv.writer(f)
    writer.writerow([n])
...
m = Ratio2  # calculated in each loop
with open('ex1.csv', 'ab') as f:
    writer = csv.writer(f)
    writer.writerow([m])
I would like the results to be in the format:
n1,m1
n2,m2
Here is an example that writes to a file, then reads it back and prints it:
import csv

with open('ex1.csv', 'w', newline='') as f:  # open the file BEFORE you loop
    writer = csv.writer(f)  # declare your writer on the file
    for rows in range(0, 4):  # do one loop per row
        myRow = []  # remember all column values; the list is cleared here for each row
        for colVal in range(0, 10):  # compute 10 columns
            m = colVal * rows  # heavy computing (your m or n)
            myRow.append(m)  # store column in row-list
        writer.writerow(myRow)  # write list containing all columns

with open('ex1.csv', 'r', newline='') as r:  # read it back in, keeping the \r\n line endings
    print(r.readlines())  # and print it
Output:
['0,0,0,0,0,0,0,0,0,0\r\n', '0,1,2,3,4,5,6,7,8,9\r\n', '0,2,4,6,8,10,12,14,16,18\r\n', '0,3,6,9,12,15,18,21,24,27\r\n']
which corresponds to a file containing:
0,0,0,0,0,0,0,0,0,0
0,1,2,3,4,5,6,7,8,9
0,2,4,6,8,10,12,14,16,18
0,3,6,9,12,15,18,21,24,27
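You can also collect each row's list into another list and use writer.writerows([[1, 2, 3, 4], [4, 5, 6, 7]]) to write all your rows in one go. If you reuse the same list object for every row, store a copy (myRow[:]); in the example above a new list is created for each row, so no copy is needed.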
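Applied to the question: compute n and m in the same iteration, put them in one list, and write that list as one row. The sketch below collects the [n, m] pairs and writes them with a single writerows call; compute_ratio and compute_ratio2 are hypothetical stand-ins for however Ratio and Ratio2 are calculated.

```python
import csv


def compute_ratio(i):
    """Hypothetical stand-in for the question's Ratio."""
    return i / 2


def compute_ratio2(i):
    """Hypothetical stand-in for the question's Ratio2."""
    return i * 3


rows = []
for i in range(1, 4):
    n = compute_ratio(i)
    m = compute_ratio2(i)
    rows.append([n, m])  # one row holds both values

# Open the file once, in text mode with newline="" (Python 3).
with open("ex1.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerows(rows)  # writes n1,m1 / n2,m2 / n3,m3 on separate lines
```

If the values must be written as they are produced, calling writer.writerow([n, m]) inside the loop, on a file opened once before the loop, gives the same layout.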
See: https://docs.python.org/2/library/csv.html#writer-objects or https://docs.python.org/3/library/csv.html#writer-objects

csv file loop results

I'm trying to split a CSV file by city using re.findall(), but when I write the results to other CSV files, it loops over and over many times!
import io
import csv
import re
import codecs

lines = 0
outfile1 = codecs.open('/mesh/وسطى.csv', 'w', 'utf_8')
outfile6 = codecs.open('/mesh/أخرى.csv', 'w', 'utf_8')
with io.open('/mishal.csv', 'r', encoding="utf-8", newline='') as f:
    reader = csv.reader(f)
    for row in f:
        for rows in row:
            lines += 1
            # Central region (الوسطى): matches شقراء (Shaqra)
            m = re.findall('\u0634\u0642\u0631\u0627\u0621', row)
            if m:
                outfile1.write(row)
            else:
                outfile6.write(row)
print("saved In to mishal !")
f.close()
I want the re.findall() city check to run only once per line, not loop so many times whenever there's a match.
Here's a screenshot of the output showing the excessive looping:
A csv reader returns a list for each line of the file. Your outer loop iterates over the lines/rows and your inner loop iterates over the items in each row; since you loop over f rather than the reader, each row is actually a string, so the inner loop runs once per character. It isn't clear what you want, but your conditional writes happen once for every item in every row. If your intent is to check whether there is a match anywhere in the row, check the items first and then write the row once:
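for row in csv.reader(f):
    match = False
    for item in row:
        lines += 1  # ??
        # Central region (الوسطى): matches شقراء (Shaqra)
        if re.search('\u0634\u0642\u0631\u0627\u0621', item):
            match = True
            break  # one match is enough; stop checking this row
    # write each row exactly once, after checking its items
    if match:
        csv.writer(outfile1).writerow(row)
    else:
        csv.writer(outfile6).writerow(row)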
You could accomplish the same thing by iterating over the lines in the file, without a csv reader:
with io.open('/mishal.csv', 'r', encoding="utf-8", newline='') as f:
    for line in f:
        # Central region (الوسطى): matches شقراء (Shaqra)
        if re.search('\u0634\u0642\u0631\u0627\u0621', line):
            outfile1.write(line)
        else:
            outfile6.write(line)
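To see where the repeated writes come from, this small demonstration (made-up English sample text instead of the real file) compares looping over the file object with looping over a csv.reader. Iterating over a line string yields single characters, so the body of the inner loop runs once per character.

```python
import csv
import io

sample = "Riyadh,Central\nJeddah,Western\n"  # hypothetical two-line file

# Looping over the file gives strings, and looping over a string gives characters.
inner_runs_file = 0
for row in io.StringIO(sample):
    for item in row:
        inner_runs_file += 1

# Looping over csv.reader gives lists, so the inner loop visits each field.
inner_runs_reader = 0
for row in csv.reader(io.StringIO(sample)):
    for item in row:
        inner_runs_reader += 1

print(inner_runs_file)    # one per character, newlines included
print(inner_runs_reader)  # 4: two fields in each of two rows
```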

How to read a text file into a list or an array with Python

I am trying to read the lines of a text file into a list or array in python. I just need to be able to individually access any item in the list or array after it is created.
The text file is formatted as follows:
0,0,200,0,53,1,0,255,...,0.
Where the ... is above, the actual text file has hundreds or thousands more items.
I'm using the following code to try to read the file into a list:
text_file = open("filename.dat", "r")
lines = text_file.readlines()
print(lines)
print(len(lines))
text_file.close()
The output I get is:
['0,0,200,0,53,1,0,255,...,0.']
1
Apparently it is reading the entire file into a list of just one item, rather than a list of individual items. What am I doing wrong?
You will have to split your string into a list of values using split().
So:
lines = text_file.read().split(',')
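Note that split() leaves every value as a string, and a trailing newline stays attached to the last item. If you need numbers, a sketch that strips the text first and converts each piece with float (which also accepts values such as "0.") could look like this; parse_values is a hypothetical helper name.

```python
def parse_values(text):
    """Split one comma-separated line into a list of floats."""
    return [float(item) for item in text.strip().split(",")]


# Made-up sample in the same shape as the question's file (without the "...").
values = parse_values("0,0,200,0,53,1,0,255,0.\n")
print(values[2])    # 200.0
print(len(values))  # 9
```

With the file from the question, the call would be parse_values(text_file.read()).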
EDIT:
I didn't realise there would be so much traction to this. Here's a more idiomatic approach.
import csv

with open('filename.csv', 'r', newline='') as fd:
    reader = csv.reader(fd)
    for row in reader:
        print(row)  # do something with each row (a list of strings)
You can also use NumPy's loadtxt, like this:
from numpy import loadtxt
lines = loadtxt("filename.dat", comments="#", delimiter=",", unpack=False)
So you want to create a list of lists... We need to start with an empty list
list_of_lists = []
Next, we read the file content, line by line:
with open('data') as f:
    for line in f:
        inner_list = [elt.strip() for elt in line.split(',')]
        # alternatively, if you need to use the file content as numbers:
        # inner_list = [int(elt.strip()) for elt in line.split(',')]
        list_of_lists.append(inner_list)
A common use case is that of columnar data, but our units of storage are the
rows of the file, that we have read one by one, so you may want to transpose
your list of lists. This can be done with the following idiom
by_cols = zip(*list_of_lists)
Another common use is to give a name to each column:
col_names = ('apples sold', 'pears sold', 'apples revenue', 'pears revenue')
by_names = {}
for i, col_name in enumerate(col_names):
    by_names[col_name] = by_cols[i]
so that you can operate on homogeneous data items:
mean_apple_prices = [money/fruits for money, fruits in
                     zip(by_names['apples revenue'], by_names['apples sold'])]
Most of what I've written can be sped up using the csv module from the standard library. A third-party alternative is pandas, which lets you automate most aspects of a typical data analysis (but has a number of dependencies).
Update: while in Python 2 zip(*list_of_lists) returns a transposed list of tuples, in Python 3 it returns a zip object, which is not subscriptable.
If you need indexed access, you can use
by_cols = list(zip(*list_of_lists))
which gives you a list of tuples in both versions of Python.
On the other hand, if you don't need indexed access and what you want is just to build a dictionary indexed by column names, a zip object is just fine...
with open('some_data.csv') as file:
    names = [x.strip() for x in next(file).split(',')]  # header line gives the column names
    columns = zip(*((x.strip() for x in line.split(',')) for line in file))
    d = {}
    for name, column in zip(names, columns):
        d[name] = column
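Combining the csv module mentioned above with list(zip(...)), a sketch of the whole columnar workflow: csv.reader does the splitting, and the transposed columns are keyed by name. The column names come from the example above; the numbers are made up, and converting every field with float assumes all values are numeric.

```python
import csv
import io

col_names = ('apples sold', 'pears sold', 'apples revenue', 'pears revenue')

# Made-up data: two rows with the four columns above.
data = io.StringIO("10,4,25.0,6.0\n20,8,40.0,12.0\n")
list_of_lists = [[float(x) for x in row] for row in csv.reader(data)]

by_cols = list(zip(*list_of_lists))       # transpose rows into column tuples
by_names = dict(zip(col_names, by_cols))  # name -> tuple of values

mean_apple_prices = [money / fruits for money, fruits in
                     zip(by_names['apples revenue'], by_names['apples sold'])]
print(mean_apple_prices)  # [2.5, 2.0]
```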
This question is asking how to read the comma-separated value contents from a file into an iterable list:
0,0,200,0,53,1,0,255,...,0.
The easiest way to do this is with the csv module as follows:
import csv
with open('filename.dat', newline='') as csvfile:
    spamreader = csv.reader(csvfile, delimiter=',')
Now, still inside the with block (the file is closed when the block ends), you can iterate over spamreader like this:
    for row in spamreader:
        print(', '.join(row))
See the csv module documentation for more examples.
I'm a bit late, but you can also read the text file into a DataFrame and then convert the corresponding column to a list:
import pandas as pd
lista = pd.read_csv('path_to_textfile.txt', sep=",", header=None)[0].tolist()
For example:
lista = pd.read_csv('data/holdout.txt', sep=',', header=None)[0].tolist()
Note: with header=None, the DataFrame's columns are named with integers; I chose 0 because I was extracting only the first column.
Better this way:
def txt_to_lst(file_path):
    try:
        with open(file_path, "r") as stopword:
            lines = stopword.read().split('\n')
        print(lines)
        return lines
    except Exception as e:
        print(e)
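One detail of read().split('\n'): if the file ends with a newline, the result has an empty string as its last element, while str.splitlines() does not produce that trailing item. A sketch on a made-up in-memory string instead of a file:

```python
text = "the\nand\nof\n"  # made-up contents of a stopword file

print(text.split('\n'))   # ['the', 'and', 'of', '']
print(text.splitlines())  # ['the', 'and', 'of']
```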
