I'm reading a CSV file, looping through all of its rows (strings), and I want to keep / print every part of each string that starts with a ".", has two words in the middle, and ends with either ".", "?" or "!".
For example, if the string was "This is my new channel. Please subscribe!", I'd only want to keep ". Please subscribe!".
So far I only have this, which shows me how many words are in each string:
with open("data2.csv", encoding="utf-8", newline='') as f:
    reader = csv.reader(f)
    for row in reader:
        rowstr = str(row[1])
        res = len(row[1].split())
        print(res)
I've tried:
with open("data2.csv", encoding="utf-8", newline='') as f:
    reader = csv.reader(f)
    for row in reader:
        rowstr = row[1]
        res = len(row[1].split())
        re.findall(r"\.\S+\s\S+[.?!]", rowstr)
        print(row[1])
I get no output from findall, only from printing row[1].
Fixed it
Working code:
with open("data2.csv", encoding="utf-8", newline='') as f:
    reader = csv.reader(f)
    for row in reader:
        rowstr = row[1]
        res = len(row[1].split())
        finalData = re.findall(r"(\.\W\w+\W\w+[\.\?!])", rowstr)
        print(finalData)
You can use a regular expression:
re.findall(r'(\.\W\w+\W\w+[\.\?!])$', "This is my new channel. Please subscribe!")
which outputs:
['. Please subscribe!']
Regular expressions are a good fit for problems like this.
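For anyone who wants to try the pattern without the CSV file, here is a minimal sketch that runs it against a couple of in-memory strings. The function name find_two_word_tails and the sample strings are made up for illustration; the pattern is the one from the working code above, without the optional $ anchor.

```python
import re

# "." + one non-word char + word + one non-word char + word + closing ".", "?" or "!"
PATTERN = re.compile(r"\.\W\w+\W\w+[.?!]")

def find_two_word_tails(text):
    """Return every substring of text that matches PATTERN."""
    return PATTERN.findall(text)

samples = [
    "This is my new channel. Please subscribe!",
    "No match here",
]
for s in samples:
    print(find_two_word_tails(s))
```

Note that \W matches exactly one non-word character, so two spaces after the period would not match; \W+ would be the looser choice if that matters for your data.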
I want to remove the "." character. When I use this:
text = "Rmyname.lastname#mail.com"
text = (text.replace('.',' '))
head, sep, tail = text.partition('#')
print(head)
It works, and this is the output: Rmyname lastname
But when I load an external file and read every line, it doesn't replace the "." character.
with open('found.txt', 'r') as csvfile:
    spamreader = csv.reader(csvfile)
    for row in spamreader:
        head = (row[0].replace('.', ' '))
        head, sep, tail = row[0].partition('#')
        print(head)
This is the output: Rmyname.lastname
How can I solve this?
You store the result of the replacement into the variable head. The original row[0] still has the period. Change row[0].partition('#') to head.partition('#').
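A minimal sketch of that fix as a standalone function, so it can be checked without found.txt. The name local_part is made up for illustration; the point is only that partition is called on the replaced string rather than on the original.

```python
def local_part(address):
    """Replace dots with spaces, then keep the text before '#'."""
    cleaned = address.replace('.', ' ')
    # Partition the cleaned string, not the original one.
    head, sep, tail = cleaned.partition('#')
    return head

print(local_part("Rmyname.lastname#mail.com"))
```

Inside the loop from the question, the same idea would be head.partition('#') instead of row[0].partition('#').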
I am trying to remove non-ASCII characters from a file. I am actually trying to convert a text file which contains these characters (e.g. hello§‚å½¢æˆ äº†å¯¹æ¯”ã€‚ 花å) into a CSV file.
However, I am unable to iterate through these characters, so I want to remove them (i.e. chop them off or replace them with a space). Here's the code (researched and gathered from various sources).
The problem is that, after running the script, the csv/txt file has not been updated, which means the characters are still there. I have absolutely no idea how to go about this anymore. I've researched for a day :(
I would appreciate your help!
import csv
txt_file = r"xxx.txt"
csv_file = r"xxx.csv"
in_txt = csv.reader(open(txt_file, "rb"), delimiter = '\t')
out_csv = csv.writer(open(csv_file, 'wb'))
for row in in_txt:
    for i in row:
        i = "".join([a if ord(a)<128 else''for a in i])
out_csv.writerows(in_txt)
Variable assignment is not magically transferred to the original source; you have to build up a new list of your changed rows:
import csv
txt_file = r"xxx.txt"
csv_file = r"xxx.csv"
in_txt = csv.reader(open(txt_file, "rb"), delimiter = '\t')
out_csv = csv.writer(open(csv_file, 'wb'))
out_txt = []
for row in in_txt:
    out_txt.append([
        "".join(a if ord(a) < 128 else '' for a in i)
        for i in row
    ])
out_csv.writerows(out_txt)
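The code above is Python 2 style (binary file modes). As a rough Python 3 sketch of the same idea, the version below writes each cleaned row as it goes instead of collecting a list first. The helper names strip_non_ascii and convert are made up, and io.StringIO objects with invented sample data stand in for the real files.

```python
import csv
import io

def strip_non_ascii(text):
    """Drop every character whose code point is 128 or above."""
    return "".join(ch for ch in text if ord(ch) < 128)

def convert(in_file, out_file):
    """Copy tab-separated rows to CSV, cleaning each cell on the way."""
    reader = csv.reader(in_file, delimiter="\t")
    writer = csv.writer(out_file)
    for row in reader:
        writer.writerow([strip_non_ascii(cell) for cell in row])

# In-memory stand-ins for the real files (sample data is made up).
source = io.StringIO("hello§\tworld\nfoo\tbär\n")
target = io.StringIO()
convert(source, target)
print(target.getvalue())
```

With real files in Python 3 you would open both in text mode with an explicit encoding and newline='', as the csv module documentation recommends.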
I need to remove the second line of my csv file.
I am using the code below but unfortunately it doesn't work.
data = ""
adresse = "SLV.csv"
if os.path.exists(adresse) :
    f = open(adresse,"ab")
    writer = csv.writer(f,delimiter = ",")
    reader = csv.reader(open(adresse,"rb") , delimiter = ",")
    for line in reader:
        if reader.line_num == 2:
            writer.writerow(line)
    f.close()
Since all you want to do is remove the second line, using the csv module is overkill. It doesn't matter if the file is comma separated data or Vogon poetry. Write the front parts, skip the middle part and write the end.
import shutil
# generate test file
with open('x.txt', 'w') as f:
    for i in range(10):
        f.write('line %d\n' % i)
# skip one line
with open('x.txt','rb') as rd, open('x.txt', 'rb+') as wr:
    wr.write(rd.readline())
    rd.readline()
    shutil.copyfileobj(rd, wr)
    wr.truncate()
print open('x.txt').read()
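The same in-place approach wrapped in a function, as a Python 3 sketch. The name drop_second_line and the throwaway demo file are assumptions for illustration; the read/write/truncate sequence is the one shown above.

```python
import os
import shutil
import tempfile

def drop_second_line(path):
    """Remove the second line of the file at path, in place."""
    with open(path, "rb") as rd, open(path, "rb+") as wr:
        wr.write(rd.readline())      # keep line 1
        rd.readline()                # read line 2 and discard it
        shutil.copyfileobj(rd, wr)   # shift the remainder up
        wr.truncate()                # cut off the leftover tail

# Demo on a throwaway file.
fd, demo_path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w", newline="") as f:
    for i in range(5):
        f.write("line %d\n" % i)
drop_second_line(demo_path)
with open(demo_path, newline="") as f:
    result = f.read()
os.remove(demo_path)
print(result)
```

This works because the write position always stays behind the read position, so nothing is overwritten before it has been read.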
Write to a temp file and update the original afterwards:
import csv
import os

if os.path.exists(adresse):
    with open(adresse, "r") as f, open("temp.csv", "w+") as temp:
        writer = csv.writer(temp, delimiter=",")
        reader = csv.reader(f, delimiter=",")
        for ind, line in enumerate(reader):
            if ind == 1:  # index 1 is the second line
                continue
            writer.writerow(line)
        temp.seek(0)
        with open(adresse, "w") as out:
            reader = csv.reader(temp, delimiter=",")
            writer = csv.writer(out, delimiter=",")
            for row in reader:
                writer.writerow(row)
If the file can be read into memory, just call list on the reader and remove the second element:
if os.path.exists(adresse) :
    with open(adresse,"r") as f:
        reader = list(csv.reader(f , delimiter = ","))
        reader.pop(1)
    with open(adresse,"w") as out:
        writer = csv.writer(out,delimiter = ",")
        for row in reader:
            writer.writerow(row)
If I understand you correctly, you are trying to write a new file that leaves out line number 2.
If that is your scenario, there is a trivial bug in your procedure: the condition is inverted. It should be:
if reader.line_num != 2:
    writer.writerow(line)
Here is my solution, with less code for you to write:
>>> import pyexcel as pe # pip install pyexcel
>>> sheet = pe.load("SLV.csv")
>>> del sheet.row[1] # first row starts at index 0
>>> sheet.save_as("SLV.csv")
I agree with tdelaney, and this is a far more compact solution:
lines = open('x.txt', 'r').readlines()
lines.pop(1)
open('x.txt', 'w').writelines(lines)
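A slightly longer variant of the same in-memory idea, sketched with context managers so both file handles are closed deterministically, and with a bounds check so a one-line file is left alone. The function name remove_line and the demo data are made up for illustration.

```python
import os
import tempfile

def remove_line(path, index):
    """Rewrite the file at path without the line at the given 0-based index."""
    with open(path, "r") as f:
        lines = f.readlines()
    if 0 <= index < len(lines):
        del lines[index]
    with open(path, "w") as f:
        f.writelines(lines)

# Demo on a throwaway file (contents are made up).
fd, demo_path = tempfile.mkstemp(suffix=".csv")
with os.fdopen(fd, "w") as f:
    f.write("a,b\n1,2\n3,4\n")
remove_line(demo_path, 1)
with open(demo_path) as f:
    remaining = f.read()
os.remove(demo_path)
print(remaining)
```

Like the short version, this assumes the whole file fits in memory.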
I have a similar problem to the one in this question: find position of a substring in a string
The difference is that I don't know what my "mystr" is. I know my substring, but the string in the input file could be any number of words in any order; I only know that one of those words includes the substring cola.
For example, a CSV file: fanta,coca_cola,sprite, in any order.
If my substring is "cola", how can I write code that says
mystr.find('cola')
or
match = re.search(r"[^a-zA-Z](cola)[^a-zA-Z]", mystr)
or
if "cola" in mystr
when I don't know what my "mystr" is?
This is my code:
import csv
with open('first.csv', 'rb') as fp_in, open('second.csv', 'wb') as fp_out:
    reader = csv.DictReader(fp_in)
    rows = [row for row in reader]
    writer = csv.writer(fp_out, delimiter = ',')
    writer.writerow(["new_cola"])
    def headers1(name):
        if "cola" in name:
            return row.get("cola")
    for row in rows:
        writer.writerow([headers1("cola")])
and the first.csv:
fanta,cocacola,banana
0,1,0
1,2,1
so it prints out
new_cola
""
""
when it should print out
new_cola
1
2
Here is a working example:
import csv
with open("first.csv", "rb") as fp_in, open("second.csv", "wb") as fp_out:
    reader = csv.DictReader(fp_in)
    writer = csv.writer(fp_out, delimiter = ",")
    writer.writerow(["new_cola"])
    def filter_cola(row):
        for k,v in row.iteritems():
            if "cola" in k:
                yield v
    for row in reader:
        writer.writerow(list(filter_cola(row)))
Notes:
rows = [row for row in reader] is unnecessary and inefficient (it reads the whole file into a list, which consumes a lot of memory for large data; you can iterate over the reader directly)
instead of return row.get("cola") you meant return row.get(name)
in the statement return row.get("cola") you access a variable outside of the current scope
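The working example above is Python 2 (iteritems, binary file modes). As a rough Python 3 sketch of the same idea, the version below finds the matching header names once, up front, via reader.fieldnames. The helper names columns_containing and extract are made up, and io.StringIO stands in for first.csv and second.csv.

```python
import csv
import io

def columns_containing(fieldnames, substring):
    """Return the header names that contain substring."""
    return [name for name in fieldnames if substring in name]

def extract(in_file, out_file, substring, new_header):
    """Write new_header, then the values of every matching column."""
    reader = csv.DictReader(in_file)
    matches = columns_containing(reader.fieldnames, substring)
    writer = csv.writer(out_file)
    writer.writerow([new_header])
    for row in reader:
        writer.writerow([row[name] for name in matches])

# In-memory stand-in for first.csv from the question.
source = io.StringIO("fanta,cocacola,banana\n0,1,0\n1,2,1\n")
target = io.StringIO()
extract(source, target, "cola", "new_cola")
print(target.getvalue())
```

If more than one header contains the substring, each output row gets one value per matching column, the same as in the generator-based example.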
You can also use the Unix tool cut. For example:
cut -d "," -f 2 < first.csv > second.csv
I'm really new to python and I have a simple question. I have a .csv file with the following content:
123,456,789
I want to read it and store it into a variable called "number" with the following format
"123","456","789"
So that when I do
print number
It will give the following output
"123","456","789"
Can anybody help?
Thanks!
Update:
The following is my code:
input = csv.reader(open('inputfile.csv', 'r'))
for item in input:
    item = ['"' + item + '"' for item in item]
    print item
It gave the following output:
['"123"', '"456"', '"789"']
Here's how to do it:
import csv
from io import StringIO
quotedData = StringIO()
with open('file.csv') as f:
    reader = csv.reader(f)
    writer = csv.writer(quotedData, quoting=csv.QUOTE_ALL)
    for row in reader:
        writer.writerow(row)
With reader = csv.reader(StringIO('1,2,3')), the output is:
print quotedData.getvalue()
"1","2","3"
Using the csv module, you can read the .csv file line by line and process each element of the list you get for each row. You can then just enclose each element in double quotes.
import csv
reader = csv.reader(open("file.csv"))
for line in reader:
    # line is a list of strings; wrap each one in double quotes
    print ','.join('"' + item + '"' for item in line)
If the whole file only contains numbers you can just open it as a regular file:
with open("file.csv") as f:
    for line in f:
        print ','.join('"{}"'.format(x) for x in line.rstrip().split(','))
It would be better to collect the lines in a list, though. For example:
with open("file.csv") as f:
    lines = [line.rstrip().split(',') for line in f]
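To end up with the single string the question asks for (a variable called number) rather than a list, one option is to let csv.writer do the quoting, as in the csv.QUOTE_ALL answer, and strip the line terminator afterwards. This is a Python 3 sketch; the function name quote_all is made up, and the input line is passed in directly instead of being read from the file.

```python
import csv
import io

def quote_all(line):
    """Parse one CSV line and re-emit it with every field double-quoted."""
    row = next(csv.reader([line]))
    buffer = io.StringIO()
    csv.writer(buffer, quoting=csv.QUOTE_ALL).writerow(row)
    # writerow appends "\r\n" by default; drop it to get a bare string.
    return buffer.getvalue().rstrip("\r\n")

number = quote_all("123,456,789")
print(number)
```

Going through the csv module rather than string concatenation also escapes any field that already contains a double quote.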
There is also a csv module that might help you.
import csv
spamReader = csv.reader(open('eggs.csv', 'rb'))
for row in spamReader:
    this_row = ['"' + str(item) + '"' for item in row]
    print this_row
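import csv
csvr = csv.reader(open('yourfile.csv', 'r'))  # replace with the path to your file
def gimenumbers():
    for row in csvr:
        # add the outer quotes so every field, including the first and last, is quoted
        yield '"' + '","'.join(row) + '"'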