How can I remove unnecessary characters from a csv file - python

I want to remove all the special characters from a csv file. I have tried several approaches, but none of them worked:
import re
data=("C:/Users/Niroshima/Desktop/Research/post.csv")
for i in data.values():
    i = re.sub(r'[^\x00-\x7F]', '', i)
    print(i)
And this error came up:
AttributeError                            Traceback (most recent call last)
<ipython-input-17-ee7352e82dd3> in <module>
----> 1 for i in data.values():
      2     i=re.sub(r'[^\x00-\x7F]','',i)
      3     print(i)
AttributeError: 'str' object has no attribute 'values'

data is just your file name, not the file's contents. Open the file and clean each line like so:
import re

file_name = "C:/Users/Niroshima/Desktop/Research/post.csv"
with open(file_name) as f:
    for line in f:
        # drop every character outside the ASCII range
        l = re.sub(r'[^\x00-\x7F]', '', line)
        print(l)
If you want the cleaned data in another file, you have to write each l to a different file.
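Writing the cleaned lines to a second file could look roughly like the sketch below. The function name strip_non_ascii and the UTF-8 default encoding are assumptions for illustration, not something the question specifies; adjust the encoding to whatever your file actually uses.

```python
import re

# Matches any character outside the 7-bit ASCII range.
NON_ASCII = re.compile(r'[^\x00-\x7F]')

def strip_non_ascii(src_path, dst_path, encoding="utf-8"):
    """Copy src_path to dst_path, dropping every non-ASCII character.

    The source encoding is an assumption (UTF-8 by default).
    """
    # newline="" on both files keeps the original line endings untouched.
    with open(src_path, encoding=encoding, newline="") as src, \
         open(dst_path, "w", encoding="ascii", newline="") as dst:
        for line in src:
            dst.write(NON_ASCII.sub("", line))
```

You would call it as strip_non_ascii("post.csv", "post_clean.csv"), where post_clean.csv is a hypothetical output name. Writing to a separate file keeps the original intact in case the pattern removes more than you intended.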

Related

Reading line-by-line textfile with for loop and get error IndexError: list index out of range

I have a tab delimited text file containing these values.
My input textfile:
0.227996254681648 0.337028824833703 0.238163571416268 0.183009231781289 0.085746697332588 0.13412895376826
0.247891283973758 0.335555555555556 0.272129379268419 0.187328622765857 0.085921240923626 0.128372465534807
0.264761012183693 0.337777777777778 0.245917821271498 0.183211905363232 0.080493183753814 0.122786059549795
0.30506091846298 0.337777777777778 0.204265153911403 0.208453197418743 0.0715575291087 0.083682658454807
0.222748815165877 0.337028824833703 0.209714942778068 0.084252659537679 0.142013573559938 0.234672985858848
I would like to read each line of my text file, run some equations on it in a for loop (using fsolve) to obtain 4 output values, and append those values to another text file, line by line.
My code:
with open("/path/inputtextfile.txt") as f:
    for line in f:
        map(float, line.strip().split())
        if 'str' in line:
            break
        BB=columns[0]
        LL=columns[1]
        FF=columns[2]
        VV=columns[3]
        GG=columns[4]
        TT=columns[5]
        x=1
        FF2=FF**2
        BB2=1-BB
        LL2=1-LL
        def f2(z):
            a=z[0]
            b=z[1]
            c=z[2]
            d=z[3]
            f=np.zeros(4)
            f[0]=x*a*((1-c)*BB+(b-d)*BB2)-VV
            f[1] =((x**2)*(a**2)*((BB2**2)*b*d-(BB**2)*c))-GG
            f[2]= x*a*(b*BB2*(1-x*d*a*BB2)-c*BB*(1-x*a*BB))-TT
            f[3]= (LL*LL2*BB*BB2*(d - b*c)**2) -FF2
            return f
        z= fsolve(f2,[1,1,1,1])
        a_file = open("path/to/new/output/textfile.txt", "a")
        np.savetxt(a_file, z, fmt='%1.10f', newline=" ")
        a_file.close()
I am not sure whether the whole code works. Something is certainly wrong with how my numbers are read in, as they do not seem to be treated as floating-point values. I get this error:
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-86-2808bd00222a> in <module>
5 break
6 BB=columns[0]
----> 7 LL=columns[1]
8 FF=columns[2]
9 VV=columns[3]
IndexError: list index out of range
Why not use the csv module for this? You only need to save the data as a .csv file, and then you can read it like this:
import csv  # import the module

lines = []
with open("filename.csv", newline="") as f:
    # The question describes a tab-delimited file, so "\t" is used here.
    # Use delimiter=" " for single spaces or delimiter="," for commas.
    # The delimiter must be exactly one character; an empty string raises an error.
    csvReader = csv.reader(f, delimiter="\t")
    for row in csvReader:
        lines.append(row)  # or apply your formulas to each row here
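Whichever way you read the file, each row arrives as a list of strings, so the values still need converting to float, and a blank or short line will produce the same kind of IndexError when you index it. A minimal sketch of that step is below; read_rows and its expected parameter are hypothetical names, and the in-memory sample only stands in for a real file.

```python
import io

def read_rows(lines, expected=6):
    """Yield each non-blank line as a list of floats.

    `expected` is the number of columns per row (6 in the sample input).
    """
    for number, line in enumerate(lines, start=1):
        fields = line.split()   # no argument: splits on tabs and spaces alike
        if not fields:          # skip empty or whitespace-only lines
            continue
        if len(fields) != expected:
            raise ValueError(
                f"line {number}: expected {expected} values, got {len(fields)}"
            )
        yield [float(x) for x in fields]

# Demonstration on in-memory text; a file object from open() works the same way.
sample = io.StringIO(
    "0.22\t0.33\t0.23\t0.18\t0.08\t0.13\n"
    "\n"
    "0.24\t0.33\t0.27\t0.18\t0.08\t0.12\n"
)
for BB, LL, FF, VV, GG, TT in read_rows(sample):
    print(BB, LL, FF, VV, GG, TT)
```

Raising an error that names the offending line number makes a malformed row easier to find than a bare IndexError inside the loop.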

How to apply regex sub to a csv file in python

I have a csv file I wish to apply a regex replacement to with python.
So far I have the following:
reader = csv.reader(open('ffrk_inventory_relics.csv', 'r'))
writer = csv.writer(open('outfile.csv','w'))
for row in reader:
    reader = re.sub(r'\+','z',reader)
Which is giving me the following error:
Script error: Traceback (most recent call last):
  File "ffrk_inventory_tracker_v1.6.py", line 22, in response
    getRelics(data['equipments'], 'ffrk_inventory_relics')
  File "ffrk_inventory_tracker_v1.6.py", line 72, in getRelics
    reader = re.sub(r'\+','z',reader)
  File "c:\users\baconcatbug\appdata\local\programs\python\python36\lib\re.py", line 191, in sub
    return _compile(pattern, flags).sub(repl, string, count)
TypeError: expected string or bytes-like object
After googling without much luck, I would like to ask the community here how to open the csv file correctly so that I can use re.sub on it and then write the altered csv back to the same filename.
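csv.reader(open('ffrk_inventory_relics.csv', 'r')) yields each row as a list of strings, and re.sub expects a single string. Your loop also passes the reader object itself rather than the row. Apply re.sub to each cell instead:
import re
import csv

# Read every row first, applying the substitution to each cell.
with open('ffrk_inventory_relics.csv', 'r', newline='') as f:
    final_data = [[re.sub(r'\+', 'z', b) for b in i] for i in csv.reader(f)]

# Then reopen the same file in write mode and write the rows back.
with open('ffrk_inventory_relics.csv', 'w', newline='') as f:
    csv.writer(f).writerows(final_data)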
If you don't need csv, you can use replace with a regular open:
with open('ffrk_inventory_relics.csv', 'r') as reader, open('outfile.csv','w') as writer:
    for row in reader:
        writer.write(row.replace('+','z'))

AttributeError: 'list' object has no attribute 'split' in Python

I am having problems with this bit of code:
import csv
temp = open("townsfile.csv", "r")
towns = temp.read()
temp.close()
print(towns)
eachTown = towns.split("\n")
print (eachTown)
record = eachTown.split(",")
for line in eachTown:
    record = eachItem.split(",")
    print(record)
    newlist=[]
    newlist.append(record)
newlist=[]
for eachItem in eachTown:
    record = eachItem.split(",")
    newlist.append(record)
print(newlist)
It returns this error
Traceback (most recent call last):
File "N:/Python practice/towns.py", line 10, in <module>
record = eachTown.split(",")
AttributeError: 'list' object has no attribute 'split'
Can anyone help me with this?
The csv module gives you this text-parsing functionality, so you do not need to do it yourself.
import csv
with open("townsfile.csv", "r", newline="") as f:
    reader = csv.reader(f, delimiter=',')
    towns = list(reader)
print(towns)
The problem is that list.split() does not exist. You are trying to use str.split(), but towns.split("\n") has already turned the text into a list of strs, so you would need to call split on every str in the list.
eachTown = towns.split("\n")
This line returns a list, and a list has no split attribute. You should replace
record = eachTown.split(",")
with this:
records = [rec.split(",") for rec in eachTown]
It would be better, though, to use the csv module to read this file.
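One reason the csv module is usually the safer choice is that it understands quoted fields, which a plain split(",") does not. The sketch below uses made-up sample data (not the contents of townsfile.csv, which the question does not show) to compare the two approaches.

```python
import csv
import io

# Hypothetical sample: the second row has a comma inside a quoted field.
text = 'name,region\n"Springfield, North",A\n'

# Naive approach: split every line on commas.
naive = [line.split(",") for line in text.splitlines()]

# csv module: quoted fields are kept together and the quotes are removed.
parsed = list(csv.reader(io.StringIO(text)))

print(naive[1])   # ['"Springfield', ' North"', 'A']
print(parsed[1])  # ['Springfield, North', 'A']
```

If your file never contains quoted commas, both approaches give the same rows, so the list comprehension above is fine for simple data.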

Splitting uneven spaced column in Python

I tried to use the program below:
import os
HOME= os.getcwd()
STORE_INFO_FILE = os.path.join(HOME,'storeInfo')
def searchStr(STORE_INFO_FILE, storeId):
    with open (STORE_INFO_FILE, 'r') as storeInfoFile:
        for storeLine in storeInfoFile:
            ## print storeLine.split(r'\s+')[0]
            if storeLine.split()[0] == storeId:
                print storeLine
searchStr(STORE_INFO_FILE, 'Star001')
An example line in the file:
Star001 Sunnyvale 9.00 USD Los_angeles/America sunnvaleStarb#startb.com
But it gives the error below:
./searchStore.py
Traceback (most recent call last):
  File "./searchStore.py", line 21, in <module>
    searchStr(STORE_INFO_FILE, 'Star001')
  File "./searchStore.py", line 17, in searchStr
    if storeLine.split()[0] == storeId:
IndexError: list index out of range
I have tried printing using split function on the command line and I was able to print it.
It looks like you have an empty or blank line in your file:
>>> 'abc def hij\n'.split()
['abc', 'def', 'hij']
>>> ' \n'.split() # a blank line containing white space
[]
>>> '\n'.split() # an empty line
[]
The last 2 cases show that an empty list can be returned by split(). Trying to index that list raises an exception:
>>> '\n'.split()[0]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
IndexError: list index out of range
You can fix the problem by checking for empty and blank lines. Try this code:
def searchStr(store_info_file, store_id):
    with open(store_info_file) as f:
        for line in f:
            # line.strip() is empty (falsy) for blank lines, so split()[0] is never reached
            if line.strip() and (line.split()[0] == store_id):
                print(line)
Adding line.strip() allows you to ignore empty lines and lines containing only whitespace.
The code fails when split() returns an empty list. You can store the result of split() and check that it is non-empty before indexing it, for example:
storeLineWords = storeLine.split()
if len(storeLineWords) > 0 and storeLineWords[0] == storeId:
    print(storeLine)
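Putting the check into a complete function might look like the sketch below. It is written for Python 3 and returns the matches instead of printing them; search_store is a hypothetical name, and the sample list stands in for the lines of the storeInfo file.

```python
def search_store(lines, store_id):
    """Return the lines whose first whitespace-separated field equals store_id."""
    matches = []
    for line in lines:
        words = line.split()
        # An empty list is falsy, so blank lines are skipped before indexing.
        if words and words[0] == store_id:
            matches.append(line.rstrip("\n"))
    return matches

# Sample input: one real record followed by an empty and a whitespace-only line.
sample = [
    "Star001 Sunnyvale 9.00 USD Los_angeles/America sunnvaleStarb#startb.com\n",
    "\n",
    "   \n",
]
print(search_store(sample, "Star001"))
```

Because the function accepts any iterable of lines, you can pass it an open file object directly, for example search_store(open_file, 'Star001') inside a with block.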

counting occurrences of a word in a text file using python

I'm trying to count the occurrences of a word in a text file.
sub = 'Date:'
#opening and reading the input file
#In path to input file use '\' as escape character
with open ("C:\\Users\\md_sarfaraz\\Desktop\\ctl_Files.txt", "r") as myfile:
    val=myfile.read().replace('\n', ' ')
#val
#len(val)
occurence = str.count(sub, 0, len(val))
I'm getting this error:
>>> occurence = str.count('Date:', 0,len(val))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: expected a character buffer object
>>> occurence = str.count('Date:', 0,20)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: expected a character buffer object
You are over-complicating it:
open(file).read().count(WORD)
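You're calling count incorrectly. str.count(sub, 0, len(val)) treats sub as the string to search in and 0 as the substring to look for, which is what raises the TypeError. Call the method on the text you read instead:
occurence = val.count(sub)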
If you want to know how many times the word Date: occurs in the text file, this is one way to do it:
sub = "Date:"
# read the whole file; the with block closes it afterwards
with open("C:\\Users\\md_sarfaraz\\Desktop\\ctl_Files.txt", "r") as myfile:
    text = myfile.read()
occurence = text.count(sub)
print(occurence)
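Note that str.count counts substrings, so "Date:" is also counted inside a longer token such as "DueDate:". If you only want occurrences that start a token, a regular expression is one option. The sketch below is an illustration under that assumption; count_token is a hypothetical helper and the sample text is made up.

```python
import re

def count_token(text, token):
    """Count occurrences of token at the start of the text or right after whitespace."""
    # (?<!\S) means "not preceded by a non-whitespace character".
    pattern = r'(?<!\S)' + re.escape(token)
    return len(re.findall(pattern, text))

text = "Date: Mon\nDueDate: Tue\nDate: Wed\n"
print(text.count("Date:"))         # 3, includes the match inside "DueDate:"
print(count_token(text, "Date:"))  # 2, only the tokens that start with "Date:"
```

re.escape keeps characters such as "+" or "." in the search word from being interpreted as regex syntax.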
