Reading values in a csv file while deleting unwanted ones - python

I want to read a CSV file that contains floats and arrays. I only want to keep the float values and get rid of the array ones.
I have tried the following code:
with open('resultsMC_100_var.csv', "r") as input:
    with open('new.csv', "w") as output:
        for line in input:
            if not line.count(('[') or (']')):
                output.write(line)
The problem is that array values are written over several lines, so the code does not work as intended.
Here is the first line of my CSV file so you can get an idea of how it is structured:
51.3402815384;28.1789716134;76.7144759149;28.5590830355;50.719035557;4.83225361254;[ 23.35145494 23.6919634 21.1406396 77.35953884 121.68508966 23.02126533 24.64623985 22.30757623 59.53286234 86.01880338 22.34363071 29.75759786 30.94420056 27.24198645 21.62989704
22.57036406 23.09155954 26.32781992 22.82521813 99.12230864
22.04329951 22.50081984 104.84634521 59.48921929 34.47985424
What I would like is code that reads all the values, stops when it meets the symbol [ and starts reading again as soon as it meets ]. I do not know how to do this properly and I have not found a similar topic on this website, so I will be thankful to anyone who can help me.

The problem with your statement is that line.count(('[') or (']')) is the same as writing line.count('['): a non-empty string evaluates to True, so the or expression simply returns its first operand, '['.
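This is easy to check in an interpreter. The sketch below uses a made-up sample line and shows what the or expression returns, next to the membership test the original condition was presumably meant to perform. Even with the corrected test, the middle lines of a multi-line array contain neither bracket, so a line-by-line filter still cannot work here; that is why this answer reads the whole file at once.

```python
line = "51.34;28.17;[ 23.35 23.69"  # hypothetical sample line

# `or` returns its first truthy operand; a non-empty string is truthy
print(('[') or (']'))  # [

# What the original condition was presumably meant to check
has_bracket = '[' in line or ']' in line
print(has_bracket)  # True
```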
A simple solution here would be to use a regex:
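import re

with open('test.txt', "r") as f:
    content = f.read()
with open('new.txt', "w") as output:
    # Remove every [...] block; [^\[\]]* also matches newlines,
    # so arrays spanning several lines are removed without any regex flag
    new_line = re.sub(r"\[[^\[\]]*\]", "", content)
    output.write(new_line)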
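Before pointing the substitution at the real file, it can be tried on an in-memory string. A small sketch with a made-up sample, assuming arrays are never nested: it also shows that the delimiter after the removed array stays behind (;;), and one possible variant of the pattern, ;?\[[^\[\]]*\], that removes the semicolon in front of the array as well.

```python
import re

# Hypothetical sample: two floats, an array spanning three lines, then another float
sample = "51.34;28.17;[ 23.35 23.69\n 22.57 23.09\n 22.04 ];76.71\n"

cleaned = re.sub(r"\[[^\[\]]*\]", "", sample)
print(repr(cleaned))  # '51.34;28.17;;76.71\n'

# Variant that also drops the ';' just before each array
cleaned_no_gap = re.sub(r";?\[[^\[\]]*\]", "", sample)
print(repr(cleaned_no_gap))  # '51.34;28.17;76.71\n'
```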

You could try using a regex. Here's what I think will work:
import re

with open("results.csv", "r") as inp:
    inp_data = inp.read()
# Replace every [...] array (even one spanning several lines) with ""
out_data = re.sub(r"\[[^\[\]]*\]", "", inp_data)
with open("xyz.csv", "w") as out:
    out.write(out_data)
This first reads your input data into a string.
It then replaces all arrays with an empty string and writes the updated string to a new file. Hope this helps!
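The question describes another strategy: read everything, stop at [ and resume after ]. That can be written as a character scan with a flag, with no regular expression. A minimal sketch, assuming brackets are never nested; drop_bracketed is a hypothetical helper name.

```python
def drop_bracketed(text):
    """Return text with every [...] section removed, even across newlines."""
    kept = []
    inside = False  # True while we are between '[' and ']'
    for ch in text:
        if ch == '[':
            inside = True
        elif ch == ']':
            inside = False
        elif not inside:
            kept.append(ch)
    return ''.join(kept)


sample = "1.5;2.5;[ 3.0 4.0\n 5.0 ];6.5\n"
print(repr(drop_bracketed(sample)))  # '1.5;2.5;;6.5\n'
```

With a real file it could be used as out.write(drop_bracketed(inp.read())). For a very large file, the same loop could be fed one line at a time, as long as the inside flag is kept between lines.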

Related

Python Read strings from file

Unfortunately, I cannot read strings from a file. The relevant part of the code is:
names = ["Johnatan", "Jackson"]
Instead of the list of names above, I tried this:
open("./names.txt", "r")
but unfortunately it does not work, while without the file everything works without problems.
I would be very happy if someone could help me and tell me exactly where the problem is.
with open('./names.txt', 'r', encoding='utf-8') as f:
    words_string = f.read()
words = words_string.split(',')
print(words)
I hope it helps. open() on its own only returns a file object, so you still have to read its contents. As your file contains comma-separated words, f.read() returns the whole file as a single string (readlines() would instead return a list of lines, and a list has no split method). Then you simply split the string using .split(','). This results in the list of words that you need.
Try reading the data in the file like this:
with open(file_path, "r") as f:
    data = f.readlines()
data is now a list of lines, for example ['maria, Johnatan, Jackson']. You can parse each line using split(",").
If your file contains Johnatan,Jackson,Maria, you can split it into names by doing:
with open("./names.txt", "r", encoding='utf-8') as fp:
    content = fp.read()
names = content.split(",")
You can also do:
names = open("./names.txt", "r", encoding='utf-8').read().split(",")
if you want it to be a one-liner, although this version does not explicitly close the file.
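If the file has spaces after the commas or a trailing newline (as in maria, Johnatan, Jackson), split(",") keeps them inside the names. A small sketch that strips each entry; read_names is a hypothetical helper, and the temporary file exists only to make the example self-contained.

```python
import os
import tempfile


def read_names(path):
    """Read a comma-separated file and return the names without surrounding whitespace."""
    with open(path, "r", encoding="utf-8") as fp:
        content = fp.read()
    # strip() removes spaces and the trailing newline; empty entries are skipped
    return [name.strip() for name in content.split(",") if name.strip()]


# Self-contained demo: create a throwaway file with a sample line
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False, encoding="utf-8") as tmp:
    tmp.write("maria, Johnatan, Jackson\n")
    path = tmp.name

names = read_names(path)
print(names)  # ['maria', 'Johnatan', 'Jackson']
os.remove(path)
```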

Read keywords from a txt file and print added text + keyword

I have many keywords in a txt file, which I load into Python using f = open().
I want to add some text before each keyword.
For example:
(http://www.google.com/) + (abcdefg)
(added text) + (imported keyword)
I have tried this, but I can't get the result I want:
f = open("C:/abc/abc.txt", 'r')
data = f.read()
print("http://www.google.com/" + data)
f.close()
I also tried it using a for loop, but I couldn't get it to work.
Please let me know the solution.
Many thanks.
Your original code has some flaws:
data = f.read() reads the whole file into a single string, so the text is added only once, in front of the first keyword. To handle the keywords one by one, loop over the lines of the file with a for;
each line is a str that may contain more than one word, so you must split it into words using data.split().
To solve your problem, read each line from the file, split the line into the words it contains, then loop through the list of words and print the desired text followed by the word itself.
The correct program is this:
f = open("C:/abc/abc.txt", 'r')
for data in f:
    words = data.split()
    for i in words:
        print("http://www.google.com/" + i)
f.close()
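Alternatively, with a with block (strip() removes the trailing newline, otherwise every URL is followed by an extra line break):
with open('text.txt','r') as f:
    for line in f:
        print("http://www.google.com/" + line.strip())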
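Combining the two answers: read the file line by line, split each line into keywords, and prefix each one. This sketch returns the URLs as a list so they can be printed, checked, or written elsewhere; build_urls is a hypothetical name, and io.StringIO merely stands in for the opened file.

```python
import io


def build_urls(lines, prefix="http://www.google.com/"):
    """Prefix every whitespace-separated keyword found in the given lines."""
    urls = []
    for line in lines:
        # split() without arguments also drops the trailing newline
        for word in line.split():
            urls.append(prefix + word)
    return urls


# io.StringIO stands in for open("C:/abc/abc.txt"); a real file object works the same way
fake_file = io.StringIO("abcdefg\nfoo bar\n")
for url in build_urls(fake_file):
    print(url)
```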

Python: How to capitalize the first column of a .txt file.

I have a .csv formatted .txt file. I am deliberating over the best manner in which to .capitalize the text in the first column.
.capitalize() is a string method, so I considered the following: I would need to open the file, convert the data to a list of strings, capitalize the required word and finally write the data back to the file.
To achieve this, I did the following:
import csv

newGuestList = []
with open("guestList.txt","r+") as guestFile:
    guestList = csv.reader(guestFile)
    for guest in guestList:
        for guestInfo in guest:
            capitalisedName = guestInfo.capitalize()
            newGuestList.append(capitalisedName)
Which gives the output:
['Peter', '35', ' spain', 'Caroline', '37', 'france', 'Claire', '32', ' sweden']
The problem:
Firstly, in order to write this new list back to the file, I will need to convert it to a string. I can achieve this using the .join method. However, how can I introduce a newline, \n, after every third word (the country) so that each guest has their own line in the text file?
Secondly, this method of nested for loops etc. seems highly convoluted. Is there a cleaner way?
My .txt file:
peter, 35, spain\n
caroline, 37, france\n
claire, 32, sweden\n
You don't need to split the lines, since the first character of the first word is the first character of the line:
with open("lst.txt","r") as guestFile:
    lines = guestFile.readlines()
newlines = [line.capitalize() for line in lines]
with open("lst.txt","w") as guestFile:
    guestFile.writelines(newlines)
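One caveat with str.capitalize(): besides uppercasing the first character, it lowercases every other character in the string. Applied to a whole line, it would turn a country written as Spain into spain. If only the first character should change, uppercasing it and keeping the rest unchanged avoids that. A short comparison on a made-up line:

```python
line = "peter, 35, Spain"

print(line.capitalize())            # Peter, 35, spain  (rest of the line lowercased)
print(line[:1].upper() + line[1:])  # Peter, 35, Spain  (only the first character changed)
```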
You can just use a CSV reader and writer and access the element you want to capitalize from the list.
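import csv
import os

# newline='' is what the csv module documentation recommends for files used with reader/writer
with open('a.txt', 'r', newline='') as inp, open('b.txt', 'w', newline='') as out:
    reader = csv.reader(inp)
    writer = csv.writer(out)
    for row in reader:
        if row:  # blank lines come back as empty lists
            row[0] = row[0].capitalize()
        writer.writerow(row)
os.replace('b.txt', 'a.txt')  # if you want to keep the same name (overwrites a.txt)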
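The reader/writer approach can also be wrapped in a function that works on a string, which makes it easy to try out. This sketch uppercases only the first character of the first field, so an inner capital such as the D in mcDonald survives; capitalize_first_column is a hypothetical name, and the sample rows are made up.

```python
import csv
import io


def capitalize_first_column(text):
    """Uppercase the first character of the first field in every CSV row."""
    reader = csv.reader(io.StringIO(text))
    out = io.StringIO()
    # lineterminator='\n' keeps the same line endings as the input sample
    writer = csv.writer(out, lineterminator="\n")
    for row in reader:
        if row:  # blank lines come back as empty lists
            row[0] = row[0][:1].upper() + row[0][1:]
        writer.writerow(row)
    return out.getvalue()


sample = "peter,35,spain\nmcDonald,40,ireland\n"
print(capitalize_first_column(sample), end="")
# Peter,35,spain
# McDonald,40,ireland
```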

Read lines from .txt and, if the first and last chars equal X and Y, add some text after that string

I'm trying to solve a problem that consists of:
open a .txt input file
readlines of that .txt
store the lines values in a list
check if lin.startswith("2") and lin.endswith("|")
if it's true, then lin2 = lin + "ISENTO"
write the edited lines to an output .txt file
Here's what I have so far:
from tkinter.filedialog import askopenfilename  # Python 3 (Python 2: from tkFileDialog import askopenfilename)

def editTxt():
    #open .txt file and read the lines
    filename = askopenfilename(filetypes=[("Text files","*.txt"), ("Text files","*.TXT")])
    infile = open(filename, 'r')
    infile.read()
    #save each line in a list called "linhas" outside the editTxt function
    with open(filename, 'r') as f:
        linhas = f.readlines()
    #create the output file
    outfile = open(filename + "_edit.txt", 'w')
    #checking the condition and writing the edited lines
    for linha in linhas:
        if linha.startswith("2") and linha.endswith("|"):
            linha = linha + "ISENTO"
        outfile.write(linha)
    #close files
    outfile.close()
    infile.close()
The problem is that my output file is exactly the same as my input file.
I've already tried using if linha[0] == "2" and linha[len(linha)-1] == "|"
but then I figured out that f.readlines() keeps the \n at the end of each line,
so I tried if linha[0] == "2" and linha[len(linha)-3] == "|"
but that didn't work either.
Some people told me that I should use the replace function, but I couldn't figure out how.
A real file example:
lin1: 10|1,00|55591283000185|02/03/2015|31/03/2015
lin2: 20|I||VENDA|0|9977|02/03/2015 00:00:00|02/03/2015 11:48:00|1|5102|||07786273000152|OBSERVE SEGURANCA LTDA|RUA MARINGA,|2150||BOA VISTA|RIBEIRAO PRETO|SP|14025560||39121530|
lin3: 30|1103|DAT 05MM - 5.102||PC|1,0000|19,9000|19,90|090|0,00|0,00|0,00
I only need to change lin2, because it starts with "2" and ends with "|".
What I need after running the editTxt function:
lin2: 20|I||VENDA|0|9977|02/03/2015 00:00:00|02/03/2015 11:48:00|1|5102|||07786273000152|OBSERVE SEGURANCA LTDA|RUA MARINGA,|2150||BOA VISTA|RIBEIRAO PRETO|SP|14025560||39121530|ISENTO
Python experts, please show me an easier way to do this with different code, or preferably explain to me what's wrong with mine.
Thanks!
You were very close with your last attempt.
The '\n' line terminator is not literally the two characters '\' and 'n'. It's a single special character that is written, for convenience, as '\n'. So it's only one character in your string, not two.
Hopefully, that should give you enough of a hint to figure out how to change your code :)
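Following that hint, one way to write the fix is to strip the terminator before testing the last character, then put it back after appending the text. A minimal sketch; add_isento is a hypothetical helper name, and the sample lines are shortened versions of the ones in the question.

```python
def add_isento(lines):
    """Append 'ISENTO' to every line that starts with '2' and ends with '|'."""
    result = []
    for line in lines:
        # readlines() keeps the '\n' terminator, so remove it before testing the last character
        body = line.rstrip("\n")
        ending = line[len(body):]  # '\n', or '' for a last line without a terminator
        if body.startswith("2") and body.endswith("|"):
            body += "ISENTO"
        result.append(body + ending)
    return result


lines = ["10|1,00|55591283000185\n", "20|I||VENDA|39121530|\n", "30|1103|0,00\n"]
for line in add_isento(lines):
    print(line, end="")
```

With real files, out.writelines(add_isento(f)) inside a with block that opens both the input and the output file would produce the edited copy.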

Python read() works with UTF-8 but readlines() "doesn't"

So, I am working with a (huge) UTF-8 encoded file. The first thing I do with it is to get its lines in a list using the file object's readlines() method. However, when I use the print command for debugging, I get things like \xc3 etc.
Here's a really small example that replicates my problem; I created a t.txt file that contains only the text "Clara Martínez"
f = open("t.txt", "r")
s = f.read()
print s
Clara Martínez
#If I do the following however
lines = f.readlines()
for l in lines:
    print l
['Clara Mart\xc3\xadnez']
#write however works fine!
f2 = open("t2.txt", "w")
for l in lines:
    f2.write(l)
f2.close()
f.close()
And then I open the "t2.txt", the string is correct, i.e: Clara Martínez.
Is there any way to "make" readlines() work as read()?
You claim that this:
lines = f.readlines()
for l in lines:
print l
Will result in this:
['Clara Mart\xc3\xadnez']
This is not true, it will not. I think you made a mistake in your code, and wrote this:
lines = f.readlines()
for l in lines:
print lines
That code will give the result you say, assuming the file contains only one line with the text 'Clara Mart\xc3\xadnez'.
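The same effect can be reproduced in Python 3, with a bytes object standing in for the Python 2 str that readlines() returns for a UTF-8 file. This is only an illustration of list printing versus string printing; in Python 3, opening the file with encoding='utf-8' makes both read() and readlines() return already decoded text.

```python
raw = "Clara Martínez".encode("utf-8")  # the bytes stored in a UTF-8 file

print([raw])                # [b'Clara Mart\xc3\xadnez']  (repr() of the element)
print(raw.decode("utf-8"))  # Clara Martínez              (decoded text)
```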
