python writelines from a list made from .split() - python

I have a very long string with vertical and horizontal delimiters in this format:
[|Bob Hunter|555-5555|B|Polycity|AK|55555||#|Rob Punter|999-5555|B|Bolycity|AZ|55559|rpunter#email.com|#|....and so on...]
I would like to generate a list from this long string using split('#') and then write each element as a line to a new text file like so:
|Bob Hunter|555-5555|B|Polycity|AK|55555||
|Rob Punter|999-5555|B|Bolycity|AZ|55559|rpunter#email.com|
I will then import it into excel and delimit by the pipes.
f1 = open(r'C:\Documents\MyData.html','r')
f2 = open(r'C:\Documents\MyData_formatted.txt','w')
lines = f1.read().split("#")
for i in lines:
    f2.writelines(i)
f2.close()
f1.close()
However, the output file is still a single line, and only part of the data (about 25%) is written to it. How can I get Python to split the data on the # symbol and write each element of the resulting list to the file on its own line?

Here is your code, corrected. I renamed the variable lines to records, because the elements are records rather than lines, to avoid confusion:
records = f1.read()
records = records[1:] # remove [
records = records[:-1] # remove ]
records = records.split("#")
for rec in records:
    f2.write(rec + "\n")
Since you mentioned you need this data in Excel, write a CSV file instead. Excel opens it with the columns already separated, so you don't have to delimit by pipes manually:
import csv
w = csv.writer(f2, dialect="excel")
for rec in records:
    # each record starts and ends with a pipe, so drop the empty first and
    # last pieces and write every remaining field as its own cell
    w.writerow(rec.split("|")[1:-1])

I think we should also remove the | that precedes every #. Otherwise each record keeps a stray trailing | (the pipe that sat just before the # delimiter). That's why we should split on |#, not only on #.
Try this:
with open('input.txt', 'r') as f1:
    text = f1.read().lstrip('[').rstrip(']').split("|#")  # remove '[' and ']' from each side
with open('output.txt', 'w') as f2:
    for line in text:
        f2.write('%s\n' % line)  # write each record followed by a newline
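The two answers can be combined: splitting on |# means a # inside a field (such as the one in rpunter#email.com) does not start a new record, and the csv module can then write the fields. This is only a sketch, assuming every record starts with a pipe and the whole string is wrapped in square brackets as in the question. The names split_records and write_csv are made up for illustration.

```python
import csv
import io


def split_records(text):
    """Split the bracketed, '|#'-delimited string into lists of fields."""
    body = text.strip().lstrip("[").rstrip("]")
    records = []
    for rec in body.split("|#"):
        rec = rec.strip("\n")
        if not rec.strip("|"):
            continue  # skip empty fragments, e.g. after a trailing delimiter
        # assumption: each record starts with a pipe, so drop the empty first piece
        records.append(rec.split("|")[1:])
    return records


def write_csv(records, handle):
    """Write the records as rows of an Excel-readable CSV file."""
    csv.writer(handle, dialect="excel").writerows(records)


sample = ("[|Bob Hunter|555-5555|B|Polycity|AK|55555||#"
          "|Rob Punter|999-5555|B|Bolycity|AZ|55559|rpunter#email.com|#]")
rows = split_records(sample)
buffer = io.StringIO()  # stands in for a real output file
write_csv(rows, buffer)
```

When writing to a real file instead of the in-memory buffer, open it with newline="" as the csv documentation recommends, so that no extra blank lines appear on Windows.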

Related

How can I read a file and convert the data to numbers?

I am trying to read a txt file that contains
1,2 20000
and potentially read other text files with the same type, only with more numbers like:
1,2,3 30000
or
2,3,4,5 2000000.
with open('coordinate.txt','r') as file:
    for line in file:
        line = line.strip()
        pieces = line.split()
        data.append(pieces)
and then assign coord = data[0] and trial = data[1],
but coord becomes ["1,2"], and I have no idea how to separate the 1 and the 2 by getting rid of the comma and converting them to NumPy form. How can I read the file and get the data in the format I want?
You can split on multiple delimiters by using a regular expression. Something like:
import re
txt = "1,2 20000"
txt_arr = re.split(r'[,\s]\s*', txt)
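re.split returns strings, so a conversion step is still needed to get numbers. A minimal sketch, assuming the last number on each line is the trial value and everything before it is a coordinate. The name parse_numbers is invented for this example.

```python
import re


def parse_numbers(txt):
    """Split on commas/whitespace and convert every piece to int."""
    numbers = [int(piece) for piece in re.split(r"[,\s]\s*", txt.strip())]
    # assumption: the final number is the trial, the rest are coordinates
    return numbers[:-1], numbers[-1]


coord, trial = parse_numbers("1,2 20000")
```

If a NumPy array is needed, numpy.array(coord) converts the resulting list.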
A simpler way is to use the split method: split each line on whitespace first, then split the coordinate part on commas:
pieces = []
with open('coordinate.txt', 'r') as fp:
    for line in fp:
        if not line.strip():
            continue  # skip blank lines
        # "1,2 20000" -> coord_text "1,2" and trial_text "20000"
        coord_text, trial_text = line.split()
        pieces.append({
            'coord': [int(c) for c in coord_text.split(',')],
            'trial': int(trial_text)
        })  # append the converted values to pieces
print(pieces)
Both coord and trial are converted to int here; use float() instead if the values can be fractional.
Is this what you were looking for?

How do I convert a table in notepad into CSV format?

I have this table of data in Notepad
But it's not really a table, because there are no actual columns. It just looks like a table; the data is aligned using spaces.
I want to convert it into CSV format. How should I go about doing this?
The pandas Python package I am using for data analysis works best with CSV, as far as I understand.
Here is a hackjob python script to do exactly what you need. Just save the script as a python file and run it with the path of your input file as the only argument.
UPDATED: After reading the comments to my answer, my script now uses regular expressions to account for any number of spaces.
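import re
from sys import argv
output = ''
with open(argv[1]) as f:
    for i, line in enumerate(f.readlines()):
        if i == 0:
            # header row: any run of whitespace separates two column names
            line = line.strip()
            line = re.sub(r'\s+', ',', line) + '\n'
        else:
            # data rows: two or more whitespace characters separate columns;
            # strip the line ending first so trailing spaces plus the newline
            # are not turned into a comma
            line = re.sub(r'\s\s+', ',', line.rstrip()) + '\n'
        output += line
with open(argv[1] + '.csv', 'w') as f:
    f.write(output)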
Save the following script to a file (csvify.py, for example) and run it as:
python csvify.py <input_file_name>
csvify.py:
from sys import argv
from re import finditer
# Return the fields of a line joined by commas
def comma_delimit(line, ranges):
    return ','.join(get_field(line, ranges))
# Yield each field of a line in the appropriate format
def get_field(line, ranges):
    for span in ranges:  # Iterate through column ranges
        field = line[slice(*span)].strip()  # Get field data based on range slice and trim
        # Use str() if the field doesn't contain commas, otherwise use repr()
        yield (repr if ',' in field else str)(field)
# Open the input text file from command line (read-only, closed automatically)
with open(argv[1], 'r') as inp:
    # Convert the first line (assumed header) into range indexes
    # Use finditer to split the line by word border until the next word
    # This assumes no spaces within header names
    # list() keeps the ranges reusable for every line (map() is a one-shot iterator in Python 3)
    columns = list(map(lambda match: match.span(), finditer(r'\b\w+\s*', inp.readline())))
    inp.seek(0)  # Reset file pointer to beginning to include header line
    # Create new CSV based on input file name
    with open(argv[1] + '.csv', 'w') as txt:
        # Write all converted lines to the file, joined with newlines
        txt.write('\n'.join(comma_delimit(line, columns) for line in inp.readlines()))
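Both scripts build the comma-separated lines by hand, which gets awkward when a field itself contains a comma (the repr() call above wraps such a field in Python-style quotes, which is not standard CSV quoting). A possible alternative is to let the csv module handle the quoting. This sketch assumes that columns are separated by two or more whitespace characters and that no field contains such a run. The function name table_to_csv and the sample rows are made up.

```python
import csv
import io
import re


def table_to_csv(lines, handle):
    """Write space-aligned table rows to `handle` as CSV."""
    writer = csv.writer(handle)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        # assumption: two or more whitespace characters separate columns
        writer.writerow(re.split(r"\s{2,}", line))


table = [
    "Name        City           Score\n",
    "Ann Lee     Paris, France  10\n",
    "\n",
    "Bob         Oslo           7\n",
]
out = io.StringIO()  # stands in for a real output file
table_to_csv(table, out)
```

The field "Paris, France" is written with double quotes around it, so spreadsheet programs and pandas.read_csv read it back as a single value.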

writing the data in text file while converting it to csv

I am very new to Python. I have a .txt file and want to convert it to a .csv file in a format I was given, but I could not manage it, and a hand would be useful. I am going to explain it with screenshots.
I have a txt file with the name of bip.txt. and the data inside of it is like this
I want to convert it to csv like this csv file
So far, what I could do is only writing all the data from text file with this code:
import glob
read_files = glob.glob("C:/Users/Emrehana1/Desktop/bip.txt")
with open("C:/Users/Emrehana1/Desktop/Test_Result_Report.csv", "w") as outfile:
    for f in read_files:
        with open(f, "r") as infile:
            outfile.write(infile.read())
So is there a solution to convert it to a csv file in the format I desire? I hope I have explained it clearly.
There's no need to use the glob module if you only have one file and you already know its name. You can just open it. It would have been helpful to quote your data as text, since as an image someone wanting to help you can't just copy and paste your input data.
For each entry in the input file you will have to read multiple lines to collect together the information you need to create an entry in the output file.
One way is to loop over the lines of input until you find one that begins with "test:", then get the next line in the file using next() to complete the entry.
The following code produces the split you need. Creating the CSV file can be done with the standard library's csv module and is left as an exercise. I used a different file name, as you can see.
with open("/tmp/blip.txt") as f:
    for line in f:
        if line.startswith("test:"):
            test_name = line.strip().split(None, 1)[1]
            result = next(f)
            if not result.startswith("outcome:"):
                raise ValueError("Test name not followed by outcome for test "+test_name)
            outcome = result.strip().split(None, 1)[1]
            print(test_name, outcome)
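The CSV step that this answer leaves as an exercise could look roughly like the following. It is a sketch under an assumption: the original data was posted as an image, so the layout used here (a "test: ..." line directly followed by an "outcome: ..." line) is a guess. The name collect_results and the sample lines are invented for illustration.

```python
import csv
import io


def collect_results(lines):
    """Yield (test_name, outcome) pairs from 'test:'/'outcome:' line pairs."""
    lines = iter(lines)
    for line in lines:
        if line.startswith("test:"):
            test_name = line.split(None, 1)[1].strip()
            result = next(lines, "")  # the line right after the test name
            if not result.startswith("outcome:"):
                raise ValueError("No outcome after test " + test_name)
            yield test_name, result.split(None, 1)[1].strip()


# hypothetical sample input; a real script would pass an open file instead
sample = ["test: login page\n", "outcome: passed\n", "\n",
          "test: logout\n", "outcome: failed\n"]
out = io.StringIO()  # stands in for the output .csv file
writer = csv.writer(out)
writer.writerow(["test", "outcome"])
writer.writerows(collect_results(sample))
```

Because collect_results accepts any iterable of lines, an open file object can be passed to it directly.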
The glob function does not open files; it searches for file names matching a pattern. You could open bip.txt, read each line and collect the values in a list, and then, once all the values have been found, join the columns with commas and the rows with newlines and write the result to a CSV file, like this:
# set the csv column headers
values = [["test", "outcome"]]
current_row = []
with open("bip.txt", "r") as f:
    for line in f:
        # when a blank line is found, append the row
        if line == "\n" and current_row != []:
            values.append(current_row)
            current_row = []
        if ":" in line:
            # get the value after the colon
            value = line[line.index(":")+1:].strip()
            current_row.append(value)
# append the final row to the list (unless the file ended with a blank line)
if current_row:
    values.append(current_row)
# join the columns with a comma and the rows with a new line
csv_result = ""
for row in values:
    csv_result += ",".join(row) + "\n"
# output the csv data to a file
with open("Test_Result_Report.csv", "w") as f:
    f.write(csv_result)

Python: How to capitalize the first column of a .txt file.

I have a .csv formatted .txt file. I am deliberating over the best manner in which to .capitalize the text in the first column.
.capitalize() is a string method, so I considered the following: I would need to open the file, convert the data to a list of strings, capitalize the required word, and finally write the data back to the file.
To achieve this, I did the following:
import csv
newGuestList = []
with open("guestList.txt","r+") as guestFile :
    guestList = csv.reader(guestFile)
    for guest in guestList :
        for guestInfo in guest :
            capitalisedName = guestInfo.capitalize()
            newGuestList.append(capitalisedName)
Which gives the output:
['Peter', '35', ' spain', 'Caroline', '37', 'france', 'Claire', '32', ' sweden']
The problem:
Firstly; in order to write this new list back to file, I will need to convert it to a string. I can achieve this using the .join method. However, how can I introduce a newline, \n, after every third word (the country) so that each guest has their own line in the text file?
Secondly; this method, of nested for loops etc. seems highly convoluted, is there a cleaner way?
My .txt file:
peter, 35, spain\n
caroline, 37, france\n
claire, 32, sweden\n
You don't need to split the lines, since the first character of the first word is also the first character of the line:
with open("lst.txt", "r") as guestFile:
    lines = guestFile.readlines()
newlines = [line.capitalize() for line in lines]
with open("lst.txt", "w") as guestFile:
    guestFile.writelines(newlines)
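One caveat: str.capitalize() also lowercases every character after the first, so applying it to a whole line changes the other columns as well ("peter, 35, Spain" becomes "Peter, 35, spain"). If that matters, only the text before the first comma can be capitalized. A small sketch; the helper name capitalize_first_field is made up.

```python
def capitalize_first_field(line, sep=","):
    """Capitalize only the text before the first separator."""
    first, found, rest = line.partition(sep)
    # found and rest are empty strings when the line has no separator
    return first.capitalize() + found + rest


lines = ["peter, 35, Spain\n", "caroline, 37, France\n"]
new_lines = [capitalize_first_field(line) for line in lines]
```

The resulting list can be passed to writelines() exactly as in the answer above, because each line keeps its trailing newline.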
You can just use a CSV reader and writer and access the element you want to capitalize from the list.
import csv
import os
inp = open('a.txt', 'r')
out = open('b.txt', 'w')
reader = csv.reader(inp)
writer = csv.writer(out)
for row in reader:
    row[0] = row[0].capitalize()
    writer.writerow(row)
inp.close()
out.close()
os.replace('b.txt', 'a.txt') # if you want to keep the same name (os.rename fails on Windows when a.txt already exists)

python tab delimited retrieve column and delete empty lines

I have a tab-delimited text file that consists of two columns, something like:
Apple123 2
Orange933 2
Banana33334 2
There may be empty lines at the bottom. How can I:
1. Strip the empty lines, and
2. write to a file that contains only the first column?
My problem right now is that if I use line.strip(), the line has a length of 10 (for the first line, for example), not 2. If I use csv.reader(..., dialect='excel-tab'), then I can't use strip(), so I can't get rid of the empty lines.
This should do the trick:
with open(infilename) as infile, open(outfilename, "w") as outfile:
    for line in infile:
        line = line.strip()
        if line:
            outfile.write("{}\n".format(line.split("\t")[0]))
You could maybe do this with Python's basic string manipulation (str.split and so on):
infile = open("/path/to/myfile.txt")
outfile = open("/path/to/output.txt", "w")  # Clears existing file, open for writing
for line in infile:
    if len(line.strip()) == 0:
        # skip blank lines
        continue
    # Get first column, write it to file
    col1 = line.split("\t")[0]
    outfile.write(col1 + "\n")
infile.close()
outfile.close()
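The csv.reader route mentioned in the question can also work, because the reader returns an empty list for a blank line, and that is easy to filter out without calling strip() on the line. A sketch, with first_column as a made-up name and an in-memory string standing in for the real file:

```python
import csv
import io


def first_column(handle):
    """Return the first field of every non-empty tab-delimited row."""
    reader = csv.reader(handle, dialect="excel-tab")
    # csv.reader yields an empty list for a blank line; also skip rows
    # whose first field contains only whitespace
    return [row[0] for row in reader if row and row[0].strip()]


source = io.StringIO("Apple123\t2\nOrange933\t2\nBanana33334\t2\n\n\n")
names = first_column(source)
```

Writing the result is then a matter of calling outfile.write(name + "\n") for each entry in names.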
