Writing the data from a text file while converting it to CSV - Python

I am very new to Python. I have a .txt file and want to convert it to a .csv file in the format I was told, but could not manage to accomplish it. A hand would be helpful. I am going to explain it with screenshots.
I have a txt file with the name bip.txt, and the data inside it looks like this.
I want to convert it to a csv like this csv file.
So far, what I could do is only write all the data from the text file with this code:
import glob

read_files = glob.glob("C:/Users/Emrehana1/Desktop/bip.txt")
with open("C:/Users/Emrehana1/Desktop/Test_Result_Report.csv", "w") as outfile:
    for f in read_files:
        with open(f, "r") as infile:
            outfile.write(infile.read())
So is there a solution to convert it to a csv file in the format I desire? I hope I have explained it clearly.

There's no need to use the glob module if you only have one file and you already know its name; you can just open it. It would also have been helpful to quote your data as text, since as an image someone wanting to help you can't just copy and paste your input data.
For each entry in the input file you will have to read multiple lines to collect together the information you need to create an entry in the output file.
One way is to loop over the lines of input until you find one that begins with "test:", then get the next line in the file using next() to create the entry:
The following code will produce the split you need; creating the csv file can be done with the standard library's csv module, and is left as an exercise. I used a different file name, as you can see.
with open("/tmp/blip.txt") as f:
    for line in f:
        if line.startswith("test:"):
            test_name = line.strip().split(None, 1)[1]
            result = next(f)
            if not result.startswith("outcome:"):
                raise ValueError("Test name not followed by outcome for test " + test_name)
            outcome = result.strip().split(None, 1)[1]
            print(test_name, outcome)
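The CSV-writing step left as an exercise above could be sketched with the standard csv module. This is a minimal illustration: the rows would come from the parsing loop, but are hard-coded here, and a StringIO buffer stands in for the real output file.

```python
import csv
import io

# Rows that the parsing loop above would produce; hard-coded for illustration.
rows = [("test_alpha", "pass"), ("test_beta", "fail")]

# Stands in for open("Test_Result_Report.csv", "w", newline="")
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["test", "outcome"])  # header row
writer.writerows(rows)                # one CSV line per (test, outcome) pair
print(buf.getvalue())
```

When writing to a real file, open it with `newline=""` as the csv documentation recommends, so the module controls line endings itself.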

You do not use the glob function to open a file; it searches for file names matching a pattern. You could open the file bip.txt, read each line and put the values into a list, then, when all of the values for a row have been found, join them with commas, join the rows with newlines, and write the result to a csv file, like this:
# set the csv column headers
values = [["test", "outcome"]]
current_row = []
with open("bip.txt", "r") as f:
    for line in f:
        # when a blank line is found, append the row
        if line == "\n" and current_row != []:
            values.append(current_row)
            current_row = []
        if ":" in line:
            # get the value after the colon
            value = line[line.index(":") + 1:].strip()
            current_row.append(value)
# append the final row to the list
values.append(current_row)
# join the columns with a comma and the rows with a new line
csv_result = ""
for row in values:
    csv_result += ",".join(row) + "\n"
# output the csv data to a file
with open("Test_Result_Report.csv", "w") as f:
    f.write(csv_result)

Related

How to extract data from a text file if the txt file does not have columns?

I want to save only the WIMP mass from the text file below into another txt file for plotting purposes. I have a lot of other .txt files to read Wimp_Mass from. I tried to use np.loadtxt but cannot, since there are strings in the file. Can you suggest code to extract Wimp_Mass, with the output value appended to a .txt file without deleting the old values?
Omegah^2: 2.1971043635736895E-003
x_f: 25.000000000000000
Wimp_Mass: 100.87269924860568 GeV
sigmav(xf): 5.5536288606920853E-008
sigmaN_SI_p: -1.0000000000000000 GeV^-2: -389000000.00000000 pb
sigmaN_SI_n: -1.0000000000000000 GeV^-2: -389000000.00000000 pb
sigmaN_SD_p: -1.0000000000000000 GeV^-2: -389000000.00000000 pb
sigmaN_SD_n: -1.0000000000000000 GeV^-2: -389000000.00000000 pb
Nevents: 0
smearing: 0
%:a1a1_emep 24.174602466963883
%:a1a1_ddx 0.70401899013937730
%:a1a1_uux 10.607701601533348
%:a1a1_ssx 0.70401807105204617
%:a1a1_ccx 10.606374255125269
%:a1a1_bbx 0.70432127586224602
%:a1a1_ttx 0.0000000000000000
%:a1a1_mummup 24.174596692050287
%:a1a1_tamtap 24.172981870222447
%:a1a1_vevex 1.3837949256836950
%:a1a1_vmvmx 1.3837949256836950
%:a1a1_vtvtx 1.3837949256836950
You can use a regex for this; please find the code below:
import re

def get_wimp_mass():
    # read the whole file into a single string
    with open("txt.file", "r") as f:
        data = f.read()
    # use a regex to fetch the value
    wimp_search = re.search(r'Wimp_Mass:\s+([0-9\.]+)', data, re.M | re.I)
    if wimp_search:
        return wimp_search.group(1)
    else:
        return "Wimp mass not found"

if __name__ == '__main__':
    wimp_mass = get_wimp_mass()
    print(wimp_mass)
You can use a basic regex if you want to extract the value; with the file contents read into a string txt, the code would go something like this:
import re
ans = re.findall(r"Wimp_Mass:[\ ]+([\d\.]+)", txt)
ans
>> ['100.87269924860568']
If you wanted more general code to extract everything, it could be:
re.findall(r"([a-zA-Z0-9\^:\%]+)[\ ]+([\d\.]+)", txt)
you might have to add in some edge cases, though.
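Running the more general pattern above on a couple of sample lines from the question's file shows both what it yields and one of the edge cases just mentioned: the character class contains no underscore, so "Wimp_Mass:" comes out as just "Mass:".

```python
import re

# Two sample lines taken from the question's data file.
txt = """Omegah^2: 2.1971043635736895E-003
Wimp_Mass: 100.87269924860568 GeV
"""

# The general (key, value) pattern from above. Note: no underscore in the
# first character class, so "Wimp_" is silently dropped from the key.
pairs = re.findall(r"([a-zA-Z0-9\^:\%]+)[\ ]+([\d\.]+)", txt)
print(pairs)
# [('Omegah^2:', '2.1971043635736895'), ('Mass:', '100.87269924860568')]
```

Adding `_` to the character class (and handling the `E-003` exponent notation) would be the kind of edge-case fix the answer alludes to.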
Here is a simple solution:
with open('new_file.txt', 'w') as f_out:  # not needed if the file already exists
    pass

filepaths = ['file1.txt', 'file2.txt']  # take all files
with open('new_file.txt', 'a') as f_out:
    for file in filepaths:
        with open(file, 'r') as f:
            for line in f.readlines():
                if line.startswith('Wimp_Mass'):
                    _, mass, _ = line.split()
                    f_out.write(mass)  # write the current mass
                    f_out.write('\n')  # write a newline
It first creates a new text file (remove this step if the file already exists). Then you need to enter all the file paths (or just the names, if they are in the same directory). new_file.txt is opened in append mode, and then for each file the mass is found and appended to new_file.txt.

Using the same code for multiple text files and generating multiple text files as output using Python

I have more than 30 text files. I need to do some processing on each text file and save them again in text files with different names.
Example-1: precise_case_words.txt ---- processing ---- precise_case_sentences.txt
Example-2: random_case_words.txt ---- processing ---- random_case_sentences.txt
I need to do this for all the text files.
present code:
new_list = []
with open('precise_case_words.txt') as inputfile:
for line in inputfile:
new_list.append(line)
final = open('precise_case_sentences.txt', 'w+')
for item in new_list:
final.write("%s\n" % item)
I am manually copy-pasting this code and manually changing the names every time. Please suggest a solution to avoid this manual job using Python.
Suppose you have all your *_case_words.txt files in the present dir:
import glob

in_file = glob.glob('*_case_words.txt')
prefix = [i.split('_')[0] for i in in_file]
for i, ifile in enumerate(in_file):
    data = []
    with open(ifile, 'r') as f:
        for line in f:
            data.append(line)
    with open(prefix[i] + '_case_sentences.txt', 'w') as f:
        f.writelines(data)
This should give you an idea about how to handle it:
def rename(name, suffix):
    """renames a file with one . in it by splitting and inserting suffix before the ."""
    a, b = name.split('.')
    return ''.join([a, suffix, '.', b])  # recombine parts including suffix in it

def processFn(name):
    """Open file 'name', process it, save it under another name"""
    # scramble data by sorting and writing anew to renamed file
    with open(name, "r") as r, open(rename(name, "_mang"), "w") as w:
        for line in r:
            scrambled = ''.join(sorted(line.strip("\n"))) + "\n"
            w.write(scrambled)

# list of filenames, see link below for how to get them with os.listdir()
names = ['fn1.txt', 'fn2.txt', 'fn3.txt']

# create demo data
for name in names:
    with open(name, "w") as w:
        for i in range(12):
            w.write("someword" + str(i) + "\n")

# process files
for name in names:
    processFn(name)
For file listings: see How do I list all files of a directory?
I chose to read and write line by line; you can also read a file in fully, process it, and output it again in one block, to your liking.
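The file-listing step linked above could be sketched like this; a temporary directory is used here so the example is self-contained, and the file names mirror the demo ones from the answer.

```python
import os
import tempfile

# Create a scratch directory with a few demo files, two of which match.
d = tempfile.mkdtemp()
for fn in ("fn1.txt", "fn2.txt", "notes.md"):
    open(os.path.join(d, fn), "w").close()

# Collect every .txt file instead of hard-coding names = ['fn1.txt', ...].
names = sorted(fn for fn in os.listdir(d) if fn.endswith(".txt"))
print(names)  # ['fn1.txt', 'fn2.txt']
```

`glob.glob(os.path.join(d, "*.txt"))` would work equally well; `os.listdir` returns bare names, while glob returns paths including the directory.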
fn1.txt:
someword0
someword1
someword2
someword3
someword4
someword5
someword6
someword7
someword8
someword9
someword10
someword11
into fn1_mang.txt:
0demoorsw
1demoorsw
2demoorsw
3demoorsw
4demoorsw
5demoorsw
6demoorsw
7demoorsw
8demoorsw
9demoorsw
01demoorsw
11demoorsw

Searching rows of a file in another file and printing appropriate rows in python

I have a csv file like this: (no headers)
aaa,1,2,3,4,5
bbb,2,3,4,5,6
ccc,3,5,7,8,5
ddd,4,6,5,8,9
I want to search another csv file: (no headers)
bbb,1,2,3,4,5,,6,4,7
kkk,2,3,4,5,6,5,4,5,6
ccc,3,4,5,6,8,9,6,9,6
aaa,1,2,3,4,6,6,4,6,4
sss,1,2,3,4,5,3,5,3,5
and print rows in the second file(based on matching of the first columns) that exist in the first file. So results will be:
bbb,1,2,3,4,5,,6,4,7
ccc,3,4,5,6,8,9,6,9,6
aaa,1,2,3,4,6,6,4,6,4
I have the following code, but it does not print anything:
labels = []
with open("csv1.csv", "r") as f:
    f.readline()
    for line in f:
        labels.append(line.strip("\n"))

with open("csv2.csv", "r") as f:
    f.readline()
    for line in f:
        if line.split(",")[1] in labels:
            print(line)
If possible, could you tell me how to do this, please? What is wrong with my code? Thanks in advance!
This is one solution, although you may also look into csv-specific tools and pandas as suggested:
labels = []
with open("csv1.csv", "r") as f:
    lines = f.readlines()
for line in lines:
    labels.append(line.split(',')[0])

with open("csv2.csv", "r") as f:
    lines = f.readlines()

with open("csv_out.csv", "w") as out:
    for line in lines:
        temp = line.split(',')
        if any(temp[0].startswith(x) for x in labels):
            out.write(','.join(temp))
The program first collects only the labels from csv1.csv. Note that you used readline, while the program seems to expect all the lines of the file to be read at once; one way to do that is readlines. The program also has to collect what readlines returns, here stored in a list named lines. To collect the labels, the program loops through each line, splits it by a comma, and appends the first element to the labels list.
In the second part, the program reads all the lines from csv2.csv while also opening the output file, csv_out.csv, for writing. It processes csv2.csv line by line, writing the matching lines to the output file as it goes.
To do that, the program again splits each line by a comma and checks whether the label from csv2.csv is found in the labels list. If it is, that line is written to csv_out.csv.
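The same first-column match can be done with a set and an exact comparison instead of startswith, which avoids false positives (e.g. a label "aaa" matching a line starting with "aaab") and makes each lookup constant time. A minimal sketch with the question's data inlined as strings:

```python
# File contents from the question, inlined so the example is self-contained.
csv1 = "aaa,1,2,3,4,5\nbbb,2,3,4,5,6\nccc,3,5,7,8,5\nddd,4,6,5,8,9\n"
csv2 = ("bbb,1,2,3,4,5,,6,4,7\nkkk,2,3,4,5,6,5,4,5,6\nccc,3,4,5,6,8,9,6,9,6\n"
        "aaa,1,2,3,4,6,6,4,6,4\nsss,1,2,3,4,5,3,5,3,5\n")

# Collect the first column of csv1 into a set for O(1) membership tests.
labels = {line.split(",")[0] for line in csv1.splitlines()}

# Keep only the csv2 lines whose first column is an exact match.
matches = [line for line in csv2.splitlines() if line.split(",")[0] in labels]
print(matches)
```

With real files, replace the inlined strings with `open(...).read()` and write the matches out as in the answer above.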
Try using pandas; it's a very effective way to read csv files into a data structure called a DataFrame.
EDIT
labels = []
with open("csv1.csv", "r") as f:
    f.readline()
    for line in f:
        labels.append(line.split(',')[0])

with open("csv2.csv", "r") as f:
    f.readline()
    for line in f:
        if line.split(",")[0] in labels:
            print(line)
I edited it so that labels only contains the first part of each line, so ['aaa', 'bbb', ...].
Then you want to check if line.split(",")[0] is in labels.
Since you want to match only on the first column, you should use split and then take the first item of the split, which is at index 0.

python writelines from a list made from .split()

I have a very long string with vertical and horizontal delimiters in this format:
[|Bob Hunter|555-5555|B|Polycity|AK|55555||#|Rob Punter|999-5555|B|Bolycity|AZ|55559|rpunter#email.com|#|....and so on...]
I would like to generate a list from this long string using split('#') and then write each element as a line to a new text file like so:
|Bob Hunter|555-5555|B|Polycity|AK|55555||
|Rob Punter|999-5555|B|Bolycity|AZ|55559|rpunter#email.com|
I will then import it into excel and delimit by the pipes.
f1 = open(r'C:\Documents\MyData.html', 'r')
f2 = open(r'C:\Documents\MyData_formatted.txt', 'w')
lines = f1.read().split("#")
for i in lines:
    f2.writelines(i)
f2.close()
f1.close()
However, the txt file remains one line, and only part of the data is written to the file (only about 25% is there). How can I get Python to split the data on the # symbol and write each element of the resulting list to the file as a new line?
This is your corrected code. I changed the line variable to records, because we're not dealing with lines, and just to avoid confusion:
records = f1.read()
records = records[1:]   # remove [
records = records[:-1]  # remove ]
records = records.split("#")
for rec in records:
    f2.write(rec + "\n")
And since you mentioned you need this data in excel, use csv files; open your csv output file from excel and it will format your output as needed without you having to do that manually:
import csv

w = csv.writer(f2, dialect="excel")
for rec in records:
    w.writerow(rec.split("|"))  # each pipe-delimited field becomes a csv column
I think that before every # we should also delete the |, because otherwise, after every split record, we would get || as the first characters of every line. That's why we should split on |#, not only #.
Try this:
with open('input.txt', 'r') as f1:
    text = f1.read().lstrip('[').rstrip(']').split("|#")  # remove '[' and ']' from each side
with open('output.txt', 'w') as f2:
    for line in text:
        f2.write('%s\n' % line)  # write each record to the file followed by a newline

How do I convert a table in notepad into CSV format?

I have this table of data in Notepad
But it's not really a table, because there aren't official columns; it just looks like a table, with the data organized using spaces.
I want to convert it into a CSV format. How should I go about doing this?
The pandas Python package I am using for data analysis works best with CSV, as far as I understand.
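For reference, pandas can often read such a space-aligned table directly by splitting on runs of whitespace, skipping the CSV conversion entirely. A minimal sketch; the column names and values here are made up, and this only works when no field itself contains a single space:

```python
import io
import pandas as pd

# A made-up space-aligned table standing in for the Notepad file.
text = """name  score  grade
alice  90  A
bob  85  B
"""

# sep=r"\s+" splits columns on any run of whitespace.
df = pd.read_csv(io.StringIO(text), sep=r"\s+")
print(df.shape)  # (2, 3)
```

For truly fixed-width columns (where fields may contain spaces), `pd.read_fwf` is the more robust choice.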
Here is a hackjob python script to do exactly what you need. Just save the script as a python file and run it with the path of your input file as the only argument.
UPDATED: After reading the comments on my answer, my script now uses regular expressions to account for any number of spaces.
import re
from sys import argv

output = ''
with open(argv[1]) as f:
    for i, line in enumerate(f.readlines()):
        if i == 0:
            line = line.strip()
            line = re.sub(r'\s+', ',', line) + '\n'
        else:
            line = re.sub(r'\s\s+', ',', line)
        output += line

with open(argv[1] + '.csv', 'w') as f:
    f.write(output)
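The two substitutions in the script above behave differently on purpose: the header is split on any run of whitespace, while data lines are split only on runs of two or more spaces, so a single space inside a field survives. Illustrated on made-up lines:

```python
import re

header = "name   age   city"
data = "John Smith   34   New York"

# Header: any whitespace run becomes a comma.
print(re.sub(r'\s+', ',', header))   # name,age,city

# Data: only runs of 2+ spaces split, so "John Smith" stays one field.
print(re.sub(r'\s\s+', ',', data))   # John Smith,34,New York
```

This relies on the columns in the source file being separated by at least two spaces, which is usually true for space-aligned tables.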
So this is put into a file (if you call it csvify.py) and executed as:
python csvify.py <input_file_name>
csvify.py:
from sys import argv
from re import finditer

# Method that returns fields separated by commas
def comma_delimit(line, ranges):
    return ','.join(get_field(line, ranges))

# Method that returns field info in the appropriate format
def get_field(line, ranges):
    for span in ranges:  # iterate through column ranges
        field = line[slice(*span)].strip()  # get field data based on range slice and trim
        # use str() if the field doesn't contain commas, otherwise repr()
        yield (repr if ',' in field else str)(field)

# Open the input text file from the command line (read-only, closed automatically)
with open(argv[1], 'r') as inp:
    # Convert the first line (assumed header) into range indexes.
    # finditer splits the line at each word border up to the next word;
    # this assumes no spaces within header names. A list (not a bare map)
    # is needed so the ranges can be reused for every line.
    columns = [match.span() for match in finditer(r'\b\w+\s*', inp.readline())]
    inp.seek(0)  # reset file pointer to the beginning to include the header line
    # Create the new CSV based on the input file name
    with open(argv[1] + '.csv', 'w') as txt:
        # Write all converted lines joined with newlines
        txt.write('\n'.join(comma_delimit(line, columns) for line in inp.readlines()))
