Data comes out shifted using Python
This code is supposed to convert odd-looking .csv files, written as one long line, into a multi-line CSV.
import csv
import re

filenmi = "original.csv"
filenmo = "data-out.csv"
infile = open(filenmi, 'r')
outfile = open(filenmo, 'w+')
for line in infile:
    print('read data :', line)
    line2 = re.sub('[^0-9|^,^.]', '', line)
    line2 = re.sub(',,', ',', line2)
    print('clean data: ', line2)
    wordlist = line2.split(",")
    n = (len(wordlist)) / 2
    print('num data pairs: ', n)
    i = 0
    print('data paired :')
    while i < n*2:
        pairlst = wordlist[i:i+2]
        pairstr = ','.join(pairlst)
        print(' ', i/2+1, ' ', pairstr)
        pairstr = pairstr + '\n'
        outfile.write(pairstr)
        i = i + 2
infile.close()
outfile.close()
I want this code to turn a messy one-line input file such as
L,39,100,50.5,83,L,50.5,83
into a normally formatted CSV file like this:
39,100
50.5,83
50.5,83
But my data comes out like this:
,39
100,50.5
83,50.5
83,
I'm not sure what went wrong or how to fix it, so any help would be appreciated.
::Data Set::
L,39,100,50.5,83,L,50.5,83,57.5,76,L,57.5,76,67,67.5,L,67,67.5,89,54,L,89,54,100.5,49,L,100.5,49,111.5,45.5,L,111.5,45.5,134,42,L,134,42,152.5,44,L,152.5,44,160,46.5,L,160,46.5,168,52,L,168,52,170,56.5,L,170,56.5,162,64.5,L,162,64.5,152.5,70,L,152.5,70,126,85.5,L,126,85.5,113.5,94,L,113.5,94,98,105.5,L,98,105.5,72.5,132,L,72.5,132,64.5,145,L,64.5,145,57.5,165.5,L,57.5,165.5,57,176,L,57,176,63.5,199.5,L,63.5,199.5,69,209,L,69,209,76,216.5,L,76,216.5,83.5,222,L,83.5,222,90.5,224.5,L,90.5,224.5,98,225.5,L,98,225.5,105.5,225,L,105.5,225,115,223,L,115,223,124.5,220,L,124.5,220,133.5,216.5,L,133.5,216.5,142,212,L,142,212,149,207,L,149,207,156.5,201.5,L,156.5,201.5,163.5,195.5,L,163.5,195.5,172.5,185.5,L,172.5,185.5,175,180.5,L,175,180.5,177,173,L,177,173,177.5,154,L,177.5,154,174.5,142.5,L,174.5,142.5,168.5,133.5,L,168.5,133.5,150,131.5,L,150,131.5,135,136.5,L,135,136.5,120.5,144.5,L,120.5,144.5,110.5,154,L,110.5,154,104,161.5,L,104,161.5,99.5,168.5,L,99.5,168.5,98,173,L,98,173,97.5,176,L,97.5,176,99.5,178,L,99.5,178,105,179.5,L,105,179.5,112.5,179,L,112.5,179,132,175.5,L,132,175.5,140.5,175,L,140.5,175,149.5,175,L,149.5,175,157,176.5,L,157,176.5,169.5,181.5,L,169.5,181.5,174,185.5,L,174,185.5,178,206,L,178,206,176.5,214.5,L,176.5,214.5,161,240.5,L,161,240.5,144.5,251,L,144.5,251,134.5,254,L,134.5,254,111.5,254.5,L,111.5,254.5,98,253,L,98,253,71.5,248,L,71.5,248,56,246,
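Your code fails because line2 = re.sub('[^0-9|^,^.]','',line) turns the sample line into ,39,100,50.5,83,,50.5,83 and the next substitution only collapses the doubled comma, which leaves ,39,100,50.5,83,50.5,83 with a comma still at the front.
That re.sub call replaces every character that is not a digit, dot, or comma with the empty string ''. (Inside the brackets, | and the second and third ^ are literal characters, so those two are kept as well.) This removes each L from your input, but the comma that followed the first L stays at the start of the line, so the first field is empty and every pair is shifted by one position.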
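To see the stray comma concretely, here is a minimal sketch that applies the same two substitutions to the sample line from the question. The variable names are illustrative, and the character class is written as [^0-9,.], which is assumed to be what the original pattern was meant to express.

```python
import re

sample = "L,39,100,50.5,83,L,50.5,83"

# Drop everything except digits, commas and dots (this removes each "L").
step1 = re.sub(r"[^0-9,.]", "", sample)
# Collapse the doubled comma left where the inner "L" used to be.
step2 = re.sub(",,", ",", step1)
# The leading comma survives, so the first field after splitting is empty.
fields = step2.split(",")

print(step1)
print(step2)
print(fields)
```

Because fields starts with an empty string, pairing items 0-1, 2-3, and so on produces exactly the shifted rows shown in the question.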
I fixed that and slightly changed how the list of CSV rows is built. The code below works.
import re

filenmi = "original.csv"
filenmo = "data-out.csv"

parsed = []
with open(filenmi, 'r') as infile:
    for line in infile:
        # remove any char which isn't a digit, dot, or comma (this also drops the newline)
        line2 = re.sub('[^0-9,.]', '', line)
        # replace ",," with ","
        line2 = re.sub(',,', ',', line2)
        # remove the first char, which is the "," left behind by the first "L"
        line2 = line2[1:]
        # get a list of individual values, separated by ","
        wordlist = line2.split(",")
        for i, val in enumerate(wordlist):
            # for every even index, build the "x,y" pair
            try:
                if i % 2 == 0:
                    parstr = wordlist[i] + "," + wordlist[i+1] + '\n'
                    parsed.append(parstr)
            except IndexError:
                # odd number of values, e.g. caused by a trailing comma
                print("Data set needs cleanup\n")

with open(filenmo, 'w+') as f:
    for item in parsed:
        f.write(item)
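An alternative sketch, not the answer's own method: instead of cleaning the text with regular expressions, split on commas first, keep only the tokens that parse as numbers, and pair neighbours with zip. The helper names is_number and to_pairs are hypothetical.

```python
def is_number(token):
    """Return True if the token parses as a float."""
    try:
        float(token)
        return True
    except ValueError:
        return False


def to_pairs(line):
    """Turn 'L,39,100,...' into a list of 'x,y' strings."""
    values = [t for t in line.strip().split(",") if is_number(t)]
    # Pair items 0-1, 2-3, ...; a trailing unpaired value is silently dropped.
    return [f"{x},{y}" for x, y in zip(values[0::2], values[1::2])]


pairs = to_pairs("L,39,100,50.5,83,L,50.5,83")
print("\n".join(pairs))
```

Filtering by "is this a number" makes the leading comma, the doubled commas, and a trailing comma irrelevant, since the empty tokens they produce are discarded along with each L.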
Related
Remove lines from a file that contain strings from a list
I want to remove lines from a .txt file. I want to keep the strings to remove in a list, but my code writes each line as many times as there are strings in the list. How can I avoid that?

file1 = open("base.txt", encoding="utf-8", errors="ignore")
Lines = file1.readlines()
file1.close()
not_needed = ['asd', '123', 'xyz']
row = 0
result = open("result.txt", "w", encoding="utf-8")
for line in Lines:
    for item in not_needed:
        if item not in line:
            row += 1
            result.write(str(row) + ": " + line)

So if a line contains a string from the list, it should be deleted. After every string, print the file without those lines. How can I do that?
Look at the logic in your for loop. For each line in Lines, it goes through every item in not_needed and writes the line whenever the condition holds. But the condition holds every time an item is not found, so a line is written once for each item it does not contain. Try the inverse: check whether the line contains any item from not_needed. If it does, do nothing; otherwise write it once.
Expanded answer: Here's what I think you are looking for:

for line in Lines:
    # write the line only if none of the unwanted strings occurs in it
    if not any(item in line for item in not_needed):
        row += 1
        result.write(str(row) + ": " + line)
Extract from a CSV file and output to a txt file (Python)
I read the file 'average-latitude-longitude-countries.csv' to find the countries in the Southern Hemisphere, and I want to print the country names to the file 'result.txt'. Question: please fix my code so that the output is printed as shown in the image.

infile = open("average-latitude-longitude-countries.csv","r")
outfile = open("average-latitude-longitude-countries.txt","w")
joined = []
infile.readline()
for line in infile:
    splited = line.split(",")
    if len(splited) > 4:
        if float(splited[3]) < 0:
            joined.append(splited[2])
            outfile.write(str(joined) + "\n")
    else:
        if float(splited[2]) < 0:
            joined.append(splited[1])
            outfile.write(str(joined) + '\n')
It's hard to say without seeing the first few lines of the CSV. However, assuming your code works and the country list is populated correctly, you can replace the line

outfile.write(str(joined) + '\n')

with:

outfile.write("\n".join(joined))

or with these two lines:

for country in joined:
    outfile.write("%s\n" % country)

Keep in mind that these approaches just do the job; they are not optimal.
Extra hints: have a look at the csv module in the Python standard library, which can make your parsing easier. Also, splited = line.split(",") can produce wrong output if there is a quoted field that contains ",", like this:

field1_value,"field 2,value",field3, field4 , ...

Update: Now I understand. You are dumping the whole accumulated list to the file for each line you read. You should keep adding to the list inside the loop and then, after the loop has finished, write it out once (joining the accumulated list as above). Here is your code, slightly modified:

infile = open("average-latitude-longitude-countries.csv","r")
outfile = open("average-latitude-longitude-countries.txt","w")
joined = []
infile.readline()
for line in infile:
    splited = line.split(",")
    if len(splited) > 4:
        if float(splited[3]) < 0:
            joined.append(splited[2])
            #outfile.write(str(joined) + "\n")
    else:
        if float(splited[2]) < 0:
            joined.append(splited[1])
            #outfile.write(str(joined) + '\n')
outfile.write("\n".join(joined))
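As a sketch of the csv-module hint: csv.reader keeps a quoted field that contains a comma in one piece, whereas str.split(",") breaks it apart. The sample row below is made up for illustration and is not taken from the actual file.

```python
import csv

# Made-up sample row: the country name contains a comma, so it is quoted.
row_text = 'KR,"Korea, Republic of",37.0,127.5'

# Naive splitting cuts the quoted name in two and shifts the later columns.
naive = row_text.split(",")
# csv.reader accepts any iterable of lines; here a one-element list.
parsed = next(csv.reader([row_text]))

print(len(naive), naive)
print(len(parsed), parsed)
```

With csv.reader the latitude stays at a fixed column index, so a check such as len(splited) > 4 to guess where the columns landed would no longer be needed.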
Storing lines into array and print line by line
Current code:

filepath = "C:/Bg_Log/KLBG04.txt"
with open(filepath) as fp:
    lines = fp.read().splitlines()
with open(filepath, "w") as fp:
    for line in lines:
        print("KLBG04",line,line[18], file=fp)

Output:

KLBG04 20/01/03 08:09:13 G0001 G

I need the flexibility to move the columns around and also to manipulate the date, as shown below, using an array or list:

KLBG04 03/01/20 G0001 G 08:09:13
You didn't provide sample data, but I think this may work:

filepath = "C:/Bg_Log/KLBG04.txt"
with open(filepath) as fp:
    lines = fp.read().splitlines()
with open(filepath, "w") as fp:
    for line in lines:
        ln = "KLBG04 " + line + " " + line[18]  # current column order
        sp = ln.split()  # split at whitespace
        dt = '/'.join(sp[1].split('/')[::-1])  # reverse date
        # new column order: prefix, date, code, letter, time
        print(sp[0], dt, sp[3], sp[4], sp[2], file=fp)
Try to split() the line first, then print the list in your desired order:

from datetime import datetime  # use the datetime module to manipulate the date

filepath = "C:/Bg_Log/KLBG04.txt"
with open(filepath) as fp:
    lines = fp.read().splitlines()
with open(filepath, "w") as fp:
    for line in lines:
        date, time, venue = line.split(" ")  # split the line up
        date = datetime.strptime(date, '%y/%m/%d').strftime('%d/%m/%y')  # format your date
        print("KLBG04", date, venue, venue[0], time, file=fp)  # print in your desired order
Why not store the output as a string, use split() to break it at each space, and then split the element at index 1 (the one that contains the date) again at each / so that you can rearrange the date?

for line in lines:
    # rather than printing, store the output in a string
    output = " ".join(["KLBG04", line, line[18]])
    x = output.split(" ")
    date_output = x[1].split("/")
    # now you can rearrange the data and print it however you want
Try this:

for line in lines:
    words = line.split()  # split every word
    date_values = words[0].split('/')  # split the word that contains the date
    # create a dictionary as follows
    date_format = ['YY','DD','MM']
    date_dict = dict(zip(date_format, date_values))
    # now create a new variable with the changed format
    new_date_format = date_dict['MM'] + '/' + date_dict['DD'] + '/' + date_dict['YY']
    print(new_date_format)
    # replace the first word (index 0 holds the date) with the new date format
    words[0] = new_date_format
    # join all the words to form a new line
    new_line = ' '.join(words)
    print("KLBG04",new_line,line[18])
Take a string from a file in which it occupies multiple lines
I have a problem with a Python program. In this program I have to take strings from a file and save them to a list. The problem is that some strings in this file span several lines. The file, named 'ft1.txt', is structured like this (the groups are separated by blank lines):

home
wo
rk

sec
urity

inform
atio
n

Consequently, opening the file and calling f.read() gives me: " \n\nhome\nwo\nrk\n\nsec\nurity\n\ninform\natio\nn ". I execute the following code:

with open('ft1.txt', 'r') as f:  # open the file
    list_strin = f.read().split('\n\n')  # save the strings in a list

I want the output [homework, security, information], but the actual output is [home\nwo\nrk, sec\nurity, inform\natio\nn]. How can I remove the special character "\n" inside the individual strings and merge them correctly?
You have \n in the strings. Remove it :-)

list_strin = [x.replace('\n', '') for x in f.read().strip().split('\n\n')]

readline solution:

res = []
s = ''
with open('ft1.txt', 'r') as f:
    line = f.readline()
    while line:
        line = line.strip()
        if line == '':
            # a blank line ends the current string
            if s:
                res.append(s)
                s = ''
        else:
            s += line
        line = f.readline()
if s:
    # keep the last string when the file does not end with a blank line
    res.append(s)
print(res)
Changing a text file and making a bigger text file in python
I have a tab-separated text file like this example:

infile:
chr1 + 1071396 1271396 LOC
chr12 + 1101483 1121483 MIR200B

I want to divide the difference between columns 3 and 4 of infile by 100, produce 100 rows for each row of infile, and write the result to a new tab-separated file, newfile, with 6 columns. The first 5 columns would be as in infile, and the 6th column would be (5th column)_part(number), where number runs from 1 to 100. This is the expected output file:

expected output:
chr1 + 1071396 1073396 LOC LOC_part1
chr1 + 1073396 1075396 LOC LOC_part2
.
.
.
chr1 + 1269396 1271396 LOC LOC_part100
chr12 + 1101483 1101683 MIR200B MIR200B_part1
chr12 + 1101683 1101883 MIR200B MIR200B_part2
.
.
.
chr12 + 1121283 1121483 MIR200B MIR200B_part100

I wrote the following code to get the expected output, but it does not return what I expect.

file = open('infile.txt', 'rb')
cont = []
for line in file:
    cont.append(line)
newfile = []
for i in cont:
    percent = (i[3]-i[2])/100
    for j in percent:
        newfile.append(i[0], i[1], i[2], i[2]+percent, i[4], i[4]_'part'percent[j])
with open('output.txt', 'w') as f:
    for i in newfile:
        for j in i:
            f.write(i + '\n')

Do you know how to fix the problem?
Try this:

cont = []
with open('infile.txt', 'r') as file:
    for line in file:
        # split on whitespace so each row becomes a list of column strings
        cont.append(line.split())
newfile = []
for i in cont:
    diff = (int(i[3]) - int(i[2])) // 100
    left = int(i[2])
    right = left + diff
    for j in range(100):
        newfile.append([i[0], i[1], str(left), str(right), i[4], i[4] + '_part' + str(j + 1)])
        left = right
        right = right + diff
with open('output.txt', 'w') as f:
    for row in newfile:
        f.write('\t'.join(row) + '\n')

In your code, for i in cont loops over the raw lines, so i[2] is a single character rather than a column. To fix that, I split each line into its columns.
Here are some suggestions:
When you open the file, open it as a text file, not a binary file:

open('infile.txt', 'r')

When you read it line by line, strip the newline character at the end using strip(). Then split the line on tabs into a list of strings, rather than keeping one long string, using split('\t'):

line.strip().split('\t')

Now you have:

file = open('infile.txt', 'r')
cont = []
for line in file:
    cont.append(line.strip().split('\t'))

Now cont is a list of lists, where each inner list contains the tab-separated fields of one line, e.g. cont[1][0] == 'chr12'. You will probably be able to take it from here.
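Continuing from that point, here is one possible sketch of the splitting step. The function name split_row and its parts parameter are hypothetical; it assumes columns 3 and 4 hold integers and uses integer arithmetic so that the last part ends exactly at the original end coordinate.

```python
def split_row(fields, parts=100):
    """Yield `parts` rows that together cover start..end of one record."""
    chrom, strand, name = fields[0], fields[1], fields[4]
    start, end = int(fields[2]), int(fields[3])
    span = end - start
    for k in range(parts):
        # Integer boundaries; the final right edge equals `end`
        # even when span is not divisible by parts.
        left = start + span * k // parts
        right = start + span * (k + 1) // parts
        yield [chrom, strand, str(left), str(right), name, f"{name}_part{k + 1}"]


rows = list(split_row("chr1\t+\t1071396\t1271396\tLOC".split("\t")))
print("\t".join(rows[0]))
print("\t".join(rows[-1]))
```

Each yielded list can be written with "\t".join(row) + "\n", one call per row, to produce the six-column output file.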
Others have answered your question with respect to your own code; I thought I would leave my attempt at solving your problem here.

import os

directory = "C:/Users/DELL/Desktop/"
filename = "infile.txt"
path = os.path.join(directory, filename)
# open input and output files
with open(path, "r") as f_in, open(os.path.join(directory, "outfile.txt"), "w") as f_out:
    for line in f_in:
        contents = line.rstrip().split("\t")  # split the line into a list of column strings
        start = int(contents[2])
        diff = (int(contents[3]) - start) / 100
        for i in range(100):
            # each part runs from start + diff*i to start + diff*(i+1)
            temp = f"{contents[0]}\t{contents[1]}\t{int(start + diff*i)}\t{int(start + diff*(i+1))}\t{contents[4]}\t{contents[4]}_part{i+1}"
            f_out.write(temp + "\n")

This code doesn't follow Python style conventions well (excessively long lines, for example), but it works. The line temp = ... uses f-strings to format the output string conveniently; you can read more about them in the Python documentation on formatted string literals.