My code produces output with many lines, but I only get the first and last lines of it.
I also tried saving the output to a text file, with the same result.
Can anyone advise me how to fix this?
This is my output:
-12 -23 0
-13 -24 0
-14 -25 0
...
-119 -130 0
-120 -131 0
-121 -132 0
This code writes the data to a file:
with open('emotion_file.txt', mode='w', newline='\n') as emotion_file:
    emotion_file.write(osem)
    emotion_file.close()
The code reads numbers from demofile.txt, transforms the data into arrays and works with them.
Full code:
import os
import sys
import numpy as np
import re
import csv
#read data
f = open("demofile.txt", "r")
lines = f.readlines()
p=1
#sys.stdout = open("results.txt", "w")
#preprocessing
for i in list(lines):
if i[0] != '<' and i[0] != '>' and i[0] != '=':
p = str(' '.join(i.split()))
print(p)
else:
#w = i[2:]
w = i.replace("=",'')
w = w.replace(">",'')
w = w.replace("<",'')
#print(w)
w = ', '.join(w.split())
y = i[2]
y=int(y)+1
c=np.array([w])
c1 = [int(i) for i in c[0].replace(" ", "").split(",")]
#transform to array
c1=np.array(c1)
frst=c1[0]
c1=np.delete(c1, 0)
n=len(c1)
sest2=c1
c1=np.array([c1]*frst)
c1=np.transpose(c1)
left1 = np.array([[(p + j) * 11 for j in range(frst)]] * n) + c1
left2 = np.array([[(p + j) * -11 for j in range(frst)]] * n) + c1
left=np.where(c1 > 0,left1 , left2)
#print(left)
left=left*-1
b=np.all(left < 0)
#6. formula
sest1=left
sest = np.zeros((sest1.flatten().shape[0],2))
sest[:,[0]] = sest1.T.flatten()[:,None]
sest[:,[1]] = np.tile(sest2,frst)[:,None]
sest=str(sest).replace("[",'')
sest=str(sest).replace("]",' 0')
sest = sest[:-1]
sest=str(sest).replace(".",'')
sest=str(sest).replace("\n ",'\n')
sest=str(sest).replace(" ",' ')
if b==False:
sest=str(sest).replace(" ",' ')
else:
sest=str(sest).replace(" ",' ')
sest=str(sest).replace("\n ",'\n')
sest=str(sest).lstrip()
#7. formula
sedem=np.transpose(left)*-1
sedem=str(sedem).replace("[",'')
sedem=str(sedem).replace("]",' 0')
sedem=str(sedem).replace(".",'')
sedem = sedem[:-1]
sedem=str(sedem).replace("\n ",'\n')
sedem=str(sedem).replace(" ",' ')
sedem=str(sedem).replace("\n ",'\n')
sedem=str(sedem).lstrip()
#8. formula
if frst==1:
osem='---nic-----'
else:
osem=left
osem = np.vstack([ np.c_[left[:,x], left[:,y]]
for x, y in np.c_[np.triu_indices(n=left.shape[1], k=1)] ])
osem=str(osem).replace("[",'')
osem=str(osem).replace("]",' 0')
osem = osem[:-1]
osem=str(osem).replace("\n ",'\n')
osem=str(osem).replace(" ",' ')
osem=str(osem).replace("\n ",'\n')
osem=str(osem).lstrip()
sedem=str(sedem).replace(" ",'')
#for iin osem range
p +=frst
#print(sest)
#print(sedem,)
print(osem,'\n')
with open('emotion_file.txt', mode='w',newline='\n') as emotion_file:
emotion_file.write(osem)
emotion_file.close()
The variables sest, sedem and osem transform the arrays into different forms.
demofile.txt
=>11 1 2 3 4 5 6 7 8 9 10 11
The right loop depends on what kind of data structure osem is. If it is a list, I found that this worked for me:
osem = [3, 4, 5]
with open('emotion_file.txt', 'a') as emotion_file:
    for i in osem:
        emotion_file.write(f'{i}\n')
If osem is a dictionary, the following works as well:
osem = {'key_1': 1, 'key_2': 2}
with open('emotion_file.txt', 'a') as emotion_file:
    for key, value in osem.items():
        emotion_file.write(f'{key}\t{value}\n')
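If you want to overwrite the current contents of emotion_file.txt instead of appending to them, replace 'a' with 'w' in the open(...) call. With 'w', open the file only once, before the loop; otherwise each iteration erases what the previous one wrote.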
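To see why the mode and the position of open() matter, here is a small sketch with made-up lines and a hypothetical file name (demo_output.txt in a temporary folder): reopening the file with 'w' on every iteration truncates it each time, so only the last write survives, while opening it once before the loop keeps every line.

```python
import os
import tempfile

rows = ["-12 -23 0", "-13 -24 0", "-14 -25 0"]  # made-up sample lines
path = os.path.join(tempfile.mkdtemp(), "demo_output.txt")  # hypothetical file

# Pitfall: opening with 'w' inside the loop truncates the file on every pass
for r in rows:
    with open(path, "w") as out:
        out.write(r + "\n")
with open(path) as f:
    overwritten = f.read().splitlines()

# Fix: open the file once, before the loop, and write every line inside it
with open(path, "w") as out:
    for r in rows:
        out.write(r + "\n")
with open(path) as f:
    complete = f.read().splitlines()

print(overwritten)  # ['-14 -25 0']
print(complete)     # ['-12 -23 0', '-13 -24 0', '-14 -25 0']
```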
Hopefully this helps!
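One more thing that may be worth checking: the `...` line in the middle of the question's output looks like NumPy's array summarization. `str()` on an array with more than `threshold` elements (1000 by default) prints only the first and last three rows, and the question's code builds the text with `str(osem)`, so the truncation may happen there rather than when writing the file. A minimal sketch, assuming `osem` is still a 2-D integer array before it is turned into a string (the data below is made up): `np.savetxt` writes every row. Alternatively, `np.set_printoptions(threshold=sys.maxsize)` before the `str()` call disables the summarization.

```python
import io
import numpy as np

# Made-up stand-in for osem before str() is applied: 1100 rows x 2 columns
osem = np.column_stack([np.arange(-12, -1112, -1), np.arange(-23, -1123, -1)])

# str() summarizes arrays with more than 1000 elements: middle rows become '...'
summary = str(osem)

# np.savetxt writes every row; the extra zero column mimics the trailing ' 0'
rows = np.column_stack([osem, np.zeros(len(osem), dtype=int)])
buf = io.StringIO()  # stands in for open('emotion_file.txt', 'w')
np.savetxt(buf, rows, fmt="%d")
lines = buf.getvalue().splitlines()

print("..." in summary)  # True
print(len(lines))        # 1100
print(lines[0])          # -12 -23 0
```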
Related
To change the values from 10 to 18, 19 or 20, I split each line, access the substrings and then try to change them. The script runs, but the values are not changed. Here is the solution I am trying to implement:
oldFileName = 'tryout.hmo'
newFileName = 'tryout_NEW.hmo'
topoFileName = 'Density.topo'
readme = open( oldFileName, "r" )
oldLines = readme.readlines()
readme = open(topoFileName, "r")
Lines = readme.readlines()
readme.close()
newFile = open(newFileName,"w")
for row in oldLines:
for line in Lines:
tmp = line.split()
list = row.rstrip()
tmp1 = list.split()
newFile.write(row)
if row.find("BEG_ELEM_DATA") > -1:
if tmp[0] == tmp1[0]:
if tmp[2] == 1 and tmp[3] == 0:
# it is magnet, value 18
newFile.write(tmp1.replace(tmp1[1], "18"))
elif tmp[2] == 1 and tmp[3] == 1:
# it is iron, value 19
newFile.write(tmp1.replace(tmp1[1], "19"))
else:
# it is air, value 20
newFile.write(tmp1.replace(tmp1[1], "20"))
newFile.close()
I would really appreciate it if you could fix the problem in the script above; I think it should work then.
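One likely reason nothing changes in the script above: `line.split()` returns strings, so a test like `tmp[2] == 1` compares a string with an integer and is never true. Also, `tmp1` is a list, which has no `replace()` method. A small sketch of converting before comparing; `material_code` is a hypothetical helper, the .topo line is made up, and the rounding mirrors the `.round()` call used in one of the answers.

```python
def material_code(density_fields):
    """Hypothetical helper: new value for one split line of the .topo file.

    Assumes columns 2 and 3 hold numbers such as '1.0' or '1e-05' and
    rounds them before comparing.
    """
    c = round(float(density_fields[2]))
    d = round(float(density_fields[3]))
    if c == 1 and d == 0:
        return "18"  # magnet
    if c == 1 and d == 1:
        return "19"  # iron
    return "20"  # air

fields = "3749 0 1.0 1e-05".split()  # made-up .topo line
print(fields[2] == 1)         # False: '1.0' is a str, never equal to the int 1
print(material_code(fields))  # 18
```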
I'm also still a beginner in Python, but I tried to solve your problem and here is my solution.
There are probably much better ways to do it, because here you have to load all the data into a DataFrame before comparing it.
Also, I don't know whether your data can be read into a DataFrame with pd.read_csv, because I'm not familiar with the *.hmo and *.topo formats.
import pandas as pd
df = pd.read_csv('tryout.csv', delimiter=';')
df2 = pd.read_csv('density.csv', delimiter=';')
for idx, row in df.iterrows():
    for idx2, row2 in df2.iterrows():
        if row.iloc[0] == row2.iloc[0]:
            # write into df itself: assigning to row may only change a copy
            if row2.iloc[2] == 1 and row2.iloc[3] == 0:
                # it is magnet, value 18
                df.loc[idx, df.columns[1]] = 18
            elif row2.iloc[2] == 1 and row2.iloc[3] == 1:
                # it is iron, value 19
                df.loc[idx, df.columns[1]] = 19
            else:
                # it is air, value 20
                df.loc[idx, df.columns[1]] = 20
df.to_csv('new_tryout.csv')
Here is what the code does: it loads both files into DataFrames, then iterates over every row to find where the ID (the first column, e.g. 3749) is the same in both files.
If it is, the three if statements decide whether it is magnet, iron or air and set the matching value in df.
At the end, the updated df is saved to a new file, 'new_tryout.csv'.
I created two test files and it worked the way it should.
Finally, here is the solution you were searching for.
import pandas as pd
df2 = pd.read_csv('Density.topo', header=0, names=list('ABCD'), delimiter=r'\s+', skiprows=1)
df2[['C', 'D']] = df2[['C', 'D']].round()
new_file_content = ''
with open('tryout.hmo', 'r') as f:
    for line in f:
        if line[11:13] == '10':
            if line[3].isspace():
                ID_to_search_for = line[4:8]  # number with 4 digits
            else:
                ID_to_search_for = line[3:8]  # number with 5 digits
            search_idx = df2[df2['A'] == ID_to_search_for].index[0]
            if df2['C'][search_idx] == 1 and df2['D'][search_idx] == 0:
                change = '18'  # magnet
                new_line = line[:11] + change + line[13:]
            elif df2['C'][search_idx] == 1 and df2['D'][search_idx] == 1:
                change = '19'  # iron
                new_line = line[:11] + change + line[13:]
            else:
                change = '20'  # air
                new_line = line[:11] + change + line[13:]
            new_file_content += new_line
        else:
            new_file_content += line
with open('tryout_changed.hmo', 'w') as f:
    f.write(new_file_content)
If you don't want to use DataFrames, you can do it like this:
with open('Density.topo') as f:
    lists_of_list = [line.rstrip().split() for line in f]
new_file_content = ''
with open('tryout_test.hmo', 'r') as f:
    for line in f:
        if line[11:13] == '10':
            if line[3].isspace():
                ID_to_search_for = line[4:8]  # number with 4 digits
            else:
                ID_to_search_for = line[3:8]  # number with 5 digits
            new_line = line  # keep the line unchanged if the ID is not found
            for sublist in lists_of_list:
                if sublist and sublist[0] == ID_to_search_for:
                    # split() yields strings, so convert (and round) before comparing with numbers
                    c = round(float(sublist[2]))
                    d = round(float(sublist[3]))
                    if c == 1 and d == 0:
                        change = '18'  # magnet
                    elif c == 1 and d == 1:
                        change = '19'  # iron
                    else:
                        change = '20'  # air
                    new_line = line[:11] + change + line[13:]
                    break
            new_file_content += new_line
        else:
            new_file_content += line
with open('tryout_changed.hmo', 'w') as f:
    f.write(new_file_content)
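Scanning `lists_of_list` for every matching line costs one pass over the .topo data per line of the .hmo file. For big files, building a dictionary keyed by ID once makes each lookup constant time on average. A sketch under the same assumptions as above (ID in the first column, the two flags in columns 2 and 3); `build_lookup`, `CODES` and the sample lines are hypothetical.

```python
def build_lookup(topo_lines):
    """Map element ID (first column, as text) -> (col2, col3) as rounded ints."""
    lookup = {}
    for line in topo_lines:
        parts = line.split()
        if len(parts) < 4:
            continue  # skip blank or short lines
        try:
            lookup[parts[0]] = (round(float(parts[2])), round(float(parts[3])))
        except ValueError:
            continue  # skip header/text lines that are not numeric
    return lookup

CODES = {(1, 0): "18", (1, 1): "19"}  # magnet, iron; anything else is air ("20")

topo = ["header line", "3749 0 1.0 1e-05", "3750 0 1.0 1.0", "3751 0 0.0 0.0"]
lookup = build_lookup(topo)
for element_id in ["3749", "3750", "3751", "9999"]:
    flags = lookup.get(element_id)
    # None means "ID not in the .topo file": leave that .hmo line unchanged
    new_value = CODES.get(flags, "20") if flags is not None else None
    print(element_id, new_value)
```

The loop prints `18`, `19`, `20` for the three known IDs and `None` for the unknown one.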
OK, here is my final answer. It (again) does everything you were looking for. If there is a problem, please debug the code in your IDE. You should also start using context managers (with open(...) as f:) instead of opening and closing files step by step.
I wrote the new code around your code in the question and added some comments to it.
oldFileName = 'tryout.hmo'
newFileName = 'tryout_NEW.hmo'
topoFileName = 'Density.topo'
with open(oldFileName, "r") as readme:
    oldLines = readme.readlines()
m = int(oldLines[3])
print(m)
new_m = m + 3
m1 = str(m)
new_m1 = str(new_m)
Phrase = "END_COMP_DATA"
#n = "Phrase not found" #not used --> not needed
with open(oldFileName, "r") as oldFile:
    for number, lin in enumerate(oldFile):
        if Phrase in lin:
            n = number
#insert 3 lines to tryout_new at the right position (--> row n)
magnet = f" {m+1} "'" topo_magnet"'"\n"
iron = f" {m+2} "'" topo_iron"'"\n"
air = f" {m+3} "'" topo_air"'"\n"
oldLines[n:n] = [magnet, iron, air]
flag = 0
with open(topoFileName) as f:
    data_density = [line.rstrip().split() for line in f]
with open(newFileName, "w") as newFile:
    for idx, row in enumerate(oldLines):
        lst = row.rstrip()  # I think you shouldn't name a variable like a class in Python (list); use 'lst' or something like that
        tmp_tryout = lst.split()
        if row.find("BEG_ELEM_DATA") > -1:
            flag = 1
        if flag == 1 and len(tmp_tryout) > 1:
            # if the line has at least 2 columns (after split), check for the "10"
            if tmp_tryout[1] == '10':
                # density_idx_line searches density.topo for a match with tmp_tryout[0] (e.g. 3749) and stores the whole line
                density_idx_line = list(filter(lambda x: x and x[0] == tmp_tryout[0], data_density))
                if len(density_idx_line) > 0:
                    if density_idx_line[0][2] == '1.0' and density_idx_line[0][3] == '1e-05':
                        # ' 10 ' is the 10 with a space before and after it, so only the 10 gets replaced (and not e.g. 3104 to 3184)
                        newFile.write(row.replace(' 10 ', ' 18 '))
                    elif density_idx_line[0][2] == '1.0' and density_idx_line[0][3] == '1.0':
                        newFile.write(row.replace(' 10 ', ' 19 '))
                    else:
                        newFile.write(row.replace(' 10 ', ' 20 '))
                else:
                    newFile.write(row)  # no entry in density.topo: keep the line unchanged
            else:
                newFile.write(row)
        else:
            if idx == 3:
                newFile.write(row.replace(m1, new_m1))
            else:
                newFile.write(row)
print("script terminated successfully!")
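In the script above, `row.replace(' 10 ', ' 18 ')` changes every standalone ` 10 ` in the line, not only the one in the second column, so a line whose other columns also contain a 10 would be changed in several places. If that matters, a regular expression anchored at the start of the line can target just the second field and keep the original spacing. A sketch, assuming whitespace-separated fields; `replace_second_field` and the sample line are hypothetical.

```python
import re

# group 1: optional leading spaces, the first field and the whitespace after it;
# then the second field must be exactly '10' (followed by whitespace/newline)
SECOND_FIELD_10 = re.compile(r"^(\s*\S+\s+)10(?=\s)")

def replace_second_field(row, new_value):
    """Replace the second field of row with new_value, only if it is exactly '10'."""
    return SECOND_FIELD_10.sub(lambda m: m.group(1) + new_value, row, count=1)

row = "   3749   10   3104   10   3105\n"  # made-up element line
print(repr(replace_second_field(row, "18")))  # only the second field changes
print(repr(row.replace(" 10 ", " 18 ")))      # both 10s change
```

Using a function as the replacement avoids the ambiguity of a template like `\118` when the new value starts with a digit.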
OK, here is another solution. For anybody else reading this: it is still only a temporary solution, but @Sagar and I both don't know how to do it better.
import numpy as np
import pandas as pd
# skip the first 52362 lines (specific to this tryout.hmo file)
df = pd.read_csv('tryout.hmo', header=0, names=list('ABCDEFGHIJKLM'), delimiter=r'\s+', skiprows=52362)
df2 = pd.read_csv('Density.topo', header=0, names=list('ANOP'), delimiter=r'\s+', skiprows=1)
df2 = df2.iloc[:-3, :]  # drop the last 3 rows of Density.topo
df3 = df.merge(df2, how='outer', on='A')
df3[['O', 'P']] = df3[['O', 'P']].fillna(-1).astype(int).replace(-1, np.nan)
df3['B'] = df3.apply(lambda x: 18 if x['B'] == 10 and x['O'] == 1 and x['P'] == 0 else (
    19 if x['B'] == 10 and x['O'] == 1 and x['P'] == 1 else (
    20 if x['B'] == 10 and x['O'] == 0 and x['P'] == 0 else x['B'])), axis=1)
df3.to_csv('new_tryout.csv')
It ran in less than a second, so it is far faster than iterrows or itertuples.
The new CSV file contains both the tryout file and the density file, merged on the first column of the tryout file (the ID, I guess).
I didn't check this whole, very big file, but the few random rows I checked look correct.
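`df3.apply(..., axis=1)` still calls the Python lambda once per row. A vectorized alternative is `np.select`, which picks a value from a list of boolean conditions and falls back to a default. A minimal sketch with a small made-up frame that uses the same column letters (A = ID, B = value, O/P = the density flags). On the real merged data, rows whose O/P are NaN compare as False and keep their B value.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the merged frame df3
df3 = pd.DataFrame({
    "A": [3749, 3750, 3751, 3752],
    "B": [10, 10, 10, 7],
    "O": [1, 1, 0, 1],
    "P": [0, 1, 0, 1],
})

is10 = df3["B"] == 10
conditions = [
    is10 & (df3["O"] == 1) & (df3["P"] == 0),  # magnet
    is10 & (df3["O"] == 1) & (df3["P"] == 1),  # iron
    is10 & (df3["O"] == 0) & (df3["P"] == 0),  # air
]
# rows matching no condition keep their current B value
df3["B"] = np.select(conditions, [18, 19, 20], default=df3["B"])
print(df3["B"].tolist())  # [18, 19, 20, 7]
```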
I am trying to capture the data in the second table (Field crops), titled "Prices Received, United States, July 2010, with Comparisons". I am using pandas DataFrames to capture the table from the text file, and then I will output it to a CSV file.
My code is as follows:
def find_no_line_start_table(table_title, splited_data):
    found_no_lines = []
    for index, line in enumerate(splited_data):
        if table_title in line:
            found_no_lines.append(index)
    return found_no_lines
def get_start_data_table(table_start, splited_data):
    for index, row in enumerate(splited_data[table_start:]):
        if 'Dollars' in row:
            return table_start + index
def get_end_table(start_table_data, splited_data):
    for index, row in enumerate(splited_data[start_table_data:]):
        if END_TABLE_LINE in row:
            return start_table_data + index
def row(l):
    l = l.split()
    number_columns = 6
    if len(l) >= number_columns:
        data_row = [''] * number_columns
        first_column_done = False
        index = 0
        for w in l:
            if not first_column_done:
                data_row[0] = ' '.join([data_row[0], w])
                if ':' in w:
                    first_column_done = True
            else:
                index += 1
                data_row[index] = w
        return data_row
def take_table(txt_data):
    comodity = []
    q = []
    w = []
    e = []
    t = []
    p = []
    for r in table:
        data_row = row(r)
        if data_row:
            col_1, col_2, col_3, col_4, col_5, col_6 = data_row
            comodity.append(col_1)
            q.append(col_2)
            w.append(col_3)
            e.append(col_4)
            t.append(col_5)
            p.append(col_6)
    table_data = {'comodity': comodity, 'q': q,
                  'w': w, 'e': e, 't': t}
    return table_data
Then I run this:
import requests
import pandas as pd
txt_data = requests.get("https://downloads.usda.library.cornell.edu/usda-esmis/files/c821gj76b/6w924d00c/9z903130m/AgriPric-07-30-2010.txt").text
splited_data = txt_data.split('\n')
table_title = 'Prices Received, United States'
END_TABLE_LINE = '-------------------------------------------'
_, table_start,_ = find_no_line_start_table(table_title,splited_data)
start_line = get_start_data_table(table_start, splited_data)
end_line = get_end_table(start_line, splited_data)
table = splited_data[start_line : end_line]
dict_table = take_table(txt_data)
pd.DataFrame(dict_table)
c = pd.DataFrame(dict_table)
However, I am getting this error:
IndexError: list assignment index out of range
Can anyone help me figure out what I am doing wrong?
Cause of error:
data_row is a list of 6 elements.
number_columns = 6
# ...
data_row = [''] * number_columns # [''] * 6
and index is incremented on every iteration in which first_column_done is True. first_column_done becomes True as soon as a word contains ':', i.e.
if ':' in w:
    first_column_done = True
Hence, once first_column_done is True, index is incremented for every remaining word, and as soon as it reaches 6 (one past the last valid index of data_row, which is 5), the assignment fails.
def row(l):
    l = l.split()
    number_columns = 6
    if len(l) >= number_columns:
        data_row = [''] * number_columns
        first_column_done = False
        index = 0
        for w in l:
            if not first_column_done:
                data_row[0] = ' '.join([data_row[0], w])
                if ':' in w:
                    first_column_done = True
            else:
                index += 1
                data_row[index] = w  # error pos.
In other words, you get this error for every line that has more than 5 words after the first word containing ':'.
Fix:
Split the line on the first ':' and use a list comprehension together with Python's conditional (ternary) expression:
def row(l):
    # split on the first ':' only -> [label, rest of the line]
    row = [col.strip() for col in l.split(':', 1)]
    # turn the numbers after the ':' into separate columns
    row[1:] = row[1].split() if len(row) > 1 else []
    # pad with '' (or cut) to exactly 6 columns
    return [row[i] if i < len(row) else '' for i in range(6)]
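Two smaller issues in the question's take_table are also worth noting: it ignores its txt_data argument and reads the global table instead, and it never adds the sixth column p to table_data. A sketch that takes the table lines as a parameter, keeps all six columns, and repeats the split-on-':' idea from this answer so the block stands alone; the sample lines are made up in the "label ....: numbers" layout the question's code expects.

```python
def row(l):
    # label before the first ':', numbers after it, padded to 6 columns
    parts = [col.strip() for col in l.split(':', 1)]
    cells = parts[:1] + (parts[1].split() if len(parts) > 1 else [])
    return [cells[i] if i < len(cells) else '' for i in range(6)]

def take_table(table_lines):
    names = ['comodity', 'q', 'w', 'e', 't', 'p']
    table_data = {name: [] for name in names}
    for line in table_lines:
        if ':' not in line:
            continue  # skip blank lines and lines without a label separator
        for name, value in zip(names, row(line)):
            table_data[name].append(value)
    return table_data

sample = [  # made-up table lines
    "Barley, $/bu .........:  3.40   3.90   4.06   4.10   4.20",
    "",
    "Corn, $/bu ...........:  3.55   3.67   3.70",
]
data = take_table(sample)
print(data['comodity'])  # ['Barley, $/bu .........', 'Corn, $/bu ...........']
print(data['p'])         # ['4.20', '']
```

With the question's variables, `c = pd.DataFrame(take_table(table))` would then build the DataFrame.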
We have a file with some math problems like 46 + 19 (only + or -, always written as: number, space, sign, space, number), and we need to solve them and write the results to a new file (46 + 19 = 65). We don't know how many exercises there will be or how many digits each number has. Here is my code:
input_file = open(r'C:\try\bla.txt', 'r')
nums = input_file.read()
y = 0
dig1 = ''
dig2 = ''
sign = ''
x1 = nums.find(' ')
x2 = x1 + 1
def one(dig1, dig2, y):
for i in xrange(x1):
dig1 += nums[y]
y += 1
for m in xrange(abs(-x2)):
dig2 += nums[y + 1]
y += 1
sign = nums[x2]
if sign == '+':
sum = int(dig1) + int(dig2)
if sign == '-':
sum = int(dig1) - int(dig2)
print dig1, dig2, '=', sum
for a in xrange(0):
one(dig1, dig2, y)
one(dig1, dig2, y)
print 'f', nums[21]
#print dig1, dig2, '=', sum
Maybe this is what you want (Python 3):
test.txt:
10 + 15
22 - 71
33 + 64
code:
import operator
op = {'+': operator.add, '-': operator.sub}
with open('test.txt', 'r') as f:
    lines = f.readlines()
for i in lines:
    args = i.split()
    val = op[args[1]](int(args[0]), int(args[-1]))
    r = f'{i.strip()} = {val}'
    print(r)
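The question also asks for the results in a new file. A sketch that reads the exercises, skips blank or malformed lines, and writes each solved line to an output file; `solve_file` is a hypothetical helper and the file names are placeholders in a temporary folder.

```python
import operator
import os
import tempfile

OPS = {'+': operator.add, '-': operator.sub}

def solve_file(src, dst):
    """Read 'a + b' / 'a - b' lines from src, write 'a + b = result' lines to dst."""
    with open(src) as fin, open(dst, 'w') as fout:
        for line in fin:
            parts = line.split()
            if len(parts) != 3:
                continue  # skip blank or malformed lines
            left, sign, right = parts
            result = OPS[sign](int(left), int(right))
            fout.write(f'{left} {sign} {right} = {result}\n')

folder = tempfile.mkdtemp()
src = os.path.join(folder, 'test.txt')
dst = os.path.join(folder, 'solved.txt')
with open(src, 'w') as f:
    f.write('10 + 15\n22 - 71\n\n46 + 19\n')
solve_file(src, dst)
with open(dst) as f:
    solved = f.read().splitlines()
print(solved)  # ['10 + 15 = 25', '22 - 71 = -49', '46 + 19 = 65']
```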
I wrote a for loop that iterates through a CSV to get a list like this:
[t1, s1]
[t2, s2]
[t3, s3]
and so on, about 4,000 times.
Now I need to write these into a new CSV file, where they'd populate 2 fields and be separated by a comma.
When I run the code below, I only get the last list from the last iteration, with one character per cell.
def sentiment_analysis():
fo = open("positive_words.txt", "r")
positive_words = fo.readlines()
fo.close()
positive_words = map(lambda positive_words: positive_words.strip(), positive_words)
fo = open("negative_words.txt", "r")
negative_words = fo.readlines()
fo.close()
negative_words = map(lambda negative_words: negative_words.strip(), negative_words)
fo = open("BAC.csv", "r")
data = fo.readlines()
fo.close()
data = map(lambda data: data.strip(), data)
x1 = 0 #number of bullish
x2 = 0 #number of bearish
x3 = 0 #number of unknown
for info in data:
data_specs = info.split(',')
time_n_date = data_specs[0]
sentiment = data_specs[2]
'''Possibly precede with a nested for loop for data_specs???'''
if sentiment == 'Bullish':
'''fo.write(time + ',' + 'Bullish' + '\n')'''
elif sentiment == 'Bearish':
''' fo.write(time + ',' + 'Bearish' + '\n')'''
else:
x3 += 1
positive = 0
negative = 0
content_words = data_specs[1].split()
for a in positive_words:
for b in content_words:
if (a == b):
positive = positive + 1
for c in negative_words:
for d in content_words:
if (c == d):
negative = negative + 1
if positive > negative:
'''fo.write(time + ',' + 'Bullish' + '\n')'''
sentiment = 'Bullish'
elif positive < negative:
sentiment = 'Bearish'
else:
sentiment = 'Neutral'
bac2data = [time_n_date, sentiment]
print bac2data
fo = open("C:\Users\Siddhartha\Documents\INFS 772\Project\Answer\BAC2_answer.csv", "w")
for x in bac2data:
w = csv.writer(fo, delimiter = ',')
w.writerows(x)
fo.close()
My for loop doesn't seem to go through all of the data.
In your code bac2data = [time_n_date, sentiment] creates a list containing 2 string items. The proper way to write that to a CSV file with csv.writer() is with writerow(bac2data).
The last part of your code contains several errors. First, you are opening the CSV file in write mode ('w') for every line of the incoming data. This overwrites the file each time, losing all data except the last line. Then you are iterating over the bac2data list and calling writerows() on each item. That writes each character of the string on its own line (which matches your reported output).
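The writerows() pitfall can be reproduced in a few lines. This sketch is written for Python 3 and uses an in-memory buffer instead of a real file; the values are made up. A string passed to writerows() is iterated character by character, and each character becomes its own row.

```python
import csv
import io

buf = io.StringIO()
writer = csv.writer(buf)

bac2data = ["t1", "s1"]
writer.writerows(bac2data[0])  # 't1' -> one row per character
writer.writerow(bac2data)      # one row with two fields

print(repr(buf.getvalue()))  # 't\r\n1\r\nt1,s1\r\n'
```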
Instead, open the output file and create a csv.writer outside of the main for info in data: loop:
fo = open("C:\Users\Siddhartha\Documents\INFS 772\Project\Answer\BAC2_answer.csv", "w")
writer = csv.writer(fo)
for info in data:
    ....
Then replace these lines at the bottom of the main loop:
bac2data = [time_n_date, sentiment]
print bac2data
fo = open("C:\Users\Siddhartha\Documents\INFS 772\Project\Answer\BAC2_answer.csv", "w")
for x in bac2data:
    w = csv.writer(fo, delimiter = ',')
    w.writerows(x)
fo.close()
with this:
bac2data = [time_n_date, sentiment]
print bac2data
writer.writerow(bac2data)
Once you have that working and no longer need to print bac2data for debugging, you can use a single line:
writer.writerow((time_n_date, sentiment))
Update
Complete code for the function:
import csv
def sentiment_analysis():
    fo = open("positive_words.txt", "r")
    positive_words = fo.readlines()
    fo.close()
    positive_words = map(lambda positive_words: positive_words.strip(), positive_words)
    fo = open("negative_words.txt", "r")
    negative_words = fo.readlines()
    fo.close()
    negative_words = map(lambda negative_words: negative_words.strip(), negative_words)
    fo = open("BAC.csv", "r")
    data = fo.readlines()
    fo.close()
    data = map(lambda data: data.strip(), data)
    x1 = 0 #number of bullish
    x2 = 0 #number of bearish
    x3 = 0 #number of unknown
    # open the output once, before the loop; 'wb' because Python 2's csv module expects binary mode
    fo = open("C:\Users\Siddhartha\Documents\INFS 772\Project\Answer\BAC2_answer.csv", "wb")
    writer = csv.writer(fo)
    for info in data:
        data_specs = info.split(',')
        time_n_date = data_specs[0]
        sentiment = data_specs[2]
        '''Possibly precede with a nested for loop for data_specs???'''
        if sentiment == 'Bullish':
            '''fo.write(time + ',' + 'Bullish' + '\n')'''
        elif sentiment == 'Bearish':
            ''' fo.write(time + ',' + 'Bearish' + '\n')'''
        else:
            x3 += 1
            positive = 0
            negative = 0
            content_words = data_specs[1].split()
            for a in positive_words:
                for b in content_words:
                    if (a == b):
                        positive = positive + 1
            for c in negative_words:
                for d in content_words:
                    if (c == d):
                        negative = negative + 1
            if positive > negative:
                '''fo.write(time + ',' + 'Bullish' + '\n')'''
                sentiment = 'Bullish'
            elif positive < negative:
                sentiment = 'Bearish'
            else:
                sentiment = 'Neutral'
        bac2data = [time_n_date, sentiment]
        print bac2data
        writer.writerow(bac2data)
    fo.close()
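For anyone running this on Python 3, a few things change: print is a function, map() returns a one-shot iterator (so positive_words would be exhausted after the first data row), and the csv documentation recommends opening the output file with newline=''. A condensed sketch of the same flow under those rules; `classify` is a hypothetical helper that simplifies the word counting, and the word lists and rows are made up.

```python
import csv
import io

def classify(text, positive_words, negative_words):
    """Hypothetical helper: compare counts of positive and negative words."""
    words = text.split()
    positive = sum(1 for w in words if w in positive_words)
    negative = sum(1 for w in words if w in negative_words)
    if positive > negative:
        return 'Bullish'
    if positive < negative:
        return 'Bearish'
    return 'Neutral'

# Made-up word lists; keep them as sets/lists, not map() iterators
positive_words = {'up', 'buy', 'gain'}
negative_words = {'down', 'sell', 'loss'}

# Made-up rows in the question's layout: time, text, label
data = [
    '2010-01-04 09:30,buy buy up,Bullish',
    '2010-01-04 09:31,sell now down,',
    '2010-01-04 09:32,gain then loss,',
]

out = io.StringIO()  # real script: open('BAC2_answer.csv', 'w', newline='')
writer = csv.writer(out)  # created once, outside the loop
for info in data:
    time_n_date, content, sentiment = info.split(',')
    if sentiment not in ('Bullish', 'Bearish'):
        sentiment = classify(content, positive_words, negative_words)
    writer.writerow([time_n_date, sentiment])  # one row per input line

rows = out.getvalue().splitlines()
print(rows)
```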
I have a text file that looks like this:
A 12 YDUSD ASSDAS FSDDSFSD SDFF
AA FSDFSD FSDF SDFSDG GSDDSFS SDF
AB SDFSDF SDFFSDFDS SDSDSDSDS
ACC SDFSDDSDFSD EW12 SDFSD 3322
ACDD FDSDSFS SDFGSDG DSGSDF
AB FSDFSD SDF34 223DSFSD
ABBD 2332 ADSDFDSFDS
And so on, for about 500 different line beginnings. I want to write a program that takes each line, extracts everything before the first tab (there is a tab between each column) and puts it into a list like this:
['A', 'AA', 'AB', 'ACC', 'ACDD', 'AB', 'ABBD']
This is my program so far but it doesn't quite work:
file1 = open("filename", "r")
file2 = open("filename2", "w")
i=0
k = 0
sp500list = []
with open("filename1") as f:
lines = f.readlines()
while (abc < len(lines)):
LineStr = str(lines[i])
j = 0
if (LineStr[j] != ''):
j = j + 1
if (LineStr[j] !=''):
j = j + 1
elif (LineStr[j] == ' '):
sp500list.append(str(LineStr[:2]))
i = i + 1
if (LineStr[j] !=''):
j = j + 1
elif (LineStr[j] == ' '):
sp500list.append(str(LineStr[:3]))
i = i + 1
if (LineStr[j] !=''):
sp500list.append(str(LineStr[:4]))
i = i + 1
j = 0
elif (LineStr[j] == ' '):
i = i + 1
print sp500list
abc = abc + 1
So far all it does is return an empty array, can anyone help?
Thanks!
This can be simplified: split each line on '\t' and take the first element of the resulting list.
>>> with open('file.txt') as f:
...     result = [line.split('\t', 1)[0] for line in f]
...
>>> result
['A', 'AA', 'AB', 'ACC', 'ACDD', 'AB', 'ABBD']
Alternatively, use result = [line[:line.find('\t')] for line in f], but note that if a line contains no tab, find() returns -1 and the slice silently drops the line's last character.
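If some lines might have no tab at all, str.partition() is a convenient alternative: when the separator is missing, it returns the whole string as the first element instead of failing. A short sketch with made-up lines that also strips the trailing newline and skips blank lines.

```python
lines = ["A\t12\tYDUSD\n", "AA\tFSDFSD\tFSDF\n", "ABBD\n", "\n"]  # made-up lines

# partition('\t')[0] is everything before the first tab (or the whole line)
result = [line.rstrip('\n').partition('\t')[0] for line in lines if line.strip()]
print(result)  # ['A', 'AA', 'ABBD']
```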