How to import and write text in a for cycle - python

I have the following code:
import numpy as np
import csv

dat11 = np.genfromtxt('errors11.txt')
dat12 = np.genfromtxt('errors12.txt')
dat13 = np.genfromtxt('errors13.txt')
dat22 = np.genfromtxt('errors22.txt')
dat23 = np.genfromtxt('errors23.txt')
dat33 = np.genfromtxt('errors33.txt')

with open('Allerrors.txt', 'w+') as output:
    writer = csv.writer(output, delimiter='\t')
    writer.writerows(zip(dat11, dat12, dat13, dat22, dat23, dat33))
Each of the 'errorsxy.txt' files consists of a single column of numbers. With this program I created the 'Allerrors.txt' file, where all those columns sit next to one another. I need to do the same thing with a for loop (or any other kind of loop), because I'll actually have many more files and can't do it by hand. But I don't know how to create these various datxy variables in a loop. I tried (for the first part of the code):
for x in range(1, Nbin+1):
    for y in range(1, Nbin+1):
        'dat'+str(x)+str(y) = np.genfromtxt('errors'+str(x)+str(y)+'.txt')
But of course I get the following error:
SyntaxError: can't assign to operator
I understand why I get this error, but I couldn't find any other way to write it. I also have no idea how to write the second part of the code.
I'm using Python 2.7.
Can anyone help me?

Instead of making separate variables for each data file, you could append each read-in file to a list, then zip and print the list after the for loop has run.
errorfiles = []
for x in range(1, Nbin+1):
    for y in range(1, Nbin+1):
        dat = np.genfromtxt('errors' + str(x) + str(y) + '.txt')
        errorfiles.append(dat)
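A fuller sketch of that approach, including the second (writing) part of the task. Nbin and the errorsXY.txt names are placeholders matching the question, and the sample-file setup at the top exists only so the sketch runs end to end:

```python
import csv
import numpy as np

Nbin = 2  # placeholder; set to your actual number of bins

# Create small sample errorsXY.txt files so the sketch runs end to end.
for x in range(1, Nbin + 1):
    for y in range(1, Nbin + 1):
        np.savetxt('errors{}{}.txt'.format(x, y), np.arange(3, dtype=float))

# Read every file into a list instead of a separate variable per file.
errorfiles = []
for x in range(1, Nbin + 1):
    for y in range(1, Nbin + 1):
        dat = np.genfromtxt('errors{}{}.txt'.format(x, y))
        errorfiles.append(dat)

# zip(*errorfiles) transposes the list of columns into rows,
# so each output line holds one value from every input file.
with open('Allerrors.txt', 'w') as output:
    writer = csv.writer(output, delimiter='\t')
    writer.writerows(zip(*errorfiles))
```

On Python 2.7 (as in the question), open the output file in 'wb' mode instead to avoid blank lines between rows.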

Related

Python import csv file and replace blank values

I have just started a data quality class in which I got zero instruction on Python but am expected to create a script. There are three instructions for my Python script:
Create a script that loads an entire CSV file and replace all the blank values to NAN
Use genfromtxt function
Write the results set into a different file
I have been working on this for a few hours, but with no previous experience with Python, I am completely stuck! This is what I have so far:
import csv

file = open('quality.csv', 'r')
csvreader = csv.reader(file)
header = next(csvreader)
print(header)
rows = []
for row in csvreader:
    rows.append(row)
print(rows)
My first problem is that when I tried using genfromtxt, it would not print out the headers or the entire csv file; it would only print out a few lines. If it matters, all of the values in the csv file are ints/floats, but the headers are strings.
See here
The next problem is I have tried several different ways to replace blank values, but I was not successful. All of the blank fields in this file are in the last column. When I print out the csv in full, this is what the line looks like (I've highlighted the empty value):
See here
Finally, I have no idea what instruction #3 means. I am completely new at this with zero Python knowledge! I think I am unsure of the Python syntax and rules - which I will look into more and learn, however I only had two days to complete this assignment and I do not know anything yet! Thank you in advance.
What you did with genfromtxt seems correct already. With big data like this, the terminal only shows some records from the beginning and the end; the three dots in the middle indicate the records you're not seeing.
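For the blank-to-NaN part, note that genfromtxt already fills empty fields with nan for float data, so the whole assignment can be sketched as below. The filenames are placeholders, and the tiny sample file is written inline only to make the sketch runnable:

```python
import numpy as np

# Write a tiny sample CSV with blank values (stand-in for quality.csv).
with open('quality.csv', 'w') as f:
    f.write('a,b,c\n1,,3\n4,5,\n')

# genfromtxt turns empty fields into nan by default for float data.
data = np.genfromtxt('quality.csv', delimiter=',', skip_header=1)

# Write the result set into a different file (instruction #3).
np.savetxt('quality_nan.csv', data, delimiter=',', fmt='%g',
           header='a,b,c', comments='')
```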

Error with using length function; output will not be anything other than one

I have multiple csv files I've uploaded into both Google Colab and Jupyter Notebook. I can successfully print certain lines of my file. The file contains rows of strings. When I open the file it opens in the Numbers application on my MacBook. Anyway, for some reason, whenever I try to print the length of ANY line in my file, Python ALWAYS tells me the length is 1. All of the strings have way more than a length of 1. I thought "maybe it's the file itself?" Nope, I've used multiple csv files, still 1. It's not the IDE either; I've used Jupyter and Google Colab. I can print the lengths of words like 'HELLO', but nothing that's actually in my file. I'm assuming something is wrong with my code, even though I've tried multiple versions. Please let me know what's going on; this is a simple command, yet for some reason it is not working.
import csv

with open('/Users/xxx/Desktop/Silkscreen/fonts/ughuuh.csv', newline='') as f:
    reader = csv.reader(f)
    data = list(reader)
    print((len(data[1]))
>>output: 1
change to this:
print(len(data[1]))
I am making some assumptions about "data", but it looks like you are getting a list of lists: each row is a one-element list containing the whole line as a single string, which is why len(data[1]) is 1. To get the length of the string itself, index into the row first: len(data[1][0]).
so to use random:
import random
data = [['(O)[C##H](O)[C#H](O2)OC OC[C##H](O)C(O)[C#H](O)C'],['(O)[C##H](O)[C#H][C#H](O)C']]
rand = random.choice(data)
print(len(rand[0]))
The len function takes an object as an argument.
print(len(data[1]))
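A minimal demonstration of why the length comes out as 1, using an in-memory file for illustration:

```python
import csv
import io

# Each line has no commas, so every row the reader yields is a
# one-element list containing the whole line as a single string.
f = io.StringIO('HELLO\nWORLD\n')
data = list(csv.reader(f))

print(len(data[1]))     # 1 -> data[1] is ['WORLD'], a one-element list
print(len(data[1][0]))  # 5 -> length of the string inside the row
```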

Storing DataFrame output utilizing Pandas to either csv or MySql DB

question regarding pandas:
Say I created a dataframe and generated output under separate variables, rather than printing them, how would I go about combining them back into another dataframe correctly to either send as a CSV and then upload to a DB or directly upload to a DB?
Everything works fine code-wise; I just haven't seen, or don't know, the best practice for doing this. I know we can store things in lists, dicts, etc.
What I did was:
#imported all modules
object = df.iloc[0,0]
#For loop magic goes here
#nested for loop
#if conditions are met, do this
result = df.iloc[i, k+1]
print(object, result)
I've also stored them into a separate DataFrame trying:
df2 = pd.DataFrame({'object': object, 'result' : result}, index=[0])
df2.to_csv('output.csv', index=False, mode='a')
The only problem with that is that it appends everything to each row, most likely due to the append and perhaps not including it in the for loop. Which is odd, because the raw output is EXACTLY how I'm trying to get it into a csv or into a DB.
As I was saying, though, I'm looking to combine both values back into a dataframe for speed. I tried concat etc., but no luck, so I was wondering what the correct format would be? Thanks
So it turns out that after more research and revising, I solved my issue
Referenced this and personal revisions, this is a basis of what I did:
Empty space in between rows after using writer in python
import csv

# Had to wrap this in a for loop that is not listed, appending to the
# file after clearing it first; newline='' removes the blank line
# between rows
with open('csvexample.csv', 'w+', newline='') as file:
    writer = csv.writer(file)
    writer.writerow([object, result])
Additional supporting material:
Confused by python file mode "w+"
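As a general pattern (not from the answer above), one way to combine the per-iteration values back into a DataFrame is to collect them in a list inside the loop and build the frame once at the end; the loop below is a placeholder standing in for the question's nested loops:

```python
import pandas as pd

# Placeholder loop standing in for the question's nested for loops.
rows = []
for obj, result in [('a', 1), ('b', 2), ('c', 3)]:
    rows.append({'object': obj, 'result': result})

# Build one DataFrame from all collected rows and write it once,
# instead of appending a one-row frame to the CSV on every iteration.
df2 = pd.DataFrame(rows)
df2.to_csv('output.csv', index=False)
```

From there, df2.to_sql(...) with an SQLAlchemy engine uploads the same frame to a database in one call.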

Using Python v3.5 to load a tab-delimited file, omit some rows, and output max and min floating numbers in a specific column to a new file

I've tried for several hours to research this, but every possible solution hasn't suited my particular needs.
I have written the following in Python (v3.5) to download a tab-delimited .txt file.
#!/usr/bin/env /Library/Frameworks/Python.framework/Versions/3.5/bin/python3.5
import urllib.request
import time
timestr = time.strftime("%Y-%m-%d %H-%M-%S")
filename="/data examples/"+ "ace-magnetometer-" + timestr + '.txt'
urllib.request.urlretrieve('http://services.swpc.noaa.gov/text/ace-magnetometer.txt', filename=filename)
This downloads the file from here and renames it based on the current time. It works perfectly.
I am hoping that I can then use the "filename" variable to then load the file and do some things to it (rather than having to write out the full file path and file name, because my ultimate goal is to do the following to several hundred different files, so using a variable will be easier in the long run).
This using-the-variable idea seems to work, because adding the following to the above prints the contents of the file to STDOUT... (so it's able to find the file without any issues):
import csv
with open(filename, 'r') as f:
reader = csv.reader(f, dialect='excel', delimiter='\t')
for row in reader:
print(row)
As you can see from the file, the first 18 lines are informational.
Line 19 provides the actual column names. Then there is a line of dashes.
The actual data I'm interested in starts on line 21.
I want to find the minimum and maximum numbers in the "Bt" column (third column from the right). One of the possible solutions I found would only work with integers, and this dataset has floating-point numbers.
Another possible solution involved importing the pyexcel module, but I can't seem to install that correctly...
import pyexcel as pe
data = pe.load(filename, name_columns_by_row=19)
min(data.column["Bt"])
I'd like to be able to print the minimum Bt and maximum Bt values into two separate files called minBt.txt and maxBt.txt.
I would appreciate any pointers anyone may have, please.
This is meant to be a comment on your latest question to Apoc, but I'm new, so I'm not allowed to comment. One thing that might create problems is that bz_values (and bt_values, for that matter) might be a list of strings (at least it was when I tried to run Apoc's script on the example file you linked to). You could solve this by substituting this:
min_bz = min([float(x) for x in bz_values])
max_bz = max([float(x) for x in bz_values])
for this:
min_bz = min(bz_values)
max_bz = max(bz_values)
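A quick illustration of the difference (lexicographic vs. numeric comparison), with made-up sample values:

```python
# Strings compare character by character, not numerically, so min/max
# over string values gives the wrong answer for numbers.
bz_values = ['10.5', '-999.9', '2.3']

print(max(bz_values))                    # '2.3' (lexicographic!)
print(max(float(x) for x in bz_values))  # 10.5
```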
The following will work as long as all the files are formatted in the same way, i.e. data starting 21 lines in, the same number of columns, and so on. Also, the file that you linked did not appear to be tab-delimited, so I've simply used the string split method on each row instead of the csv reader. The column is read from the file into a list, and that list is used to calculate the maximum and minimum values:
from itertools import islice

# Line that data starts from, zero-indexed.
START_LINE = 20
# The column containing the data in question, zero-indexed.
DATA_COL = 10
# The value present when a measurement failed.
FAILED_MEASUREMENT = '-999.9'

with open('data.txt', 'r') as f:
    bt_values = []
    for val in (row.split()[DATA_COL] for row in islice(f, START_LINE, None)):
        if val != FAILED_MEASUREMENT:
            bt_values.append(float(val))

min_bt = min(bt_values)
max_bt = max(bt_values)

with open('minBt.txt', 'a') as minFile:
    print(min_bt, file=minFile)
with open('maxBt.txt', 'a') as maxFile:
    print(max_bt, file=maxFile)
I have assumed that since you are doing this to multiple files you are looking to accumulate multiple max and min values in the maxBt.txt and minBt.txt files, and hence I've opened them in 'append' mode. If this is not the case, please swap out the 'a' argument for 'w', which will overwrite the file contents each time.
Edit: Updated to include workaround for failed measurements, as discussed in comments.
Edit 2: Updated to fix problem with negative numbers, also noted by Derek in separate answer.

loop for two files only prints first line python

The first file, f1 has two columns, the first being an ID number and the second being a value associated with it.
The second file, f2 is a bigger version of the first file with more values and six columns but includes the two from the first file.
The second file has a column I want to associate with the values in the first file, and I want the output to be a new text file that contains the ID, the value associated with it, and another column with the ones I want to associate from the bigger second file.
So far I've made a script which does what I want, however it only prints out the first line.
I'm not fantastic at Python, which is probably noticeable in my code, and I was hoping someone would have the answer to my problem.
import csv

with open('output1.txt','w') as out1, open('list1.csv') as f1, open('list2.csv') as f2:
    csvf1 = csv.reader(f1)
    csvf2 = csv.reader(f2)
    for txt1 in csvf1:
        id1 = txt1[0]
        z1 = txt1[1]
        for txt2 in csvf2:
            id2 = txt2[0]
            z2 = txt2[3]
            ra = txt2[1]
            if id1 == id2:
                out1.write("{} {} {}\n".format(id2, z1, ra))

out1.close()
f1.close()
f2.close()
I would also like to point out that using .split(',') does not work on my files for some reason just in case someone tries to use it in an answer.
Move the line csvf2=csv.reader(f2) inside the first loop, and rewind the file with f2.seek(0) each time. As written, the inner loop gets executed only for the first line of f1; for the second line it does nothing, because the file reader's position is already at the end of the file.
import csv

with open('output1.txt','w') as out1, open('list1.csv') as f1, open('list2.csv') as f2:
    csvf1 = csv.reader(f1)
    for txt1 in csvf1:
        id1 = txt1[0]
        z1 = txt1[1]
        f2.seek(0)  # rewind before re-reading the second file
        csvf2 = csv.reader(f2)
        for txt2 in csvf2:
            id2 = txt2[0]
            z2 = txt2[3]
            ra = txt2[1]
            if id1 == id2:
                out1.write("{} {} {}\n".format(id2, z1, ra))
csv.reader() returns an iterator over the underlying file object, so you can only iterate over its result once; once the file position reaches the end of the file, iterating the reader again yields nothing more.
So you may need to save every row in a temporary list for further usage:
list1 = []
with open('list1.txt') as fp:
    for row in csv.reader(fp):
        list1.append(row)
By the way, you won't need to close fp explicitly when you open it with a with statement; it is closed automatically when you leave the with block.
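A small demonstration of the one-pass behaviour, using an in-memory file for illustration:

```python
import csv
import io

f = io.StringIO('a,1\nb,2\n')
reader = csv.reader(f)

print(len(list(reader)))  # 2 -> first pass reads both rows
print(len(list(reader)))  # 0 -> the underlying file is exhausted

f.seek(0)                 # rewinding the file makes the rows readable again
print(len(list(reader)))  # 2
```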
I managed to find my answer from another programmer and this was the code that ended up working.
Thank you so much for your answers for they were close to what worked.
import csv

with open('output1.txt','w') as out1, open('file1.csv') as f1:
    csvf1 = csv.reader(f1)
    for txt1 in csvf1:
        id1 = txt1[0]
        z1 = txt1[1]
        with open('file2.csv') as f2:
            csvf2 = csv.reader(f2)
            for txt2 in csvf2:
                id2 = txt2[0]
                z2 = txt2[3]
                ra = txt2[1]
                if id1 == id2:
                    out1.write("{} {} {}\n".format(id2, z1, ra))