How to do math operations on imported CSV data? - Python

I have read in a csv file ('Air.csv') and have performed some operations to get rid of the header (not important). Then I used dB_a.append(row[1]) to put this column of the csv data into a list which I could later plot.
This data is dB data, and I want to convert it to power using the simple equation P = 10^(dB/10) for every value. I am new to Python, so I don't quite understand how operations within arrays, lists, etc. work. I think there is something I need to do to iterate over the full data set, which was my attempt at a for loop, but I am still receiving errors. Any suggestions?
Thank you!
import csv
import itertools

frequency_a = []
dB_a = []
a = csv.reader(open('Air.csv'))
for row in itertools.islice(a, 18, 219):
    frequency_a.append(row[0])
    dB_a.append(row[1])
#print(frequency_a)
print(dB_a)
for item in dB_a:
    power_a = 10**(dB_a/10)
    print(power_a)

In your for loop, item is the loop variable, so you need to use that. So instead of:
power_a = 10**(dB_a/10)
use:
power_a = 10**(item/10)
A nicer way to create a new list with that data could be:
power_a = [10**(db/10) for db in dB_a]
EDIT: The other issue, as pointed out in the comment, is that the values are strings. The .csv file is essentially a text file, so it holds a collection of strings rather than numbers. What you can do is convert them to numeric values using int(db) or float(db), depending on whether you have whole or floating-point numbers.
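Putting both fixes together, a minimal sketch (assuming the dB values are floating-point):
power_a = [10**(float(db)/10) for db in dB_a]
print(power_a)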
EDIT2: As pointed out by @J. Meijers, I was using multiplication instead of exponentiation - this has been fixed in the answer.

To build on the answer @Ed Jaras posted.
power_a = [10*(db/10) for db in dB_a]
is not correct, since this divides by 10, and then multiplies by the same.
It should be:
power_a = [10**(db/10) for db in dB_a]
Credits still go to @Ed Jaras though
Note:
If you're wondering what this [something for something in a list] is, it is a list comprehension. They are amazingly elegant constructs that python allows.
What it basically means is [..add this element to the result.. for ..my element.. in ..a list..].
You can even add conditionals to them if you want.
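For example, a made-up illustration that skips empty fields while converting:
power_a = [10**(float(db)/10) for db in dB_a if db]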
If you want to read more about them, I suggest checking out:
http://www.secnetix.de/olli/Python/list_comprehensions.hawk
Addition:
@k-schneider: You are probably doing numerical operations (dividing, power, etc.) on a string; this is because when importing a csv, it is possible for fields to be imported as strings.
To make sure that you are working with numbers, you can cast db to an integer or a float by doing:
int(db)  # or float(db) for floating-point values

Related

(Python 3.x) Splitting arrays and saving them into new arrays

I'm writing a Python script intended to split a big array of numbers into equal sub-arrays. For that purpose, I use Numpy's split method as follows:
test=numpy.array_split(raw,nslices)
where raw is the complete array containing all the values, which are float64-type by the way.
nslices is the number of sub-arrays I want to create from the raw array.
In the script, nslices may vary depending on the size of the raw array, so I would like to "automatically" save each created sub-array in its own variable, such as resultsarray(i), in a similar way to what can be done in MATLAB/Octave.
I tried to use a for ... in range loop in Python, but I am only able to save the last sub-array in a variable.
What is the correct way to save the sub-array for each incrementation from 1 to nslices?
Here is the complete code as it is now (I am a Python beginner, so please pardon the low level of the script).
import numpy as np
file = open("results.txt", "r")
raw = np.loadtxt(fname=file, delimiter="/n", dtype='float64')
nslices = 3
rawslice = np.array_split(raw, nslices)
for i in range(0, len(rawslice)):
    resultsarray = (rawslice[i])
    print(rawslice[i])
Thank you very much for your help solving this problem!
First - you screwed up the delimiter :)
It should be a backslash + n (\n) instead of /n.
Second - as Serge already mentioned in a comment, you can just access the split parts by index (resultsarray[0] to [2]). But if you really want to assign each part to a separate variable you can do it in the following way:
result_1_of_3, result_2_of_3, result_3_of_3 = rawslice
print(result_1_of_3, result_2_of_3, result_3_of_3)
But probably it isn't the way you should go.
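Note that rawslice is already a list of the sub-arrays, so rawslice[i] plays the role of resultsarray(i). If you do want an explicit MATLAB-like container, a sketch using a dictionary keyed by slice index:
resultsarray = {}
for i, part in enumerate(rawslice):
    resultsarray[i] = part  # resultsarray[0], resultsarray[1], ...
print(resultsarray[0])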

Accessing Data from Within a List or Tuple and Cleaning it

Here is a conceptual problem that I have been having regarding the cleaning of data and how to interact with lists and tuples that I'm not sure completely how to explain, but if I can get a fix for it, I can conceptually be better at using python.
Here it is: (using python 3 and sqlite3)
I have an SQLite database with a date column which has text in it in the format MM-DD-YY 24:00. When viewed in DB Browser the text looks fine. However, when using fetchall() in Python, the code prints the dates out in the format 'MM-DD-YY\xa0'. I want to clean the \xa0 out of the data, and I tried some code that is a combination of what I think I should do plus another post I read on here. This is the code:
print(dates)
output: [('MM-DD-YY\xa0',), ('MM-DD-YY\xa0',), ...] (I just typed this in here to show you the output)
dates_clean = []
for i in dates:
    clean = str(i).replace(u'\xa0', u' ')
    dates_clean.append(clean)
Now when I print dates_clean I get:
["('MM-DD-YY\xa0',)", "('MM-DD-YY\xa0',)"etc]
So now, as you can see, when I tried to clean it, it did what I wanted it to do, but the tuple it was originally contained in has become part of the text itself and is contained inside another tuple. Therefore, when I write this list back into SQLite using an UPDATE statement, all of the date values are contained inside a tuple.
It frustrates me because I have been facing issues such as this for a while, where I want to edit something inside of a list or a tuple and have the new value just replace the old one, instead of keeping all of the characters that say it is a tuple and making them just text. Sorry if that is confusing; like I said, it's hard for me to explain. I always end up making my data dirtier when trying to clean it up.
Any insights into how to efficiently clean data inside lists and tuples would be greatly appreciated. I think I am confused about the difference between accessing the tuple and accessing what is inside the tuple. It would also be helpful if you could suggest the name of the conceptual problem I'm dealing with so I can do more research on my own.
Thanks!
You are garbling the output by calling str() on the tuple, either implicitly when printing the whole array at once, or explicitly when trying to “clean” it.
See (python3):
>>> print("MM-DD-YY\xa024:00")
MM-DD-YY 24:00
but:
>>> print(("MM-DD-YY\xa024:00",))
('MM-DD-YY\xa024:00',)
This is because tuple.__str__ calls repr on the content, escaping the non-ascii characters in the process.
However if you print the tuple elements as separate arguments, the result will be correct. So you want to replace the printing with something like:
for row in dates:
    print(*row)
The * expands the tuple to separate parameters. Since these are strings, they will be printed as is:
>>> row = ("MM-DD-YY\xa023:00", "MM-DD-YY\xa024:00")
>>> print(*row)
MM-DD-YY 23:00 MM-DD-YY 24:00
You can add a separator if you wish:
>>> print(*row, sep=', ')
MM-DD-YY 23:00, MM-DD-YY 24:00
... or you can format it:
>>> print('from {0} to {1}'.format(*row))
from MM-DD-YY 23:00 to MM-DD-YY 24:00
Here I am using the * again to expand the tuple to separate arguments and then simply {0} for zeroth member, {1} for first, {2} for second etc. (you can also use {} for next if you don't need to change the order, but giving the indices is clearer).
Ok, so now if you actually need to get rid of the non-breaking space anyway, replace is the right tool. You just need to apply it to each element of the tuple. There are two ways:
Explicit destructuring; applicable when the number of elements is fixed (it should be, since it is a row from a known query):
Given:
>>> row = ('foo', 2, 5.5)
you can destructure it and construct a new tuple:
>>> (a, b, c) = row
>>> (a.replace('o', '0'), b + 1, c * 2)
('f00', 3, 11.0)
this lets you do different transformation on each column.
Mapping; applicable when you want to do the same transformation on all elements:
Given:
>>> row = ('foo', 'boo', 'zoo')
you just wrap a generator comprehension in a tuple constructor:
>>> tuple(x.replace('o', '0') for x in row)
('f00', 'b00', 'z00')
On a side-note, SQLite has some date and time functions and they expect the timestamps to be in strict ISO 8601 format, i.e. %Y-%m-%dT%H:%M:%S (optionally with %z at the end; using strftime format; in TR#35 format it is YYYY-MM-ddTHH-mm-ss(xx)).
In your case, dates is actually a list of tuples, with each tuple containing one string element. The , at the end of the date string is how you identify a single element tuple.
The for loop you have needs to work on the elements within the tuples, instead of on the tuples themselves. Something along the lines of:
for i in dates:
    date = i[0]
    clean = str(date).replace('\xa0', '')
    dates_clean.append(clean)
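A more compact variant, assuming each tuple holds exactly one string, is a list comprehension:
dates_clean = [row[0].replace('\xa0', '') for row in dates]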
I am not sure this is the best solution to your actual problem of manipulating data in the db, but it should answer your question.
EDIT: Also, refer to Jan's reply about unicode strings and Python 2 vs Python 3.

Parsing a string with defined data dictionary into a list efficiently

I have a large flat file which I need to parse using a list which contains the variable name, the starting point, and the length of the variable along with the type. e.g.
columns = [['LOAD_CYCLE', 131, 6, 'int'],
['OPERATOR', 59, 8, 'Char (8)'],
['APP_DATE', 131, 8, 'Date'],
['UNIQUE_KEY', 245, 25, 'Char (25)']]
This list contains 1,600 items. The only really important columns are the starting point and the length of the variable. These are used to split each line in the flat file into a list of variables, which is used to create a new file to be inserted into a database. The data type is important, but I can always do that section later.
Currently, my method is to read the file in chunks (it is a very large file; over 6GB), and then process the chunk piece by piece:
line = data_file.read(chunk*1000)
for x in range(1000):
    offset = chunk*x
    for item in columns:
        piece = line[item[1]+offset:item[1]+item[2]+offset].replace('\n','')
        #Depending on the data type, a piece may undergo one or two checks before being
        #added to a list which is then written to an output file
The time-consuming part is iterating through the columns. Is this the only way to do it? Or is there perhaps a more efficient way to split the string? Something involving maps?
This seems like a great case for the struct module. Assuming you're using CPython, this effectively moves the loop over the columns into C.
First, you need to build up the format string.
Since your columns appear to be specified in arbitrary order, rather than ordered by starting point, and may have gaps between them, this isn't quite trivial… but it shouldn't be too hard. Something like this:
import operator
import struct

sorted_columns = sorted(columns, key=operator.itemgetter(1))
formats = []
offset = 0
for name, start, length, vtype in sorted_columns:
    # add padding bytes for any gap before this column
    if start > offset:
        formats.append('{}x'.format(start - offset))
    formats.append('{}s'.format(length))
    offset = start + length
record_format = struct.Struct('=' + ''.join(formats))
Then:
offset = chunk*x
values = record_format.unpack_from(line, offset)
And now you have a tuple of 1600 items.
Of course to do anything with that tuple, you may have to iterate over it anyway. But maybe you can do that in C as well. For example, if you're just inserting the values into a SQL database, then creating a giant SQL statement with 1600 parameters (in the same order as in sorted_columns) and passing the tuple as the arguments may take care of that for you:
cursor.execute(giant_insert_sql, values)
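For concreteness, a sketch of how that statement might be built (assuming SQLite-style ? placeholders and a hypothetical table name records):
placeholders = ', '.join(['?'] * len(sorted_columns))
giant_insert_sql = 'INSERT INTO records VALUES ({})'.format(placeholders)
cursor.execute(giant_insert_sql, values)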
But if you need to do something more complicated to each value, then you'll need to do something like one of the following:
Use NumPy and/or Pandas to vectorize the loop. (Note that they can also be used to just load the whole file into memory, vectorizing the outer loop as well as the inner one, if you've got the RAM… but that shouldn't be nearly as much of a performance gain.)
Run your existing code in PyPy instead of CPython.
Use Numba to JIT the code within CPython.
Write a C extension to replace your inner loop—which, if you're lucky, may be as simple as just moving your Python code to a function in a separate file and compiling it with Cython.

How do I make a dynamically expanding array in python

Ok I have this part of code:
def Reading_Old_File(self, Path, turn_index, SKU):
    print "Reading Old File! Turn Index = ", turn_index, "SKU= ", SKU
    lenght_of_array = 0
    array_with_data = []
    if turn_index == 1:
        reading_old_file = open(Path, 'rU')
        data = np.genfromtxt(reading_old_file, delimiter="''", dtype=None)
        for index, line_in_data in enumerate(data, start=0):
            if index < 3:
                print index, "Not Yet"
            if index >= 3:
                print ">>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Reading All Old Items"
                i = index - 3
                old_items_data[i] = line_in_data.split("\t")
                old_items_data[i] = [lines_old.strip() for lines_old in old_items_data]
                print old_items_data[i]
    print len(old_items_data)
So what I am doing here is: I'm reading a file, and on the first turn I want to read it all and keep all the data, so it would be something like:
old_items_data[1]=['123','dog','123','dog','123','dog']
old_items_data[2]=['124','cat','124','cat','124','cat']
old_items_data[n]=['amount of list members is equal each time']
Each line of the file should be stored in a list, so I can use it in the future for comparing; when turn_index is greater than 2 I'll compare each incoming line with the lines in every list (array) by iterating over all lists.
So the question is: how do I do it, or is there any better way to compare lists?
I'm new to Python, so maybe someone could help me with this issue?
Thanks
You just need to use append.
old_items_data.append(line_in_data.split("\t"))
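A sketch of how that fits into the loop above, keeping the question's variable names and stripping each field (which is presumably what the per-line strip was meant to do):
old_items_data = []
for index, line_in_data in enumerate(data):
    if index >= 3:
        old_items_data.append([field.strip() for field in line_in_data.split("\t")])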
I would use the package pandas for this. It will not only be much quicker, but also simpler. Use pandas.read_table to import the data (specifying delimiter and row-skipping can be done here by passing arguments to sep and skiprows). Then, use pandas.DataFrame.apply to apply your function to the rows of your data.
The speed gains are going to come from the fact that pandas was optimized to perform actions across lists like this (in the case of a pandas DataFrame, these would be called rows). This applies to both importing the data and applying a function to every row. The simplicity gains should hopefully be clear.
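A minimal sketch of that approach; the filename, separator, and number of skipped rows are assumptions for illustration:
import pandas as pd

# Read a tab-delimited file, skipping 3 header rows (adjust to your file).
data = pd.read_table('old_file.txt', sep='\t', skiprows=3, header=None)

# Apply a function to every row (axis=1); here each field is stripped
# of surrounding whitespace as a stand-in for your comparison logic.
cleaned = data.apply(lambda row: [str(x).strip() for x in row], axis=1)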

Python - How can I change part of a row in a CSV file?

I have a CSV file with some words in it, each followed by a number, and I need a way to amend the number: either adding 1 to it, or setting it back to 1.
Say for instance I have these words:
variant,1
sixty,2
game,3
library,1
If the user inputs the word sixty, how could I use that to add one onto its number, and how would I reset it back to 1?
I've been all over Google and Stack Overflow trying to find an answer, but I expect my not being able to find one is due more to my inexperience than anything.
Thanks.
This is a quick throw-together using fileinput. Since I am unaware of the conditions under which you would decrease or reset your value, I added them as keyword args you can pass at will, such as:
updateFileName(filename, "sixty", reset=True)
updateFileName(filename, "sixty", decrease=True)
updateFileName(filename, "sixty")
The results of each should be self-explanatory. Good luck! I wrapped it in a try/finally as I had no clue what your data looked like; a line that doesn't match the expected format will still fail, but the file will be closed properly. If you have spaces you will need to .strip() the key and value.
import fileinput
import sys

def updateFileName(filename, input_value, decrease=False, reset=False):
    try:
        for line in fileinput.input(filename, inplace=True):
            key, value = line.split(",")
            if key == input_value:
                if decrease:
                    sys.stdout.write("%s,%s\n" % (key, int(value) - 1))
                elif reset:
                    sys.stdout.write("%s,%s\n" % (key, 1))
                else:
                    sys.stdout.write("%s,%s\n" % (key, int(value) + 1))
                continue
            sys.stdout.write(line)
    finally:
        fileinput.close()
Without knowing when you want to switch a number to 1 and when you want to add 1, I can't give a full answer, but I can set you on the right track.
First you want to import the csv file, so you can change it around.
The csv module is very helpful in this. Read about it here: http://docs.python.org/2/library/csv.html
You will likely want to read it into a dictionary structure, because this will link each word to its corresponding number. Something like this: make dictionary from csv file columns
Then you'll want to use raw_input (or input if you are using Python 3.0)
to get the word you are looking for and use that as the key to find and change the number you want to change. http://anh.cs.luc.edu/python/hands-on/handsonHtml/handson.html#x1-490001.12
or http://www.sthurlow.com/python/lesson06/
will show you how to get one part of a dictionary by giving it the other part and how to save info back into a dictionary.
Sorry it's not a direct answer. Maybe someone else will write one up, but it should get you started.
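For illustration, a minimal sketch of that dictionary approach (Python 3; the filename words.csv is an assumption):
import csv

# Load word,count pairs into a dictionary.
with open('words.csv', newline='') as f:
    counts = {word: int(n) for word, n in csv.reader(f)}

word = input('Enter a word: ')  # raw_input() on Python 2
if word in counts:
    counts[word] += 1  # or counts[word] = 1 to reset it

# Write the updated pairs back out.
with open('words.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    for word, n in counts.items():
        writer.writerow([word, n])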
This is very probably not the best way to do it, but you can load your CSV file into an array object using numpy's loadtxt().
The following code gives you a 2-dimensional array with the names in the first column and your numbers in the second one.
import numpy as np
a = np.loadtxt('YourFile', delimiter=',', dtype=str)  # dtype=str, since the first column holds words
Perform your changes on the numbers the way you want and use numpy's savetxt() to save your file.
If your file is very heavy, this solution is going to be a pain, as loading a huge array takes a lot of memory. So consider it just a workaround; the dictionary solution is actually better (I think).
