How do I make a dynamically expanding array in python - python

Ok I have this part of code:
def Reading_Old_File(self, Path, turn_index, SKU):
    print "Reading Old File! Turn Index = ", turn_index, "SKU= ", SKU
    lenght_of_array=0
    array_with_data=[]
    if turn_index==1:
        reading_old_file = open(Path,'rU')
        data=np.genfromtxt(reading_old_file, delimiter="''", dtype=None)
        for index, line_in_data in enumerate(data, start=0):
            if index<3:
                print index, "Not Yet"
            if index>=3:
                print ">>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Reading All Old Items"
                i=index-3
                old_items_data[i]=line_in_data.split("\t")
                old_items_data[i]=[lines_old.strip()for lines_old in old_items_data]
                print old_items_data[i]
        print len(old_items_data)
What I am doing here is reading a file. On the first turn I want to read all of it and keep all the data, so it would look something like this:
old_items_data[1]=['123','dog','123','dog','123','dog']
old_items_data[2]=['124','cat','124','cat','124','cat']
old_items_data[n]=['amount of list members is equal each time']
Each line of the file should be stored in its own list so I can use it later for comparison: when turn_index is greater than 2, I'll compare each incoming line against the lines stored in every list (array) by iterating over all of them.
So the question is: how do I do this, and is there a better way to compare lists?
I'm new to Python, so maybe someone could help me with this issue?
Thanks

You just need to use append. A Python list grows as needed, so there is no need to assign to an index that doesn't exist yet.
old_items_data.append(line_in_data.split("\t"))
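A minimal sketch of that idea in Python 3 syntax, assuming the file has three header lines followed by tab-separated rows. The sample lines are made up for illustration and stand in for the real file.

```python
# Hypothetical file contents: three header lines, then tab-separated rows.
lines = [
    "header 1",
    "header 2",
    "header 3",
    "123\tdog \t123\tdog",
    "124\tcat\t124\t cat",
]

old_items_data = []  # start empty; the list grows with each append
for index, line in enumerate(lines):
    if index < 3:
        continue  # skip the header lines, as the question does
    # Split on tabs and strip stray whitespace from every field.
    old_items_data.append([field.strip() for field in line.split("\t")])

print(len(old_items_data))
print(old_items_data[0])
```

Each element of old_items_data is then one list per line, which can be compared against a new line with a plain loop or with `new_row in old_items_data`.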

I would use the pandas package for this. It will be not only much quicker but also simpler. Use pandas.read_table to import the data (the delimiter and the number of rows to skip can be set by passing the sep and skiprows arguments). Then use pandas.DataFrame.apply with axis=1 to apply your function to each row of your data.
The speed gains come from the fact that pandas is optimized for operations across tabular data like this (what you store as lists would be the rows of a pandas DataFrame). This applies to both importing the data and applying a function to every row. The simplicity gains should hopefully be clear.
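A rough sketch of what that could look like. The sample data and the column names (sku, animal, count) are assumptions made for illustration, and read_csv with sep="\t" is used here as the equivalent of read_table.

```python
import io
import pandas as pd

# Hypothetical input: three header lines to skip, then tab-separated rows.
raw = "header 1\nheader 2\nheader 3\n123\tdog\t5\n124\tcat\t7\n"

df = pd.read_csv(
    io.StringIO(raw),      # a real file path would go here instead
    sep="\t",
    skiprows=3,
    header=None,
    names=["sku", "animal", "count"],  # made-up column names
)

# apply with axis=1 calls the function once per row (each row is a Series).
labels = df.apply(lambda row: "%s:%s" % (row["sku"], row["animal"]), axis=1)
print(labels.tolist())
```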

Related

Content info of 'row' in ( E.g for index, row in df.iterrows() ) for pandas

There is this code commonly used in python pandas "for index, row in df.iterrows()".
What is the difference between displaying these during the loop:
print(row)
print(row.index)
print(row.index[index])
print(row[index])
I tried printing them and can't understand what each one does or how it selects its content, and I can't find a well-explained source online.
For one, it's more concise.
Secondly, you're only supposed to use it for displaying data rather than modifying. According to the docs you may get unpredictable results (a concurrent modification thing, methinks).
As to how it selects it, the docs also say it just returns the individual rows as pd.Series, with the index being the id pandas uses to keep track of each row in the pd.DataFrame. I'd guess it's akin to using Python's zip() function on a list of ints [0..n] and a list of pd.Series.
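A small sketch that makes the pieces visible, using a made-up two-row DataFrame with the default integer index:

```python
import pandas as pd

# Hypothetical data, just to have something to iterate over.
df = pd.DataFrame({"name": ["dog", "cat"], "count": [123, 124]})

seen = []
for index, row in df.iterrows():
    # index: the row's label in df (0, 1, ... with the default index)
    # row: a Series holding that row's values, labelled by column name
    # row.index: the column labels; row.index[0] is the first column's name
    seen.append((index, list(row.index), row.index[0], row["name"]))
    print(index, list(row.index), row["name"])
```

So print(row) shows one row's values labelled by column name, and print(row.index) shows the column names. row.index[index] picks a column name by position and row[index] picks a value from the row, so both only make sense when the row label happens to be a valid position or label inside the row. For positional access, row.iloc[...] is the explicit form.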

How to do math operation on imported csv data?

I have read in a csv file ('Air.csv') and have performed some operations to get rid of the header (not important). Then, I used dB_a.append(row[1]) to put this column of the csv data into an array which I could later plot.
This data is dB data, and I want to convert it to power using the simple equation P = 10^(dB/10) for every value. I am new to Python, so I don't quite understand how operations on arrays, lists, etc. work. I think I need to iterate over the full data set, which was my attempt at a for loop, but I am still receiving errors. Any suggestions?
Thank you!
frequency_a=[]
dB_a=[]
a = csv.reader(open('Air.csv'))
for row in itertools.islice(a, 18, 219):
    frequency_a.append(row[0])
    dB_a.append(row[1])
#print(frequency_a)
print(dB_a)
for item in dB_a:
    power_a = 10**(dB_a/10)
    print(power_a)
In your for loop, item is the loop variable that holds the current element, so you need to use that. So instead of:
power_a = 10**(dB_a/10)
use:
power_a = 10**(item/10)
A nicer way to create a new list with that data could be:
power_a = [10**(db/10) for db in dB_a]
EDIT: The other issue, as pointed out in the comment, is that the values are strings. The .csv file is essentially a text file, so it is a collection of strings rather than numbers. What you can do is convert them to numeric values using int(db) or float(db), depending on whether you have whole or floating point numbers.
EDIT2: As pointed out by @J. Meijers, I was using multiplication instead of exponentiation - this has been fixed in the answer.
To build on the answer @Ed Jaras posted.
power_a = [10*(db/10) for db in dB_a]
is not correct, since this divides by 10, and then multiplies by the same.
It should be:
power_a = [10**(db/10) for db in dB_a]
Credits still go to @Ed Jaras though
Note:
If you're wondering what this [something for something in a list] is, it is a list comprehension. They are amazingly elegant constructs that Python allows.
What it basically means is [..Add this element to the result.. for ..my element.. in ..a list..].
You can even add conditionals to them if you want.
If you want to read more about them, I suggest checking out:
http://www.secnetix.de/olli/Python/list_comprehensions.hawk
Addition:
@k-schneider: You are probably doing numerical operations (division, exponentiation, etc.) on a string. This is because csv.reader returns every field as a string.
To make sure that you are working with numbers, you can convert db to a number by doing:
float(db)
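Putting the two fixes together (convert each string to a number, then exponentiate), a minimal sketch could look like this. The sample values are made up and stand in for the column read from Air.csv.

```python
# Hypothetical dB readings as the csv module would return them: strings.
dB_a = ["-30", "-20.0", "0", "10"]

# Convert each string to float, then apply P = 10^(dB/10).
power_a = [10 ** (float(db) / 10) for db in dB_a]
print(power_a)
```

float() is used rather than int() so that values with a decimal point are accepted as well.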

Excel worksheet to Numpy array

I'm trying to do an unbelievably simple thing: load parts of an Excel worksheet into a Numpy array. I've found a kludge that works, but it is embarrassingly unpythonic:
say my worksheet was loaded as "ws", the code:
A = np.zeros((37,3))
for i in range(2,39):
    for j in range(1,4):
        A[i-2,j-1]= ws.cell(row = i, column = j).value
loads the contents of "ws" into array A.
There MUST be a more elegant way to do this. For instance, csvread allows to do this much more naturally, and while I could well convert the .xlsx file into a csv one, the whole purpose of working with openpyxl was to avoid that conversion. So there we are, Collective Wisdom of the Mighty Intertubes: what's a more pythonic way to perform this conceptually trivial operation?
Thank you in advance for your answers.
PS: I operate Python 2.7.5 on a Mac via Spyder, and yes, I did read the openpyxl tutorial, which is the only reason I got this far.
You could do
A = np.array([[i.value for i in j] for j in ws['C1':'E38']])
EDIT - further explanation.
(Firstly, thanks for introducing me to openpyxl; I suspect I will use it quite a bit from time to time.)
The method of getting multiple cells from the worksheet object produces a generator. This is probably much more efficient if you want to work your way through a large sheet, as you can start straight away without waiting for it all to load into a list.
To force a generator to make a list you can either use list(ws['C1':'E38']) or a list comprehension as above.
Each row is a tuple (even if only one column wide) of Cell objects. These carry a lot more than just a number, but if you want the number for your array you can use the .value attribute. This is really the crux of your question: csv files don't contain the structured info of an Excel spreadsheet.
There isn't (as far as I can tell) a built-in method for extracting values from a range of cells, so you will have to do something effectively like what you have sketched out.
The advantages of doing it my way are: no need to work out the dimensions of the array and make an empty one to start with, no need to work out the corrected index numbers for the np array, and list comprehensions are faster. The disadvantage is that the "corners" need defining in "A1" format. If the range isn't known, you would have to use iter_rows, rows or columns:
A = np.array([[i.value for i in j[2:5]] for j in ws.rows])
If you don't know how many columns there are, you will have to loop and check values, more like your original idea.
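As a hedged sketch of the iter_rows route: this assumes a reasonably recent openpyxl in which iter_rows accepts values_only=True, and it builds a small in-memory workbook as a stand-in for the real "ws". The cell values and the row and column bounds are made up for illustration.

```python
import numpy as np
from openpyxl import Workbook

# Build a tiny in-memory worksheet in place of the real file.
wb = Workbook()
ws = wb.active
ws.append(["x", "y", "z"])            # row 1: header
for r in range(1, 4):
    ws.append([r, r * 2.0, r * 3.0])  # rows 2-4: numbers

# values_only=True yields plain values instead of Cell objects.
rows = ws.iter_rows(min_row=2, max_row=4, min_col=1, max_col=3, values_only=True)
A = np.array(list(rows), dtype=float)
print(A.shape)
```

The bounds are 1-based and inclusive, so they match the row and column numbers used with ws.cell in the question.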
If you don't need to load data from multiple files in an automated manner, the package tableconvert I recently wrote may help. Just copy and paste the relevant cells from the excel file into a multiline string and use the convert() function.
import numpy as np
from tableconvert.converter import convert
array = convert("""
123 456 3.14159
SOMETEXT 2,71828 0
""")
print(type(array))
print(array)
Output:
<class 'numpy.ndarray'>
[[ 123. 456. 3.14159]
[ nan 2.71828 0. ]]

Pandas dataframe from generator where each line is a tab-separated row

I am trying to pass a generator to the DataFrame constructor: testdf = pd.DataFrame(test). I am unable to specify that each line is tab-delimited. The result is a single-column dataframe where each row holds the entire line of values separated by '\t'.
I've tried a couple of other ways:
pd.read_csv(test)
pandas.io.parsers.read_table(test, sep='\t')
but neither of these works because they do not accept a generator as input.
Not too familiar with generators. Can you throw them into a list comprehension? If so, how about
pd.DataFrame([x.split('\t') for x in test])
One solution that I found would be to use a split function on the one column to break it up:
testdf_parsed = pd.DataFrame(testdf.row.str.split('\t').tolist())
...and that did work for me, but maybe there is a more elegant and simple solution that leverages the core capabilities of Pandas?
You might try implementing a file-like object that wraps your generator, then feeding that to read_table.
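Two of the suggestions above can be sketched as follows. The make_lines generator is hypothetical and stands in for `test` from the question. The second option uses io.StringIO as a simple stand-in for a file-like wrapper, which means the whole generator is read into memory first.

```python
import io
import pandas as pd

def make_lines():
    # Hypothetical generator standing in for `test` from the question.
    yield "a\tb\tc"
    yield "1\t2\t3"
    yield "4\t5\t6"

# Option 1: split each line yourself; every value stays a string.
df_split = pd.DataFrame([line.split("\t") for line in make_lines()])

# Option 2: join the lines into an in-memory text buffer so the pandas
# parser can handle the header row and type inference.
buffer = io.StringIO("\n".join(make_lines()))
df_parsed = pd.read_csv(buffer, sep="\t")

print(df_split.shape, list(df_parsed.columns))
```

Option 2 is closer to the read_table attempts in the question, since the parser then treats the first line as column names and converts the numeric columns.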

Python-How can i change part of a row in a CSV file?

I have a CSV file with some words in it, each followed by a number, and I need a way to update the number: either adding 1 to it or setting it back to 1.
Say for instance I have these words:
variant,1
sixty,2
game,3
library,1
If the user inputs the word sixty, how could I use that to add one to its number, and how would I reset it back to 1?
I've been all over Google+Stackoverflow trying to find an answer, but I expect me not being able to find an answer was due more to my inexperience than anything.
Thanks.
This is a quick throw-together using fileinput. Since I am unaware of the conditions under which you would decrease or reset your value, I added them as keyword arguments you can pass at will, such as:
updateFileName(filename, "sixty", reset=True)
updateFileName(filename, "sixty", decrease=True)
updateFileName(filename, "sixty")
The results of each should be self-explanatory. Good luck! I wrapped it in a try/finally because I had no clue how your file is structured; a malformed line will still raise an error, but the file gets closed either way. If you have spaces you will need to .strip() the key and value.
import fileinput
import sys

def updateFileName(filename, input_value, decrease=False, reset=False):
    try:
        # With inplace=True, whatever is written to stdout replaces the file's contents.
        for line in fileinput.input(filename, inplace=True):
            key, value = line.split(",")
            if key == input_value:
                # int() ignores the trailing newline in value, so it is added back on write.
                if decrease:
                    sys.stdout.write("%s,%s\n" % (key, int(value) - 1))
                elif reset:
                    sys.stdout.write("%s,%s\n" % (key, 1))
                else:
                    sys.stdout.write("%s,%s\n" % (key, int(value) + 1))
                continue
            sys.stdout.write(line)
    finally:
        fileinput.close()
Without knowing when you want to switch a number to 1 and when you want to add 1, I can't give a full answer, but I can set you on the right track.
First you want to import the csv file, so you can change it around.
The csv module is very helpful in this. Read about it here: http://docs.python.org/2/library/csv.html
You will likely want to read it into a dictionary structure, because this will link each word to its corresponding number. Something like the approach in "make dictionary from csv file columns".
Then you'll want to use raw_input (or input if you are using Python 3) to get the word you are looking for, and use that as the key to find and change the number you want to change.
http://anh.cs.luc.edu/python/hands-on/handsonHtml/handson.html#x1-490001.12 or http://www.sthurlow.com/python/lesson06/ will show you how to get one part of a dictionary by giving it the other part and how to save info back into a dictionary.
Sorry it's not a direct answer. Maybe someone else will write one up, but it should get you started.
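A minimal sketch of the dictionary approach, written for Python 3. The function name update_count and the temporary file are made up for illustration, and the whole file is rewritten on each update, which is fine for small files.

```python
import csv
import os
import tempfile

def update_count(path, word, reset=False):
    # Read every "word,number" row into a dict keyed by the word.
    with open(path, newline="") as f:
        counts = {row[0]: int(row[1]) for row in csv.reader(f) if row}
    # Either reset the chosen word to 1 or add 1 to its current value.
    counts[word] = 1 if reset else counts[word] + 1
    # Write the whole table back out.
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(counts.items())
    return counts

# Demo on a temporary file holding the sample data from the question.
path = os.path.join(tempfile.mkdtemp(), "words.csv")
with open(path, "w", newline="") as f:
    f.write("variant,1\nsixty,2\ngame,3\nlibrary,1\n")

after_add = update_count(path, "sixty")
after_reset = update_count(path, "game", reset=True)
print(after_reset)
```

A word that is not in the file raises KeyError here; whether that should instead add a new row is a design choice the question leaves open.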
This is very probably not the best way to do it, but you can load your csv file into an array object using numpy with the help of loadtxt().
The following code gives you a 2-dimensional array with the words in the first column and your numbers in the second one. Because the first column is text, dtype=str is needed; the numbers then come back as strings and must be converted with int() before doing arithmetic.
import numpy as np
a = np.loadtxt('YourFile', delimiter=',', dtype=str)
Perform your changes on the numbers the way you want and use numpy's savetxt() to save your file (with fmt='%s', since the array holds strings).
If your file is very heavy, this solution is going to be a pain, as loading a huge array takes a lot of memory. So consider it just a workaround. The dictionary solution is actually better (I think).
