Output strings to specific rows in Excel with Python

I have a list MyList of integers which I convert to binary strings and write to an Excel table.
The code looks like:
r = 2
for x in MyList:
    binary_out = bin(x)[2:].zfill(9)
    for ind, val in enumerate(binary_out):
        worksheet.write(r, ind + 2, val)
    r += 1
and the output of the binary strings looks as expected (screenshot omitted; the other data in it are generated in an earlier phase).
This is so far OK.
I would like to get the output only on specific rows, not on all of them.
The information about which rows to write to exists only as a list of indices I collected earlier:
indices = [2,4,6,7]
As you can see above, the output in Excel starts from row 2.
So row 2 should now correspond to the first entry in indices.
The desired output would then look like this (screenshot omitted).
How can I modify the code to write only to the wanted rows?

Maybe you can use a nested for loop:
for x in MyList:
    binary_out = bin(x)[2:].zfill(9)
    for row_idx in indices:
        for ind, val in enumerate(binary_out):
            worksheet.write(row_idx, ind + 2, val)
I do not know exactly what library you are using, so I don't know what ind and val are.

You can dynamically select the items at those indices using a generator expression:
# for x in MyList:
for x in (MyList[idx] for idx in indices):
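One way to pair each number with its target row (an assumption about how indices maps onto MyList) is to zip the two lists. In this sketch the worksheet.write call is replaced by collecting (row, col, char) triples, so the mapping itself can be checked without an Excel library:

```python
MyList = [5, 20, 300]   # hypothetical sample data
indices = [2, 4, 6]     # target rows, one per number

writes = []  # stands in for worksheet.write(row, col, val)
for row_idx, x in zip(indices, MyList):
    binary_out = bin(x)[2:].zfill(9)
    for ind, val in enumerate(binary_out):
        writes.append((row_idx, ind + 2, val))
```

Each number now lands only on its own row; rows not listed in indices are never written.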

Related

How to define a variable amount of columns in python pandas apply

I am trying to add columns to a pandas DataFrame using the apply function.
However, the number of columns to be added depends on the output of the function
used inside apply.
Example code:
number_of_columns_to_be_added = 2

def add_columns(number_of_columns_to_be_added):
    df['n1'], df['n2'] = zip(*df['input'].apply(lambda x: do_something(x, number_of_columns_to_be_added)))
Any idea how to define the ugly column part (df['n1'], ..., df['n696969']) before the = zip( ... part programmatically?
I'm guessing that the output of zip is a tuple, so you could try this:
temp = zip(*df['input'].apply(lambda x: do_something(x, number_of_columns_to_be_added)))
for i, value in enumerate(temp, 1):
    key = 'n' + str(i)
    df[key] = value
temp will hold all the entries; you then iterate over temp to assign the values to your DataFrame under the generated keys. Hope this matches your original idea.
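The tuple-unpacking step can be checked without pandas by letting a plain dict stand in for df (do_something here is a hypothetical stand-in that returns n values per input):

```python
def do_something(x, n):
    # hypothetical stand-in: derive n values from one input
    return tuple(x * (i + 1) for i in range(n))

inputs = [1, 2, 3]
number_of_columns_to_be_added = 2

df = {}  # a plain dict standing in for the DataFrame
temp = zip(*[do_something(x, number_of_columns_to_be_added) for x in inputs])
for i, value in enumerate(temp, 1):
    df['n' + str(i)] = list(value)
```

After the loop, df has one key per derived column, however many do_something produces.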

In pure python (no numpy, etc.) how can I find the mean of certain columns of a two dimensional list?

I currently use CSV reader to create a two dimensional list. First, I strip off the header information, so my list is purely data. Sadly, a few columns are text (dates, etc) and some are just for checking against other data. What I'd like to do is take certain columns of this data and obtain the mean. Other columns I just need to ignore. What are the different ways that I can do this? I probably don't care about speed, I'm doing this once after I read the csv and my CSV files are maybe 2000 or so rows and only 30 or so columns.
This assumes that all rows are of equal length; if they're not, you may have to add a few try/except cases:
lst = []  # the rows and columns, assuming the rows contain the columns
column = 2
temp = 0
for row in range(len(lst)):
    temp += lst[row][column]
mean = temp / len(lst)
To test if the element is a number, for most cases, I use
try:
    float(element)  # int may also work depending on your data
except ValueError:
    pass
Hope this helps; I can't test this code, as I'm on my phone.
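Folding that float check into the column loop might look like this (the sample rows are made up; non-numeric entries are simply skipped):

```python
lst = [
    ["2020-01-01", "ok", "10"],
    ["2020-01-02", "ok", "20"],
    ["2020-01-03", "bad", "n/a"],  # non-numeric, skipped
]
column = 2

total, count = 0.0, 0
for row in lst:
    try:
        total += float(row[column])
        count += 1
    except ValueError:
        pass
mean = total / count
```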
Try this:
def avg_columns(list_name, *column_numbers):
    running_sum = 0
    for col in column_numbers:
        for row in range(len(list_name)):
            running_sum += list_name[row][col]
    return running_sum / (len(list_name) * len(column_numbers))
You pass it the name of the list, and the indexes of the columns (starting at 0), and it will return the average of those columns.
l = [
    [1, 2, 3],
    [1, 2, 3],
]

print(avg_columns(l, 0))     # returns 1.0, the avg of the first column (index 0)
print(avg_columns(l, 0, 2))  # returns 2.0, the avg of column indices 0 and 2 (first and third)

writing pandas dataframe columns to csv rows staggered

I have a pandas DataFrame with three columns, say A, B, C, and I would like to rearrange the data and output it to a CSV so that all values in C that have the same value in A share a row. For example, if my code block is designed as follows (not that I'd design it this way):
check = pd.DataFrame(columns=['A', 'B', 'C'])
check.loc[1] = [1, 11, 10]
check.loc[2] = [1, 21, 23]
check.loc[3] = [1, 23, 32]
check.loc[4] = [2, 21, 41]
check.loc[5] = [2, 21, 11]
check.loc[6] = [3, 21, 29]
check.loc[7] = [4, 21, 43]
check.loc[8] = [4, 21, 52]
I'd want the output to look like one of the following in the CSV:
This:
1,,,
10,23,32,
2,,,
41,11,,
3,,,
29,,,
4,,,
43,52,,
OR:
1,10,23,32
2,41,11,
3,29,,
4,43,52,
OR:
10,23,32,
41,11,,
29,,,
43,52,,
Thank you in advance for any suggestions.
Well... it's a little hard to grok what you're really doing, but it looks like you are not outputting the B column at all. The first step is to get your data arranged in an acceptable way, which appears to be one row for each value of A. Then export.
One way to get your last example output is to create a list of lists where each list item is a desired row. I'd do that by grouping the data by A then iterating over the groups:
g = check.groupby('A')
bigList = []
for group in g:
    rowList = []
    for c in group[1].C.items():  # .iteritems() in older pandas
        rowList.append(c[1])
    bigList.append(rowList)
Now bigList is a list of lists, so we can just convert it to a pandas DataFrame and then save to CSV:
outData = pd.DataFrame(bigList)
outData.to_csv('myFile.csv', index=False)
You could take the above loop and modify it to do your other examples as well. This would do your second:
bigList = []
for group in g:
    rowList = []
    rowList.append(group[0])
    for c in group[1].C.items():  # .iteritems() in older pandas
        rowList.append(c[1])
    bigList.append(rowList)
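The same group-then-collect step can be sketched without pandas, assuming the (A, C) pairs are already sorted by A (here the pairs are typed in by hand from the example DataFrame):

```python
from itertools import groupby
from operator import itemgetter

# (A, C) pairs from the example, already sorted by A
rows = [(1, 10), (1, 23), (1, 32), (2, 41), (2, 11),
        (3, 29), (4, 43), (4, 52)]

bigList = []
for a, grp in groupby(rows, key=itemgetter(0)):
    bigList.append([a] + [c for _, c in grp])
```

This reproduces the second desired layout; dropping the [a] + prefix gives the third.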

how to take word repetition out of a list?

I have a project where I have to take input values from an Excel spreadsheet and plot them with matplotlib, but the values that xlrd returns can't be passed straight to matplotlib because each value has a string in front of it.
How can I change the output from this:
[number:150000.0, number:140000.0, number:300000.0]
to this:
[150000.0, 140000.0, 300000.0]
This will allow me to put the values straight from xlrd into matplotlib.
Assuming you have a list of strings:
data = ["number:150000.0", "number:140000.0", "number:300000.0"]
you can turn it into a list of actual float numbers with:
data = [float(item.split(":")[1]) for item in data]
Edit: you have Cell objects, not strings, so use:
data = [cell.value for cell in data]
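What the question shows is most likely the repr of xlrd's Cell objects, which prints as ctype:value. A stand-in Cell class (hypothetical, for illustration only) reproduces the symptom and the fix:

```python
class Cell:
    """Hypothetical stand-in for xlrd's Cell, whose repr looks like 'number:150000.0'."""
    def __init__(self, value):
        self.value = value
    def __repr__(self):
        return "number:%s" % self.value

data = [Cell(150000.0), Cell(140000.0), Cell(300000.0)]
values = [cell.value for cell in data]  # plain floats, ready for matplotlib
```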

extract information from excel into python 2d array

I have an Excel sheet with dates, times, and temps that looks like this (screenshot omitted).
Using Python, I want to extract this info into arrays.
The array would get the date in position 0, and then store the temps in the following positions and look like this:
temparray[0] = [20130102,34.75,34.66,34.6,34.6,....,34.86]
temparray[1] = [20130103,34.65,34.65,34.73,34.81,....,34.64]
here is my attempt, but it sucks:
from xlrd import open_workbook

wb = open_workbook('temp.xlsx')
for s in wb.sheets():
    for row in range(s.nrows):
        values = []
        for col in range(s.ncols):
            values.append(s.cell(row, col).value)
        print(values[0])
        print("%.2f" % values[1])
        print('')
I used xlrd, but I am open to using anything. Thank you for your help.
From what I understand of your question, the problem is that you want the output to be a list of lists, and you're not getting such a thing.
And that's because there's nothing in your code that even tries to get such a thing. For each row, you build a list, print out the first value of that list, print out the second value of that list, and then forget the list.
To append each of those row lists to a big list of lists, all you have to do is exactly the same thing you're doing to append each column value to the row lists:
temparray = []
for row in range(s.nrows):
    values = []
    for col in range(s.ncols):
        values.append(s.cell(row, col).value)
    temparray.append(values)
From your comment, it looks like what you actually want is not only this, but also grouping the temperatures together by day, and also only adding the second column, rather than all of the values, for each day. Which is not at all what you described in the question. In that case, you shouldn't be looping over the columns at all. What you want is something like this:
days = []
current_day, current_date = [], None
for row in range(s.nrows):
    date = s.cell(row, 0).value  # compare cell values, not Cell objects
    if date != current_date:
        current_day, current_date = [], date
        days.append(current_day)
    current_day.append(s.cell(row, 2).value)
This code assumes that the dates are always in sorted order, as they are in your input screenshot.
I would probably structure this differently, building a row iterator to pass to itertools.groupby, but I wanted to keep this as novice-friendly, and as close to your original code, as possible.
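That itertools.groupby version might look like the following (the rows are made-up (date, time, temp) tuples; as above, it assumes the rows arrive sorted by date):

```python
from itertools import groupby
from operator import itemgetter

# hypothetical (date, time, temp) rows, already sorted by date
rows = [
    (20130102, '00:00', 34.75),
    (20130102, '01:00', 34.66),
    (20130103, '00:00', 34.65),
    (20130103, '01:00', 34.81),
]

temparray = [[date] + [r[2] for r in grp]
             for date, grp in groupby(rows, key=itemgetter(0))]
```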
Also, I suspect you really don't want this:
[[date1, temp1a, temp1b, temp1c],
[date2, temp2a, temp2b]]
… but rather something like this:
{date1: [temp1a, temp1b, temp1c],
 date2: [temp2a, temp2b]}
But without knowing what you're intending to do with this info, I can't tell you how best to store it.
If you are looking to keep all the data for the same dates, I might suggest using a dictionary to get a list of the temps for particular dates. Then once you get the dict initialized with your data, you can rearrange how you like. Try something like this after wb=open_workbook('temp.xlsx'):
tmpDict = {}
for s in wb.sheets():
    for row in range(s.nrows):
        try:
            tmpDict[s.cell(row, 0).value].append(s.cell(row, 2).value)
        except KeyError:
            tmpDict[s.cell(row, 0).value] = [s.cell(row, 2).value]
If you print tmpDict, you should get an output like:
{date1: [temp1, temp2, temp3, ...],
date2: [temp1, temp2, temp3, ...]
...}
Dictionary key order is not guaranteed in general (Python 3.7+ preserves insertion order, but older versions make no promise), but you can construct a list of lists ordered by date based on the content of the dict like so:
tmpList = []
for key in sorted(tmpDict.keys()):
    valList = [key]
    valList.extend(tmpDict[key])
    tmpList.append(valList)
Then, you'll get a list of lists ordered by date with the vals, as you were originally working. However, you can always get to the values in the dictionary by using the keys. I typically find it easier to work with the data afterwards but you can change it to any form you need.
