how to take word repetition out of a list? - python

I have a project where I have to take input values from an Excel spreadsheet and plot them with matplotlib, but the values that xlrd returns can't be passed straight to matplotlib because each value has a string in front of it.
I'm asking how I can change the output from this:
[number:150000.0, number:140000.0, number:300000.0]
to this:
[150000.0, 140000.0, 300000.0]
This will allow me to put the values straight from xlrd into matplotlib.

Assuming you have a list of strings:
data = ["number:150000.0", "number:140000.0", "number:300000.0"]
you can turn it into a list of actual float numbers with:
data = [float(item.split(":")[1]) for item in data]
Edit: you have Cell objects, not strings, so use:
data = [cell.value for cell in data]
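For completeness, here is a minimal self-contained sketch covering both cases; FakeCell is a hypothetical stand-in for xlrd's Cell class, which carries the value in a .value attribute the same way:

```python
def to_float(item):
    # Accept either an object with a .value attribute (like an xlrd Cell)
    # or a "label:number" string such as "number:150000.0"
    value = getattr(item, "value", item)
    if isinstance(value, str) and ":" in value:
        value = value.split(":", 1)[1]
    return float(value)

class FakeCell:  # hypothetical stand-in for xlrd.sheet.Cell
    def __init__(self, value):
        self.value = value

cells = [FakeCell(150000.0), FakeCell(140000.0), FakeCell(300000.0)]
data = [to_float(c) for c in cells]
print(data)                         # [150000.0, 140000.0, 300000.0]
print(to_float("number:300000.0"))  # 300000.0
```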

Related

gspread - Getting values as string from numeric like column

I am trying to read a Google Sheet in python using the gspread library.
The initial authentication setup is done and I am able to read the respective sheet.
However when I do
sheet.get_all_records()
The column containing numeric-like values (e.g. 0001, 0002, 1000) is converted to a numeric field, so the leading zeroes are truncated. How can I prevent this from happening?
You can prevent gspread from casting values to int passing the numericise_ignore parameter to the get_all_records() method.
You can disable it for a specific list of indices in the row:
# Disable casting for columns 1, 2 and 4 (1 indexed):
sheet.get_all_records(numericise_ignore=[1, 2, 4])
Or disable it for all values in the row by setting numericise_ignore to 'all':
sheet.get_all_records(numericise_ignore=['all'])
How about this answer? As one of several workarounds, get_all_values() is used instead of get_all_records(). After the values are retrieved, the rows are converted to a list of dictionaries keyed by the header row. Please think of this as just one of several answers.
Sample script:
values = worksheet.get_all_values()
head = values.pop(0)
result = [{head[i]: col for i, col in enumerate(row)} for row in values]
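The header-plus-rows reshaping above can be exercised with plain lists standing in for a live worksheet:

```python
# Plain-list stand-in for worksheet.get_all_values(); the string values
# keep their leading zeroes because nothing casts them to int
values = [["ID", "Name"], ["0001", "a"], ["0002", "b"]]
head = values.pop(0)
result = [{head[i]: col for i, col in enumerate(row)} for row in values]
print(result)  # [{'ID': '0001', 'Name': 'a'}, {'ID': '0002', 'Name': 'b'}]
```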
Reference:
get_all_values()
If this was not the direction you want, I apologize.

Output strings to specific rows in excel with python

I have a list MyList[] of integers which I convert to binary strings and output to an excel table.
The code looks like:
r = 2
for x in MyList:
    binary_out = bin(x)[2:].zfill(9)
    for ind, val in enumerate(binary_out):
        worksheet.write(r, ind + 2, val)
    r += 1
and the output of the binary strings looks as intended (screenshot not shown; the other data are generated in an earlier phase).
This is fine so far.
I would like to get the output only on specific rows, not on all of them.
The information about which rows to write to is in a list of indices I collected earlier:
indices = [2,4,6,7]
As you can see above, the output in excel starts at row 2.
So row 2 should now hold the number at the first index of indices, and each binary string should appear only on its target row.
How can I modify the code to write to the wanted rows?
Maybe you can pair each value with its target row using zip:
for row_idx, x in zip(indices, MyList):
    binary_out = bin(x)[2:].zfill(9)
    for ind, val in enumerate(binary_out):
        worksheet.write(row_idx, ind + 2, val)
I do not know exactly what library you are using so I don't know what ind and val are.
You can iterate over just the elements at those indices using a generator expression:
# for x in MyList:
for x in (MyList[idx] for idx in indices):
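Putting the ideas together, here is a runnable sketch that writes each binary string to its target row; FakeSheet is a hypothetical stand-in for the worksheet object, recording the writes in a dict so the result can be inspected:

```python
MyList = [5, 9]
indices = [2, 4]

written = {}

class FakeSheet:  # hypothetical stand-in for an Excel worksheet object
    def write(self, row, col, val):
        written[(row, col)] = val

worksheet = FakeSheet()
# Pair each value with its target row, assuming MyList and indices line up
for row_idx, x in zip(indices, MyList):
    binary_out = bin(x)[2:].zfill(9)
    for ind, val in enumerate(binary_out):
        worksheet.write(row_idx, ind + 2, val)

print(written[(2, 10)])  # '1' (last bit of '000000101')
```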

Convert integer to string type when retrieving values from a pandas dataframe

I am trying to convert data (from integer to string) in a list generated using pandas.
The data comes from a csv file.
Here is my pandas code (excluding the part that shows how the object 'InFile' (the csv file) is created):
import itertools as it
import pandas as pd
....
with open(InFile) as fp:
    skip = next(it.ifilter(
        lambda x: x[1].startswith('ID'),
        enumerate(fp)
    ))[0]
dg = pd.read_csv(InFile, usecols=['ID'], skiprows=skip)
dgl = dg['ID'].values.tolist()
Currently, output is a List (example below).
[111111, 2222, 3333333, 444444]
I am trying to match this data against another list (whose values are strings, stored as VARCHAR in MySQL), but somehow I cannot get any matches. My previous post -> How to find match from two Lists (from MySQL and csv)
So, I am guessing that the data type from the List generated by Pandas is an Integer.
So, how do I convert the data type from Integer to String?
On which line should I add something like str(10), for example?
You can use pd.Series.astype:
dgl = dg['ID'].astype(str).values.tolist()
print(dgl)
Output:
['111111', '2222', '3333333', '444444']
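A self-contained sketch of the same conversion, with a small hypothetical frame standing in for the csv-derived data:

```python
import pandas as pd

# Hypothetical data standing in for the 'ID' column read from the csv
dg = pd.DataFrame({'ID': [111111, 2222, 3333333, 444444]})
dgl = dg['ID'].astype(str).tolist()
print(dgl)  # ['111111', '2222', '3333333', '444444']
```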

Saving/loading a table (with different column lengths) using numpy

A bit of context: I am writing code to save the data I plot to a text file. This data should be stored so that it can be loaded back by a script and displayed again (this time without performing any calculation). The initial idea was to store the data in columns with the format x1,y1,x2,y2,x3,y3...
I am using code that simplifies to something like this (incidentally, I am not sure if using a list to group my arrays is the most efficient approach):
import numpy as np
MatrixResults = []
x1 = np.array([1,2,3,4,5,6])
y1 = np.array([7,8,9,10,11,12])
x2 = np.array([0,1,2,3])
y2 = np.array([0,1,4,9])
MatrixResults.append(x1)
MatrixResults.append(y1)
MatrixResults.append(x2)
MatrixResults.append(y2)
MatrixResults = np.array(MatrixResults)
TextFile = open('/Users/UserName/Desktop/Datalog.txt',"w")
np.savetxt(TextFile, np.transpose(MatrixResults))
TextFile.close()
However, this code gives an error when any of the data sets have different lengths. I read similar questions:
Can numpy.savetxt be used on N-dimensional ndarrays with N>2?
Table, with the different length of columns
However, these require breaking the format (either by flattening or by padding the shorter columns with filler strings).
My issue summarises as:
1) Is there a way to transpose the arrays and save each one as a separate consecutive column, even when their lengths differ?
2) Or is there a way to append columns to an existing text file (skipping a given number of rows and columns)?
3) Should I try another library such as pandas?
Thank you very much for any advice.
Edit 1:
After looking a bit more, it seems that leaving blank spaces is less efficient than padding the lists.
In the end I wrote my own routine (I am not sure whether there is a numpy function for this) that pads the shorter arrays with "nan" values so all lengths match.
To get the data back I use the genfromtxt method and then this line:
x = x[~np.isnan(x)]
to remove those cells from the arrays.
If I find a better solution I will post it :)
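The nan-padding workaround can be sketched like this (pad lengths before saving, strip the nan cells after loading):

```python
import numpy as np

# Pad the shorter column with nan so all columns share one length...
x2 = np.array([0.0, 1.0, 4.0, 9.0])
padded = np.append(x2, [np.nan, np.nan])  # now length 6, like x1/y1

# ...and strip the nan cells again after loading the data back
x = padded[~np.isnan(padded)]
print(x)  # [0. 1. 4. 9.]
```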
To save your arrays you can use np.savez and read them back with np.load:
# Write each array to one .npz file
np.savez(filename, *matrixResults)
# Read back
npz = np.load(filename + '.npz')
matrixResults = [npz[key] for key in sorted(npz.files)]
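A variant of the same idea using named keywords, which makes the read-back explicit (the temporary path is just for the sketch):

```python
import os
import tempfile

import numpy as np

x1 = np.array([1, 2, 3, 4, 5, 6])
x2 = np.array([0, 1, 2, 3])

# Each keyword becomes a named array inside the .npz archive,
# so arrays of different lengths coexist without padding
path = os.path.join(tempfile.mkdtemp(), 'Datalog.npz')
np.savez(path, x1=x1, x2=x2)

loaded = np.load(path)
print(loaded['x2'])  # [0 1 2 3]
```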
As a side note you should follow naming conventions i.e. only class names start with upper case letters.

extract information from excel into python 2d array

I have an excel sheet with dates, times, and temps that looks like this:
Using python, I want to extract this info into python arrays.
The array would get the date in position 0, and then store the temps in the following positions and look like this:
temparray[0] = [20130102,34.75,34.66,34.6,34.6,....,34.86]
temparray[1] = [20130103,34.65,34.65,34.73,34.81,....,34.64]
here is my attempt, but it sucks:
from xlrd import open_workbook

wb = open_workbook('temp.xlsx')
print(wb)
for s in wb.sheets():
    for row in range(s.nrows):
        values = []
        for col in range(s.ncols):
            values.append(s.cell(row, col).value)
        print(values[0])
        print("%.2f" % values[1])
        print('')
I used xlrd, but I am open to using anything. Thank you for your help.
From what I understand of your question, the problem is that you want the output to be a list of lists, and you're not getting such a thing.
And that's because there's nothing in your code that even tries to get such a thing. For each row, you build a list, print out the first value of that list, print out the second value of that list, and then forget the list.
To append each of those row lists to a big list of lists, all you have to do is exactly the same thing you're doing to append each column value to the row lists:
temparray = []
for row in range(s.nrows):
    values = []
    for col in range(s.ncols):
        values.append(s.cell(row, col).value)
    temparray.append(values)
From your comment, it looks like what you actually want is not only this, but also grouping the temperatures together by day, and also only adding the second column, rather than all of the values, for each day. Which is not at all what you described in the question. In that case, you shouldn't be looping over the columns at all. What you want is something like this:
days = []
current_day, current_date = [], None
for row in range(s.nrows):
    date = s.cell(row, 0).value
    if date != current_date:
        current_day, current_date = [], date
        days.append(current_day)
    current_day.append(s.cell(row, 2).value)
This code assumes that the dates are always in sorted order, as they are in your input screenshot.
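The grouping logic can be exercised on plain (date, temp) tuples standing in for the worksheet cells:

```python
# Hypothetical rows standing in for the sheet's date and temp columns
rows = [(20130102, 34.75), (20130102, 34.66), (20130103, 34.65)]

days = []
current_day, current_date = [], None
for date, temp in rows:
    if date != current_date:          # a new date starts a new group
        current_day, current_date = [], date
        days.append(current_day)
    current_day.append(temp)

print(days)  # [[34.75, 34.66], [34.65]]
```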
I would probably structure this differently, building a row iterator to pass to itertools.groupby, but I wanted to keep this as novice-friendly, and as close to your original code, as possible.
Also, I suspect you really don't want this:
[[date1, temp1a, temp1b, temp1c],
[date2, temp2a, temp2b]]
… but rather something like this:
{date1: [temp1a, temp1b, temp1c],
 date2: [temp2a, temp2b]}
But without knowing what you're intending to do with this info, I can't tell you how best to store it.
If you are looking to keep all the data for the same dates, I might suggest using a dictionary to get a list of the temps for particular dates. Then once you get the dict initialized with your data, you can rearrange how you like. Try something like this after wb=open_workbook('temp.xlsx'):
tmpDict = {}
for s in wb.sheets():
    for row in xrange(s.nrows):
        try:
            tmpDict[s.cell(row, 0).value].append(s.cell(row, 2).value)
        except KeyError:
            tmpDict[s.cell(row, 0).value] = [s.cell(row, 2).value]
If you print tmpDict, you should get an output like:
{date1: [temp1, temp2, temp3, ...],
date2: [temp1, temp2, temp3, ...]
...}
Dictionary keys are kept in an arbitrary order (it has to do with the hash value of the key) but you can construct a list of lists based on the content of the dict like so:
tmpList = []
for key in sorted(tmpDict.keys()):
    valList = [key]
    valList.extend(tmpDict[key])
    tmpList.append(valList)
Then you'll get a list of lists ordered by date with the values, as you were originally building. However, you can always get at the values in the dictionary by using the keys. I typically find it easier to work with the data that way afterwards, but you can change it to any form you need.
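The dict-to-sorted-list step can be checked with a small hypothetical dict:

```python
# Hypothetical dict standing in for the tmpDict built from the sheet
tmpDict = {20130103: [34.65], 20130102: [34.75, 34.66]}

tmpList = []
for key in sorted(tmpDict.keys()):
    valList = [key]               # date first...
    valList.extend(tmpDict[key])  # ...then its temps
    tmpList.append(valList)

print(tmpList)  # [[20130102, 34.75, 34.66], [20130103, 34.65]]
```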
