Is it possible to pass an array from Python into a single cell in Excel using xlwings? When I assign a NumPy array to Range.value, Excel creates a horizontal range with each value in its own cell. My code is similar to this:
import xlwings as xw
import numpy as np
xlist = list()
xlist.append(1)
xlist.append(2)
wb = xw.Workbook('xlwingstest.xlsx')
xw.Range("sheet1", "a1", asarray=True).value = np.asarray(xlist)
I am trying to avoid using VBA, but suspect I may not have a choice.
Thanks
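An Excel cell holds a single scalar (a number, a string, a date and so on), which is why xlwings spreads a list or array across a range. One workaround that avoids VBA is to serialise the array to text and write that string into the cell. The sketch below assumes a text representation is acceptable for your use case; array_to_cell_text, write_array_to_cell and cell_text_to_array are hypothetical helper names, and sheet is expected to be an xlwings Sheet (anything exposing .range(address).value would do).

```python
import json

import numpy as np


def array_to_cell_text(values):
    """Serialise a list or NumPy array to a JSON string that fits in one cell."""
    return json.dumps(np.asarray(values).tolist())


def write_array_to_cell(sheet, address, values):
    """Write the serialised array into a single cell of an xlwings-style sheet."""
    sheet.range(address).value = array_to_cell_text(values)


def cell_text_to_array(text):
    """Recover the array from the text that was stored in the cell."""
    return np.asarray(json.loads(text))
```

With xlwings this might be called as write_array_to_cell(xw.Book('xlwingstest.xlsx').sheets['sheet1'], 'A1', xlist). The cell then contains text, so Excel formulas cannot treat it as numbers without parsing it first.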
Related
I'm a novice in Python and I need to extract references from scientific literature. The following is the code I'm using:
from refextract import extract_references_from_url
references = extract_references_from_url('https://arxiv.org/pdf/1503.07589.pdf')
print(references)
Please guide me on how to write this printed information to an XLS file. Thank you so much.
You could use the pandas library to write the references to Excel:
from refextract import extract_references_from_url
import pandas as pd
references = extract_references_from_url('https://arxiv.org/pdf/1503.07589.pdf')
print(references)
# convert to pandas dataframe
dfref = pd.DataFrame(references)
# write dataframe into excel
dfref.to_excel('./refs.xlsx')
You should have a look at xlsxwriter, a module for creating Excel files.
Your code could then look like this:
import xlsxwriter
from refextract import extract_references_from_url
workbook = xlsxwriter.Workbook('References.xlsx')
worksheet = workbook.add_worksheet()
references = extract_references_from_url('https://arxiv.org/pdf/1503.07589.pdf')
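# Write one reference per row in column A. str() is used because each
# reference may be a dict rather than a plain string.
for row, reference in enumerate(references):
    worksheet.write(row, 0, str(reference))
workbook.close()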
(modified based upon https://xlsxwriter.readthedocs.io/tutorial01.html)
After going through the refextract documentation, I found that your variable references is a dictionary. To write such a dictionary to Excel you can use pandas as follows:
import pandas as pd
# create a pandas dataframe using a dictionary
df = pd.DataFrame(data=references, index=[0])
# Take transpose of the dataframe
df = (df.T)
# write the dictionary to an excel file
df.to_excel('extracted_references.xlsx')
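In case references turns out to be a list of dictionaries (one per reference) whose values are themselves lists, a small flattening step gives one row per reference and one cell per field. This is only a sketch under that assumption; references_to_frame is a hypothetical helper name and the sample data is made up to stand in for the refextract output.

```python
import pandas as pd


def references_to_frame(references):
    """Build a DataFrame with one row per reference and one column per field."""
    rows = []
    for reference in references:
        row = {}
        for field, value in reference.items():
            # Join list values into one string so each fits in a single cell.
            if isinstance(value, (list, tuple)):
                row[field] = "; ".join(str(item) for item in value)
            else:
                row[field] = value
        rows.append(row)
    return pd.DataFrame(rows)


# Made-up sample data standing in for the refextract output.
sample = [
    {"author": ["A. Author", "B. Author"], "year": ["2015"]},
    {"author": ["C. Author"], "title": ["An example title"]},
]
frame = references_to_frame(sample)
print(frame)
# frame.to_excel("extracted_references.xlsx", index=False)  # needs openpyxl installed
```

Fields missing from a given reference simply come out as empty cells in the spreadsheet.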
I have used Python for years, but almost exclusively for math/engineering work. Now I'm trying to read data from an Excel sheet and put it in an array or list to make it easy to work with in Python, and I'm having some trouble. Below is an example of my code. I have successfully read in the table using xlrd and I can pull individual data points from it, but when I try to put a row of data from Excel into an array as shown below, I get the error "could not convert string to float: '1F02020050'" (the data I'm reading in is in this format: '1F02020050'). Is this because of the letters in the data?
import numpy as np
import xlrd
book=xlrd.open_workbook('ManualTest2.xlsx')
sheet=book.sheet_by_index(0)
Aisle=np.zeros(sheet.nrows)
for i in range(sheet.nrows):
    Aisle[i]=(sheet.cell_value(i,1))
print(Aisle)
I was able to figure it out. The code below reads the whole Excel table rather than one row, but is easily modifiable:
import xlrd
book = xlrd.open_workbook('ManualTest2.xlsx')
sheet = book.sheet_by_name('Sheet1')
data = [[sheet.cell_value(r, c) for c in range(sheet.ncols)] for r in range(sheet.nrows)]
print(data)
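To answer the original question directly: yes, it is because of the letters. np.zeros(sheet.nrows) creates a float array, so assigning '1F02020050' to one of its elements makes NumPy try to convert the string to a float, which fails. A plain list, or an array with a string dtype, stores such values unchanged. The short sketch below shows both behaviours; the values are made-up stand-ins for what sheet.cell_value(i, 1) would return.

```python
import numpy as np

# Made-up stand-ins for the strings returned by sheet.cell_value(i, 1).
cell_values = ["1F02020050", "1F02020051", "2A01010007"]

# A float array cannot store these strings: the assignment raises ValueError.
aisle_float = np.zeros(len(cell_values))
try:
    aisle_float[0] = cell_values[0]
    failed = False
except ValueError:
    failed = True

# A list keeps the strings as they are; np.array infers a string dtype from it.
aisle = np.array(cell_values)
print(failed, aisle.dtype.kind, aisle)
```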
I'm just wondering how to update a single cell in an Excel spreadsheet with pandas in a Python script. I don't want any of the other cells in the file to be overwritten, just the one cell I'm trying to update. I tried using .at[], .iat[], and .loc[], but my Excel spreadsheet does not update. None of the deprecated methods like .set_value() work either. What am I doing wrong?
import pandas as pd
tp = pd.read_excel("testbook.xlsx", sheet_name = "Sheet1")
tp.at[1, 'A'] = 10
I might suggest using xlwings for this operation, as it might be easier than reading and writing a sheet in pandas dataframes. The example below changes the value of "A1".
import xlwings as xw
sheet = xw.Book("testbook.xlsx").sheets("Sheet1")
sheet.range("A1").value = "hello world"
Also note that xlwings is included in the Anaconda distribution, if you're using that. The API reference is here: https://docs.xlwings.org/en/stable/api.html
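As for why the pandas attempt changes nothing: read_excel loads a copy of the data into memory, so .at[] only edits that in-memory DataFrame and the file is never written back. If running Excel through xlwings is not an option, openpyxl can make the same kind of single-cell edit directly on the file. A minimal sketch, assuming openpyxl is installed; update_cell is a hypothetical helper name, and the temporary workbook exists only to make the example self-contained.

```python
import os
import tempfile

from openpyxl import Workbook, load_workbook


def update_cell(path, sheet_name, address, value):
    """Change one cell in an existing workbook and save it back in place."""
    workbook = load_workbook(path)
    workbook[sheet_name][address] = value
    workbook.save(path)


# Create a throwaway workbook so the example runs on its own.
path = os.path.join(tempfile.mkdtemp(), "testbook.xlsx")
wb = Workbook()
ws = wb.active
ws.title = "Sheet1"
ws["A1"] = "old value"
ws["B2"] = "untouched"
wb.save(path)

update_cell(path, "Sheet1", "A1", 10)

reloaded = load_workbook(path)["Sheet1"]
print(reloaded["A1"].value, reloaded["B2"].value)
```

Cell values elsewhere in the sheet are kept, but openpyxl does not read every kind of object in a workbook, so test this on a copy first if the file contains shapes or other embedded items.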
I'm writing a Python program that will import a square matrix from an Excel sheet and do some NumPy work with it. So far it looks like openpyxl is the best way to transfer the data from an XLSX file to the Python environment, but it's not clear what the best way is to turn that data from a tuple of tuples* of cell references into an array of the actual values that are in the Excel sheet.
*created by calling sheet_ranges = wb['Sheet1'] and then mat = sheet_ranges['A1:IQ251']
Of course I could check the size of the tuple, write a nested for loop, check every element of each tuple within the tuple, and fill up an array.
But is there really no better way?
As commented above, the ideal solution is to use a pandas dataframe. For example:
import pandas as pd
dataframe = pd.read_excel("name_of_my_excel_file.xlsx")
print(dataframe)
Just pip install pandas and then run the code above, replacing name_of_my_excel_file with the full path to your Excel file. You can then use pandas functions to analyse your data in depth. See the pandas documentation for details.
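If you would rather stay with openpyxl, the tuple of tuples of cells converts to an array with one nested comprehension over cell.value, so no explicit size checks or index loops are needed. A small sketch: the 3x3 sheet is built in memory purely for illustration and stands in for the real file and the A1:IQ251 range.

```python
import numpy as np
from openpyxl import Workbook

# Build a small in-memory sheet to stand in for the real workbook.
wb = Workbook()
ws = wb.active
for row in [[1, 2, 3], [4, 5, 6], [7, 8, 9]]:
    ws.append(row)

# ws["A1:C3"] is a tuple of rows, each row a tuple of Cell objects.
cells = ws["A1:C3"]
matrix = np.array([[cell.value for cell in row] for row in cells], dtype=float)

# Equivalent without touching Cell objects: iterate over values only.
matrix_alt = np.array(
    list(ws.iter_rows(min_row=1, max_row=3, min_col=1, max_col=3, values_only=True)),
    dtype=float,
)
print(matrix)
```

Passing dtype=float makes NumPy raise an error if a cell holds text, which is usually preferable to silently getting an object array.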
I have been using OpenPyxl for creating Excel workbooks using data from other CSV files.
Currently I want to insert a histogram into the worksheet based on a numerical list that I have as variable x below.
I cannot find an efficient way to generate the histogram. The option I chose was to generate the histogram in matplotlib, save it, and then place it in the worksheet; however, this seems cumbersome, and I feel like I am missing some syntax to pass the plt directly to the img.
The option using Reference seems imperfect also as I have 10^6 length vectors and would rather not write them to this file.
import numpy as np
import openpyxl
import matplotlib.pyplot as plt
wb = openpyxl.Workbook()
ws = wb.active
x = np.random.rand(100)
plt.clf()
plt.hist(x, bins=10)
plt.savefig('temp1.png')
img = openpyxl.drawing.image.Image('temp1.png')
img.width, img.height = 300, 300  # recent openpyxl versions have no size= argument
ws.add_image(img, 'A1')
plt.clf()
plt.plot(x)
plt.savefig('temp2.png')
img = openpyxl.drawing.image.Image('temp2.png')
img.width, img.height = 300, 300
ws.add_image(img, 'A15')
wb.save("trial.xlsx")
As you can see, this generates two .png files and overall seems unclean. I do not think performance is taking much of a hit, but there are undoubtedly better solutions, and optimization is valued here.
I would treat answers of the form: "Swap from using OpenPyxl to ..." as a last resort only.
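One way to stay within openpyxl and avoid both the temporary PNG files and writing the full 10^6-element vector is to bin the data in NumPy, write only the bin counts, and let Excel draw a native bar chart from those few rows. This is a sketch under that assumption, not a drop-in replacement for the matplotlib figure; add_histogram_chart is a hypothetical helper name and the column layout (A:B) is arbitrary.

```python
import numpy as np
from openpyxl import Workbook
from openpyxl.chart import BarChart, Reference


def add_histogram_chart(ws, data, bins=10, anchor="D2"):
    """Write bin starts and counts to columns A:B and chart them natively."""
    counts, edges = np.histogram(data, bins=bins)
    ws.cell(row=1, column=1, value="bin_start")
    ws.cell(row=1, column=2, value="count")
    for i, (left, count) in enumerate(zip(edges[:-1], counts), start=2):
        ws.cell(row=i, column=1, value=float(left))
        ws.cell(row=i, column=2, value=int(count))

    last_row = len(counts) + 1
    chart = BarChart()
    chart.title = "Histogram"
    # Row 1 of the values range is used as the series title.
    values = Reference(ws, min_col=2, min_row=1, max_row=last_row)
    categories = Reference(ws, min_col=1, min_row=2, max_row=last_row)
    chart.add_data(values, titles_from_data=True)
    chart.set_categories(categories)
    ws.add_chart(chart, anchor)
    return counts


wb = Workbook()
ws = wb.active
x = np.random.rand(1_000_000)
counts = add_histogram_chart(ws, x, bins=10)
# wb.save("trial.xlsx")
```

Only eleven rows end up in the worksheet regardless of the length of x, and the chart remains editable in Excel.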