I've resisted asking a question here, but I just can't find a solution.
I use Ditto as my favorite clipboard manager; when I copy data, I can access each clip via assigned keys on my keyboard, which is very handy. I need to copy values from cells in Excel. So far I've tried many approaches (tkinter, pyperclip, pandas, os, pynput), but each one has the same outcome: either only the last copied value (or string) shows up, under the first position in Ditto — if I copy value 'a' and then 'b', I get only 'b' — or I get the whole copied content in one clip, with no distinction between values. The closest I've come is the code below; close, but it still puts the whole content into a single clip under one key.
from openpyxl import load_workbook
from pyperclip import copy  # import explicitly instead of 'from pyperclip import *'

wb = load_workbook(filename='C:/Users/Robert/Desktop/dane.xlsx')
ws = wb['Sheet']
column = ws['B']

text = ''  # renamed from 'list', which shadows the built-in name
for cell in column:
    if cell.value is None:
        cell.value = ''  # treat empty cells as empty strings
    text = text + str(cell.value) + '\n'

copy(text)  # the whole column lands in the clipboard as a single clip
I need every single string (cell.value) to land in a different slot in Ditto; this puts all the values into one (the first) slot.
Thanks in advance. It's the fourth day in a row on this, and I am close to jumping from my balcony...
Lately I came across a solution: Ditto needs a delay of at least 500 ms between copies to record the items separately.
import time
import pyperclip

for i in arr:
    pyperclip.copy(i)
    time.sleep(.6)  # at least 500 ms so Ditto records each item separately
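Putting the two pieces together, here is a minimal sketch of the whole workflow; the path, sheet name and 0.6 s delay come from the snippets above, and skipping empty cells is my assumption about what you want in Ditto:

import time
from openpyxl import load_workbook
import pyperclip

wb = load_workbook(filename='C:/Users/Robert/Desktop/dane.xlsx')
ws = wb['Sheet']

for cell in ws['B']:
    if cell.value is None:  # assumption: skip empty cells rather than copying ''
        continue
    pyperclip.copy(str(cell.value))  # one clip per cell value
    time.sleep(.6)  # give Ditto time to register each clip before the next copy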
Good evening. I have an excel file with zip codes and associated information. Those zip codes have a lot of duplicates. I'd like to figure out which zip codes I have by putting them all in a list without duplicates. This code works, but runs very slowly (it took over 100 seconds), and I was wondering what I could do to make it more efficient.
I know that having to check the whole list for duplicates each time contributes a lot to the inefficiency, but I'm not sure how to fix that. I also know that going through every row is probably not the best approach, but I am pretty new and am now stuck.
Thanks in advance.
import sys
import xlrd

loc = ("locationOfFile")
wb = xlrd.open_workbook(loc)
sheet = wb.sheet_by_index(0)

def findUniqueZips():
    zipsInSheet = []
    for i in range(sheet.nrows):
        if str(sheet.cell(i,0).value) in zipsInSheet:
            pass
        else:
            zipsInSheet.append(str(sheet.cell(i,0).value))
    print(zipsInSheet)

findUniqueZips()
If you're looking to avoid duplicates, you should definitely consider using sets in Python.
What I would do is create a set and simply add all your elements to it; note that a set is an unordered collection of unique items. Once all the data has been added, you can copy the elements of the set into your list. This avoids the redundant membership checks.
import xlrd

loc = ("locationOfFile")
wb = xlrd.open_workbook(loc)
sheet = wb.sheet_by_index(0)

def findUniqueZips():
    data = set()
    for i in range(sheet.nrows):
        data.add(str(sheet.cell(i,0).value))  # set lookups are O(1), so this stays fast
    # now copy all the elements of the set into your list
    zipsInSheet = list(data)
    print(zipsInSheet)

findUniqueZips()
I usually just convert it to a set. Sets are your friend: they are much faster than lists for membership tests. Unless you intentionally need or want duplicates, use sets.
https://docs.python.org/3.7/tutorial/datastructures.html?highlight=intersection#sets
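For instance, the whole dedup collapses to a single set comprehension; a sketch assuming the same xlrd sheet object as in the question:

unique_zips = {str(sheet.cell(i, 0).value) for i in range(sheet.nrows)}
print(unique_zips)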
I am using openpyxl and found some code through googling about unmerging cells in an xlsx workbook.
I got the code to work, but found that it was not removing all the merged cells in a single pass. I set it up to run in a while loop, which solved the issue, but I was wondering what I am doing wrong to cause the skipping in the first place. Any insight would be helpful.
Code:
import openpyxl

wb = openpyxl.load_workbook('./filename.xlsx')  # load the workbook, not just the path string
ws = wb[sheetname]

def remove_merged(sheet_object):
    merged = sheet_object.merged_cell_ranges
    while len(merged) > 0:
        for mergedRNG in merged:
            sheet_object.unmerge_cells(range_string=mergedRNG)
        merged = sheet_object.merged_cell_ranges
    return len(merged)

remove_merged(ws)
ws.merged_cell_ranges is mutable so you need to be careful that it is not directly used in any for-loop, because the implicit counter won't take into account that the property has been recalculated. This is a common gotcha in Python illustrated by:
l = list(range(10))
for i in l:
    print(i)
    l.pop(0)  # anything that affects the structure of the list
The following is how to avoid this:
for rng in ws.merged_cell_ranges[:]:  # iterate over a copy of the list
    ws.unmerge_cells(range_string=rng)  # remove the range from the original
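For reference, a sketch of the same idiom against newer openpyxl releases, where the merged ranges are exposed as CellRange objects on ws.merged_cells.ranges:

for rng in list(ws.merged_cells.ranges):  # copy the container before mutating it
    ws.unmerge_cells(str(rng))  # CellRange objects need converting to a range string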
PS. Just copying stuff from an internet search isn't really advisable; there are several sites with outdated or unnecessarily complex code. You're best off referring to the documentation or asking on the mailing list.
The following is code I have written that tries to open individual files, which are long strips of data, and read them into an array. Essentially I have files that run over 15 forecast times (24 hours to 360 hours), and each file has an iteration of 50, hence the two loops. I then try to read the files into an array. When I try to print a specific element in the array, I get the error "'file' object has no attribute '__getitem__'". Any ideas what the problem is? Thanks.
#!/usr/bin/python
############################################
import csv
import sys
import numpy as np
import scipy as sp
#############################################
level = input("Enter a level: ")
LEVEL = str(level)
MODEL = raw_input("Enter a model: ")
NX = 360
NY = 181
date = 201409060000
DATE = str(date)
#############################################
FileList = []
data = []
for j in range(1, 51, 1):
    J = str(j)
    for i in range(24, 384, 24):
        I = str(i)
        fileName = '/Users/alexg/ECMWF_DATA/DAT_FILES/'+MODEL+'_'+LEVEL+'_v_'+J+'_FT0'+I+'_'+DATE+'.dat'
        FileList.append(fileName)
        fo = open(fileName, "rb")
        data.append(fo)
        fo.close()
print data[1][1]
print FileList
EDITED TO ADD:
Below, find the CORRECT array that the python script should be producing (sorry, it won't let me post this inline yet):
http://i.stack.imgur.com/ItSxd.png
The problem I now run into, is that the first three values in the first row of the output matrix are:
-7.090874
-7.004936
-6.920952
These values are actually the first three values of the 11th row in the linked array, which is how it should look (produced in MATLAB). The next three values the python script outputs (as what it believes to be the second row) are:
-5.255577
-5.159874
-5.064171
These values should be found in the 22nd row. In other words, python is placing the 11th row of values in the first position, the 22nd in the second and so on. I don't have a clue as to why, or where in the code I'm specifying it do this.
You're appending the file objects themselves to data, not their contents:
fo = open(fileName,"rb");
data.append(fo);
So, when you try to print data[1][1], data[1] is a file object (a closed file object, to boot, but it would be just as broken if still open), so data[1][1] tries to treat that file object as if it were a sequence, and file objects aren't sequences.
It's not clear what format your data are in, or how you want to split it up.
If "long strips of data" just means "a bunch of lines", then you probably wanted this:
data.append(list(fo))
A file object is an iterable of lines, it's just not a sequence. You can copy any iterable into a sequence with the list function. So now, data[1][1] will be the second line in the second file.
(The difference between "iterable" and "sequence" probably isn't obvious to a newcomer to Python. The tutorial section on Iterators explains it briefly, the Glossary gives some more information, and the ABCs in the collections module define exactly what you can do with each kind of thing. But briefly: An iterable is anything you can loop over. Some iterables are sequences, like list, which means they're indexable collections that you can access like spam[0]. Others are not, like file, which just reads one line at a time into memory as you loop over it.)
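A tiny illustration of the distinction, using a hypothetical example.txt:

with open('example.txt') as f:
    lines = list(f)  # copy the iterable of lines into a real sequence
    print(lines[0])  # indexing works on the list
    # f[0] would raise a TypeError: file objects are not sequences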
If, on the other hand, you actually imported csv for a reason, you more likely wanted something like this:
reader = csv.reader(fo)
data.append(list(reader))
Now, data[1][1] will be a list of the columns from the second row of the second file.
Or maybe you just wanted to treat it as a sequence of characters:
data.append(fo.read())
Now, data[1][1] will be the second character of the second file.
There are plenty of other things you could just as easily mean, and easy ways to write each one of them… but until you know which one you want, you can't write it.
I am rewriting my question with code. First of all, I am new to programming; I started to think about programming only recently :( at a very late stage of life :)
My code is as below:
import win32com.client as win32
from win32com.client import Dispatch
excel = win32.gencache.EnsureDispatch('Excel.Application')
wb = excel.Workbooks.Open(r'F:\python\book1.xlsx')
excel.Visible = False
ws = wb.Worksheets("Sheet1")
# to get the last row
used = ws.UsedRange
nrows = used.Row + used.Rows.Count
ws.Cells(nrows,2).Value = "21"
ws.Cells(nrows,2).Offset(2,1).Value = "22"
ws.Cells(nrows,2).Offset(3,1).Value = "23"
# ...and so on, nine values like this
wb.Save()
excel.Application.Quit()
What I am trying to do is write values into the Excel sheet.
Old question below; ignore it.
I am using Python 2.7 and win32com to access an Excel file.
I am stuck on a problem where I need to enter data into 9 cells at a time in column B.
I want to select the last used cell in column B and enter the new set of 9 cell values below it.
I tried to use ws.UsedRange, but this does not help, because it chooses the last cell wherever data is present anywhere in the sheet. You can see in the attached sheet that test data is spread across columns D, E, F etc., so UsedRange chooses the last cell based on that. Is there a way to solve my problem? I am happy to use another module as well if it helps.
A UsedRange:
… includes any cell that has ever been used. For example, if cell A1 contains a value, and then you delete the value, then cell A1 is considered used. In this case, the UsedRange property will return a range that includes cell A1.
Do you want to work on every cell that has ever been used? If not, why would you use UsedRange? If so, what are you trying to use it for? To find the last row in the UsedRange? You can do that easily. The Range Objects docs show you what you can do with them.
Then, once you know what you want to specify, the same documentation shows how to ask for it. You want B10:B18? Just ws.Range('B10:B18').
Once you have that Range object, you can assign a value or formula to the whole range, iterate over its cells, etc. Again, the same docs show how to do it.
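As a concrete illustration, here is a minimal sketch of writing nine values below the last filled cell in column B specifically; the End(xlUp) trick is the usual Excel idiom for finding the last filled cell in one column, and the workbook path, sheet name, and values 21..29 are taken from the question:

import win32com.client as win32

excel = win32.gencache.EnsureDispatch('Excel.Application')
wb = excel.Workbooks.Open(r'F:\python\book1.xlsx')
ws = wb.Worksheets("Sheet1")

xlUp = -4162  # Excel constant, also available as win32.constants.xlUp
last_row = ws.Cells(ws.Rows.Count, 2).End(xlUp).Row  # last filled cell in column B only

for offset, value in enumerate(range(21, 30), start=1):  # nine values: 21..29
    ws.Cells(last_row + offset, 2).Value = value

wb.Save()
excel.Application.Quit()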
Is there an option to change the default way the csv and xlrd packages handle empty cells? By default, empty cells are assigned an empty string value (''). This is problematic when working with databases, because an empty string is not a None value, which many Python packages that interface with databases (SQLAlchemy, for example) would map to a database NULL.
For example, if an empty cell occurs in a field that is supposed to be a decimal/integer/float/double, the database will raise an exception, because a string was inserted into a field of numeric type.
I haven't found any examples or documentation that show how to do this. My current approach is to inspect the data and do the following:
if item[i] == '':
    item[i] = None
The problem with this is that I don't own the data and have no control over its quality. I imagine this is a common occurrence, since a lot of apps use files/data produced by sources other than themselves.
If there is a way to change the default treatment, that would be a more sensible approach in my opinion.
I have the same setup as yourself (SQLAlchemy for the ORM, and data that I have little control over, fed through Excel files). I found that I need to curate the data from xlrd before dumping it into the database. I am not aware of any tweaks you can apply to the xlrd module itself.
On a more general note:
It is probably best to try to get as large a sample of example Excel files as you can and see if your application copes with them. I found that occasionally weird characters make it through Excel (people copy-paste from different languages), which causes crashes further down. I also found that in some cases the file encoding was not UTF-8 but ISO-8859 or something else; I ended up using iconv to convert the files.
You may also want to have a look at this Stack Overflow article.
Overall xlrd has worked for us, but I am less than impressed with the activity around the project. It seems like I am using a library that has little maintenance.
You could use the following code to change the value of every empty cell in the sheet to 'NULL' (or None, or whatever you like) before you actually read in the data. It loops through all rows and columns and, whenever the cell_type is EMPTY, changes the value of that cell to 'NULL'.
import xlrd

book = xlrd.open_workbook("data.xlsx")
sheet_name = book.sheet_names()[0]  # get the name of the first sheet
sheet = book.sheet_by_name(sheet_name)

for r in range(0, sheet.nrows):  # loop through all rows that contain data
    for c in range(0, sheet.ncols):  # loop through all columns that contain data
        if sheet.cell_type(r, c) == xlrd.XL_CELL_EMPTY:
            sheet._cell_values[r][c] = 'NULL'
Then you can read in the data (e.g. from the first column) and you will get NULL as a value if the cell was previously empty:
for r in range(0, sheet.nrows):
    data_column_1 = sheet.cell(r, 0).value
xlrd will tell you what type of cell you have (empty or blank, text, number, date, error).
This is covered in the xlrd documentation. Look at the Cell class, and these methods of the Sheet class: cell_type, col_types, and row_types.
The csv format has no way of expressing the difference between "no data at all" and "the value is a zero-length string". You will still need to check for '' and act accordingly.
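To make that concrete, here is a short sketch of helpers that fold both cases into None; the function names, and the choice to treat blank cells like empty ones, are my own illustration rather than part of either library:

import xlrd

def cell_or_none(sheet, r, c):
    # map xlrd's empty and blank cell types to None for database use
    if sheet.cell_type(r, c) in (xlrd.XL_CELL_EMPTY, xlrd.XL_CELL_BLANK):
        return None
    return sheet.cell(r, c).value

def csv_or_none(value):
    # csv cannot distinguish "no data at all" from '', so treat '' as None
    return None if value == '' else value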