I want to print data from an Excel document that I imported. Each row comprises a Description and an Urgence level. I want to print in red each row that contains the "Urgent" value, and I wrote a function (red_text) that works for that.
I can print entire rows in red, but I can't figure out how to only print the rows whose Urgence column contains "Urgent". Here is my code:
Import of the file
import pandas as pd
from dash import Dash, html, dcc, dash_table
# Importing the file
file = r"test.xlsx"
try:
    df = pd.read_excel(file)
except OSError:
    print("Impossible to read:", file)
# Function to add red color to a text
def red_text(textarea):
    return html.P(
        [
            html.Span(textarea, style={"color": "red"}),
        ]
    )
Iterating through each row to put it into a test[] list and then applying a style to each row
# Loop for each row if there is an urgent statement, put the row in red
test = []
for index, row in df.iterrows():
    x = row['Description'] + ' | ' + row['Urgence']
    test.append(x)
    # **HERE** -> statement to apply a color with the red_text function
    if row['Urgence'] == "Urgent":
        red_text(test)
This last statement prints the full table in red, but I only want to apply the red_text function to the rows that have "Urgent" in their Urgence column.
Edit: the Excel file is a basic two-column file:
Thank you
Given that I can't verify the output because there is no reproducible example, I think you want to do something like:
df = pd.DataFrame({'Description': ['urgent stuff', 'asdasd', 'other urgent'],
                   'Urgence': ['Urgent', 'sadasd', 'Urgent']})
print(df)
urgent_stuff = df.loc[df['Urgence'] == "Urgent"]
print('------------')
print(urgent_stuff)
print('++++++++++++')
for row in urgent_stuff.iterrows():
    red_html = red_text(row)  # I am not sure what textarea is supposed to be, nor what html.P is
    print(red_html)
the output is:
Description Urgence
0 urgent stuff Urgent
1 asdasd sadasd
2 other urgent Urgent
------------
Description Urgence
0 urgent stuff Urgent
2 other urgent Urgent
++++++++++++
NameError: name 'html' is not defined
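Putting that filter back into the shape of the original loop, here is a minimal sketch. The Dash html.Span helper is swapped for a plain string tag purely so the example runs without Dash installed; in the real app red_text would keep returning an html.P. The key change is calling red_text on the single line, not on the whole accumulated list:

```python
import pandas as pd

# Hypothetical stand-in for the Dash red_text helper, so the sketch runs
# without Dash; in the real app this returns html.P / html.Span components.
def red_text(textarea):
    return f'<span style="color: red">{textarea}</span>'

df = pd.DataFrame({'Description': ['fix server', 'tidy desk'],
                   'Urgence': ['Urgent', 'Normal']})

rows = []
for _, row in df.iterrows():
    line = row['Description'] + ' | ' + row['Urgence']
    # Wrap only the current line in red when this row is urgent,
    # instead of passing the whole test list to red_text.
    rows.append(red_text(line) if row['Urgence'] == 'Urgent' else line)

print(rows)
```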
I have been working with PySimpleGUI, trying to go from the initially empty table presented in Figure 1, where the initial table at the bottom is empty and has three columns.
Figure 1:
Here I use a script where you input your databases and click -GENERATE- to apply some statistics, presenting the result as an image on the right side and a table with statistical data at the bottom.
Here you can see the script:
Script (irrelevant parts deleted):
# TABLE DATA
data = [  # a list keeps row order; the original set literal {} would not
    ("Count", "-", "-"),
    ("Average", "-", "-"),
    ("Median", "-", "-"),
    ("Min", "-", "-"),
    ("Max", "-", "-"),
    ("StdDev", "-", "-"),
    ("Q1", "-", "-"),
    ("Q2", "-", "-"),
]
headings = ["STAT", "DATABASE A", "DATABASE B"] #list
# Table generation:
list_variables = ["Count", "Average", "Median", "Min", "Max", "StdDev", "Q1", "Q3"]
dicts = {}
def tablegen(imp_dict):  # enter dictionary from -FOLDERS-
    for k in imp_dict.items():
        del k[1]["survey"]
        col = k[1].iloc[:, 0]
        v = [k[1].shape[0], np.average(col), np.median(col), min(col), max(col),
             np.std(col), np.quantile(col, 0.25), np.quantile(col, 0.75)]
        final[k[0]] = v
# LAYOUT
layout = [
    [sg.Button('GENERATE'), sg.Button('REMOVE')],
    [sg.Text('Generated table:')],
    [sg.Table(values=data, headings=headings, max_col_width=25,
              auto_size_columns=True,
              display_row_numbers=False,
              justification='center',
              num_rows=5,
              alternating_row_color='lightblue',
              key='-TABLE-',
              selected_row_colors='red on yellow',
              enable_events=True,
              expand_x=False,
              expand_y=True,
              vertical_scroll_only=True,
              tooltip='This is a table')]
]
window = sg.Window('Tool', layout)
# ------ Loops ------
while True:
    event, values = window.read()  # added: event is otherwise undefined in the snippet
    if event == 'GENERATE':  # problems
        selection(file_list)  # some functions blah blah, it generates a csv file called "minimum"
        # This archive (minimum.csv) is produced after clicking -GENERATE- to make the desired table (it applies some functions).
        file_loc2 = real_path + "/minimum.csv"
        try:
            archive = pd.read_csv(file_loc2, sep=",")
            df_names = pd.unique(archive["survey"])  # column names
            for name in df_names:  # enter initial archive
                dicts[name] = pd.DataFrame(data=archive.loc[archive["survey"] == name],
                                           columns=("Wavelength_(nm)", "survey"))  # iteration blah blah
            tablegen(dicts)  # this generates the statistical values for the table
            final_df = pd.DataFrame(data=final, index=list_variables, columns=df_names)
            final_df = final_df.round(decimals=1)
            final_lists = final_df.values.tolist()
            # I tried using a DataFrame (final_df), which produced the table in figure 2, and a list of
            # lists (final_lists, as recommended on the PySimpleGUI web page), which produced figure 3.
            window["-TABLE-"].update(final_df)  # or .update(final_lists)
        except Exception as E:
            print(f'** Error {E} **')  # if something weird happened making the full filename
window.close()
The issue is this:
The second and third figures show how this script uses the information from the folders (databases) selected in the left square, generates the image, and is supposed to present the DataFrame shown below.
GOAL TABLE TO PRESENT:
final_df:
13 MAR 2018 14 FEB 2018 16 FEB 2018 17 FEB 2018
Count 84.0 25.0 31.0 31.0
Average 2201.5 2202.1 2203.1 2202.9
Median 2201.0 2202.0 2204.0 2204.0
Min 2194.0 2197.0 2198.0 2198.0
Max 2208.0 2207.0 2209.0 2209.0
StdDev 4.0 3.0 3.5 3.5
Q1 2198.0 2199.0 2199.5 2199.5
Q3 2205.0 2205.0 2206.0 2206.0
Figure 2: This is using a DataFrame as input in the -GENERATE- loop.
Figure 3: This is using a list of lists as input in the -GENERATE- loop.
As can be observed, "-TABLE-" is not working the way I intend: with a DataFrame it only picks up the column names, and with a list of lists it ignores the index and column names from the intended goal table.
Also, in both cases the table does not generate more columns, even though there should be 5 including the index. And how can I change the column names from the initially provided ones?
In the PySimpleGUI demos and call references I cannot find anything that solves this; I also searched the web and older StackOverflow posts, but honestly I have not found a similar case.
I'll be really grateful if somebody can help me find what I am doing wrong!
Btw sorry for my English, Colombian here.
Does the number of columns increase per record? Most of the time it is the number of rows of a Table element that grows per record. Maybe you should take ['Date'] + list_variables as the headings of the Table element and add each folder as one new row of the Table element.
import pandas as pd
import PySimpleGUI as sg
file = r'd:/Qtable.csv'
"""
Date,Count,Average,Median,Min,Max,StdDev,Q1,Q3
13-Mar-18,84,2201.5,2201,2194,2208,4,2198,2205
14-Mar-18,25,2202.1,2202,2197,2207,3,2199,2205
16-Mar-18,31,2203.1,2204,2198,2209,3.5,2199.5,2206
17-Mar-18,31,2202.9,2204,2198,2209,3.5,2199.5,2206
"""
df = pd.read_csv(file, header=None)
values = df.values.tolist()
headings = values[0]
data = values[1:]
layout = [[sg.Table(data, headings=headings, key='-TABLE-')]]
sg.Window('Table', layout).read(close=True)
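If final_df is already in memory, the same row-per-date shape can be produced without writing and re-reading a CSV. A sketch with a hypothetical miniature of final_df (stats as rows, dates as columns, as in the goal table):

```python
import pandas as pd

# Hypothetical miniature of final_df from the question.
final_df = pd.DataFrame(
    {'13 MAR 2018': [84.0, 2201.5], '14 FEB 2018': [25.0, 2202.1]},
    index=['Count', 'Average'])

# Transpose so each date becomes one table row, then pull the index
# back in as a regular 'Date' column so sg.Table can display it.
table_df = final_df.T.reset_index().rename(columns={'index': 'Date'})

headings = list(table_df.columns)   # headings for sg.Table(..., headings=headings)
data = table_df.values.tolist()     # rows for sg.Table(values=data, ...)
print(headings)
print(data)
```

The same headings/data pair can then be passed to sg.Table, or to window['-TABLE-'].update(values=data).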
I'm using openpyxl 2.5.6 and Python 3.7.0. My goal is to read an Excel workbook and print both the contents and the formatting of each cell into a CSV. For instance, if a cell is blue with the text "Data", I would prepend a "[blu]" tag to the cell value, printing to the CSV as "[blu]Data", and likewise for a bolded cell, other fill colors, etc.
I can do this perfectly fine for cells with static formatting, but not with conditional formatting. My issue is that I don't know how to tell whether a conditional formatting rule is applied. I found the conditional_formatting._cf_rules dict, but I only see attributes like formula, priority, dxfId, and the dxf rules themselves. I want to believe that whether a cf rule is applied is stored somewhere, but I cannot find where it might be.
My code thus far looks something like this.
from openpyxl import load_workbook
wb = load_workbook('Workbook_Name.xlsx', data_only = True)
ws = wb['Worksheet1']
# Code that shows me each cf rule's formula, fill type, priority, etc
cellrangeslist = list(ws.conditional_formatting._cf_rules)
for cellrange in cellrangeslist:
    print('{:30s}{:^10s}{:30s}'.format('----------------------------', str(cellrange.sqref), '----------------------------'))
    for i in cellrange.cfRule:
        print('{:10s}{:8s}{:40s}{:10s}{:10s}'.format(str(i.dxf.fill.bgColor.index), str(i.dxf.fill.bgColor.type), str(i.formula), str(i.stopIfTrue), str(i.priority)))
# This is where I want to be able to identify which cf rule is applied to a given cell
#
#
#
# Code that interprets cell styling into appropriate tags, e.g.
for r in ws.iter_rows(min_row=ws.min_row, max_row=ws.max_row,
                      min_col=ws.min_column, max_col=ws.max_column):
    for cell in r:
        if cell.font.b == True:
            cell.value = "[bold]" + cell.value
# Code to write each cell as a string literal to a CSV file
#
#
#
My Excel file looks like this,
A1 == 1234
B1 == 1235
C1 == '=A1-B1'
And my cf rules look like this,
Formula: =$A1 - $B1 < 0, Format: [red fill], Applies to: =$C$1
Formula: =$A1 - $B1 > 0, Format: [green fill], Applies to: =$C$1
The console output I receive from the above code is
---------------------------- C1 ----------------------------
FF92D050 rgb ['$A1-$B1>0'] None 2
FFFF0000 rgb ['$A1-$B1<0'] None 1
The output shows the rules are properly there, but I want to know if there's a way to tell which of these rules, if any, is actually applied to the cell.
I have a growing suspicion that this is calculated at runtime by Excel, so my alternative is to write an Excel formula interpreter, but I'm really hoping to avoid that by just about any means, as I'm not sure I have the skill to do it.
If you don't find a better option, following on from my comment this is an example of what you could do with Xlwings.
For the example output shown, A1 is a higher number than B1, so cell C1 is green (A1 = 1236, B1 = 1235).
If A1 is changed back to 1234, C1's colour returns to red, and if the same code is run after the workbook is saved, the 'Colour applied to conditional format cell:' will be for 'Conditional Format 1', i.e. red.
import xlwings as xw
from xlwings.constants import RgbColor
def colour_lookup(cfc):
    cell_colour = (key for key, value in colour_dict.items() if value == cfc)
    for key in cell_colour:
        return key
colour_dict = { key: getattr(RgbColor, key) for key in dir(RgbColor) if not key.startswith('_') }
wb = xw.Book('test.xlsx')
ws = wb.sheets('Sheet1')
cf = ws['C1'].api.FormatConditions
print("Number of conditional formatting rules: " + str(cf._inner.Count))
print("Colour applied to conditional format cell:\n\tEnumerated: " +
      str(cf._inner.Parent.DisplayFormat.Interior.Color))
print("\tRGBColor: " + colour_lookup(cf._inner.Parent.DisplayFormat.Interior.Color))
print("------------------------------------------------")
for idx, cf_detail in enumerate(cf, start=1):
    print("Conditional Format " + str(idx))
    print(cf_detail._inner.Formula1)
    print(cf_detail._inner.Interior.Color)
    print("\tRGBColor: " + colour_lookup(cf_detail._inner.Interior.Color))
    print("")
Output
Number of conditional formatting rules: 2
Colour applied to conditional format cell:
Enumerated: 32768.0
RGBColor: rgbGreen
------------------------------------------------
Conditional Format 1
=$A1-$B1<0
255.0
RGBColor: rgbRed
Conditional Format 2
=$A1-$B1>0
32768.0
RGBColor: rgbGreen
xlwings
I have to work with a flat file (size > 500 MB) and I need to split it on one criterion.
My original file has this structure (simplified):
JournalCode|JournalLib|EcritureNum|EcritureDate|CompteNum|
I need to create two files depending on the first digit of 'CompteNum'.
Here is how I started my code:
import sys
import pandas as pd
import numpy as np
import datetime
C_FILE_SEP = "|"
def main(fic):
    pd.options.display.float_format = '{:,.2f}'.format
    FileFec = pd.read_csv(fic, sep=C_FILE_SEP, encoding='unicode_escape')
It seems ok; my concern is creating my 2 files based on the criterion. I have tried without success.
TargetFec = 'Target_' + fic + datetime.datetime.now().strftime("%Y%m%d-%H%M%S") + '.txt'
target = open(TargetFec, 'w')
FileFec = FileFec.astype(convert_dict)
for index, row in FileFec.iterrows():
    Fec_Cpt = str(row['CompteNum'])  # was str(FileFec['CompteNum']), which stringifies the whole column
    nb = len(Fec_Cpt)
    if nb > 7:
        target.write(str(row))
target.close()
The result in my target file is not what I expected:
(0, JournalCode OUVERT
JournalLib JOURNAL D'OUVERTURE
EcritureNum XXXXXXXXXX
EcritureDate 20190101
CompteNum 101300
CompteLib CAPITAL SOUSCRIT
CompAuxNum
CompAuxLib
PieceRef XXXXXXXXXX
PieceDate 20190101
EcritureLib A NOUVEAU
Debit 000000000000,00
Credit 000038188458,00
EcritureLet NaN
DateLet NaN
ValidDate 20190101
Montantdevise
Idevise
CodeEtbt 100
Unnamed: 19 NaN
And I expected to obtain a line in my target file whenever the first digit of CompteNum is greater than 7.
I have read many posts over the last 2 days; some help would be perfect.
There is a sample of my data available here
Philippe
Following the rules and the desired format, you can use logic like:
# criteria:
verify = df['CompteNum'].apply(lambda number: str(number)[0] == '8' or str(number)[0] == '9')
# saving the dataframes:
df[verify].to_csv('c:/users/jack/desktop/meets-criterios.csv', sep = '|', index = False)
Original comment:
As I understand it, you want to filter the imported dataframe according to some criteria. You can work directly on the pandas DataFrame you imported. Look:
# criteria:
verify = df['CompteNum'].apply(lambda number: len(str(number)) > 7)
# filtering the dataframe based on the given criteria:
df[verify] # meets the criteria
df[~verify] # does not meet the criteria
# saving the dataframes:
df[verify].to_csv('<your path>/meets-criterios.csv')
df[~verify].to_csv('<your path>/not-meets-criterios.csv')
Once you have the filtered dataframes, you can save them or convert them to other objects, such as dictionaries.
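For the original criterion (first digit of CompteNum greater than 7), a vectorized variant avoids the per-row lambda. A sketch assuming CompteNum holds plain account numbers (the sample values here are made up):

```python
import pandas as pd

# Hypothetical sample of the FEC data.
df = pd.DataFrame({'CompteNum': [101300, 812000, 920500, 706000],
                   'JournalLib': ['a', 'b', 'c', 'd']})

# Take the first character of each account number and compare it as an integer.
first_digit = df['CompteNum'].astype(str).str[0].astype(int)
mask = first_digit > 7

meets = df[mask]    # accounts starting with 8 or 9
rest = df[~mask]    # all other accounts
print(meets['CompteNum'].tolist())
print(rest['CompteNum'].tolist())
```

Each half can then be written out with to_csv(..., sep='|', index=False) as in the answer above.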
I have a CSV sheet with data like this:
| not used | Day 1 | Day 2 |
| Person 1 | Score | Score |
| Person 2 | Score | Score |
But with a lot more rows and columns. Every day I get progress of how much each person progressed, and I get that data as a dictionary where keys are names and values are score amounts.
The thing is, sometimes that dictionary will include new people and not include already-existing ones. If a new person comes, they should get 0 for every previous day; if the dict doesn't include an already-existing person, that person gets a score of 0 for that day.
My idea of solving this is doing lines = file.readlines() on that CSV file, making a new list of people's names with
for line in lines:
    names.append(line.split(",")[0])
then making a copy of lines (newLines = lines), and going through the dict's keys, seeing if that person is already in the CSV and, if so, appending the value followed by a comma.
But I'm stuck at the part of adding a score of 0.
Any help or contributions would be appreciated
EXAMPLE: Before, I will have this:
-,day1,day2,day3
Mark,1500,0,1660
John,1800,1640,0
Peter,1670,1680,1630
Hannah,1480,1520,1570
And I have this dictionary to add
{'Mark': 1750, 'Hannah':1640, 'Brian':1780}
The result should be
-,day1,day2,day3,day4
Mark,1500,0,1660,1750
John,1800,1640,0,0
Peter,1670,1680,1630,0
Hannah,1480,1520,1570,1640
Brian,0,0,0,1780
See how Brian is in the dict but not in the before CSV, and he got added with 0 for every other day's score. I figured out that one line's .split(',') gives a list of N elements, where N - 2 is the number of zero scores to add prior to that person's first day.
This is easy to do in pandas as an outer join. Read the CSV into a dataframe and generate a new dataframe from the dictionary. The join is almost what you want, except that not-a-number values are inserted for empty cells, so you need to fill the NaNs with zero and reconvert everything to integer.
The one potential problem is that the resulting CSV is sorted: the new rows are not simply appended to the bottom.
import pandas as pd
import errno
import os
INDEX_COL = "-"
def add_days_score(filename, colname, scores):
    try:
        df = pd.read_csv(filename, index_col=INDEX_COL)
    except OSError as e:
        if e.errno == errno.ENOENT:
            # file doesn't exist, create empty df
            df = pd.DataFrame([], columns=[INDEX_COL])
            df = df.set_index(INDEX_COL)  # was INDEX_COl, a NameError typo
        else:
            raise
    new_df = pd.DataFrame.from_dict({colname: scores})
    merged = df.join(new_df, how="outer").fillna(0).astype(int)
    try:
        merged.to_csv(filename + ".tmp", index_label=[INDEX_COL])
    except:
        raise
    else:
        os.rename(filename + ".tmp", filename)
    return merged
#============================================================================
# TEST
#============================================================================
test_file = "this_is_a_test.csv"
before = """-,day1,day2,day3
Mark,1500,0,1660
John,1800,1640,0
Peter,1670,1680,1630
Hannah,1480,1520,1570
"""
after = """-,day1,day2,day3,day4
Brian,0,0,0,1780
Hannah,1480,1520,1570,1640
John,1800,1640,0,0
Mark,1500,0,1660,1750
Peter,1670,1680,1630,0
"""
test_dicts = [
    ["day4", {'Mark': 1750, 'Hannah': 1640, 'Brian': 1780}],
]
open(test_file, "w").write(before)
for name, scores in test_dicts:
    add_days_score(test_file, name, scores)
print("want\n", after, "\n")
got = open(test_file).read()
print("got\n", got, "\n")
if got != after:
    print("FAILED")
I am trying to parse data from CSV files. The files are in a folder, and I want to extract data from them and write it to the db. However the CSVs are not set up in a table format. I know how to import CSVs into the db with the Foreach Loop Container, adding Data Flow Tasks and importing with an OLE DB Destination.
The problem is getting just one value out of these CSVs. The format of the file is as follows:
Title Title 2
Date saved ##/##/#### ##:## AM
Comment
[ Main ]
No. Measure Output Unit of measure
1 Name 8 µm
Count 0 pcs
[ XY Measure ]
X
Y
D
[ Area ]
No. Area Unit Perimeter Unit
All I want is the output, which is "8", plus the name of the file (to use as the name of the result, or to add as a column) and the date and time in their own columns.
I am not sure which direction to head in, and I hope someone has some things for me to look into. Originally, I wasn't sure whether I should do the parsing externally (in Python) before using SQL Server. If anyone knows another way to get this done, please let me know. Sorry for the unclear post earlier.
The expect outcome:
Filename Date Time Outcome
jnnnnnnn ##/##/#### ##:## 8
I'd try this:
filename = ...  # from the path of the file you're parsing
# define appropriate vars
for row in csv_file:
    if 'Date saved' in row:  # str.find() returns 0 here, so "find(...) > 0" would miss this line
        row = row.replace('Date saved ', '')
        date_saved = row[0:row.find(' ')]
        row = row.replace(date_saved + ' ', '')
        time = row[0:row.find(' ')]
    elif u"\u03BC" in row:  # find() returns -1 (truthy) on a miss, so use "in" instead
        split_row = row.split(' ')
        outcome = split_row[2]
# add filename, date_saved, time, outcome to data that will go in the DB
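A runnable sketch of that idea against lines mimicking the question's file format (the single-space splitting and the sample date are assumptions; real files may need a regex, and the micro sign can appear as either µ U+00B5 or Greek mu U+03BC):

```python
# Hypothetical sample lines mirroring the question's file layout.
lines = [
    "Title Title 2",
    "Date saved 01/02/2023 10:30 AM",
    "1 Name 8 \u00b5m",
]

date_saved = time = outcome = None
for row in lines:
    if 'Date saved' in row:
        rest = row.replace('Date saved ', '')
        parts = rest.split(' ')          # ['01/02/2023', '10:30', 'AM']
        date_saved, time = parts[0], parts[1]
    elif '\u00b5' in row or '\u03bc' in row:  # the micro sign marks the measurement row
        outcome = row.split(' ')[2]      # third token is the output value

print(date_saved, time, outcome)
```

From there the (filename, date_saved, time, outcome) tuple maps directly onto the expected outcome table.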