Extracting Data From a Pandas object to put into JIRA - python

df is a pandas DataFrame with 13 columns of data, of which I want to push just two columns into JIRA by creating new issues. It is a 272x13 object. Each column represents a different field for an issue in JIRA. Every new issue created in JIRA should take its data from just two columns in df: Summary and Comments.
How do I extract every value from those two columns as I go through each row in a for loop? I only want the string value from each row and column, with no index and no object wrapper. My code is below:
from jira.client import JIRA
import pandas as pd

df = pd.read_csv('C:\\Python27\\scripts\\export.csv')
# Set the column names from the export.csv file equal to variables using the
# pandas python module
# Loop to create new issues
for row in df.iterrows():
    summ = str(df.loc[row.index, 'Summary'])[:30]
    comments = str(df.loc[row.index, 'Comments'])
    jira.create_issue(project={'key': 'DEL'}, summary=summ, description=comments, issuetype={'name': 'Bug'})
When I do this I get the error:
Traceback (most recent call last):
File "C:\Python27\CQPython\cqpython-read-only\src\clearquest\JIRAprocess_Delta.py", line 86, in <module>
summ = str(df.loc[row.index, 'Summary'])[:30]
File "C:\Python27\CQPython\cqpython-read-only\src\clearquest\pandas\core\indexing.py", line 669, in __getitem__
return self._getitem_tuple(key)
File "C:\Python27\CQPython\cqpython-read-only\src\clearquest\pandas\core\indexing.py", line 252, in _getitem_tuple
return self._getitem_lowerdim(tup)
File "C:\Python27\CQPython\cqpython-read-only\src\clearquest\pandas\core\indexing.py", line 361, in _getitem_lowerdim
section = self._getitem_axis(key, axis=i)
File "C:\Python27\CQPython\cqpython-read-only\src\clearquest\pandas\core\indexing.py", line 758, in _getitem_axis
return self._get_label(key, axis=axis)
File "C:\Python27\CQPython\cqpython-read-only\src\clearquest\pandas\core\indexing.py", line 60, in _get_label
return self.obj._xs(label, axis=axis, copy=True)
File "C:\Python27\CQPython\cqpython-read-only\src\clearquest\pandas\core\frame.py", line 2281, in xs
loc = self.index.get_loc(key)
File "C:\Python27\CQPython\cqpython-read-only\src\clearquest\pandas\core\index.py", line 755, in get_loc
return self._engine.get_loc(key)
File "index.pyx", line 130, in pandas.index.IndexEngine.get_loc (pandas\index.c:3238)
File "index.pyx", line 147, in pandas.index.IndexEngine.get_loc (pandas\index.c:3085)
File "index.pyx", line 293, in pandas.index.Int64Engine._check_type (pandas\index.c:5393)
File "C:\Python27\CQPython\cqpython-read-only\src\clearquest\pandas\core\series.py", line 523, in __hash__
raise TypeError('unhashable type')
TypeError: unhashable type
Here is some example data that is showing up in JIRA for every single issue created in the comments field:
Issue 1:
0 NaN
1 Found that the Delta would leak packets receiv...
2 The Delta will reset everytime you disconnect ...
3 NaN
4 It's supposed to get logged when CP takes to l...
5 Upon upgrading the IDS via the BioMed menu, th...
6 Upon upgrading the IDS via the BioMed menu, th...
7 Upon upgrading the IDS via the BioMed menu, th...
8 Increased Fusion heap size and the SCC1 Initia...
9 Recheck using build 142+, after Matt delivers ...
10 When using WPA2, there is EAPOL key echange go...
11 When using WPA2, there is EAPOL key echange go...
12 NaN
13 NaN
14 NaN
...
I only want each issue to have its own string values, and not the index numbers or the NaN to show up like this:
Issue 1:
Issue 2: Found that the Delta would leak packets receiv...
Issue 3: The Delta will reset everytime you disconnect ...
...

The problem is in the use of iterrows.
From the documentation (http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.iterrows.html), df.iterrows() iterates over DataFrame rows as (index, Series) pairs.
What you need is to replace row.index with row[0], which gives you the index of the row you are iterating over:
for row in df.iterrows():
    summ = str(df.loc[row[0], 'Summary'])[:30]
    comments = str(df.loc[row[0], 'Comments'])
By the way, I think you don't need iterrows at all:
for row_index in df.index:
    summ = str(df.loc[row_index, 'Summary'])[:30]
    comments = str(df.loc[row_index, 'Comments'])
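A minimal sketch (with made-up two-row data standing in for the 272x13 export) of how to pull plain string values per row while also mapping NaN to an empty string, so neither the index numbers nor the literal "nan" end up in the issue text:

```python
import pandas as pd

# Hypothetical sample frame; only the two columns the issues need
df = pd.DataFrame({
    'Summary': ['Delta packet leak', 'The Delta will reset every time you disconnect'],
    'Comments': [None, 'Found that the Delta would leak packets'],
})

issues = []
for row in df.itertuples(index=False):
    # str() on the scalar cell gives just the value, no index prefix
    summ = str(row.Summary)[:30]
    # replace NaN/None with an empty string instead of the text "nan"
    comments = '' if pd.isna(row.Comments) else str(row.Comments)
    issues.append((summ, comments))
```

itertuples yields one lightweight object per row, so each cell is already a scalar; there is no Series to accidentally stringify with its index.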

Related

Python - Pandas, csv row iteration to divide by half based on criteria met

I work with and collect data from the meters on the side of people's houses in our service area. I worked on a Python script to send high/low voltage alerts to my email whenever they occur, but the voltage originally came in as twice what it actually was (so instead of 125V it showed 250V), so I used pandas to divide the entire column by half. Well, it turns out a small handful of meters were programmed to send back the actual voltage of 125, so I can no longer halve the whole column and now must iterate and divide individually. I'm a bit new to scripting, so my problem might be simple.
df = pd.read_csv(csvPath)
if "timestamp" not in df:
    df.insert(0, 'timestamp', currenttimestamp)
for i in df['VoltageA']:
    if df['VoltageA'][i] > 200:
        df['VoltageA'][i] = df['VoltageA'][i]/2
        df['VoltageAMax'][i] = df['VoltageAMax'][i]/2
        df['VoltageAMin'][i] = df['VoltageAMin'][i]/2
df.to_csv(csvPath, index=False)
Timestamp is there just as a 'key' to avoid duplicate errors later in the same day.
Error I am getting,
Traceback (most recent call last):
File "C:\Users\admin\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\indexes\range.py", line 385, in get_loc
return self._range.index(new_key)
ValueError: 250 is not in range
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "C:\Users\admin\Documents\script.py", line 139, in <module>
updateTable(csvPath, tableName, truncate, email)
File "C:\Users\admin\Documents\script.py", line 50, in updateTable
if df['VoltageA'][i]>200.0:
File "C:\Users\admin\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\series.py", line 958, in __getitem__
return self._get_value(key)
File "C:\Users\admin\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\series.py", line 1069, in _get_value
loc = self.index.get_loc(label)
File "C:\Users\admin\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\indexes\range.py", line 387, in get_loc
raise KeyError(key) from err
KeyError: 250.0
If this isn't enough and you actually need to see a csv snippet, let me know. Just trying not to put unnecessary info here. Note, the first VoltageA value is 250.0
The example code below shows how to use loc to conditionally change the values in multiple columns:
import pandas as pd

df = pd.DataFrame({
    'voltageA': [190, 230, 250, 100],
    'voltageMax': [200, 200, 200, 200],
    'voltageMin': [100, 100, 100, 100]
})
df.loc[df['voltageA'] > 200, ['voltageA', 'voltageMax', 'voltageMin']] = df.loc[df['voltageA'] > 200, ['voltageA', 'voltageMax', 'voltageMin']]/2
df
Output
voltageA  voltageMax  voltageMin
     190         200         100
     115         100          50
     125         100          50
     100         200         100
The data in the 2nd and 3rd rows were divided by 2 because in the original data the value of voltageA in those two rows exceeds 200.
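The same pattern applied to the question's own column names (sample values are made up):

```python
import pandas as pd

# Hypothetical rows: the first meter reports doubled voltage, the second does not
df = pd.DataFrame({
    'VoltageA':    [250.0, 125.0],
    'VoltageAMax': [260.0, 130.0],
    'VoltageAMin': [240.0, 120.0],
})

cols = ['VoltageA', 'VoltageAMax', 'VoltageAMin']
mask = df['VoltageA'] > 200          # rows whose meter doubles the reading
df.loc[mask, cols] = df.loc[mask, cols] / 2
```

This avoids both the KeyError (no value-as-label lookup) and the chained-assignment pitfalls of writing through `df['col'][i]`.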

I cannot trim a string in a dataframe if the string is in the first record of the dataframe

I use Python 3.8 with pandas. I have some data in a csv file. I read the data into a pandas dataframe and try to delete some parts of the Client_Name strings. Some Client_Name values look like Client_Name = myserv(4234), so I have to remove (4234) to make Client_Name = myserv. In short, I have to strip the parenthesized part like (4234) from the Client_Name values.
df.Client_Name[df.Client_Name.str.contains('\(')] = df.Client_Name.str.split("(")[0]
I wrote the code above and it does clean the parentheses from the Client_Name values. My problem is that if the (4234) is in the first row of the dataframe it gives an error. If (4234) is in other rows there is no problem.
The working data is :
,time,Client_Name,Minutes
0,2018-10-14T21:01:00Z,myserv1,5
1,2018-10-14T21:01:00Z,myserv2,5
2,2018-10-14T21:01:00Z,myserv3(4234),6
3,2018-10-14T21:01:00Z,myserv4,6
4,2018-10-14T21:02:07Z,myserv5(4234),3
5,2018-10-14T21:02:29Z,myserv6(4234),3
When i run my code it deletes the (4234) s and data turn into below format :
,time,Client_Name,Minutes
0,2018-10-14T21:01:00Z,myserv1,5
1,2018-10-14T21:01:00Z,myserv2,5
2,2018-10-14T21:01:00Z,myserv3,6
3,2018-10-14T21:01:00Z,myserv4,6
4,2018-10-14T21:02:07Z,myserv5,3
5,2018-10-14T21:02:29Z,myserv6,3
But if the (4234) is on the first row like below, my code throws error :
,time,Client_Name,Minutes
0,2018-10-14T21:01:00Z,myserv1(4234),5
1,2018-10-14T21:01:00Z,myserv2,5
2,2018-10-14T21:01:00Z,myserv3,6
3,2018-10-14T21:01:00Z,myserv4,6
4,2018-10-14T21:02:07Z,myserv5,3
5,2018-10-14T21:02:29Z,myserv6,3
The error is :
test1.py:97: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
df.Client_Name[df.Client_Name.str.contains('\(')] = df.Client_Name.str.split("(")[0]
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/pandas/core/series.py", line 972, in __setitem__
self._set_with_engine(key, value)
File "/usr/local/lib/python3.8/dist-packages/pandas/core/series.py", line 1005, in _set_with_engine
loc = self.index._engine.get_loc(key)
File "pandas/_libs/index.pyx", line 70, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 75, in pandas._libs.index.IndexEngine.get_loc
TypeError: '0 True
1 False
2 False
3 False
4 False
...
116 False
117 False
118 False
119 False
120 False
Name: Client_Name, Length: 121, dtype: bool' is an invalid key
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "test1.py", line 97, in <module>
df.Client_Name[df.Client_Name.str.contains('\(')] = df.Client_Name.str.split("(")[0]
File "/usr/local/lib/python3.8/dist-packages/pandas/core/series.py", line 992, in __setitem__
self._where(~key, value, inplace=True)
File "/usr/local/lib/python3.8/dist-packages/pandas/core/generic.py", line 9129, in _where
new_data = self._mgr.putmask(
File "/usr/local/lib/python3.8/dist-packages/pandas/core/internals/managers.py", line 579, in putmask
return self.apply(
File "/usr/local/lib/python3.8/dist-packages/pandas/core/internals/managers.py", line 427, in apply
applied = getattr(b, f)(**kwargs)
File "/usr/local/lib/python3.8/dist-packages/pandas/core/internals/blocks.py", line 1144, in putmask
raise ValueError("cannot assign mismatch length to masked array")
ValueError: cannot assign mismatch length to masked array
Your slicing method generates a copy, which you then modify; this is what gives the warning.
You could use instead:
df['Client_Name'] = df['Client_Name'].str.replace('\(.*?\)', '', regex=True)
output:
time Client_Name Minutes
0 2018-10-14T21:01:00Z myserv1 5
1 2018-10-14T21:01:00Z myserv2 5
2 2018-10-14T21:01:00Z myserv3 6
3 2018-10-14T21:01:00Z myserv4 6
4 2018-10-14T21:02:07Z myserv5 3
5 2018-10-14T21:02:29Z myserv6 3
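A self-contained sketch of the replacement on a small made-up frame (a raw string for the pattern also avoids the invalid-escape warning for '\(' in newer Python versions):

```python
import pandas as pd

df = pd.DataFrame({'Client_Name': ['myserv1(4234)', 'myserv2', 'myserv3(99)']})
# Non-greedy match strips each parenthesized suffix, leaving other names untouched
df['Client_Name'] = df['Client_Name'].str.replace(r'\(.*?\)', '', regex=True)
```

Because this assigns a whole new column rather than writing through a boolean-indexed slice, it works regardless of which row contains the parentheses.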

Boolean index with Numba with strings and datetime64

I am trying to convert a function that generates a Boolean index based on a date and a name to work with Numba, but I get an error.
My project start with a Dataframe TS_Flujos with a structure as follow.
Fund name  Date    Var Commitment  Cash flow
Fund 1     Date 1             100        -20
Fund 1     Date 2              10         -2
Fund 1     Date 3               0        +10
Fund 2     Date 3             100          0
Fund 2     Date 4               0        -10
Fund 3     Date 2             100        -20
Fund 3     Date 3              20         30
Each line is a cashflow for a specific fund. For each line I need to calculate the commitment cumulated to date and subtract the amount funded to date, the "unfunded". For that, I iterate over the dataframe TS_Flujos, identify the fund and date, and use a Boolean index to identify the other "relevant rows" in the dataframe, the ones for the same fund with dates prior to the current one, using the following function:
def date_and_fund(date, fund, c_dates, c_funds):
    i1 = (c_dates <= date)
    i2 = (c_funds == fund)
    result = i1 & i2
    return result
And I run the following loop:
n_flujos = TS_Flujos.to_numpy()
for index in range(len(n_flujos)):
    f = n_dates[index]
    p = n_funds[index]
    date_fund = date_and_fund(f, p, n_dates, n_funds)
    TS_Flujos['Cumulated commitment'].values[index] = n_varCommitment[date_fund].sum()
This is a simplification; I also segregate the cashflows by type and calculate many other indicators for each row. For now I have 44,000 rows, but this number should increase a lot in the future, and this loop already takes 1 to 2 minutes depending on the computer. I am worried about the speed when I 10x the cashflow database, and this is a small part of the total project. I have tried to understand how to use your previous answer to optimize it, but I can't find a way to vectorize or use a list comprehension here.
Because there is no dependency between the calculations, I tried to parallelize the code with Numba:
@njit(parallel=True)
def cashflow(array_cashflows):
    for index in prange(len(array_cashflows)):
        f = n_dates[index]
        p = n_funds[index]
        date_fund = date_and_fund(f, p, n_dates, n_funds)
        TS_Flujos['Cumulated commitment'].values[index] = n_varCommitment[date_fund].sum()
    return

flujos(n_dates)
But I get the following error:
Traceback (most recent call last):
File "<input>", line 1, in <module>
File "C:\Program Files\JetBrains\PyCharm 2020.1.3\plugins\python\helpers\pydev\_pydev_bundle\pydev_umd.py", line 197, in runfile
pydev_imports.execfile(filename, global_vars, local_vars) # execute the script
File "C:\Program Files\JetBrains\PyCharm 2020.1.3\plugins\python\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "C:/Users/ferna/OneDrive/Python/Datalts/Dataltsweb.py", line 347, in <module>
flujos(n_fecha)
File "C:\Users\ferna\venv\lib\site-packages\numba\core\dispatcher.py", line 415, in _compile_for_args
error_rewrite(e, 'typing')
File "C:\Users\ferna\venv\lib\site-packages\numba\core\dispatcher.py", line 358, in error_rewrite
reraise(type(e), e, None)
File "C:\Users\ferna\venv\lib\site-packages\numba\core\utils.py", line 80, in reraise
raise value.with_traceback(tb)
numba.core.errors.TypingError: Failed in nopython mode pipeline (step: nopython frontend)
Untyped global name 'date_and_pos': cannot determine Numba type of <class 'function'>
File "Dataltsweb.py", line 324:
def flujos(array_flujos):
<source elided>
p = n_idpos[index]
fecha_pos = date_and_pos(f, p, n_fecha, n_idpos)
^
Given the way you have structured your code, you won't gain any performance by using Numba. You're using the decorator on a function that is already vectorized and will perform fast. What would make sense is to try to speed up the main loop, not just CapComp_MO.
In relation to the error, it seems to have to do with the types. Try adding explicit typing to see if that solves the issue; here are Numba's datatypes for datetime objects.
I'd also recommend avoiding .iterrows() for performance reasons; see this post for an explanation.
As a side note, t1[:] takes a full slice and is the same as t1.
Also, if you add a minimal example (code and dataframes), it might help in improving your current approach. It looks like you're just indexing in each iteration, so you might not need to loop at all if you use numpy.
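One way the loop might vanish entirely: if each row's "relevant rows" are the same fund with dates up to the current one, and the frame is sorted by fund and date, that sum is a per-group running total. A sketch on a made-up frame mirroring the question's columns (assuming this sorted-order interpretation; equal dates within a fund would need extra care):

```python
import pandas as pd

# Hypothetical data shaped like the question's TS_Flujos
TS_Flujos = pd.DataFrame({
    'Fund name': ['Fund 1', 'Fund 1', 'Fund 1', 'Fund 2', 'Fund 2'],
    'Date': pd.to_datetime(['2020-01-01', '2020-02-01', '2020-03-01',
                            '2020-03-01', '2020-04-01']),
    'Var Commitment': [100, 10, 0, 100, 0],
})

# With rows sorted by fund then date, the per-fund cumulative sum equals
# "sum of Var Commitment for the same fund with Date <= current Date"
TS_Flujos = TS_Flujos.sort_values(['Fund name', 'Date'])
TS_Flujos['Cumulated commitment'] = (
    TS_Flujos.groupby('Fund name')['Var Commitment'].cumsum()
)
```

This replaces the O(n^2) boolean-index-per-row pattern with a single grouped pass, which scales far better than either the Python loop or the Numba attempt.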

Python Pandas: last-element-of-a-column check in an if condition fails. Why?

I have a computed pandas dataframe df which has floating-point values, a string index, and NaN values. My logic is very simple: in a particular column I am trying to check the last element's value (with ==, >=, or <=; one example is given below).
In the if condition an error is thrown and the execution terminates. Why is this happening, and how do I resolve this issue? Does NaN cause the problem?
Time close ST_100_0.885 ST_Test
---- 0.134 1.1111 nan
---- 1.258 2.2222 dn
---- 3.255 3.1212 up
---- 4.256 4.3232 up
---- 4.356 5.4343 dn
import requests
import time
import pandas as pd
import json
from pandas.io.json import json_normalize

df = df.fillna(0)
if (df['ST_100_0.885'][-1] <= df['close'][-1]):
    print("test passed")
Some of the checks I have done show that the df has float values, which is as expected.
Errors that I am getting while debugging:
if (df['ST_100_0.885'][-1] <= df['close'][-1]):
in __getitem__
result = self.index.get_value(self, key)
in get_value
tz=getattr(series.dtype, 'tz', None))
File "pandas\_libs\index.pyx", line 98, in pandas._libs.index.IndexEngine.get_value (pandas\_libs\index.c:4363)
File "pandas\_libs\index.pyx", line 106, in pandas._libs.index.IndexEngine.get_value (pandas\_libs\index.c:4046)
File "pandas\_libs\index.pyx", line 154, in pandas._libs.index.IndexEngine.get_loc (pandas\_libs\index.c:5085)
File "pandas\_libs\hashtable_class_helper.pxi", line 756, in pandas._libs.hashtable.Int64HashTable.get_item (pandas\_libs\hashtable.c:13913)
File "pandas\_libs\hashtable_class_helper.pxi", line 762, in pandas._libs.hashtable.Int64HashTable.get_item (pandas\_libs\hashtable.c:13857)
KeyError: -1
close float64
high float64
low float64
open float64
time datetime64[ns, US/Alaska]
volumefrom float64
volumeto float64
dtype: object
Note that [-1] is used to access the last element. I also need to access the element before the last in my logic; if this is causing the problem, how do I access the last and second-to-last elements correctly?
You can use tail:
df = df.fillna(0)
if (df['ST_100_0.885'].tail(1) <= df['close'].tail(1)):
    print("test passed")
Edit:
df = df.fillna(0)
if (df['ST_100_0.885'][df.index[-1]] >= df['close'][df.index[-1]]):
    print("test passed")
out:
test passed
You can also try using iget.
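Positional access with .iloc is another option: it bypasses the label lookup that raised KeyError: -1, and handles both the last and the second-to-last element (a sketch with the sample column values from the question):

```python
import pandas as pd

s = pd.Series([0.134, 1.258, 3.255, 4.256, 4.356])
last = s.iloc[-1]   # last element, by position
prev = s.iloc[-2]   # element before the last, by position
```

Plain `s[-1]` fails on an integer index because pandas treats -1 as a label there; .iloc is always positional, so negative indices work the same as on a Python list.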

Simple Pandas issue, Python

I want to import a txt file and do a few basic actions on it.
For some reason I keep getting an unhashable type error, and I'm not sure what the issue is:
def loadAndPrepData(filepath):
    import pandas as pd
    pd.set_option('display.width', 200)
    dataFrame = pd.read_csv(filepath, header=0, sep='\t')  # set header to first row and sep by tab
    df = dataFrame[0:639,:]
    print df

filepath = 'United States Cancer Statistics, 1999-2011 Incidencet.txt'
loadAndPrepData(filepath)
Traceback:
Traceback (most recent call last):
File "C:\Users\Michael\workspace\UCIIntrotoPythonDA\src\Michael_Madani_week3.py", line 16, in <module>
loadAndPrepData(filepath)
File "C:\Users\Michael\workspace\UCIIntrotoPythonDA\src\Michael_Madani_week3.py", line 12, in loadAndPrepData
df = dataFrame[0:639,:]
File "C:\Users\Michael\Anaconda\lib\site-packages\pandas\core\frame.py", line 1797, in __getitem__
return self._getitem_column(key)
File "C:\Users\Michael\Anaconda\lib\site-packages\pandas\core\frame.py", line 1804, in _getitem_column
return self._get_item_cache(key)
File "C:\Users\Michael\Anaconda\lib\site-packages\pandas\core\generic.py", line 1082, in _get_item_cache
res = cache.get(item)
TypeError: unhashable type
The problem is that the item getter ([]) needs hashable keys. When you provide it with [:] this is fine, but when you provide it with [:,:], you get this error:
pd.DataFrame({"foo":range(1,10)})[:,:]
TypeError: unhashable type
While this works just fine:
pd.DataFrame({"foo":range(1,10)})[:]
However, you should be using .loc no matter how you want to slice.
pd.DataFrame({"foo":range(1,10)}).loc[:,:]
foo
0 1
1 2
2 3
3 4
4 5
5 6
6 7
7 8
8 9
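For the original goal of keeping the first 639 rows, positional slicing with .iloc is the natural fit (shown here on a small made-up frame):

```python
import pandas as pd

df = pd.DataFrame({'foo': range(1, 10)})
first_five = df.iloc[0:5, :]   # first 5 rows, all columns, by position
```

.loc slices by label while .iloc slices by integer position; for a row-count cutoff like 0:639 the positional form says exactly what is meant.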
