Concatenate columns of dataframe in array - python

I'm trying to build a data visualization app that takes a CSV file as input and lets the user select which columns to plot (not all columns are plotted). I already have the function that selects only some of the variables, but now I need to join those columns into a single data frame to work with. I tried this:
for i in range(0, len(data1.columns)):
    i = 0
    df = np.array(data1[data1.columns[i]])
    i +=1
    print(df)
But I only get the same column repeated once per selected column (i.e. if I select 5 columns, the same column is returned 5 times).
How do I make each iteration pick up a different column instead of always the same one?

The column repeats because you overwrite i on every iteration.
# For example `data1.columns` is ["a", "b", "c", "d", "e"]
# Your code:
for i in range(0, len(data1.columns)):
    i = 0 # Here, on every iteration, i is reset to 0
    print(i, data1.columns[i], sep=": ")
    i += 1
# Output:
# 0: a
# 0: a
# 0: a
# 0: a
# 0: a
The i = 0 and i += 1 lines are unnecessary because range already supplies i, running from 0 to len(data1.columns) - 1.
Fixed version:
for i in range(0, len(data1.columns)):
    print(i, data1.columns[i], sep=": ")
# Output:
# 0: a
# 1: b
# 2: c
# 3: d
# 4: e
Alternative versions that iterate over the elements directly, with a manual counter for i:
# First step, iterate over the columns
for col in data1.columns:
    print(col)
# Output:
# a
# b
# c
# d
# e
# Step two, manual increment to obtain the list (array) index
i = 0
for col in data1.columns:
    print(i, col, sep=": ")
    i += 1
# Output:
# 0: a
# 1: b
# 2: c
# 3: d
# 4: e
Helpful to know: enumerate
The enumerate(iterable) function is handy for getting both the index and the value itself.
print(list(enumerate(["Hello", "world"])))
# Output:
# [(0, 'Hello'), (1, 'world')]
Usage:
for i, col in enumerate(data1.columns):
    print(i, col, sep=": ")
# Output:
# 0: a
# 1: b
# 2: c
# 3: d
# 4: e
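Note that the loop in the question also reassigns df on every pass, so even with a correct index only the last column would survive. A minimal sketch of accumulating the columns instead is shown here; the plain dict data1 and the list selected are hypothetical stand-ins for the real DataFrame and the user's selection, used only to keep the example free of dependencies.

```python
# Hypothetical stand-in for the DataFrame: column name -> list of values.
data1 = {"a": [1, 2], "b": [3, 4], "c": [5, 6]}
# Hypothetical selection made by the user.
selected = ["a", "c"]

collected = []
for i, col in enumerate(selected):
    # Append each column instead of overwriting a single variable.
    collected.append(data1[col])

# Transpose so that each inner list is one row across the selected columns.
rows = [list(r) for r in zip(*collected)]
print(rows)
```

The same append-then-combine pattern should carry over to real DataFrame columns, although pandas can usually select several columns at once without an explicit loop.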

In the end I solved it by declaring an empty list before the loop, iterating over the selected variables, and saving their indexes in that list. That gives me a list with the indexes I should use for my visualization.
def get_index(name):
    '''
    return the index of a column name
    '''
    for column in df.columns:
        if column == name:
            index = df.columns.get_loc(column)
            return index
result=[]
for i in range(len(selected)):
    X = get_index(selected[i])
    result.append(X)
df = df[df.columns[result]]
x = df.values
Here 'selected' is the list of selected variables (I filter first by column name, then get its index number). I don't know if it's the most elegant way to do this, but it works well.
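For comparison, here is a shorter sketch of the same selection, assuming every name in selected exists in the frame. The sample data and column names are made up for illustration; indexing a DataFrame with a list of names, Index.get_indexer and DataFrame.to_numpy are standard pandas calls.

```python
import pandas as pd

# Made-up data standing in for the loaded CSV.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})
# Hypothetical list of column names chosen by the user.
selected = ["a", "c"]

# Positions of the selected columns, if the integer indexes are still needed.
positions = df.columns.get_indexer(selected).tolist()

# Indexing with a list of names returns just those columns, in that order.
subset = df[selected]
x = subset.to_numpy()

print(positions)
print(x)
```

Selecting by name avoids the round trip through integer positions, which is only needed if something downstream really wants the indexes.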

How to dump contents of an array to a pre-existing csv with hardcoded data in python

I have posted this question earlier.
At the output of this program, I get an array that has 4 elements
like this:
11111111,22222222,kkkkkk,lllllll
33333333,44444444,oooooo,ppppppp
qqqqqqqq,rrrrrr,ssssss,ttttttt
Now I have another CSV which has more columns (let's say 10), and some of those columns have hardcoded data, something like this:
head1,head2,head3,head4,head5,head6,head7,head8,head9,head10
-,123,345,something,<blank>,<blank>,-,-,-,-
So everything except the <blank> fields is hardcoded.
I want to print the first and second columns of my output in these blank spaces and repeat the hardcoded data on every row.
So my output should be something like this:
head1,head2,head3,head4,head5,head6,head7,head8,head9,head10
-,123,345,something,11111111,22222222,-,-,-,-
-,123,345,something,33333333,44444444,-,-,-,-
-,123,345,something,qqqqqqqq,rrrrrr,-,-,-,-
Approach:
1) Read lines from the done.csv and append them to separate lists.
2) Read the new CSV with the empty column data; let's call it missing_data.csv
3) Iterate for the number of lists in 1) i.e. 3 in your case.
4) Iterate over each column of missing_data.csv until an empty value is found
5) Fill the empty column with the list currently running from 3)
Hence:
1):
import pandas as pd
initial_data1 = []
initial_data2 = []
initial_data3 = []
line_num = 1
with open ("list.txt") as f:
    content = f.readlines()
    for line in content:
        if line_num == 1:
            initial_data1.append(line.split(","))
        elif line_num == 2:
            initial_data2.append(line.split(","))
        elif line_num == 3:
            initial_data3.append(line.split(","))
        line_num = line_num + 1
print(initial_data1)
print(initial_data2)
print(initial_data3)
OUTPUT:
[['11111111', '22222222', 'kkkkkk', 'lllllll\n']]
[['33333333', '44444444', 'oooooo', 'ppppppp\n']]
[['qqqqqqqq', 'rrrrrr', 'ssssss', 'ttttttt']]
The rest:
df = pd.read_csv("missing_data.csv")
heads = ['head1','head2','head3','head4','head5','head6','head7','head8','head9','head10']
appending_line = 0
for index, row in df.iterrows():
    # Pick the data list that belongs to the current row
    if appending_line == 0:
        initial_data = initial_data1
    elif appending_line == 1:
        initial_data = initial_data2
    elif appending_line == 2:
        initial_data = initial_data3
    j = 0
    k = 0
    appending_line += 1
    for i in range(0, len(heads)): # for the number of heads
        if str(row[heads[i]]) == " ":
            print("FOUND EMPTY COLUMN VALUE:", heads[i])
            print("APPENDING VALUE:", initial_data[j][k])
            row[heads[i]] = initial_data[j][k]
            k += 1
OUTPUT:
FOUND EMPTY COLUMN VALUE: head5
APPENDING VALUE: 11111111
FOUND EMPTY COLUMN VALUE: head6
APPENDING VALUE: 22222222
FOUND EMPTY COLUMN VALUE: head5
APPENDING VALUE: 33333333
FOUND EMPTY COLUMN VALUE: head6
APPENDING VALUE: 44444444
FOUND EMPTY COLUMN VALUE: head5
APPENDING VALUE: qqqqqqqq
FOUND EMPTY COLUMN VALUE: head6
APPENDING VALUE: rrrrrr
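The five steps above can also be sketched with only the standard csv module. This version assumes the blank cells in the template row are empty strings (the question shows them as <blank>), and it uses in-memory strings in place of the real files; fill_template and its n_values parameter are hypothetical names chosen for this example.

```python
import csv
import io

# In-memory stand-ins for the two files, with contents copied from the question.
DATA_TEXT = (
    "11111111,22222222,kkkkkk,lllllll\n"
    "33333333,44444444,oooooo,ppppppp\n"
    "qqqqqqqq,rrrrrr,ssssss,ttttttt\n"
)
TEMPLATE_TEXT = (
    "head1,head2,head3,head4,head5,head6,head7,head8,head9,head10\n"
    "-,123,345,something,,,-,-,-,-\n"
)


def fill_template(template_text, data_text, n_values=2):
    """Repeat the template row once per data row, filling its blank cells."""
    header, template = list(csv.reader(io.StringIO(template_text)))[:2]
    # Positions of the cells that are empty in the template row.
    blanks = [i for i, cell in enumerate(template) if cell.strip() == ""]
    rows = []
    for data_row in csv.reader(io.StringIO(data_text)):
        row = list(template)  # copy, so the hardcoded values repeat unchanged
        for pos, value in zip(blanks, data_row[:n_values]):
            row[pos] = value
        rows.append(row)
    return header, rows


header, rows = fill_template(TEMPLATE_TEXT, DATA_TEXT)
out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
writer.writerow(header)
writer.writerows(rows)
print(out.getvalue())
```

With real files, the StringIO objects would be replaced by open(...) handles; building every output row from a copy of the template keeps the hardcoded columns identical on each line.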

Issues calling values above each other in a matrix, python

This is my first post so let me know if I need to change anything!
I've created a grid based on the following input:
1;2;12;12;12
11;12;2;12;12
1;2;12;2;12
11;12;2;1;2
To create this grid I used this piece of code:
node = hou.pwd()
geo = node.geometry()
text = node.evalParm('text')
lines = text.splitlines()
numbers = [map(int, line.split(';') ) for line in lines]
geo.addAttrib(hou.attribType.Point, 'instance', -1)
for row, n in enumerate(numbers):
    for col, value in enumerate(n):
        pos = (col, 0.0, row)
        pt_add = geo.createPoint()
        pt_add.setAttribValue('instance', value)
        pt_add.setPosition(pos)
This works great and creates the grid with points spaced 1 apart and with the correct value.
Now I want to do the following:
if value != 0:
    a = ...  # value of current index
    b = ...  # value of index above
    if len(a) < len(b):
        if a[-len(a)] == b[-len(a)]:
            a = '0'
            b = b
        else:
            pass
    else:
        if len(a)== len(b):
            if len(a) < 2:
                pass
            else:
                a = '0'
                b = b + 'x'
        else:
            if a[-len(a)] == b[-len(a)]:
                b = a + 'y'
                a = '0'
            else:
                pass
Now I'm assuming I need to go over the rows and columns again but if I do that with a for loop then it won't allow me to call the value of the index above in that column. Could someone help me figure this out? And I'll need to change the value of "instance" to the new value.
As a brief explanation of what I'm trying to achieve:
Example image
Edit: Adding something other than x or y to the int to differentiate between a "11" with 1 changed to "0" under it and an "11" with 2 or 3 changed to "0" under them.
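One way to reach the value above the current cell is to loop over row indexes and look up the previous row at the same column. The sketch below runs outside Houdini and keeps the values as strings, since the pseudocode calls len() on them; pairs_with_above is a hypothetical helper name, and writing the result back to the 'instance' attribute is left out.

```python
# Same input as in the question, kept as strings rather than ints.
text = "1;2;12;12;12\n11;12;2;12;12\n1;2;12;2;12\n11;12;2;1;2"
grid = [line.split(";") for line in text.splitlines()]


def pairs_with_above(grid):
    """Yield (row, col, current, above) for every cell that has a cell above it."""
    # Start at row 1 because row 0 has nothing above it.
    for row in range(1, len(grid)):
        for col, current in enumerate(grid[row]):
            yield row, col, current, grid[row - 1][col]


pairs = list(pairs_with_above(grid))
for row, col, current, above in pairs[:3]:
    print(row, col, current, above)
```

Because the grid is a list of lists, changed values can be written back with grid[row][col] = ... and grid[row - 1][col] = ... inside the loop before the points are created.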

how to change one value from Pandas DataFrame

I have 2 columns in my dataframe, one called 'Subreddits' which lists string values, and one called 'Appearances' which lists how many times they appear.
I am trying to add 1 to the value of a certain line in the 'Appearances' column when it detects a string value that is already in the dataframe.
df = pd.read_csv(Location)
print(len(elem))
while counter < 50:
    #gets just the subreddit name
    e = str(elem[counter].get_attribute("href"))
    e = e.replace("https://www.reddit.com/r/", "")
    e = e[:-1]
    inDf = None
    if (any(df.Subreddit == e)):
        print("Y")
        inDf = True
    if inDf:
        #adds 1 to the value of Appearances
        #df.set_value(e, 'Appearances', 2, takeable=False)
        #df.at[e, 'Appearances'] +=1
    else:
        #adds new row with the subreddit name and sets the amount of appearances to 1.
        df = df.append({'Subreddit': e, 'Appearances': 1}, ignore_index=True)
    print(e)
    counter = counter + 2
print(df)
The only part that is giving me trouble is the if inDF section. I cannot figure out how to add 1 to the 'Appearances' of the subreddit.
Your logic is a bit messy here. You don't need three references to inDf, you don't need to initialise it with None, and you don't need the built-in any with a pd.Series object.
You can check whether the value exists in a series via the in operator:
if e in df['Subreddit'].values:
    df.loc[df['Subreddit'] == e, 'Appearances'] += 1
else:
    df = df.append({'Subreddit': e, 'Appearances': 1}, ignore_index=True)
Even better, use a defaultdict in your loop and create your dataframe at the very end of the process. Your current use of pd.DataFrame.append is not recommended because the expensive operation is repeated for each row (the method was also removed in pandas 2.0).
from collections import defaultdict
#initialise dictionary
dd = defaultdict(int)
while counter < 50:
    e = ... # gets just the subreddit name
    dd[e] += 1 # increment count by 1
    counter = counter + 2 # increment while loop counter
# create results dataframe
df = pd.DataFrame.from_dict(dd, orient='index').reset_index()
# rename columns
df.columns = ['Subreddit', 'Appearances']
You can use df.loc[df['Subreddits'] == e, 'Appearances'] += 1
Example:
df = pd.DataFrame(columns=['Subreddits', 'Appearances'])
e_list = ['a', 'b', 'a', 'a', 'b', 'c']
for e in e_list:
    inDF = (df['Subreddits'] == e).sum() > 0
    if inDF:
        df.loc[df['Subreddits'] == e, 'Appearances'] += 1
    else:
        df = df.append([{'Subreddits': e, 'Appearances': 1}])
df.reset_index(inplace=True, drop=True) # good idea to reset the index..
print(df)
  Subreddits  Appearances
0          a            3
1          b            2
2          c            1
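Since DataFrame.append is no longer available in pandas 2.0 and later, the snippets above that call it will fail there. A minimal sketch of the count-first approach that avoids append entirely follows; it swaps in collections.Counter for the defaultdict, and e_list is made-up sample data standing in for the scraped subreddit names.

```python
from collections import Counter

import pandas as pd

# Made-up sample standing in for the scraped subreddit names.
e_list = ["a", "b", "a", "a", "b", "c"]

# Count every name in plain Python first.
counts = Counter(e_list)

# Build the frame once at the end instead of growing it row by row.
df = pd.DataFrame(list(counts.items()), columns=["Subreddit", "Appearances"])
print(df)
```

Counting in plain Python and constructing the frame a single time avoids both the removed method and the repeated copying that row-by-row appends cause.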

How can I replace the nth occurence of a substring/character within a string? [Python 3]

I want to replace every fifth "b" with "c".
Here is my input string:
jStr = 'aabbbbbaa'
And here is the code:
import re
m = re.search('c', jStr)
jStr1 = jStr[:m.end()]
jStr2 = jStr[:m.end()]
jStr3 = jStr[:m.end()]
jStr4 = jStr[:m.end()]
jStr5 = jStr[m.end():]
jStr6 = jStr5.replace('c', 'b', 1)
jStr == (jStr1+jStr6)
Why is the output I keep getting the same string I started with?
aabbbbbaa
This might not be the most concise way, but you can find all the indices of b, take every 5th one, and assign c there. Since a str does not support item assignment, you have to convert it to a list first.
jStr = 'aabbbbbaa'
jStr = list(jStr)
bPos = [x for x in range(len(jStr)) if jStr[x] == 'b']
for i,x in enumerate(bPos):
    if (i+1) % 5 == 0:
        jStr[x] = 'c'
jStr = ''.join(jStr)
print(jStr)
Output:
aabbbbcaa
jStr = "aabbbbbaabbbbb"
count = 1
res= "" # strings are immutable so we have to create a new string.
for s in jStr:
    if count == 5 and s == "b": # if count is 5 we have our fifth "b", change to "c" and reset count
        res += "c"
        count = 1
    elif s == "b": # if it is a "b" but not the fifth just add b to res and increase count
        count += 1
        res += "b"
    else: # else it is not a "b", just add to res
        res += s
print(res)
aabbbbcaabbbbc
This finds every fifth b by counting the b's with count. When the fifth one is reached, it is replaced with c, the counter is reset, and the loop moves on to the next character.
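The same counting idea can be wrapped in a reusable function with re.sub and a callback. This is only a sketch; replace_every_nth is a hypothetical helper name and not part of either answer.

```python
import re
from itertools import count


def replace_every_nth(text, target, replacement, n):
    """Replace every n-th occurrence of target in text with replacement."""
    counter = count(1)  # numbers the matches 1, 2, 3, ...

    def swap(match):
        # Only every n-th match is swapped; the others are returned unchanged.
        return replacement if next(counter) % n == 0 else match.group(0)

    # re.escape makes the target match literally, even if it contains regex symbols.
    return re.sub(re.escape(target), swap, text)


print(replace_every_nth("aabbbbbaa", "b", "c", 5))
print(replace_every_nth("aabbbbbaabbbbb", "b", "c", 5))
```

re.sub calls swap once per match from left to right, so the counter sees the occurrences in order.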

when i try to append value in dict its showing key error

I have a CSV file of this type:
col1 col2
AAA
 a 1
  a1 1
  a2 1
 b 1
  b1 1
  b2 1
I am reading the first column based on indentation: "AAA" has 0 leading spaces, "a" and "b" have 1 space, and "a1", "a2", "b1", "b2" have 2 spaces. Right now the dict I print is
d={'a':['a1','a2'],'b':['b1','b2']}
But what i want is
d={'AAA':['a','b'],'a':['a1','a2'],'b':['b1','b2']}
I am using code like this:
import csv
a = []
d = {}
reader=csv.DictReader(open("c:/Users/Darshan/Desktop/sss.csv"),dialect="excel")
for row in reader:
    a.append(row['col1'])
for i in range(len(a)):
    if a[i].count(' ')==1:
        d[a[i]]=[]
        k=a[i]
    elif a[i].count(' ')==2:
        d[k].append(a[i])
This prints the following output:
d={'a':['a1','a2'],'b':['b1','b2']}
Can anyone help me? Thanks in advance.
What if you just change your for loop to this:
# A variable to keep track of the least-nested level of your hierarchy
top_lvl = ''
k = ''
for i in range(len(a)):
    # Pre-compute this value so you don't have to do it twice or more
    c = a[i].count(' ')
    # This case is the topmost level
    if c == 0:
        top_lvl = a[i]
        d[top_lvl] = []
    # This case is the middle level
    elif c == 1:
        d[a[i]]=[]
        k=a[i]
        d[top_lvl].append(k)
    # This case is the most deeply nested level
    else: # c==2
        d[k].append(a[i])
In fact, while we are tidying things up, you can iterate through the values in a directly instead of referring to them by index. Like so:
# A variable to keep track of the least-nested level of your hierarchy
top_lvl = ''
# More descriptive variable names can make everything easier to read/understand
mid_lvl = ''
for val in a:
    # Pre-compute this value so you don't have to do it twice or more
    c = val.count(' ')
    # This case is the topmost level
    if c == 0:
        top_lvl = val
        d[val] = []
    # This case is the middle level
    elif c == 1:
        d[val]=[]
        mid_lvl =val
        d[top_lvl].append(mid_lvl)
    # This case is the most deeply nested level
    else: # c==2
        d[mid_lvl].append(val)
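As a runnable sketch of the same idea, the function below tracks the most recent name seen at each depth instead of using one variable per level. It assumes each level is indented by exactly one more leading space than its parent, as in the question, and it strips those spaces from the names so the keys match the desired output; build_hierarchy is a hypothetical name. Unlike the loops above, it only creates a key once a name has at least one child.

```python
def build_hierarchy(values):
    """Map each parent name to the list of names nested directly under it."""
    tree = {}
    last_at_depth = {}  # depth -> most recent name seen at that depth
    for raw in values:
        # Depth is the number of leading spaces (assumed: one space per level).
        depth = len(raw) - len(raw.lstrip(" "))
        name = raw.strip()
        last_at_depth[depth] = name
        if depth > 0:
            parent = last_at_depth[depth - 1]
            tree.setdefault(parent, []).append(name)
    return tree


# Values of col1 as described in the question, with their leading spaces.
a = ["AAA", " a", "  a1", "  a2", " b", "  b1", "  b2"]
d = build_hierarchy(a)
print(d)
```

Keeping one entry per depth means the same loop handles hierarchies deeper than three levels without additional branches.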
