My first table has the columns indeks, il, start and stop; the last two define a range. In a new table I need to list every number from start to stop, while also keeping the indeks and the other values belonging to that range.
This table shows what kind of data I have (sample):
ID  Indeks  Start  Stop   il
0   A1          1     3    25
1   B1         31    55     5
2   C1         36   900   865
3   D1        900  2500    20
...
And this is the table I want to get:
Indeks  Start  Stop   il  kod
A1          1     3   25    1
A1          1     3   25    2
A1          1     3   25    3
B1         31    55    5   31
B1         31    55    5   32
B1         31    55    5   33
...
B1         31    55    5   53
B1         31    55    5   54
B1         31    55    5   55
C1         36   900  865   36
C1         36   900  865   37
C1         36   900  865   38
...
C1         36   900  865  898
C1         36   900  865  899
C1         36   900  865  900
...
EDITED

import pandas as pd

lidy = pd.read_excel('path')
lid = pd.DataFrame(lidy)
output = []
for i in range(len(lid)):
    for j in range(lid.iloc[i, 1], lid.iloc[i, 2] + 1):
        output.append((lid.iloc[i, 0], j))
print(output)

OR

import pandas as pd

lidy = pd.read_excel('path')
lid = pd.DataFrame(lidy)
for i in range(len(lid)):
    for j in range(lid.iloc[i, 1], lid.iloc[i, 2] + 1):
        y = (lid.iloc[i, 0], j)
        print(y)
Two options:
(1 - preferred) Use Pandas (with openpyxl as the engine): assume the Excel file is named data.xlsx and sheet Sheet1 contains your data. Then this

import pandas as pd

df = pd.read_excel("data.xlsx", sheet_name="Sheet1")
df["kod"] = df[["Start", "Stop"]].apply(
    lambda row: range(row.iat[0], row.iat[1] + 1), axis=1
)
df = df.iloc[:, 1:].explode("kod", ignore_index=True)
with pd.ExcelWriter("data.xlsx", mode="a", if_sheet_exists="replace") as writer:
    df.to_excel(writer, sheet_name="Sheet2", index=False)

should produce the required output in sheet Sheet2. The work is done by putting the required range()s into the new column kod, and then .explode()-ing it.
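To see the range/explode step in isolation (without the Excel round-trip), here is a minimal sketch on in-memory data mirroring two of the sample rows:

```python
import pandas as pd

# Sample rows from the question (the ID column is omitted)
df = pd.DataFrame({
    "Indeks": ["A1", "B1"],
    "Start": [1, 31],
    "Stop": [3, 55],
    "il": [25, 5],
})
# One range() per row, then explode it into the new kod column
df["kod"] = df.apply(lambda row: range(row["Start"], row["Stop"] + 1), axis=1)
df = df.explode("kod", ignore_index=True)
print(df.head(3))  # the A1 rows with kod 1, 2, 3
```

Row A1 expands to 3 rows and B1 to 25, so the exploded frame has 28 rows in total.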
(2) Use only openpyxl:

from openpyxl import load_workbook

wb = load_workbook(filename="data.xlsx")
ws = wb["Sheet1"]
rows = ws.iter_rows(values_only=True)
# Read the required column names (dropping the ID column)
data = [list(next(rows)[1:]) + ["kod"]]
for row in rows:
    # Read the input data (a row)
    base = list(row[1:])
    # Create the new data by iterating over the given range
    data.extend(base + [n] for n in range(base[1], base[2] + 1))
if "Sheet2" in wb.sheetnames:
    del wb["Sheet2"]
ws_new = wb.create_sheet(title="Sheet2")
for row in data:
    ws_new.append(row)
wb.save("data.xlsx")
I have a program in which I need to merge two cells into a single cell. The input .csv file is shown below:
12 00 0E 00 57 23
57 23 02 23 57 0A
2D 16 0C 5A 2D 16
This is a small excerpt of the input file I have. I am currently trying to create a new column where the values are
0012
000E
2357
2357
2302
0A57
162D
5A0C
162D
i.e. the second value precedes the first when the two are combined, continuing left to right and down the input file. I am not very familiar with Pandas and was hoping someone had ideas for a solution. Thanks
You can use the underlying numpy array and reshape:
import io
import pandas as pd

t = '''12 00 0E 00 57 23
57 23 02 23 57 0A
2D 16 0C 5A 2D 16'''

# using io here for the example but use a file in the real use case
df = pd.read_csv(io.StringIO(t), sep=' ', header=None, dtype=str)

x, y = df.shape
(pd.DataFrame(df.values.reshape(x * y // 2, 2)[:, ::-1])
   .apply(''.join, axis=1)
)
output:
0 0012
1 000E
2 2357
3 2357
4 2302
5 0A57
6 162D
7 5A0C
8 162D
dtype: object
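If you prefer to stay within pandas rather than going through the underlying array, the same pairing can be sketched with stack and a grouped join (same example data as above):

```python
import io
import pandas as pd

t = '''12 00 0E 00 57 23
57 23 02 23 57 0A
2D 16 0C 5A 2D 16'''
df = pd.read_csv(io.StringIO(t), sep=' ', header=None, dtype=str)

# Flatten row by row, then join each consecutive pair in reversed order
s = df.stack().reset_index(drop=True)
out = s.groupby(s.index // 2).apply(lambda pair: pair.iloc[1] + pair.iloc[0])
print(out.tolist())  # ['0012', '000E', '2357', ...]
```

The reshape version is more efficient for large inputs; this variant just avoids leaving pandas.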
I have two tables:
df1:[1 rows x 23 columns]
1C 1E 1F 1H 1K ... 2M 2P 2S 2U 2W
total 1057 334 3609 3762 1393 ... 328 1611 1426 87 118
df2:[1 rows x 137 columns]
1CA 1CB 1CC 1CF 1CJ 1CS ... 2UB 2UJ 2WB 2WC 2WF 2WJ
total 11 381 111 20 527 2 ... 47 34 79 2 1 36
I need to subtract values between the two tables,
like 1C-1CF, 1E-1EF, 1F-1FF and so on.
I.e. from each column of df1 I need to subtract only the df2 column that ends with F.
Answer: 1C = 1C - 1CF = 1057 - 20 = 1037
How is this possible using Python code?
Note:
Some columns of df1 have no corresponding F column in df2.
df1:
['1C', '1E', '1F', '1H', '1K', '1M', '1N', '1P', '1Q', '1R', '1S', '1U', '1W', '2C', '2E', '2F', '2H', '2K', '2M', '2P', '2S', '2U', '2W']
df2:
['1CA', '1CB', '1CC', '1CF', '1CJ', '1CS', '1CU', '1EA', '1EB', '1EC', '1EF', '1EJ', '1ES', '1FA', '1FB', '1FC', '1FF', '1FJ', '1FS', '1FT', '1FU', '1HA', '1HB', '1HC', '1HF', '1HJ', '1HS', '1HT', '1HU', '1KA', '1KB', '1KC', '1KF', '1KJ', '1KS', '1KU', '1MA', '1MB', '1MC', '1MF', '1MJ', '1MS', '1MU', '1NA', '1NB', '1NC', '1NF', '1NJ', '1PA', '1PB', '1PC', '1PF', '1PJ', '1PS', '1PT', '1PU', '1QA', '1QB', '1QC', '1QF', '1QJ', '1RA', '1RB', '1RC', '1RF', '1RJ', '1SA', '1SB', '1SC', '1SF', '1SJ', '1SS', '1ST', '1SU', '1UA', '1UB', '1UC', '1UF', '1UJ', '1US', '1UU', '1WA', '1WB', '1WC', '1WF', '1WJ', '1WS', '1WU', '2CA', '2CB', '2CC', '2CJ', '2CS', '2EA', '2EB', '2EJ', '2FA', '2FB', '2FC', '2FJ', '2FU', '2HB', '2HC', '2HF', '2HJ', '2HU', '2KA', '2KB', '2KC', '2KF', '2KJ', '2KU', '2MA', '2MB', '2MC', '2MF', '2MJ', '2MS', '2MT', '2PA', '2PB', '2PC', '2PF', '2PJ', '2PU', '2SA', '2SB', '2SC', '2SF', '2SJ', '2UA', '2UB', '2UJ', '2WB', '2WC', '2WF', '2WJ']´
sheet1_columns = sheet1.columns.tolist()
sheet2_expected_columns = ['%sF' % c for c in sheet1_columns]
common_columns = list(set(sheet2_expected_columns).intersection(set(sheet2.columns.tolist())))
columns_dict = {c: '%sF' % c for c in sheet1_columns}
sheet1_with_new_columns_names = sheet1.rename(columns=columns_dict)
sheet1_restriction = sheet1_with_new_columns_names[common_columns]
sheet2_restriction = sheet2[common_columns]
result = sheet1_restriction - sheet2_restriction
Can you test this?
You can try this:
sheet2 = sheet2.filter(regex=".*F$")  # Leave only the 'F' columns in sheet2
sheet2.columns = [i[:-1] for i in sheet2.columns]  # Remove the trailing 'F' for column-wise subtraction
result = sheet1 - sheet2  # Subtract values
result[result.isnull()] = sheet1  # Keep sheet1 values where sheet2 has no 'F' column
Note: it leaves the sheet1 value untouched if there is no corresponding 'F' column in sheet2.
I created your dataframes like so:
sheet1 = pd.DataFrame({'1C': [1057], '1E': [334], '1F': [3609], '2F': [3609]})
sheet2 = pd.DataFrame({'1CA': [11], '1CB': [381], '1CC': [111], '1CF': [20], '1EF': [10], '1FF': [15]})
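Run end-to-end on these toy frames (values assumed, not taken from the question), the snippet above gives:

```python
import pandas as pd

# Toy frames with assumed values
sheet1 = pd.DataFrame({'1C': [1057], '1E': [334], '1F': [3609], '2F': [3609]})
sheet2 = pd.DataFrame({'1CA': [11], '1CB': [381], '1CC': [111],
                       '1CF': [20], '1EF': [10], '1FF': [15]})

sheet2 = sheet2.filter(regex=".*F$")               # keep only the ...F columns
sheet2.columns = [c[:-1] for c in sheet2.columns]  # align names with sheet1
result = sheet1 - sheet2
result[result.isnull()] = sheet1                   # no '2FF' exists, so 2F keeps its value
print(result)  # 1C=1037, 1E=324, 1F=3594, 2F=3609
```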
Solution
Step 1: filter the columns in df2 which have suffix F:
cols = df2.columns[df2.columns.isin([col+'F' for col in df1.columns])]
cols
Index(['1AF', '1GF'], dtype='object')
Step 2: Use a string operation on cols to select the matching df1 columns, subtract the df2 values, and assign the result back to df1:
df1.loc[:,cols.str[:-1]] = df1[cols.str[:-1]].values - df2[cols].values
df1
1A 1B 1C 1D 1E 1F 1G 1H 1I 1J
total 70 72 90 46 30 56 10 51 95 34
Values for 1A: 82-12 = 70 and values for 1G: 34-24=10.
Setup:
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.random.randint(30, 100, size=(1, 10)), columns=list('ABCDEFGHIJ'))
df1.columns = ['1'+col for col in df1.columns]
df1.index = ['total']
df1
1A 1B 1C 1D 1E 1F 1G 1H 1I 1J
total 82 72 90 46 30 56 34 51 95 34
df2 = pd.DataFrame(np.random.randint(10,30, size=(1,7)), columns=list('ABFGHIJ'))
df2.index = ['total']
df2.columns = ['1'+col for col in df2.columns]
df2.columns = [col+'D' for col in df2.columns]
df2.rename(columns={'1AD':'1AF','1GD':'1GF'},inplace=True)
df2
1AF 1BD 1FD 1GF 1HD 1ID 1JD
total 12 29 29 24 10 12 17
You can try something like
result_df = df1.join(df2)
for col in df1.columns:
    if col + 'F' in df2.columns:
        result_df[col] = result_df[col] - result_df[col + 'F']
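A runnable sketch of this join-and-subtract approach, on small one-row frames with assumed values:

```python
import pandas as pd

# Assumed one-row frames indexed by 'total'
df1 = pd.DataFrame({'1C': [1057], '1E': [334]}, index=['total'])
df2 = pd.DataFrame({'1CF': [20], '1CA': [11]}, index=['total'])

result_df = df1.join(df2)
for col in df1.columns:
    if col + 'F' in df2.columns:
        result_df[col] = result_df[col] - result_df[col + 'F']
print(result_df[df1.columns])  # 1C becomes 1037, 1E stays 334
```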
I have a text file (one.txt) that contains an arbitrary number of key-value pairs, where the key and value are separated by a = (e.g. 1=8). Here are some examples:
1=88|11=1438|15=KKK|45=00|45=00|21=66|86=a
4=13|11=1438|49=DDD|8=157.73|67=00|45=00|84=b|86=a
6=84|41=18|56=TTT|67=00|4=13|45=00|07=d
I need to create a DataFrame with a list of dictionaries, with each row as one dictionary in the list like so:
[{1:88,11:1438,15:kkk,45:7.7....},{4:13,11:1438....},{6:84,41:18,56:TTT...}]
import pandas as pd

df = pd.read_csv("input.txt", names=['text'], header=None)
data = df['text'].str.split("|")
names = [y.split('=') for x in data for y in x]
ds = pd.DataFrame(names)
print(ds)
How can I create a dictionary for each line by splitting on the = symbol?
Each line should become one row with multiple columns.
The DataFrame should have the keys as columns and the values as rows.
Example:
1 11 15 45 21 86 4 49 8 67 84 6 41 56 45 07
88 1438 kkk 00 66 a
na 1438 na .....
I think performing a .pivot would work. Try this:
import pandas as pd
df = pd.read_csv("input.txt",names=['text'],header=None)
data = df['text'].str.split("|")
names=[ y.split('=') for x in data for y in x]
ds=pd.DataFrame(names)
ds = ds.pivot(columns=0).fillna('')
The .fillna('') replaces the NaN values with empty strings. If you'd rather show na, use .fillna('na').
Output:
ds.head()
1
0 07 1 11 15 21 4 41 45 49 56 6 67 8 84 86
0 88
1 1438
2 KKK
3 00
4 00
For space I didn't print the entire dataframe, but the columns are indexed by key and the values come from each line (preserving the dict-per-line concept).
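Since the question literally asks for one dictionary per line, you can also build the records directly and let pandas align the keys into columns. A sketch on two of the sample lines:

```python
import pandas as pd

lines = [
    "1=88|11=1438|15=KKK|45=00|45=00|21=66|86=a",
    "4=13|11=1438|49=DDD|8=157.73|67=00|45=00|84=b|86=a",
]
# One dict per line; a duplicate key such as 45 keeps its last value
records = [dict(kv.split("=", 1) for kv in line.split("|")) for line in lines]
df = pd.DataFrame(records)  # keys missing from a line become NaN
print(df)
```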
I have a socket that receives 60 numbers from another computer, in 6 columns and 10 rows. I separated them with split and the output is correct. For the first column, I want to take each number separately so I can run a moving average filter on it.
Codes:
import socket
import numpy as np

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(('192.168.0.1', 2015))

column1 = []
column2 = []
column3 = []
column4 = []
column5 = []
column6 = []

for message in range(10):
    message = sock.recv(1024)
    column1.append(message.split()[0])
    column2.append(message.split()[1])
    column3.append(message.split()[2])
    column4.append(message.split()[3])
    column5.append(message.split()[4])
    column6.append(message.split()[5])
    b1 = message.split()[0]
    b2 = message.split()[1]
    b3 = message.split()[2]
    b4 = message.split()[3]
    b5 = message.split()[4]
    b6 = message.split()[5]
    print b1
    print b2
    print b3
    print b4
    print b5
    print b6
If I only print b1, the output is the 10 numbers that I want to have separately for the next function (a moving average filter). I need help separating them.
I tried a for loop over b1[i], but it gives me only the first digit of b1.
First, you want to use a list of columns:
columns = [[] for _ in range(6)]
Then you can split the message into a single list:
for message in range(10):
message = sock.recv(1024)
splits = message.split(None, 5) # split into six pieces at most
which you can then append to the list of lists you created before:
    for index, item in enumerate(splits):
        columns[index].append(item)
Now if you only wish to print the first of those appended numbers, do
print columns[0][0] # first item of first list
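Putting those pieces together, with a list of sample byte strings standing in for sock.recv (the real code would keep the socket loop):

```python
# Sample messages stand in for sock.recv(1024); same 6-column layout as the question
columns = [[] for _ in range(6)]
sample_messages = [b"10 20 30 40 50 60", b"11 21 31 41 51 61"]
for message in sample_messages:
    splits = message.split(None, 5)  # at most six pieces
    for index, item in enumerate(splits):
        columns[index].append(item)
print(columns[0])  # the whole first column, ready for the moving average filter
```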
The following should get you started. I have created some random data in the format 6 columns by 10 rows. It then splits the raw data into rows, splits each row into columns, and transposes them to get the data per column.
Each entry in the first column is then displayed with a moving average of the last 3 entries. deque is used to implement an efficient mini queue of the last entries to calculate the moving average with.
import collections

message = """89 39 59 88 46 1 87 21 2 34
59 40 68 74 29 29 26 30 93 38
84 60 44 98 41 29 8 60 61 83
36 44 56 8 50 94 99 1 30 52
5 27 53 85 67 69 38 67 69 26
92 17 4 13 74 89 30 49 44 20"""

rows = message.splitlines()
data = []
for row in rows:
    data.append(row.split())
columns = zip(*data)

total = 0
moving = collections.deque()

# Display the moving average for the first column
for entry in columns[0]:
    value = int(entry)
    moving.append(value)
    total += value
    if len(moving) > 3:  # Length of moving average
        total -= moving.popleft()
    print "%3d %.1f" % (value, total / float(len(moving)))
For this data, it will display the following output:
89 89.0
59 74.0
84 77.3
36 59.7
5 41.7
92 44.3
Tested using Python 2.7
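If pandas is available, the same window-3 moving average (with a growing window at the start, like the deque version above) can be sketched with Series.rolling:

```python
import pandas as pd

first_column = [89, 59, 84, 36, 5, 92]  # the first column of the sample data
avg = pd.Series(first_column).rolling(window=3, min_periods=1).mean()
print(avg.round(1).tolist())  # [89.0, 74.0, 77.3, 59.7, 41.7, 44.3]
```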