I have an Excel spreadsheet that I load into a DataFrame. When I run my code, I can't get the DataFrame to sort correctly on port_count. I'm trying to sort on port_count and then display the ports that are open for each IP address. This code is almost there, but the sorting is giving me problems.
import pandas as pd
import openpyxl as xl
data = {'IP': ['192.168.1.1','192.168.1.1','192.168.1.1','10.10.10.10','10.10.10.10','10.10.10.10','10.10.10.10','10.10.10.10','10.10.10.10','10.10.10.10','10.10.10.10','10.10.10.10','10.10.10.10','192.168.5.3','192.168.5.3','192.168.4.6','192.168.4.6','192.168.4.7','192.168.4.7','192.168.8.9','192.168.8.9','10.10.2.3','10.10.2.3','10.5.2.3','10.5.2.3','10.1.2.3','10.1.2.3','1.2.3.4','1.2.3.4','1.2.3.4','1.2.3.4','1.2.3.4','1.2.3.4','1.2.3.4','1.2.3.4','1.2.3.4','1.2.3.4','1.2.3.4','1.2.3.4','1.2.3.4','1.2.3.4','1.2.3.4','1.2.3.4','4.5.6.7','4.5.6.7','4.5.6.7','4.5.6.7','4.5.6.7','192.168.9.10','192.168.9.10','192.168.9.11','192.168.9.11','192.168.9.12','192.168.9.12','10.1.5.6','10.1.5.6','10.1.5.6','10.1.5.6','10.1.5.6','10.1.5.6','10.1.5.6','10.1.5.6','10.1.5.6','10.1.5.6','10.1.5.6','10.1.5.6','8.9.10.11','8.9.10.11','8.9.10.11','2.8.3.9','2.8.3.9','12.13.14.15','13.14.15.16','13.14.15.16','74.208.236.41','74.208.236.41','74.208.236.41','3.234.139.2','3.234.139.2','172.67.173.229','172.67.173.229','172.67.173.229','172.67.173.229','172.67.173.229','172.67.173.229','172.67.173.229','192.168.60.23','192.168.60.23','192.168.60.23','192.168.60.23','192.168.60.23','192.168.60.23','192.168.60.23','192.168.60.23','192.168.60.23','192.168.60.23','192.168.60.23','192.168.60.23','1.2.3.6','192.168.9.6','192.168.9.6','172.16.54.65','172.16.54.65','172.16.54.65','172.16.54.65','172.16.54.66','172.16.54.66','172.16.54.66','172.16.85.36','172.16.85.36','10.10.12.12','10.10.12.12'],
'Port': ['22','80','443','80','443','2082','2083','2086','2087','2095','8080','8443','8880','80','443','80','443','80','443','80','443','80','443','80','443','80','443','21','22','25','80','110','143','443','465','587','993','995','2082','2086','2087','2096','3306','25','80','443','465','587','80','443','80','443','80','443','80','443','2052','2053','2082','2083','2086','2087','2096','8080','8443','8880','5222','8008','8443','80','443','80','80','443','80','81','443','80','443','80','443','2082','2083','2087','8443','8880','80','443','2052','2053','2082','2083','2086','2087','2096','8080','8443','8880','80','80','443','80','82','83','443','80','82','443','80','443','80','443'],
}
df = pd.DataFrame(data)
df['port_count'] = df.groupby('IP')['Port'].transform('count')
df['port_count'] = df['port_count'].astype(int)
df.sort_values(by=['port_count'], ascending=False, inplace=True)
pivot1 = df.pivot_table(df, index=['IP', 'Port'], columns=None, fill_value=0).sort_values(by='port_count', ascending=False)
if df.size != 0:
    with pd.ExcelWriter("/testing/test.xlsx", mode="a", engine="openpyxl", if_sheet_exists='replace') as writer:
        pivot1.to_excel(writer, sheet_name="IP to Port")
Current output looks like this:
https://www.hopticalillusion.co/shared-files/730/?test_output.xlsx
Desired Output:
https://www.hopticalillusion.co/shared-files/731/?desired_test_output.xlsx
Maybe try the following:
df['port_count'] = df['port_count'].astype(int)
df.sort_values(by=['port_count'], ascending=False, inplace=True)
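Both of these lines already appear in the question, so on their own they may not change the result. One possible cause is that sorting only on port_count leaves the order of rows with equal counts unspecified (the default sort algorithm is not guaranteed to be stable), so the ports of one IP can end up interleaved with another's. A minimal sketch, assuming the goal is to list IPs by descending port count with each IP's ports kept together; the small sample frame and the name `result` are made up for illustration:

```python
import pandas as pd

# Small made-up sample in the same shape as the question's data.
df = pd.DataFrame({
    "IP": ["10.0.0.1", "192.168.1.1", "10.0.0.2", "192.168.1.1", "10.0.0.2", "192.168.1.1"],
    "Port": ["80", "443", "443", "22", "80", "80"],
})

# Ports arrive as strings; convert them so that 80 sorts before 443.
df["Port"] = df["Port"].astype(int)
df["port_count"] = df.groupby("IP")["Port"].transform("count")

# Sort on the count first, then on IP and Port as tie-breakers, so rows
# with equal counts stay grouped by IP instead of being interleaved.
result = (
    df.sort_values(by=["port_count", "IP", "Port"], ascending=[False, True, True])
      .set_index(["IP", "Port"])
)
print(result)
```

The resulting frame can be written out with to_excel in the same way as pivot1 in the question.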
[Edited: working code at the end]
I have a CSV file with many rows, but only one column. I want to separate the rows' values into columns.
I have tried
import pandas as pd
df = pd.read_csv("TEST1.csv")
final = [v.split(";") for v in df]
print(final)
However, it didn't work. My CSV file doesn't have a header, yet the code reads the first row as a header. I don't know why, but the code returned only the header with the splits, and ignored the remainder of the data.
For this, I've also tried
import pandas as pd
df = pd.read_csv("TEST1.csv").shift(periods=1)
final = [v.split(";") for v in df]
print(final)
Which gave the same result; and
import pandas as pd
df = pd.read_csv("TEST1.csv",header=None)
final = [v.split(";") for v in df]
print(final)
Which returned
AttributeError: 'int' object has no attribute 'split'
I presume this happens because, with header=None or header=0, the header shows up as 0, and for some reason final = [v.split(";") for v in df] only reads the header.
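For context, iterating directly over a DataFrame yields its column labels rather than its rows, which would account for both symptoms described so far. A small sketch, using an in-memory string in place of TEST1.csv (the sample rows are invented):

```python
import io
import pandas as pd

# Invented stand-in for TEST1.csv: no header row, fields separated by ';'.
raw = "1;A;10;5\n2;B;20;6\n3;C;30;7\n"

# Default read: the first line becomes the label of a single column,
# because the default separator is ',' and the line contains none.
with_header = pd.read_csv(io.StringIO(raw))
labels_default = list(with_header)   # iterating gives column labels only

# header=None: columns are labelled with integers, starting at 0.
no_header = pd.read_csv(io.StringIO(raw), header=None)
labels_none = list(no_header)        # an int label has no .split method

print(labels_default)
print(labels_none)
```

In the first case the list comprehension splits only the label, which is the first line of the file; in the second it calls .split on the integer 0, hence the AttributeError.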
Also, I have tried inserting a new header:
import pandas as pd
df = pd.read_csv("TEST1.csv")
final = [v.split(";") for v in df]
headerList = ['Time','Type','Value','Size']
pd.DataFrame(final).to_csv("TEST2.csv",header=headerList)
And it did work, partly. There is a new header, but the only row in the csv file is the old header (which is part of the data); none of the other data has transferred to the TEST2.csv file.
Could you shed some light on this issue, so that I can split all of my data?
Many thanks.
EDIT: Thanks to #1extralime, here is the working code:
import pandas as pd
df = pd.read_csv("TEST1.csv",sep=';')
df.columns = ['Time','Type','Value','Size']
df.to_csv("TEST2.csv")
Try:
import pandas as pd
# header=None keeps the first line as data; names= supplies the column labels
df = pd.read_csv('TEST1.csv', sep=';', header=None, names=['Time', 'Type', 'Value', 'Size'])
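To see why the header handling matters when the file has no header row, as the question describes: with the default settings, read_csv consumes the first line as column labels, whereas header=None together with names keeps it as data. A sketch with invented in-memory data standing in for TEST1.csv:

```python
import io
import pandas as pd

# Invented stand-in for TEST1.csv: three data lines, no header row.
raw = "1;A;10;5\n2;B;20;6\n3;C;30;7\n"
cols = ["Time", "Type", "Value", "Size"]

# Default header handling: the first line is consumed as column labels,
# so renaming the columns afterwards discards that line.
lossy = pd.read_csv(io.StringIO(raw), sep=";")
lossy.columns = cols

# header=None keeps every line as data; names= supplies the labels.
full = pd.read_csv(io.StringIO(raw), sep=";", header=None, names=cols)

print(len(lossy), len(full))

# index=False leaves the row index out of the written CSV text.
csv_text = full.to_csv(index=False)
```

Passing a file path such as "TEST2.csv" to to_csv would write the same text to disk instead of returning it.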