Pandas dataframe throwing error when appending to CSV - python

import pandas as pd
df = pd.read_csv("stack.csv")
sector_select = "Col2"
df[sector_select] = ["100"]
df.to_csv("stack.csv", index=False, mode='a', header=False)
stack.csv has no data other than a header: Col1,Col2,Col3,Col4,Col5
ValueError: Length of values (1) does not match length of index (2)
I'm just trying to make a program where I can select a header and append data to the column under that header.
It only runs twice before it throws this error!

You can use this:
df = df.append({"Col2": 100}, ignore_index=True)

That code runs for me.
But I assume that you would like to run something like this:
import pandas as pd
df = pd.read_csv("stack.csv")
sector_select = "Col2"
df.at[len(df), sector_select] = "100"
df.to_csv("stack.csv", index=False)
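Note that DataFrame.append was removed in pandas 2.0, so the first snippet only works on older versions. A minimal pd.concat-based sketch of the same idea, using the names from the question:
import pandas as pd
df = pd.read_csv("stack.csv")
sector_select = "Col2"
# pandas >= 2.0: concatenate a one-row frame instead of calling DataFrame.append
new_row = pd.DataFrame({sector_select: ["100"]})
df = pd.concat([df, new_row], ignore_index=True)
# rewrite the whole file; appending with mode='a' would duplicate the existing rows
df.to_csv("stack.csv", index=False)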

Related

Pandas dataframe sorting issue

I have an Excel spreadsheet that I create a dataframe from. When I run my code I can't get the dataframe to sort correctly on port_count. I'm trying to make it sort on port_count and then display the ports that are open for each IP address. This code is almost there, but the sorting is giving me a problem.
import pandas as pd
import openpyxl as xl
data = {'IP': ['192.168.1.1','192.168.1.1','192.168.1.1','10.10.10.10','10.10.10.10','10.10.10.10','10.10.10.10','10.10.10.10','10.10.10.10','10.10.10.10','10.10.10.10','10.10.10.10','10.10.10.10','192.168.5.3','192.168.5.3','192.168.4.6','192.168.4.6','192.168.4.7','192.168.4.7','192.168.8.9','192.168.8.9','10.10.2.3','10.10.2.3','10.5.2.3','10.5.2.3','10.1.2.3','10.1.2.3','1.2.3.4','1.2.3.4','1.2.3.4','1.2.3.4','1.2.3.4','1.2.3.4','1.2.3.4','1.2.3.4','1.2.3.4','1.2.3.4','1.2.3.4','1.2.3.4','1.2.3.4','1.2.3.4','1.2.3.4','1.2.3.4','4.5.6.7','4.5.6.7','4.5.6.7','4.5.6.7','4.5.6.7','192.168.9.10','192.168.9.10','192.168.9.11','192.168.9.11','192.168.9.12','192.168.9.12','10.1.5.6','10.1.5.6','10.1.5.6','10.1.5.6','10.1.5.6','10.1.5.6','10.1.5.6','10.1.5.6','10.1.5.6','10.1.5.6','10.1.5.6','10.1.5.6','8.9.10.11','8.9.10.11','8.9.10.11','2.8.3.9','2.8.3.9','12.13.14.15','13.14.15.16','13.14.15.16','74.208.236.41','74.208.236.41','74.208.236.41','3.234.139.2','3.234.139.2','172.67.173.229','172.67.173.229','172.67.173.229','172.67.173.229','172.67.173.229','172.67.173.229','172.67.173.229','192.168.60.23','192.168.60.23','192.168.60.23','192.168.60.23','192.168.60.23','192.168.60.23','192.168.60.23','192.168.60.23','192.168.60.23','192.168.60.23','192.168.60.23','192.168.60.23','1.2.3.6','192.168.9.6','192.168.9.6','172.16.54.65','172.16.54.65','172.16.54.65','172.16.54.65','172.16.54.66','172.16.54.66','172.16.54.66','172.16.85.36','172.16.85.36','10.10.12.12','10.10.12.12'],
'Port': ['22','80','443','80','443','2082','2083','2086','2087','2095','8080','8443','8880','80','443','80','443','80','443','80','443','80','443','80','443','80','443','21','22','25','80','110','143','443','465','587','993','995','2082','2086','2087','2096','3306','25','80','443','465','587','80','443','80','443','80','443','80','443','2052','2053','2082','2083','2086','2087','2096','8080','8443','8880','5222','8008','8443','80','443','80','80','443','80','81','443','80','443','80','443','2082','2083','2087','8443','8880','80','443','2052','2053','2082','2083','2086','2087','2096','8080','8443','8880','80','80','443','80','82','83','443','80','82','443','80','443','80','443'],
}
df = pd.DataFrame(data)
df['port_count'] = df.groupby('IP')['Port'].transform('count')
df['port_count'] = df['port_count'].astype(int)
df.sort_values(by=['port_count'], ascending=False, inplace=True)
pivot1 = df.pivot_table(df, index=['IP', 'Port'], columns=None, fill_value=0).sort_values(by='port_count', ascending=False)
if df.size != 0:
    with pd.ExcelWriter("/testing/test.xlsx", mode="a", engine="openpyxl", if_sheet_exists='replace') as writer:
        pivot1.to_excel(writer, sheet_name="IP to Port")
Current output looks like this:
https://www.hopticalillusion.co/shared-files/730/?test_output.xlsx
Desired Output:
https://www.hopticalillusion.co/shared-files/731/?desired_test_output.xlsx
Maybe try the following:
df['port_count'] = df['port_count'].astype(int)
df.sort_values(by=['port_count'], ascending=False, inplace=True)
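The sort is most likely being undone by pivot_table, which sorts its MultiIndex lexicographically and which is also being passed df as a positional argument. One way to keep the count-based order is to build the pivot and then reorder its 'IP' level by descending port count. A minimal sketch, assuming the column names from the question:
df['port_count'] = df.groupby('IP')['Port'].transform('count')
# build the pivot without passing df positionally, then reorder the 'IP' level
pivot1 = df.pivot_table(index=['IP', 'Port'], values='port_count', fill_value=0)
ip_order = df.groupby('IP')['Port'].count().sort_values(ascending=False).index
pivot1 = pivot1.reindex(ip_order, level='IP')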

Is there a way to remove header and split columns with pandas read_csv?

[Edited: working code at the end]
I have a CSV file with many rows, but only one column. I want to separate the rows' values into columns.
I have tried
import pandas as pd
df = pd.read_csv("TEST1.csv")
final = [v.split(";") for v in df]
print(final)
However, it didn't work. My CSV file doesn't have a header, yet the code reads the first row as a header. I don't know why, but the code returned only the header with the splits, and ignored the remainder of the data.
For this, I've also tried
import pandas as pd
df = pd.read_csv("TEST1.csv").shift(periods=1)
final = [v.split(";") for v in df]
print(final)
Which also returned the same error; and
import pandas as pd
df = pd.read_csv("TEST1.csv",header=None)
final = [v.split(";") for v in df]
print(final)
Which returned
AttributeError: 'int' object has no attribute 'split'
I presume it did that because with header=None or header=0 the header appears as 0, and for some reason final = [v.split(";") for v in df] only reads the header.
Also, I have tried inserting a new header:
import pandas as pd
df = pd.read_csv("TEST1.csv")
final = [v.split(";") for v in df]
headerList = ['Time','Type','Value','Size']
pd.DataFrame(final).to_csv("TEST2.csv",header=headerList)
And it did work, partly. There is a new header, but the only row in the csv file is the old header (which is part of the data); none of the other data has transferred to the TEST2.csv file.
Is there any way you could shed some light on this issue, so I can split all my data?
Many thanks.
EDIT: Thanks to #1extralime, here is the working code:
import pandas as pd
df = pd.read_csv("TEST1.csv",sep=';')
df.columns = ['Time','Type','Value','Size']
df.to_csv("TEST2.csv")
Try:
import pandas as pd
df = pd.read_csv('TEST1.csv', sep=';')
df.columns = ['Time', 'Type', 'Value', 'Size']
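For what it's worth, iterating over a DataFrame (for v in df) yields the column labels, which is why only the header line was being split. If the first line of TEST1.csv is data rather than a header, a hedged variant of the above that keeps it, assuming the same four column names:
import pandas as pd
# header=None keeps the first line as data; names= supplies the column labels
df = pd.read_csv('TEST1.csv', sep=';', header=None,
                 names=['Time', 'Type', 'Value', 'Size'])
df.to_csv('TEST2.csv', index=False)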

Extra column appears when appending selected row from one csv to another in Python

I have this code which appends a column of a csv file as a row to another csv file:
def append_pandas(s, d):
    import pandas as pd
    df = pd.read_csv(s, sep=';', header=None)
    df_t = df.T
    df_t.iloc[0:1, 0:1] = 'Time Point'
    df_t.at[1, 0] = 1
    df_t.columns = df_t.iloc[0]
    df_new = df_t.drop(0)
    pdb = pd.read_csv(d, sep=';')
    newpd = pdb.append(df_new)
    newpd.to_csv(d, sep=';')
The result is supposed to look like this:
Instead, every time the row is appended, there is an extra "Unnamed" column appearing on the left:
Do you know how to fix that?..
Please, help :(
My csv documents from which I select a column look like this:
You have to add index=False to your to_csv() method
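Applied to the question's helper, only the final write changes; a sketch:
# inside append_pandas: write without the DataFrame index so no
# extra "Unnamed: 0" column is added on each append
newpd.to_csv(d, sep=';', index=False)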

Formatting of JSON file

Can we convert the highlighted INTEGER values to STRING values (see the screenshot linked below)?
https://i.stack.imgur.com/3JbLQ.png
CODE
filename = "newsample2.csv"
jsonFileName = "myjson2.json"
import pandas as pd
df = pd.read_csv ('newsample2.csv')
df.to_json('myjson2.json', indent=4)
print(df)
Try doing something like this.
import pandas as pd
filename = "newsample2.csv"
jsonFileName = "myjson2.json"
df = pd.read_csv ('newsample2.csv')
df['index'] = df.index
df.to_json('myjson2.json', indent=4)
print(df)
This will take indices of your data and store them in the index column, so they will become a part of your data.
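If the goal is the highlighted integer cell values themselves, one option is to cast the relevant columns to strings before writing the JSON. A minimal sketch; 'SomeColumn' is a placeholder, since the real column names are only visible in the screenshot:
import pandas as pd
df = pd.read_csv("newsample2.csv")
# 'SomeColumn' is hypothetical -- replace with the column(s) holding integers
df['SomeColumn'] = df['SomeColumn'].astype(str)
df.to_json('myjson2.json', indent=4)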

Removing duplicates for a row with duplicates in one column dynamic data

I am attempting to remove duplicates in column D of dynamic data that has no headers or other identifying features: I want to delete all rows where column D contains a duplicate value. I am converting the Excel file to a dataframe, removing the duplicates, and then writing it back to Excel, but I keep getting an assortment of errors or no duplicates removed. I come from a VBA background but we are migrating to Python.
Attempted:
df.drop_duplicates(["C"])
df = pd.DataFrame({"C"})
df.groupby(["C"]).filter(lambda df:df.shape[0] == 1)
As well as an assortment of other variations. I was able to do this in VBA with one line. Any ideas why this keeps causing issues?
import pandas as pd
df = pd.DataFrame({"C"})
df.drop_duplicates(subset=['C'], keep=False)
DG = df.groupby(['C'])
print(pd.concat([DG.get_group(item) for item, value in DG.groups.items() if len(value) == 1]))
The template code itself:
import pandas as pd
df = pd.read_excel("C:/wadwa.xlsx", sheetname=0)
columns_to_drop = ['d.1']
#columns_to_drop = ['d.1', 'b.1', 'e.1', 'f.1', 'g.1']
df = df[[col for col in df.columns if col not in columns_to_drop]]
print(df)
writer = pd.ExcelWriter('C:/dadwa/dwad.xlsx')
df.to_excel(writer, 'Sheet1')
writer.save()
print(df)
Code:
import pandas as pd
df = pd.read_excel("C:/Users/Documents/Book1.xlsx", sheetname=0)
df = df.drop_duplicates(subset=[df.columns[3]], keep=False)
writer = pd.ExcelWriter('C:/Users//Documents/Book2.xlsx')
df.to_excel(writer, 'Sheet1')
writer.save()
print(df)
I think you need to assign back and select the 4th column by position:
df = df.drop_duplicates(subset=[df.columns[3]], keep=False)
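A quick note on keep=False: it drops every row whose value in the chosen column is duplicated, rather than keeping one copy. A small sketch with made-up data, just to illustrate the difference:
import pandas as pd
df = pd.DataFrame({"A": ["w", "x", "y", "z"], "D": [1, 1, 2, 3]})
# keep=False removes all rows sharing a duplicated value in "D"
print(df.drop_duplicates(subset=["D"], keep=False))    # keeps only D == 2 and D == 3
# keep='first' keeps the first occurrence of each duplicated value
print(df.drop_duplicates(subset=["D"], keep='first'))  # keeps D == 1 (once), 2 and 3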
