How to convert list data to XML using Python

I have been trying to convert list data to an XML file, but I get the following error: ValueError: Invalid tag name '0'
This is my header: 'Name,Job Description,Course'
Code:
import pandas as pd
lst = [ 'Name,Job Description,Course' ,
'Bob,Backend Developer,MCA',
'Raj,Business Analyst,BMS',
'Alice,FullStack Developer,CS' ]
df = pd.DataFrame(lst)
with open('output.xml', 'w') as myfile:
    myfile.write(df.to_xml())

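The DataFrame you created is malformed. Each comma-separated string is stored as a single cell, so the frame has one column whose label is the integer 0, and to_xml fails because 0 is not a valid XML tag name.
Even if you used 'Name,Job Description,Course' as a single header, you would still fail at the point of saving the DataFrame to XML, because column names become tag names and must follow XML naming rules (no spaces or commas, for example).
The solution below splits each row into separate fields and works. Hope this is what you are trying to achieve.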
import pandas as pd
lst = [ ['Name','Job_Description','Course'] ,
['Bob','Backend Developer','MCA'],
['Raj','Business Analyst','BMS'],
['Alice','FullStack Developer','CS'] ]
df = pd.DataFrame(lst[1:], columns=lst[0])  # the first row supplies the column names
print(df)
df.to_xml('./output.xml')
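If you need to start from the original list of comma-separated strings, the same idea can be sketched as below. This is only a sketch under a few assumptions: the fields never contain embedded commas, pandas 1.3 or later is installed, and parser='etree' is chosen so that lxml is not required. The variable names rows, columns and xml_text are illustrative.

```python
import pandas as pd

# The original list from the question: one comma-separated string per row.
lst = [
    'Name,Job Description,Course',
    'Bob,Backend Developer,MCA',
    'Raj,Business Analyst,BMS',
    'Alice,FullStack Developer,CS',
]

# Split every string into its individual fields.
rows = [line.split(',') for line in lst]

# XML tag names cannot contain spaces, so replace them in the header.
columns = [name.strip().replace(' ', '_') for name in rows[0]]

df = pd.DataFrame(rows[1:], columns=columns)

# parser='etree' uses the standard library instead of lxml (an assumption
# about the environment); index=False leaves out the <index> element.
xml_text = df.to_xml(index=False, parser='etree')
print(xml_text)
```

Passing a path as the first argument, as in the answer above, writes the file instead of returning a string.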

Related

Extract Invalid Data From Dataframe to a File (.txt)

This is my first post here and I am new to Python. My program should take a JSON file and convert it to CSV. I have to check each field for validity, and any record that does not have all valid fields must be written to a separate file. My question is: how would I take an invalid data entry and save it to a text file? Currently, the program can check for validity, but I do not know how to extract the data that is invalid.
import numpy as np
import pandas as pd
import logging
import re as regex
from validate_email import validate_email
# Variables for characters
passRegex = r"^(?!.*\s)(?=.*[A-Z])(?=.*[a-z])(?=.*\d).{8,50}$"
nameRegex = r"^[a-zA-Z0-9\s\-]{2,80}$"
# Read in json file to dataframe df variable
# Read in data as a string
df = pd.read_json('j2.json', dtype='string')
# Find nan values and replace it with string
#df = df.replace(np.nan, 'Error.log', regex=True)
# Data validation check for columns
df['accountValid'] = df['account'].str.contains(nameRegex, regex=True)
df['userNameValid'] = df['userName'].str.contains(nameRegex, regex=True)
df['valid_email'] = df['email'].apply(lambda x: validate_email(x))
df['valid_number'] = df['phone'].apply(lambda x: len(str(x)) == 11)
# Prepend 86 to phone number column
df['phone'] = ('86' + df['phone'])
# Convert dataframe to csv file
df.to_csv('test.csv', index=False)
The json file I am using has thousands of rows
Thank you in advance!
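One possible approach, sketched here under stated assumptions rather than taken from an accepted answer: combine the boolean check columns into a single mask, then use that mask to split the frame into valid and invalid records. The sample records, the reduced set of checks (the email check is left out to avoid the third-party validate_email dependency) and the file name invalid_records.txt are all hypothetical.

```python
import pandas as pd

# Hypothetical sample records standing in for the contents of j2.json.
df = pd.DataFrame({
    'account': ['acme-01', 'x'],
    'userName': ['Alice Smith', 'Bob Jones'],
    'phone': ['13812345678', '12345'],
})

name_regex = r"^[a-zA-Z0-9\s\-]{2,80}$"

# Same style of checks as in the question; each one yields a boolean column.
df['accountValid'] = df['account'].str.contains(name_regex, regex=True)
df['userNameValid'] = df['userName'].str.contains(name_regex, regex=True)
df['valid_number'] = df['phone'].str.len() == 11

check_cols = ['accountValid', 'userNameValid', 'valid_number']

# A record is valid only if every check passed.
all_valid = df[check_cols].all(axis=1)

valid_rows = df[all_valid]
invalid_rows = df[~all_valid]

# Write the rejected records to a text file (the file name is an assumption).
invalid_rows.to_csv('invalid_records.txt', index=False)
```

Because the mask is computed for the whole column at once, this should scale to thousands of rows without an explicit loop.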

Removing Values from Pandas Read Excel

I am trying to read values from an Excel file and convert them to JSON to use in my API.
I am getting:
{"Names":{"0":"Tom","1":"Bill","2":"Sally","3":"Cody","4":"Betty"}}
I only want to see the values. What I would like to get is this:
{"Names":{"Tom", "Bill", "Sally", "Cody", "Betty"}}
I haven't figured out how to remove the numbers before the values.
The code I am using is as follows:
import pandas as pd
df = pd.read_excel(r'C:\Users\User\Desktop\Names.xlsx')
json_str = df.to_json()
print(json_str)
As mentioned in the comments, your desired result is not valid JSON.
Maybe you can do this instead, which turns the values into a list:
import json
import pandas as pd
df = pd.read_excel(r'C:\Users\User\Desktop\Names.xlsx')
json_str = df.to_json()
temp = json.loads(json_str)
temp['Names'] = list(temp['Names'].values())
print(json.dumps(temp))
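A shorter variant, as a sketch only: to_dict(orient='list') maps each column name directly to a list of its values, so the round trip through json.loads is not needed. The in-memory DataFrame below is an assumption standing in for the Excel file from the question.

```python
import json
import pandas as pd

# Stand-in for pd.read_excel(...): the same single "Names" column as in the question.
df = pd.DataFrame({'Names': ['Tom', 'Bill', 'Sally', 'Cody', 'Betty']})

# orient='list' maps each column name to a plain list of its values.
json_str = json.dumps(df.to_dict(orient='list'))
print(json_str)
# {"Names": ["Tom", "Bill", "Sally", "Cody", "Betty"]}
```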

pandas - read json file with only a list

I have a given .json file which was saved in list format (I guess it is not proper JSON format),
as follows:
users.json:
[ "user1", "user2" ]
I would like to read it into a pandas DataFrame, and I tried different values for the orient argument, as follows:
import pandas as pd
nodes = pd.read_json('users.json', orient='split')
I would like the results to look like this:
desired_df = pd.DataFrame({'col1': ["user1", "user2"]})
The closest SO question I found was this one.
Any help on that would be great! Thanks in advance.
The code below will create the DataFrame for you.
BTW, the JSON file is valid JSON: a top-level array is allowed.
import pandas as pd
import json
with open('users.json') as f:
    data = json.load(f)
desired_df = pd.DataFrame({'col1':data})
print(desired_df)
Output:
    col1
0  user1
1  user2

Write Python3/Pandas dataframe to JSON with orient=records but without the array when there is only one record

I'm writing a very small Pandas dataframe to a JSON file. In fact, the Dataframe has only one row with two columns.
To build the dataframe:
import pandas as pd
df = pd.DataFrame.from_dict(dict({'date': '2020-10-05', 'ppm': 411.1}), orient='index').T
print(df)
prints
         date    ppm
0  2020-10-05  411.1
The desired json output is as follows:
{
  "date": "2020-10-05",
  "ppm": 411.1
}
but when writing the json with pandas, I can only print it as an array with one element, like so:
[
  {
    "date":"2020-10-05",
    "ppm":411.1
  }
]
I've currently hacked my code to convert the Dataframe to a dict, and then use the json module to write the file.
import json
data = df.to_dict(orient='records')
data = data[0] # keep the only element
with open('data.json', 'w') as fp:
    json.dump(data, fp, indent=2)
Is there a native way with pandas' .to_json() to keep the only dictionary item if there is only one?
I am currently using .to_json() like this, which incorrectly prints the array with one dictionary item.
df.to_json('data.json', orient='index', indent = 2)
Python 3.8.6
Pandas 1.1.3
If you want to export only one row, use iloc:
print(df.iloc[0].to_dict())
#{'date': '2020-10-05', 'ppm': 411.1}
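To stay within pandas for the writing step as well, one option (a sketch, not something the answer above states) is to call to_json on the selected row itself. The row is a Series whose index holds the column names, so it is serialised as a single flat object. The DataFrame is built here in a simpler way than in the question, purely for illustration.

```python
import pandas as pd

# One-row frame with the same columns and values as in the question.
df = pd.DataFrame([{'date': '2020-10-05', 'ppm': 411.1}])

# Selecting the single row gives a Series whose index holds the column names,
# so to_json serialises it as one flat object instead of a one-element array.
json_text = df.iloc[0].to_json(indent=2)
print(json_text)

# Passing a path writes the file directly (file name taken from the question):
# df.iloc[0].to_json('data.json', indent=2)
```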

pandas: Split a column on delimiter, and get unique values

I am translating some code from R to python to improve performance, but I am not very familiar with the pandas library.
I have a CSV file that looks like this:
O43657,GO:0005737
A0A087WYV6,GO:0005737
A0A087WZU5,GO:0005737
Q8IZE3,GO:0015630 GO:0005654 GO:0005794
X6RHX1,GO:0015630 GO:0005654 GO:0005794
Q9NSG2,GO:0005654 GO:0005739
I would like to split the second column on a delimiter (here, a space), and get the unique values in this column. In this case, the code should return [GO:0005737, GO:0015630, GO:0005654, GO:0005794, GO:0005739].
In R, I would do this using the following code:
df <- read.csv("data.csv")
unique <- unique(unlist(strsplit(df[,2], " ")))
In python, I have the following code using pandas:
df = pd.read_csv("data.csv")
split = df.iloc[:, 1].str.split(' ')
unique = pd.unique(split)
But this produces the following error:
TypeError: unhashable type: 'list'
How can I get the unique values in a column of a CSV file after splitting on a delimiter in python?
setup
from io import StringIO
import pandas as pd
txt = """O43657,GO:0005737
A0A087WYV6,GO:0005737
A0A087WZU5,GO:0005737
Q8IZE3,GO:0015630 GO:0005654 GO:0005794
X6RHX1,GO:0015630 GO:0005654 GO:0005794
Q9NSG2,GO:0005654 GO:0005739"""
s = pd.read_csv(StringIO(txt), header=None, index_col=0).squeeze("columns")  # the squeeze=True keyword was removed in pandas 2.0
solution
pd.unique(s.str.split(expand=True).stack())
array(['GO:0005737', 'GO:0015630', 'GO:0005654', 'GO:0005794', 'GO:0005739'], dtype=object)
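An alternative sketch that keeps the data in a DataFrame, closer to the code in the question: split each cell into a list, then use explode() to give every list item its own row before taking the unique values. This assumes pandas 0.25 or later; the column names protein and go_terms are made up for illustration.

```python
from io import StringIO
import pandas as pd

txt = """O43657,GO:0005737
A0A087WYV6,GO:0005737
A0A087WZU5,GO:0005737
Q8IZE3,GO:0015630 GO:0005654 GO:0005794
X6RHX1,GO:0015630 GO:0005654 GO:0005794
Q9NSG2,GO:0005654 GO:0005739"""

# Hypothetical column names; the file itself has no header row.
df = pd.read_csv(StringIO(txt), header=None, names=['protein', 'go_terms'])

# split() yields a list per row; explode() gives every list item its own row.
unique_terms = df['go_terms'].str.split(' ').explode().unique()
print(list(unique_terms))
```

The values come back in order of first appearance, which matches what unique(unlist(strsplit(...))) does in R.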
