Python pandas read_excel missing rows [closed] - python

Closed. This question needs details or clarity. It is not currently accepting answers.
Want to improve this question? Add details and clarify the problem by editing this post.
Closed 1 year ago.
I used pandas to read a lot of datasets from Bloomberg.
When I tested the reading program, I noticed that pandas wasn't reading all the rows; it skipped some of them.
The code is the following:
import pandas as pnd

def data_read(data_files):
    # data_to_take (the sheet names to read) is defined elsewhere in the program
    data = {}
    # Read all data and add it to a dictionary: filename -> content
    for file in data_files:
        # Drop the directory and the last 5 characters (e.g. ".xlsx")
        file_key = file.split('/')[-1][:-5]
        data[file_key] = {}
        # For each sheet, add: sheet name -> data
        for sheet_key in data_to_take:
            #path+"/"
            data[file_key][sheet_key] = pnd.read_excel(file, sheet_name=sheet_key)
    return data

Related

python soup response parsing header and value [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed yesterday.
I have a soup response text with multiple groups and subgroups.
I want to get all the groups and their values automatically.
How can I do it?
In the end, I want to get the title and the value for each group. Ideally, each group would have its values listed separately.
OrderedDict([('#id', 'boic'),
             ('mc:id', 'boic'),
             ('mc:ocb-conditions',
              OrderedDict([('mc:rule-deactivated', 'true'),
                           ('mc:international', 'true')])),
             ('mc:cb-actions',
              OrderedDict([('mc:allow', 'false')]))])
My goal is to get to a state where I get the following output:
'#id','boic'
'mc:id','boic'
'mc:ocb-conditions'
'mc:rule-deactivated','true'
'mc:international', 'true'
'mc:cb-actions'
'mc:allow','false'
I tried to use
' '.join(BeautifulSoup(soup_response, "html.parser").findAll(text=True))
and got all the values, but I'm missing the titles of the values.
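One way to get both titles and values is to walk the nested structure recursively. The following is only a minimal sketch, assuming the response has already been parsed into nested OrderedDicts like the one shown above; the names `walk`, `format_lines`, and `parsed` are made up for illustration, and repeated elements (lists) are not handled.

```python
from collections import OrderedDict

def walk(node, depth=0):
    """Yield (depth, key, value) for every entry; value is None for a group."""
    for key, value in node.items():
        if isinstance(value, dict):  # OrderedDict is a dict subclass
            yield depth, key, None
            yield from walk(value, depth + 1)
        else:
            yield depth, key, value

def format_lines(node):
    """Render each entry as 'key','value', indenting sub-group entries."""
    lines = []
    for depth, key, value in walk(node):
        indent = "    " * depth
        if value is None:
            lines.append(f"{indent}{key!r}")
        else:
            lines.append(f"{indent}{key!r},{value!r}")
    return lines

# Hypothetical input: the structure from the question
parsed = OrderedDict([
    ('#id', 'boic'),
    ('mc:id', 'boic'),
    ('mc:ocb-conditions', OrderedDict([('mc:rule-deactivated', 'true'),
                                       ('mc:international', 'true')])),
    ('mc:cb-actions', OrderedDict([('mc:allow', 'false')])),
])

for line in format_lines(parsed):
    print(line)
```

Because `walk` yields the depth along with each key, the caller can also collect each group's values into its own list instead of printing them.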

How to validate csv file header using existing schema csv info file [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 7 days ago.
I am trying to validate the header column names of input.csv against an existing schema_info.csv file.
input.csv
emp_id,emp_name,salary
1,siva,1000
2,ravi,200
3,kiran,800
schema_info
file_name,column_name,column_sequence
input.csv,EMP_ID,1
input.csv,EMP_NAME,2
input.csv,SALARY,3
I tried to read the header of input.csv and compare its column names and their order with the schema info data, but I am unable to get the column order from the input file header and unable to compare it with the schema file data. Any suggestions?
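# Path kept as posted; it should point at the file whose header is being checked
lines = sc.textFile("examples/src/main/resources/people.txt")
# first() returns the header line as a plain str, not an RDD
header = lines.first()
# A str has no map(), so split it directly into column names
header_cols = [c.strip() for c in header.split(",")]
# Pair each column name with its 1-based position in the header
header_data = list(enumerate(header_cols, start=1))
schema_info = spark.read.option("header","true").option("inferSchema","true").csv("/schema_info.csv")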
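The comparison itself does not need Spark, since only one header line and a small schema file are involved. Below is a rough sketch in plain Python using the standard csv module instead of the Spark calls above. The function name `validate_header` is hypothetical, and comparing names case-insensitively (emp_id vs. EMP_ID) is an assumption based on the sample data.

```python
import csv
import io

def validate_header(header_line, schema_rows, file_name):
    """Return a list of mismatch messages; an empty list means the header matches."""
    actual = [c.strip() for c in header_line.split(",")]
    # Expected columns for this file, ordered by column_sequence
    matching = [r for r in schema_rows if r["file_name"] == file_name]
    matching.sort(key=lambda r: int(r["column_sequence"]))
    expected = [r["column_name"] for r in matching]

    errors = []
    if len(actual) != len(expected):
        errors.append(f"expected {len(expected)} columns, found {len(actual)}")
    for pos, (got, want) in enumerate(zip(actual, expected), start=1):
        # Assumption: names are compared case-insensitively
        if got.lower() != want.lower():
            errors.append(f"position {pos}: expected {want}, found {got}")
    return errors

# Schema content copied from the question
schema_text = """file_name,column_name,column_sequence
input.csv,EMP_ID,1
input.csv,EMP_NAME,2
input.csv,SALARY,3
"""
schema_rows = list(csv.DictReader(io.StringIO(schema_text)))

print(validate_header("emp_id,emp_name,salary", schema_rows, "input.csv"))
print(validate_header("emp_name,emp_id,salary", schema_rows, "input.csv"))
```

The first call should report no mismatches, while the second, with two columns swapped, should report one message per misplaced column.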

Can't store txt file data in Python Dataframe [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 12 months ago.
I am following an article about an image captioning transformer model in TensorFlow (Python). When I run the following code, no data is shown when I call the head function.
import numpy as np
import pandas as pd

# dir_Flickr_text is the path to the captions file, defined earlier
with open(dir_Flickr_text, 'r') as file:
    text = file.read()

datatxt = []
for line in text.split('\n'):
    col = line.split('\t')
    # Skip lines that contain no tab separator
    if len(col) == 1:
        continue
    # Split the first field on "#" to get the filename and index columns
    w = col[0].split("#")
    datatxt.append(w + [col[1].lower()])

data = pd.DataFrame(datatxt, columns=["filename", "index", "caption"])
data = data.reindex(columns=['index', 'filename', 'caption'])
data = data[data.filename != '2258277193_586949ec62.jpg.1']
uni_filenames = np.unique(data.filename.values)
data.head()
After running this I see three columns (index, filename, caption) with no data at all, although the real file contains plenty of data and the article displays the data too.
It doesn't show any data because the DataFrame is empty, probably because datatxt is empty. Try adding a print() statement before data = pd.DataFrame(...) to see what is going on.
It is hard for us to debug without the dataset.
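As a sketch of that check, the parsing loop can be wrapped in a small helper that also counts how many lines were skipped. The helper name `parse_captions` and the two sample strings are made up for illustration. The second sample shows one possible cause, a file that is not tab-separated; this is an assumption, not something confirmed by the question.

```python
def parse_captions(text):
    """Split caption text into rows and count the lines that were skipped."""
    rows, skipped = [], 0
    for line in text.split('\n'):
        col = line.split('\t')
        if len(col) == 1:  # no tab found, so the line is dropped
            skipped += 1
            continue
        rows.append(col[0].split('#') + [col[1].lower()])
    return rows, skipped

# Hypothetical samples: one tab-separated, one space-separated
sample_tabs = "a.jpg#0\tA dog runs\na.jpg#1\tA Dog Plays\n"
sample_spaces = "a.jpg#0 A dog runs\na.jpg#1 A Dog Plays\n"

for name, sample in [("tabs", sample_tabs), ("spaces", sample_spaces)]:
    rows, skipped = parse_captions(sample)
    print(name, "rows:", len(rows), "skipped:", skipped)
```

If every line ends up in the skipped count, the separator in the real file is probably not a tab, and the DataFrame will be empty.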

Saving csv in specific json format [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 1 year ago.
How do I save my CSV as JSON, where column = "questions"?
Specifically, in this format:
[
"What is dewa?" , "what is regulations?" ,"What is the fire rating for building having more than 2 basements?"
]
Right now my JSON comes out in this format:
{"Question":{"0":"what is dewa?","1":"what is regulations?","2":"What is the fire rating for building having more than 2 basements?"}}
Code for CSV to JSON:
import pandas as pd
read_csv = pd.read_csv(r'C:\Users\heba.fatima\Desktop\final-fire/answers.csv') # or delimiter = ';'
read_csv=read_csv[["Question"]]
read_csv.head()
read_csv.to_json(r'C:\Users\heba.fatima\Desktop\flaskapi\data\answers.json')
You can use the orient argument of to_json:
read_csv['Question'].to_json(orient='values')
output:
["what is dewa?", "what is regulations?", "What is the fire rating for building having more than 2 basements?"]
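For comparison, the same flat list can be produced without pandas. This is only a sketch using the standard csv and json modules; the helper name `column_to_json_list` and the in-memory sample with an "Answer" column are assumptions made for illustration.

```python
import csv
import io
import json

def column_to_json_list(csv_file, column):
    """Read one column from an open CSV file and return it as a JSON array string."""
    reader = csv.DictReader(csv_file)
    return json.dumps([row[column] for row in reader])

# Hypothetical in-memory CSV standing in for answers.csv
sample = io.StringIO("Question,Answer\nwhat is dewa?,a1\nwhat is regulations?,a2\n")
print(column_to_json_list(sample, "Question"))
```

To save the result, the returned string can be written to a file opened in text mode.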

A little confused with this python dictionary example [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 2 years ago.
data = [{'name':'Albert','rel':'Head','unique_number': 101},
{'name':'Sheen','rel':'Head','unique_number': 201},
{'name':'Peter','rel':'Son','unique_number': 101},
{'name':'Chloe','rel':'Daughter','unique_number': 101}]
Can you help me get data like the following, filtered on unique_number?
updated_data = [
{'house_head':'Albert','members':['Peter','Chloe']},
{'house_head':'Sheen','members':[]}
]
The following should work:
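# Collect the distinct household numbers
numbers = {i['unique_number'] for i in data}
# One entry per household (named households to avoid shadowing the built-in dict)
households = {i: {'Head': '', 'members': []} for i in numbers}
for i in data:
    if i['rel'] == 'Head':
        households[i['unique_number']]['Head'] = i['name']
    else:
        households[i['unique_number']]['members'].append(i['name'])
new_data = [{'house_head': households[i]['Head'], 'members': households[i]['members']} for i in households]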
