I'm working on a project comparing different sorting algorithms. I already have a data-generating script that can time everything. I need this data to fit in a table (I'm using OriginPro 8) like this one:
But what should I write in Python script, so when I import .csv file it would look like this exact table?
Right now I have this structure:
{'bubble_sort': {'BEST': {'COMP': 999000, 'PERM': 0, 'TIME': 1072.061538696289},
'RND': {'COMP': 999000,
'PERM': 249853,
'TIME': 1731.0991287231445},
'WORST': {'COMP': 999000,
'PERM': 499500,
'TIME': 2358.1347465515137}},
'hoare_sort': {'BEST': {'COMP': 10975, 'PERM': 0, 'TIME': 14.000654220581055}, #and so on
And this code to save it:
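import csv

def write_csv_in_file(fn, data):
    # newline='' prevents blank lines between rows on Windows.
    with open(fn + ".csv", 'w', newline='') as file:
        writer = csv.writer(file)
        # Writes one row per algorithm: its name, then the whole nested dict as a string.
        for key, value in data.items():
            writer.writerow([key, value])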
And after importing get this table:
And it is far away from the variant I need.
What I want is that:
Let's say that this data was collected on a best-case array of length 100. Then the 1st row of the first table should hold the values from ['bubble_sort']['BEST']['TIME'], ['hoare_sort']['BEST']['TIME'] and so on. Then I'd make the same tables for the worst-case scenario (["WORST"]) and the random one (["RND"]), and then repeat everything for the number of comparisons (["COMP"]) and permutations done (["PERM"]).
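One possible way to get that layout, sketched under assumptions: the per-run dictionaries are collected in a hypothetical `results_by_length` dict keyed by array length, and the helper names `table_rows` and `write_table` are made up for illustration. Each (case, metric) pair becomes one table with a column per algorithm and a row per array length. The sample values are the ones quoted above; the length key 1000 is an assumption.

```python
import csv
import io

def table_rows(results_by_length, case, metric):
    """Build a header plus one row per array length, one column per algorithm."""
    algorithms = sorted({name for data in results_by_length.values() for name in data})
    rows = [["length"] + algorithms]
    for length in sorted(results_by_length):
        data = results_by_length[length]
        rows.append([length] + [data[name][case][metric] for name in algorithms])
    return rows

def write_table(file_obj, results_by_length, case, metric):
    """Write one (case, metric) table as CSV to an already opened file object."""
    csv.writer(file_obj, lineterminator="\n").writerows(
        table_rows(results_by_length, case, metric))

# Hypothetical container: array length -> the nested structure from the question.
results_by_length = {
    1000: {
        "bubble_sort": {"BEST": {"COMP": 999000, "PERM": 0, "TIME": 1072.061538696289}},
        "hoare_sort": {"BEST": {"COMP": 10975, "PERM": 0, "TIME": 14.000654220581055}},
    },
}

buffer = io.StringIO()
write_table(buffer, results_by_length, "BEST", "COMP")
print(buffer.getvalue())
```

To produce real files, the same function could be called in a loop over ("BEST", "WORST", "RND") and ("TIME", "COMP", "PERM") with a file opened using `newline=''`.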
Related
My goal: Automate the operation of executing a query and output the results into a csv.
I have been successful in obtaining the query results using Python (this is my first project ever in Python). I am trying to format these results as a csv but am completely lost. It's basically just creating 2 massive rows with all the data not parsed out. The .txt and .csv results are attached (I obtained these by simply calling the query and entering "file name > results.txt" or "file name > results.csv").
txt results: {'data': {'get_result': {'job_id': None, 'result_id': '72a17fd2-e63c-4732-805a-ad6a7b980a99', '__typename': 'get_result_response'}}} {'data': {'query_results': [{'id': '72a17fd2-e63c-4732-805a-ad6a7b980a99', 'job_id': '05eb2527-2ca0-4dd1-b6da-96fb5aa2e67c', 'error': None, 'runtime': 157, 'generated_at': '2022-04-07T20:14:36.693419+00:00', 'columns': ['project_name', 'leaderboard_date', 'volume_30day', 'transactions_30day', 'floor_price', 'median_price', 'unique_holders', 'rank', 'custom_sort_order'], '__typename': 'query_results'}], 'get_result_by_result_id': [{'data': {'custom_sort_order': 'AA', 'floor_price': 0.375, 'leaderboard_date': '2022-04-07', 'median_price': 343.4, 'project_name': 'Terraforms by Mathcastles', 'rank': 1, 'transactions_30day': 2774, 'unique_holders': 2179, 'volume_30day': 744611.6252}, '__typename': 'get_result_template'}, {'data': {'custom_sort_order': 'AB', 'floor_price': 4.69471, 'leaderboard_date': '2022-04-07', 'median_price': 6.5, 'project_name': 'Meebits', 'rank': 2, 'transactions_30day': 4153, 'unique_holders': 6200, 'volume_30day': 163520.7377371168}, '__typename': 'get_result_template'}, etc. (repeats for 100s of rows)..
Your results text string actually contains two dictionaries separated by a space character.
Here's a formatted version of what's in each of them:
dict1 = {'data': {'get_result': {'job_id': None,
'result_id': '72a17fd2-e63c-4732-805a-ad6a7b980a99',
'__typename': 'get_result_response'}}}
dict2 = {'data': {'query_results': [{'id': '72a17fd2-e63c-4732-805a-ad6a7b980a99',
'job_id': '05eb2527-2ca0-4dd1-b6da-96fb5aa2e67c',
'error': None,
'runtime': 157,
'generated_at': '2022-04-07T20:14:36.693419+00:00',
'columns': ['project_name',
'leaderboard_date',
'volume_30day',
'transactions_30day',
'floor_price',
'median_price',
'unique_holders',
'rank',
'custom_sort_order'],
'__typename': 'query_results'}],
'get_result_by_result_id': [{'data': {'custom_sort_order': 'AA',
'floor_price': 0.375,
'leaderboard_date': '2022-04-07',
'median_price': 343.4,
'project_name': 'Terraforms by Mathcastles',
'rank': 1,
'transactions_30day': 2774,
'unique_holders': 2179,
'volume_30day': 744611.6252},
'__typename': 'get_result_template'},
{'data': {'custom_sort_order': 'AB',
'floor_price': 4.69471,
'leaderboard_date': '2022-04-07',
'median_price': 6.5,
'project_name': 'Meebits',
'rank': 2,
'transactions_30day': 4153,
'unique_holders': 6200,
'volume_30day': 163520.7377371168},
'__typename': 'get_result_template'},
]}}
(BTW I formatted them using the pprint module. This is often a good first step with these kinds of problems, so you know what you're dealing with.)
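As a rough illustration of that first step, here is a minimal sketch that turns one printed dictionary back into a Python object and pretty-prints it. The string `raw` is a shortened, hypothetical stand-in for the real output (the 'result_id' value is made up), and `sort_dicts=False` requires Python 3.8 or later.

```python
import ast
from pprint import pformat

# Shortened, hypothetical stand-in for one of the printed dictionaries.
raw = ("{'data': {'get_result': {'job_id': None, 'result_id': 'abc', "
       "'__typename': 'get_result_response'}}}")

# literal_eval only accepts Python literals, so it is safer than eval().
parsed = ast.literal_eval(raw)

# Keep the original key order and wrap at 60 characters.
pretty = pformat(parsed, width=60, sort_dicts=False)
print(pretty)
```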
Ignoring the first one completely and all but the repetitive data in the second — which is what I assume is all you really want — you could create a CSV file from the nested dictionary values in the dict2['data']['get_result_by_result_id'] list. Here's how that could be done using the csv.DictWriter class:
import csv
from pprint import pprint  # If needed.

output_filepath = 'query_results.csv'

# Determine CSV fieldnames based on keys of first dictionary.
fieldnames = dict2['data']['get_result_by_result_id'][0]['data'].keys()

with open(output_filepath, 'w', newline='') as outp:
    writer = csv.DictWriter(outp, delimiter=',', fieldnames=fieldnames)
    writer.writeheader()  # Optional.
    for result in dict2['data']['get_result_by_result_id']:
        # pprint(result['data'], sort_dicts=False)
        writer.writerow(result['data'])

print('fini')
Using the test data, here's the contents of the 'query_results.csv' file it created:
custom_sort_order,floor_price,leaderboard_date,median_price,project_name,rank,transactions_30day,unique_holders,volume_30day
AA,0.375,2022-04-07,343.4,Terraforms by Mathcastles,1,2774,2179,744611.6252
AB,4.69471,2022-04-07,6.5,Meebits,2,4153,6200,163520.7377371168
It appears you have the data in a Python dictionary. The Google Sheet says access denied, so I can't see the whole data.
But essentially you want to convert the dictionary data to a CSV file.
At its most basic, you can use code like this to get where you need to. For your example you'll need to drill down to where the rows actually are.
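import csv

file_dictionary = {"oliva": 199, "james": 145, "potter": 187}

# newline='' stops the csv module from writing blank lines between rows on Windows.
with open("mytest.csv", "w", newline="") as new_path:
    z = csv.writer(new_path)
    # One row per dictionary entry: key in the first column, value in the second.
    for new_k, new_v in file_dictionary.items():
        z.writerow([new_k, new_v])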
This guide should help you out.
https://pythonguides.com/python-dictionary-to-csv/
If I understand your question right, you should build a DataFrame from your results and then save the DataFrame in .csv format. The pandas library is useful and easy to use for this.
I've created a generator object and want to write it out to a CSV file so I can upload it to an external tool. At the moment the generator returns records as separate dictionaries, but there don't appear to be any commas separating the records/dictionaries, and when I write the output to a txt file and load it back into the script, it comes back as a <class 'str'>.
Class Generator declared as:
matches =
{'type_of_reference': 'JOUR', 'title': 'Ranking evidence in substance use and addiction', 'secondary_title': 'International Journal of Drug Policy', 'alternate_title1': 'Int. J. Drug Policy', 'volume': '83', 'year': '2020', 'doi': '10.1016/j.drugpo.2020.102840'}
{'type_of_reference': 'JOUR', 'title': 'Methods used in the selection of instruments for outcomes included in core outcome sets have improved since the publication of the COSMIN/COMET guideline', 'secondary_title': 'Journal of Clinical Epidemiology', 'alternate_title1': 'J. Clin. Epidemiol.', 'volume': '125', 'start_page': '64', 'end_page': '75', 'year': '2020', 'doi': '10.1016/j.jclinepi.2020.05.021',}
This is the result of the following generator function, which compares each record's "doi" key against a set of DOIs from another file.
def match_record():
    with open(filename_ris) as f:
        ris_records = readris(f)
        for entry in ris_records:
            if entry['doi'] in doi_match:
                yield entry
To check that the correct records were kept, I wrote the output of this generator (matches) to a txt file using the following code.
with open('output.txt', 'w') as f:
    for x in matches:
        f.write(str(x))
It's not a list of dictionaries nor dictionaries separated by commas that I have so I'm a bit confused about how to read/load it into pandas effectively. I want to load it into pandas to drop certain series[keys] and then write it out as a csv once completed.
I'm reading it in using pd.read_csv, and it just returns the key: value pairs for all the separate records as column headers, which is no surprise, but I don't know what to do before this step.
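One possible approach, sketched with the standard library only: skip the intermediate text file and feed the generator's dictionaries straight to csv.DictWriter, dropping unwanted keys on the way. The helper name `records_to_csv` and the sample records are hypothetical. Because the records do not all share the same keys (for example 'start_page'), the sketch collects the union of keys first and fills the gaps with empty strings.

```python
import csv
import io

def records_to_csv(records, out, drop=()):
    """Write an iterable of dicts to `out` as CSV, leaving out the keys in `drop`."""
    rows = [{k: v for k, v in rec.items() if k not in drop} for rec in records]
    # Union of all keys, in first-seen order, so no column is lost.
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    writer = csv.DictWriter(out, fieldnames=fieldnames, restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)

def sample_records():
    # Hypothetical stand-in for the match_record() generator.
    yield {"title": "A", "year": "2020", "doi": "10.1/a"}
    yield {"title": "B", "start_page": "64", "year": "2020", "doi": "10.1/b"}

buffer = io.StringIO()
records_to_csv(sample_records(), buffer, drop={"year"})
print(buffer.getvalue())
```

If pandas is preferred, the same list of dictionaries can be passed to pd.DataFrame, which accepts a list of dicts, and the unwanted columns dropped there before writing the CSV.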
My goal here is to clean up address data from individual CSV files using a dictionary for each individual column, sort of like automating the find-and-replace feature from Excel. The addresses are divided into columns: house numbers, street names, directions and street types are each in their own column. I used the following code to do the whole document.
missad = {
    'Typo goes here': 'Corrected typo goes here'}

def replace_all(text, dic):
    # Apply every typo -> correction pair from the dictionary passed in.
    for i, j in dic.items():
        text = text.replace(i, j)
    return text

with open('original.csv', 'r') as csvfile:
    text = csvfile.read()
    text = replace_all(text, missad)

with open('cleanfile.csv', 'w') as cleancsv:
    cleancsv.write(text)
While the code works, I need separate dictionaries, as some columns need specific typo fixes. For example, housenum for the house numbers column, stdir for the street direction and so on, each with its column-specific typos:
housenum = {
    'One': '1',
    'Two': '2'
}
stdir = {
    'NULL': ''}
I have no idea how to proceed. I feel it's something simple, or that I would need pandas, but I am unsure how to continue. I would appreciate any help! Also, is there any way to group several typos together under one corrected value? I tried the following but got an unhashable type error.
missad = {
    ['Typo goes here', 'Typo 2 goes here', 'Typo 3 goes here']: 'Corrected typo goes here'}
Is something like this what you are looking for?
import pandas as pd

# Use pandas to read in the CSV file.
df = pd.read_csv(filename, index_col=False)

# Let's say you want to do corrections on the 'columnforcorrection' column of this dataframe.
correctiondict = {
    'one': 1,
    'two': 2
}
df['columnforcorrection'] = df['columnforcorrection'].replace(correctiondict)
Use the same idea for the other columns of interest.
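On grouping several typos under one correction: a list cannot be a dictionary key because lists are unhashable, which is where the error comes from. One possible workaround, sketched here with the standard library only, is to write the groups the other way round (correction first, then its typos) and invert them into the flat typo-to-correction form. The helper names, column names and sample rows below are assumptions for illustration, and the replacement is whole-cell rather than substring-based.

```python
import csv
import io

def invert_groups(groups):
    """Turn {correction: [typo, ...]} into the flat {typo: correction} form."""
    return {typo: fix for fix, typos in groups.items() for typo in typos}

# Hypothetical per-column corrections; the column names are assumptions.
corrections = {
    "housenum": invert_groups({"1": ["One", "one"], "2": ["Two", "two"]}),
    "stdir": {"NULL": ""},
}

def clean_rows(rows, corrections):
    """Yield rows with whole-cell typos replaced, using each column's own dictionary."""
    for row in rows:
        yield {col: corrections.get(col, {}).get(val, val) for col, val in row.items()}

# Made-up sample data standing in for original.csv.
sample = "housenum,stdir,stname\nOne,NULL,Main\n22,N,Oak\n"
cleaned = list(clean_rows(csv.DictReader(io.StringIO(sample)), corrections))
print(cleaned)
```

The inverted dictionaries are plain typo-to-correction mappings, so they could equally be passed to the pandas replace call shown above.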
I am using python-docx to extract particular table data in a word file.
I have a Word file with multiple tables. This is the particular table among them
and the retrieved data needs to be arranged like this.
Challenges:
Can I find a particular table in a Word file using python-docx?
Can I achieve my requirement using python-docx?
This is not a complete answer, but it should point you in the right direction, and is based on some similar task I have been working on.
I ran the following code in Python 3.6 in a Jupyter notebook, but it should work in plain Python too.
First we start by importing the docx Document class and pointing it at the document we want to work with.
from docx.api import Document

# Replace the placeholder with the path to your own document.
document = Document("<your path to doc>")
We create a list of tables, and print how many tables there are in that. We create a list to hold all the tabular data.
tables = document.tables
print(len(tables))
big_data = []
Next we loop through the tables:
for table in document.tables:
    data = []
    keys = None
    for i, row in enumerate(table.rows):
        text = (cell.text for cell in row.cells)
        # The first row holds the column headings.
        if i == 0:
            keys = tuple(text)
            continue
        # Every other row becomes a dict of heading -> cell text.
        row_data = dict(zip(keys, text))
        data.append(row_data)
    # print(data)
    big_data.append(data)

print(big_data)
By looping through all the tables, we read the data, creating a list of lists. Each individual list represents a table, and within that we have dictionaries per row. Each dictionary contains a key / value pair. The key is the column heading from the table and value is the cell contents for that row's data for that column.
So, that is half of your problem. The next part would be to use python-docx to create a new table in your output document - and to fill it with the appropriate content from the list / list / dictionary data.
In the example I have been working on this is the final table in the document.
When I run the routine above, this is my output:
[{'Version': '1', 'Changes': 'Local Outcome Improvement Plan ', 'Page Number': '1-34 and 42-61', 'Approved By': 'CPA Board\n', 'Date ': '22 August 2016'},
{'Version': '2', 'Changes': 'People are resilient, included and supported when in need section added ', 'Page Number': '35-41', 'Approved By': 'CPA Board', 'Date ': '12 December 2016'},
{'Version': '2', 'Changes': 'Updated governance and accountability structure following approval of the Final Report for the Review of CPA Infrastructure', 'Page Number': '59', 'Approved By': 'CPA Board', 'Date ': '12 December 2016'}]]
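On the question of finding one particular table: with the list-of-lists structure built above, one possible approach is to pick tables by their column headings. This is a minimal sketch in plain Python; the helper name `find_tables` and the second sample table are hypothetical, and it assumes the wanted table can be recognised by its header row. Headings are stripped before comparing, because extracted headings can carry trailing spaces (such as 'Date ' in the output above).

```python
def find_tables(big_data, required_headers):
    """Return the tables (lists of row dicts) whose headings include all required ones."""
    wanted = {h.strip() for h in required_headers}
    matches = []
    for table in big_data:
        if not table:  # table had only a header row, or no rows at all
            continue
        headers = {key.strip() for key in table[0]}
        if wanted <= headers:
            matches.append(table)
    return matches

# Small sample in the same shape as big_data; the first table is made up.
big_data = [
    [{"Name": "x", "Value": "1"}],
    [{"Version": "1", "Page Number": "1-34 and 42-61", "Date ": "22 August 2016"}],
]

version_tables = find_tables(big_data, ["Version", "Date"])
print(version_tables)
```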
I want to be able to change CSV data the way we can with JSON in JavaScript. Just code and object manipulation, like -
var obj = JSON.parse(jsonStr);
obj.name = 'foo bar';
var modifiedJSON = JSON.stringify(obj)
How can I do something like this, but for CSV files and in Python?
Something like -
csvObject = parseCSV(csvStr)
csvObject.age = 10
csvObject.name = csvObject.firstName + csvObject.lastName
csvStr = toCSV(csvObject)
I have a CSV file, customers.csv.
ID, Name, Item and Date are the columns. An example of the CSV file -
ID,LastName,FirstName,Item,Date
11231249015,Derik,Smith,Televisionx1,1391212800000
24156246254,Doe,John,FooBar,1438732800000
I know very well that the Python csv library can handle it, but can the file be treated as an object as a whole and then manipulated?
I basically want to combine the first name and last name, and perform some math with the IDs, but in the way JavaScript handles JSON.
Not sure but maybe you want to use https://github.com/samarjeet27/CSV-Mapper
Install using pip install csvmapper
import csvmapper

# create parser instance
parser = csvmapper.CSVParser('customers.csv', hasHeader=True)
# create object
customers = parser.buildDict()  # buildObject() if you want object

# perform manipulation
for customer in customers:
    customer['Name'] = customer['FirstName'] + ' ' + customer['LastName']
    # remove last name and firstname
    # maybe this was what you wanted ?
    customer.pop('LastName', None)
    customer.pop('FirstName', None)

print(customers)
Output
[{'Name': 'Smith Derik', 'Item': 'Televisionx1', 'Date': '1391212800000', 'ID': '11231249015'}, {'Name': 'John Doe', 'Item': 'FooBar', 'Date': '1438732800000', 'ID': '24156246254'}]
This combines FirstName and LastName by accessing each row as a dict. I assumed you might also want to remove the separate last name and first name, replacing them with just a 'Name' property. You can use parser.buildObject() if you want attribute access as in JavaScript.
Edit
You can save it back to CSV too.
writer = csvmapper.CSVWriter(customers) # modified customers from the above code
writer.write('customers-final.csv')
And regarding being able to perform math, you could use a custom mapper file like
mapper = csvmapper.DictMapper(x = [
    [
        { 'name':'ID' ,'type':'long'},
        { 'name':'LastName' },
        { 'name':'FirstName' },
        { 'name':'Item' },
        { 'name':'Date', 'type':'int' }
    ]
])
parser = csvmapper.CSVParser('customers.csv', mapper)
And specify the type(s)
JSON can, by design, represent various kinds of data in various kinds of arrangements (objects, arrays...) and you can nest these if you wish. This means that it's relatively easy to serialise and deserialise complex objects.
On the other hand, CSV is just rows and columns of data. No structured objects, arrays, nesting, etc. So you basically have to know ahead of time what you're dealing with, and then manually map these to corresponding objects.
That said, Python's csv module does have a csv.DictReader class, which reads each row of a CSV file as a dictionary. By default it maps the first (header) row to the field names, but you can also pass the field names to the constructor. You can therefore reference a value in a row by its column header / field name. There is also a corresponding csv.DictWriter class. If you don't need any fancy nesting or complex data structures, these may be all you really need.
This example is directly from the python module documentation:
import csv

with open('names.csv', newline='') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        print(row['first_name'], row['last_name'])
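To tie this back to the JSON.parse / JSON.stringify analogy in the question, here is a minimal sketch of a parse, modify, serialise round trip built on csv.DictReader and csv.DictWriter. The helper names `parse_csv` and `to_csv` are made up to mirror the question's pseudocode, the data is the sample from the question, and adding 1 to the ID is only a placeholder for "some math".

```python
import csv
import io

def parse_csv(text):
    """Parse CSV text into a list of dicts, one per row (all values are strings)."""
    return list(csv.DictReader(io.StringIO(text)))

def to_csv(rows, fieldnames):
    """Serialise a list of dicts back to CSV text with the given column order."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()

csv_str = (
    "ID,LastName,FirstName,Item,Date\n"
    "11231249015,Derik,Smith,Televisionx1,1391212800000\n"
    "24156246254,Doe,John,FooBar,1438732800000\n"
)

customers = parse_csv(csv_str)
for customer in customers:
    # Combine the two name columns into one and drop the originals.
    customer["Name"] = customer.pop("FirstName") + " " + customer.pop("LastName")
    # Values must be converted before doing arithmetic on them.
    customer["ID"] = int(customer["ID"]) + 1

result = to_csv(customers, ["ID", "Name", "Item", "Date"])
print(result)
```

Unlike JSON, nothing in the file records the column types, so every conversion (here `int`) has to be chosen by hand.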