How can I fetch data from a particular index in Elasticsearch? - python

When I try to fetch data, all I get is metadata such as the field data types. How can I get the actual data stored in the index?
{"rpa-trans-2020.02.26":{"aliases":{},"mappings":{"rpa-trans":{"dynamic":"true","properties":{"#timestamp":{"type":"date"},"#version":{"type":"text","fields":{"keyword":{"type":"keyword","ignore_above":256}}},"AreaImpacted":{"type":"keyword"},"AssigneeUserId":{"type":"text","fields":{"keyword":{"type":"keyword","ignore_above":256}}},"CreatedByUserId":{"type":"text","fields":{"keyword":{"type":"keyword","ignore_above":256}}},"CrossReferenceId":{"type":"keyword"},"EntityName":{"type":"text","fields":{"keyword":{"type":"keyword","ignore_above":256}}},"ErrorCode":{"type":"keyword"},"ErrorDescription":{"type":"keyword"},"FailureTransaction":{"type":"integer"},"Initiator":{"type":"keyword"},"InstanceId":{"type":"integer"},"IsApplication":{"type":"keyword"},"ListenerReqEndTime":{"type":"long"},"ListenerReqStartTime":{"type":"long"},"NotQualifiedRequest":{"type":"integer"},"Param1":{"type":"float"},"Param2":{"type":"text","fields":{"keyword":{"type":"keyword","ignore_above":256}}},"Param3":{"type":"text","fields":{"keyword":{"type":"keyword","ignore_above":256}}},"Param4":{"type":"text","fields":{"keyword":{"type":"keyword","ignore_above":256}}},"ProcessName":{"type":"keyword"},"Processes":{"type":"keyword"},"QualifiedRequest":{"type":"integer"},"RetryTrans":{"properties":{"ReasonRequestUnsuccessful":{"type":"text","fields":{"keyword":{"type":"keyword","ignore_above":256}}},"RetryAttemptNumber":{"type":"long"},"RetryReason":{"type":"text","fields":{"keyword":{"type":"keyword","ignore_above":256}}}}},"RobotReqEndTime":{"type":"long"},"RobotReqStartTime":{"type":"long"},"Robots":{"type":"keyword"},"SearchInput":{"type":"keyword"},"SourceApplicationId":{"type":"keyword"},"SourceMachineId":{"type":"keyword"},"StepDescription":{"type":"text","fields":{"keyword":{"type":"keyword","ignore_above":256}}},"StepEndTime":{"type":"date"},"StepStartTime":{"type":"date"},"StepStatus":{"type":"text","fields":{"keyword":{"type":"keyword","ignore_above":256}}},"SuccessfulTransaction":{"type"
:"integer"},"TransactionDateTime":{"type":"date","format":"yyyy-MM-dd'T'HH:mm:ss.SSSZZ"},"TransactionEndTime":{"type":"date"},"TransactionId":{"type":"keyword"},"TransactionMessage":{"type":"keyword"},"TransactionProcessName":{"type":"text","fields":{"keyword":{"type":"keyword","ignore_above":256}}},"TransactionProfileName":{"type":"keyword"},"TransactionSource":{"type":"keyword"},"TransactionState":{"type":"keyword"},"TransactionStatus":{"type":"keyword"},"TransactionTitle":{"type":"keyword"},"TransactionType":{"type":"keyword"},"UserId":{"type":"text","fields":{"keyword":{"type":"keyword","ignore_above":256}}},"elapsed_execution_time":{"type":"float"},"elapsed_execution_time ":{"type":"float"},"elapsed_handle_time":{"type":"float"},"elapsed_qualification_time":{"type":"float"},"elapsed_qualification_time ":{"type":"float"},"elapsed_timestamp_start":{"type":"date"},"elapsed_wait_time":{"type":"float"},"manual_proc_time":{"type":"integer"},"process_sla":{"type":"integer"},"tags":{"type":"text","fields":{"keyword":{"type":"keyword","ignore_above":256}}},"transactions":{"type":"long"},"type":{"type":"text","fields":{"keyword":{"type":"keyword","ignore_above":256}}}}}},"settings":{"index":{"number_of_shards":"1","provided_name":"rpa-trans-2020.02.26","creation_date":"1582703183260","requests":{"cache":{"enable":"false"}},"number_of_replicas":"0","uuid":"WslG0Q-YT323WZxNuw_sWw","version":{"created":"5050299"}}}},

It looks like you are issuing a request of the form GET myindex. That returns the mappings and settings of your index, not its documents.
To read your data, the request should look like GET myindex/_search
There are many ways to query data; see the Query DSL documentation for the ones that suit your requirements.
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl.html
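As a rough sketch of the difference, assuming an unauthenticated cluster at a placeholder address such as http://localhost:9200, the following uses only the standard library to call _search and pull the stored documents out of the response. The helper names search_url, extract_sources, and fetch_documents are made up for illustration; the stored documents sit under hits.hits[]._source in the search response.

```python
import json
import urllib.request


def search_url(base_url, index):
    """Build the _search endpoint URL for one index."""
    return f"{base_url.rstrip('/')}/{index}/_search"


def extract_sources(response):
    """Return the stored documents (_source) from a search response dict."""
    return [hit["_source"] for hit in response.get("hits", {}).get("hits", [])]


def fetch_documents(base_url, index, size=10):
    """Run a match_all search and return up to `size` documents.

    Assumes no authentication; base_url is a placeholder for your cluster.
    """
    body = json.dumps({"query": {"match_all": {}}, "size": size}).encode("utf-8")
    request = urllib.request.Request(
        search_url(base_url, index),
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as reply:
        return extract_sources(json.load(reply))
```

Calling fetch_documents("http://localhost:9200", "rpa-trans-2020.02.26") would then return a list of document dictionaries rather than the mapping shown in the question.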

If you assign the response to a variable, as in the the_data assignment below, you can index into it like a dictionary:
the_data["rpa-trans-2020.02.26"]["mappings"]
the_data = {"rpa-trans-2020.02.26":{"aliases":{},"mappings":{"rpa-trans":{"dynamic":"true","properties":{"#timestamp":{"type":"date"},"#version":{"type":"text","fields":{"keyword":{"type":"keyword","ignore_above":256}}},"AreaImpacted":{"type":"keyword"},"AssigneeUserId":{"type":"text","fields":{"keyword":{"type":"keyword","ignore_above":256}}},"CreatedByUserId":{"type":"text","fields":{"keyword":{"type":"keyword","ignore_above":256}}},"CrossReferenceId":{"type":"keyword"},"EntityName":{"type":"text","fields":{"keyword":{"type":"keyword","ignore_above":256}}},"ErrorCode":{"type":"keyword"},"ErrorDescription":{"type":"keyword"},"FailureTransaction":{"type":"integer"},"Initiator":{"type":"keyword"},"InstanceId":{"type":"integer"},"IsApplication":{"type":"keyword"},"ListenerReqEndTime":{"type":"long"},"ListenerReqStartTime":{"type":"long"},"NotQualifiedRequest":{"type":"integer"},"Param1":{"type":"float"},"Param2":{"type":"text","fields":{"keyword":{"type":"keyword","ignore_above":256}}},"Param3":{"type":"text","fields":{"keyword":{"type":"keyword","ignore_above":256}}},"Param4":{"type":"text","fields":{"keyword":{"type":"keyword","ignore_above":256}}},"ProcessName":{"type":"keyword"},"Processes":{"type":"keyword"},"QualifiedRequest":{"type":"integer"},"RetryTrans":{"properties":{"ReasonRequestUnsuccessful":{"type":"text","fields":{"keyword":{"type":"keyword","ignore_above":256}}},"RetryAttemptNumber":{"type":"long"},"RetryReason":{"type":"text","fields":{"keyword":{"type":"keyword","ignore_above":256}}}}},"RobotReqEndTime":{"type":"long"},"RobotReqStartTime":{"type":"long"},"Robots":{"type":"keyword"},"SearchInput":{"type":"keyword"},"SourceApplicationId":{"type":"keyword"},"SourceMachineId":{"type":"keyword"},"StepDescription":{"type":"text","fields":{"keyword":{"type":"keyword","ignore_above":256}}},"StepEndTime":{"type":"date"},"StepStartTime":{"type":"date"},"StepStatus":{"type":"text","fields":{"keyword":{"type":"keyword","ignore_above":256}}},"SuccessfulTransacti
on":{"type":"integer"},"TransactionDateTime":{"type":"date","format":"yyyy-MM-dd'T'HH:mm:ss.SSSZZ"},"TransactionEndTime":{"type":"date"},"TransactionId":{"type":"keyword"},"TransactionMessage":{"type":"keyword"},"TransactionProcessName":{"type":"text","fields":{"keyword":{"type":"keyword","ignore_above":256}}},"TransactionProfileName":{"type":"keyword"},"TransactionSource":{"type":"keyword"},"TransactionState":{"type":"keyword"},"TransactionStatus":{"type":"keyword"},"TransactionTitle":{"type":"keyword"},"TransactionType":{"type":"keyword"},"UserId":{"type":"text","fields":{"keyword":{"type":"keyword","ignore_above":256}}},"elapsed_execution_time":{"type":"float"},"elapsed_execution_time ":{"type":"float"},"elapsed_handle_time":{"type":"float"},"elapsed_qualification_time":{"type":"float"},"elapsed_qualification_time ":{"type":"float"},"elapsed_timestamp_start":{"type":"date"},"elapsed_wait_time":{"type":"float"},"manual_proc_time":{"type":"integer"},"process_sla":{"type":"integer"},"tags":{"type":"text","fields":{"keyword":{"type":"keyword","ignore_above":256}}},"transactions":{"type":"long"},"type":{"type":"text","fields":{"keyword":{"type":"keyword","ignore_above":256}}}}}},"settings":{"index":{"number_of_shards":"1","provided_name":"rpa-trans-2020.02.26","creation_date":"1582703183260","requests":{"cache":{"enable":"false"}},"number_of_replicas":"0","uuid":"WslG0Q-YT323WZxNuw_sWw","version":{"created":"5050299"}}}}}

Related

How to export Tableau view to CSV with filter of date range included

I am trying to export a tableau view to csv using python tableau server client.
This is the code piece I'm using:
view_object = <my view object>
csv_req_option = TSC.CSVRequestOptions(maxage=-1)
# csv_req_option.vf('Category', 'Furniture')
server.views.populate_csv(view_object, csv_req_option)
with open('view_data.csv', 'wb') as f:
    # Perform byte join on the CSV data
    f.write(b''.join(view_object.csv))
However, the code above fails with a 504 Gateway Time-out error. I am guessing this is because the data is too large to export, so I am trying to add a filter with csv_req_option.vf(...) to reduce the data size.
I want to filter on the date range 2022-06-28 to 2022-07-07. I have searched for how to apply a filter to the CSV export, but the answer I found does not filter by date. (The commented-out line is that answer, which does not fit my case.) Could anyone help me filter by datetime, or offer some guidance?
Thank you so much!

Parsing data from API - Best way to get all data possible?

I am trying to get crash data from https://data.pa.gov/Public-Safety/Crash-Incident-Details-CY-1997-Current-Annual-Coun/dc5b-gebx using their API, which is documented at https://dev.socrata.com/docs/paging.html
When I try to do this in Python, I only get the default number of records, as shown below.
import pandas as pd
import requests
response = requests.get("https://data.pa.gov/resource/dc5b-gebx.json?limit=50000")
data = response.json()
pd.DataFrame(data)
When I use limit, the API does not return any data.
I want to retrieve as many records as possible (ideally all of them) for an analysis project. I'm a bit confused and would appreciate some help. Thanks!
As stated in the API documentation, you are missing the '$' prefix on the parameter. You should be requesting
https://soda.demo.socrata.com/resource/earthquakes.json?$limit=5000
You can also request more than that, e.g.
https://soda.demo.socrata.com/resource/earthquakes.json?$limit=100000
But this only returns 10,820 results (I am not sure whether that is a cap or the entire dataset).
(For your dataset you can use https://data.pa.gov/resource/dc5b-gebx.json?$limit=5, but it takes much longer to load, so I am unsure of the limit.)
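If a single request cannot return everything, the paging documentation linked in the question describes combining $limit with $offset. Below is a minimal sketch of that loop using only the standard library. The function names fetch_page and fetch_all and the page size are assumptions for illustration, and the $order=:id clause is included because paging needs a stable sort order; check it against the dataset you are using.

```python
import json
import urllib.parse
import urllib.request


def fetch_page(resource_url, limit, offset):
    """Fetch one page of rows from a SODA endpoint as a list of dicts."""
    query = urllib.parse.urlencode(
        {"$limit": limit, "$offset": offset, "$order": ":id"}, safe="$:"
    )
    with urllib.request.urlopen(f"{resource_url}?{query}") as reply:
        return json.load(reply)


def fetch_all(resource_url, page_size=50000, fetch=fetch_page):
    """Collect every row by requesting pages until a short page comes back."""
    rows = []
    offset = 0
    while True:
        page = fetch(resource_url, page_size, offset)
        rows.extend(page)
        # A page shorter than page_size means there is nothing left to fetch.
        if len(page) < page_size:
            return rows
        offset += page_size
```

For the dataset in the question this would be called as fetch_all("https://data.pa.gov/resource/dc5b-gebx.json"), and the resulting list can be passed to pd.DataFrame as before. The fetch argument exists only so the loop can be exercised without network access.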

Get schema using SODA api and python

I am using Python to download a city's data. I want to get the schema (column names and data types) for the columns downloaded with sodapy. I can get the data fine, but it is all of type string, so it would be nice to have the data types in order to build a proper schema to load the data into.
On the website, the data types are laid out for the columns:
https://data.muni.org/Housing-and-Homelessness/CAMA-Property-Inventory-Residential-with-Details/r3di-nq2j
There is some metadata at this URL, but it does not include the column info:
https://data.muni.org/api/views/metadata/v1/r3di-nq2j
Unable to comment due to low rep, so adding a new answer in case anyone else comes across this in search results, as I did.
Omitting /metadata/v1/ from the OP's original URL, giving https://data.muni.org/api/views/r3di-nq2j, returns the full table metadata.
Additionally, Socrata HTTP response headers include special fields (X-SODA2-Fields and X-SODA2-Types) containing a list of the column names and data types returned in the results.
Ref: https://dev.socrata.com/docs/response-codes.html
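A small sketch of how those two headers could be turned into a name-to-type mapping, assuming both are present in the response and each holds a JSON-encoded array in matching order. The function name schema_from_headers is made up for illustration, and the headers may be absent for some responses, so treat this as a starting point rather than a guaranteed source of the schema.

```python
import json


def schema_from_headers(headers):
    """Pair column names with types from the X-SODA2-* response headers.

    Assumes both headers hold JSON-encoded arrays of equal length.
    """
    fields = json.loads(headers["X-SODA2-Fields"])
    types = json.loads(headers["X-SODA2-Types"])
    return dict(zip(fields, types))
```

With the requests library, the headers argument would be the .headers attribute of the response to a GET on the dataset's resource URL.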
There may be a better way, but this URL returns a JSON response with the information:
http://api.us.socrata.com/api/catalog/v1?ids=r3di-nq2j

Writing CSV from Elasticsearch result using python with records exceeding 10000?

I'm able to create the CSV using the solution provided here:
Export Elasticsearch results into a CSV file
but a problem arises when the number of records exceeds 10,000 (size=10000). Is there any way to write all the records?
The method in your question uses Elasticsearch's Python API, and es.search does have a 10,000-document retrieval limit.
If you want to retrieve more than 10,000 documents, as suggested by dshockley in the comment, you can try the scroll API. Alternatively, you can use the scan helper from elasticsearch.helpers, which automates much of the work with the scroll API. For example, you won't need to obtain a scroll_id and pass it back to the API, which is necessary if you use scroll directly.
When you use helpers.scan, you need to specify index and doc_type as parameters when calling the function, or include them in the query body. Note that the parameter name is 'query' rather than 'body'.
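A minimal sketch of that approach, assuming the elasticsearch Python package is installed and client is an already connected Elasticsearch instance. The function names iter_hits and write_hits_to_csv, the match_all query, and the field list are illustrative assumptions, not part of the linked solution. doc_type is left out here and would need to be passed as well on versions that require it.

```python
import csv


def iter_hits(client, index, query=None):
    """Return an iterator over every hit in the index via helpers.scan."""
    # Imported lazily so the CSV helper below works without the client library.
    from elasticsearch import helpers

    body = query or {"query": {"match_all": {}}}
    # Note the keyword is `query`, not `body`, as mentioned above.
    return helpers.scan(client, query=body, index=index)


def write_hits_to_csv(hits, out_file, fieldnames):
    """Write the _source of each hit as one CSV row; return the row count."""
    writer = csv.DictWriter(out_file, fieldnames=fieldnames, extrasaction="ignore")
    writer.writeheader()
    count = 0
    for hit in hits:
        # Fields missing from a document are written as empty cells.
        writer.writerow(hit["_source"])
        count += 1
    return count
```

Usage would look like opening a file with open("out.csv", "w", newline="") and calling write_hits_to_csv(iter_hits(client, "my-index"), f, ["field_a", "field_b"]), where the index and field names are placeholders. Because scan yields hits lazily, the rows are streamed to the file instead of being held in memory.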

How to get missing data using Django model query?

In Django, there is the filter() method to filter data. So I can pass an array of IDs and get the filtered results like this: model.objects.filter(id__in=id_array).
Is there a way to get missing data using Django model query?
How to get a list of id_array elements which don't exist in the database?
You can't ask the database for things it doesn't have. However, you can ask it for all the things it does have, and then get a set containing the difference. Something like:
ids = model.objects.filter(id__in=id_array).distinct().values_list('id', flat=True)
missing_values = set(id_array) - set(ids)
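If the original order of id_array matters, a small variation on the same idea keeps it. This is only a sketch in plain Python: find_missing is a hypothetical helper, and existing_ids stands in for the result of the values_list('id', flat=True) query above.

```python
def find_missing(requested_ids, existing_ids):
    """Return requested IDs that are absent from existing_ids, keeping order."""
    existing = set(existing_ids)
    return [i for i in requested_ids if i not in existing]
```

For example, find_missing(id_array, ids) returns the missing IDs in the order they appeared in id_array, whereas the set difference gives them in arbitrary order.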
