I have a task to import multiple Excel files into their respective SQL Server tables. The Excel files have different schemas, and I need a mechanism to create a table dynamically so that I don't have to write a CREATE TABLE query. I use SSIS, and I have seen some SSIS articles on this. However, it looks like I have to define the table anyway. OPENROWSET doesn't work well with large Excel files.
You can try using BiML, which dynamically creates packages based on metadata.
The only other possible solution is to write a script task.
I have a basic CSV report that is produced by another team on a daily basis; each report has 50k rows, and the reports are saved to a shared drive every day. I also have an Oracle DB.
I need to create an auto-scheduled (or at least less manual) process to import those CSV reports into the Oracle DB. What solution would you recommend for this?
I did not find such a solution in SQL Developer, since it is an upload from a file and not a query. I was thinking about a Python cron script that would run automatically on a daily basis, transform the CSV report into a txt file with the needed SQL syntax (INSERT INTO ...), and then connect to the Oracle DB and run the txt file as a SQL command to insert the data.
But this looks complicated.
Maybe you know another solution that you would recommend?
Create an external table to allow you to access the content of the CSV as if it were a regular table. This assumes the file name does not change day-to-day.
Create a scheduled job to import the data from that external table and do whatever you want with it.
One common blocking issue that prevents the use of external tables is that they require the data to be on the machine hosting the database, and not everyone has access to those servers. Sometimes the transfer of the data to that machine plus the load into the DB is also slower than doing a direct path load from the remote machine.
SQL*Loader with direct path load may be an option: https://docs.oracle.com/en/database/oracle/oracle-database/19/sutil/oracle-sql-loader.html#GUID-8D037494-07FA-4226-B507-E1B2ED10C144 This will be faster than Python.
If you do want to use Python, then read the cx_Oracle manual's section on Batch Statement Execution and Bulk Loading. There is an example of reading from a CSV file.
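A minimal sketch of that approach with cx_Oracle's executemany(), assuming a hypothetical table REPORT_DATA with three columns; the connection details, file path, and column names are placeholders:
import csv
import cx_Oracle

# Placeholder credentials and DSN; adjust for your environment
conn = cx_Oracle.connect("scott", "tiger", "dbhost.example.com/orclpdb1")
cur = conn.cursor()

BATCH_SIZE = 5000
sql = "INSERT INTO report_data (col1, col2, col3) VALUES (:1, :2, :3)"

with open("/path/to/daily_report.csv", newline="") as f:
    reader = csv.reader(f)
    next(reader)  # skip the header row
    batch = []
    for row in reader:
        batch.append(tuple(row[:3]))
        if len(batch) == BATCH_SIZE:
            cur.executemany(sql, batch)  # one round trip per batch
            batch = []
    if batch:
        cur.executemany(sql, batch)

conn.commit()
cur.close()
conn.close()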
I have some large (500+ MB) .CSV files that I need to import into a PostgreSQL database.
I am looking for a script or tool that helps me to:
Generate the SQL CREATE TABLE code for the columns, ideally taking into account the data in the .CSV file in order to choose the optimal data type for each column.
Use the headers of the .CSV as the names of the columns.
It would be perfect if such functionality existed in PostgreSQL or could be added as an add-on.
Thank you very much
You can use the open source tool pgfutter to create a table from your CSV file.
GitHub link
PostgreSQL also has COPY functionality; however, COPY expects the table to already exist.
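Once the table exists, a rough sketch of driving COPY from Python with psycopg2 (the connection string, table name, and file path are placeholders) could look like this:
import psycopg2

# Placeholder connection string; adjust for your environment
conn = psycopg2.connect("dbname=mydb user=postgres host=localhost")
cur = conn.cursor()

with open("/path/to/data.csv") as f:
    # COPY ... FROM STDIN streams the file through the client connection,
    # so the CSV does not need to sit on the database server itself
    cur.copy_expert("COPY my_table FROM STDIN WITH (FORMAT csv, HEADER true)", f)

conn.commit()
cur.close()
conn.close()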
I have a requirement where I have an ODS file with some data, and I want to insert that data into a table. This scenario needs to be done via a procedure call because we have to validate some fields in the ODS file. For this, we have two tables: a staging table and a main table. The staging table contains the records that failed validation, and the main table contains the successful records. The steps for the requirement are below.
Note: this needs to be done using Python scripting, and it will be automated on a daily basis.
Step 1: Place the file in a specified location.
Step 2: Pick up the file from the specified location and call the procedure to insert the records.
Step 3: While calling the procedure, handle validation for some fields. Only records that pass validation should be stored in the main table; records that fail validation should be stored in the staging table.
Step 4: The automation script needs to run on a daily basis.
You can move the files across folders with Python's shutil module.
shutil.move("path/to/current/file.foo", "path/to/new/destination/for/file.foo")
Check out more details of it here.
You can periodically run Python scripts in multiple ways! I can't comment on the efficiency of these methods, but you can use Python's apscheduler. More details of it here.
You can use Python's pyexcel-ods to read ODS files.
Since you haven't added your work, I can't help you more than this!
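Putting those pieces together, a rough sketch of the daily job (the file paths, schedule, procedure name, and sheet layout are all assumptions) might look like this:
import shutil
from apscheduler.schedulers.blocking import BlockingScheduler
from pyexcel_ods import get_data

INCOMING = "path/to/incoming/data.ods"
PROCESSED = "path/to/processed/data.ods"

def load_ods():
    # get_data returns a dict of sheet name -> list of rows
    sheets = get_data(INCOMING)
    for sheet_name, rows in sheets.items():
        for row in rows[1:]:  # skip the header row
            # Call your validation/insert procedure here, e.g. with cx_Oracle:
            # cursor.callproc("hypothetical_insert_proc", row)
            pass
    # Archive the file once it has been processed
    shutil.move(INCOMING, PROCESSED)

scheduler = BlockingScheduler()
scheduler.add_job(load_ods, "cron", hour=1)  # run daily at 01:00
scheduler.start()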
So I am building a database for a larger program and do not have much experience in this area of coding (mostly embedded systems programming). My task is to import a large Excel file into Python. It is large, so I'm assuming I must convert it to a CSV, then truncate it by parsing and partitioning, and then import it to avoid my computer crashing. Once the file is imported, I must be able to extract/search specific information based on the column titles. There are other user-interactive aspects that are simply string based, so not very difficult. As for the rest, I am getting the picture but would like a more efficient and specific design. Can anyone offer me guidance on this?
An Excel or CSV file can be read into Python using pandas. The data is stored as rows and columns in a structure called a DataFrame. To import data into such a structure, you need to import pandas first and then read the CSV or Excel file into the DataFrame.
import pandas as pd
# Read the CSV into a DataFrame
df1 = pd.read_csv('excelfilename.csv')
This DataFrame structure is similar to a table, and you can perform joins of different DataFrames, group data, etc.
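For example, a small sketch of selecting, filtering, and grouping by column title (the file and column names here are made up):
import pandas as pd

df = pd.read_csv('excelfilename.csv')

subset = df[['PartNumber', 'Voltage']]                  # pick columns by header title
matches = df[df['PartNumber'] == 'ABC-123']             # filter rows on a column value
averages = df.groupby('PartNumber')['Voltage'].mean()   # group and aggregate

# If the file is too big to fit in memory, read and filter it in chunks
chunks = pd.read_csv('excelfilename.csv', chunksize=100000)
filtered = pd.concat(chunk[chunk['Voltage'] > 5.0] for chunk in chunks)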
I am not sure if this is what you need; let me know if you need any further clarification.
I would recommend actually loading it into a proper database such as MariaDB or PostgreSQL. This will allow you to access the data from other applications, and it takes the load of writing your own storage layer off of you. You can then use an ORM if you would like to interact with the data, or simply use plain SQL via Python.
Read the CSV (importing pandas and sqlite3 first):
import pandas as pd
import sqlite3

df = pd.read_csv('sample.csv')
Connect to a database:
conn = sqlite3.connect("Any_Database_Name.db")  # if the db does not exist, this creates an Any_Database_Name.db file in the current directory
Store your table in the database:
df.to_sql('Some_Table_Name', conn)
Read a SQL query out of your database and into a pandas DataFrame:
sql_string = 'SELECT * FROM Some_Table_Name'
df = pd.read_sql(sql_string, conn)
I have many thousands of CSVs that I need to dump into Postgres as one table each. The problem is that these tables are not identical in structure. I'm looking for a way I can create the table on the fly from the structure of the CSV and dump the CSV into the table created. If I were to do it manually, it would involve two steps:
1. Create the table based on the CSV data structure
2. Dump the CSV data into the created table
But since I have thousands of these CSVs, it would be extremely inefficient to do this manually. I'm looking for a way I could dynamically create a Postgres table based on a CSV's structure, dump the data into that table, and automate the entire process for thousands of files.
Most of the research I did points me to PG commands to dump data into a single existing table, but those solutions won't work here because the number of tables is huge.
Efficient way to import a lot of csv files into PostgreSQL db - This pointed me to a similar problem, but their tables are all identical in structure, unlike mine.
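A minimal sketch of how that automation could look with pandas and SQLAlchemy, assuming the CSVs sit in one directory and each file name can serve as its table name (the directory, connection string, and naming scheme are placeholders):
import glob
import os

import pandas as pd
from sqlalchemy import create_engine

# Placeholder connection string; adjust for your environment
engine = create_engine("postgresql://user:password@localhost:5432/mydb")

for path in glob.glob("/path/to/csvs/*.csv"):
    table_name = os.path.splitext(os.path.basename(path))[0].lower()
    df = pd.read_csv(path)
    # to_sql creates the table if it does not exist, taking column names
    # from the CSV header and inferring column types from the data
    df.to_sql(table_name, engine, if_exists="replace", index=False)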