Set decimal and thousands separator in csvDefinition

11 posts / 0 new
Last post
Bert_B
Offline
Joined: 11/04/2020 - 06:46
Set decimal and thousands separator in csvDefinition

I am using a comparable code as mentioned in this post:
http://ideascripting.com/forum/create-rdf-file-csv-import
The code works and imports my csv file. However, in my original csv file the decimal separator is ".". When I now open my IMD file I can see that these are character columns displaying for example 9987.5. When I manually do the import I can set the options as mentioned in this post:
https://www.ideascripting.com/forum/importing-numeric-field-different-de... (post #2)
by setting Decimal separator and Thousands separator. When I do this, the information is stored in the rdf file.
 
Now I want to do this with my csvDefinition code:
 

Sub Main

csvPath = "Source Files.ILB\Delimited.csv"
rdfPath = "Import Definitions.ILB\Delimited.RDF"
firstRowAsFieldNames = TRUE
CsvEncodingUTF8 = 2

' Create, configure, and save the definition file.
Set csvDefinition = Client.NewCsvDefinition
csvDefinition.DefinitionFilePath = rdfPath
csvDefinition.CsvFilePath = csvPath
csvDefinition.FieldDelimiter = ","
csvDefinition.TextEncapsulator = """"
csvDefinition.FirstRowIsFieldNames = firstRowAsFieldNames
csvDefinition.CsvFileEncoding = CsvEncodingUTF8
Client.SaveCSVDefinitionFile csvDefinition

' Define the output name and perform the import task.
dbName = "Delimited.IMD"
Client.ImportUTF8DelimFile csvPath, dbName, FALSE, "", rdfPath, firstRowAsFieldNames

' Open the result.
Client.OpenDatabase (dbName)

End Sub

How can I add Decimal separator and thousands separator here? I searched for it, but couldn't find any hint.
 
Something like
 
csvDefinition.DecimalSeparator = "."
csvDefinition.ThousandsSeparat0r =","
 
Thanks for any help!

Brian Element's picture
Brian Element
Offline
Joined: 07/11/2012 - 19:57

Hi Bert, unfortunately you can't.  I would recommend you send an email to ideasupport@caseware.com asking for these features to be added.

Brian Element's picture
Brian Element
Offline
Joined: 07/11/2012 - 19:57

Hi Bert, which version of IDEA are you using?

Bert_B
Offline
Joined: 11/04/2020 - 06:46

Thanks for your fast response, I am using IDEA 10.4.1

Brian Element's picture
Brian Element
Offline
Joined: 07/11/2012 - 19:57

Thanks for the info Bert.

In IDEA 11.2 there is new package import that can be used for CSV files that allow you to define the decimals and thousand separator.  There will be a python package available shortly that will allow you to create your own ipkg files for import into IDEA.  If you were at 11.2 I would suggest you try it out as it might fix your problem.

klmi
Offline
Joined: 02/13/2019 - 08:41

Well there are countries where IDEA 11 will be only available from mid 2021. I'm curious which version we will get then. Besides better Python integration hopefully bugs with corpate licensing will be repaired!
@Bert as a workaround you can solve your problem in IDEA 10.4.1 with a Python script. Importing the CSV into a pandas dataframe with pandas.read_csv (..., decimal='.', thousands=','). Export the dataframe to a new CSV file with pandas.DataFrame.to_csv(... decimal=',') and import that into IDEA (win32com). I know it's a pain to do such workarounds but once coded it works automatic.

Bert_B
Offline
Joined: 11/04/2020 - 06:46

Thanks you two for the fast answers.
@klmi regarding the pandas solution:I tried to do it with Python Pandas. However, in my csv files I have NaN values in columns which have integer values. Python converts these to floats (this is a known Python Pandas/Numpy "behaviour"). In IDEA these values are then displayed with floats too. So ID values like 1000605 are displayed as 10000605,0. Which I really dislike. Therefore I tried the way without pandas and just import these csv files with IDEA and tell IDEA the thousands and decimal separator.

Brian Element's picture
Brian Element
Offline
Joined: 07/11/2012 - 19:57

With 11.2 there should be a fix for your problem.  If you can upgrade let me know and I can walk you through the fix.

Bert_B
Offline
Joined: 11/04/2020 - 06:46

Once again thanks for your help and advices. Yes, I will be happy to let you know and see how it might work. However, I do not know if and when I will get an update to 11.2. One question as klmi mentioned win32com:
"and import that into IDEA (win32com)."
When I import xlsx files into IDEA I use an IDEA script to import the xlsx files (no win32com). When I want to import csv file, I would use an IDEA script too along with csvDefinition. I have a rough idea of what win32com is. But what does it mean to import that into IDEA with win32com? That means I do not run an IDEA script, but instead I run a Python code out of IDEA that includes win32com commands which so to say control IDEA to perform certain steps? Same I could use IDEA script with win32com together with Python (is Python needed or IDEA script with win32com code directly?) to do something in excel without manually starting it? Or what is win32com and how is it used? Thanks for the clarification. (I read this https://www.audimation.com/infusing-python-into-your-analysis-with-idea/ but I am not used to win32com, so I do not really know what the purpose of this is / when win32com is used).

klmi
Offline
Joined: 02/13/2019 - 08:41

Hi Bert, please compare the following scripts which will do the same. I hope that helps to answer your question regarding win32com.
 
IDEAScript:
dbName = "Debitor_Daten-Kunden Stammdaten.IMD"
resultDbName = "Kunden_in_FL.IMD" 
Set db = Client.OpenDatabase(dbName)
Set task = db.Extraction
task.IncludeAllFields
task.AddExtraction resultDbName, "", " ORT =""Flensburg"""
task.PerformTask 1, db.Count 
Set task = Nothing
Set db = Nothing 
Client.OpenDatabase (resultDbName)
 
Python:
import win32com.client as win32ComClient
if __name__ == "__main__":
       try:        
             dbName = "Debitor_Daten-Kunden Stammdaten.IMD"
             resultDbName = "Kunden_in_Flensburg.IMD"
             idea = win32ComClient.Dispatch(dispatch="Idea.IdeaClient")
             db = idea.OpenDatabase(dbName)
             task = db.Extraction()
             task.IncludeAllFields()
             task.AddExtraction(resultDbName, "", "ORT==\"Flensburg\"")
             task.PerformTask(1, db.Count)
             idea.OpenDatabase(resultDbName)
       finally:
             task = None
             db = None 
             idea = None
 
edit: corrections in the IDEAScript Code

Bert_B
Offline
Joined: 11/04/2020 - 06:46

@klmi:Thanks for the clarification!