Channel: Active questions tagged excel - Stack Overflow

Reading multiple tables (with irregularities) and combining them in Python/Excel VBA (preferably pandas)


Below is my input Excel sheet. It can contain many tables, distributed sparsely. As an example, consider the sheet below. If you find this too hard, you can omit the Meta_data 1/2 part.

Input file:

Meta_data   1                       

Meta_data2  a           Meta_data2  b       
Sl. Day Shift           Sl.     Day Shift
1   Mon Morning         1111    Mon Morning
2   Tue Evening         1112    Tue Evening
….. ….  …..             ….. ….  …..
308 Someday Some Shift  111308  Someday Some Shift




Meta_data   2                       

Meta_data2  c               Meta_data2  d       
Sl.     Day Shift           Sl.     Day Shift
222221  Mon Morning         451231  Mon Morning
222222  Tue Evening         451232  Tue Evening
….. ….  …..                 ….. ….  …..
222223  Someday Some Shift  45123   Someday Some Shift

Output file:

Meta_data   Meta_data2  Sl.     Day Shift
1       a       1   Mon Morning
1       a       2   Tue Evening
1       a       ….. ….  …..
1       a       308 Someday Some Shift
1       b       1111    Mon Morning
1       b       1112    Tue Evening
1       b       ….. ….  …..
1       b       111308  Someday Some Shift
2       c       222221  Mon Morning
2       c       222222  Tue Evening
2       c       ….. ….  …..
2       c       2222308 Someday Some Shift
2       d       451231  Mon Morning
2       d       451232  Tue Evening
2       d       ….. ….  …..
2       d       45123   Someday Some Shift

My idea for approaching this problem is to split the sheet into 4 DataFrames (the number varies with the number of tables in the sheet). To do that, I'm trying to bisect the whole DataFrame on rows and columns that are entirely NaN. Below is my code; it is incomplete, because I am not able to move ahead.

Approach 1:

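import numpy as np
import pandas as pd

# header=None keeps the first sheet row as data instead of using it as column labels
df = pd.read_excel('type8.xlsx', header=None)
blank = np.flatnonzero(df.isnull().all(axis=1))  # positions of all-empty rows
# cut before each blank row, like np.split(df, blank) but with plain iloc slices
df_list = [df.iloc[a:b] for a, b in zip(np.r_[0, blank], np.r_[blank, len(df)])]

for part in df_list:
    print('*' * 50)
    print(part, '\n')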
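Approach 1 only cuts at blank rows. A minimal sketch that also cuts each row band at fully blank columns, assuming the sheet was read with `header=None` so every cell (including the first row) is data; `runs` and `split_blocks` are hypothetical helper names, not pandas API:

```python
import pandas as pd


def runs(mask):
    """Return (start, stop) position pairs for each run of True values in a list of bools."""
    spans, start = [], None
    for i, flag in enumerate(mask):
        if flag and start is None:
            start = i                    # a run of non-blank positions begins here
        elif not flag and start is not None:
            spans.append((start, i))     # the run ended just before position i
            start = None
    if start is not None:
        spans.append((start, len(mask)))
    return spans


def split_blocks(raw):
    """Cut a raw sheet into blocks separated by fully blank rows,
    then cut each row band at its fully blank columns."""
    blocks = []
    for r0, r1 in runs(raw.notna().any(axis=1).tolist()):
        band = raw.iloc[r0:r1]
        for c0, c1 in runs(band.notna().any(axis=0).tolist()):
            blocks.append(band.iloc[:, c0:c1])
    return blocks
```

On the example layout, `split_blocks(pd.read_excel('type8.xlsx', header=None))` should give a one-row block for each Meta_data line and one block per table. Tables with no blank column between them would come out as a single block, so this relies on the sparse layout shown above.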

Approach 2: find the repeated header rows and use them to locate the tables.

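import pandas as pd

df = pd.read_excel('type8.xlsx', header=None)
# dropna=False: header rows contain empty cells, which groupby drops by default
df_table_headings = df[df.duplicated(keep=False)].groupby(df.columns.tolist(), dropna=False).apply(lambda x: tuple(x.index)).tolist()
df = df.reset_index(drop=True)  # reset_index returns a new frame; assign it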
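Instead of comparing whole duplicated rows, Approach 2 could also search for the header cell itself. A minimal sketch, assuming every table starts with a `Sl. No.` header cell (the CSV version has a trailing space, hence the `strip()`), is exactly three columns wide, and ends at the first row whose cells in those columns are all blank; `tables_by_header` and its parameters are hypothetical:

```python
import pandas as pd


def tables_by_header(raw, header_label='Sl. No.', width=3):
    """Find every cell whose stripped text equals header_label and cut out the table below it.

    Assumed layout: the table is `width` columns wide starting at the header cell and
    ends at the first row whose `width` cells are all blank (or at the sheet's end).
    Returns (header_row, header_col, DataFrame) tuples in reading order.
    """
    found = []
    for r in range(len(raw)):
        for c in range(raw.shape[1]):
            cell = raw.iat[r, c]
            if isinstance(cell, str) and cell.strip() == header_label:
                end = r + 1
                # walk down until the table's columns are all empty
                while end < len(raw) and raw.iloc[end, c:c + width].notna().any():
                    end += 1
                table = raw.iloc[r + 1:end, c:c + width].copy()
                table.columns = [str(h).strip() for h in raw.iloc[r, c:c + width]]
                found.append((r, c, table.reset_index(drop=True)))
    return found
```

The returned header positions also tell you where to look for metadata; in the example sheet, the Meta_data2 label sits directly above each header cell.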

Below are the input and output files in CSV format, in case you want to test.

Input file:

Meta_data1,1,,,,,,
,,,,,,,
Meta_data2,,,,,Meta_data3,,
Sl. No. ,Day,Shift,,,Sl. No. ,Day,Shift
1,Mon,Morning,,,1,Mon,Morning
2,Tue,Evening,,,2,Tue,Evening
…..,….,…..,,,…..,….,…..
308,Someday,Some Shift,,,308,Someday,Some Shift
,,,,,,,
,,,,,,,
,,,,,,,
,,,,,,,
Meta_data5,4,,,,,,
,,,,,,,
Meta_data4,,,,,Meta_data5,,
Sl. No. ,Day,Shift,,,Sl. No. ,Day,Shift
1,Mon,Morning,,,1,Mon,Morning
2,Tue,Evening,,,2,Tue,Evening
…..,….,…..,,,…..,….,…..
308,Someday,Some Shift,,,308,Someday,Some Shift
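To load this CSV into the same cell-for-cell shape as the sheet, it can be read with `header=None`. A minimal sketch using only the first six lines of the input file above (the `CSV_TEXT` name is just for illustration):

```python
import io

import pandas as pd

# First six lines of the input CSV from the question
CSV_TEXT = """Meta_data1,1,,,,,,
,,,,,,,
Meta_data2,,,,,Meta_data3,,
Sl. No. ,Day,Shift,,,Sl. No. ,Day,Shift
1,Mon,Morning,,,1,Mon,Morning
2,Tue,Evening,,,2,Tue,Evening
"""

# header=None keeps "Meta_data1,1" as row 0 instead of turning it into column labels;
# a line made only of commas is read as a row of NaN, which blank-row splitting relies on
raw = pd.read_csv(io.StringIO(CSV_TEXT), header=None)
print(raw.shape)
```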

Output file:

Meta_data,Meta_data2,Sl. ,Day,Shift
1,a,1,Mon,Morning
1,a,2,Tue,Evening
1,a,…..,….,…..
1,a,308,Someday,Some Shift
1,b,1111,Mon,Morning
1,b,1112,Tue,Evening
1,b,…..,….,…..
1,b,111308,Someday,Some Shift
2,c,222221,Mon,Morning
2,c,222222,Tue,Evening
2,c,…..,….,…..
2,c,2222308,Someday,Some Shift
2,d,451231,Mon,Morning
2,d,451232,Tue,Evening
2,d,…..,….,…..
2,d,45123308,Someday,Some Shift
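Assuming the sheet has already been cut into rectangular blocks in reading order (top to bottom, left to right), the remaining step is to stack the tables and tag each row with its metadata. A minimal sketch, assuming a one-row block holds a `Meta_data` label with its value in the second cell, and each table block has its `Meta_data2` label and value in the first row and the column headers in the second; `combine_blocks` is a hypothetical name:

```python
import pandas as pd


def combine_blocks(blocks):
    """Stack table blocks into one long frame with Meta_data / Meta_data2 columns.

    Assumed (hypothetical) block layout, taken from the example sheet:
      * one-row block: [Meta_data label, value]
      * table block: row 0 = [Meta_data2 label, value, ...], row 1 = headers, rest = data
    """
    frames, meta = [], None
    for block in blocks:
        if len(block) == 1:
            meta = block.iat[0, 1]            # value next to the Meta_data label
            continue
        table = block.iloc[2:].copy()         # data rows only
        table.columns = [str(h).strip() for h in block.iloc[1]]
        table.insert(0, 'Meta_data2', block.iat[0, 1])
        table.insert(0, 'Meta_data', meta)    # applies to every table until the next one-row block
        frames.append(table)
    return pd.concat(frames, ignore_index=True)
```

With the plain-text example, the blocks would be `Meta_data 1`, table `a`, table `b`, `Meta_data 2`, table `c`, table `d`, which gives rows in the same order as the output file. In the CSV version the cells next to the Meta_data2/Meta_data3 labels are empty, so that column would come out as NaN there.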
