Below is my input Excel sheet. It can have many tables distributed sparsely. As an example, consider the sheet below. If you find this tough, you can omit the Meta_data 1/2 part.
Input file:
Meta_data 1
Meta_data2 a Meta_data2 b
Sl. Day Shift Sl. Day Shift
1 Mon Morning 1111 Mon Morning
2 Tue Evening 1112 Tue Evening
….. …. ….. ….. …. …..
308 Someday Some Shift 111308 Someday Some Shift
Meta_data 2
Meta_data2 c Meta_data2 d
Sl. Day Shift Sl. Day Shift
222221 Mon Morning 451231 Mon Morning
222222 Tue Evening 451232 Tue Evening
….. …. ….. ….. …. …..
222223 Someday Some Shift 45123 Someday Some Shift
Output file:
Meta_data Meta_data2 Sl. Day Shift
1 a 1 Mon Morning
1 a 2 Tue Evening
1 a ….. …. …..
1 a 308 Someday Some Shift
1 b 1111 Mon Morning
1 b 1112 Tue Evening
1 b ….. …. …..
1 b 111308 Someday Some Shift
2 c 222221 Mon Morning
2 c 222222 Tue Evening
2 c ….. …. …..
2 c 2222308 Someday Some Shift
2 d 451231 Mon Morning
2 d 451232 Tue Evening
2 d ….. …. …..
2 d 45123 Someday Some Shift
My idea for approaching this problem is to somehow make 4 dataframes from the sheet (the number may vary with the tables in the sheet). To do that, I'm trying to split the whole dataframe on rows and columns that are entirely NaN. Below is my code; it is incomplete, as I am not able to move ahead.
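Approach 1:
import numpy as np
import pandas as pd

df = pd.read_excel('type8.xlsx')
# Split the frame at every row that is entirely NaN
df_list = np.split(df, df[df.isnull().all(axis=1)].index)
for part in df_list:
    print('*' * 50)
    print(part, '\n')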
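Approach 1 splits only on blank rows, so tables that sit side by side stay glued together. A minimal sketch of the "bisect on all-NaN rows and columns" idea follows. The helper names `split_on_blank` and `split_tables` are hypothetical, and the small `demo` frame stands in for a sheet read with `header=None`; it is not the real file.

```python
import numpy as np
import pandas as pd


def split_on_blank(frame, axis):
    """Split `frame` into runs of non-blank rows (axis=0) or columns (axis=1)."""
    # True for every row (axis=0) or column (axis=1) that is entirely NaN
    blank = frame.isna().all(axis=1 - axis).to_numpy()
    # The running count of blank lines gives each run between blanks its own id
    group_ids = np.cumsum(blank)
    pieces = []
    for gid in np.unique(group_ids[~blank]):
        keep = (group_ids == gid) & ~blank
        pieces.append(frame.loc[keep] if axis == 0 else frame.loc[:, keep])
    return pieces


def split_tables(frame):
    """Split on blank rows first, then split each horizontal band on blank columns."""
    tables = []
    for band in split_on_blank(frame, axis=0):
        tables.extend(split_on_blank(band, axis=1))
    return tables


# Hypothetical stand-in for the sheet: two bands, each holding two tables
nan = np.nan
demo = pd.DataFrame([
    ["Sl.", "Day", nan, "Sl.", "Day"],
    [1, "Mon", nan, 11, "Mon"],
    [nan, nan, nan, nan, nan],
    ["Sl.", "Day", nan, "Sl.", "Day"],
    [2, "Tue", nan, 22, "Tue"],
])
tables = split_tables(demo)
print(len(tables))  # 4
```

Each piece still carries its header row as data, so promoting the first row to column names would be a separate step. Splitting rows before columns assumes the tables in one band share the same blank columns.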
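Approach 2: find the repeated header rows and use them to locate the tables.
import pandas as pd
df = pd.read_excel('type8.xlsx')
# Collect the index positions of rows that occur more than once (e.g. repeated headers);
# dropna=False keeps groups whose key contains NaN (needs pandas >= 1.1)
df_table_headings = df[df.duplicated(keep=False)].groupby(df.columns.tolist(), dropna=False).apply(lambda x: tuple(x.index)).tolist()
df = df.reset_index(drop=True)  # reset_index returns a new frame, so assign it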
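One way the header-based idea could be carried further is to look for the header cell itself rather than for duplicated rows. The sketch below assumes the sheet was read with `header=None`, that every table starts with a known label in its first header cell (`"Sl."` here), that each table is three columns wide, and that the cell directly above the header holds the table's label. `find_header_cells` and `extract_table` are hypothetical helpers, `demo` is made-up data shaped like the example, and the top-level Meta_data 1/2 value is not handled.

```python
import numpy as np
import pandas as pd


def find_header_cells(frame, label):
    """Return (row, col) positions of every cell whose stripped text equals `label`."""
    text = frame.astype(str).apply(lambda col: col.str.strip())
    rows, cols = np.where(text.eq(label).to_numpy())
    return list(zip(rows.tolist(), cols.tolist()))


def extract_table(frame, row, col, width=3):
    """Read `width` columns from a header cell down to the first blank row."""
    block = frame.iloc[row:, col:col + width]
    blank = block.isna().all(axis=1).to_numpy()
    end = int(np.argmax(blank)) if blank.any() else len(block)
    body = block.iloc[1:end].copy()
    body.columns = [str(c).strip() for c in block.iloc[0]]
    # Assumption: the cell directly above the header is the table's label
    label = frame.iat[row - 1, col] if row > 0 else np.nan
    body.insert(0, "Meta_data2", label)
    return body.reset_index(drop=True)


# Hypothetical stand-in for the sheet
nan = np.nan
demo = pd.DataFrame([
    ["a", nan, nan, nan, "b", nan, nan],
    ["Sl.", "Day", "Shift", nan, "Sl.", "Day", "Shift"],
    [1, "Mon", "Morning", nan, 1111, "Mon", "Morning"],
    [2, "Tue", "Evening", nan, 1112, "Tue", "Evening"],
    [nan] * 7,
    ["c", nan, nan, nan, "d", nan, nan],
    ["Sl.", "Day", "Shift", nan, "Sl.", "Day", "Shift"],
    [222221, "Mon", "Morning", nan, 451231, "Mon", "Morning"],
])
tables = [extract_table(demo, r, c) for r, c in find_header_cells(demo, "Sl.")]
result = pd.concat(tables, ignore_index=True)
print(result)
```

Because the tables are found by their header cell, this does not depend on how many blank rows or columns separate them, but it does break if a table's width differs from `width` or if the label appears inside the data.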
Below are the input and output files in CSV format, in case you want to test.
InputFile:
Meta_data1,1,,,,,,
,,,,,,,
Meta_data2,,,,,Meta_data3,,
Sl. No. ,Day,Shift,,,Sl. No. ,Day,Shift
1,Mon,Morning,,,1,Mon,Morning
2,Tue,Evening,,,2,Tue,Evening
…..,….,…..,,,…..,….,…..
308,Someday,Some Shift,,,308,Someday,Some Shift
,,,,,,,
,,,,,,,
,,,,,,,
,,,,,,,
Meta_data5,4,,,,,,
,,,,,,,
Meta_data4,,,,,Meta_data5,,
Sl. No. ,Day,Shift,,,Sl. No. ,Day,Shift
1,Mon,Morning,,,1,Mon,Morning
2,Tue,Evening,,,2,Tue,Evening
…..,….,…..,,,…..,….,…..
308,Someday,Some Shift,,,308,Someday,Some Shift
OutputFile:
Meta_data,Meta_data2,Sl. ,Day,Shift
1,a,1,Mon,Morning
1,a,2,Tue,Evening
1,a,…..,….,…..
1,a,308,Someday,Some Shift
1,b,1111,Mon,Morning
1,b,1112,Tue,Evening
1,b,…..,….,…..
1,b,111308,Someday,Some Shift
2,c,222221,Mon,Morning
2,c,222222,Tue,Evening
2,c,…..,….,…..
2,c,2222308,Someday,Some Shift
2,d,451231,Mon,Morning
2,d,451232,Tue,Evening
2,d,…..,….,…..
2,d,45123308,Someday,Some Shift